Exercise

冬凉默殇

#1 Created at ...
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):

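    def __init__(self):
        super().__init__()
        # '' means idle; 'Title:' / 'Location:' label the next text node;
        # an int counts the text pieces seen so far inside a <time> tag.
        self._flag = ''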

    def handle_starttag(self, tag, attrs):
        if ('class', 'event-title') in attrs:
            self._flag = 'Title:'
        elif ('class', 'event-location') in attrs:
            self._flag = 'Location:'
        elif tag == "time":
            self._flag = 0

    def handle_data(self, data):
        if self._flag in ('Title:', 'Location:'):
            if self._flag == 'Title:':
                print('-'*30)
            print(self._flag, data.strip())
            self._flag = ''
        if isinstance(self._flag, int):
            # Separator printed after the 1st, 2nd and 3rd piece of <time> text.
            l = ['-', ',', '\n']
            if self._flag < 3:
                print(data.strip(), end=l[self._flag])
                self._flag += 1

parser = MyHTMLParser()
with open('index.html') as html:
    parser.feed(html.read())

Lucibriel

#2 Created at ...

I understand most of it, but not this part. Could you explain it?

if isinstance(self._flag, int):
    l = ['-', ',', '\n']
    if self._flag < 3:
        print(data.strip(), end=l[self._flag])
        self._flag += 1

我自随行

#3 Created at ...

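Here is how I understand the time part when it runs:

1. The HTML for the time on the page looks like this: <time datetime="2015-09-18T00:00:00+00:00">18 Sept. – 20 Sept. <span class="say-no-more"> 2015</span></time>
2. During parsing, the tag that comes out is time, and attrs is a list of (name, value) pairs, here [('datetime', '2015-09-18T00:00:00+00:00')].
3. During parsing, the data arrives in three pieces (that is, the text between <time ...> and </time>, including the text between <span ...> and </span>): (A) 18 Sept. (B) 20 Sept. (C) 2015

These three pieces are printed in the format A-B,C\n, where '-', ',' and '\n' are inserted into the output by the program. That is what the list l = ['-', ',', '\n'] is for. self._flag is set to 0 when the time tag is seen and goes up by 1 after each piece, so it serves as the index into l: A is followed by l[0], B by l[1], and C by l[2]. Once self._flag reaches 3, nothing more is printed for that tag.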
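
To see the three pieces for yourself, here is a minimal sketch. It rests on two assumptions that are not stated in the thread: that the dash is written as the entity &ndash; in the page source, and that the parser is created with convert_charrefs=False, so the entity goes to handle_entityref and splits the surrounding text. The names TimeChunkParser, SAMPLE and format_chunks are made up for this illustration.

```python
from html.parser import HTMLParser

# Sample markup modelled on the snippet quoted above. Writing the dash as
# &ndash; is an assumption about the page source.
SAMPLE = (
    '<time datetime="2015-09-18T00:00:00+00:00">18 Sept. &ndash; 20 Sept. '
    '<span class="say-no-more"> 2015</span></time>'
)


class TimeChunkParser(HTMLParser):  # hypothetical name, for illustration only
    """Collects the text pieces that arrive after a <time> start tag."""

    def __init__(self):
        # Assumption: convert_charrefs=False, so &ndash; is reported through
        # handle_entityref and the text on either side arrives separately.
        super().__init__(convert_charrefs=False)
        self._flag = ''
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == 'time':
            self._flag = 0  # start counting data pieces

    def handle_data(self, data):
        # Same guard as the original: only the first three pieces are used.
        if isinstance(self._flag, int) and self._flag < 3:
            self.chunks.append(data.strip())
            self._flag += 1


def format_chunks(chunks):
    """Join up to three pieces the way the original print calls do."""
    l = ['-', ',', '\n']
    return ''.join(piece + l[i] for i, piece in enumerate(chunks))


parser = TimeChunkParser()
parser.feed(SAMPLE)
parser.close()
print(parser.chunks)                        # ['18 Sept.', '20 Sept.', '2015']
print(format_chunks(parser.chunks), end='')  # 18 Sept.-20 Sept.,2015
```

With the default convert_charrefs=True (the default since Python 3.5), the entity is converted in place instead, so the text may arrive in fewer pieces and the separators would line up differently.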

