Python Regular Expressions

import urllib2
import re
from BeautifulSoup import BeautifulSoup

html = urllib2.urlopen("http://dict.youdao.com/search?q=feel&keyfrom=dict.index")
soup = BeautifulSoup("".join(html))
txt = open("e://htmlFile/start(2).txt", "w")
txt.write(soup.prettify())
txt.close()

pa = []
content = open("e://htmlFile/start(2).txt", "r")
content_again = content.read()
p = re.compile('<div class=\"trans-container\">\s<ul>\s(<li>(\s.\s)</li>\s)*')
txt_again = open("e://htmlFile/start.txt", "w")

for lines in p.findall(content_again):
    for line in lines:
        pa.append(line)
        print type(line) == str

for l in pa:
    txt_again.write(l)

content.close()
print pa
txt_again.close()
print "The End!"
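For context on the nested loop above: when a pattern contains capturing groups, `re.findall` returns one entry per match, and with two or more groups each entry is a tuple holding one string per group. A minimal sketch on a hypothetical sample string (not the actual Youdao page); it runs under Python 3:

```python
import re

# Hypothetical sample string, not the real Youdao page
sample = "<li>a</li> <li>b</li>"

# One capturing group: findall returns a list of strings
one_group = re.findall(r"<li>(\w)</li>", sample)
print(one_group)   # ['a', 'b']

# Two capturing groups: findall returns a list of tuples, one string per group
two_groups = re.findall(r"(<li>(\w)</li>)", sample)
print(two_groups)  # [('<li>a</li>', 'a'), ('<li>b</li>', 'b')]
```

So `for line in lines` walks over the group values of a single match, not over separate `<li>` items.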

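The goal of this code is to match a section of HTML and extract the data inside it, but when I run it, only the data from the last line comes out, not all of it. Any pointers would be appreciated. The data I want to extract is: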

<ul>
  <li>
    vt. 感觉；认为；触摸；试探
  </li>
  <li>
    vi. 觉得；摸索
  </li>
  <li>
    n. 感觉；触摸
  </li>
</ul>
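The Python `re` documentation notes that if a group matches multiple times, only the last match is accessible. A quantified group such as `(<li>...</li>\s)*` therefore consumes every item but reports only the final one. The sketch below shows this on a simplified, hypothetical copy of the HTML above (English placeholders stand in for the Chinese definitions), followed by one possible two-step workaround: isolate the `<ul>` block first, then call `re.findall` with a pattern that matches a single `<li>`. The helper name `extract_items` is hypothetical, and this is a sketch under those assumptions rather than a fix tested against the live page.

```python
import re

# Hypothetical sample: a simplified stand-in for the prettified Youdao HTML,
# with English placeholders instead of the original Chinese definitions.
html = """<div class="trans-container">
 <ul>
  <li>
   vt. feel; think; touch; probe
  </li>
  <li>
   vi. sense; grope
  </li>
  <li>
   n. feeling; touch
  </li>
 </ul>
</div>"""

# A quantified group (...)* matches every <li>, but the group itself
# only keeps the text captured in its LAST repetition.
repeated = re.search(r"<ul>\s*(<li>\s*(.*?)\s*</li>\s*)*</ul>", html)
print(repeated.group(2))  # n. feeling; touch


def extract_items(text):
    """Hypothetical helper: return the text of every <li> in the trans-container list."""
    # Step 1: isolate the <ul> inside the trans-container div (re.S lets . cross newlines)
    block = re.search(r'<div class="trans-container">\s*<ul>(.*?)</ul>', text, re.S)
    if block is None:
        return []
    # Step 2: a pattern for ONE <li>, so each item becomes its own findall match
    return re.findall(r"<li>\s*(.*?)\s*</li>", block.group(1), re.S)


print(extract_items(html))
# ['vt. feel; think; touch; probe', 'vi. sense; grope', 'n. feeling; touch']
```

Because each `<li>` is a separate match in the second step, `findall` returns every item instead of only the last one.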