Python Regular Expressions

import urllib2

import re

from BeautifulSoup import BeautifulSoup

html = urllib2.urlopen("http://dict.youdao.com/search?q=feel&keyfrom=dict.index")

soup = BeautifulSoup("".join(html))

txt = open("e://htmlFile/start(2).txt","w")

txt.write(soup.prettify())

txt.close()

pa = []

content = open("e://htmlFile/start(2).txt","r")

content_again = content.read()

p = re.compile('<div class=\"trans-container\">\s*<ul>\s*(<li>(\s*.*\s*)</li>\s*)*')

txt_again = open("e://htmlFile/start.txt","w")

# findall returns one tuple of captured groups per match

for lines in p.findall(content_again):

    for line in lines:

        pa.append(line)

        print type(line)==str

for l in pa:

    txt_again.write(l)

content.close()

print pa

txt_again.close()

print "The End!"

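This code is meant to match a block of HTML and extract the data inside it, but when I run it, it only extracts the data from the last line instead of all of it. Any pointers would be appreciated. The data I want to extract is shown below: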

<div class="trans-container">

  <ul>

     <li>

        vt. 感觉;认为;触摸;试探

     </li>

     <li>

       vi. 觉得;摸索

     </li>

      <li>

       n. 感觉;触摸

      </li>

    </ul>
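
A minimal sketch of what is probably going on, written for Python 3 (the post itself uses Python 2) and using only the standard `re` module. When a capturing group is repeated with `*`, `re` keeps only the text from the group's last repetition, which would explain why only the last `<li>` comes back. The variable names and the two-step approach below are illustrative assumptions, not something from the original code: first isolate the `<ul>` block, then run `findall` over the individual `<li>` items.

```python
import re

# Sample markup copied from the post (the closing </div> is not shown there).
html = """<div class="trans-container">
  <ul>
     <li>
        vt. 感觉;认为;触摸;试探
     </li>
     <li>
       vi. 觉得;摸索
     </li>
      <li>
       n. 感觉;触摸
      </li>
    </ul>
"""

# 1) A repeated capturing group only remembers its last repetition,
#    so group(1) holds the text of the final <li> only.
repeated = re.compile(r'<ul>\s*(?:<li>\s*(.*?)\s*</li>\s*)*')
last_only = repeated.search(html).group(1)

# 2) Two-step alternative: grab the <ul> body once, then collect every <li>.
#    re.S lets "." match newlines so the body can span several lines.
block = re.search(r'<div class="trans-container">\s*<ul>(.*?)</ul>', html, re.S)
items = re.findall(r'<li>\s*(.*?)\s*</li>', block.group(1), re.S)

print(last_only)
for item in items:
    print(item)
```

Here `last_only` ends up equal to the last entry of `items`, while `items` holds all three definitions. Since the page is already parsed with BeautifulSoup, asking the parsed tree for the `li` elements inside the `trans-container` div would likely be more robust than a regex, but the sketch above stays with `re` to match the question.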
