import urllib2
import re
from BeautifulSoup import BeautifulSoup
html = urllib2.urlopen("http://dict.youdao.com/search?q=feel&keyfrom=dict.index")
soup = BeautifulSoup("".join(html))
txt = open("e://htmlFile/start(2).txt","w")
txt.write(soup.prettify())
txt.close()
pa = []
content = open("e://htmlFile/start(2).txt","r")
content_again = content.read()
p = re.compile('<div class=\"trans-container\">\s<ul>\s(<li>(\s.\s)</li>\s)*')
txt_again = open("e://htmlFile/start.txt","w")
for lines in p.findall(content_again):
    for line in lines:
        pa.append(line)
        print type(line)==str
for l in pa:
    txt_again.write(l)
content.close()
print pa
txt_again.close()
print "The End!"
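This code is meant to match a fragment of HTML and pull the data out of it, but when I run it, only the last line of data is extracted instead of all of it. Any pointers would be appreciated. The data I want to extract looks like this: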
<div class="trans-container">
<ul> <li> vt. 感觉;认为;触摸;试探 </li> <li> vi. 觉得;摸索 </li> <li> n. 感觉;触摸 </li> </ul>
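A likely explanation, offered as a sketch rather than a confirmed diagnosis: in Python's `re` module, a capturing group that sits inside a repetition such as `(<li>(...)</li>\s)*` keeps only the text of its final iteration, so `findall` can report just the last `<li>`. One way around this is to match the `<ul>` block first and then run a second `findall` over the individual `<li>` items. The example below uses only the standard library and works on the sample markup quoted above. The names `SAMPLE`, `extract_items`, `last_only` and `items` are made up for illustration, the closing `</div>` is added by assumption, and the real page's whitespace after `prettify()` may differ from this sample.

```python
# -*- coding: utf-8 -*-
import re

# Sample markup copied from the question; the closing </div> is an assumption.
SAMPLE = u"""<div class="trans-container">
<ul> <li> vt. 感觉;认为;触摸;试探 </li> <li> vi. 觉得;摸索 </li> <li> n. 感觉;触摸 </li> </ul>
</div>"""


def extract_items(html):
    """Return the stripped text of every <li> in the trans-container block."""
    # Step 1: isolate the <ul>...</ul> that directly follows the container div.
    block = re.search(r'<div class="trans-container">\s*<ul>(.*?)</ul>',
                      html, re.S)
    if block is None:
        return []
    # Step 2: a separate findall yields one entry per <li>.
    found = re.findall(r'<li>(.*?)</li>', block.group(1), re.S)
    return [item.strip() for item in found]


# For comparison: a capturing group inside a repetition keeps only
# the text from its last iteration.
repeated = re.search(r'<ul>\s*(?:<li>(.*?)</li>\s*)*</ul>', SAMPLE, re.S)
last_only = repeated.group(1).strip()

items = extract_items(SAMPLE)
for item in items:
    print(item)
```

Here `last_only` holds only the `n.` entry, while `items` holds all three. Since the page is already parsed with BeautifulSoup, walking the parse tree for the `<li>` elements may be more robust than any regular expression, but that is not shown here.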
Michael_Extremist