Scraping the events listing from the Python official site, using the URL from the previous section.
Output:
PyOhio 2017
29 July – 31 July 2017 Columbus, Ohio, USA
PyCon AU 2017
03 Aug. – 09 Aug. 2017 Melbourne Convention and Exhibition Centre, 1 Convention Centre Pl, South Wharf VIC 3006, Australia
PyCon KR 2017
12 Aug. – 16 Aug. 2017 COEX 513, Yeongdong-daero, Gangnam-gu Seoul 06164, Republic of Korea
PyCon Amazônia 2017
12 Aug. – 14 Aug. 2017 Manaus, Amazonas, Brazil
DjangoCon US 2017
13 Aug. – 19 Aug. 2017 Spokane, WA, USA
PyCon PL 2017
17 Aug. – 21 Aug. 2017 Hotel Ossa Congress & SPA, Ossa, Poland
import requests
from bs4 import BeautifulSoup

# Fetch the events page and parse it with the lxml parser
f = requests.get('https://www.python.org/events/python-events/')
soup = BeautifulSoup(f.text, 'lxml')

# Take the text of the first <section> and slice out the upcoming-events
# portion between 'More' and 'You just missed...'
fp = soup.section.text
print((fp.split('More')[1]).split('You just missed...')[0])
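Splitting the section's raw text on 'More' and 'You just missed...' is fragile; if the page wording changes, the slices break. A more robust sketch walks the HTML structure with CSS selectors instead. The class names below (`list-recent-events`, `event-title`, `event-location`) are assumptions based on the python.org events page at the time; the fragment is inlined here so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Inline fragment mimicking the assumed structure of
# https://www.python.org/events/python-events/
html = '''
<ul class="list-recent-events">
  <li>
    <h3 class="event-title"><a href="/events/1/">PyOhio 2017</a></h3>
    <p><time datetime="2017-07-29">29 July - 31 July</time>
       <span class="event-location">Columbus, Ohio, USA</span></p>
  </li>
  <li>
    <h3 class="event-title"><a href="/events/2/">PyCon AU 2017</a></h3>
    <p><time datetime="2017-08-03">03 Aug. - 09 Aug.</time>
       <span class="event-location">Melbourne, Australia</span></p>
  </li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')

# Collect (title, date, location) per event instead of text-splitting
events = []
for li in soup.select('.list-recent-events li'):
    title = li.select_one('.event-title a').get_text(strip=True)
    date = li.select_one('time').get_text(strip=True)
    place = li.select_one('.event-location').get_text(strip=True)
    events.append((title, date, place))

for title, date, place in events:
    print(f'{title}: {date}, {place}')
```

To scrape the live page, replace the inline `html` string with `requests.get(...).text` as in the snippet above; each event then comes out as a structured tuple rather than a slice of free text.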
司马迁迁迁
'''
1. This tutorial is really more of a brief introduction than a textbook; a beginner cannot understand it from this alone and needs to find a lot of other material to study.
2. On obtaining the data parameters for the simulated Sina Weibo login: I could not find an example tutorial online, and the ones that exist are very complex; my attempts to scrape the data also failed. I hope Mr. Liao can find time to explain how these parameters are obtained. If any fellow student has figured it out, please do share.
3. The URL given in the exercise is clearly wrong, probably expired, so I used the URL from the previous section, fetched the page content with urllib's request module, and parsed it with bs4. We already did this parsing in the previous section, so this exercise is straightforward. Please also point out whether the answer is correct; it runs without problems. Code and results below.
'''
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from urllib import request
from bs4 import BeautifulSoup

# Fetch the page with urllib and let BeautifulSoup decode the byte stream
with request.urlopen('https://www.python.org/events/python-events/') as f:
    soup = BeautifulSoup(f, 'lxml', from_encoding='utf-8')
    # Slice the first <section>'s text between 'More' and 'You just missed...'
    fp = soup.section.text
    print((fp.split('More')[1]).split('You just missed...')[0])