Daily Xunlei Member Crawler

Published: 2020-05-25 00:42:52 Category: Python Source: Internet

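The following code snippet was collected from the web by 脚本之家 (jb51.cc) and is shared here for reference. It is written for Python 2 (urllib2 and print statements) and uses lxml to parse the pages.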

#coding=utf8
import urllib2
import codecs
import re
import time
from lxml import etree

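# Portal page that lists the latest posts
url1  = 'http://521xunlei.com/portal.php'
# XPath for the href of the first post in the portal block
path1 = '//*[@id="portal_block_62_content"]/div/ul/li[1]/a/@href'
# XPath for the text nodes inside the post body
path3 = '//*[@class="t_f"]/font/text()'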

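def geturlinfo(url,path,x):
	# Fetch the page and parse it into an lxml element tree
	request  = urllib2.Request(url)
	response = urllib2.urlopen(request)
	result   = response.read()
	restree  = etree.HTML(result)
	nodes    = restree.xpath(path)
	if x == '1':
		# Mode '1': return the first match (the href of the latest post)
		return nodes[0]
	else:
		# Any other mode: number every text node containing ':' and save it
		i=0
		# Opening with 'w' truncates thunder.txt, so each run starts fresh
		with open('thunder.txt','w') as f:
			for node in nodes:
				if re.search(':',node):
					INFO = str(i)+': '+node.replace('\r\n','')
					print INFO
					f.write(INFO.encode('utf8')+'\n')
					i+=1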

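if __name__ == '__main__':
	while True:
		print '===================start===================\n'
		# Rebuild the site root from url1 and append the relative link of the latest post
		url2 = 'http://'+url1.replace('http://','').split('/')[0]+'/'+geturlinfo(url1,path1,'1')
		print 'GET From: '+url2
		# Scrape the post body and write the matching lines to thunder.txt
		geturlinfo(url2,path3,'0')
		# Wait 24 hours before the next run
		time.sleep(24*3600)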

		# starts-with(@id,"test") matches ids that begin with "test"

		# First select the matching div, then call xpath string(.) on it to join its text
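
The two XPath notes above can be tried offline. The following is a minimal sketch, written for Python 3 (the script itself is Python 2) and assuming lxml is installed. `SAMPLE_HTML` and `collect_text` are hypothetical names, and the markup is made up for illustration rather than taken from the real site.

```python
from lxml import etree

# Hypothetical HTML fragment standing in for a forum post; not from the real site
SAMPLE_HTML = """
<html><body>
<div id="test_1"><font>account: user01</font>, <b>password: abc123</b></div>
<div id="test_2">account: user02 <font>password: xyz789</font></div>
<div id="other">ignored</div>
</body></html>
"""

def collect_text(html):
    """Return the joined text of every div whose id starts with "test"."""
    tree = etree.HTML(html)
    # starts-with(@id, "test") keeps only elements whose id begins with "test"
    divs = tree.xpath('//div[starts-with(@id, "test")]')
    # string(.) concatenates all descendant text of a div into one string;
    # split/join then collapses any runs of whitespace
    return [" ".join(div.xpath('string(.)').split()) for div in divs]

if __name__ == "__main__":
    for line in collect_text(SAMPLE_HTML):
        print(line)
```

Because string(.) is evaluated on the div itself, text that is split across nested tags comes back as one string, whereas a path ending in text() returns each text node separately.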

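The main loop assembles the post URL by string concatenation, which only works while the scraped href is a relative path. One possible alternative, sketched here for Python 3 under the assumption that the href may also be root-relative or absolute, is `urljoin` from the standard library. `build_post_url` and the sample hrefs are hypothetical.

```python
from urllib.parse import urljoin

PORTAL_URL = 'http://521xunlei.com/portal.php'  # same value as url1 in the script

def build_post_url(portal_url, href):
    """Resolve a scraped href against the portal page URL."""
    # urljoin handles relative, root-relative and absolute hrefs alike
    return urljoin(portal_url, href)

if __name__ == '__main__':
    # Hypothetical hrefs; the real site may use a different link format
    samples = (
        'thread-1-1-1.html',
        '/thread-2-1-1.html',
        'http://521xunlei.com/thread-3-1-1.html',
    )
    for href in samples:
        print(build_post_url(PORTAL_URL, href))
```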

(Editor: 安卓应用网)
