php – How to "scrape" from a page's source
Published: 2020-05-25 08:32:09 Category: PHP Source: Internet
I have this code that fetches a page's HTML source:
$page = file_get_contents('http://example.com/page.html');
$page = htmlentities($page);

I want to scrape some content out of it. For example, suppose the page's source contains:
<strong>technorati.com</strong><br /> Connection failed<br /><br />
Pinging <strong>icerocket.com</strong><br /> Connection failed<br /><br />
Pinging <strong>weblogs.com</strong><br /> Done<br /><br />
Pinging <strong>newsgator.com</strong><br /> Done<br /><br />
Pinging <strong>blo.gs</strong><br /> Done<br /><br />
Pinging <strong>feedburner.com</strong><br /> Done<br /><br />
Pinging <strong>blogstreet.com</strong><br /> Done<br /><br />
Pinging <strong>my.yahoo.com</strong><br /> Connection failed<br /><br />
Pinging <strong>moreover.com</strong><br /> Connection failed<br /><br />
Pinging <strong>newsisfree.com</strong><br /> Done<br />

Is there a way to pull this out of the source and store it in a variable, so that it looks like this:

The page is dynamic, which is why I'm running into trouble. Can I search the source for each site? But then how do I get the result that follows it (Connection failed / Done)?

Then use code like this:
<?php
include_once 'simple_html_dom.php';

$url = "http://slashdot.org/";
$html = file_get_html($url);

// Patterns that trim leading/trailing whitespace and collapse inner runs of it
$pat[0] = "/^\s+/";
$pat[1] = "/\s{2,}/";
$pat[2] = "/\s+$/";
$rep[0] = "";
$rep[1] = " ";
$rep[2] = "";

foreach ($html->find('h2') as $heading) { // for each heading
    // find the first link inside a span, then echo its text
    $link = $heading->find('span a', 0);
    if ($link) { // skip headings that have no matching link
        echo preg_replace($pat, $rep, $link->plaintext) . "\n";
    }
}
?>
This produces output similar to:
5.8 Earthquake Hits East Coast of the US
Origins of Lager Found In Argentina
Inside Oregon State University's Open Source Lab
WebAPI: Mozilla Proposes Open App Interface For Smartphones
Using Tablets Becoming Popular Bathroom Activity
The Syrian Government's Internet Strategy
Deus Ex: Human Revolution Released
Taken Over By Aliens? Google Has It Covered
The GIMP Now Has a Working Single-Window Mode
Zombie Cookies Just Won't Die
Motorola's Most Important 18 Patents
MK-1 Robotic Arm Capable of Near-Human Dexterity, Dancing
Evangelical Scientists Debate Creation Story
Android On HP TouchPad
Google Street View Gets Israeli Government's Nod
Internet Restored In Tripoli As Rebels Take Control
GA Tech: Internet's Mid-Layers Vulnerable To Attack
Serious Crypto Bug Found In PHP 5.3.7
Twitter To Meet With UK Government About Riots
EU Central Court Could Validate Software Patents
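
For the markup in the question itself, a similar result can probably be had without an external library. What follows is only a sketch under stated assumptions: it runs a regular expression over the raw HTML (before any htmlentities() call, which would encode the tags the pattern matches on), the helper name extractPingResults is hypothetical, and the pattern assumes each site name sits in a <strong> tag followed by a <br /> and then a status string, as in the sample.

```php
<?php
// Sample markup copied from the question. In practice this would be the
// string returned by file_get_contents(), without htmlentities() applied.
$page = '<strong>technorati.com</strong><br /> Connection failed<br /><br />'
      . 'Pinging <strong>icerocket.com</strong><br /> Connection failed<br /><br />'
      . 'Pinging <strong>weblogs.com</strong><br /> Done<br /><br />';

/**
 * Hypothetical helper: returns an array mapping each site to its status.
 */
function extractPingResults(string $html): array
{
    $results = [];
    // Group 1: text inside <strong>...</strong>
    // Group 2: text between the following <br /> and the next tag
    $pattern = '~<strong>\s*([^<]+?)\s*</strong>\s*<br\s*/?>\s*([^<]+?)\s*<~i';
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $results[$m[1]] = $m[2];
        }
    }
    return $results;
}

$results = extractPingResults($page);
foreach ($results as $site => $status) {
    echo $site . ' - ' . $status . "\n";
}
```

A regular expression is workable here because the markup is flat and regular. For less predictable pages, a DOM parser such as the simple_html_dom approach shown above is likely the sturdier choice.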
