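I was a bit bored this morning, so I played around with QueryList and did a quick scrape of my company's rental listings.
Qlist (QueryList) documentation: http://doc.querylist.cc/site/index/doc/3
First, the code: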
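```php
public function actionIndex()
{
    header("Content-type:text/html;charset=utf-8");
    Includes('Query/QueryList.php');
    Includes('Query/phpQuery.php');

    // Scraping rules: field name => [CSS selector, what to read]
    $option = array(
        'title'      => array('.liebiao h1', 'text'),
        'money'      => array('.liebiao .zm_money span', 'text'),            // rent
        'xiaoqu'     => array('.liebiao ul li.info p:nth-child(2)', 'text'), // residential complex
        'dizhi'      => array('.liebiao ul li.info p:nth-child(3)', 'text'), // address
        'huxin'      => array('.liebiao ul li.info p:nth-child(4)', 'text'), // floor plan
        'updateTime' => array('.liebiao ul li.info p:nth-child(5)', 'text'),
    );

    // Cast to int so the value is safe to put into a URL and a file name
    $page = isset($_GET['page']) ? (int)$_GET['page'] : 1;
    $hj = QueryList::Query('http://www.kuaiyoujia.com/zufangs/house/quyu-i' . $page, $option);
    $result = $hj->data;

    // Strip whitespace from every field of every row
    $TempArr = [];
    foreach ($result as $key => $value) {
        $SetArr = [];
        foreach ($value as $k => $v) {
            $SetArr[$k] = trimall($v);
        }
        $TempArr[$key] = $SetArr;
    }

    // Keep at most the first 10 rows (no undefined-index notices if a page has fewer)
    $ArrTemp = array_slice($TempArr, 0, 10);

    file_put_contents(
        $_SERVER['DOCUMENT_ROOT'] . '/uploads/HouseInfo_' . $page . '.json',
        json_encode($ArrTemp) . PHP_EOL,
        FILE_APPEND
    );

    // Note: with "<= 10" the chain also fetches page 11 before it stops
    if ($page <= 10) {
        // p() is the project's own print helper (not shown). It prints before header(),
        // which only works when output buffering is enabled.
        p('Run ' . $page);
        $page = $page + 1;
        header('Location:http://127.0.0.1/index.php/wechat?page=' . $page);
        exit; // stop once the redirect header has been sent
    } else {
        p('Finished crawling...');
    }
}
```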
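The trim-and-cap step in the middle of actionIndex can also be pulled out into a small pure function, which is easier to test on its own. This is only a sketch: `cleanRows` is a hypothetical name, and it assumes the same characters that trimall() (further down) removes.

```php
<?php
// Hypothetical helper: strip whitespace from every field of every scraped row,
// then keep at most $limit rows.
function cleanRows(array $rows, int $limit = 10): array
{
    $clean = array_map(
        function (array $row): array {
            // array_map with a single array keeps the string keys (title, money, ...)
            return array_map(function ($v) {
                return str_replace([" ", "\u{3000}", "\t", "\n", "\r"], '', (string)$v);
            }, $row);
        },
        $rows
    );
    // array_slice simply returns fewer rows when the page is short
    return array_slice($clean, 0, $limit);
}
```

In actionIndex this would replace both the nested foreach and the for loop with `$ArrTemp = cleanRows($hj->data);`.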
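My original idea was: after each run, add 1 to page and have the browser redirect to the next page. The result~~~
When the crawl finished I never saw the redirects I was expecting.....
All I saw was a pile of data files popping up one after another in my local directory. (The browser follows each 302 redirect by itself and only renders the last page, so the intermediate output never shows up.)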
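If the goal is simply to fetch pages 1 to N, a plain loop avoids the redirect chain entirely and shows progress as it goes. A sketch under assumptions: `crawlPages` is a hypothetical name, `$fetchPage` stands in for the QueryList call, and the output directory is assumed to exist already.

```php
<?php
/**
 * Hypothetical sketch: crawl pages $first..$last in one run instead of chaining redirects.
 * $fetchPage takes a page number and returns that page's rows as an array.
 * Returns the number of pages written.
 */
function crawlPages(callable $fetchPage, string $outDir, int $first = 1, int $last = 10): int
{
    $written = 0;
    for ($page = $first; $page <= $last; $page++) {
        $rows = array_slice($fetchPage($page), 0, 10); // at most 10 rows, as in the post
        $file = $outDir . '/HouseInfo_' . $page . '.json';
        // One JSON line per run; FILE_APPEND mirrors the original code
        file_put_contents($file, json_encode($rows) . PHP_EOL, FILE_APPEND);
        $written++;
        // From the CLI (no output buffering there) this appears as each page finishes
        echo 'Page ' . $page . " done\n";
    }
    return $written;
}
```

With QueryList the callable could be `function ($p) use ($option) { return QueryList::Query('http://www.kuaiyoujia.com/zufangs/house/quyu-i' . $p, $option)->data; }`; running it from the command line also sidesteps web-server time limits.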
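At first I wrote the page+1 directly inside the Header URL, and it would run once and then stop. Later I changed it to this:

```php
$page = $page + 1;
header('Location:http://127.0.0.1/index.php/wechat?page=' . $page);
```

and it ran smoothly~

If that first version looked like `'...?page='.$page+1`, operator precedence explains it: before PHP 8, `.` and `+` have the same precedence and are evaluated left to right, so the whole URL string is built first and then 1 is added to it. The result is a number rather than a URL, so no redirect is sent. PHP 8 gives `+` higher precedence; writing `'...?page='.($page+1)` works in every version.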
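The fix can also be kept on one line by parenthesizing the arithmetic. A minimal sketch; `nextPageUrl` is a hypothetical helper, not part of the original code.

```php
<?php
// Hypothetical helper: build the redirect target for the next page.
// The parentheses make PHP compute $page + 1 before concatenating,
// so the result is the same on PHP 7 (where . and + share precedence) and PHP 8.
function nextPageUrl(string $base, int $page): string
{
    return $base . ($page + 1);
}
```

In the controller this would be used as `header('Location:' . nextPageUrl('http://127.0.0.1/index.php/wechat?page=', $page)); exit;`.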
The whitespace-stripping helper used above:
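```php
// Remove all whitespace: normal spaces, full-width spaces, tabs and line breaks
function trimall($str)
{
    $qian = array(" ", "　", "\t", "\n", "\r"); // characters to remove ("qian" = before)
    $hou  = array("", "", "", "", "");          // replacements ("hou" = after)
    return str_replace($qian, $hou, $str);
}
```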
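A regex can cover the same characters plus the no-break space that `&nbsp;` typically turns into in extracted text, which the fixed list above misses. A sketch only; `trimall_regex` is a hypothetical name.

```php
<?php
// Hypothetical regex variant of trimall().
// [\s\x{00A0}\x{3000}] = ASCII whitespace, no-break space, full-width space;
// the u flag makes PCRE treat the subject as UTF-8 code points instead of bytes.
function trimall_regex(string $str): string
{
    $out = preg_replace('/[\s\x{00A0}\x{3000}]+/u', '', $str);
    // preg_replace returns null on invalid UTF-8; fall back to the input unchanged
    return $out ?? $str;
}
```

The trade-off is a regex call per field instead of a plain string replace, which is negligible at ten rows per page.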
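And the helper that loads the QueryList files:

```php
/* Load the given extension file from /common/extend/ */
function Includes($name)
{
    require_once $_SERVER['DOCUMENT_ROOT'] . '/common/extend/' . $name;
}
```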