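How to visit a website with Puppeteer while carrying cookies
First, log in to the target website in a regular browser, then copy its cookie. Taking Baidu as an example:
The cookie obtained is (random characters have been inserted):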
const str = `BIDUPSID=096AB7423345867F5A96434DCF7F5652xxx; PSTM=1613050707; BAIDUID=096AB7423345867FC1716AC2C8000E64:FG=1; __yjs_duid=1_010c56ee9c14c5449c7413d2b2f54ce61613050720132; BD_UPN=123253; BDUSS=mZIZVpEenViS1BiV1g0UW9KN0s0VHdIdFpORHFLTzFFdmw4VVlqS3ZxMjFBbGRnSVFBQUFBJCQAAAAAAAAAAAEAAAAfwWAiztLKx9Ch0KHJr8CtAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALV1L2C1dS9gS; BDUSS_BFESS=mZIZVpEenViS1BiV1g0UW9KN0s0VHdIdFpORHFLTzFFdmw4VVlqS3ZxMjFBbGRnSVFBQUFBJCQAAAAAAAAAAAEAAAAfwWAiztLKx9Ch0KHJr8CtAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALV1L2C1dS9gS; MCITY=-131%3A; SIGNIN_UC=70a2711cf1d3d9b1a82d2f87d633bd8a03641615255we6KklgNRBlIez6rfDqs8gZJk8SPGONUQOymGg4V8Ijt1p4j3sfH2ECI%2By%2BuuCJAlucoZcWdgY08LNwx0rNcT6JdPPnMkJmYMDPz2JIJiC%2FKxIB%2FtDdvuZYvo%2FBnmg9eActqwCoXaCySISW%2FfhIXPiWEGvMpgyyms3gleDgJAJ6CGG%2FIigl5Ql%2FrRqAYxJEcP%2BOvWT2jFoPzl3hHQsCQSYEnUu1dvfPpnrzqn45qAnMJZwTiUaVmW781AwydoHbjh7hxHOo%2FDKjD1LQWULd5fCOZ7mQcPwKh4DMAsAl0MA%2BUiVVjq8Dh4Po0fHoyYhtd54327687073570082385922989507423; H_PS_PSSID=33517_33273_31253_33595_26350; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; H_PS_645EC=31caQsoMI4ia1zLkj4bk7yZz3vt8nLAkQLfdzJNoBLr5e8IjFSLVucWtweofXaj%2FsFt1; delPer=0; BD_CK_SAM=1; PSINO=2; BD_HOME=1; sug=3; sugstore=1; ORIGIN=0; bdime=20100; BA_HECTOR=0g8k2k85al04a5ah6b1g3eg000q; BAIDUID_BFESS=096AB7423345867FC1716AC2C8000E64:FG=1`;
Next, looking at Puppeteer's setCookie method, we find that it expects complete cookie objects in the following format:
{
  domain: '',
  name: '',
  value: '',
  path: ''
}
So we first need to convert the cookie string we copied into this format:
const cookie = [];
str.split(';').forEach((pair) => {
  // Split on the first '=' only: values such as BAIDUID=...:FG=1 contain '=' themselves
  const eq = pair.indexOf('=');
  if (eq === -1) return; // skip empty or malformed fragments
  cookie.push({
    domain: '.baidu.com',
    name: pair.slice(0, eq).trim(),
    value: pair.slice(eq + 1).trim(),
    path: '/',
  });
});
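The conversion step can also be wrapped in a small reusable function. The following is a minimal sketch, not part of Puppeteer: `parseCookieString` is a hypothetical helper name, and the sample string is made up for illustration. It splits each pair on the first `=` only, so values that contain `=` themselves (such as `BAIDUID=...:FG=1`) are kept whole.

```javascript
// Hypothetical helper (not a Puppeteer API): turn a raw "name=value; name=value"
// cookie string into the objects that page.setCookie() accepts.
function parseCookieString(raw, domain, path = '/') {
  return raw
    .split(';')
    .map((pair) => pair.trim())
    .filter((pair) => pair.includes('=')) // drop empty or malformed fragments
    .map((pair) => {
      const eq = pair.indexOf('='); // position of the FIRST '='
      return {
        name: pair.slice(0, eq).trim(),
        value: pair.slice(eq + 1).trim(),
        domain,
        path,
      };
    });
}

// Made-up sample input, shaped like the cookie string copied from the browser.
const sample = 'BAIDUID=ABC123:FG=1; delPer=0; MCITY=-131%3A';
console.log(parseCookieString(sample, '.baidu.com'));
```

Passing the domain as a parameter means the same helper could be reused for sites other than Baidu.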
Then launch Puppeteer and check whether the login state carries over:
(async () => {
  const browser = await puppeteer.launch({
    headless: false, // show the browser window
    devtools: true,  // open DevTools (F12) automatically
  });
  const page = await browser.newPage();
  await page.setUserAgent("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4181.9 Safari/537.36");
  // Set all cookies in one awaited call so they are in place before navigation
  await page.setCookie(...cookie);
  await page.goto('https://www.baidu.com');
  const html = await page.content();
  const news = [];
  const $ = cheero.load(html);
  // Collect the title of every entry in the trending-news list
  $('.s-news-rank-content li').each((index, val) => {
    const title = $(val).find('.title-content-title').text();
    news.push(title);
  });
  await page.close();
  console.log(news);
})();
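One detail worth illustrating is how the cookies are handed to `setCookie`. Calling `array.map(async ...)` starts the calls but does not wait for them, so whether the cookies are in place before `page.goto` comes down to timing. The sketch below shows the difference using a stand-in object instead of a real browser: `makeFakePage`, `setWithoutWaiting`, and `setAndWait` are hypothetical names, and the fake `setCookie` only imitates the asynchronous behaviour of the real method.

```javascript
// Stand-in for a Puppeteer page so this sketch runs without a browser.
// Its setCookie() finishes on a later tick, like a round-trip to the browser.
function makeFakePage() {
  const jar = [];
  return {
    jar,
    async setCookie(...cookies) {
      await new Promise((resolve) => setTimeout(resolve, 10));
      jar.push(...cookies);
    },
  };
}

// Made-up cookies for the demonstration.
const cookies = [
  { name: 'a', value: '1', domain: '.example.com', path: '/' },
  { name: 'b', value: '2', domain: '.example.com', path: '/' },
];

// map() with an async callback: the calls are started, but nothing awaits them.
async function setWithoutWaiting(page) {
  cookies.map(async (c) => { await page.setCookie(c); });
  return page.jar.length; // cookies stored at the point where goto() would run
}

// One awaited call, with the array spread into separate arguments.
async function setAndWait(page) {
  await page.setCookie(...cookies);
  return page.jar.length;
}

(async () => {
  console.log(await setWithoutWaiting(makeFakePage())); // 0
  console.log(await setAndWait(makeFakePage()));        // 2
})();
```

With a real page, the equivalent of the second variant is `await page.setCookie(...cookie)`, since `setCookie` accepts any number of cookie objects.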
While the script runs, we can see that the cookies have been written successfully. The scraped data is:
[
  '10人当选全国脱贫攻坚楷模',
  '一家12口持BNO护照投奔英国被遣返',
  '美舰穿航台湾海峡 东部战区回应',
  '25岁女孩欠二十万外债抑郁失联',
  '脱贫攻坚楷模奖章设计有4个含义',
  '交通运输部回应货拉拉女生跳车事件',
  '曝中国人寿员工未配合造假被解约',
  '天文科普专家承认性骚扰并致歉',
  '业主买车位停2辆车被物业制止',
  '涉事货拉拉司机开过饭店有房有车'
]
This is the list of trending news that Baidu recommends after login, which shows that the operation succeeded!
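Besides watching the browser window, the cookies can be checked from code. The sketch below is one possible approach, assuming `page` is a Puppeteer Page whose `page.cookies(url)` method returns the cookies that apply to that URL (newer Puppeteer releases may steer toward browser-level cookie APIs instead). `missingCookies` is a hypothetical helper name, and which cookie names to check is left to the caller.

```javascript
// Hypothetical helper: report which expected cookie names are absent for a URL.
// `page` is assumed to expose page.cookies(url), as a Puppeteer Page does.
async function missingCookies(page, url, expectedNames) {
  const current = await page.cookies(url);
  const present = new Set(current.map((c) => c.name));
  return expectedNames.filter((name) => !present.has(name));
}

// Possible usage after page.setCookie(...) and before scraping:
//   const missing = await missingCookies(page, 'https://www.baidu.com', ['BDUSS', 'BAIDUID']);
//   if (missing.length > 0) console.warn('cookies not set:', missing);
```

An empty result only shows that the cookies were stored. Whether the site still accepts the session depends on the cookie values being valid.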
Final code:
/**
 * Created by 天明
 * Date: 2021/2/25
 * Time: 10:23
 * Description:
 */
const puppeteer = require('puppeteer');
const cheero = require('cheerio');

const cookie = [];
const str = `BIDUPSID=096AB7423345867F5A96434DCF7F5652xxx; PSTM=1613050707; BAIDUID=096AB7423345867FC1716AC2C8000E64:FG=1; __yjs_duid=1_010c56ee9c14c5449c7413d2b2f54ce61613050720132; BD_UPN=123253; BDUSS=mZIZVpEenViS1BiV1g0UW9KN0s0VHdIdFpORHFLTzFFdmw4VVlqS3ZxMjFBbGRnSVFBQUFBJCQAAAAAAAAAAAEAAAAfwWAiztLKx9Ch0KHJr8CtAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALV1L2C1dS9gS; BDUSS_BFESS=mZIZVpEenViS1BiV1g0UW9KN0s0VHdIdFpORHFLTzFFdmw4VVlqS3ZxMjFBbGRnSVFBQUFBJCQAAAAAAAAAAAEAAAAfwWAiztLKx9Ch0KHJr8CtAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALV1L2C1dS9gS; MCITY=-131%3A; SIGNIN_UC=70a2711cf1d3d9b1a82d2f87d633bd8a03641615255we6KklgNRBlIez6rfDqs8gZJk8SPGONUQOymGg4V8Ijt1p4j3sfH2ECI%2By%2BuuCJAlucoZcWdgY08LNwx0rNcT6JdPPnMkJmYMDPz2JIJiC%2FKxIB%2FtDdvuZYvo%2FBnmg9eActqwCoXaCySISW%2FfhIXPiWEGvMpgyyms3gleDgJAJ6CGG%2FIigl5Ql%2FrRqAYxJEcP%2BOvWT2jFoPzl3hHQsCQSYEnUu1dvfPpnrzqn45qAnMJZwTiUaVmW781AwydoHbjh7hxHOo%2FDKjD1LQWULd5fCOZ7mQcPwKh4DMAsAl0MA%2BUiVVjq8Dh4Po0fHoyYhtd54327687073570082385922989507423; H_PS_PSSID=33517_33273_31253_33595_26350; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; H_PS_645EC=31caQsoMI4ia1zLkj4bk7yZz3vt8nLAkQLfdzJNoBLr5e8IjFSLVucWtweofXaj%2FsFt1; delPer=0; BD_CK_SAM=1; PSINO=2; BD_HOME=1; sug=3; sugstore=1; ORIGIN=0; bdime=20100; BA_HECTOR=0g8k2k85al04a5ah6b1g3eg000q; BAIDUID_BFESS=096AB7423345867FC1716AC2C8000E64:FG=1`;

// Convert the raw cookie string into the objects page.setCookie() expects
str.split(';').forEach((pair) => {
  // Split on the first '=' only: values such as BAIDUID=...:FG=1 contain '=' themselves
  const eq = pair.indexOf('=');
  if (eq === -1) return; // skip empty or malformed fragments
  cookie.push({
    domain: '.baidu.com',
    name: pair.slice(0, eq).trim(),
    value: pair.slice(eq + 1).trim(),
    path: '/',
  });
});

(async () => {
  const browser = await puppeteer.launch({
    headless: false, // show the browser window
    devtools: true,  // open DevTools (F12) automatically
  });
  const page = await browser.newPage();
  await page.setUserAgent("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4181.9 Safari/537.36");
  // Set all cookies in one awaited call so they are in place before navigation
  await page.setCookie(...cookie);
  await page.goto('https://www.baidu.com');
  const html = await page.content();
  const news = [];
  const $ = cheero.load(html);
  // Collect the title of every entry in the trending-news list
  $('.s-news-rank-content li').each((index, val) => {
    const title = $(val).find('.title-content-title').text();
    news.push(title);
  });
  await page.close();
  console.log(news);
})();