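Introduction to Playwright
- Playwright is a new-generation web test automation tool developed by Microsoft. Compared with Selenium:
  - No separate WebDriver binary has to be installed
  - Waits do not have to be set manually (actions wait for their target element automatically)
  - Playwright supports asynchronous execution
  - Selenium's classic WebDriver protocol runs over HTTP (one request, one response), whereas Playwright talks to the browser over WebSocket (bidirectional)
  - Key point: it ships with a recorder that generates code from the actions performed during the recording session
      playwright codegen www.xxx.com
      playwright codegen -o script.py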
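- Setting up the Playwright environment:
  - Version requirement:
      Python >= 3.7 (recent Playwright releases require a newer Python version; check the official documentation)
  - Install the required module:
      pip install playwright
  - Install the bundled browsers and ffmpeg:
      playwright install
  - Official documentation: https://playwright.bootcss.com/docs/why-playwright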
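Before moving on, it can help to confirm that the package is importable from the current interpreter. The following is a minimal sketch using only the standard library; the helper name `playwright_available` is hypothetical and not part of Playwright.

```python
import importlib.util


def playwright_available() -> bool:
    """Return True if the `playwright` package can be found by this interpreter."""
    return importlib.util.find_spec("playwright") is not None


if playwright_available():
    print("playwright is installed")
else:
    print("playwright is missing; run: pip install playwright")
```

This only checks the Python package. The browsers downloaded by `playwright install` are not verified here.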
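Basic usage of Playwright
- Import the module:
    from playwright.sync_api import sync_playwright
- Launch a visible (headed) browser, where pw is the object returned by sync_playwright():
    browser = pw.<browser>.launch(headless=False)
  - <browser> selects the browser engine to launch: chromium, firefox, or webkit
- Browser pages:
    page = browser.new_page()
    context = browser.new_context()
  - new_context() creates a browser context in which several pages can be opened:
      page_num = context.new_page()
- Pause for a fixed time (in milliseconds):
    page.wait_for_timeout(5000)
- Return the rendered page source:
    page.content()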
- Getting-started example:
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto('https://www.baidu.com')
        print(page.title())
        page.wait_for_timeout(3000)
        browser.close()
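The note on returning the rendered source can be illustrated with a short sketch. It assumes `page.content()` is what is wanted for the rendered HTML; the helper name `rendered_html` and the function `main` are hypothetical names chosen for this example.

```python
def rendered_html(page, url: str) -> str:
    """Navigate `page` to `url` and return the HTML after the page has rendered."""
    page.goto(url)
    return page.content()


def main() -> None:
    # Imported here so the helper above can be reused without launching a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch()  # headless by default
        page = browser.new_page()
        html = rendered_html(page, "https://www.baidu.com")
        print(len(html))
        browser.close()
```

Calling `main()` runs the sketch against a real browser. Unlike a plain HTTP request, the returned string reflects the DOM after scripts have run.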
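Selecting elements with Playwright
- Common element selectors:
  - Node selectors:
      page.query_selector_all('xxx')
      page.query_selector('xxx')
  - Text selector:
      page.locator("text=text content")
  - CSS selector:
      page.locator("tag name")
    - When several elements match, query_selector() and shortcuts such as page.click() use the first one; a locator that matches several elements raises a strict-mode error on actions, so narrow it with .first or nth
    - A tag name can be used directly:
        button
    - id and class selectors:
        #x .y
    - Attribute selectors:
        "[xxx=yyy]"
  - XPath selector:
      page.locator("xpath=xxx")
  - Index selector (x is zero-based):
      page.locator("button >> nth=x")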
- Example:
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto('https://www.lagou.com/jobs/list_爬虫')
        # Each <li> under the position list is one job posting
        jobs_data_list = page.query_selector_all('//*[@id="s_position_list"]/ul/li')
        for jobs_data in jobs_data_list:
            # XPath relative to the current <li> element
            job_title = jobs_data.query_selector('xpath=./div[1]/div[1]/div[1]/a/h3').text_content()
            print(job_title)
        browser.close()
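The selector forms listed above are plain strings, so they can be assembled with small helpers. This is a sketch only: the function names are hypothetical, and attribute values containing double quotes are not escaped.

```python
def text_selector(text: str) -> str:
    """Build a text selector, e.g. text=Log in."""
    return f"text={text}"


def attr_selector(name: str, value: str, tag: str = "") -> str:
    """Build a CSS attribute selector, optionally limited to one tag."""
    return f'{tag}[{name}="{value}"]'


def nth_selector(base: str, index: int) -> str:
    """Pick the index-th match (zero-based) of another selector."""
    return f"{base} >> nth={index}"


print(text_selector("Log in"))               # text=Log in
print(attr_selector("name", "wd", "input"))  # input[name="wd"]
print(nth_selector("button", 0))             # button >> nth=0
```

Each returned string can be passed to `page.locator(...)` or `page.query_selector(...)`.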
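- Common operations after selecting an element:
  - .text_content(): returns the element's text content
  - .fill('content'): sets the value of an input field in one step
  - .type('content'): types the text character by character
  - .get_attribute('attribute name'): returns the value of the given attribute
  - .press('Shift+A'): presses a key or key combination
  - .wait_for(): waits until the element reaches the required state (visible by default)
  - .<single mouse event>(): any of the single mouse events listed under mouse operations, such as click()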
- Example:
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto('https://www.baidu.com')
        # Methods like these can be called in two ways:
        #   1. page.locator('xpath=xxx').fill('test')
        #   2. page.fill('xpath=xxx', 'test')
        # Both have the same effect; use whichever suits you.
        browser.close()
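As an illustration of combining these operations, the sketch below fills a search box and submits it with the Enter key. The helper name `search` is hypothetical, and the selector `#kw` for the search input is an assumption about the target page that should be checked against the live markup.

```python
def search(page, input_selector: str, keyword: str) -> None:
    """Type `keyword` into the input matched by `input_selector` and press Enter."""
    page.fill(input_selector, keyword)
    page.press(input_selector, "Enter")


def main() -> None:
    # Imported here so `search` can be reused without launching a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto("https://www.baidu.com")
        search(page, "#kw", "playwright")  # "#kw" is an assumed selector
        page.wait_for_timeout(3000)
        browser.close()
```

Calling `main()` runs the sketch against a real browser. The same two steps could also be written with `page.locator(input_selector)` followed by `.fill()` and `.press()`.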
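Mouse operations in Playwright
- Single mouse events:
  - Left click:
      page.click('element selector')
  - Left double-click:
      page.dblclick('element selector')
  - Hover:
      page.hover('element selector')
  - Right click:
      page.click('element selector', button='right')
  - Shift + click:
      page.click('element selector', modifiers=['Shift'])
  - Click at a specific position inside the element:
      page.click('element selector', position={'x': 0, 'y': 0})
- Mouse hold events:
  - Press and hold the mouse button:
      page.mouse.down()
  - Move the mouse to a given position:
      page.mouse.move(x, y, steps=10)
  - Release the mouse button:
      page.mouse.up()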
- Example:
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto('https://www.baidu.com')
        # These methods can also be called in two ways:
        #   1. page.locator('xpath=xxx').click()
        #   2. page.click('xpath=xxx')
        # Both have the same effect; use whichever suits you.
        browser.close()
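The hold events can be chained into a drag gesture, which is a common need for slider widgets. The following is a minimal sketch; the helper name `drag` and the coordinates in the usage note are hypothetical.

```python
def drag(page, start, end, steps: int = 10) -> None:
    """Drag from `start` to `end`, both given as (x, y) viewport coordinates."""
    page.mouse.move(*start)             # position the pointer on the handle
    page.mouse.down()                   # press and hold the left button
    page.mouse.move(*end, steps=steps)  # move in several intermediate steps
    page.mouse.up()                     # release the button
```

With a real page this might be called as `drag(page, (100, 200), (300, 200))`. A larger `steps` value produces more intermediate mouse-move events, so the movement looks less abrupt to the page.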
Asynchronous concurrency with Playwright
    import asyncio
    from playwright.async_api import async_playwright

    async def main():
        async with async_playwright() as pw:
            browser = await pw.chromium.launch()
            page = await browser.new_page()
            await page.goto('https://www.baidu.com')
            print(await page.title())
            await browser.close()

    asyncio.run(main())
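The example above uses the async API but visits a single page, so nothing actually runs concurrently. One way to visit several URLs at once is `asyncio.gather`, sketched below; the helper names `fetch_title` and `fetch_titles` are hypothetical.

```python
import asyncio


async def fetch_title(browser, url: str) -> str:
    """Open `url` in its own page and return the page title."""
    page = await browser.new_page()
    try:
        await page.goto(url)
        return await page.title()
    finally:
        await page.close()


async def fetch_titles(browser, urls):
    """Visit all URLs concurrently; results keep the order of `urls`."""
    return await asyncio.gather(*(fetch_title(browser, u) for u in urls))


async def main() -> None:
    # Imported here so the helpers can be reused without launching a browser.
    from playwright.async_api import async_playwright

    async with async_playwright() as pw:
        browser = await pw.chromium.launch()
        urls = ["https://www.baidu.com", "https://mail.163.com/"]
        print(await fetch_titles(browser, urls))
        await browser.close()
```

Running `asyncio.run(main())` executes the sketch against a real browser. Each URL gets its own page inside one shared browser, so the cost of launching the browser is paid once.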
Other Playwright operations
Opening several pages at the same time:
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser_type = p.chromium
        browser = browser_type.launch(headless=False)
        # Pages created from the same context share cookies and storage
        context = browser.new_context()
        page1 = context.new_page()
        page1.goto('https://mail.163.com/')
        page2 = context.new_page()
        page2.goto("https://www.baidu.com/")
        context.close()
        browser.close()
Taking screenshots of the page:
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.webkit.launch()
        page = browser.new_page()
        page.goto('https://www.baidu.com')
        # Visible viewport only
        page.screenshot(path="baidu.png")
        # Entire scrollable page
        page.screenshot(path="screenshot.png", full_page=True)
        # A single element
        page.locator('element selector').screenshot(path="test.png")
        browser.close()
Entering a frame:
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto('https://www.baidu.com')
        # There are four ways to get into a frame:
        #   1. Locate the frame by URL: page.frame(url='www.title.com')
        #   2. Locate the frame by name: page.frame('title')
        #   3. Locate the frame through its element: page.query_selector('.title').content_frame()
        #   4. List all frames with page.frames, then index into it: page.frames[index]
        frame = page.query_selector('.title').content_frame()
        browser.close()
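The fourth approach, scanning `page.frames`, can be wrapped in a small lookup helper. This is a sketch under the assumption that matching by exact name or by URL substring is sufficient; the helper name `find_frame` is hypothetical.

```python
def find_frame(page, name=None, url_part=None):
    """Return the first frame whose name equals `name` or whose URL contains `url_part`."""
    for frame in page.frames:
        if name is not None and frame.name == name:
            return frame
        if url_part is not None and url_part in frame.url:
            return frame
    return None  # no frame matched
```

A call such as `find_frame(page, url_part="login")` returns a frame object on which `locator()`, `fill()`, and `click()` work the same way as on the page itself.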
Loading a page without images (request interception):
    import re
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=False)
        page = browser.new_page()

        def cancel_request(route, request):
            # Abort the request so the resource is never downloaded
            route.abort()

        # Intercept every request whose URL contains .png or .jpg
        page.route(re.compile(r"(\.png)|(\.jpg)"), cancel_request)
        page.goto("https://movie.douban.com/")
        page.wait_for_load_state('networkidle')
        page.screenshot(path='move_douban.png')
        browser.close()
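Matching on file extension misses images served from URLs without one. An alternative is to filter on the request's resource type, sketched below; the handler name `block_static` and the choice of blocked types are assumptions made for this example.

```python
# Resource types to drop; adjust to taste.
BLOCKED_TYPES = {"image", "media", "font"}


def block_static(route) -> None:
    """Abort requests for heavy static resources and let everything else through."""
    if route.request.resource_type in BLOCKED_TYPES:
        route.abort()
    else:
        route.continue_()
```

The handler would be registered for all requests with `page.route("**/*", block_static)` before calling `page.goto(...)`. Every intercepted request has to be aborted, continued, or fulfilled; otherwise it stays pending.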
Event listening, which can capture data loaded via Ajax:
    from playwright.sync_api import sync_playwright

    def on_response(response):
        # Only print successful responses from the movie API
        if '/api/movie/' in response.url and response.status == 200:
            print(response.json())

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=False)
        page = browser.new_page()
        # Register the listener before navigating so no response is missed
        page.on('response', on_response)
        page.goto('https://spa6.scrape.center/')
        page.wait_for_load_state('networkidle')
        browser.close()
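Printing inside the listener is fine for inspection, but a scraper usually wants to keep the payloads. One possible approach is a closure that appends to a list, sketched here; the factory name `make_collector` is hypothetical.

```python
def make_collector(url_fragment: str):
    """Return a response handler and the list it fills with decoded JSON bodies."""
    collected = []

    def on_response(response):
        # Keep only successful responses whose URL contains the fragment
        if url_fragment in response.url and response.status == 200:
            collected.append(response.json())

    return on_response, collected
```

Usage would look like `handler, data = make_collector('/api/movie/')` followed by `page.on('response', handler)`; after the page has reached the `networkidle` state, `data` holds one entry per captured response.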
Preventing Playwright from being detected as a webdriver:
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.webkit.launch(headless=False)
        page = browser.new_page()
        # Runs before any script of the page, so navigator.webdriver reads as undefined
        page.add_init_script(
            """
            Object.defineProperties(navigator, {
                webdriver: {
                    get: () => undefined
                }
            });
            """
        )
        page.goto('https://www.baidu.com')
        page.wait_for_timeout(100000)
        browser.close()
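To check that the override took effect, the property can be read back from the page after navigation. This is a sketch only: the names `HIDE_WEBDRIVER_JS` and `open_with_hidden_webdriver` are hypothetical, and hiding this single property should not be assumed to defeat other detection techniques.

```python
# Same idea as the init script above, written with defineProperty.
HIDE_WEBDRIVER_JS = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
"""


def open_with_hidden_webdriver(page, url: str) -> bool:
    """Install the init script, open `url`, and report whether the flag is hidden."""
    page.add_init_script(HIDE_WEBDRIVER_JS)  # must be registered before navigation
    page.goto(url)
    return page.evaluate("() => navigator.webdriver === undefined")
```

The function returns `True` when the page sees `navigator.webdriver` as `undefined`.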
Emulating a mobile device:
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        # Built-in device descriptor: viewport, user agent, touch support, etc.
        mobile_type = pw.devices['iPhone 12']
        browser = pw.webkit.launch(headless=False)
        context = browser.new_context(
            **mobile_type,
            locale='zh-CN',
            geolocation={'longitude': 115.725177, 'latitude': 34.404329},
            permissions=['geolocation']
        )
        page = context.new_page()
        page.goto('https://amap.com')
        page.wait_for_load_state(state='networkidle')
        page.screenshot(path='mobile_web.png')
        browser.close()
Getting an element's coordinates relative to the browser viewport:
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto('https://www.baidu.com')
        s = page.locator('xpath=xxx')
        # bounding_box() returns the element's coordinates relative to the
        # browser viewport together with its own size, as a dict:
        # {'x': 837.5375366210938, 'y': 190.31250762939453, 'width': 56, 'height': 56}
        box = s.bounding_box()
        # Centre point of the element
        x = int(box["x"] + box["width"] / 2)
        y = int(box["y"] + box["height"] / 2)
        browser.close()
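The centre-point arithmetic can be isolated in a helper, which also makes it easy to handle the case where `bounding_box()` returns `None` for an element that is not visible. A minimal sketch; the helper name `center_of` is hypothetical, and the sample values are the ones quoted above.

```python
def center_of(box):
    """Return the integer (x, y) centre of a bounding box dict."""
    if box is None:
        raise ValueError("element has no bounding box (it may not be visible)")
    x = int(box["x"] + box["width"] / 2)
    y = int(box["y"] + box["height"] / 2)
    return x, y


box = {"x": 837.5375366210938, "y": 190.31250762939453, "width": 56, "height": 56}
print(center_of(box))  # (865, 218)
```

The result can be fed straight into the mouse API, for example `page.mouse.click(*center_of(locator.bounding_box()))`.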