Python Data Analysis, Web Scraping Basics: A Detailed Guide to Selenium

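1. Introduction to Selenium

2. What Selenium is used for

3. Setting up the browser driver and installing Selenium

4. Basic Selenium syntax

4.1 Locating elements

4.2 Reading element information

4.3 Interacting with elements

5. Introduction to PhantomJS

6. Chrome headless mode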

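1. Introduction to Selenium

(1) Selenium is a tool for testing web applications.

(2) Selenium tests run directly in the browser, just as if a real user were operating it.

(3) It drives real browsers through browser-specific drivers (FirefoxDriver, InternetExplorerDriver, OperaDriver, ChromeDriver) to carry out the tests.

(4) Selenium also supports headless (no UI) browser operation.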

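2. What Selenium is used for

(1) It can simulate the actions a user performs in a browser, such as clicking buttons, typing text and submitting forms. This makes it suitable for functional testing and regression testing of web applications.

(2) It can automate repetitive web tasks, such as uploading files in bulk or running jobs on a schedule, which saves time.

(3) It can scrape data that ordinary HTTP requests cannot retrieve, for example the limited-time flash sale data on JD.com.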

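3. Setting up the browser driver and installing Selenium

(1) Download the driver for Chrome

Download the driver version that matches the version of your installed browser. Place the downloaded driver in the Python installation directory, and make sure that directory is on the system PATH environment variable.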

Download location: the official website.
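Because the driver has to match the browser, a quick sanity check is to compare their major version numbers. The following is only a sketch: `major_version` and `versions_match` are hypothetical helper names, and the sample strings assume the usual shape of the output of `chromedriver --version` and the browser's own version command, which may differ on your platform.

```python
import re


def major_version(version_output: str) -> int:
    """Return the leading number of the first dotted version found in the text."""
    match = re.search(r"(\d+)(?:\.\d+)+", version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return int(match.group(1))


def versions_match(browser_output: str, driver_output: str) -> bool:
    """True when the browser and the driver report the same major version."""
    return major_version(browser_output) == major_version(driver_output)


if __name__ == "__main__":
    # Assumed sample outputs; replace them with what your own commands print.
    browser = "Google Chrome 114.0.5735.198"
    driver = "ChromeDriver 114.0.5735.90 (386bc09e8f4f)"
    print(versions_match(browser, driver))
```

If the two major versions differ, download a different driver build before going further.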

(2) Install Selenium

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple selenium

(3) Check that the browser driver works

from selenium import webdriver
# Create the browser object
driver = webdriver.Chrome()
driver.get('http://www.baidu.com')
# Wait for Enter so the browser window stays open
input()

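4. Basic Selenium syntax

4.1 Locating elements

Automation comes down to simulating the mouse and keyboard to operate page elements: clicking, typing and so on. Before you can operate an element you have to find it, and webdriver provides several ways to locate elements. The examples below use the By class, imported as follows:

from selenium.webdriver.common.by import By

(1) id: uniquely identifies one element (here, Baidu's "百度一下" search button)

button = driver.find_element(By.ID, "su")

(2) name: must be unique on the page to identify a single element (here, Baidu's search text box)

search_box = driver.find_element(By.NAME, "wd")

(3) xpath: the expression should match exactly one element on the page

button = driver.find_element(By.XPATH, "//input[@id='su']")

(4) tag name: the name of the HTML tag. find_element returns only the first element that matches.

first_input = driver.find_element(By.TAG_NAME, "input")

(5) css selector: selects by CSS selector, the same selector syntax that bs4 accepts

button = driver.find_element(By.CSS_SELECTOR, '#su')

(6) link text: finds a link by its visible text

link = driver.find_element(By.LINK_TEXT, '新闻')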
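When a script uses several of these locators, it can help to keep them in one table instead of scattering selector strings through the code. The sketch below is one possible arrangement, not part of Selenium itself: `BAIDU_LOCATORS` and `locate` are hypothetical names, and the selector values are the ones used in the list above. It relies on the fact that the By constants are plain strings (for example, By.ID is "id").

```python
# Selenium's By constants are plain strings (By.ID == "id",
# By.NAME == "name", By.LINK_TEXT == "link text"), so this table
# can be written without importing By.
BAIDU_LOCATORS = {
    "search_button": ("id", "su"),
    "search_box": ("name", "wd"),
    "news_link": ("link text", "新闻"),
}


def locate(driver, name):
    """Look up a named locator and return the matching element."""
    by, value = BAIDU_LOCATORS[name]
    return driver.find_element(by, value)
```

With a real driver this would be called as `locate(driver, "search_box")`. If Baidu changes its page, only the table needs updating.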

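4.2 Reading element information

(1) Use get_attribute to read the value of an attribute, here class

button = driver.find_element(By.ID, 'su')
print(button.get_attribute('class'))

(2) Use text to read the element's text (this returns only the text between the element's opening and closing tags)

a = driver.find_element(By.LINK_TEXT, '新闻')
print(a.text)

(3) Use tag_name to read the tag name

button = driver.find_element(By.ID, 'su')
print(button.tag_name)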
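The three lookups above can be gathered into one small helper when you want to inspect an element while debugging. This is only an illustrative sketch: `describe` is a hypothetical name, and it assumes nothing beyond the `tag_name`, `text` and `get_attribute` members shown above.

```python
def describe(element):
    """Collect the tag name, text and class attribute of an element."""
    return {
        "tag": element.tag_name,
        "text": element.text,
        "class": element.get_attribute("class"),
    }
```

For example, `print(describe(driver.find_element(By.ID, 'su')))` prints all three values for the search button in one line.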

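4.3 Interacting with elements

Note that click and send_keys return None, so there is no point in assigning their result to a variable.

(1) click: click an element (here, the "百度一下" button)

driver.find_element(By.ID, "su").click()

(2) send_keys: simulate typing into an element (here, entering 周杰伦 in the search box)

driver.find_element(By.ID, "kw").send_keys("周杰伦")

(3) Scroll to the bottom of the page

js_bottom = "document.documentElement.scrollTop=10000"
driver.execute_script(js_bottom)

(4) Go back to the previous page in the browser history

driver.back()

(5) Go forward again to the page you came back from

driver.forward()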
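The scroll step above uses a fixed offset of 10000 pixels, which may fall short on a very long page. One possible variation, sketched below, scrolls to the page's actual height instead. `scroll_to_bottom` and `SCROLL_TO_BOTTOM_JS` are hypothetical names; `window.scrollTo` and `scrollHeight` are standard browser JavaScript, and `execute_script` is the same driver method used above.

```python
# JavaScript that scrolls to the full height of the document,
# whatever that height happens to be.
SCROLL_TO_BOTTOM_JS = "window.scrollTo(0, document.documentElement.scrollHeight);"


def scroll_to_bottom(driver):
    """Scroll the current page to its bottom via JavaScript."""
    driver.execute_script(SCROLL_TO_BOTTOM_JS)
```

On pages that load more content as you scroll, a single call reaches only the bottom as it exists at that moment, so it may need to be repeated.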

Example: search for 周杰伦 in the Baidu search box, scroll to the bottom of the results page, open the next page, go back, then go forward again. Finally, quit the browser.

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
driver = webdriver.Chrome()
url = "https://www.baidu.com"
driver.get(url)
time.sleep(2)
# Type the keyword into the search box
driver.find_element(By.ID, "kw").send_keys("周杰伦")
time.sleep(2)
# Click the search button
driver.find_element(By.ID, "su").click()
time.sleep(2)
# Scroll to the bottom of the results page
js_bottom = "document.documentElement.scrollTop=10000"
driver.execute_script(js_bottom)
time.sleep(2)
# Find the "next page" link and click it
driver.find_element(By.XPATH, "//a[@class='n']").click()
time.sleep(2)
# Go back to the previous page
driver.back()
time.sleep(2)
# Go forward again
driver.forward()
time.sleep(5)
driver.quit()
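The fixed `time.sleep` calls in the example wait the full time even when the page is ready sooner, and may still be too short on a slow connection. A common alternative is to poll for a condition. The helper below is a hand-rolled sketch of that idea, not Selenium's own API (Selenium ships its own explicit-wait support in `selenium.webdriver.support.ui`); `wait_until` and its parameters are hypothetical names.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

With a real driver, `wait_until(lambda: driver.find_elements(By.ID, "kw"))` would return as soon as the search box exists, because `find_elements` returns an empty list while nothing matches.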

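5. Introduction to PhantomJS

(1) It is a headless browser, that is, a browser with no user interface.

(2) It supports finding page elements, executing JavaScript, and so on.

(3) Because it does no CSS or GUI rendering, it runs much faster than a real browser.

However, the PhantomJS project is no longer maintained: its maintainer stepped down and development has stopped, so it is not covered here.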

6. Chrome headless mode

Chrome headless mode was added by Google in Chrome 59. It lets you use Chrome without opening a UI window, so pages behave the same way they do in regular Chrome.

System requirements:

Chrome:

Unix/Linux requires Chrome >= 59

Windows requires Chrome >= 60

Python >= 3.6

selenium >= 3.4.*

chromedriver >= 2.31

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=chrome_options)
url = "https://www.baidu.com"
driver.get(url)
driver.save_screenshot("screenshot.png")

Repeating this configuration every time is tedious. Wrapping it in a function means you only have to call the function whenever you need a browser.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
def share_browser():
    # Build a headless Chrome driver with the options shown above
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    driver = webdriver.Chrome(options=chrome_options)
    return driver
driver = share_browser()
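A headless browser has no window to remind you that it is still running, so it is easy to leave Chrome processes behind when a script fails halfway. One way to guard against that, sketched here under the assumption that the driver factory is a function like share_browser above, is a context manager that always calls quit. `browser_session` is a hypothetical name; `quit` is the same driver method used in the earlier example.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(make_driver):
    """Create a driver with make_driver() and always quit it afterwards."""
    driver = make_driver()
    try:
        yield driver
    finally:
        # Runs even if the body of the with-block raises an exception.
        driver.quit()
```

Usage would look like `with browser_session(share_browser) as driver:` followed by the scraping code; the browser is closed when the block ends, whether or not an error occurred.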
