Selenium Tutorial for Web Scraping (Very Detailed)


1. Environment setup

# Pinning the framework version to 4.9.1 during installation is recommended
pip install selenium==4.9.1 -i https://pypi.tuna.tsinghua.edu.cn/simple

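After installing selenium, you also need the driver for the browser that selenium will control.

ChromeDriver download page: Chrome for Testing availability

Download the chromedriver package listed there.

Remember that the driver version has to match the browser version.

Enter chrome://version/ in the address bar to see the browser version.

Then look for a driver with the same version, or the closest one available.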
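The version check can be sketched in a few lines. This is only an illustration: it treats "same or close" as "same major version", which is an assumption of this sketch, and the version strings are made up.

```python
def major_version(version: str) -> int:
    """Return the leading (major) number of a dotted version string."""
    return int(version.strip().split(".")[0])


def driver_matches_browser(browser_version: str, driver_version: str) -> bool:
    """True when both version strings share the same major number."""
    return major_version(browser_version) == major_version(driver_version)


# Hypothetical version strings, as they might appear on chrome://version/
# and on the driver download page.
same_major = driver_matches_browser("138.0.7204.97", "138.0.7204.92")
different_major = driver_matches_browser("138.0.7204.97", "137.0.7151.119")
print(same_major, different_major)
```

If the major numbers differ, the driver is probably too old or too new for the installed browser.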

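After downloading, unzip the archive and move the driver executable into the Python installation directory. That directory is already on the PATH environment variable, so the driver becomes globally available as well. If you use a virtual environment, move it into the virtual environment's Python directory instead.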

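Test

from selenium import webdriver

# Get a driver object for the browser to control
browser = webdriver.Chrome()
# Load the given page
browser.get("http://www.baidu.com")
# Take a screenshot
browser.save_screenshot("baidu_home.png")

If the screenshot is saved, the environment is set up correctly.

Chrome also updates itself automatically, and every update means reinstalling the driver, which is tedious.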

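2. Disabling browser updates

Go to this directory: C:\Windows\System32\drivers\etc

Then open the hosts file and append this line at the end:

127.0.0.1 update.googleapis.com

After saving, press Win+R, type cmd to open a command prompt, and run "ipconfig /flushdns" to flush the DNS cache.

Then refresh the browser.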
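The hosts edit can also be expressed as a small text transformation. The sketch below works on a string rather than on the real file (writing to the hosts file needs administrator rights), and the helper names has_entry and with_entry are made up for this example.

```python
HOSTS_ENTRY = "127.0.0.1 update.googleapis.com"


def has_entry(hosts_text: str, entry: str = HOSTS_ENTRY) -> bool:
    """True if an uncommented line maps the same IP to the same host name."""
    wanted_ip, wanted_host = entry.split()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(fields) >= 2 and fields[0] == wanted_ip and wanted_host in fields[1:]:
            return True
    return False


def with_entry(hosts_text: str, entry: str = HOSTS_ENTRY) -> str:
    """Return hosts_text with the entry appended, unless it is already there."""
    if has_entry(hosts_text, entry):
        return hosts_text
    separator = "" if (not hosts_text or hosts_text.endswith("\n")) else "\n"
    return f"{hosts_text}{separator}{entry}\n"


# Sample content: the second line is commented out, so it does not count.
sample = "127.0.0.1 localhost\n# 127.0.0.1 update.googleapis.com\n"
updated = with_entry(sample)
print(updated)
```

Calling with_entry twice leaves the text unchanged, so the line is never duplicated.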

3. Basic operations

Initializing the browser (basic)

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
# webdriver_manager is a separate package: pip install webdriver-manager
from webdriver_manager.chrome import ChromeDriverManager

# Chrome
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
# Firefox
driver = webdriver.Firefox()
# Edge
driver = webdriver.Edge()

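Browser navigation (basic)

# Open a page
driver.get("https://www.example.com")
# Get the current URL
current_url = driver.current_url
# Get the page title
title = driver.title
# Get the page source
page_source = driver.page_source
# Go forward
driver.forward()
# Go back
driver.back()
# Refresh the page
driver.refresh()
# Close the current window
driver.close()
# Close every window and end the browser session
driver.quit()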

Locating elements (important)

from selenium.webdriver.common.by import By

# Locate by ID
element = driver.find_element(By.ID, "element_id")
# Locate by NAME
element = driver.find_element(By.NAME, "element_name")
# Locate by CLASS_NAME
element = driver.find_element(By.CLASS_NAME, "class_name")
# Locate by TAG_NAME
element = driver.find_element(By.TAG_NAME, "div")
# Locate by LINK_TEXT
element = driver.find_element(By.LINK_TEXT, "Click here")
# Locate by PARTIAL_LINK_TEXT
element = driver.find_element(By.PARTIAL_LINK_TEXT, "Click")
# Locate by CSS_SELECTOR
element = driver.find_element(By.CSS_SELECTOR, "#id > div.class")
# Locate by XPATH
element = driver.find_element(By.XPATH, "//div[@id='element_id']")
# Find multiple elements
elements = driver.find_elements(By.CLASS_NAME, "class_name")

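Interacting with elements (important)

# Click the element
element.click()
# Type text
element.send_keys("text to type")
# Clear the text
element.clear()
# Get the element's text content
text = element.text
# Get an attribute value
attribute = element.get_attribute("attribute_name")
# Check whether the element is displayed
is_displayed = element.is_displayed()
# Check whether the element is enabled
is_enabled = element.is_enabled()
# Check whether the element is selected
is_selected = element.is_selected()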

Drop-down menus

from selenium.webdriver.support.ui import Select

# Create a Select object
select = Select(driver.find_element(By.ID, "dropdown_id"))
# Select by index
select.select_by_index(1)
# Select by value
select.select_by_value("value")
# Select by visible text
select.select_by_visible_text("option text")
# Deselect everything (only valid for multi-select elements)
select.deselect_all()
# Get all options
options = select.options
# Get all selected options
selected_options = select.all_selected_options
# Get the first selected option
first_selected = select.first_selected_option

Wait strategies

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

# Forced wait
time.sleep(5)  # wait 5 seconds
# Implicit wait
driver.implicitly_wait(10)  # wait up to 10 seconds
# Explicit wait
wait = WebDriverWait(driver, 10)  # wait up to 10 seconds
element = wait.until(EC.presence_of_element_located((By.ID, "element_id")))
# Commonly used expected conditions
wait.until(EC.title_contains("part of the title"))
wait.until(EC.visibility_of_element_located((By.ID, "element_id")))
wait.until(EC.element_to_be_clickable((By.ID, "element_id")))
wait.until(EC.alert_is_present())
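To make the difference between a forced wait and an explicit wait concrete, here is a rough sketch of the polling idea behind an explicit wait. It is not Selenium's implementation (WebDriverWait also ignores certain exceptions while polling, for instance), and wait_until and WaitTimeout are names invented for this example.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # stop as soon as the condition holds
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Stand-in for a page whose element "appears" on the third check.
attempts = {"count": 0}


def element_appears():
    attempts["count"] += 1
    return "element" if attempts["count"] >= 3 else None


found = wait_until(element_appears, timeout=2, poll_interval=0.01)
print(found, attempts["count"])
```

Unlike time.sleep(5), the loop returns as soon as the condition holds, so the full timeout is only spent when the element never shows up.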

Mouse and keyboard actions

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

# Create an ActionChains object
actions = ActionChains(driver)
# Hover
actions.move_to_element(element).perform()
# Right-click
actions.context_click(element).perform()
# Double-click
actions.double_click(element).perform()
# Drag and drop
actions.drag_and_drop(source_element, target_element).perform()
# Press, move, and release
actions.click_and_hold(element).move_by_offset(10, 20).release().perform()
# Keyboard input
element.send_keys(Keys.ENTER)  # Enter
element.send_keys(Keys.CONTROL + 'a')  # Ctrl+A, select all
element.send_keys(Keys.CONTROL + 'c')  # Ctrl+C, copy
element.send_keys(Keys.CONTROL + 'v')  # Ctrl+V, paste

Cookie operations (important)

# Add a cookie (the browser must already be on a page of the cookie's domain)
driver.add_cookie({"name": "cookie_name", "value": "cookie_value"})
# Get all cookies
cookies = driver.get_cookies()
# Get one cookie
cookie = driver.get_cookie("cookie_name")
# Delete one cookie
driver.delete_cookie("cookie_name")
# Delete all cookies
driver.delete_all_cookies()

Screenshots

# Capture the current window (the visible viewport)
driver.save_screenshot("screenshot.png")
# Capture a single element
element.screenshot("element_screenshot.png")

Executing JavaScript (important)

# Execute JavaScript
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
# Execute JavaScript with arguments
driver.execute_script("arguments[0].scrollIntoView();", element)
# Get a JavaScript return value
title = driver.execute_script("return document.title;")
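A common use of the scrollTo call on pages that load content lazily is to repeat it until the page height stops growing. The sketch below is one possible way to do that; scroll_to_bottom and FakeDriver are hypothetical names, and FakeDriver only imitates execute_script so the loop can be run without a browser.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll until the page height stops growing; return the rounds used."""
    last_height = driver.execute_script("return document.body.scrollHeight;")
    for round_number in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight;")
        if new_height == last_height:
            return round_number  # nothing new was loaded
        last_height = new_height
    return max_rounds


class FakeDriver:
    """Stand-in for a WebDriver: the page grows twice, then stops."""

    def __init__(self):
        self.heights = [1000, 2000, 3000, 3000]
        self.index = 0

    def execute_script(self, script):
        if script.startswith("return"):
            return self.heights[self.index]
        # A scroll "loads" the next chunk of content.
        self.index = min(self.index + 1, len(self.heights) - 1)


rounds = scroll_to_bottom(FakeDriver(), pause=0)
print(rounds)
```

With a real driver, pause should stay long enough for the site's own loading requests to finish; max_rounds guards against pages that never stop growing.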

4. Wrapping Selenium in a class

The basic operations above can be wrapped into a Selenium class.

Step 1: initialization

class SeleniumCrawler:
    """Selenium crawler class for web automation and data scraping."""

    def __init__(self, driver_path=None, headless=False, url=None,
                 disable_images=False, proxy=None,
                 disable_automation_control=True, implicit_wait=10):
        """
        Initialize the Selenium crawler.

        Args:
            driver_path (str): path to the browser driver; if None, the driver on the system PATH is used
            headless (bool): run in headless mode (no browser window)
            url (str): page to open once the browser has started (optional)
            disable_images (bool): disable image loading to improve performance
            proxy (str): proxy server address in 'ip:port' form
            disable_automation_control (bool): hide automation-control traits (anti-bot detection)
            implicit_wait (int): implicit wait time in seconds

        browser_type and user_agent are fixed attributes, not constructor arguments.
        """
        self.browser_type = 'chrome'
        self.url = url
        self.driver_path = driver_path
        self.driver = None
        self.headless = headless
        self.disable_images = disable_images
        # No trailing comma after the string: a comma would turn it into a tuple
        self.user_agent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                           "(KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36")
        self.proxy = proxy
        self.disable_automation_control = disable_automation_control
        self.implicit_wait = implicit_wait
        # Initialize the browser
        self._init_browser()

Attribute settings

self.browser_type = 'chrome'
self.driver_path = driver_path
self.driver = None
self.headless = headless
self.disable_images = disable_images
self.user_agent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36")
self.proxy = proxy
self.disable_automation_control = disable_automation_control
self.implicit_wait = implicit_wait
self.url = url
# Initialize the browser
self._init_browser()

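self.browser_type is fixed to Chrome, which is what is normally used, driver included. If you install a different driver you could set it to firefox or edge, although the class as written only implements 'chrome' and raises ValueError for anything else.

self.driver_path is the driver path. It can usually be left alone, because the driver normally sits on the PATH, that is, in the Python installation directory.

self.headless: whether to run in headless mode.

self.disable_images: whether to disable image loading.

self.user_agent: the User-Agent request header.

self.proxy: the proxy IP.

self.disable_automation_control: whether to hide automation-control traits (anti-bot detection).

self.url: a URL can be given at initialization to open a page right away, or passed again to the class methods later.

self.implicit_wait: the implicit wait time.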

Step 2: initialize the browser type

    def _init_browser(self):
        """Initialize the browser according to the configuration."""
        if self.browser_type == 'chrome':
            self._init_chrome()
        else:
            raise ValueError(f"Unsupported browser type: {self.browser_type}; use 'chrome'")
        # Set the implicit wait time
        if self.driver:
            self.driver.implicitly_wait(self.implicit_wait)

Step 3: browser configuration

    def _init_chrome(self):
        """Initialize the Chrome browser."""
        options = Options()
        # Headless mode
        if self.headless:
            options.add_argument('--headless')
            options.add_argument('--disable-gpu')
        # Disable image loading
        if self.disable_images:
            options.add_argument('--blink-settings=imagesEnabled=false')
        # Set the User-Agent
        if self.user_agent:
            options.add_argument(f'--user-agent={self.user_agent}')
        # Set the proxy
        if self.proxy:
            options.add_argument(f'--proxy-server={self.proxy}')
        # Hide automation-control traits (anti-bot detection)
        if self.disable_automation_control:
            options.add_argument("--disable-blink-features=AutomationControlled")
            options.add_experimental_option("excludeSwitches", ["enable-automation"])
            options.add_experimental_option('useAutomationExtension', False)
        # Start Chrome
        if self.driver_path:
            service = Service(executable_path=self.driver_path)
            self.driver = webdriver.Chrome(service=service, options=options)
        else:
            self.driver = webdriver.Chrome(options=options)
        # Further reduce the chance of being detected as an automation tool
        if self.disable_automation_control:
            self.driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
                "source": """
                    Object.defineProperty(navigator, 'webdriver', {get: () => undefined})
                """
            })
        # Open the start URL if one was given
        if self.url:
            self.driver.get(self.url)

This covers the anti-detection settings, the proxy IP, headless mode, disabling image loading, and the custom UA.
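The mapping from settings to Chrome command-line switches can be isolated in a plain function, which makes it easy to inspect without launching a browser. This is a sketch under that assumption; build_chrome_arguments is a hypothetical helper, and it only covers the switches, not the experimental options or the CDP call.

```python
def build_chrome_arguments(headless=False, disable_images=False,
                           user_agent=None, proxy=None,
                           disable_automation_control=True):
    """Return the list of Chrome command-line switches for the given settings."""
    arguments = []
    if headless:
        arguments += ["--headless", "--disable-gpu"]
    if disable_images:
        arguments.append("--blink-settings=imagesEnabled=false")
    if user_agent:
        arguments.append(f"--user-agent={user_agent}")
    if proxy:
        arguments.append(f"--proxy-server={proxy}")
    if disable_automation_control:
        arguments.append("--disable-blink-features=AutomationControlled")
    return arguments


# Hypothetical proxy address, for illustration only.
args = build_chrome_arguments(headless=True, proxy="127.0.0.1:8080")
print(args)
```

Each string in the returned list could then be passed to options.add_argument in turn.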

Step 4: a method that returns the page source (important)

    def get_page_source(self, url=None):
        """
        Get the HTML source of a page.

        Args:
            url (str): page to visit first; if None, the current page is used

        Returns:
            str: the HTML source, or None if it could not be fetched
        """
        try:
            # Visit the URL if one was given
            if url:
                self.driver.get(url)
            # Crude wait for the page to finish loading; waiting for a
            # specific element would be more reliable
            time.sleep(1)
            # Get the page source
            return self.driver.page_source
        except TimeoutException:
            print(f"Page load timed out: {self.driver.current_url}")
            return None
        except Exception as e:
            print(f"Error while getting the page source: {e}")
            return None

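Test case: Baidu

The test script sits in the same directory as selenium_wrapper.py, the file that holds the class:

from selenium_wrapper import SeleniumCrawler

crawler = SeleniumCrawler(headless=True)
print(crawler.get_page_source('https://www.baidu.com'))

This prints the HTML source of the Baidu home page.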

Step 5: a method that returns the cookies (important)

    def get_cookies(self, url=None):
        """
        Get the cookies of the current page as a {name: value} dict.

        Args:
            url (str): page to visit first; if None, the current page is used

        Returns:
            dict: the cookies, or None if they could not be fetched
        """
        time.sleep(2)
        try:
            if url:
                self.driver.get(url)
            # Get all cookies
            cookies = self.driver.get_cookies()
            # Convert the list of cookie records into a dict
            cookies_dict = {}
            for cookie in cookies:
                cookies_dict[cookie['name']] = cookie['value']
            return cookies_dict
        except Exception as e:
            print(f"Error while getting cookies: {e}")
            return None

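This part matters a lot: cookies obtained through an automation tool can generally be reused to get past cookie-based anti-scraping checks from security products such as Ruishu (瑞数).

An example: the Tenders and Procurement page of Xiamen Medical College.

A Ruishu 5 case

def get_cookies():
    # requests and headers come from the full test script at the end
    crawler = SeleniumCrawler(headless=False)
    cookies = crawler.get_cookies('https://www.xmmc.edu.cn/index/zbcg/150.htm')
    crawler.quit()
    print(cookies)
    for i in range(10):
        url = "https://www.xmmc.edu.cn/index/zbcg/150.htm"
        response = requests.get(url, headers=headers, cookies=cookies)
        print(response)

get_cookies()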
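Cookies taken from the browser can expire while the plain HTTP requests are still running, so it can help to filter them first. The sketch below assumes cookie records in the list-of-dicts shape that driver.get_cookies() returns, with an optional expiry Unix timestamp; live_cookies and cookie_header are hypothetical helpers and the sample records are made up.

```python
import time


def live_cookies(selenium_cookies, now=None):
    """Keep the name/value pairs of cookies that have not expired yet."""
    now = time.time() if now is None else now
    return {
        cookie["name"]: cookie["value"]
        for cookie in selenium_cookies
        if cookie.get("expiry") is None or cookie["expiry"] > now
    }


def cookie_header(cookies):
    """Format a {name: value} dict as the value of a Cookie request header."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Made-up records; "pref" has no expiry, so it is a session cookie.
raw = [
    {"name": "session", "value": "abc", "expiry": 2000},
    {"name": "old", "value": "zzz", "expiry": 500},
    {"name": "pref", "value": "dark"},
]
live = live_cookies(raw, now=1000)
print(cookie_header(live))
```

The filtered dict can be passed to requests through the cookies argument, or the header string can be set under the "Cookie" key of the headers dict.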

Step 6: opening a URL, closing the window, and quitting the browser

    def open_url(self, url):
        """
        Open the given URL.

        Args:
            url (str): page to visit
        """
        self.driver.get(url)

    def close(self):
        """Close the current browser window."""
        if self.driver:
            self.driver.close()

    def quit(self):
        """Quit the browser and release all resources."""
        if self.driver:
            self.driver.quit()
            self.driver = None

def close_html():
    crawler = SeleniumCrawler(headless=False, url='https://www.baidu.com')
    crawler.close()
    crawler.quit()
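If an exception is raised between creating the crawler and calling quit(), the browser process can be left running. One way to guard against that is a small context manager; this is a sketch, with managed being a hypothetical helper and FakeCrawler a stand-in that only records whether quit() was called.

```python
from contextlib import contextmanager


@contextmanager
def managed(crawler):
    """Yield the crawler and always call its quit() on the way out."""
    try:
        yield crawler
    finally:
        crawler.quit()


class FakeCrawler:
    """Stand-in for SeleniumCrawler so the sketch runs without a browser."""

    def __init__(self):
        self.quit_called = False

    def quit(self):
        self.quit_called = True


fake = FakeCrawler()
try:
    with managed(fake) as crawler:
        raise RuntimeError("simulated scraping failure")
except RuntimeError:
    pass
print(fake.quit_called)
```

With the real class, the same pattern would read: with managed(SeleniumCrawler(headless=True)) as crawler: ...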

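Step 7: locating an element to type into, and locating an element to click

Both methods below use the explicit wait strategy.

    def send_keys(self, by, value, text, timeout=10):
        """
        Type text into the given element.

        Args:
            by (str): locator strategy name (ID, NAME, CLASS_NAME, ...)
            value (str): locator value
            text (str): text to type
            timeout (int): how long to wait for the element, in seconds

        Returns:
            bool: True on success, False otherwise
        """
        try:
            # Map the name ('ID', 'XPATH', ...) to the matching By constant
            locator = getattr(By, by)
            # Explicit wait
            wait = WebDriverWait(self.driver, timeout)
            element = wait.until(EC.presence_of_element_located((locator, value)))
            # Clear the input box, then type the text
            element.clear()
            element.send_keys(text)
            return True
        except TimeoutException:
            print(f"Element not found: {by} = {value}")
            return False
        except Exception as e:
            print(f"Error while typing text: {e}")
            return False

    def click_element(self, by, value, timeout=10):
        """
        Click the given element.

        Args:
            by (str): locator strategy name (ID, NAME, CLASS_NAME, ...)
            value (str): locator value
            timeout (int): how long to wait for the element, in seconds

        Returns:
            bool: True on success, False otherwise
        """
        try:
            # Map the name ('ID', 'XPATH', ...) to the matching By constant
            locator = getattr(By, by)
            # Explicit wait
            wait = WebDriverWait(self.driver, timeout)
            element = wait.until(EC.element_to_be_clickable((locator, value)))
            # Click the element
            element.click()
            return True
        except TimeoutException:
            print(f"Element not found or not clickable: {by} = {value}")
            return False
        except Exception as e:
            print(f"Error while clicking the element: {e}")
            return False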
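One way to turn a locator name such as 'ID' into a Selenium strategy is a plain lookup table, which also rejects unknown names up front. The sketch below runs without Selenium installed; the table values mirror the string constants of selenium.webdriver.common.by.By, and resolve_locator is a hypothetical helper.

```python
# Same string values as the By constants in selenium.webdriver.common.by
LOCATOR_STRATEGIES = {
    "ID": "id",
    "NAME": "name",
    "CLASS_NAME": "class name",
    "TAG_NAME": "tag name",
    "LINK_TEXT": "link text",
    "PARTIAL_LINK_TEXT": "partial link text",
    "CSS_SELECTOR": "css selector",
    "XPATH": "xpath",
}


def resolve_locator(by: str, value: str) -> tuple:
    """Return a (strategy, value) pair, or raise ValueError for unknown names."""
    key = by.strip().upper()
    if key not in LOCATOR_STRATEGIES:
        supported = ", ".join(sorted(LOCATOR_STRATEGIES))
        raise ValueError(f"unknown locator {by!r}; expected one of: {supported}")
    return (LOCATOR_STRATEGIES[key], value)


print(resolve_locator("id", "kw"))
print(resolve_locator("CSS_SELECTOR", "#su"))
```

The returned pair has the same shape as the (By.ID, "kw") tuples passed to the expected conditions, and because the name is only used as a dictionary key, arbitrary input is never executed.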

Example: Baidu

def send_click():
    crawler = SeleniumCrawler(headless=False, url='https://www.baidu.com')
    # Arguments:
    #   by (str): locator strategy name (ID, NAME, CLASS_NAME, ...)
    #   value (str): locator value
    #   timeout (int): how long to wait for the element, in seconds
    crawler.send_keys('ID', 'kw', 'python')
    time.sleep(2)
    crawler.click_element('ID', 'su')
    time.sleep(2)
    print(crawler.get_page_source())

send_click()

Step 8: running commands in the console

    def execute_console_command(self, command):
        """
        Run a JavaScript command in the browser.

        Args:
            command (str): JavaScript to execute

        Returns:
            The result of the command, or None on error
        """
        try:
            result = self.driver.execute_script(command)
            return result
        except Exception as e:
            print(f"Error while executing the JavaScript command: {e}")
            return None

Example:

def console():
    # Create a SeleniumCrawler instance
    crawler = SeleniumCrawler(headless=False, url='https://www.baidu.com')
    # Run a JavaScript command
    command = 'return document.title;'
    result = crawler.execute_console_command(command)
    print(f"Result: {result}")
    # Wait a few seconds to observe the result
    time.sleep(12)
    # Close the browser
    crawler.close()

console()

That is as far as the wrapper goes for now. The complete code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time


class SeleniumCrawler:
    """Selenium crawler class for web automation and data scraping."""

    def __init__(self, driver_path=None, headless=False, url=None,
                 disable_images=False, proxy=None,
                 disable_automation_control=True, implicit_wait=10):
        """
        Initialize the Selenium crawler.

        Args:
            driver_path (str): path to the browser driver; if None, the driver on the system PATH is used
            headless (bool): run in headless mode (no browser window)
            url (str): page to open once the browser has started (optional)
            disable_images (bool): disable image loading to improve performance
            proxy (str): proxy server address in 'ip:port' form
            disable_automation_control (bool): hide automation-control traits (anti-bot detection)
            implicit_wait (int): implicit wait time in seconds

        browser_type and user_agent are fixed attributes, not constructor arguments.
        """
        self.browser_type = 'chrome'
        self.url = url
        self.driver_path = driver_path
        self.driver = None
        self.headless = headless
        self.disable_images = disable_images
        # No trailing comma after the string: a comma would turn it into a tuple
        self.user_agent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                           "(KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36")
        self.proxy = proxy
        self.disable_automation_control = disable_automation_control
        self.implicit_wait = implicit_wait
        # Initialize the browser
        self._init_browser()

    def _init_browser(self):
        """Initialize the browser according to the configuration."""
        if self.browser_type == 'chrome':
            self._init_chrome()
        else:
            raise ValueError(f"Unsupported browser type: {self.browser_type}; use 'chrome'")
        # Set the implicit wait time
        if self.driver:
            self.driver.implicitly_wait(self.implicit_wait)

    def _init_chrome(self):
        """Initialize the Chrome browser."""
        options = Options()
        # Headless mode
        if self.headless:
            options.add_argument('--headless')
            options.add_argument('--disable-gpu')
        # # Disable image loading
        # if self.disable_images:
        #     options.add_argument('--blink-settings=imagesEnabled=false')
        # Set the User-Agent
        if self.user_agent:
            options.add_argument(f'--user-agent={self.user_agent}')
        # Set the proxy
        if self.proxy:
            options.add_argument(f'--proxy-server={self.proxy}')
        # Hide automation-control traits (anti-bot detection)
        if self.disable_automation_control:
            options.add_argument("--disable-blink-features=AutomationControlled")
            options.add_experimental_option("excludeSwitches", ["enable-automation"])
            options.add_experimental_option('useAutomationExtension', False)
        # Start Chrome
        if self.driver_path:
            service = Service(executable_path=self.driver_path)
            self.driver = webdriver.Chrome(service=service, options=options)
        else:
            self.driver = webdriver.Chrome(options=options)
        # Further reduce the chance of being detected as an automation tool
        if self.disable_automation_control:
            self.driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
                "source": """
                    Object.defineProperty(navigator, 'webdriver', {get: () => undefined})
                """
            })
        # Open the start URL if one was given
        if self.url:
            self.driver.get(self.url)

    def open_url(self, url):
        """
        Open the given URL.

        Args:
            url (str): page to visit
        """
        self.driver.get(url)

    def close(self):
        """Close the current browser window."""
        if self.driver:
            self.driver.close()

    def quit(self):
        """Quit the browser and release all resources."""
        if self.driver:
            self.driver.quit()
            self.driver = None

    def get_page_source(self, url=None):
        """
        Get the HTML source of a page.

        Args:
            url (str): page to visit first; if None, the current page is used

        Returns:
            str: the HTML source, or None if it could not be fetched
        """
        try:
            # Visit the URL if one was given
            if url:
                self.driver.get(url)
            # Crude wait for the page to finish loading; waiting for a
            # specific element would be more reliable
            time.sleep(1)
            # Get the page source
            return self.driver.page_source
        except TimeoutException:
            print(f"Page load timed out: {self.driver.current_url}")
            return None
        except Exception as e:
            print(f"Error while getting the page source: {e}")
            return None

    def get_cookies(self, url=None):
        """
        Get the cookies of the current page as a {name: value} dict.
        Useful against cookie-based anti-scraping checks.

        Args:
            url (str): page to visit first; if None, the current page is used

        Returns:
            dict: the cookies, or None if they could not be fetched
        """
        time.sleep(1)
        try:
            if url:
                self.driver.get(url)
            # Get all cookies
            cookies = self.driver.get_cookies()
            # Convert the list of cookie records into a dict
            cookies_dict = {}
            for cookie in cookies:
                cookies_dict[cookie['name']] = cookie['value']
            return cookies_dict
        except Exception as e:
            print(f"Error while getting cookies: {e}")
            return None

    def send_keys(self, by, value, text, timeout=10):
        """
        Type text into the given element.

        Args:
            by (str): locator strategy name (ID, NAME, CLASS_NAME, ...)
            value (str): locator value
            text (str): text to type
            timeout (int): how long to wait for the element, in seconds

        Returns:
            bool: True on success, False otherwise
        """
        try:
            # Map the name ('ID', 'XPATH', ...) to the matching By constant
            locator = getattr(By, by)
            # Explicit wait until the element is present
            wait = WebDriverWait(self.driver, timeout)
            element = wait.until(EC.presence_of_element_located((locator, value)))
            # Clear the input box, then type the text
            element.clear()
            element.send_keys(text)
            return True
        except TimeoutException:
            print(f"Element not found: {by} = {value}")
            return False
        except Exception as e:
            print(f"Error while typing text: {e}")
            return False

    def click_element(self, by, value, timeout=10):
        """
        Click the given element.

        Args:
            by (str): locator strategy name (ID, NAME, CLASS_NAME, ...)
            value (str): locator value
            timeout (int): how long to wait for the element, in seconds

        Returns:
            bool: True on success, False otherwise
        """
        try:
            # Map the name ('ID', 'XPATH', ...) to the matching By constant
            locator = getattr(By, by)
            # Explicit wait until the element is clickable
            wait = WebDriverWait(self.driver, timeout)
            element = wait.until(EC.element_to_be_clickable((locator, value)))
            # Click the element
            element.click()
            return True
        except TimeoutException:
            print(f"Element not found or not clickable: {by} = {value}")
            return False
        except Exception as e:
            print(f"Error while clicking the element: {e}")
            return False

    def execute_console_command(self, command):
        """
        Run a JavaScript command in the browser.

        Args:
            command (str): JavaScript to execute

        Returns:
            The result of the command, or None on error
        """
        try:
            # Execute the JavaScript command
            result = self.driver.execute_script(command)
            return result
        except Exception as e:
            print(f"Error while executing the JavaScript command: {e}")
            return None

Full test cases

from selenium.webdriver import Keys
from selenium_wrapper import SeleniumCrawler
import requests
from selenium.webdriver.common.by import By
import time

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
    "Pragma": "no-cache",
    "Referer": "https://www.jscq.com.cn/jscq/xwzx/tzgg/e866e61a-3.shtml",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36",
    "sec-ch-ua": "\"Not)A;Brand\";v=\"8\", \"Chromium\";v=\"138\", \"Google Chrome\";v=\"138\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\""
}


def get_html():
    crawler = SeleniumCrawler(headless=False, url='https://www.baidu.com')
    print(crawler.get_page_source())


def get_cookies():
    crawler = SeleniumCrawler(headless=False, url='https://www.xmmc.edu.cn/index/zbcg/150.htm')
    cookies = crawler.get_cookies()
    crawler.quit()
    # print(cookies)
    for i in range(10):
        url = "https://www.xmmc.edu.cn/index/zbcg/150.htm"
        response = requests.get(url, headers=headers, cookies=cookies)
        print(response)


def close_html():
    crawler = SeleniumCrawler(headless=False, url='https://www.baidu.com')
    crawler.close()
    crawler.quit()


def send_click():
    crawler = SeleniumCrawler(headless=False, url='https://www.baidu.com')
    # Arguments:
    #   by (str): locator strategy name (ID, NAME, CLASS_NAME, ...)
    #   value (str): locator value
    #   timeout (int): how long to wait for the element, in seconds
    crawler.send_keys('ID', 'kw', 'python')
    time.sleep(2)
    crawler.click_element('ID', 'su')
    time.sleep(2)
    print(crawler.get_page_source())


def console():
    # Create a SeleniumCrawler instance
    crawler = SeleniumCrawler(headless=False, url='https://www.baidu.com')
    # Run a JavaScript command
    command = 'return document.title;'
    result = crawler.execute_console_command(command)
    print(f"Result: {result}")
    # Wait a few seconds to observe the result
    time.sleep(12)
    # Close the browser
    crawler.close()

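Once wrapped like this, Selenium can basically be combined with Scrapy to build an automated crawler. The drawback is that it is slow.

CAPTCHA handling: to be added.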
