Python Web Scraping



5.2 Logging in to readthedocs.org automatically

Copyright notice: please credit the source when reposting: http://www.codingsoho.com/

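Environment setup

The previous section mentioned that Selenium 3 requires the WebDriver location to be set, and showed how to do that for Firefox. This section covers Chrome.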

from selenium import webdriver
browser = webdriver.Chrome()

If the driver location is not set, the following error is raised:

selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

Pass the chromedriver path as an argument when initializing Chrome:

driver = webdriver.Chrome(r'e:\Computer\virtualenv\webscrapping\chrome\chromedriver.exe')
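Since the error message says chromedriver must be on PATH, it can help to check where (or whether) the executable can be found before starting the browser. The following is a minimal sketch using only the standard library; `find_chromedriver` and its parameters are hypothetical names introduced here, not part of Selenium.

```python
import os
import shutil


def find_chromedriver(explicit_path=None, name="chromedriver"):
    """Return a usable chromedriver path, or None if none is found.

    Both parameters are hypothetical names used only in this sketch.
    """
    # Prefer an explicitly configured path when that file really exists.
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    # Otherwise search the directories listed in the PATH environment variable.
    return shutil.which(name)
```

If the function returns a path, that path could be handed to `webdriver.Chrome(...)` as in the line above; if it returns `None`, the "needs to be in PATH" error is to be expected.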

With that change the error above goes away, but a new one appears:

selenium.common.exceptions.SessionNotCreatedException: Message: session not created exception: Chrome version must be >= 58.0.3029.0

It turned out to be a version problem: the installed Chrome was too old, so I upgraded it.
Before the upgrade: 57.0.2987.133
After the upgrade: 69.0.3497.100

Problem solved!
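The version requirement from the error message can be checked by comparing dotted version strings numerically rather than as text. This is a small sketch of that comparison; the helper names `parse_version` and `meets_minimum` are made up for illustration.

```python
def parse_version(text):
    """Turn '69.0.3497.100' into (69, 0, 3497, 100) for numeric comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


MINIMUM = "58.0.3029.0"  # minimum version quoted in the error message

print(meets_minimum("57.0.2987.133", MINIMUM))  # version before the upgrade: False
print(meets_minimum("69.0.3497.100", MINIMUM))  # version after the upgrade: True
```

Comparing tuples of integers avoids the pitfall of string comparison, where "9" would sort after "58".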

Login

Here is an example that waits for the username field, the first step in sending the username:

import os
import time
from selenium import webdriver

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# browser is the webdriver.Chrome instance created earlier
username = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.XPATH, "//input[@id='id_login']")))

This raises an error:

selenium.common.exceptions.WebDriverException: Message: unknown error: call function result missing 'value'

It turned out that the Chrome version and the chromedriver version did not match. Downloading a chromedriver that matches the browser fixes it.

chromedriver download location
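When diagnosing a mismatch like this, it is useful to see both version numbers side by side. The sketch below pulls them out of a capabilities dictionary such as the one a running driver exposes as `browser.capabilities`. The key names used here are assumptions about what ChromeDriver reports and may differ between releases, and the sample data is hypothetical.

```python
def describe_versions(capabilities):
    """Extract (browser_version, driver_version) from a capabilities dict.

    The key names are assumptions for this sketch and may vary by driver release.
    """
    browser_version = capabilities.get("browserVersion") or capabilities.get("version")
    chrome_info = capabilities.get("chrome") or {}
    # The driver entry is assumed to look like "<version> (<build info>)".
    driver_version = chrome_info.get("chromedriverVersion", "").split(" ")[0] or None
    return browser_version, driver_version


# Hypothetical sample data shaped like a capabilities dict.
sample = {
    "version": "69.0.3497.100",
    "chrome": {"chromedriverVersion": "2.42 (hypothetical-build)"},
}
print(describe_versions(sample))
```

Which driver release goes with which Chrome release is listed on the chromedriver download page, so the sketch only reports the two values and does not try to judge compatibility.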

The complete code is as follows:

def login(browser, loginurl):
    # Open the login page and give it a moment to render.
    browser.get(loginurl)
    time.sleep(1)
    # Wait for the username field, then fill it in.
    username = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.XPATH, "//input[@id='id_login']")))
    username.clear()
    username.send_keys("hebinn@163.com")
    print("name sent")
    time.sleep(1)
    browser.execute_script("document.getElementById('id_password').setAttribute('class', 'form-control')")  # purpose?
    # Wait for the password field, then fill it in.
    password = WebDriverWait(browser, 50).until(
        EC.presence_of_element_located((By.XPATH, "//input[@id='id_password']")))
    password.clear()
    password.send_keys("")
    browser.execute_script("document.getElementById('id_password').disabled=false")  # purpose?
    print("password sent")
    # Wait for the login button, then submit the form.
    sign = WebDriverWait(browser, 50).until(
        EC.presence_of_element_located((By.XPATH, "//button[@class='primaryAction']")))
    sign.send_keys(u"Log in")
    sign.click()
    print("sign sent")
    time.sleep(1)

The code runs, but an error is raised at the end:

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document,

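On inspection, the code that sends keys to the login button was wrong. Change it as follows; the key value must match the text shown on the login button: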

sign.send_keys(u"登录")
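A StaleElementReferenceException generally means that a stored element reference no longer belongs to the current page, for example after the page has reloaded. Besides the fix above, a common workaround is to look the element up again and retry the action. The sketch below shows that retry pattern in plain Python; `retry_on` and its parameters are hypothetical helpers, not a Selenium API.

```python
import time


def retry_on(exceptions, action, attempts=3, delay=0.5):
    """Call action() until it succeeds, retrying when `exceptions` is raised.

    `exceptions` is an exception class or a tuple of classes; all names here
    are hypothetical and exist only for this sketch.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            time.sleep(delay)  # brief pause before trying again
```

With Selenium, `exceptions` would be `StaleElementReferenceException` and `action` would be a function that locates the login button again and clicks it, so each retry works with a fresh element reference.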

Option settings

from selenium.webdriver.chrome.options import Options

def chrome(headless=False, proxy=False):
    #driver_path = "/usr/local/chromedriver"
    # Raw string so that backslashes such as \v are not treated as escape sequences.
    driver_path = r"E:\Computer\virtualenv\pyspider\src\chromedriver.exe"
    chrome_options = Options()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-gpu')
    if headless is True:
        chrome_options.add_argument('--headless')
    if proxy is True:
        chrome_options.add_argument('--proxy-server=135.245.48.34:8000')

    return chrome_options, driver_path
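The choice of command-line flags can also be separated from Selenium, which makes it possible to check the result without launching a browser. This is a minimal sketch of that idea; `build_chrome_args` and `proxy_address` are hypothetical names, and the flags are the same ones used in the function above.

```python
def build_chrome_args(headless=False, proxy_address=None):
    """Return the list of Chrome command-line flags for the given settings."""
    args = ["--no-sandbox", "--disable-gpu"]  # always applied, as above
    if headless:
        args.append("--headless")
    if proxy_address:
        # proxy_address is expected in "host:port" form
        args.append("--proxy-server=" + proxy_address)
    return args


print(build_chrome_args(headless=True, proxy_address="135.245.48.34:8000"))
```

Each item in the returned list could then be passed to `chrome_options.add_argument(...)` in a loop.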

Page access

Use an implicit wait here, rather than waiting for a specific element to appear as before:

def open_page(self):
    self.driver.get(self.url)
    self.driver.implicitly_wait(10)
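Both waiting styles come down to polling. An explicit wait such as WebDriverWait repeatedly checks one condition until it holds or a timeout expires, while implicitly_wait makes element lookups keep retrying for up to the given number of seconds. The sketch below is a simplified illustration of the polling idea in plain Python, not Selenium's implementation; `wait_until` and its parameters are hypothetical.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Poll condition() until it returns a truthy value or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # condition met: hand back whatever it produced
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll)  # wait a little before checking again
```

The condition is always checked at least once, so a condition that is already true returns immediately even with a timeout of zero.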