UPDATE 2017-08-31: Use Chromedriver version 2.25. I tried using 2.9 with Chrome 60 and could not navigate to URLs; it throws the exception below. It works fine with 2.25.
WebDriverException: Message: unknown error: Runtime.executionContextCreated has invalid 'context': {"auxData":{"frameId":"18604.1","isDefault":true},"id":1,"name":"","origin":"://"}
(Session info: chrome=60.0.3112.113)
(Driver info: chromedriver=2.9.248304,platform=Linux 4.4.0-93-generic x86_64)
Setting up Selenium to work on a headless server requires the browser to be headless as well. This involves setting up Google Chrome, Chromedriver, and Selenium.
All instructions that follow are for Ubuntu.
sudo apt-get install unzip
wget http://chromedriver.storage.googleapis.com/2.25/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
rm chromedriver_linux64.zip
sudo mv chromedriver /usr/local/bin/
Chromedriver 2.9 was current when this post was first written, but per the update at the top the download command now uses 2.25. Check the Chromedriver downloads page for the latest stable release.
That last command moves chromedriver to your local bin folder, which puts it on the $PATH so that it’s accessible to the logged-in user.
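To confirm the move worked, a quick check from Python can look the binary up on the $PATH. This is only a minimal sketch using the standard library; the helper name `find_on_path` is made up for illustration.

```python
import shutil


def find_on_path(name="chromedriver"):
    """Return the full path of an executable on $PATH, or None if it is missing."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_on_path()
    if location is None:
        print("chromedriver is not on the PATH")
    else:
        print("chromedriver found at", location)
```

If this reports that chromedriver is missing, `webdriver.Chrome()` will most likely fail to start as well, unless you point it at the binary explicitly.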
I don’t know if this is the best way to install Chrome, but it worked for me.
Get the debian package:
$ wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
If you try installing it, most likely you’ll get an error for missing dependencies:
$ sudo dpkg -i google-chrome-stable_current_amd64.deb
Errors were encountered while processing:
google-chrome-stable
So force install the dependencies:
$ sudo apt-get install -f
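Since a mismatched Chrome and Chromedriver pair caused the error in the update at the top, it can help to check both versions once everything is installed. The sketch below assumes that `google-chrome --version` and `chromedriver --version` each print a line containing a dotted version number (for example `Google Chrome 60.0.3112.113`); the helper names are hypothetical.

```python
import re
import subprocess


def parse_version(text):
    """Pull the first dotted version number out of a --version line."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    return match.group(0) if match else None


def installed_version(command):
    """Run `<command> --version` and return the parsed version, or None."""
    try:
        result = subprocess.run(
            [command, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError:
        # The command is not installed or not on the PATH.
        return None
    return parse_version(result.stdout)


if __name__ == "__main__":
    for command in ("google-chrome", "chromedriver"):
        print(command, installed_version(command))
```

The script only reports the two versions; deciding whether they are compatible is still up to you and the Chromedriver release notes.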
$ pip install beautifulsoup4 pyvirtualdisplay selenium
BeautifulSoup isn’t critical at this stage, but I usually end up using it to parse page sources.
from pyvirtualdisplay import Display
from selenium import webdriver
display = Display(visible=0, size=(800, 600))
display.start()
driver = webdriver.Chrome()
# automate_awesome_stuff
# when you're done, stop the display
display.stop()
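If the automation code raises an exception, the `display.stop()` call above is never reached and the virtual display keeps running. One way to guard against that, sketched here, is a small context manager. `headless_session` and its two factory arguments are hypothetical names for this example, not part of Selenium or PyVirtualDisplay.

```python
from contextlib import contextmanager


@contextmanager
def headless_session(make_display, make_driver):
    """Start a display, create a driver, and always clean both up."""
    display = make_display()
    display.start()
    driver = None
    try:
        driver = make_driver()
        yield driver
    finally:
        if driver is not None:
            driver.quit()   # close the browser first
        display.stop()      # then shut down the virtual display


# Possible usage with the imports from the snippet above:
#
#   with headless_session(lambda: Display(visible=0, size=(800, 600)),
#                         webdriver.Chrome) as driver:
#       driver.get("http://example.com")
```

Passing factories rather than ready-made objects keeps the helper free of direct Selenium imports, and the browser is only created after the display has started.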