Browser automation using Docker and Python

A few weeks ago I was working on a new report for a customer's game. The platform that provides the campaign reports doesn't have a public API for generating them on request with some kind of developer key.

It is possible, however, to request such reports through their dashboard. I know it's a bit odd to rely on the UI for downloading reports, but it was the only way to get at the customer's valuable data.

Let's define the requirements for this idea:

  • it should be a standalone Python script, for easy execution and integration with existing ETL libraries
  • it should not require extra software on the server except the Docker package (which is pretty flexible)

Now we are ready to give it a try and build something runnable. In this post I am going to pin specific library versions for talking to the Docker process, because of the specific version of the Docker package installed on CentOS (in my example).

My requirements.txt:

docker==2.1.0
splinter==0.7.7
timeout-decorator==0.3.3


splinter is a nice library that wraps browser drivers for automating anything on web pages.

Let's define a class for running the Google Chrome container; later we will use it to get access to pages via the splinter library.

class _ChromeContainer:
    ''' _ChromeContainer should handle run of chrome docker container on background. Requires to have docker service on machine to pull images and run images. '''
    def __init__(self):
        self.__image_name = "selenium/standalone-chrome:3.10.0"
        self.__client = docker.from_env()

    def run(self):
        ''' Startup docker container with chromedriver, waiting for running state '''
        client = self.__client

        self.container = client.containers.run(self.__image_name,
                                               detach=True,
                                               ports={'4444/tcp': None})

        @timeout_decorator.timeout(120)
        def waiting_up(client: docker.client.DockerClient, container):
            while True:
                container.reload()
                if container.status == "running":
                    break
                time.sleep(1)

        waiting_up(client, self.container)

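    def quit(self):
        ''' kills and deletes the container '''
        self.container.kill()
        # force=True also covers the case where the container is still shutting down
        self.container.remove(force=True)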

    @property
    def public_port(self):
        # The container handle lives on this object (set in run()), not on a Worker
        container = self.container
        return container.attrs["NetworkSettings"]["Ports"]["4444/tcp"][0]["HostPort"]
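
A container in the "running" state does not necessarily mean the Selenium server inside it is already accepting connections. Below is a minimal sketch of an extra readiness check, using only the standard library. It assumes the server answers on the same /wd/hub path used later in this post with a status endpoint at /wd/hub/status; the function name wait_for_selenium and its parameters are hypothetical and not part of the original script.

```python
import http.client
import time
import urllib.request


def wait_for_selenium(port, host="127.0.0.1", timeout=60.0, interval=0.5):
    """Poll the (assumed) status URL until it answers with HTTP 200.

    Returns True once the server responds, False if `timeout` seconds
    pass without a successful response.
    """
    url = "http://{}:{}/wd/hub/status".format(host, port)
    # An empty ProxyHandler keeps local polling away from any configured proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=interval + 1) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Refused, reset or malformed response: the server is not up yet.
            pass
        time.sleep(interval)
    return False
```

It could be called with the value of public_port right after run() returns, before creating the Browser.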


Now we are ready to use splinter and _ChromeContainer to automate the task.

import time  # used by _ChromeContainer.run while polling the container status
import timeout_decorator
import docker


from splinter import Browser


class Worker:
    def __init__(self):
        self.__chrome_container = _ChromeContainer()

    def process(self):
        self.__chrome_container.run()

        self.__web_client = Browser('remote',
                                    url="http://127.0.0.1:{}/wd/hub".format(self.__chrome_container.public_port),
                                    browser='chrome')

        # Example for login request:
        try:
            self.__login()
        finally:
            self.__web_client.quit()
            self.__chrome_container.quit()

    def __login(self):
        self.__web_client.visit("http://www.example.com/login")
        self.__web_client.fill('developer_session[email]', 'EXAMPLE_USERNAME')
        self.__web_client.fill('developer_session[password]', 'EXAMPLE_PASSWORD')
        button = self.__web_client.find_by_id('developer_session_submit')
        button.click()
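
One thing to watch in process: the Browser is created before the try block, so if that call raises, the finally clause never runs and the container is left running. Here is a minimal sketch of one way to guard against that, assuming any object with run() and quit() methods like _ChromeContainer; the helper name running is hypothetical and not part of the original script.

```python
from contextlib import contextmanager


@contextmanager
def running(container):
    """Start `container` and guarantee quit() is called afterwards.

    `container` is any object exposing run() and quit(), for example
    the _ChromeContainer class from this post.
    """
    container.run()
    try:
        yield container
    finally:
        # Runs even when the body of the `with` block raises.
        container.quit()
```

Inside process this would read as `with running(self.__chrome_container):`, with the Browser creation and the login step placed in the body of the with block.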


This is only an example; it can be extended by adding similar steps, like __login, to your Worker class.
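
As an illustration of what such an additional step might look like, here is a rough sketch written as a plain function that takes the splinter browser as an argument. The page URL and the element id are made up for the example, and the function name export_report is hypothetical; a real dashboard will need its own selectors.

```python
def export_report(browser, report_url, button_id, wait_seconds=10):
    """Open a report page and click its export button once it is present.

    `browser` is expected to behave like a splinter Browser. `report_url`
    and `button_id` are placeholders for values from the real dashboard.
    """
    browser.visit(report_url)
    # Wait up to `wait_seconds` for the button instead of clicking blindly.
    if not browser.is_element_present_by_id(button_id, wait_time=wait_seconds):
        raise RuntimeError("export button {!r} did not appear".format(button_id))
    browser.find_by_id(button_id).click()
```

Inside Worker the same body could live in a private method next to __login and be called from process after the login step.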

Thank you for reading! 🙂

