Python and Selenium Web Automation SEO Tool Bot AI for 2018 | Tutorial 3

We have our SEO tool bot class for our Python Selenium bot, and we have defined a few functions so far.  I’ve found that this methodology of expanding the class one function at a time works well for my workflow.  This way we can get the bot to do small actions, or chunks of code, at a time.  It is a nice technique for debugging too, as it allows the programmer to track faulty code down to a particular function block.  Once we learn to pass local variables through the class, expanding it becomes trivial.
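The idea of small methods sharing state through the class can be seen in a minimal sketch (the `Demo` class and its method names here are hypothetical, just to illustrate the pattern, not part of the bot):

```python
class Demo(object):
    """Toy class: each method does one small action and shares state via self."""

    def __init__(self):
        self.data = None          # filled in later by fetch()

    def fetch(self):
        self.data = "raw html"    # one small, easily debugged chunk of work

    def report(self):
        return self.data          # later methods reuse the stored result

d = Demo()
d.fetch()
print(d.report())                 # prints: raw html
```

Because each method does one thing and stores its result on `self`, a bug can be traced to a single function block, which is exactly the workflow described above.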

Let’s get this bot to earn its electricity: we are going to scrape some data.  Since most people tend to scrape the wikis, let’s scrape Wikipedia (at the time of writing, I think this is OK under their TOS), and let’s make the bot do it.  As before, let’s import the Selenium WebDriver and define our current class structure. This is what we should have so far.

print("*" * 60)
print("MI PYTHON COM SEO TOOL BOT")
print("*" * 60)

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

class SEO_BOT(object):

    def __init__(self, browser, anon_url):
        self.browser = browser
        self.anon_url = anon_url
        self.current_url = None

    def main(self):
        pass

    def go_anon(self):
        ###  SELENIUM IMPLICIT WAIT  !!! IMPORTANT !!!
        self.browser.implicitly_wait(300)
        print("GETTING " + str(self.anon_url))
        ###  GO TO ANON URL
        self.browser.get(self.anon_url)

    def get_location(self):
        self.current_url = self.browser.current_url
        print("*" * 60)
        print("CURRENT LOCATION:")
        print(str(self.current_url))
        print("*" * 60)

bot = SEO_BOT(webdriver.Firefox(), "https://www.kproxy.com")
bot.get_location()
bot.go_anon()
bot.get_location()

So far this Python script invokes the Selenium WebDriver, opens an anon redirector (Kproxy), then gets the current URL from the WebDriver. As before, we need to define another variable to pass through the class.

class SEO_BOT(object):
 
    def __init__(self, browser, anon_url, scrape_page):
        self.browser = browser
        self.anon_url = anon_url
        self.current_url = None
        self.scrape_page = scrape_page
        self.scrape_html = None

Above we assigned two new class variables. The “scrape_page” variable is the page we are going to scrape; it will be set when the class is instantiated. The next variable starts out as None and will store the HTML that is scraped.
Now let’s define a scrape function so we can put this bot to work. For now, to keep things simple, we just scrape the entire HTML source. We will parse it later.

    def scrape(self):
        self.browser.implicitly_wait(300)
        print("FINDING ELEMENTS ON " + self.scrape_page + " TO SCRAPE")
        # self.scrape_html = self.browser.page_source.encode('utf-8')
        self.scrape_html = self.browser.page_source.encode('ascii', 'ignore')
        print("SCRAPED " + str(self.scrape_page))
        print(str(self.scrape_html))
        return str(self.scrape_html)

Sometimes the data doesn’t encode cleanly, so one option is to call the encode method with 'ascii' and the 'ignore' error handler, which simply drops any characters that can’t be represented in ASCII.
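The fallback can be seen in isolation with a plain string (the sample text here is made up for illustration):

```python
text = "Café – naïve"                          # contains non-ASCII characters
ascii_bytes = text.encode('ascii', 'ignore')   # drop anything outside ASCII
print(ascii_bytes)                             # b'Caf  nave'
```

Note that 'ignore' silently loses data (the é, the dash, and the ï above are simply gone), which is fine for a quick scrape dump but not for text you need to preserve; encoding to 'utf-8' instead keeps everything.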
Since we already have a nifty anon-redirect function in our Python Selenium SEO tool bot, let’s forward our scraping bot through that service. This keeps a few complications down when automated testing goes awry. We need to add a few things to our Selenium anon-redirect function. First we need to find the element locations for our WebDriver. As before, we get the element from Chrome DevTools / Firebug and inspect it. Below are the element IDs for the Kproxy fields. We use the Selenium WebDriver find_element_by_id method to find the element, then submit our scrape-page URL to Kproxy for redirection.

    def go_anon(self):
        ###  SELENIUM IMPLICIT WAIT  !!! IMPORTANT !!!
        self.browser.implicitly_wait(300)
        print("GETTING " + str(self.anon_url))
        ###  GO TO ANON URL
        self.browser.get(self.anon_url)
        ###  FIND ELEMENTS AND POST SCRAPE PAGE URL
        self.browser.find_element_by_id("maintextfield").clear()
        print("CLEARED maintextfield")
        self.browser.find_element_by_id("maintextfield").send_keys(self.scrape_page)
        print("POSTING " + str(self.scrape_page) + " TO " + str(self.anon_url))
        self.browser.find_element_by_id("maintextfield").submit()
        print("REDIRECTING TO " + str(self.scrape_page))

Now we should have gone to Kproxy, redirected to the wiki, scraped it, and printed the raw HTML to the terminal.
The final code for this section should look like this.

print("*" * 60)
print("MI PYTHON COM SEO TOOL BOT")
print("*" * 60)

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

class SEO_BOT(object):

    def __init__(self, browser, anon_url, scrape_page):
        self.browser = browser
        self.anon_url = anon_url
        self.scrape_page = scrape_page
        self.scrape_html = None
        self.current_url = None

    def main(self):
        pass

    def go_anon(self):
        ###  SELENIUM IMPLICIT WAIT  !!! IMPORTANT !!!
        self.browser.implicitly_wait(300)
        print("GETTING " + str(self.anon_url))
        ###  GO TO ANON URL
        self.browser.get(self.anon_url)
        ###  FIND ELEMENTS AND POST SCRAPE PAGE URL
        self.browser.find_element_by_id("maintextfield").clear()
        print("CLEARED maintextfield")
        self.browser.find_element_by_id("maintextfield").send_keys(self.scrape_page)
        print("POSTING " + str(self.scrape_page) + " TO " + str(self.anon_url))
        self.browser.find_element_by_id("maintextfield").submit()
        print("REDIRECTING TO " + str(self.scrape_page))

    def get_location(self):
        self.current_url = self.browser.current_url
        print("*" * 60)
        print("CURRENT LOCATION:")
        print(str(self.current_url))
        print("*" * 60)

    def scrape(self):
        self.browser.implicitly_wait(300)
        print("FINDING ELEMENTS ON " + self.scrape_page + " TO SCRAPE")
        # self.scrape_html = self.browser.page_source.encode('utf-8')
        self.scrape_html = self.browser.page_source.encode('ascii', 'ignore')
        print("SCRAPING " + str(self.scrape_page))
        print(str(self.scrape_html))
        print(str(self.scrape_page) + " HAS BEEN SCRAPED")
        return str(self.scrape_html)

bot = SEO_BOT(webdriver.Firefox(), "https://www.kproxy.com", "http://www.wikipedia.com")
bot.get_location()
bot.go_anon()
bot.get_location()
bot.scrape()

Next time we will actually do something with the data from our Python Selenium SEO tool bot.
