Create a simple web scraping script in Python

What is the project about?

This project is a simple, fun example of how you can use web scraping to learn and practice coding. The script automates a product search on Amazon: you enter a search term such as "lamp", and it collects the title and price of every product it finds on the results pages.

What you need to know for this project

To understand and modify this project, you should know the basics of Python and HTTP requests. If you want an introduction to HTTP, you can check out my article about it: "HTTP protocol made simple".
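
As a quick refresher, an HTTP GET request is essentially a method, a URL with query parameters, and some headers. The sketch below builds one for a search results page without sending it. It uses the standard library's `urllib` instead of `requests` so that it runs without installing anything or touching the network. The helper name `build_search_request` and the example values are assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(product, page, user_agent):
    """Build (but do not send) a GET request for one search results page."""
    # urlencode escapes spaces and special characters in the search term
    query = urlencode({"k": product, "page": page})
    url = f"https://www.amazon.it/s?{query}"
    return Request(url, headers={"User-Agent": user_agent}, method="GET")


request = build_search_request("desk lamp", 2, "Your User Agent")
print(request.get_method())
print(request.full_url)
# urllib stores header names capitalized as "User-agent"
print(request.get_header("User-agent"))
```

The three printed values are the pieces the server sees: the method, the full URL including the encoded query string, and the user agent header.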

The project

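So let's take a look at the script. It asks for a product and a number of pages, requests each Amazon results page, and appends the title and price of every product to data.csv: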

from bs4 import BeautifulSoup
import requests

# Get all the products on an Amazon results page and append them to a CSV file
def scrape_page(url):
    cache = []
    count = 0
    # To find your user agent, search "what is my user agent" in your browser and paste the result here
    headers = {'user-agent':'Your User Agent'}

    # A timeout stops the script from hanging forever if the server does not answer
    response = requests.get(url, headers=headers, timeout=10)

    soup = BeautifulSoup(response.content,"html.parser")

    card = soup.find_all("div",class_="sg-col-inner")

    print("===========================")
    print("Page data:")

    print(f"Response Status Code: {response.status_code}")

    for element in card:
        title = element.find("span",class_="a-size-base-plus a-color-base a-text-normal") 
        price = element.find("span", class_="a-price-whole")

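        # Skip cards without a title or price, and products already stored for this page
        if title is not None and price is not None and (title.text, price.text) not in cache:
            with open("data.csv", "a", encoding="utf-8") as file:
                line = f"{title.text};{price.text}\n"
                file.write(line)

            cache.append((title.text, price.text))
            count += 1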

    print(f"Element Stored: {count}")
    print("===========================")


def main():

    product = input("What product are you looking for? ")

    page_number = input("How many pages do you want to scrape? ").strip()

    # Unlike isdigit(), isdecimal() rejects characters such as "²" that int() cannot convert
    while not page_number.isdecimal():
        print("Please enter a whole number of pages")
        page_number = input("How many pages do you want to scrape? ").strip()

    page_number = int(page_number)

    # The script can also scrape more than one page
    for page in range(1,page_number+1):
        url = f"https://www.amazon.it/s?k={product}&page={page}"
        scrape_page(url)


if __name__ == "__main__":
    main()
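
To see why the script needs the `cache` list, it helps to run the same selectors on a small piece of HTML. The sketch below does that offline. The markup in `SAMPLE_HTML` is invented for illustration and is much simpler than a real Amazon page, and `extract_products` is a hypothetical helper, not part of the script above. The assumption is that `sg-col-inner` containers can be nested, so the same product is matched once per enclosing container.

```python
from bs4 import BeautifulSoup

# Invented markup: the first product sits inside two nested containers
SAMPLE_HTML = """
<div class="sg-col-inner">
  <div class="sg-col-inner">
    <span class="a-size-base-plus a-color-base a-text-normal">Desk lamp</span>
    <span class="a-price-whole">19</span>
  </div>
</div>
<div class="sg-col-inner">
  <span class="a-size-base-plus a-color-base a-text-normal">Floor lamp</span>
  <span class="a-price-whole">45</span>
</div>
"""


def extract_products(html):
    """Return unique (title, price) pairs found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.find_all("div", class_="sg-col-inner"):
        title = card.find("span", class_="a-size-base-plus a-color-base a-text-normal")
        price = card.find("span", class_="a-price-whole")
        if title is None or price is None:
            continue
        pair = (title.text.strip(), price.text.strip())
        # The outer and inner container both match "Desk lamp", so skip repeats
        if pair not in products:
            products.append(pair)
    return products


print(extract_products(SAMPLE_HTML))
```

Without the duplicate check, "Desk lamp" would be stored twice, once for each container that encloses it.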

Conclusion

I hope this simple script inspires you to build your own version, make it better, or improve its performance. Thanks for reading the article.
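
As one possible starting point, the script builds each CSV line by hand, so a title that contains ";" would spill into an extra column. Here is a minimal sketch of an alternative that uses the standard `csv` module, which quotes such fields automatically. The helper name `write_products` and the sample data are assumptions for illustration, and the example writes to an in-memory buffer instead of data.csv.

```python
import csv
import io


def write_products(products, file):
    """Write unique (title, price) pairs as semicolon-separated rows."""
    writer = csv.writer(file, delimiter=";")
    seen = set()
    count = 0
    for title, price in products:
        if (title, price) in seen:
            continue
        seen.add((title, price))
        # Fields that contain the delimiter are quoted by the writer
        writer.writerow([title, price])
        count += 1
    return count


sample = [("Desk lamp; black", "19"), ("Desk lamp; black", "19"), ("Floor lamp", "45")]
buffer = io.StringIO()
stored = write_products(sample, buffer)
print(f"Element Stored: {stored}")
print(buffer.getvalue())
```

With a real file, open it with `newline=""` as the `csv` documentation recommends, for example `open("data.csv", "a", newline="", encoding="utf-8")`.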

Support Paolo Ferrari by becoming a sponsor. Any amount is appreciated!