SIZ Tutorial | How To Scrape A Website Using Python | by @ramizaly | #club5050

in Steem Infinity Zone, 3 years ago

Hey Steemians!
How are y'all doing? Today I am here to share how you can scrape a website using Python in a few steps. First, let's take a quick look at what web scraping actually is and what it is used for.

0d8caa52-d4e6-498f-a9d8-cc6fdb781893.jpg

What is web scraping?


Suppose you want a particular set of information from a website. Copying and pasting all that data by hand can take hours or even days. That's where web scraping comes in.
Web scraping lets you extract data from a website by writing a few blocks of code. I personally prefer Python for web scraping because its syntax makes the process much easier.

Is Web scraping legal?


Yes and no. You need to understand the difference between copying data and stealing it. Collecting publicly visible data for your own use is generally fine, but republishing it or taking confidential data for profit is not.

Now that you know what web scraping is, I am going to show you how you can scrape a website in just a few steps.

siz.png

Scraping a website using Python

Step1:


First of all, select a website and the data you want to scrape from it. As I mentioned above, make sure you are not scraping confidential data. Here, I am going to scrape Steam's website, a platform to buy, play, create and discuss PC games. I am scraping it to fetch a list of all the new and trending games and their discounted prices.

be324e7c-d1e3-4329-b03e-6e1b7a44e64d.jpg
img src

Step2:


Now open your Python IDE. The first thing you have to do is import two libraries.

d997014e-41f6-498b-8015-fb3df47e55a6.jpg

The Request library is a standardized way of making an HTTP request from Python. Its simple API hides the complexity of making a request, so you can focus on fetching the data.

Beautiful Soup is a Python library used for web scraping and for pulling data out of HTML and XML files. It builds a parse tree from the page source, which makes the data easier to navigate and extract.

P.S. If you do not have these libraries installed, open Windows PowerShell and type pip install followed by the library's name (for Beautiful Soup, the package name is beautifulsoup4).
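The screenshot with the two imports is not reproduced here, so the lines below are only a sketch of what it may have contained. As an assumption, it uses `urlopen` from the standard library's `urllib.request` as the request library; the third-party `requests` package would do the same job. The small check at the end is mine and just confirms that the parser is working.

```python
# urlopen opens the connection to a web page (standard library, nothing to install).
from urllib.request import urlopen

# BeautifulSoup parses the downloaded HTML (installed as the beautifulsoup4 package).
from bs4 import BeautifulSoup

# Tiny offline check that the parser works before touching any website.
soup = BeautifulSoup("<p>It works</p>", "html.parser")
print(soup.p.text)
```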

Step3:


Copy the link of the website that you want to scrape and save it in a variable.

b2a8bc5d-7ab1-4e21-8f59-5c135a85ec8a.jpg
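The exact link from the screenshot is not available, so the address below is an assumption (Steam's store front page) and the variable name `my_url` is just illustrative. The extra check with `urlparse` is not part of the original steps; it only confirms that the string is a complete web address.

```python
from urllib.parse import urlparse

# Assumed address: Steam's store front page. Swap in the page you want to scrape.
my_url = "https://store.steampowered.com/"

# Quick sanity check that the string has a scheme and a host name.
parts = urlparse(my_url)
print(parts.scheme, parts.netloc)
```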

Step4:


Write the following block of code to:
-Open the connection with the web page
-Read all the data of the web page and store it in a variable
-Close the connection
-Parse the HTML file

fbf00c35-a119-4349-aa49-4ff9a99f6ea2.jpg
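Since the original code is only available as an image, here is a minimal sketch of the four steps listed above, assuming `urllib.request` for the connection and Beautiful Soup's built-in `html.parser`. The helper names `fetch_html` and `parse_html`, the timeout and the browser-like user agent are my own assumptions, not the author's code. The demonstration at the bottom parses a tiny inline page so it runs without an internet connection.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url, timeout=10.0):
    """Open the connection, read the whole page, and close the connection."""
    # Some sites reject urllib's default user agent, so send a browser-like one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # The with-block closes the connection automatically once the data is read.
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_html(page_html):
    """Turn the raw HTML text into a searchable parse tree."""
    return BeautifulSoup(page_html, "html.parser")


# Offline demonstration on a made-up page instead of a live website.
sample_html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"
page_soup = parse_html(sample_html)
print(page_soup.title.string)
```

On a real page you would combine the two helpers, for example `page_soup = parse_html(fetch_html(my_url))`.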

Step5:


After successfully parsing the HTML file, you need to play a little with the HTML code of the web page. To do this, open the web page, right-click anywhere on it and choose Inspect.

65ee49ba-7ad0-4f0d-8132-b5e70a029efd.jpg
img src

Step6:


Since I wanted to fetch the names and prices of all the games in the New & Trending section, I hovered my cursor over the first container, i.e. FIFA 22, to see the HTML code related to it.

45649ba5-5e93-4cf8-b844-222672269a21.jpg
img src

Analyzing the above image, it is clear that the first container is wrapped in an anchor tag with the class name tab_item, which suggests that all the other containers have the same class name. Hence, to scrape all the containers, we will write the following code.

0bd6e912-9906-43d1-a2a0-3badd4c4d64d.jpg
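The screenshot of this line is missing, so what follows is a sketch of how the containers could be collected with Beautiful Soup's `find_all`. The HTML snippet is a made-up stand-in that only borrows the tag and class names mentioned above; the game names and prices in it are placeholders, not real Steam data.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page; the real Steam markup is much larger.
sample_html = """
<div>
  <a class="tab_item" href="#">
    <div class="tab_item_name">Example Game A</div>
    <div class="discount_final_price">$10.00</div>
  </a>
  <a class="tab_item" href="#">
    <div class="tab_item_name">Example Game B</div>
    <div class="discount_final_price">$5.00</div>
  </a>
</div>
"""
page_soup = BeautifulSoup(sample_html, "html.parser")

# Collect every anchor tag whose class is tab_item: one entry per game.
containers = page_soup.find_all("a", class_="tab_item")
print(len(containers))
```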

Step7:


Similarly, we will find the tag and class name of the game's title and its discounted price, as shown below.

60897b11-49d7-4ba7-9a4c-d6a42e4f4da1.jpg
img src

cc66973b-bd64-4a38-a6c7-9c9d888e7d3a.jpg
img src

From the above pictures we know that the class name for all the titles is tab_item_name, the class name for all the discounted prices is discount_final_price, and both elements are div tags.
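To see how those two class names are used, here is a small sketch that looks up the title and the price inside a single container. The snippet of HTML is a made-up placeholder that reuses the class names above, and `get_text(strip=True)` is my choice for trimming surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Made-up container that reuses the tag and class names found with Inspect.
sample_html = """
<a class="tab_item" href="#">
  <div class="tab_item_name">Example Game A</div>
  <div class="discount_final_price">$10.00</div>
</a>
"""
container = BeautifulSoup(sample_html, "html.parser").find("a", class_="tab_item")

# Each lookup searches only inside this one container.
title_tag = container.find("div", class_="tab_item_name")
price_tag = container.find("div", class_="discount_final_price")

# get_text(strip=True) returns the text without surrounding whitespace.
title = title_tag.get_text(strip=True)
price = price_tag.get_text(strip=True)
print(title, price)
```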

Step8:


We will then create a for loop that goes through all the containers, reads the title and discounted price of each one using the tags and class names, and saves them in variables. We will then print the variables.

3102609d-21e4-443a-95a8-adb9cd8b8ece.jpg
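The loop in the screenshot is not reproduced here, so this is only a sketch of what it might look like. It runs on made-up placeholder HTML rather than the live page, and the check for a missing price is an assumption of mine for containers that have no discounted price.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the parsed page; the second game has no price on purpose.
sample_html = """
<a class="tab_item" href="#">
  <div class="tab_item_name">Example Game A</div>
  <div class="discount_final_price">$10.00</div>
</a>
<a class="tab_item" href="#">
  <div class="tab_item_name">Example Game B</div>
</a>
"""
page_soup = BeautifulSoup(sample_html, "html.parser")
containers = page_soup.find_all("a", class_="tab_item")

games = []
for container in containers:
    title_tag = container.find("div", class_="tab_item_name")
    price_tag = container.find("div", class_="discount_final_price")
    if title_tag is None:
        # Skip anything that does not look like a game entry.
        continue
    title = title_tag.get_text(strip=True)
    # find() returns None when the tag is absent, so guard before reading text.
    price = price_tag.get_text(strip=True) if price_tag else "no price listed"
    games.append((title, price))
    print(f"{title}: {price}")
```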

Finally, run this program and voilà, you will see a list of the game titles and their discounted prices in the output, just like this:

200ceafb-faf5-49ac-9ec9-36af1a2eff88.jpg

siz.png

That's it for today. If you like my post, do upvote it.


#club5050



Cool, will try for sure 👍

 3 years ago 

Please read these guidelines.

You cannot post Tutorials.

You can apply for course and after approval you can post such tutorials in form of a course.
https://steemit.com/hive-181430/@siz-official/siz-community-guidelines-on-daily-content-creation-categories
