Electricity is among the most significant gifts that science has given humanity. It has become so integral to modern life that it is hard to imagine existence without it. We use electricity for a variety of purposes every day: to run fans, light rooms, and power home appliances such as air conditioners and electric ranges, all of which add to our comfort. In factories, electricity drives heavy machinery, and it is used in producing a wide range of goods, including food, clothing, paper, and other necessities.
It has transformed modern communication and transportation. Electric trains and battery-powered cars are fast modes of transportation. Radio, television, and cinema, three widely popular forms of entertainment, are all made possible by electricity.
Electricity has also made modern devices such as computers and robots possible. In medicine and surgery, too, it is crucial for procedures such as X-rays and ECGs. Electricity use is growing every day, and the expansion of the electrical sector is crucial to sustaining a nation's economic output.
Electricity in usable form must be generated from other energy sources. The primary sources for generating power include coal, lignite, gas, diesel, nuclear, solar, wind, and hydro. The world is moving toward renewable sources such as wind, solar, and biomass, which offer dependable power supply and fuel diversity.
Because electricity is used in so many ways, data about its production is well worth scraping and analyzing.
The steps for scraping and downloading the data are as follows:
Since we want to scrape data in two distinct formats from two separate web pages, it is a good idea to install and import the necessary libraries just once.
For this project, requests will be used to download the web pages, beautifulsoup4 to parse the HTML content, and pandas to build data frames and CSV files.
Let's install and import these libraries.
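A minimal setup sketch, assuming the three libraries are installed from PyPI under the package names mentioned above. The `pd` alias is a common convention, not something the article specifies.

```python
# Install once from a terminal (package names as published on PyPI):
#   pip install requests beautifulsoup4 pandas

import requests                # downloads web pages over HTTP
from bs4 import BeautifulSoup  # parses HTML into a searchable document
import pandas as pd            # builds data frames and writes CSV files
```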
This section explains in detail how to scrape the list of countries by electricity production from the page.
We'll assign the page URL to a variable.
Downloading the Web Page Using Requests
Scraping begins with downloading a page. We'll use the requests.get function to fetch the page.
The response object contains the result of the request, including the status code and other details. We can use response.text to view the web page's contents.
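The download step might look like the sketch below. The article does not state the page URL, so `countries_url` points at a Wikipedia page chosen purely for illustration, and `download_page` is a hypothetical helper name. The function itself is only defined here, not called.

```python
import requests

# Assumption: the URL is not given in the article; this Wikipedia page
# is used only as an illustrative target.
countries_url = "https://en.wikipedia.org/wiki/List_of_countries_by_electricity_production"


def download_page(url):
    """Fetch a page and return its HTML text (hypothetical helper)."""
    response = requests.get(url, timeout=30)
    # A status code of 200 means the request succeeded.
    if response.status_code != 200:
        raise RuntimeError(f"Failed to load {url}: status {response.status_code}")
    # response.text holds the page contents as a string.
    return response.text
```

Calling `download_page(countries_url)` would return the HTML as a string, and slicing it (for example `html[:500]`) is a quick way to preview the start of the source.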
The response text contains the page's HTML source code.
At this point, the web page has been downloaded successfully using requests.
Use BeautifulSoup to parse the HTML source code
The HTML code retrieved in the previous step will be parsed using the BeautifulSoup class.
Once the page has been parsed, we can use the resulting document to extract data from it.
In this way, we can extract the page's title and first image.
The web page's HTML source code has now been parsed successfully.
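To illustrate parsing without a network call, the sketch below parses a small made-up HTML string (an assumption standing in for the downloaded page) and reads its title and first image. The variable names are hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the downloaded page contents.
sample_html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <img src="/images/first.png" alt="first">
    <img src="/images/second.png" alt="second">
  </body>
</html>
"""

# Parse the HTML with Python's built-in parser.
doc = BeautifulSoup(sample_html, "html.parser")

page_title = doc.title.text              # text inside the <title> tag
first_image_src = doc.find("img")["src"]  # src of the first <img> tag

print(page_title)
print(first_image_src)
```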
Extract the country name, the year, and the URL
By inspecting the parsed HTML, we can work out how to extract the necessary data from the page. We will write helper functions to extract the table rows and to extract information about each country.
You can view any web page's source code directly in your browser by right-clicking anywhere on the page and choosing the "Inspect" option. This opens the "Developer Tools" pane, where the source code is shown as a tree.
An examination of the HTML shows that two tables can be accessed using the table element with the class "box-Update plainlinks metadata ambox ambox-content ambox-Update".
Create a reusable utility function called get_tr_tags that downloads the page at a given URL, creates a BeautifulSoup document, and extracts the tr tags from the first table tag.
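A minimal sketch of `get_tr_tags` as described above. The error handling and the empty-list fallback when no table is found are assumptions added for robustness, not details from the article.

```python
import requests
from bs4 import BeautifulSoup


def get_tr_tags(url):
    """Download url, parse it, and return the <tr> tags of the first <table>."""
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        raise RuntimeError(f"Failed to load {url}: status {response.status_code}")

    doc = BeautifulSoup(response.text, "html.parser")

    # find() returns the first matching tag, or None if there is none.
    first_table = doc.find("table")
    if first_table is None:
        return []

    # find_all() collects every <tr> inside that table.
    return first_table.find_all("tr")
```

Returning a plain list of rows keeps the later helpers independent of how the page was fetched.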
With get_tr_tags, we can download and parse a web page and pull out the tr tags from the first table tag in a single call.
Expanding the tr tags reveals the data we are looking for. The first two rows must be skipped since they do not contain the required data.
To extract this information from a tr tag, let's write a helper function.
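The row helper could be sketched as follows. The article does not show the table layout, so this assumes each data row has at least three td cells (country with a link, a production figure, then the year). The name `get_country_info` and the `page_url` parameter are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def get_country_info(tr_tag, page_url):
    """Return name, year, and URL for one table row, or None if the row
    does not match the assumed layout (hypothetical helper)."""
    cells = tr_tag.find_all("td")
    link = tr_tag.find("a")
    # Assumed layout: cell 0 = country (with link), cell 1 = production,
    # cell 2 = year. Header rows have no <td> cells and are rejected.
    if len(cells) < 3 or link is None:
        return None
    return {
        "name": link.text.strip(),
        "year": cells[2].text.strip(),
        # Resolve a relative link such as /wiki/... against the page URL.
        "url": urljoin(page_url, link.get("href", "")),
    }


# Made-up row used only to demonstrate the helper.
sample_row = BeautifulSoup(
    '<tr><td><a href="/wiki/Exampleland">Exampleland</a></td>'
    "<td>123</td><td>2020</td></tr>",
    "html.parser",
).find("tr")

info = get_country_info(sample_row, "https://en.wikipedia.org/wiki/Some_list")
print(info)
```

If the real table orders its columns differently, only the cell indexes need to change.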
Use pandas to compile the data and create a CSV file
In this step, we will write helper functions to gather all the required data into a Python dictionary and to create a CSV file.
To organize the information into a Python dictionary, let's define the get_all_countries function. Given tr_tags as input, it returns a dictionary of lists containing the information for every country.
Let's use pd.DataFrame to create a data frame from this dictionary of lists.
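One way `get_all_countries` might be written is sketched below. To keep the block self-contained it includes a compact, hypothetical row parser (`_row_to_info`) that assumes three td cells per row (country link, production, year). The optional `page_url` argument is also an assumption, used only to turn relative links into absolute ones.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def _row_to_info(tr_tag, page_url):
    """Hypothetical row parser; assumes cells are country, production, year."""
    cells = tr_tag.find_all("td")
    link = tr_tag.find("a")
    if len(cells) < 3 or link is None:
        return None
    return {
        "name": link.text.strip(),
        "year": cells[2].text.strip(),
        "url": urljoin(page_url, link.get("href", "")),
    }


def get_all_countries(tr_tags, page_url=""):
    """Collect the row data into a dictionary of lists."""
    countries = {"name": [], "year": [], "url": []}
    # Skip the first two rows, which do not hold country data.
    for tr_tag in tr_tags[2:]:
        info = _row_to_info(tr_tag, page_url)
        if info is None:
            continue  # ignore rows that do not match the assumed layout
        for key in countries:
            countries[key].append(info[key])
    return countries
```

A dictionary of equal-length lists maps directly onto data frame columns, which is why this shape is convenient for the next step.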
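As a small illustration, the sketch below builds a data frame from a dictionary of lists. The values are placeholders invented for the example, not scraped data.

```python
import pandas as pd

# Placeholder values; real data would come from the scraped table rows.
countries = {
    "name": ["Country A", "Country B"],
    "year": ["2020", "2021"],
    "url": ["https://example.org/a", "https://example.org/b"],
}

# Each dictionary key becomes a column; each list supplies that column's rows.
countries_df = pd.DataFrame(countries)
print(countries_df)
```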
To create a CSV file, let's define the get_csv helper function.
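A possible shape for `get_csv` is sketched below. The article names the function but not its signature, so the `countries` and `path` parameters and the returned data frame are assumptions.

```python
import pandas as pd


def get_csv(countries, path):
    """Write a dictionary of equal-length lists to a CSV file at path."""
    countries_df = pd.DataFrame(countries)
    # index=False leaves the row numbers out of the file.
    countries_df.to_csv(path, index=False)
    return countries_df
```

Returning the data frame makes it easy to inspect the result right after the file is written.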
We'll create a scrape_countries helper function that combines all the steps above to create a CSV file.
Call the scrape_countries function and examine the results.
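Putting the steps together, the combined helper could be sketched as follows. The function names come from the article; their signatures, the assumed column order (country link, production, year), and the hypothetical `get_country_info` row parser are assumptions.

```python
from urllib.parse import urljoin

import pandas as pd
import requests
from bs4 import BeautifulSoup


def get_tr_tags(url):
    """Download the page and return the <tr> tags of its first <table>."""
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        raise RuntimeError(f"Failed to load {url}: status {response.status_code}")
    first_table = BeautifulSoup(response.text, "html.parser").find("table")
    return [] if first_table is None else first_table.find_all("tr")


def get_country_info(tr_tag, page_url):
    """Hypothetical row parser; assumes cells are country, production, year."""
    cells = tr_tag.find_all("td")
    link = tr_tag.find("a")
    if len(cells) < 3 or link is None:
        return None
    return {
        "name": link.text.strip(),
        "year": cells[2].text.strip(),
        "url": urljoin(page_url, link.get("href", "")),
    }


def get_all_countries(tr_tags, page_url=""):
    """Build a dictionary of lists, skipping the first two rows."""
    countries = {"name": [], "year": [], "url": []}
    for tr_tag in tr_tags[2:]:
        info = get_country_info(tr_tag, page_url)
        if info is not None:
            for key in countries:
                countries[key].append(info[key])
    return countries


def get_csv(countries, path):
    """Write the dictionary of lists to a CSV file and return the data frame."""
    countries_df = pd.DataFrame(countries)
    countries_df.to_csv(path, index=False)
    return countries_df


def scrape_countries(url, path):
    """Run every step: download, parse, collect, and write the CSV file."""
    tr_tags = get_tr_tags(url)
    countries = get_all_countries(tr_tags, page_url=url)
    return get_csv(countries, path)
```

A call such as `scrape_countries(countries_url, "countries.csv")`, with `countries_url` holding the page URL and the file name chosen freely, would write the file and return the data frame for inspection.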
Assuming the preceding section has provided enough background, this section presents the complete code without explanation, following the same steps to build a CSV file from the India URL retrieved in the section above.
To see the result, let's call the scrape_states function.
After running the code, you will have another CSV file generated from that web page.
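The article's own code for this section is not reproduced here, so the following is only a generic sketch of what `scrape_states` might do. It assumes the state data sits in the first table on the page, with a header row followed by data rows; the signature and column handling are assumptions.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def scrape_states(url, path):
    """Write the first table on the page to a CSV file (generic sketch)."""
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        raise RuntimeError(f"Failed to load {url}: status {response.status_code}")

    doc = BeautifulSoup(response.text, "html.parser")
    table = doc.find("table")
    rows = table.find_all("tr") if table is not None else []
    if not rows:
        raise ValueError(f"No table rows found at {url}")

    # Assumption: the first row holds the column names.
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]

    records = []
    for row in rows[1:]:
        values = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        # Keep only rows that line up with the header.
        if len(values) == len(header):
            records.append(values)

    states_df = pd.DataFrame(records, columns=header)
    states_df.to_csv(path, index=False)
    return states_df
```

Rows whose cell count differs from the header, such as merged or summary rows, are dropped in this sketch. A real table may need more careful handling.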