How to Do Web Scraping Using Python
In this tutorial, I am going to do web scraping using Python. Before jumping in, let's see what it is used for. Web scraping means extracting data from a website using a script. Why should we do this? Consider a situation where you want the list of all products on Flipkart or Amazon: typing every product name into an Excel sheet by hand is practically impossible. Instead, we write a script that automatically downloads all the product names and stores them in CSV, Excel or JSON format.
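As a small illustration of that last step, here is a minimal sketch that stores a list of product names in CSV and JSON using pandas. The product names are hypothetical placeholders, not scraped data. Writing an Excel file with `DataFrame.to_excel` works the same way, but it needs an extra engine such as openpyxl to be installed.

```python
import pandas as pd

# Hypothetical product names standing in for scraped data
products = ["Phone A", "Laptop B", "Headphones C"]

# One column per field; here there is only the product name
df = pd.DataFrame({"product": products})

# Write a CSV file without the numeric row index
df.to_csv("products.csv", index=False)

# Serialize the rows as a JSON list of records and save it to a file
json_text = df.to_json(orient="records")
with open("products.json", "w", encoding="utf-8") as f:
    f.write(json_text)

print(json_text)
```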
Now let's see the packages needed to write this script. In Python, we use BeautifulSoup (bs4), urllib2 (called urllib.request in Python 3) and pandas.
For this example I use the moneycontrol website (www.moneycontrol.com), a well-known site for mutual funds, the stock market, shares and so on. On this website, I am going to scrape the data for RELIANCE TOP 200 FUND - RETAIL PLAN (G). Let's jump into the code!
The link to the page we will scrape is given below:
Link: http://www.moneycontrol.com/mutual-funds/reliance-top-200-fund-retail-plan/portfolio-holdings/MRC155
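Some websites reject requests that carry Python's default urllib User-Agent. If the plain `urlopen` call in the code below fails with an HTTP 403 error, one option is to send a browser-like User-Agent header. This is a sketch under that assumption: whether moneycontrol actually requires it is not confirmed here, and the header string and the `build_request`/`fetch_soup` helper names are hypothetical.

```python
import urllib.request

from bs4 import BeautifulSoup

# Hypothetical browser-like User-Agent string; any realistic value works the same way
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url, user_agent=USER_AGENT):
    """Create a Request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_soup(url, timeout=30):
    """Download the page and parse it with Python's built-in HTML parser."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as page:
        html = page.read()
    return BeautifulSoup(html, "html.parser")


# Usage (needs network access):
# soup = fetch_soup("http://www.moneycontrol.com/mutual-funds/reliance-top-200-fund-retail-plan/portfolio-holdings/MRC155")
```

Note that urllib normalizes header names, so the header is stored as "User-agent" on the Request object.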
Python Code
# Import the basic packages: pandas, urllib and BeautifulSoup
import pandas as pd
import urllib.request as urllib2
from bs4 import BeautifulSoup
# Analyze the website first: we are going to scrape the table called Portfolio Holdings.
# It contains the Equity, Sector, Qty, Value and Percentage columns, so make one list
# per column and append each row's values to it.
Equity = []
Sector = []
Qty = []
Value = []
Percentage = []
website = "http://www.moneycontrol.com/mutual-funds/reliance-top-200-fund-retail-plan/portfolio-holdings/MRC155"
# urllib fetches the whole HTML page from the website
page = urllib2.urlopen(website)
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(page, "html.parser")
# The table has the class name "tblporhd". You can find this by using Inspect Element on the website.
# Fetch all the table rows and append each cell's text to the corresponding list.
right_table = soup.find('table', class_='tblporhd')
if right_table is None:
    raise SystemExit("Table 'tblporhd' not found; the page layout may have changed")
for row in right_table.find_all("tr"):
    cells = row.find_all('td')
    # Only extract the table body, not the heading: body rows have exactly 5 <td> cells
    if len(cells) == 5:
        Equity.append(cells[0].get_text(strip=True))
        Sector.append(cells[1].get_text(strip=True))
        Qty.append(cells[2].get_text(strip=True))
        Value.append(cells[3].get_text(strip=True))
        Percentage.append(cells[4].get_text(strip=True))
# Now pandas takes over: merge all the lists into one DataFrame that can be used for analysis
df = pd.DataFrame({
    'Equity': Equity,
    'Sector': Sector,
    'Qty': Qty,
    'Value': Value,
    'Percentage': Percentage,
})
print(df)
# Finally, convert the DataFrame to CSV
df.to_csv('funds.csv')
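To try the parsing logic without hitting the live site (whose layout may have changed since this was written), the sketch below runs the same five-cell row filter on a small, hypothetical HTML snippet. The table contents are made-up placeholders, not real fund holdings; only the `tblporhd` class name and the five-column layout come from the code above, and the `parse_holdings` helper name is hypothetical.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML imitating the portfolio table: one header row, two data rows
SAMPLE_HTML = """
<table class="tblporhd">
  <tr><th>Equity</th><th>Sector</th><th>Qty</th><th>Value</th><th>%</th></tr>
  <tr><td><a href="#">Company A</a></td><td>Banking</td><td>100</td><td>12.5</td><td>3.1</td></tr>
  <tr><td><a href="#">Company B</a></td><td>Energy</td><td>250</td><td>40.0</td><td>9.8</td></tr>
</table>
"""

COLUMNS = ["Equity", "Sector", "Qty", "Value", "Percentage"]


def parse_holdings(html):
    """Return a DataFrame built from every 5-cell row of the tblporhd table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="tblporhd")
    if table is None:
        raise ValueError("table with class 'tblporhd' not found")
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        # The header row uses <th>, so it has no <td> cells and is skipped
        if len(cells) == len(COLUMNS):
            rows.append([cell.get_text(strip=True) for cell in cells])
    return pd.DataFrame(rows, columns=COLUMNS)


df = parse_holdings(SAMPLE_HTML)
print(df)
```

All values come back as strings; `pd.to_numeric` can convert the Qty, Value and Percentage columns before doing any analysis.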
I hope this simple web scraping example is clear. I have uploaded the original and updated source code to GitHub for further reference.
The link is given below: https://github.com/12345k/Web-scraping-Financial-Data-Set
I will keep uploading more web scraping examples to this account, so do keep an eye on it.
Happy coding!