The World Bank publishes lots of data. They make it available for download as csv, xml, or xls files, and also through API calls, which is the civilized route. Tariq Khokhar wrote an excellent survey of libraries for accessing World Bank data in Python, R, Ruby, and Stata. We'll use one of the libraries discussed, pandas_datareader, for a quick look at per capita GDP trends for a few countries. In another blog post, we'll repeat the exercise using the csv files to show how much harder it is.

World Bank Data

Scanning the list of indicators, we see the link to "GDP per capita, PPP...", and the link text gives the indicator code we are looking for, "NY.GDP.PCAP.PP.CD". We can also note from the chart that the populated data starts in 1990. Following the link, we get the option to download the series as csv, xml, or xls, and can download a file (now named API_NY.GDP.PCAP.PP.CD_DS2_en_csv_v2_9908727.zip, but it could be something else another day) to use in a later blog post.
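The indicator code is also what an API request is keyed on. The sketch below only builds a request URL and fetches nothing. The endpoint layout, the semicolon-separated country codes, and the format, date, and per_page query parameters are my assumptions about the World Bank's v2 REST API, not something taken from this post, and build_indicator_url is a made-up helper name.

```python
from urllib.parse import urlencode

# Assumed base of the World Bank v2 REST API (not stated in this post).
BASE_URL = "https://api.worldbank.org/v2"


def build_indicator_url(indicator, countries="all", start=1990, end=2017, per_page=1000):
    """Build (but do not fetch) a request URL for one indicator. Hypothetical helper."""
    # Assumption: several country codes are joined with semicolons.
    if not isinstance(countries, str):
        countries = ";".join(countries)
    # safe=":" keeps the year range readable instead of percent-encoding the colon.
    query = urlencode(
        {"format": "json", "date": f"{start}:{end}", "per_page": per_page},
        safe=":",
    )
    return f"{BASE_URL}/country/{countries}/indicator/{indicator}?{query}"


url = build_indicator_url("NY.GDP.PCAP.PP.CD", countries=["NG", "ZA"])
print(url)
```

A library such as pandas_datareader hides this kind of request building, which is much of the reason to use one.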

Installing pandas_datareader

pandas_datareader can be installed using conda:

$ conda install -c anaconda pandas-datareader

Using pandas_datareader

The following can be downloaded as a notebook. The chart created in the notebook can be seen below.

world bank data with pandas_datareader

Downloaded PPP data csv. A workaround is needed for a version incompatibility in pandas_datareader. See discussion.

In [33]:

# imports
import os
import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like # workaround
from pandas_datareader import wb as WB

# CONST
indCode = "NY.GDP.PCAP.PP.CD"  # code for per capita PPP GDP

Downloading with pandas_datareader

Arguments are

indicator= Code for the series to grab.
country= "all" or a list of two-character country codes. The default is "CA MX US".split(). We should prefer to filter in Python rather than in the pull.
start= Desired start year. Better to filter in Python, so set it early.
end= Desired end year. Better to filter in Python, so set it late.

In [50]:

# grab all data
pppDF = WB.download(indicator=indCode, country="all", start=1980, end=2020)
pppDF.head()

Out[50]:

		NY.GDP.PCAP.PP.CD
country	year
Arab World	2017	NaN
	2016	16726.722185
	2015	16302.363760
	2014	15934.202070
	2013	15548.200905

Reshaping the data

Removing rows with null entries.
Resetting indexes because you never want indexes.
Giving a normal whitespace-free name to the series.
Sorting just to make the table display read better.
Making a numeric copy of the string year.

In [51]:

pppDF = pppDF.dropna().reset_index()
pppDF.columns = "country year GDPpc".split()
pppDF.sort_values("year country".split(), inplace=True)
pppDF["yearN"] = pppDF.year.apply(pd.to_numeric)
pppDF.head()

Out[51]:

	country	year	GDPpc	yearN
1263	Albania	1990	2722.280344	1990
1290	Algeria	1990	6616.408352	1990
1317	Angola	1990	2840.200763	1990
1344	Antigua and Barbuda	1990	10587.593409	1990
26	Arab World	1990	6759.785391	1990
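One payoff of the numeric year column is that the year window left deliberately wide in the download can be narrowed in Python afterwards. This is a minimal sketch on a hand-made frame with the same column names; toyDF and windowDF are names I made up, and the GDPpc values are placeholders rather than real data.

```python
import pandas as pd

# Toy stand-in for pppDF after reshaping; GDPpc values are placeholders, not real data.
toyDF = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": ["1989", "1990", "1991", "1989", "1990", "1991"],
    "GDPpc": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# Vectorized conversion of the string years to integers.
toyDF["yearN"] = pd.to_numeric(toyDF["year"])

# Narrow the window in Python; between() includes both endpoints by default.
windowDF = toyDF[toyDF["yearN"].between(1990, 1991)]
print(windowDF)
```

Calling pd.to_numeric on the whole column does the same job as applying it element by element, just in one step.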

Filtering for top African economies

The World Bank spells Egypt as "Egypt, Arab Rep.", which is a little unfortunate because here we need exact matches.

In [52]:

# top African economies
aL = "Nigeria|South Africa|Egypt, Arab Rep.|Morocco|Ethiopia".split("|")
africaDF = pppDF[pppDF.country.isin(aL)]
africaDF.head()

Out[52]:

	country	year	GDPpc	yearN
2571	Egypt, Arab Rep.	1990	3819.286370	1990
2694	Ethiopia	1990	421.378824	1990
4261	Morocco	1990	2528.458556	1990
4514	Nigeria	1990	1965.827996	1990
5256	South Africa	1990	6267.091465	1990
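When the exact spelling of a country is not known in advance, a substring search is one way to discover it before building the exact-match list. This is a sketch on a hand-made frame, not part of the original notebook; toyDF, matches, and plainDF are made-up names and the GDPpc values are placeholders.

```python
import pandas as pd

# Toy stand-in for pppDF; only the country names matter here.
toyDF = pd.DataFrame({
    "country": ["Egypt, Arab Rep.", "Ethiopia", "Morocco", "Nigeria", "South Africa"],
    "GDPpc": [1.0, 2.0, 3.0, 4.0, 5.0],  # placeholder values, not real data
})

# Case-insensitive substring search to find the spelling actually in use.
isEgypt = toyDF["country"].str.contains("egypt", case=False)
matches = toyDF.loc[isEgypt, "country"].unique()
print(list(matches))

# An exact-match filter on the plain name finds no rows.
plainDF = toyDF[toyDF["country"].isin(["Egypt"])]
print(len(plainDF))
```

The spelling found this way can then be pasted into the list passed to isin.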

plotting

In [54]:

# the next line is needed to display the plot in the notebook
%matplotlib inline  
import matplotlib.pyplot as plt
fig, ax = plt.subplots() 
africaDF.groupby("country").plot(x="yearN", y="GDPpc", ax=ax)
ax.legend("Egypt|Ethiopia|Morocco|Nigeria|South Africa".split("|"))
fig.savefig("africaGDPpc.png")
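The hand-written legend above depends on the groups being drawn in alphabetical order. An alternative, sketched here on a hand-made frame rather than the real data, is to pivot to one column per country first. The names toyDF and wideDF are mine and the values are placeholders.

```python
import pandas as pd

# Toy stand-in for africaDF; values are placeholders, not real data.
toyDF = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "yearN": [1990, 1991, 1990, 1991],
    "GDPpc": [1.0, 2.0, 10.0, 20.0],
})

# One row per year, one column per country.
wideDF = toyDF.pivot(index="yearN", columns="country", values="GDPpc")
print(wideDF)
```

Calling wideDF.plot() would then draw one line per column and take the legend labels from the column names, so the labels could not drift out of step with the lines. The full World Bank names, such as "Egypt, Arab Rep.", would appear in the legend unless the columns are renamed.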

zero budget science

Tuesday, May 22, 2018

World Bank Data with pandas_datareader
