As stated in my earlier related post, this is a series of tutorials analyzing the COVID-19 data set.
In this third tutorial of the series, we will use Exploratory Data Analysis (EDA) to plot the COVID-19 results for the questions posed in the research paper's Jupyter Notebook.
What is Data Visualization?
Data Visualization is the graphical representation of information and data. Data visualizations make big and small data easier for the human brain to understand. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.
Why Plotly?
Plotly is well suited to Exploratory Data Analysis. The interactivity of its plots lets you investigate your data much more thoroughly and with ease.
Plotly enables users to create beautiful interactive web-based visualizations that can be displayed in Jupyter notebooks, saved to standalone HTML files, or served as part of pure Python-built web applications using Dash.
While Seaborn and Matplotlib graphics are static, Plotly charts let you hover over values and zoom in and out of your graphs. This helps with tasks like identifying outliers among a large number of data points or detecting anomalies in time series plots.
In this tutorial, we are going to plot our COVID-19 data on a map.
Prerequisites
Before we get started, you should have the following packages/software installed:
Anaconda – this installation comes with almost everything we will need for this task.
We will be working with the Python programming language in this tutorial. Similar tools are also available in other languages, such as JavaScript.
Before we get into any installations, create a virtual environment.
python3 -m venv plotly_visuals
source plotly_visuals/bin/activate  # or: . plotly_visuals/bin/activate
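Now let's install the other tools that are not included in Anaconda. Run the following commands inside the virtual environment you just created. A plain python3 -m venv environment does not see Anaconda's packages by default, so if that applies to you, also install pandas and lxml there.
pip install beautifulsoup4
pip install plotly
pip install requests
pip install pandas lxml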
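Once the packages are installed, it can be worth checking from inside the notebook that the kernel actually sees them (a common pitfall when Jupyter runs from a different environment than the one you installed into). A small sketch using the standard library's `importlib.metadata`; the package list is my assumption based on the imports used in this tutorial.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used with pip (note: "requests", not "request").
PACKAGES = ["beautifulsoup4", "plotly", "requests", "pandas", "lxml"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```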
Data Visualization Using Plotly
I recommend going through the previous tutorials first, since this one continues from the two earlier posts below:
- Tutorial 1: Data Collection and Preprocessing
- Tutorial 2: Exploratory Data Analysis
Now that we have all our installations, we can get right into plotting our data.
Start Jupyter Notebook and create a new Python 3 notebook.
Then add these lines of code to import all the libraries we need.
import pandas as pd
import requests
from bs4 import BeautifulSoup
from datetime import timedelta, date
import plotly.express as px
Data Collection
The steps that produce the df DataFrame used below are covered in Tutorial 1: Data Collection and Preprocessing.
Here we group df by Country_Region and take the maximum of each selected column. Why the maximum? The COVID-19 figures are cumulative, so the maximum value is effectively the most recent one. Keep in mind that for countries reported province by province (such as China), this returns the largest single province rather than the national total.
current_figure_df = (
    df.groupby('Country_Region')[["Province_State", "Lat", "Long", "Confirmed", "Death", "Recovered"]]
    .max()
    .reset_index()
)
current_figure_df
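To see what the max-per-country step does, here is a toy example. The data are hypothetical, and the "latest date, then sum provinces" alternative is my own suggestion for getting national totals, not the method used in this series. The frame is named toy_df so it doesn't overwrite the real df.

```python
import pandas as pd

# Hypothetical toy data in the same long format: one row per province per date,
# with cumulative Confirmed counts.
toy_df = pd.DataFrame({
    "Country_Region": ["China", "China", "China", "China", "Italy", "Italy"],
    "Province_State": ["Hubei", "Hubei", "Beijing", "Beijing", None, None],
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02"] * 3),
    "Confirmed": [100, 120, 10, 15, 50, 80],
})

# Approach used above: max per country. Because counts are cumulative this picks
# the latest value, but for multi-province countries it picks the largest province.
by_max = toy_df.groupby("Country_Region")["Confirmed"].max()

# Alternative sketch: keep each country's latest date, then sum its provinces.
is_latest = toy_df["Date"] == toy_df.groupby("Country_Region")["Date"].transform("max")
by_latest_sum = toy_df[is_latest].groupby("Country_Region")["Confirmed"].sum()

print(by_max.to_dict())         # China's value is Hubei alone
print(by_latest_sum.to_dict())  # China's value is Hubei + Beijing
```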
Web Scraping using BeautifulSoup
I have covered web scraping in a previous tutorial. Our original data set doesn't have the ISO three-letter (alpha-3) country codes. ISO 3166 defines codes representing the names of countries and their subdivisions, and Plotly's choropleth maps locate countries by these alpha-3 codes by default.
I scraped the codes from a website that lists countries alongside their codes (the IBAN.com country codes page).
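from io import StringIO

results = requests.get('https://www.iban.com/country-codes', timeout=30)
results.raise_for_status()  # stop early if the page could not be fetched
soup = BeautifulSoup(results.content, 'lxml')
table = soup.table
# Newer pandas versions deprecate passing literal HTML to read_html, so wrap it in StringIO
code_df = pd.read_html(StringIO(table.prettify()))[0]
code_df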
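To see what the scraping step extracts without hitting the website, here is an offline sketch. The HTML snippet is hypothetical (shaped like a country-code table, not copied from the real page), the rows are pulled out by hand with BeautifulSoup's built-in "html.parser" instead of pd.read_html, and the final line checks that every alpha-3 code is three uppercase letters before the codes are used for plotting.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical snippet shaped like a country-code table (not the real page).
html = """
<table>
  <tr><th>Country</th><th>Alpha-2 code</th><th>Alpha-3 code</th><th>Numeric</th></tr>
  <tr><td>Nigeria</td><td>NG</td><td>NGA</td><td>566</td></tr>
  <tr><td>Viet Nam</td><td>VN</td><td>VNM</td><td>704</td></tr>
</table>
"""

# "html.parser" ships with Python, so this sketch does not need lxml.
table = BeautifulSoup(html, "html.parser").table
rows = [[cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")]
codes = pd.DataFrame(rows[1:], columns=rows[0])

# Sanity check: every alpha-3 code should be exactly three uppercase letters.
valid = codes["Alpha-3 code"].str.fullmatch(r"[A-Z]{3}")
print(codes)
print("all codes valid:", bool(valid.all()))
```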
Some countries were spelt differently in the two data sets and had to be renamed before joining with our original data set. More still need renaming; I only did a few here.
code_df = code_df.rename(columns={'Country': 'Country_Region'})
# Map the names used on the code page to the names used in df
code_df['Country_Region'] = code_df['Country_Region'].replace({
    "United States of America (the)": 'US',
    'Russian Federation (the)': 'Russia',
    'Tanzania, United Republic of': 'Tanzania',
    'United Arab Emirates (the)': 'United Arab Emirates',
    'United Kingdom of Great Britain and Northern Ireland (the)': 'United Kingdom',
    'Viet Nam': 'Vietnam',
    'Taiwan (Province of China)': 'Taiwan*',
    'Sudan (the)': 'Sudan',
    'Central African Republic (the)': 'Central African Republic',
    "Côte d'Ivoire": "Cote d'Ivoire",
    "Dominican Republic (the)": "Dominican Republic",
    "Iran (Islamic Republic of)": "Iran",
    "Korea (the Republic of)": "Korea, South",
    "Netherlands (the)": "Netherlands",
    "Niger (the)": "Niger",
    'Venezuela (Bolivarian Republic of)': 'Venezuela',
})
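Since more names still need renaming, it helps to list which Country_Region values have no match in the code table. A sketch using a left merge with `indicator=True`; the frame names toy_df and toy_codes and the country names in them are hypothetical stand-ins for df and code_df. On the real data you would pass df and code_df instead.

```python
import pandas as pd

# Hypothetical toy frames standing in for df and code_df after renaming.
toy_df = pd.DataFrame({"Country_Region": ["US", "Vietnam", "Bolivia", "Burma"]})
toy_codes = pd.DataFrame({
    "Country_Region": ["US", "Vietnam", "Bolivia (Plurinational State of)", "Myanmar"],
    "Alpha-3 code": ["USA", "VNM", "BOL", "MMR"],
})

# A left merge with indicator=True labels rows that found no partner as "left_only".
check = toy_df.drop_duplicates("Country_Region").merge(
    toy_codes, on="Country_Region", how="left", indicator=True
)
unmatched = check.loc[check["_merge"] == "left_only", "Country_Region"].tolist()
print("Still need renaming:", unmatched)
```

Each name in the printed list is a candidate for another entry in the replace dictionary.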
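The code_df data set was then merged with our original data set, df. Without an on argument, pd.merge joins on the columns the two frames share (here Country_Region) and keeps only matching rows, so any country whose name differs between the two data sets is dropped.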
country_codes_df = pd.merge(df, code_df)
country_codes_df
We then merged code_df with current_figure_df to get one row per country with its most recent figures and its country code.
most_recent_data = pd.merge(code_df, current_figure_df)
most_recent_data
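Because most_recent_data should contain exactly one row per country, a duplicated name on either side would silently produce duplicate rows on the map. pandas can guard against this with the `validate` argument of `pd.merge`. A sketch with hypothetical toy frames (codes and current stand in for code_df and current_figure_df); adding this check is my suggestion, not part of the original notebook.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical toy frames standing in for code_df and current_figure_df.
codes = pd.DataFrame({"Country_Region": ["US", "Niger"], "Alpha-3 code": ["USA", "NER"]})
current = pd.DataFrame({"Country_Region": ["US", "Niger"], "Confirmed": [500, 20]})

# validate="one_to_one" raises MergeError if either side repeats a join key.
merged = pd.merge(codes, current, on="Country_Region", validate="one_to_one")
print(merged)

# Simulate a duplicated name in the code table and confirm the merge refuses it.
duplicated_codes = pd.concat([codes, codes.iloc[[0]]], ignore_index=True)
rejected = False
try:
    pd.merge(duplicated_codes, current, on="Country_Region", validate="one_to_one")
except MergeError as err:
    rejected = True
    print("Merge rejected:", err)
```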
Map View Of Confirmed Cases Reported
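- data_frame: most_recent_data
- locations: Alpha-3 code (the ISO alpha-3 codes used to place each country on the map)
- color: Confirmed (the column that sets each country's fill color, shown on the color bar)
- hover_name: Country_Region (shown in bold in the hover tooltip)
- color_continuous_scale: matter (the color scheme)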
fig = px.choropleth(most_recent_data,
                    locations="Alpha-3 code",
                    color="Confirmed",
                    hover_name="Country_Region",
                    color_continuous_scale="matter")
fig.show()
Map View Of Death Cases Reported
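- data_frame: most_recent_data
- locations: Alpha-3 code (the ISO alpha-3 codes used to place each country on the map)
- color: Death (the column that sets each country's fill color, shown on the color bar)
- hover_name: Country_Region (shown in bold in the hover tooltip)
- color_continuous_scale: Reds (the color scheme)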
fig = px.choropleth(most_recent_data,
                    locations="Alpha-3 code",
                    color="Death",  # Death is a column of most_recent_data
                    hover_name="Country_Region",  # column to add to hover information
                    color_continuous_scale="Reds")
fig.show()
Map View Of Recovered Cases Reported
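- data_frame: most_recent_data
- locations: Alpha-3 code (the ISO alpha-3 codes used to place each country on the map)
- color: Recovered (the column that sets each country's fill color, shown on the color bar)
- hover_name: Country_Region (shown in bold in the hover tooltip)
- color_continuous_scale: Greens (the color scheme)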
fig = px.choropleth(most_recent_data,
                    locations="Alpha-3 code",
                    color="Recovered",  # Recovered is a column of most_recent_data
                    hover_name="Country_Region",  # column to add to hover information
                    color_continuous_scale="Greens")
fig.show()
Spread Of COVID-19 Outside Of China Over Time
First, let's convert our Date column to strings. Then we select all rows where Country_Region is not China and use that subset to plot our map.
- data_frame: outside_china
- locations: Alpha-3 code (the ISO alpha-3 codes used to place each country on the map)
- color: Confirmed (the column that sets each marker's color, shown on the color bar)
- size: Confirmed (the bigger the marker, the more confirmed cases)
- hover_name: Country_Region
- animation_frame: Date (one animation frame per date)
# Convert the datetime Date column to 'YYYY-MM-DD' strings
country_codes_df['Date'] = country_codes_df['Date'].dt.strftime('%Y-%m-%d')
# Keep every country except China
outside_china = country_codes_df.loc[country_codes_df['Country_Region'] != 'China']
fig = px.scatter_geo(outside_china,
                     locations="Alpha-3 code",
                     color="Confirmed",
                     hover_name="Country_Region",
                     size="Confirmed",
                     color_continuous_scale="matter",
                     animation_frame="Date",
                     projection="natural earth")
fig.show()
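One catch with the date conversion: `.dt.strftime` only works on datetime columns, so re-running that cell after the column is already text raises an AttributeError. A sketch of a guard that converts only when needed; the helper name dates_to_strings and the toy rows are hypothetical. String dates presumably also keep the animation slider labels short.

```python
import pandas as pd

# Hypothetical toy rows shaped like country_codes_df.
toy = pd.DataFrame({
    "Country_Region": ["China", "Italy", "Italy"],
    "Date": pd.to_datetime(["2020-03-01", "2020-03-01", "2020-03-02"]),
    "Confirmed": [100, 5, 8],
})


def dates_to_strings(frame, column="Date"):
    """Convert a datetime column to 'YYYY-MM-DD' strings; leave it alone otherwise."""
    if pd.api.types.is_datetime64_any_dtype(frame[column]):
        frame[column] = frame[column].dt.strftime("%Y-%m-%d")
    return frame


toy = dates_to_strings(toy)
toy = dates_to_strings(toy)  # safe to call twice
outside = toy.loc[toy["Country_Region"] != "China"]
print(outside)
```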
Spread Of COVID-19 In China Over Time
Here, we select all rows where Country_Region is China and use that subset to plot our map. This plot is slightly different: we focus on the provinces of China, so we position each point by its longitude and latitude instead of by a country code. You will notice that the map doesn't show the whole world; that's because we set scope to "asia", which crops the map to Asia. Other scopes, such as "africa", work the same way.
- data_frame: in_china
- scope: asia (the section of the map to plot)
- size: Confirmed (the bigger the marker, the more confirmed cases)
- hover_name: Province_State
- lon: Long (longitude)
- lat: Lat (latitude)
in_china = country_codes_df.loc[country_codes_df['Country_Region'] == 'China']
fig = px.scatter_geo(in_china,
                     size="Confirmed",
                     hover_name="Province_State",
                     scope="asia",
                     lon="Long",
                     lat="Lat",
                     color_continuous_scale="matter",  # only takes effect if color= is also set
                     animation_frame="Date",
                     projection="natural earth")
fig.show()
You can find the Jupyter Notebook for this tutorial in my GitHub repository.