EDA and V-EDA Tutorial 3 - COVID-19 Data Visualization Using Plotly Maps

As stated in my earlier post, this is a series of tutorials on analyzing the COVID-19 data set.

In this third tutorial of the series, we will plot the COVID-19 results obtained through Exploratory Data Analysis (EDA), based on the questions in the Jupyter Notebook of the research paper.

What is Data Visualization?

Data Visualization is the graphical representation of information and data. Data visualizations make big and small data easier for the human brain to understand. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.

Why Plotly?

Plotly is a strong choice for Exploratory Data Analysis. The interactivity of the plots lets you investigate your data much more thoroughly and with ease.

Plotly enables users to create beautiful interactive web-based visualizations that can be displayed in Jupyter notebooks, saved to standalone HTML files, or served as part of pure Python-built web applications using Dash.

While Seaborn and Matplotlib graphics are static, Plotly charts let you hover over values and zoom in and out of your graphs, which helps with tasks like identifying outliers among a large number of data points or detecting anomalies in time series plots.

In this tutorial, we are going to plot our COVID-19 data on a map.

Prerequisite

Before we get started, you should have installed the following packages/software:

Anaconda – this installation comes with almost everything we will need for this task.

We will be working with the Python programming language for the purpose of this tutorial. There are other tools available in other languages like JavaScript.

Before we get into any installations, create a virtual environment.

python3 -m venv plotly_visuals
source plotly_visuals/bin/activate #or
. plotly_visuals/bin/activate

Now let’s install the other tools that are not included in Anaconda. You can do this by running the following commands in the virtual environment created.

pip install beautifulsoup4
pip install plotly
pip install requests
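Keep in mind that the name you give to pip is not always the name you import: beautifulsoup4 is imported as bs4. As a quick sanity check, the sketch below uses only the standard library to report which of the modules imported later in this tutorial can be found in the active environment. The helper name check_modules is hypothetical and only serves this illustration.

```python
from importlib.util import find_spec

# Packages used later in this tutorial (pip name -> import name).
REQUIRED = {
    "pandas": "pandas",
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "plotly": "plotly",
}


def check_modules(required):
    """Return {pip_name: True/False}, True when the module can be imported."""
    return {pip_name: find_spec(module) is not None
            for pip_name, module in required.items()}


for pip_name, found in check_modules(REQUIRED).items():
    status = "ok" if found else "missing, try: pip install " + pip_name
    print(f"{pip_name}: {status}")
```

Run it inside the activated virtual environment; anything reported as missing can be installed with the pip command printed next to it.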

Data Visualization Using Plotly

I recommend going through the previous tutorials first, since this one continues from the two earlier posts below:

Tutorial 1: Data Collection and Preprocessing
Tutorial 2: Exploratory Data Analysis

Now that we have all our installations, we can get right into plotting our data.

Start Jupyter Notebook and create a new Python 3 notebook.

Then go ahead and add these lines of code to import all libraries needed.

import pandas as pd
import requests
from bs4 import BeautifulSoup
from datetime import timedelta, date
import plotly.express as px

Data Collection

The data frame df used below was built in Tutorial 1: Data Collection and Preprocessing.

Here we group df by Country_Region and take the maximum of each column. Why? The COVID-19 values are cumulative, so the maximum of each column is also its most recent value.

current_figure_df = (
    df.groupby("Country_Region")
      .max()[["Province_State", "Lat", "Long", "Confirmed", "Death", "Recovered"]]
      .reset_index()
)
current_figure_df
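To see why the maximum works, here is a small sketch on made-up numbers (the two countries and their counts are hypothetical, not taken from the real data set). Because a cumulative count does not decrease, the maximum per country matches the value on the latest date.

```python
import pandas as pd

# Hypothetical cumulative counts for two countries over three days.
toy_df = pd.DataFrame({
    "Country_Region": ["Ghana", "Ghana", "Ghana", "Kenya", "Kenya", "Kenya"],
    "Date": pd.to_datetime(["2020-03-14", "2020-03-15", "2020-03-16"] * 2),
    "Confirmed": [2, 6, 6, 1, 3, 3],
})

# Maximum per country, as done in this tutorial.
by_max = toy_df.groupby("Country_Region")["Confirmed"].max()

# Value on the latest date per country, for comparison.
by_latest = (toy_df.sort_values("Date")
                   .groupby("Country_Region")["Confirmed"]
                   .last())

print(by_max)
print(by_latest)
```

If the source ever revises a count downwards, the two results would differ, and taking the value on the latest date would be the safer choice.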

Web Scraping using BeautifulSoup

I have covered web scraping in a previous tutorial. Our original data set doesn’t have the ISO three-letter country codes. These are the standardized ISO 3166-1 alpha-3 codes that identify countries.

I scraped that information from a website that lists countries and their codes.

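from io import StringIO

results = requests.get('https://www.iban.com/country-codes')
soup = BeautifulSoup(results.content, 'lxml')
table = soup.table

# Wrapping the HTML in StringIO avoids the deprecation warning that
# newer pandas versions raise when read_html gets a raw HTML string.
code_df = pd.read_html(StringIO(table.prettify()))[0]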
code_df
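If you want to try the parsing step without hitting the website, the sketch below runs the same idea on a small inline HTML table. The HTML fragment and the helper name table_to_df are made up for illustration, and it uses Python's built-in html.parser instead of lxml so that no extra parser is required.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML fragment shaped like a country-code table.
HTML = """
<table>
  <tr><th>Country</th><th>Alpha-2 code</th><th>Alpha-3 code</th></tr>
  <tr><td>Ghana</td><td>GH</td><td>GHA</td></tr>
  <tr><td>Kenya</td><td>KE</td><td>KEN</td></tr>
</table>
"""


def table_to_df(html):
    """Turn the first <table> in the HTML into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.table.find_all("tr")
    # The first row holds the column names; the rest hold the data.
    header = [cell.get_text(strip=True) for cell in rows[0].find_all("th")]
    body = [[cell.get_text(strip=True) for cell in row.find_all("td")]
            for row in rows[1:]]
    return pd.DataFrame(body, columns=header)


toy_codes = table_to_df(HTML)
print(toy_codes)
```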

Some countries were spelt differently in the two data sets and had to be renamed before joining with our original data set. More names still need renaming; only a few are handled here.

code_df = code_df.rename(columns={'Country': 'Country_Region'})
code_df['Country_Region'] = code_df['Country_Region'].replace({
    'United States of America (the)': 'US',
    'Russian Federation (the)': 'Russia',
    'Tanzania, United Republic of': 'Tanzania',
    'United Arab Emirates (the)': 'United Arab Emirates',
    'United Kingdom of Great Britain and Northern Ireland (the)': 'United Kingdom',
    'Viet Nam': 'Vietnam',
    'Taiwan (Province of China)': 'Taiwan*',
    'Sudan (the)': 'Sudan',
    'Central African Republic (the)': 'Central African Republic',
    "Côte d'Ivoire": "Cote d'Ivoire",
    'Dominican Republic (the)': 'Dominican Republic',
    'Iran (Islamic Republic of)': 'Iran',
    'Korea (the Republic of)': 'Korea, South',
    'Netherlands (the)': 'Netherlands',
    'Niger (the)': 'Niger',
    'Venezuela (Bolivarian Republic of)': 'Venezuela',
})
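One possible way to find the names that still need renaming is a left merge with indicator=True, which labels every row by where it was found. This is only a sketch on hypothetical toy frames; with the real data you would pass the case data and code_df instead.

```python
import pandas as pd

# Hypothetical stand-ins for the case data and for code_df.
toy_cases = pd.DataFrame({
    "Country_Region": ["US", "Bolivia", "Ghana"],
    "Confirmed": [100, 10, 5],
})
toy_codes = pd.DataFrame({
    "Country_Region": ["US", "Bolivia (Plurinational State of)", "Ghana"],
    "Alpha-3 code": ["USA", "BOL", "GHA"],
})

# Keep every row of the case data and record whether a code was found.
checked = toy_cases.merge(toy_codes, on="Country_Region",
                          how="left", indicator=True)

# Rows marked "left_only" have no matching code and still need renaming.
unmatched = checked.loc[checked["_merge"] == "left_only",
                        "Country_Region"].tolist()
print(unmatched)
```

Each name in the resulting list is a candidate for another entry in the replace dictionary.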

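The code_df data set was then merged with our original data set, df. With no key given, pd.merge joins on the columns the two frames have in common, which is how Country_Region becomes the key.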

country_codes_df = pd.merge(df, code_df)
country_codes_df

We then merged code_df with current_figure_df.

most_recent_data = pd.merge(code_df,current_figure_df)

most_recent_data

Map View Of Confirmed Cases Reported

data frame: most_recent_data
locations: Alpha-3 code (the column of country codes used to place the data on the map)
color: Confirmed (the column that drives the color scale shown in the color bar)
hover_name: Country_Region (the label shown when you hover over a country)

fig = px.choropleth(most_recent_data, locations="Alpha-3 code",
                    color="Confirmed",
                    hover_name="Country_Region",
                    color_continuous_scale="matter")
fig.show()

Map View Of Death Cases Reported

data frame: most_recent_data
locations: Alpha-3 code (the column of country codes used to place the data on the map)
color: Death (the column that drives the color scale shown in the color bar)
hover_name: Country_Region (the label shown when you hover over a country)

fig = px.choropleth(most_recent_data, locations="Alpha-3 code",
                    color="Death", # Death is a column of most_recent_data
                    hover_name="Country_Region", # column to add to hover information
                    color_continuous_scale="Reds")
fig.show()

Map View Of Recovered Cases Reported

data frame: most_recent_data
locations: Alpha-3 code (the column of country codes used to place the data on the map)
color: Recovered (the column that drives the color scale shown in the color bar)
hover_name: Country_Region (the label shown when you hover over a country)

fig = px.choropleth(most_recent_data, locations="Alpha-3 code",
                    color="Recovered", # Recovered is a column of most_recent_data
                    hover_name="Country_Region", # column to add to hover information
                    color_continuous_scale="Greens")
fig.show()

Spread Of COVID-19 Outside Of China Over Time

First, let’s convert our Date column to a string. Then we select all rows where Country_Region is not China and use that data set to plot our map.

data frame: outside_china
locations: Alpha-3 code (the column of country codes used to place the data on the map)
color: Confirmed (the column that drives the color scale)
size: Confirmed (the larger the marker, the higher the number of confirmed cases)
hover_name: Country_Region
animation_frame: Date (one frame of the animation per date)

country_codes_df['Date'] = country_codes_df['Date'].dt.strftime('%Y-%m-%d')

outside_china = country_codes_df.loc[(country_codes_df['Country_Region'] != 'China')]

fig = px.scatter_geo(outside_china, locations="Alpha-3 code", color="Confirmed", hover_name="Country_Region",
                     size="Confirmed",
                     color_continuous_scale="matter",
                     animation_frame="Date",
                     projection="natural earth")
fig.show()
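The two preparation steps can be tried on a toy frame first. The sketch below uses made-up rows and also sorts by Date before plotting. The sort is an extra step that is not in the notebook, added on the assumption that you want the rows in chronological order; it works on strings because the %Y-%m-%d format sorts in date order.

```python
import pandas as pd

# Hypothetical rows; the real country_codes_df has many more columns.
toy_df = pd.DataFrame({
    "Country_Region": ["Italy", "China", "Italy"],
    "Date": pd.to_datetime(["2020-03-02", "2020-03-01", "2020-03-01"]),
    "Confirmed": [20, 500, 10],
})

# Turn the dates into plain strings, as in the tutorial.
toy_df["Date"] = toy_df["Date"].dt.strftime("%Y-%m-%d")

# Keep every row that is not China, then order the rows by date.
outside = (toy_df.loc[toy_df["Country_Region"] != "China"]
                 .sort_values("Date")
                 .reset_index(drop=True))
print(outside)
```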

Spread Of COVID-19 In China Over Time

Here, we select all rows where Country_Region is China and use that data set to plot our map. This one is slightly different: we focus on the provinces of China, so we place the points using their longitude and latitude. You will notice that the full world map is not shown; that’s because we specified scope="asia", which restricts the map to Asia. The same works for other regions, for example scope="africa".

data frame: in_china
scope: asia (the section of the map to plot)
size: Confirmed (the larger the marker, the higher the number of confirmed cases)
hover_name: Province_State
lon: Long (Longitude)
lat: Lat (Latitude)

in_china = country_codes_df.loc[(country_codes_df['Country_Region'] == 'China')]

fig = px.scatter_geo(in_china,
                     size="Confirmed", hover_name="Province_State",
                     scope="asia",
                     lon="Long",
                     lat="Lat",
                     color_continuous_scale="matter",
                     animation_frame="Date",
                     projection="natural earth")
fig.show()
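Because this map places points by coordinates and scales them by Confirmed, rows with a missing Lat, Long, or Confirmed value cannot be drawn sensibly. Whether the real data has such gaps is an assumption here; if it does, a filter like the one in this sketch (on hypothetical rows with approximate coordinates) can be applied before plotting.

```python
import pandas as pd

# Hypothetical province rows, one of them without coordinates.
toy_china = pd.DataFrame({
    "Province_State": ["Hubei", "Guangdong", "Unknown"],
    "Lat": [30.97, 23.34, None],
    "Long": [112.27, 113.42, None],
    "Confirmed": [500, 40, 3],
})

# Drop rows that lack any of the columns the map depends on.
plottable = toy_china.dropna(subset=["Lat", "Long", "Confirmed"])
print(len(toy_china), "rows before,", len(plottable), "rows after")
```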

You can find the Jupyter Notebook in my GitHub repository.
