Este es un análisis exploratorio de datos con fines didácticos. El análisis es realizado con datos relacionados a COVID-19 reportados por diversos países. Los datos son obtenidos del sitio: https://ourworldindata.org/coronavirus-source-data. El conjunto de datos del sitio es actualizado constantemente. El resultado del análisis cambia de acuerdo al día en que se ejecuta el código.
Se cargan los datos directamente del sitio para obtener la información actualizada al día de ejecución del código.
covid.df <- read.csv("https://covid.ourworldindata.org/data/owid-covid-data.csv")
Se cargan los paquetes necesarios:
library(ggplot2)
library(knitr)
library(reshape2)
library(kableExtra)
#library(dplyr)
library(tidyverse)
library(lubridate)
library(gmodels)
plotscaption <- "oscarcastrolopez.github.io"
str(covid.df)
## 'data.frame': 39482 obs. of 40 variables:
## $ iso_code : Factor w/ 212 levels "","ABW","AFG",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ continent : Factor w/ 7 levels "","Africa","Asia",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ location : Factor w/ 212 levels "Afghanistan",..: 10 10 10 10 10 10 10 10 10 10 ...
## $ date : Factor w/ 239 levels "2019-12-31","2020-01-01",..: 74 80 81 82 83 84 85 86 87 88 ...
## $ total_cases : num 2 NA 4 NA NA NA 12 17 19 28 ...
## $ new_cases : num 2 NA 2 NA NA NA 8 5 2 9 ...
## $ new_cases_smoothed : num NA 0.286 0.286 0.286 0.286 ...
## $ total_deaths : num 0 NA 0 NA NA NA 0 0 0 0 ...
## $ new_deaths : num 0 NA 0 NA NA NA 0 0 0 0 ...
## $ new_deaths_smoothed : num NA 0 0 0 0 0 0 0 0 0 ...
## $ total_cases_per_million : num 18.7 NA 37.5 NA NA ...
## $ new_cases_per_million : num 18.7 NA 18.7 NA NA ...
## $ new_cases_smoothed_per_million : num NA 2.68 2.68 2.68 2.68 ...
## $ total_deaths_per_million : num 0 NA 0 NA NA NA 0 0 0 0 ...
## $ new_deaths_per_million : num 0 NA 0 NA NA NA 0 0 0 0 ...
## $ new_deaths_smoothed_per_million: num NA 0 0 0 0 0 0 0 0 0 ...
## $ new_tests : num NA NA NA NA NA NA NA NA NA NA ...
## $ total_tests : num NA NA NA NA NA NA NA NA NA NA ...
## $ total_tests_per_thousand : num NA NA NA NA NA NA NA NA NA NA ...
## $ new_tests_per_thousand : num NA NA NA NA NA NA NA NA NA NA ...
## $ new_tests_smoothed : num NA NA NA NA NA NA NA NA NA NA ...
## $ new_tests_smoothed_per_thousand: num NA NA NA NA NA NA NA NA NA NA ...
## $ tests_per_case : num NA NA NA NA NA NA NA NA NA NA ...
## $ positive_rate : num NA NA NA NA NA NA NA NA NA NA ...
## $ tests_units : Factor w/ 7 levels "","people tested",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ stringency_index : num 0 33.3 33.3 44.4 44.4 ...
## $ population : num 106766 106766 106766 106766 106766 ...
## $ population_density : num 585 585 585 585 585 ...
## $ median_age : num 41.2 41.2 41.2 41.2 41.2 41.2 41.2 41.2 41.2 41.2 ...
## $ aged_65_older : num 13.1 13.1 13.1 13.1 13.1 ...
## $ aged_70_older : num 7.45 7.45 7.45 7.45 7.45 ...
## $ gdp_per_capita : num 35974 35974 35974 35974 35974 ...
## $ extreme_poverty : num NA NA NA NA NA NA NA NA NA NA ...
## $ cardiovasc_death_rate : num NA NA NA NA NA NA NA NA NA NA ...
## $ diabetes_prevalence : num 11.6 11.6 11.6 11.6 11.6 ...
## $ female_smokers : num NA NA NA NA NA NA NA NA NA NA ...
## $ male_smokers : num NA NA NA NA NA NA NA NA NA NA ...
## $ handwashing_facilities : num NA NA NA NA NA NA NA NA NA NA ...
## $ hospital_beds_per_thousand : num NA NA NA NA NA NA NA NA NA NA ...
## $ life_expectancy : num 76.3 76.3 76.3 76.3 76.3 ...
La descripción de cada una de las variables de este conjunto de datos se encuentra en el sitio: https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-codebook.csv.
Ajuste de la columna date que es tipo factor a tipo date:
covid.df$date <- as.Date(covid.df$date, format("%Y-%m-%d"))
levels(covid.df$location)
## [1] "Afghanistan" "Albania"
## [3] "Algeria" "Andorra"
## [5] "Angola" "Anguilla"
## [7] "Antigua and Barbuda" "Argentina"
## [9] "Armenia" "Aruba"
## [11] "Australia" "Austria"
## [13] "Azerbaijan" "Bahamas"
## [15] "Bahrain" "Bangladesh"
## [17] "Barbados" "Belarus"
## [19] "Belgium" "Belize"
## [21] "Benin" "Bermuda"
## [23] "Bhutan" "Bolivia"
## [25] "Bonaire Sint Eustatius and Saba" "Bosnia and Herzegovina"
## [27] "Botswana" "Brazil"
## [29] "British Virgin Islands" "Brunei"
## [31] "Bulgaria" "Burkina Faso"
## [33] "Burundi" "Cambodia"
## [35] "Cameroon" "Canada"
## [37] "Cape Verde" "Cayman Islands"
## [39] "Central African Republic" "Chad"
## [41] "Chile" "China"
## [43] "Colombia" "Comoros"
## [45] "Congo" "Costa Rica"
## [47] "Cote d'Ivoire" "Croatia"
## [49] "Cuba" "Curacao"
## [51] "Cyprus" "Czech Republic"
## [53] "Democratic Republic of Congo" "Denmark"
## [55] "Djibouti" "Dominica"
## [57] "Dominican Republic" "Ecuador"
## [59] "Egypt" "El Salvador"
## [61] "Equatorial Guinea" "Eritrea"
## [63] "Estonia" "Ethiopia"
## [65] "Faeroe Islands" "Falkland Islands"
## [67] "Fiji" "Finland"
## [69] "France" "French Polynesia"
## [71] "Gabon" "Gambia"
## [73] "Georgia" "Germany"
## [75] "Ghana" "Gibraltar"
## [77] "Greece" "Greenland"
## [79] "Grenada" "Guam"
## [81] "Guatemala" "Guernsey"
## [83] "Guinea" "Guinea-Bissau"
## [85] "Guyana" "Haiti"
## [87] "Honduras" "Hong Kong"
## [89] "Hungary" "Iceland"
## [91] "India" "Indonesia"
## [93] "International" "Iran"
## [95] "Iraq" "Ireland"
## [97] "Isle of Man" "Israel"
## [99] "Italy" "Jamaica"
## [101] "Japan" "Jersey"
## [103] "Jordan" "Kazakhstan"
## [105] "Kenya" "Kosovo"
## [107] "Kuwait" "Kyrgyzstan"
## [109] "Laos" "Latvia"
## [111] "Lebanon" "Lesotho"
## [113] "Liberia" "Libya"
## [115] "Liechtenstein" "Lithuania"
## [117] "Luxembourg" "Macedonia"
## [119] "Madagascar" "Malawi"
## [121] "Malaysia" "Maldives"
## [123] "Mali" "Malta"
## [125] "Mauritania" "Mauritius"
## [127] "Mexico" "Moldova"
## [129] "Monaco" "Mongolia"
## [131] "Montenegro" "Montserrat"
## [133] "Morocco" "Mozambique"
## [135] "Myanmar" "Namibia"
## [137] "Nepal" "Netherlands"
## [139] "New Caledonia" "New Zealand"
## [141] "Nicaragua" "Niger"
## [143] "Nigeria" "Northern Mariana Islands"
## [145] "Norway" "Oman"
## [147] "Pakistan" "Palestine"
## [149] "Panama" "Papua New Guinea"
## [151] "Paraguay" "Peru"
## [153] "Philippines" "Poland"
## [155] "Portugal" "Puerto Rico"
## [157] "Qatar" "Romania"
## [159] "Russia" "Rwanda"
## [161] "Saint Kitts and Nevis" "Saint Lucia"
## [163] "Saint Vincent and the Grenadines" "San Marino"
## [165] "Sao Tome and Principe" "Saudi Arabia"
## [167] "Senegal" "Serbia"
## [169] "Seychelles" "Sierra Leone"
## [171] "Singapore" "Sint Maarten (Dutch part)"
## [173] "Slovakia" "Slovenia"
## [175] "Somalia" "South Africa"
## [177] "South Korea" "South Sudan"
## [179] "Spain" "Sri Lanka"
## [181] "Sudan" "Suriname"
## [183] "Swaziland" "Sweden"
## [185] "Switzerland" "Syria"
## [187] "Taiwan" "Tajikistan"
## [189] "Tanzania" "Thailand"
## [191] "Timor" "Togo"
## [193] "Trinidad and Tobago" "Tunisia"
## [195] "Turkey" "Turks and Caicos Islands"
## [197] "Uganda" "Ukraine"
## [199] "United Arab Emirates" "United Kingdom"
## [201] "United States" "United States Virgin Islands"
## [203] "Uruguay" "Uzbekistan"
## [205] "Vatican" "Venezuela"
## [207] "Vietnam" "Western Sahara"
## [209] "World" "Yemen"
## [211] "Zambia" "Zimbabwe"
levels(covid.df$continent)
## [1] "" "Africa" "Asia" "Europe"
## [5] "North America" "Oceania" "South America"
maxdays <- max(table(covid.df$location))
print(paste("Días máximos reportados:",maxdays))
## [1] "Días máximos reportados: 239"
names(table(covid.df$location)[table(covid.df$location)==maxdays])
## [1] "Afghanistan" "Algeria" "Armenia"
## [4] "Australia" "Austria" "Azerbaijan"
## [7] "Bahrain" "Belarus" "Belgium"
## [10] "Brazil" "Cambodia" "Canada"
## [13] "China" "Croatia" "Czech Republic"
## [16] "Denmark" "Dominican Republic" "Ecuador"
## [19] "Egypt" "Estonia" "Finland"
## [22] "France" "Georgia" "Germany"
## [25] "Greece" "Iceland" "India"
## [28] "Indonesia" "International" "Iran"
## [31] "Iraq" "Ireland" "Israel"
## [34] "Italy" "Japan" "Kuwait"
## [37] "Lebanon" "Lithuania" "Luxembourg"
## [40] "Macedonia" "Malaysia" "Mexico"
## [43] "Monaco" "Nepal" "Netherlands"
## [46] "New Zealand" "Nigeria" "Norway"
## [49] "Oman" "Pakistan" "Philippines"
## [52] "Qatar" "Romania" "Russia"
## [55] "San Marino" "Singapore" "South Korea"
## [58] "Sri Lanka" "Sweden" "Switzerland"
## [61] "Taiwan" "Thailand" "United Arab Emirates"
## [64] "United Kingdom" "United States" "Vietnam"
## [67] "World"
startdate <- min(covid.df$date)
enddate <- max(covid.df$date)
todayformatted <- format(Sys.Date(), "%A, %d de %B de %Y")
todayformatted <- paste(toupper(substr(todayformatted, 1, 1)),
substr(todayformatted, 2, nchar(todayformatted)),
sep="")
print(paste("Inicia el", format(startdate, "%d de %B de %Y"),
"y termina el", format(enddate, "%d de %B de %Y")))
## [1] "Inicia el 31 de diciembre de 2019 y termina el 25 de agosto de 2020"
Del dataframe original, se obtienen sólo las observaciones en donde la columna date sea igual a la fecha máxima. De esta manera se obtienen los totales o acumulados actualizados de cada país. Adicionalmente, se eliminan los países con menos de 1 millón de habitantes. Algunos países con poca población tienen estadísticas por miilón de habitantes muy altas.
Con las siguientes variables se crean los rankings:
# Get one row of each country with the updated totals
covid.total.df <- covid.df[covid.df$date== enddate,]
# Filter countries with less than 1 millón population
covid.total.df <- covid.total.df[covid.total.df$population >= 1000000,]
covid.total.df$location <- droplevels.factor(covid.total.df$location)
print(names(covid.total.df)[c(5,7,9,11)])
## [1] "total_cases" "new_cases_smoothed"
## [3] "new_deaths" "total_cases_per_million"
Al descartar los países que tienen al menos 1 millón de habitantes, el conjunto de datos ahora tiene: 156 Países.
Para cada una de las variables con totales, se obtienen los 20 países con el valor más alto y los 20 países con el valor más bajo. En caso de que México no se encuentre en los primeros 20 países, se agrega en la lista indicando la posición que le toca tomando en cuenta todos los países. Para cada variable a analizar se hace lo siguiente en R:
# Data is ordered according to the total_cases column
ranking.total_cases <- covid.total.df[order(-covid.total.df$total_cases),]
# The row corresponding to World is removed
ranking.total_cases <- ranking.total_cases[ranking.total_cases$location != "World", ]
# A new column indicating the positionin the rank is added
ranking.total_cases$position <- 1:nrow(ranking.total_cases)
# Only columns of interest are kept
columnfilter <- c("position", "location", "total_cases")
bottom20.total_cases <- tail(ranking.total_cases[, columnfilter],20)
top20.total_cases <- head(ranking.total_cases[, columnfilter],20)
rm(ranking.total_cases)
rownames(top20.total_cases) <- c()
rownames(bottom20.total_cases) <- c()
mexrow <- which(top20.total_cases$location=='Mexico')
top20.total_cases$total_cases_formated <- formatC(top20.total_cases$total_cases,
format="f", big.mark=",", digits=0)
bottom20.total_cases$total_cases_formated <- formatC(bottom20.total_cases$total_cases,
format="f", big.mark=",", digits=0)
tablecolnames <- c("Posición", "País", "Casos totales")
kable(top20.total_cases[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames, caption="Países con más casos totales") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "float_left") %>%
row_spec(mexrow, bold = T, color = "black", background = "yellow")
kable(bottom20.total_cases[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames, caption="Países con menos casos totales") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "right")
Posición | País | Casos totales |
---|---|---|
1 | United States | 5,740,909 |
2 | Brazil | 3,622,861 |
3 | India | 3,167,323 |
4 | Russia | 961,493 |
5 | South Africa | 611,450 |
6 | Peru | 600,438 |
7 | Mexico | 563,705 |
8 | Colombia | 551,696 |
9 | Chile | 399,568 |
10 | Iran | 358,905 |
11 | Argentina | 350,854 |
12 | United Kingdom | 326,614 |
13 | Saudi Arabia | 308,654 |
14 | Bangladesh | 297,083 |
15 | Pakistan | 293,711 |
16 | Italy | 260,298 |
17 | Turkey | 259,692 |
18 | France | 244,854 |
19 | Germany | 234,853 |
20 | Iraq | 207,985 |
Posición | País | Casos totales |
---|---|---|
137 | Latvia | 1,337 |
138 | Togo | 1,295 |
139 | Liberia | 1,290 |
140 | Niger | 1,172 |
141 | Trinidad and Tobago | 1,099 |
142 | Vietnam | 1,022 |
143 | Lesotho | 1,015 |
144 | Chad | 987 |
145 | Tanzania | 509 |
146 | Taiwan | 487 |
147 | Myanmar | 474 |
148 | Burundi | 430 |
149 | Papua New Guinea | 401 |
150 | Mauritius | 346 |
151 | Eritrea | 306 |
152 | Mongolia | 298 |
153 | Cambodia | 273 |
154 | Timor | 26 |
155 | Laos | 22 |
156 | NA | NA |
ggplot(data=top20.total_cases, aes(x=reorder(paste(position, location),total_cases),
y=total_cases, fill=location))+
geom_bar(stat = "identity", position=position_dodge(), colour="black", show.legend = FALSE)+
ylab("Casos totales de COVID-19") +
geom_text(aes(y=max(total_cases)+170000,
label=total_cases_formated,
fontface="bold"),
color="black")+
labs(title="Top 20 de los países con más casos de COVID-19",
subtitle = todayformatted,
caption = plotscaption)+
scale_y_continuous(breaks=c(100000, 500000, 750000, 1000000, 2000000, 3000000, 4000000, 5000000),
label=c("100k", "500k", "750k", "1m", "2m", "3m", "4m", "5m"))+
coord_flip() +
xlab("Países") +
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"))
Se obtiene el ranking sólo incluyendo países del continente americano. Se obtienen los datos en donde la columna continent sea igual a “North America” o “South America”.
ranking.total_cases.america <- covid.total.df[covid.total.df$continent %in%
c("North America", "South America") ,]
ranking.total_cases.america <- ranking.total_cases.america[order(-ranking.total_cases.america$total_cases),]
ranking.total_cases.america$position <- 1:nrow(ranking.total_cases.america)
bottom20.total_cases.america <- tail(ranking.total_cases.america[, c("position", "location", "total_cases")],5)
top20.total_cases.america <- head(ranking.total_cases.america[, c("position", "location", "total_cases")],20)
rownames(bottom20.total_cases.america) <- c()
rownames(top20.total_cases.america) <- c()
mexrow <- which(ranking.total_cases.america$location=='Mexico')
rm(ranking.total_cases.america)
top20.total_cases.america$total_cases_formated <- formatC(top20.total_cases.america$total_cases,
format="f", big.mark=",", digits=0)
bottom20.total_cases.america$total_cases_formated <- formatC(bottom20.total_cases.america$total_cases,
format="f", big.mark=",", digits=0)
tablecolnames <- c("Posición", "País", "Casos totales")
kable(top20.total_cases.america[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames, caption="Países del continente Americano con más casos totales") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "center") %>%
row_spec(mexrow, bold = T, color = "black", background = "yellow")
Posición | País | Casos totales |
---|---|---|
1 | United States | 5,740,909 |
2 | Brazil | 3,622,861 |
3 | Peru | 600,438 |
4 | Mexico | 563,705 |
5 | Colombia | 551,696 |
6 | Chile | 399,568 |
7 | Argentina | 350,854 |
8 | Canada | 125,647 |
9 | Bolivia | 110,148 |
10 | Ecuador | 108,289 |
11 | Dominican Republic | 91,608 |
12 | Panama | 87,485 |
13 | Guatemala | 68,533 |
14 | Honduras | 55,479 |
15 | Venezuela | 40,338 |
16 | Costa Rica | 34,463 |
17 | Puerto Rico | 30,618 |
18 | El Salvador | 24,811 |
19 | Paraguay | 13,602 |
20 | Haiti | 8,110 |
ggplot(data=top20.total_cases.america, aes(x=reorder(paste(position, location),total_cases),
y=total_cases, fill=location))+
geom_bar(stat = "identity", position=position_dodge(), colour="black", show.legend = FALSE)+
ylab("Casos de COVID-19") +
geom_text(aes(y=max(total_cases)+170000,
label=format(total_cases, big.mark=","),
fontface="bold"),
color="black")+
labs(title="Top 20 de los países con más casos de COVID-19 de América",
subtitle = todayformatted,
caption = plotscaption)+
scale_y_continuous(breaks=c(100000, 500000, 750000, 1000000, 2000000, 3000000, 4000000, 5000000),
label=c("100k", "500k", "750k", "1m", "2m", "3m", "4m", "5m"))+
coord_flip() +
xlab("Países") +
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"))
ranking.total_cases_per_million <- covid.total.df[order(-covid.total.df$total_cases_per_million),]
ranking.total_cases_per_million$position <- 1:nrow(ranking.total_cases_per_million)
columnfilter <- c("position", "location", "total_cases_per_million")
mexico.total_cases_per_million <- ranking.total_cases_per_million[
ranking.total_cases_per_million$location == "Mexico", ]
bottom20.total_cases_per_million <- tail(ranking.total_cases_per_million[, columnfilter],20)
top20.total_cases_per_million <- head(ranking.total_cases_per_million[, columnfilter],20)
mexico.total_cases_per_million <- mexico.total_cases_per_million[, columnfilter]
rm(ranking.total_cases_per_million)
rownames(top20.total_cases_per_million) <- c()
rownames(bottom20.total_cases_per_million) <- c()
rownames(mexico.total_cases_per_million) <- c()
top20.total_cases_per_million <- rbind(top20.total_cases_per_million, mexico.total_cases_per_million)
mexrow <- which(top20.total_cases_per_million$location=='Mexico')
top20.total_cases_per_million$total_cases_per_million_formated <- formatC(
top20.total_cases_per_million$total_cases_per_million, format="f", big.mark=",", digits=2)
bottom20.total_cases_per_million$total_cases_per_million_formated<- formatC(
bottom20.total_cases_per_million$total_cases_per_million, format="f", big.mark=",", digits=2)
tablecolnames <- c("Posición", "País", "Casos por mdh")
kable(top20.total_cases_per_million[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames, caption="Países con más casos por millón de habitantes (mdh)") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "float_left") %>%
row_spec(mexrow, bold = T, color = "black", background = "yellow")
kable(bottom20.total_cases_per_million[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames, caption="Países con menos casos por millón de habitantes (mdh)") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "right")
Posición | País | Casos por mdh |
---|---|---|
1 | Qatar | 40,702.38 |
2 | Bahrain | 28,990.65 |
3 | Chile | 20,902.05 |
4 | Panama | 20,275.71 |
5 | Kuwait | 18,957.69 |
6 | Peru | 18,210.63 |
7 | United States | 17,344.00 |
8 | Brazil | 17,043.99 |
9 | Oman | 16,548.90 |
10 | Armenia | 14,452.11 |
11 | Israel | 12,138.24 |
12 | Colombia | 10,842.47 |
13 | Puerto Rico | 10,702.45 |
14 | South Africa | 10,309.62 |
15 | Singapore | 9,641.14 |
16 | Bolivia | 9,436.11 |
17 | Saudi Arabia | 8,865.83 |
18 | Sweden | 8,586.86 |
19 | Dominican Republic | 8,444.76 |
20 | Moldova | 8,385.80 |
41 | Mexico | 4,372.09 |
NA | NA | NA |
Posición | País | Casos por mdh |
---|---|---|
138 | Mongolia | 90.90 |
139 | Eritrea | 86.28 |
140 | Angola | 67.61 |
141 | Yemen | 64.24 |
142 | Burkina Faso | 64.01 |
143 | China | 62.33 |
144 | Chad | 60.09 |
145 | Uganda | 51.64 |
146 | Thailand | 48.74 |
147 | Niger | 48.42 |
148 | Papua New Guinea | 44.82 |
149 | Burundi | 36.16 |
150 | Taiwan | 20.45 |
151 | Timor | 19.72 |
152 | Cambodia | 16.33 |
153 | Vietnam | 10.50 |
154 | Myanmar | 8.71 |
155 | Tanzania | 8.52 |
156 | Laos | 3.02 |
157 | NA | NA |
ggplot(data=top20.total_cases_per_million,
aes(x=reorder(paste(position, location),total_cases_per_million),
y=total_cases_per_million, fill=location))+
geom_bar(stat = "identity", position=position_dodge(), colour="black", show.legend = FALSE)+
ylab("Casos de COVID-19 por millón de habitantes") +
geom_text(aes(y=max(total_cases_per_million)+1250,
label=total_cases_per_million_formated,
fontface="bold"),
color="black")+
labs(title="Top 20 de los países con más casos reportados de COVID-19 por millón de habitantes (+ México)",
subtitle = todayformatted,
caption = plotscaption)+
coord_flip() +
xlab("Países") +
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"))
## Warning: Removed 1 rows containing missing values (geom_bar).
## Warning: Removed 22 rows containing missing values (geom_text).
ranking.total_deaths <- covid.total.df[order(-covid.total.df$total_deaths),]
ranking.total_deaths <- ranking.total_deaths[ranking.total_deaths$location != "World", ]
ranking.total_deaths$position <- 1:nrow(ranking.total_deaths)
columnfilter <- c("position", "location", "total_deaths")
bottom20.total_deaths <- tail(ranking.total_deaths[, columnfilter],20)
top20.total_deaths <- head(ranking.total_deaths[, columnfilter],20)
mexrow <- which(ranking.total_deaths$location=='Mexico')
rm(ranking.total_deaths)
rownames(top20.total_deaths) <- c()
rownames(bottom20.total_deaths) <- c()
top20.total_deaths$total_deaths_formated <- formatC(top20.total_deaths$total_deaths,
format="f", big.mark=",", digits=0)
bottom20.total_deaths$total_deaths_formated <- formatC(bottom20.total_deaths$total_deaths,
format="f", big.mark=",", digits=0)
tablecolnames <- c("Posición", "País", "Muertes totales")
kable(top20.total_deaths[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames, caption="Países con más muertes por COVID-19") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "float_left") %>%
row_spec(mexrow, bold = T, color = "black", background = "yellow")
kable(bottom20.total_deaths[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames, caption="Países con menos muertes por COVID-19") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "right")
Posición | País | Muertes totales |
---|---|---|
1 | United States | 177,279 |
2 | Brazil | 115,309 |
3 | Mexico | 60,800 |
4 | India | 58,390 |
5 | United Kingdom | 41,433 |
6 | Italy | 35,441 |
7 | France | 30,528 |
8 | Peru | 27,813 |
9 | Iran | 20,643 |
10 | Colombia | 17,612 |
11 | Russia | 16,448 |
12 | South Africa | 13,159 |
13 | Chile | 10,916 |
14 | Belgium | 9,996 |
15 | Germany | 9,277 |
16 | Canada | 9,083 |
17 | Argentina | 7,366 |
18 | Indonesia | 6,759 |
19 | Iraq | 6,519 |
20 | Ecuador | 6,322 |
Posición | País | Muertes totales |
---|---|---|
137 | Mozambique | 21 |
138 | Tanzania | 21 |
139 | Georgia | 18 |
140 | Jamaica | 16 |
141 | Trinidad and Tobago | 15 |
142 | Jordan | 14 |
143 | Rwanda | 14 |
144 | Sri Lanka | 12 |
145 | Mauritius | 10 |
146 | Taiwan | 7 |
147 | Myanmar | 6 |
148 | Papua New Guinea | 4 |
149 | Botswana | 3 |
150 | Burundi | 1 |
151 | Eritrea | 0 |
152 | Cambodia | 0 |
153 | Laos | 0 |
154 | Mongolia | 0 |
155 | Timor | 0 |
156 | NA | NA |
ggplot(data=top20.total_deaths,
aes(x=reorder(paste(position, location),total_deaths),
y=total_deaths, fill=location))+
geom_bar(stat = "identity", position=position_dodge(), colour="black", show.legend = FALSE)+
ylab("Muertes por COVID-19") +
geom_text(aes(y=max(total_deaths)+7500,
label=total_deaths_formated,
fontface="bold"),
color="black")+
labs(title="Top 20 de los países con más muertes por COVID-19",
subtitle = todayformatted,
caption = plotscaption)+
scale_y_continuous(breaks=c(10000, 25000, 50000, 75000, 100000, 150000, 170000),
label=c("10k", "25k", "50k", "75k", "100k", "150k", "170k"))+
coord_flip() +
xlab("Países") +
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"))
ranking.total_deaths_per_million <- covid.total.df[order(-covid.total.df$total_deaths_per_million),]
ranking.total_deaths_per_million$position <- 1:nrow(ranking.total_deaths_per_million)
columnfilter <- c("position", "location", "total_deaths_per_million")
bottom20.total_deaths_per_million <- tail(ranking.total_deaths_per_million[, columnfilter],20)
top20.total_deaths_per_million <- head(ranking.total_deaths_per_million[, columnfilter],20)
mexrow <- which(ranking.total_deaths_per_million$location=='Mexico')
rm(ranking.total_deaths_per_million)
rownames(top20.total_deaths_per_million) <- c()
rownames(bottom20.total_deaths_per_million) <- c()
top20.total_deaths_per_million$total_deaths_per_million_formated <- formatC(
top20.total_deaths_per_million$total_deaths_per_million, format="f", big.mark=",", digits=2)
bottom20.total_deaths_per_million$total_deaths_per_million_formated <- formatC(
bottom20.total_deaths_per_million$total_deaths_per_million, format="f", big.mark=",", digits=2)
tablecolnames <- c("Posición", "País", "Muertes por mdh")
kable(top20.total_deaths_per_million[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames,
caption="Países con más muertes por COVID-19 por millón de habitantes (mdh)") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "float_left") %>%
row_spec(mexrow, bold = T, color = "black", background = "yellow")
kable(bottom20.total_deaths_per_million[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames,
caption="Países con menos muertes por COVID-19 por millón de habitantes (mdh)") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "right")
Posición | País | Muertes por mdh |
---|---|---|
1 | Belgium | 862.50 |
2 | Peru | 843.54 |
3 | United Kingdom | 610.33 |
4 | Italy | 586.17 |
5 | Sweden | 575.59 |
6 | Chile | 571.03 |
7 | Brazil | 542.48 |
8 | United States | 535.58 |
9 | Mexico | 471.56 |
10 | France | 467.69 |
11 | Panama | 441.74 |
12 | Bolivia | 392.19 |
13 | Netherlands | 361.43 |
14 | Ireland | 359.88 |
15 | Ecuador | 358.33 |
16 | Colombia | 346.13 |
17 | Armenia | 288.20 |
18 | Macedonia | 270.71 |
19 | Iran | 245.77 |
20 | Kosovo | 241.62 |
Posición | País | Muertes por mdh |
---|---|---|
138 | Burkina Faso | 2.63 |
139 | Jordan | 1.37 |
140 | Botswana | 1.28 |
141 | Rwanda | 1.08 |
142 | Thailand | 0.83 |
143 | Mozambique | 0.67 |
144 | Sri Lanka | 0.56 |
145 | Uganda | 0.48 |
146 | Papua New Guinea | 0.45 |
147 | Tanzania | 0.35 |
148 | Taiwan | 0.29 |
149 | Vietnam | 0.28 |
150 | Myanmar | 0.11 |
151 | Burundi | 0.08 |
152 | Eritrea | 0.00 |
153 | Cambodia | 0.00 |
154 | Laos | 0.00 |
155 | Mongolia | 0.00 |
156 | Timor | 0.00 |
157 | NA | NA |
ggplot(data=top20.total_deaths_per_million,
aes(x=reorder(paste(position, location),total_deaths_per_million),
y=total_deaths_per_million, fill=location))+
geom_bar(stat = "identity", position=position_dodge(), colour="black", show.legend = FALSE)+
ylab("Muertes por COVID-19 por millón de habitantes") +
geom_text(aes(y=max(total_deaths_per_million)+25,
label=total_deaths_per_million_formated,
fontface="bold"),
color="black")+
labs(title="Top 20 de los países con más muertes por COVID-19 por millón de habitantes",
subtitle = todayformatted,
caption = plotscaption)+
coord_flip() +
xlab("Países") +
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"))
Las siguientes funciones son para facilitar la creación de las gráficas de líneas.
##Line plot of new_cases with date breaks by month
plot.trend.new_cases.month <- function(startdate, enddate, countries, graphtitle, graphsubtitle){
tmp.df <- covid.df[covid.df$location %in% countries,]
tmp.df <- tmp.df[tmp.df$date >= startdate,]
tmp.df <- tmp.df[tmp.df$date <= enddate,]
ggplot(data=tmp.df , aes(x=date, y=new_cases, group=location, colour=location)) +
geom_line(size=1) +
scale_color_discrete(name = "Países")+
scale_x_date(date_breaks = "month", date_labels = "%B")+
ggtitle(graphtitle,
subtitle = graphsubtitle)+
labs(caption = plotscaption)+
ylab("Nuevos casos de COVID-19")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
}
##Line plot of new_cases with date breaks by short month and days
plot.trend.new_cases.monthday <- function(startdate, enddate, countries, graphtitle, graphsubtitle){
tmp.df <- covid.df[covid.df$location %in% countries,]
tmp.df <- tmp.df[tmp.df$date >= startdate,]
tmp.df <- tmp.df[tmp.df$date <= enddate,]
ggplot(data=tmp.df , aes(x=date, y=new_cases, group=location, colour=location)) +
geom_line(size=1) +
scale_color_discrete(name = "Países")+
scale_x_date(date_breaks = "day", date_labels = "%b %d")+
ggtitle(graphtitle,
subtitle = graphsubtitle)+
labs(caption = plotscaption)+
ylab("Nuevos casos de COVID-19")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
}
Gráficas con rangos de fechas. Las fechas iniciales son distintas, la fecha final siempre es el día Martes, 25 de agosto de 2020.
top5.total_cases <- c(head(as.character(top20.total_cases$location),5), "Mexico")
sublabel.tmp <- paste("Rango:", format(startdate, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.new_cases.month(startdate, enddate, top5.total_cases, "Tendencia de contagios de COVID-19", sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-5
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.new_cases.month(startdate.tmp, enddate, top5.total_cases, "Tendencia de contagios de COVID-19", sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-3
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.new_cases.month(startdate.tmp, enddate, top5.total_cases, "Tendencia de contagios de COVID-19", sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-2
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.new_cases.month(startdate.tmp, enddate, top5.total_cases, "Tendencia de contagios de COVID-19", sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-1
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.new_cases.month(startdate.tmp, enddate, top5.total_cases, "Tendencia de contagios de COVID-19", sublabel.tmp)
sublabel.tmp <- paste("Rango:", format(enddate-21, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.new_cases.monthday(enddate-21, enddate, top5.total_cases, "Tendencia de contagios de COVID-19", sublabel.tmp)
sublabel.tmp <- paste("Rango:", format(enddate-14, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.new_cases.monthday(enddate-14, enddate, top5.total_cases, "Tendencia de contagios de COVID-19", sublabel.tmp)
Funciones para generar las gráficas.
##Line plot of new_deaths with date breaks by month
plot.trend.new_deaths.month <- function(startdate, enddate, countries, graphtitle, graphsubtitle){
tmp.df <- covid.df[covid.df$location %in% countries,]
tmp.df <- tmp.df[tmp.df$date >= startdate,]
tmp.df <- tmp.df[tmp.df$date <= enddate,]
ggplot(data=tmp.df , aes(x=date, y=new_deaths, group=location, colour=location)) +
geom_line(size=1) +
scale_color_discrete(name = "Países")+
scale_x_date(date_breaks = "month", date_labels = "%B")+
ggtitle(graphtitle,
subtitle = graphsubtitle)+
labs(caption = plotscaption)+
ylab("Nuevos muertes atribuibles a COVID-19")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
}
##Line plot of new_deaths with date breaks by short month and day
plot.trend.new_deaths.monthdays <- function(startdate, enddate, countries, graphtitle, graphsubtitle){
tmp.df <- covid.df[covid.df$location %in% countries,]
tmp.df <- tmp.df[tmp.df$date >= startdate,]
tmp.df <- tmp.df[tmp.df$date <= enddate,]
ggplot(data=tmp.df , aes(x=date, y=new_deaths, group=location, colour=location)) +
geom_line(size=1) +
scale_color_discrete(name = "Países")+
scale_x_date(date_breaks = "day", date_labels = "%b %d")+
ggtitle(graphtitle,
subtitle = graphsubtitle)+
labs(caption = plotscaption)+
ylab("Nuevos casos de COVID-19")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
}
Gráficas con rangos de fechas. Las fechas iniciales son distitnas, fecha final siempre es el día Martes, 25 de agosto de 2020.
top5.total_cases <- c(head(as.character(top20.total_cases$location),5), "Mexico")
sublabel.tmp <- paste("Rango:", format(startdate, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
title.tmp <- "Tendencia de nuevas muertes por COVID-19"
plot.trend.new_deaths.month(startdate, enddate, top5.total_cases, title.tmp, sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-5
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.new_deaths.month(startdate.tmp, enddate, top5.total_cases, title.tmp, sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-3
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.new_deaths.month(startdate.tmp, enddate, top5.total_cases, title.tmp, sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-2
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.new_deaths.month(startdate.tmp, enddate, top5.total_cases, title.tmp, sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-1
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.new_deaths.month(startdate.tmp, enddate, top5.total_cases, title.tmp, sublabel.tmp)
sublabel.tmp <- paste("Rango:", format(enddate-21, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.new_deaths.monthdays(enddate-21, enddate, top5.total_cases, title.tmp, sublabel.tmp)
sublabel.tmp <- paste("Rango:", format(enddate-14, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.new_deaths.monthdays(enddate-14, enddate, top5.total_cases, title.tmp, sublabel.tmp)
Para las siguientes gráficas se utilizan las columnas total_cases y total_deaths.
Gráficas con rangos de fechas. Las fechas iniciales son distitnas, fecha final siempre es el día Martes, 25 de agosto de 2020.
Funciones para generar las gráficas.
##Line plot of new_total_cases with date breaks by month
plot.trend.total_cases.month <- function(startdate, enddate, countries, graphtitle, graphsubtitle){
tmp.df <- covid.df[covid.df$location %in% countries,]
tmp.df <- tmp.df[tmp.df$date >= startdate,]
tmp.df <- tmp.df[tmp.df$date <= enddate,]
ggplot(data=tmp.df , aes(x=date, y=total_cases, group=location, colour=location)) +
geom_line(size=1) +
scale_color_discrete(name = "Países")+
scale_x_date(date_breaks = "month", date_labels = "%B")+
ggtitle(graphtitle,
subtitle = graphsubtitle)+
labs(caption = plotscaption)+
ylab("Casos acumulados de COVID-19")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
}
##Line plot of new_deaths with date breaks by short month and day
plot.trend.total_cases.monthdays <- function(startdate, enddate, countries, graphtitle, graphsubtitle){
tmp.df <- covid.df[covid.df$location %in% countries,]
tmp.df <- tmp.df[tmp.df$date >= startdate,]
tmp.df <- tmp.df[tmp.df$date <= enddate,]
ggplot(data=tmp.df , aes(x=date, y=total_cases, group=location, colour=location)) +
geom_line(size=1) +
scale_color_discrete(name = "Países")+
scale_x_date(date_breaks = "day", date_labels = "%b %d")+
ggtitle(graphtitle,
subtitle = graphsubtitle)+
labs(caption = plotscaption)+
ylab("Casos acumulados de COVID-19")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
}
sublabel.tmp <- paste("Rango:", format(startdate, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
title.tmp <- "Contagios acumulados de COVID-19"
plot.trend.total_cases.month(startdate, enddate, top5.total_cases, title.tmp, sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-5
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.total_cases.month(startdate.tmp, enddate, top5.total_cases, title.tmp, sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-3
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.total_cases.month(startdate.tmp, enddate, top5.total_cases, title.tmp, sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-2
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.total_cases.month(startdate.tmp, enddate, top5.total_cases, title.tmp, sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-1
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.total_cases.month(startdate.tmp, enddate, top5.total_cases, title.tmp, sublabel.tmp)
La cantidad de contagios que tienen Estados Unidos y Brasil no permiten apreciar las curvas del resto de los países. Se eliminan esos 2 países y se agregan los 2 países que se encuentran en la posición 6 y 7 del ranking de los países con más contagios reportados.
mes.tmp <- as.numeric(format(enddate, "%m"))-1
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
total_cases.others <- as.character(top20.total_cases$location)[3:7]
plot.trend.total_cases.month(startdate.tmp, enddate, total_cases.others, title.tmp, sublabel.tmp)
sublabel.tmp <- paste("Rango:", format(enddate-21, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.total_cases.monthdays(enddate-21, enddate, total_cases.others, title.tmp, sublabel.tmp)
sublabel.tmp <- paste("Rango:", format(enddate-14, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.total_cases.monthdays(enddate-14, enddate, total_cases.others, title.tmp, sublabel.tmp)
total_cases.others <- as.character(top20.total_cases$location)[3:7]
total_cases.others <- total_cases.others[c(-1,-2)]
countrieslabels <- paste(total_cases.others, collapse = ', ')
sublabel.tmp <- paste("Rango:", format(enddate-14, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
title.tmp <- paste(title.tmp, "de", countrieslabels)
plot.trend.total_cases.monthdays(enddate-14, enddate, total_cases.others, title.tmp, sublabel.tmp)
Gráficas con rangos de fechas. Las fechas iniciales son distitnas, fecha final siempre es el día Martes, 25 de agosto de 2020.
Funciones para generar las gráficas.
# Generate total_deaths line plot with month breaks
plot.trend.total_deaths.month <- function(startdate, enddate, countries, graphtitle, subgraphtitle){
tmp.df <- covid.df[covid.df$location %in% countries,]
tmp.df <- tmp.df[tmp.df$date >= startdate,]
tmp.df <- tmp.df[tmp.df$date <= enddate,]
ggplot(data=tmp.df , aes(x=date, y=total_deaths, group=location, colour=location)) +
geom_line(size=1) +
scale_color_discrete(name = "Países")+
scale_x_date(date_breaks = "month", date_labels = "%B")+
ggtitle(graphtitle,
subtitle = subgraphtitle)+
labs(caption = plotscaption)+
ylab("Muertes totales por COVID-19")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
}
# Generate total_deaths line plot with month and day breaks
plot.trend.total_deaths.monthdays <- function(startdate, enddate, countries, graphtitle, subgraphtitle){
tmp.df <- covid.df[covid.df$location %in% countries,]
tmp.df <- tmp.df[tmp.df$date >= startdate,]
tmp.df <- tmp.df[tmp.df$date <= enddate,]
ggplot(data=tmp.df , aes(x=date, y=total_deaths, group=location, colour=location)) +
geom_line(size=1) +
scale_color_discrete(name = "Países")+
scale_x_date(date_breaks = "day", date_labels = "%b %d")+
ggtitle(graphtitle,
subtitle = subgraphtitle)+
labs(caption = plotscaption)+
ylab("Muertes totales por COVID-19")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
}
top5.total_deaths <- head(as.character(top20.total_deaths$location),5)
sublabel.tmp <- paste("Rango:", format(startdate, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
title.tmp <- "Muertes acumuladas por COVID-19"
plot.trend.total_deaths.month(startdate, enddate, top5.total_deaths, title.tmp, sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-5
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.total_deaths.month(startdate.tmp, enddate, top5.total_deaths, title.tmp, sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-3
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.total_deaths.month(startdate.tmp, enddate, top5.total_deaths, title.tmp, sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-2
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.total_deaths.month(startdate.tmp, enddate, top5.total_deaths, title.tmp, sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-1
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.total_deaths.month(startdate.tmp, enddate, top5.total_deaths, title.tmp, sublabel.tmp)
mes.tmp <- as.numeric(format(enddate, "%m"))-1
mes.tmp <- ifelse(mes.tmp < 10, paste0("0",mes.tmp), mes.tmp)
startdate.tmp <- paste0("2020-", mes.tmp, "-01")
startdate.tmp <- as.Date(startdate.tmp)
sublabel.tmp <- paste("Rango:", format(startdate.tmp, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
total_deaths.others <- as.character(top20.total_deaths$location)[3:7]
plot.trend.total_deaths.month(startdate.tmp, enddate, total_deaths.others, title.tmp, sublabel.tmp)
sublabel.tmp <- paste("Rango:", format(enddate-21, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.total_deaths.monthdays(enddate-21, enddate, total_deaths.others, title.tmp, sublabel.tmp)
Las curvas de Francia, Italia y Reino Unido están planas, se ha estancado la cantidad de muertes atribuidas a COVID-19. Una posible razón es que la pandemia inició primero en Europa y la cantidad de muertes reportadas recientemente es nula o muy baja. Las curvas de México e India siguen en aumento. Se eliminan los países de Francia, Italia y Reino Unido y se agregan otros países en donde al parecer la cantidad de muertes atribuidas a COVID-19 aún sigue en aumento. Se genera la gráfica con las curvas de muertes acumuladas de las últimas 3 semanas de India, Iran, México, Perú y Rusia:
total_deaths.others <- as.character(top20.total_deaths$location)[c(3,5,8,9,10)]
countrieslabels <- paste(total_deaths.others, collapse = ', ')
title.tmp <- paste(title.tmp, "de", countrieslabels)
sublabel.tmp <- paste("Rango:", format(enddate-21, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.total_deaths.monthdays(enddate-21, enddate, total_deaths.others, title.tmp, sublabel.tmp)
total_deaths.others <- as.character(top20.total_deaths$location)[c(3,5,8,9,10)]
countrieslabels <- paste(total_deaths.others, collapse = ', ')
title.tmp <- paste(title.tmp, "de", countrieslabels)
sublabel.tmp <- paste("Rango:", format(enddate-14, "%d/%m/%Y"),"-", format(enddate, "%d/%m/%Y"))
plot.trend.total_deaths.monthdays(enddate-14, enddate, total_deaths.others, title.tmp, sublabel.tmp)
covid.mexico <- covid.df[covid.df$location=='Mexico',]
covid.mexico <- covid.mexico[covid.mexico$date > "2020-04-01",]
covid.mexico.new <- covid.mexico[,c("date", "new_cases", "new_deaths")]
covid.mexico.new.lf <-melt(covid.mexico.new, id.vars = c("date"))
ggplot(data=covid.mexico.new.lf , aes(x=date, y=value, group=variable, colour=variable)) +
geom_line(size=1) +
scale_color_discrete(name = "Indicador", labels=c("Nuevos Contagios", "Nuevas Muertes"))+
scale_x_date(date_breaks = "week", date_labels = "%b %d")+
ggtitle("Curvas de contagios y muertes diarias por COVID-19 en México")+
labs(caption = plotscaption)+
ylab("")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
covid.mexico.new.lastmonth <- covid.mexico.new[covid.mexico.new$date >= enddate-30, ]
covid.mexico.new.lastmonth.lf <- melt(covid.mexico.new.lastmonth, id.vars = c("date"))
ggplot(data=covid.mexico.new.lastmonth.lf , aes(x=date, y=value, group=variable, colour=variable)) +
geom_line(size=1) +
scale_color_discrete(name = "Indicador", labels=c("Nuevos Contagios", "Nuevas Muertes"))+
scale_x_date(date_breaks = "week", date_labels = "%b %d")+
ggtitle("Curvas de contagios y muertes diarias por COVID-19 en México - Último mes")+
labs(caption = plotscaption)+
ylab("")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
covid.mexico.total <- covid.mexico[,c("date", "total_cases", "total_deaths")]
covid.mexico.total.lf <-melt(covid.mexico.total, id.vars = c("date"))
ggplot(data=covid.mexico.total.lf , aes(x=date, y=value, group=variable, colour=variable)) +
geom_line(size=1) +
scale_color_discrete(name = "Indicador", labels=c("Contagios", "Muertes"))+
scale_x_date(date_breaks = "week", date_labels = "%b %d")+
scale_y_continuous(breaks=c(50000, 100000, 200000, 300000, 400000, 500000, 750000),
label=c("50k","100k", "200k", "300k", "400k", "500k", "750k"))+
ggtitle("Curvas de contagios y muertes acumulados por COVID-19 en México")+
labs(caption = plotscaption)+
ylab("")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
covid.mexico.total.lasttmonth <- covid.mexico.total[covid.mexico.total$date >= enddate-30, ]
covid.mexico.total.lasttmonth.lf <- melt(covid.mexico.total.lasttmonth, id.vars = c("date"))
ggplot(data=covid.mexico.total.lasttmonth.lf , aes(x=date, y=value, group=variable, colour=variable)) +
geom_line(size=1) +
scale_color_discrete(name = "Indicador", labels=c("Contagios", "Muertes"))+
scale_x_date(date_breaks = "week", date_labels = "%b %d")+
scale_y_continuous(breaks=c(50000, 100000, 200000, 300000, 400000, 500000, 750000),
label=c("50k","100k", "200k", "300k", "400k", "500k", "750k"))+
ggtitle("Curvas de contagios y muertes acumulados por COVID-19 en México - Último mes")+
labs(caption = plotscaption)+
ylab("")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
covid.mexico <- covid.df[covid.df$location=='Mexico',]
week.agg <- covid.mexico %>%
group_by(week = week(date)) %>%
summarise(mean_new_cases = mean(new_cases),
n = n(),
loCI_new_cases = ci(new_cases)[2],
hiCI_new_cases = ci(new_cases)[3],
mean_new_deaths = mean(new_deaths),
loCI_new_deaths = ci(new_deaths)[2],
hiCI_new_deaths = ci(new_deaths)[3],
total_new_cases = sum(new_cases),
total_new_deaths = sum(new_deaths))
# Weeks with zero mean new cases are deleted
week.agg <- week.agg[which(week.agg$mean_new_cases > 0),]
# Finding the date of the first week to be ploted
firstweek <- min(week.agg$week)
rowindex <- match(firstweek, week(covid.mexico[,"date"]))
firstweek.date <- format(covid.mexico[rowindex, ]$date, "%d/%m/%Y")
rm(firstweek, rowindex)
# Finding the date of the last week to be ploted
lastweek <- max(week.agg$week)
rowindex <- match(lastweek, week(covid.mexico[,"date"]))
lastweek.date <- format(covid.mexico[rowindex, ]$date, "%d/%m/%Y")
rm(lastweek, rowindex)
sublabel <- paste("Rango:", firstweek.date, "-", lastweek.date)
ggplot(data=week.agg, aes(x=as.character(week),y=mean_new_cases))+
geom_bar(stat = 'identity', aes(fill = mean_new_cases)) +
geom_errorbar(aes(ymin=loCI_new_cases, ymax=hiCI_new_cases), width=.2,
position=position_dodge(.9)) +
xlab("Semana")+
ylab("Promedio semanal de casos ± IC 95%")+
labs(title="Promedio semanal de casos de COVID-19 en México",
subtitle = sublabel,
caption = plotscaption)+
scale_fill_gradient(name="Casos", low = "green", high = "red")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"))
ggplot(data=week.agg, aes(x=as.character(week),y=mean_new_deaths))+
geom_bar(stat = 'identity', aes(fill = mean_new_deaths)) +
geom_errorbar(aes(ymin=loCI_new_deaths, ymax=hiCI_new_deaths), width=.2,
position=position_dodge(.9)) +
xlab("Semana")+
ylab("Promedio semanal de muertes ± IC 95%")+
labs(title="Promedio semanal de muertes por COVID-19 en México",
subtitle = sublabel,
caption = plotscaption)+
scale_fill_gradient(name="Muertes", low = "green", high = "red")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"))
ggplot(data=week.agg, aes(x=as.character(week),y=total_new_cases))+
geom_bar(stat = 'identity', aes(fill = total_new_cases)) +
xlab("Semana")+
ylab("Casos totales por semana")+
labs(title="Casos totales de COVID-19 por semana en México",
subtitle = sublabel,
caption = plotscaption)+
scale_fill_gradient(name="Casos totales", low = "green", high = "red")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"))
ggplot(data=week.agg, aes(x=as.character(week),y=total_new_deaths))+
geom_bar(stat = 'identity', aes(fill = total_new_deaths)) +
xlab("Semana")+
ylab("Muertes totales por semana")+
labs(title="Muertes totales por COVID-19 por semana en México",
subtitle=sublabel,
caption = plotscaption)+
scale_fill_gradient(name="Muertes totales", low = "green", high = "red")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"))