Introduction
Corruption has become the main concern of Colombians ahead of the election of the next president, according to the latest Invamer survey (June 10, 2022). This reflects both the importance of the problem and the sense of powerlessness in fighting a scourge that, according to the control entities, costs the country 50 trillion Colombian pesos (50 billones) every year.
This project attempts to answer the following question: what voting trends can we see in the cities with the most detected cases of corruption in Colombia?

In the 2022 presidential election, Colombians can choose the candidate who, in their opinion, will best tackle corruption in their regions and across the country. After the first round, we can look at the behavior of an electorate that yearns to eradicate this scourge completely. According to Transparency International's Corruption Perceptions Index 2021, Colombia scored 39 out of 100, where 0 means very high corruption and 100 means no corruption. The country ranks 87th out of 180 countries evaluated. A score below 50 points indicates very serious levels of corruption in the public sector.
The uncertainty Colombians feel about the country's future can only be countered through the democratic exercise of voting, where reason and intellect should push us to review the anti-corruption plans in each candidate's government program.
The analysis at the end combines several tools to draw some valuable insights. None of this is meant to be biased or to take political sides: corruption is a cancer, and it is in our hands to try to find a cure at the ballot box. It is also important to note that there is no correlation between the number of cases found and the winner of the election in each region.
📌 To process the data I will use Python with pandas and Plotly; the visualizations will be built with the Mapbox API and Tableau Public.

Data
The main data were downloaded from the report “Así se mueve la corrupción” (“This is how corruption moves”), published by Transparencia por Colombia on December 2, 2021, which analyzes 967 corruption incidents reported in 2,026 national press reports published between 2016 and 2020.

The report showed that the majority of corruption incidents (53%) were concentrated in Bogotá (200 incidents), Atlántico (88), Antioquia (76), Santander (75) and Valle del Cauca (74), and that the most frequent type is administrative corruption in the public sector.
Using the database of the latest results bulletin for the first round of Colombia's 2022 presidential election, issued by the National Civil Registry (Registraduría Nacional del Estado Civil), we can compare it with the corruption report to see whom each region voted for to put an end to this crime.
“This report once again reflects the systematic and structural nature of corruption. Those who aspire to the Presidency of the Republic must put forward ambitious, decisive and coherent proposals to address this problem comprehensively.
As a society, we must take an in-depth look at the deep damage that corruption causes. We must be able to elect those who can genuinely and seriously confront this problem, not buy into easy speeches, and demand effective action. We must not allow them to keep stealing our present and our future.”
Andrés Hernández, Executive Director, Transparencia por Colombia
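As a quick sanity check of the 53% figure, the five counts quoted above can be added up and divided by the 967 incidents analyzed in the report:

```python
# Incident counts quoted from the report "Así se mueve la corrupción"
top_five = {
    "Bogotá": 200,
    "Atlántico": 88,
    "Antioquia": 76,
    "Santander": 75,
    "Valle del Cauca": 74,
}
TOTAL_INCIDENTS = 967  # total corruption incidents analyzed in the report

top_five_total = sum(top_five.values())
share = top_five_total / TOTAL_INCIDENTS

print(f"{top_five_total} of {TOTAL_INCIDENTS} incidents = {share:.1%}")
```

This prints `513 of 967 incidents = 53.1%`, consistent with the 53% stated in the report.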
Methodology
📌 The complete code will be published as a notebook on GitHub. After running into a problem with the Mapbox API in a Jupyter Notebook on my local machine, I decided to process the data in an online Google Colab notebook and then export the DataFrame as a .CSV file to Tableau Public.
Data processing and cleaning
- We import our dependencies (pandas, plotly and json) from Python into our notebook.
import json
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
# Show all columns when displaying wide DataFrames
pd.options.display.max_columns = 999
- To draw maps with Mapbox we need an account to generate a personal access token. The token must be defined before it is registered with Plotly Express:
token_map_plot = 'AquíVaTuTokenPersonalDeMapbox'  # placeholder: replace with your own Mapbox token
px.set_mapbox_access_token(token_map_plot)
Visualization of the report “Así se mueve la corrupción” (“This is how corruption moves”)
- We start by importing the data in CSV format and creating our first DataFrame with pandas:
bd_corr = pd.read_csv('data/base-de-datos-hechos.csv')
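- The next step in our EDA is to review the DataFrame: its size (rows and columns), null values, column names and the data types that may need to change.
bd_corr.info()          # column names, non-null counts and data types
print(bd_corr.shape)    # (rows, columns)
print(bd_corr.columns)  # column names
print(bd_corr.dtypes)   # data type of each column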
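The same checks can be gathered into a single table with one row per column. This is a minimal sketch using standard pandas calls (`dtypes`, `isna().sum()`, `nunique()`); the helper name `eda_summary` is hypothetical, and the small DataFrame is made up for illustration, not taken from the report:

```python
import pandas as pd


def eda_summary(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: data type, number of nulls and number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "nulls": df.isna().sum(),
        "unique": df.nunique(),  # nulls are not counted as a distinct value
    })


# Made-up rows for illustration only (not the report data)
demo = pd.DataFrame({
    "Departamento": ["AMAZONAS", "ANTIOQUIA", None],
    "Tipo de corrupción": ["tipo A", "tipo B", "tipo A"],
})

print(demo.shape)  # (rows, columns)
print(eda_summary(demo))
```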
- I renamed one of the main columns I want to work with:
bd_corr = bd_corr.rename(columns={'Tipo de corrupción':'tipo_corrupcion'})
- Using .groupby, I built a new DataFrame with only the columns I need, counting the incidents per department:
map_corr_deptos = bd_corr.groupby(['Departamento', 'Dep_Lat', 'Dep_Lng'])['tipo_corrupcion'].count().reset_index()
This produces one row per department with its latitude, longitude and number of corruption incidents.

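One detail worth keeping in mind about this groupby: by default pandas drops rows whose grouping keys are missing (`dropna=True`), and `.count()` counts only the non-null values of `tipo_corrupcion`. An incident with empty coordinates would therefore vanish from the totals without any warning. A small sketch with made-up rows (the third one has no latitude):

```python
import pandas as pd

# Made-up incidents for illustration; the ANTIOQUIA row lacks a latitude
incidents = pd.DataFrame({
    "Departamento": ["AMAZONAS", "AMAZONAS", "ANTIOQUIA"],
    "Dep_Lat": [-4.215278, -4.215278, None],
    "Dep_Lng": [-69.940556, -69.940556, -75.5],
    "tipo_corrupcion": ["tipo A", "tipo B", "tipo A"],
})
keys = ["Departamento", "Dep_Lat", "Dep_Lng"]

# Default behaviour: the ANTIOQUIA row disappears because Dep_Lat is NaN
default_counts = incidents.groupby(keys)["tipo_corrupcion"].count().reset_index()

# dropna=False keeps groups with missing keys, so the gap becomes visible
all_counts = (
    incidents.groupby(keys, dropna=False)["tipo_corrupcion"].count().reset_index()
)

print(default_counts)
print(all_counts)
```

Comparing `len(default_counts)` with `len(all_counts)` (1 and 2 here) is a quick way to detect departments lost to missing coordinates.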
- Here I ran into a problem that took me several hours: the Dep_Lat and Dep_Lng columns (latitude and longitude of each department) had lost their decimal point, so the values looked as if they carried thousands and millions separators. For example:
For Amazonas the values returned were -42.152.778 and -699.405.556, when its correct coordinates are -4.215278 and -69.940556.
- I tried to fix the values in the notebook with pandas but did not succeed. Since there were only 32 records, I exported the DataFrame to .CSV and corrected it in an Excel spreadsheet. (Probably not good practice, but I will find the right way to do it later ☹.)
map_corr_deptos.to_csv("map_corr_deptos_error.csv")
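For reference, here is one way the coordinates might be repaired directly in pandas instead of Excel. It is a sketch under a loud assumption: that only the decimal point was lost and every original value had exactly seven decimal places, which matches the Amazonas example (-42.152.778 becomes -4.2152778, -699.405.556 becomes -69.9405556). If some rows had a different number of decimals, this would scale them wrongly, so the results should be checked against known coordinates. The helper `fix_coordinate` is hypothetical:

```python
import pandas as pd

# Assumption: every original coordinate had exactly 7 decimal places and
# only the decimal point was lost (as in the Amazonas example).
DECIMALS = 7


def fix_coordinate(value) -> float:
    """Rebuild a coordinate, e.g. '-42.152.778' or -42152778 -> -4.2152778."""
    if isinstance(value, float) and value.is_integer():
        value = int(value)  # avoid a trailing '.0' adding an extra digit
    text = str(value).strip()
    digits = "".join(ch for ch in text if ch.isdigit())
    magnitude = int(digits) / 10 ** DECIMALS
    return -magnitude if text.startswith("-") else magnitude


# The Amazonas row as described in the text
sample = pd.DataFrame({
    "Departamento": ["AMAZONAS"],
    "Dep_Lat": ["-42.152.778"],
    "Dep_Lng": ["-699.405.556"],
})
for col in ["Dep_Lat", "Dep_Lng"]:
    sample[col] = sample[col].map(fix_coordinate)

print(sample)
```

The function accepts both the dotted strings and plain integers, since it is not known here how pandas originally parsed the broken column.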
- Then I loaded the CSV with the corrected latitude and longitude values again (the parameter sep=';' tells pandas that the columns are separated by semicolons):
map_corr_deptos_ok = pd.read_csv('data/map_corr_deptos_ok.csv', sep=';')
map_corr_deptos_ok.head()
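The `sep` argument tells `read_csv` which character separates the columns. Spreadsheets configured for Spanish often save CSV files with a semicolon as separator and a comma as decimal mark; in that case `read_csv` also accepts `decimal=','`. Whether the corrected file actually uses comma decimals is an assumption here, so the sketch reads a hypothetical in-memory sample to show both parameters:

```python
import io

import pandas as pd

# Hypothetical stand-in for a semicolon-separated file with comma decimals
csv_text = """Departamento;Dep_Lat;Dep_Lng
AMAZONAS;-4,215278;-69,940556
"""

df = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

print(df.dtypes)  # Dep_Lat and Dep_Lng are parsed as float64
print(df)
```

Without `decimal=','` the two coordinate columns would be read as text, which is another common source of broken maps.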

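- We import plotly.io to list the available templates and generate our first visualization:
import plotly.io as pio
print(pio.templates)  # available templates, e.g. 'plotly_dark'
# The count column produced by the groupby keeps the name 'tipo_corrupcion'
fig = px.bar(map_corr_deptos_ok, x='Departamento', y='tipo_corrupcion',
             color='tipo_corrupcion',
             template='plotly_dark',
             labels={'tipo_corrupcion': 'Cantidad de casos'},
             title='Cantidad de casos de corrupción por departamento',
             height=400)
fig.show()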

- I renamed some departments, shortening long names and using the same spelling as the election results so that the two tables can be merged later:
map_corr_deptos_ok['Departamento'] = map_corr_deptos_ok['Departamento'].replace({
    'BOGOTÁ, DISTRITO CAPITAL': 'BOGOTA D.C.',
    'GUAJIRA': 'LA GUAJIRA',
    'NORTE SANTANDER': 'NORTE DE SAN',
    'SAN ANDRES, PROV.': 'SAN ANDRES',
})
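Because department names must match exactly when the two tables are merged, it can help to normalize both sides (uppercase, no accents, no surrounding spaces) and list the names that still differ. A sketch using only the standard `unicodedata` module; `normalize_name` is a hypothetical helper, and the "election" spellings are assumed from the replacements above rather than read from the Registraduría file:

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Uppercase, strip accents and surrounding spaces: 'Bogotá ' -> 'BOGOTA'."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.strip().upper()


# Names as they appear in the corruption data vs. assumed election spellings
corruption_names = ["BOGOTÁ, DISTRITO CAPITAL", "GUAJIRA", "NORTE SANTANDER", "ANTIOQUIA"]
election_names = ["BOGOTA D.C.", "LA GUAJIRA", "NORTE DE SAN", "ANTIOQUIA"]

unmatched = sorted(
    set(map(normalize_name, corruption_names)) - set(map(normalize_name, election_names))
)
print(unmatched)
```

Normalization removes accent and case differences, and whatever remains in `unmatched` (here the Bogotá, Guajira and Norte de Santander spellings) still needs an explicit mapping.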
- I used px.scatter_mapbox to plot the data on a map of Colombia:
fig = px.scatter_mapbox(map_corr_deptos_ok,
                        lat='Dep_Lat',
                        lon='Dep_Lng',
                        color='Departamento',
                        size='tipo_corrupcion',  # bubble size = number of cases
                        color_continuous_scale=px.colors.cyclical.IceFire,  # only used when color is numeric
                        #size_max=5,
                        zoom=4,
                        center=dict(lat=4.570868, lon=-74.297333),  # approximate center of Colombia
                        height=600)
fig.show()

- And another display mode, applying a different Mapbox style:
fig = go.Figure(go.Scattermapbox(
    lon=map_corr_deptos_ok.Dep_Lng,
    lat=map_corr_deptos_ok.Dep_Lat,
    mode='markers+text',
    text=map_corr_deptos_ok['Departamento'],  # label drawn next to each marker
    marker=go.scattermapbox.Marker(
        size=map_corr_deptos_ok.tipo_corrupcion,
        color=map_corr_deptos_ok.tipo_corrupcion,
        #colorscale='Edge',
        showscale=True,
        sizemode='area',
        opacity=0.8,
    ),
    hoverinfo='text',
    # '<br>' starts a new line in the hover label
    hovertext=(
        '<b>Departamento</b>: ' + map_corr_deptos_ok['Departamento'].astype(str) + '<br>' +
        '<b>Cantidad de actos</b>: ' + map_corr_deptos_ok['tipo_corrupcion'].astype(str)
    ),
))
fig.update_layout(
    hovermode='closest',
    margin=dict(r=0, l=0, b=0, t=0),
    mapbox=dict(
        accesstoken=token_map_plot,
        style='dark',  # Mapbox style, requires the access token
        zoom=4.5,
        center=dict(lat=4.570868, lon=-74.297333),
    ),
    showlegend=True,
    autosize=True,
)
fig.show()

Visualization of the electoral map in the first round of the presidential election in Colombia 2022 🗳
- We import the data from our CSV file:
df_elecciones = pd.read_csv('data/resultados_primera_vuelta_2022_boletin_68.csv')
- We analyze its content in column names (.columns), amount of data (.shape) and data types (.dtypes).
- With the df_elecciones DataFrame we can look at votes inside and outside Colombia. For example, these were the results in France 🇫🇷 for the top four candidates, where True = votes cast in France and False = all other votes:
# Select several columns with a list (double brackets); a bare tuple raises an error in pandas 2.x
df_elecciones_francia = df_elecciones.groupby([df_elecciones.mun == 'FRANCIA'])[
    ['FEDERICO_GUTIÉRREZ_vot', 'RODOLFO_HERNÁNDEZ_vot', 'GUSTAVO_PETRO_vot', 'SERGIO_FAJARDO_vot']
].sum().reset_index()
df_elecciones_francia.head()

df_elecciones_francia.plot(kind='bar')
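Grouping by a boolean condition such as `df_elecciones.mun == 'FRANCIA'` produces exactly two groups: True for the rows of that municipality and False for every other row, so the False row is the rest of the results, not the grand total. A sketch with a tiny made-up table (placeholder numbers, not real election results) shows this, and that both rows add up to the overall total:

```python
import pandas as pd

# Placeholder vote counts, NOT real election results
votes = pd.DataFrame({
    "mun": ["FRANCIA", "BOGOTA D.C.", "MEDELLIN"],
    "GUSTAVO_PETRO_vot": [3, 5, 1],
    "RODOLFO_HERNÁNDEZ_vot": [1, 2, 4],
})
cols = ["GUSTAVO_PETRO_vot", "RODOLFO_HERNÁNDEZ_vot"]

# Index is the boolean key: True = France, False = everything else
by_france = votes.groupby(votes.mun == "FRANCIA")[cols].sum()

print(by_france)
print(by_france.sum())  # False + True = totals over all rows
```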

- Using .groupby, I built a new DataFrame with the vote totals of the four most-voted candidates in each department:
df_elecciones_deptos = df_elecciones.groupby(['dpto'])[
    ['FEDERICO_GUTIÉRREZ_vot', 'RODOLFO_HERNÁNDEZ_vot', 'GUSTAVO_PETRO_vot', 'SERGIO_FAJARDO_vot']
].sum().reset_index()

- I made a bar chart with pandas' built-in plotting (matplotlib), although it still needs more formatting:
df_elecciones_deptos.plot(x='dpto', kind='bar')  # department names on the x axis

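- Here are the first bar charts of each candidate's votes by department (one chart per candidate column):
candidate_cols = ['FEDERICO_GUTIÉRREZ_vot', 'RODOLFO_HERNÁNDEZ_vot', 'GUSTAVO_PETRO_vot', 'SERGIO_FAJARDO_vot']
for candidate in candidate_cols:
    fig = px.bar(df_elecciones_deptos, x='dpto', y=candidate,
                 color='dpto',
                 template='plotly_dark',
                 labels={candidate: 'Cantidad de votos'},
                 title=f'Cantidad de votos por departamento: {candidate}',
                 height=400)
    fig.show()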
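Instead of one chart per candidate column, the four vote columns can be reshaped into long format with `DataFrame.melt`, so a single chart can compare all candidates per department (for example `px.bar(long_votes, x='dpto', y='votos', color='candidato', barmode='group')`). A pandas-only sketch with placeholder numbers; the column names `candidato` and `votos` are my choice:

```python
import pandas as pd

# Placeholder vote counts per department, NOT real results
deptos = pd.DataFrame({
    "dpto": ["ANTIOQUIA", "SANTANDER"],
    "FEDERICO_GUTIÉRREZ_vot": [10, 2],
    "RODOLFO_HERNÁNDEZ_vot": [4, 9],
    "GUSTAVO_PETRO_vot": [6, 3],
    "SERGIO_FAJARDO_vot": [1, 1],
})

# One row per (department, candidate) pair
long_votes = deptos.melt(id_vars="dpto", var_name="candidato", value_name="votos")

# Drop the '_vot' suffix so the legend shows just the candidate
long_votes["candidato"] = long_votes["candidato"].str.replace(r"_vot$", "", regex=True)

print(long_votes)
```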




- Finally, to see the voting results in the departments with the most corruption cases, I combined the two DataFrames (df_elecciones_deptos and map_corr_deptos_ok) with .merge into a single DataFrame, df_elecciones_casos, which adds the geographic coordinates and the number of corruption cases to each department's vote totals.
df_elecciones_casos = df_elecciones_deptos.merge(map_corr_deptos_ok, left_on='dpto', right_on='Departamento', how='left')
df_elecciones_casos
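- Checking the data types of the new DataFrame, we see that the tipo_corrupcion column changed to float64: the left merge fills rows with no match in the corruption data with NaN, which a regular integer column cannot hold. To get an integer type, I use pandas' nullable Int64 type, which keeps the missing values as <NA>:
df_elecciones_casos.tipo_corrupcion = df_elecciones_casos.tipo_corrupcion.astype('Int64')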
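To find out which rows failed to match in a left merge, `merge` accepts `indicator=True`, which adds a `_merge` column with the values `both`, `left_only` or `right_only`. A sketch with made-up rows: "EXTERIOR" is a hypothetical label for votes cast abroad, the vote numbers are placeholders, and 76 is Antioquia's count from the report:

```python
import pandas as pd

elecciones = pd.DataFrame({
    "dpto": ["ANTIOQUIA", "EXTERIOR"],  # "EXTERIOR" is a hypothetical label
    "GUSTAVO_PETRO_vot": [1, 2],  # placeholder numbers, not real results
})
casos = pd.DataFrame({
    "Departamento": ["ANTIOQUIA"],
    "tipo_corrupcion": [76],  # Antioquia's count from the report
})

merged = elecciones.merge(
    casos, left_on="dpto", right_on="Departamento", how="left", indicator=True
)
print(merged[["dpto", "tipo_corrupcion", "_merge"]])
print(merged["tipo_corrupcion"].dtype)  # float64: the unmatched row holds NaN

# Rows present only in the election data: candidates for renaming
print(merged.loc[merged["_merge"] == "left_only", "dpto"].tolist())

# Nullable integer type: 76 stays an integer, the gap becomes <NA>
merged["tipo_corrupcion"] = merged["tipo_corrupcion"].astype("Int64")
print(merged["tipo_corrupcion"])
```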
- Now we have our Dataframe ready to be exported as CSV and continue testing some possibilities in Tableau Public.
df_elecciones_casos.to_csv("data/df_elecciones_casos_filtrado.csv")
Tableau Public

With the unified data in .CSV format, I used Tableau Public to build a dashboard that gathers several visualizations supporting the analysis.
📌 You can access the public dashboard at this link.
Analysis and conclusions

Although corruption cases were identified in all 32 departments and in Bogotá, the Capital District, it is worth highlighting that 53% are concentrated in just five territories: Bogotá, Atlántico, Antioquia, Santander and Valle del Cauca.
The 200 cases concentrated in Bogotá highlight the centralization of institutions, public agencies and social and economic actors in the capital, as well as the greater ease of reporting there: the report points to the guarantees of freedom of expression and the presence of media outlets. Conversely, the low level of reporting in some regions does not mean that corruption is absent, but that investigative journalism cannot be exercised freely, compounded by threats, terrorism and corruption itself.
Crossing the corruption data by department with the first-round results simply confirms each department's preferences among the candidates. Gustavo Petro won in Bogotá, Atlántico and Valle del Cauca, Federico Gutiérrez won in Antioquia, and Rodolfo Hernández won in Santander.
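The winner in each department can be read directly from the grouped vote columns with `idxmax(axis=1)`, which returns, for each row, the name of the column holding the largest value. A sketch with hypothetical departments and placeholder numbers (not real results); on the real data it would be applied to the candidate columns of df_elecciones_deptos:

```python
import pandas as pd

candidate_cols = [
    "FEDERICO_GUTIÉRREZ_vot",
    "RODOLFO_HERNÁNDEZ_vot",
    "GUSTAVO_PETRO_vot",
    "SERGIO_FAJARDO_vot",
]

# Hypothetical departments with placeholder numbers, NOT real results
deptos = pd.DataFrame({
    "dpto": ["DEPTO_1", "DEPTO_2"],
    "FEDERICO_GUTIÉRREZ_vot": [5, 1],
    "RODOLFO_HERNÁNDEZ_vot": [1, 2],
    "GUSTAVO_PETRO_vot": [2, 7],
    "SERGIO_FAJARDO_vot": [0, 3],
})

# Column with the most votes in each row (the first one, in case of a tie),
# without the '_vot' suffix
deptos["winner"] = (
    deptos[candidate_cols].idxmax(axis=1).str.replace(r"_vot$", "", regex=True)
)
print(deptos[["dpto", "winner"]])
```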

Without attempting a political analysis, the results show a strong desire for decisive change in the way Colombia is governed, along with heavy polarization around the two candidates who advanced to the second round on June 19, 2022: Gustavo Petro and Rodolfo Hernández.
Many continue to call on NON-VOTERS to take an active part in this decision, choosing a side based on arguments and knowledge of the proposals rather than memes or fake news. Abstention out of indifference hurts all the more because everything can improve if we agree to pursue the same goal: completely eradicating the corruption that hurts us so much as a people.
Through data I found a way to tell visual stories. The learning curve is very steep and sometimes frustrating: I run into problems all the time, I post and ask for help (sometimes it comes and sometimes it doesn't), and the solution turns up sometimes on Stack Overflow, sometimes on YouTube or in a Medium tutorial. But I confess I love it, and I would like to learn to do it better.
I completed this project four days before the second round of the presidential election. All of us who have the opportunity to vote should do so. One vote makes the difference.

See the Python code for the whole project on GitHub
References
- https://www.elespectador.com/politica/elecciones-colombia-2022/la-corrupcion-sigue-siendo-la-mayor-preocupacion-de-los-colombianos-invamer/
- https://transparenciacolombia.org.co/2022/01/25/indice-de-percepcion-de-la-corrupcion-2021/