In recent years, the choice of genderless names has become increasingly common in different cultures. These names make it possible to challenge traditional conventions and promote gender equality. In this publication, we will explore the genderless names used in births in the city of Nantes, France, from 2001 to 2022. We will analyze a database from data.nantesmetropole.fr containing the names of boys and girls born in that period, identifying those names that do not have a specific gender assignment. Join us on this journey through the diversity of names and discover which genderless names have been popular in Nantes for more than two decades.
ETL analysis:
Using a dataset of first names given to newborns in Nantes, we build an ETL (Extract, Transform, Load) process in Python to analyze and visualize the data, which is provided in .CSV format. With the Pandas library we capture the basic information about the dataset: the types and number of variables, the number of records, and a check for missing data.
We will use the pandas, matplotlib and plotly libraries for the analysis and visualization. Make sure these libraries are installed before running the code. You can install them with the pip package manager, for example by running pip install pandas matplotlib plotly in your environment.
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
# Data extraction
data = pd.read_csv('nantes_prenoms.csv')
Initial exploration of the dataset:
After importing the data, we extract its basic information with Pandas: the record count, the number of columns and the variable types.
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6037 entries, 0 to 6036
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Commune concernée 6037 non-null object
1 Code INSEE 6037 non-null int64
2 Sexe relatif au prénom 6037 non-null object
3 Prénom 6037 non-null object
4 Nombre d'occurrences 6037 non-null int64
5 Année 6037 non-null int64
dtypes: int64(3), object(3)
memory usage: 283.1+ KB
None
We rename the columns to be used to eliminate accents, spaces and unify the use of upper and lower case:
# Data transformation
data = data.rename(columns={'Sexe relatif au prénom':'sexe','Prénom':'prenom',"Nombre d'occurrences":'naissances', 'Année':'annee'})
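The renaming above is done by hand for four columns. As a minimal sketch of how the same clean-up (no accents, no spaces, lower case) could be automated, the hypothetical helper normalize_column below is not part of the original analysis and produces longer names than the hand-picked ones; the toy frame only reuses the column labels reported by data.info().

```python
import unicodedata

import pandas as pd


def normalize_column(name: str) -> str:
    """Lower-case a label, strip accents, replace spaces and apostrophes with underscores."""
    # Decompose accented characters (é -> e + combining accent), then drop the accents
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.lower().replace("'", "_").replace(" ", "_")


# Empty toy frame (assumption: same labels as the Nantes file, no data needed here)
toy = pd.DataFrame(columns=["Sexe relatif au prénom", "Prénom", "Nombre d'occurrences", "Année"])
toy = toy.rename(columns=normalize_column)
print(list(toy.columns))
# ['sexe_relatif_au_prenom', 'prenom', 'nombre_d_occurrences', 'annee']
```

The explicit mapping used in this article stays preferable when short, hand-chosen names such as sexe or naissances are wanted.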
We validate if the dataframe contains null data:
data.isnull().sum()
Commune concernée 0
Code INSEE 0
sexe 0
prenom 0
naissances 0
annee 0
dtype: int64
Discovering genderless names in Nantes:
From 2001 to 2022, several names were registered in Nantes that are not associated with a single gender. Some of these genderless names have become popular choices, possibly among parents who prefer a name that is not tied to one gender.
Through grouping, filtering, and final counts, we identified the names that were registered under both the female and the male sex. We can presume that these names are chosen by parents looking to break away from traditional norms and allow their children to freely choose their gender identity in the future.
# Data analysis and visualization
# Keep the names registered under both sexes (male and female)
names_without_gender = data.groupby('prenom').filter(lambda x: x['sexe'].nunique() == 2)
# Sum the births for each genderless name
names_count = names_without_gender.groupby('prenom')['naissances'].sum().reset_index()
# Bar chart of names without gender
fig_names_without_gender = px.bar(names_count, x='prenom', y='naissances', title='Names Without Gender Used')
fig_names_without_gender.update_layout(xaxis_title='Names', yaxis_title='Number of Births')
fig_names_without_gender.show()
In this code, we first read the CSV file and store the data in the dataframe data. Then, we keep the names that appear under both the male and the female sex using the groupby() function along with filter(). Next, we sum the births of each genderless name using groupby() and sum(). Finally, we create a bar chart using px.bar() from Plotly Express, where the x-axis shows the names and the y-axis shows the number of births associated with each name, which highlights the diversity of genderless names used in the city.
# Names used in both sexes
common_names_both_genders2 = data.groupby('prenom')['sexe'].nunique()
common_names_both_genders2 = common_names_both_genders2[common_names_both_genders2 == 2].index
common_names_both_genders2
9 results: ['Alix', 'Camille', 'Charlie', 'Eden', 'Lou', 'Louison', 'Noa', 'Sasha', 'Swann']
Having achieved our initial objective of identifying the 9 names used for both sexes in Nantes, we can resume the exploration of the dataframe and answer some basic questions that will help us in our analysis.
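The two ways of finding names registered under both sexes, groupby().filter() and groupby()['sexe'].nunique(), can be compared on a small example. This is only a sketch: the toy frame and its numbers are invented for illustration and are not taken from the Nantes file.

```python
import pandas as pd

# Invented rows, one per (name, sex), using the renamed columns of this article
toy = pd.DataFrame({
    "prenom": ["Camille", "Camille", "Louise", "Arthur", "Charlie", "Charlie"],
    "sexe": ["F", "M", "F", "M", "F", "M"],
    "naissances": [40, 12, 55, 50, 6, 9],
})

# Approach 1: keep every row of the names that were seen under two sexes
rows_both = toy.groupby("prenom").filter(lambda g: g["sexe"].nunique() == 2)

# Approach 2: count distinct sexes per name, keep the names where the count is 2
sexes_per_name = toy.groupby("prenom")["sexe"].nunique()
names_both = sexes_per_name[sexes_per_name == 2].index.tolist()

print(names_both)  # ['Camille', 'Charlie']
print(sorted(rows_both["prenom"].unique().tolist()))  # ['Camille', 'Charlie']
```

The first approach returns the full rows, which is convenient for summing births afterwards; the second returns only the list of names.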
What is the total number of records?
The dataframe has 6037 records between 2001 and 2022; each record gives the number of births for one name, sex and year.
data.shape
(6037, 6)
How many unique names are there?
The dataframe has 731 unique names used.
data['prenom'].nunique()
731
If you want to see the complete list run:
data['prenom'].unique()
We visualized with Plotly the number of unique names per year, with 2003 being the lowest (183 names) and 2022 (pandemic year) reaching 316 names.
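The code behind the per-year count is not shown here. A minimal sketch of how the number of unique names per year could be computed, assuming the renamed columns prenom and annee, is given below on invented toy data.

```python
import pandas as pd

# Invented rows; "Camille" appears twice in 2001 (once per sex) but counts once
toy = pd.DataFrame({
    "prenom": ["Camille", "Camille", "Louise", "Arthur", "Louise", "Noa"],
    "sexe": ["F", "M", "F", "M", "F", "M"],
    "annee": [2001, 2001, 2001, 2002, 2002, 2002],
})

# Number of distinct names registered in each year
unique_per_year = toy.groupby("annee")["prenom"].nunique()

print(unique_per_year.to_dict())  # {2001: 2, 2002: 3}
print(unique_per_year.idxmin())   # 2001, the year with the fewest distinct names
```

On the real dataframe, the resulting series could be passed to px.bar(x=unique_per_year.index, y=unique_per_year.values) to draw the chart.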
What is the total distribution by gender?
Between 2001 and 2022, the dataset for the city of Nantes contains more records for boys' names (3110) than for girls' names (2927).
data['sexe'].value_counts()
M 3110
F 2927
Name: sexe, dtype: int64
We can generate a pie chart as a visualization:
gender_distribution = data['sexe'].value_counts()
fig_gender = px.pie(names=gender_distribution.index, values=gender_distribution.values, title='Gender distribution of children')
fig_gender.show()
Display the gender distribution for each year:
fig_gender_year = px.histogram(data, x='annee', color='sexe', text_auto=True, title='Gender distribution by year')
fig_gender_year.show()
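Note that value_counts() counts rows of the dataframe, that is, one per name, sex and year, not babies. As a hedged sketch, assuming the naissances column holds the number of births per row, the distribution of actual births by sex could be obtained by summing that column instead. The toy values are invented.

```python
import pandas as pd

toy = pd.DataFrame({
    "prenom": ["Camille", "Camille", "Louise", "Arthur"],
    "sexe": ["F", "M", "F", "M"],
    "naissances": [40, 12, 55, 50],
})

# Rows per sex: what value_counts() reports
records_by_sex = toy["sexe"].value_counts()

# Births per sex: the sum of the naissances column within each sex
births_by_sex = toy.groupby("sexe")["naissances"].sum()

print(records_by_sex.to_dict())  # {'F': 2, 'M': 2}
print(births_by_sex.to_dict())   # {'F': 95, 'M': 62}
```

The two views can differ: here both sexes have the same number of rows but different numbers of births.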
What are the most used names in the dataset?
Of the 731 unique names present in the dataframe, we filtered the top 10 according to the number of births. We also visualized them by gender for our main purpose of identifying genderless names, highlighting “Camille”, a genderless name ranked second with 1147 births (890 F and 257 M), after “Louise”, a female name ranked first with 1158 births.
# Group by name and gender and count the number of occurrences.
names_count = data.groupby(['prenom', 'sexe'])['naissances'].sum().reset_index()
# Filter only the most used names
top_names = names_count.groupby('prenom')['naissances'].sum().nlargest(10).index
# Filter data only for the most used names
top_names_data = names_count[names_count['prenom'].isin(top_names)]
prenom sexe naissances
80 Arthur M 1143
121 Camille F 890
122 Camille M 257
211 Emma F 1102
281 Hugo M 1024
327 Jules M 1037
402 Louis M 1104
404 Louise F 1158
415 Lucas M 1122
436 Léo M 977
452 Manon F 975
# Bar chart differentiated by color according to sex
fig_top_names = px.bar(top_names_data, x='prenom', y='naissances', color='sexe', text_auto=True, title='Most used first names (by sex)')
fig_top_names.update_layout(xaxis_title='First names', yaxis_title='Number of births')
fig_top_names.show()
What are the most used names for each year?
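We can visualize the name that ranks first in each year of the dataframe. Because value_counts() counts rows rather than summing naissances, a name registered under both sexes in the same year has two rows and comes out on top: “Camille” leads in 6 different years, followed by “Charlie” (5 years), “Louison” (4 years) and “Noa” (3 years).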
popular_names = data.groupby('annee')['prenom'].agg(lambda x: x.value_counts().index[0])
fig_popular_names = px.bar(x=popular_names.index, y=popular_names.values, text_auto=True, color=popular_names, color_continuous_scale = 'viridis', title='Most popular first name per year')
fig_popular_names.show()
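The ranking above is based on how many rows a name has in a given year. As an alternative sketch, not the method used in this article, the name with the most births per year could be found by summing naissances first. The toy data is invented, and it assumes one row per name, sex and year.

```python
import pandas as pd

toy = pd.DataFrame({
    "prenom": ["Camille", "Camille", "Louise", "Arthur", "Louise", "Charlie", "Charlie"],
    "sexe": ["F", "M", "F", "M", "F", "F", "M"],
    "naissances": [10, 5, 30, 25, 20, 2, 3],
    "annee": [2001, 2001, 2001, 2002, 2002, 2002, 2002],
})

# Row-count ranking, as in the article: names with two rows (F and M) win
by_rows = toy.groupby("annee")["prenom"].agg(lambda s: s.value_counts().index[0])

# Births-weighted ranking: total births per (year, name), then the largest per year
births = toy.groupby(["annee", "prenom"], as_index=False)["naissances"].sum()
top = births.loc[births.groupby("annee")["naissances"].idxmax()]
by_births = dict(zip(top["annee"], top["prenom"]))

print(by_rows.to_dict())  # {2001: 'Camille', 2002: 'Charlie'}
print(by_births)          # {2001: 'Louise', 2002: 'Arthur'}
```

Which ranking is more useful depends on the question: the first highlights names shared by both sexes, the second highlights the names given to the most children.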
Percentage of names with gender over time.
Using a stacked bar chart we show the percentage of names by gender over time, using blue for male gender “M” and orange for female gender “F”:
In this code, after calculating the percentage of first names by gender and year, we assign colors to the genders using color mapping. Next, we create a stacked bar chart using go.Bar() from Plotly Graph Objects. We iterate over the genders and add a bar trace for each one with its own color. Finally, we update the graph layout with a title and axis titles, and display the graph generated by Plotly.
Run the code and you can interact with the stacked bar chart showing the percentage of first names by gender over time.
import plotly.graph_objects as go
# Calculate percentage of first names by gender and year
counts_by_year_sex = data.groupby(['annee', 'sexe']).size()
percentage_data = counts_by_year_sex.div(data.groupby('annee').size(), level='annee') * 100
percentage_data = percentage_data.reset_index(name='percentage')
# Assign colors to genders
color_map = {'M': 'blue', 'F': 'orange'}
percentage_data['Color'] = percentage_data['sexe'].map(color_map)
# Create a stacked bar chart of the percentage of first names by gender over time
fig = go.Figure()
for sex in ['M', 'F']:
    # One trace per gender, restricted to the rows of that gender
    sex_rows = percentage_data[percentage_data['sexe'] == sex]
    fig.add_trace(go.Bar(
        x=sex_rows['annee'],
        y=sex_rows['percentage'],
        name=sex,
        marker_color=color_map[sex]
    ))
fig.update_layout(
    title='Percentage of names by gender in Nantes (stacked)',
    xaxis_title='Year',
    yaxis_title='Percentage',
    barmode='stack'
)
fig.show()
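Before plotting, it can help to check that the percentages add up to 100 for every year. The sketch below does this on invented toy data with pandas only; the column names annee and sexe follow the renaming used in this article, and the percentages are shares of records, not of births.

```python
import pandas as pd

toy = pd.DataFrame({
    "prenom": ["Camille", "Camille", "Louise", "Arthur", "Louise"],
    "sexe": ["F", "M", "F", "M", "F"],
    "annee": [2001, 2001, 2001, 2002, 2002],
})

# Records per (year, sex), divided by the total number of records of that year
counts = toy.groupby(["annee", "sexe"]).size()
share = counts / counts.groupby(level="annee").transform("sum") * 100

# One row per year, one column per sex
table = share.unstack("sexe").round(1)
print(table)

# Sanity check: each year sums to 100 percent
totals = share.groupby(level="annee").sum()
print(totals.round(6).to_dict())  # {2001: 100.0, 2002: 100.0}
```

A check like this catches mismatched column names early, since a wrong key raises an error instead of silently producing an empty chart.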
Genderless naming trend for the coming years
Thanks to a comment from Line Ton That, who wondered about Camille’s impact on all this, whether the name would stop being so fashionable in the coming years, and whether genderless names in general would follow the same trend: I used the scikit-learn library (sklearn) to calculate the trend of names appearing in both genders from the total number of births by name and year. I then visualized the trend of these names using a bar chart with the Plotly Express (px) library.
Here we explain the code execution flow step by step:
from sklearn.linear_model import LinearRegression
# Filter the names appearing in both sexes
both_sex_names = data.groupby('prenom').filter(lambda x: len(x['sexe'].unique()) == 2)
# Calculate total number of births by name and year
births_by_name_and_year = both_sex_names.groupby(['prenom', 'annee'])['naissances'].sum().reset_index()
# Calculate the trend (slope of births per year) for each name
trend_rows = []
for name in both_sex_names['prenom'].unique():
    data_name = births_by_name_and_year[births_by_name_and_year['prenom'] == name]
    # Use linear regression to calculate the trend
    X = data_name['annee'].values.reshape(-1, 1)
    y = data_name['naissances'].values
    model = LinearRegression()
    model.fit(X, y)
    trend = model.coef_[0]
    trend_rows.append({'prenom': name, 'trend': trend})
# Create a DataFrame with the trends (DataFrame.append was removed in pandas 2.0)
future_trends = pd.DataFrame(trend_rows, columns=['prenom', 'trend'])
# Sort names by descending trend
future_trends = future_trends.sort_values(by='trend', ascending=False)
# Display the trend of names that use both genders
fig = px.bar(future_trends, x='prenom', y='trend', labels={'trend': 'Trend'},
             title='Trend of Names Using Both Genders in the Coming Years')
fig.update_layout(xaxis_title='Name', yaxis_title='Trend')
fig.show()
In short, this code keeps only the names that appear in both sexes, calculates the trend of each one using linear regression, and visualizes the trends using a bar chart. This allows us to obtain information about the names that have an increasing or decreasing trend in terms of their popularity over the years.
After running the linear regression we observe a decreasing trend for the name “Camille” for the next few years, and an increase for the name “Charlie” versus the other non-gendered names.
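The trend used here is the slope of a straight line fitted to births against year: a positive slope means more births per year over time, a negative one fewer. As a sketch of that idea without scikit-learn, the hypothetical helper yearly_slope below fits the same ordinary least-squares line with numpy.polyfit; the birth counts are invented.

```python
import numpy as np


def yearly_slope(years, births):
    """Return the least-squares slope of births per year (hypothetical helper)."""
    years = np.asarray(years, dtype=float)
    births = np.asarray(births, dtype=float)
    # Centering the years keeps the fit numerically well behaved; the slope is unchanged
    slope, _intercept = np.polyfit(years - years.mean(), births, deg=1)
    return float(slope)


# Invented series: one name gaining 2 births per year, another losing 3 per year
rising = yearly_slope([2018, 2019, 2020, 2021, 2022], [10, 12, 14, 16, 18])
falling = yearly_slope([2018, 2019, 2020, 2021, 2022], [30, 27, 24, 21, 18])

print(round(rising, 6), round(falling, 6))  # 2.0 -3.0
```

With at most 22 yearly points per name, such a slope describes the past direction of a name rather than providing a reliable forecast.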
Conclusion:
The analysis of genderless names used in Nantes births shows us the evolution and acceptance of gender diversity in society. Parents are opting for names that are not restricted to a single gender identity, giving their children the freedom to explore and define their own identity in the future. This trend reflects a greater openness and acceptance of diversity in society, as well as a desire to foster gender equality from the beginning of life.
Interestingly, when comparing the most popular names by year, 5 of the 9 genderless names found at the beginning of the exploration are present (Camille, Charlie, Louison, Noa and Sasha). This suggests that some parents deliberately choose names that work for both sons and daughters, either as a trend or as an adaptive phenomenon for the new generations.
The name “Camille” stands out as the first choice among non-gendered names, and it also ranks near the top of the general ranking of boys and girls born in Nantes over the last two decades.
Surely these results will change according to the data of the city and the country, so if anyone would like to share their results to compare them with those of Nantes, do not hesitate to contact us to continue learning in a collaborative way.
References:
- Nantes births data (2001-2022): Open Data Nantes Métropole, data.nantesmetropole.fr
- Plotly Express: official documentation, https://plotly.com/python/plotly-express
- Photo by Rene Asmussen from Pexels