Recently, I participated in a job interview to join the Dataviz – BI team of a large company. During the conversation, I was asked the use case of how they would operate the data of the operation of a restaurant, how to structure, analyze and visualize them. I think my solution was too practical and not technical, fortunately my interviewer guided me to put the explanation on track to a solution that would contribute to the business. Therefore, I decided to create a project that involves the creation of a Data Warehouse using the star schema model and data analysis using SQL and Power BI. Here I tell you the steps I followed to complete this project, from the design of the star schema to the creation of an interactive dashboard in Power BI.
Introduction
The project seeks to design a Data Warehouse for a restaurant using the star schema of dimensional modeling. This schema will facilitate the manipulation and analysis of operational data from the restaurant, allowing the generation of reports and performance analysis using SQL queries at first. We will then bring the data into Power BI to create a dashboard for analysis and distribution to other stakeholders.
Project Objectives
- Design a star schema for the restaurant’s data warehouse.
- Implement the necessary tables in a SQL database.
- Populate the tables with simulated data.
- Perform SQL queries to analyze various aspects of the business.
- Export the database in .csv format to be imported into Power BI.
- Create a dashboard in Power BI for analysis.
- Conclusions.
Star Scheme Design
The star schema is a dimensional modeling technique used in the construction of data warehouses. It is one of the most popular methods due to its simplicity and efficiency in organizing data for analysis and queries.
The first step of the project was to design a star schema for the restaurant’s data warehouse. A star schema is characterized by having a central table called “fact table” that contains the transactional data, and several “dimension tables” that contain the related descriptive attributes.
For this project, I designed the following scheme:
- Fact Table: Sales_Fact
- Sales_ID (primary key)
- Customer_ID (foreign key)
- Product_ID (foreign key)
- Location_ID (foreign key)
- Date_ID (foreign key)
- Location_ID (foreign key)
- Quantity_Sold
- Total_Amount
- Cost
- Dimension Tables:
- Customer_Dim
- Customer_ID (primary key)
- Customer_Name
- Customer_Email
- Product_Dim
- Product_ID (primary key)
- Product_Name
- Product_Category
- Location_Dim
- Location_ID (primary key)
- Location_Name
- Location_City
- Date_Dim (primary key)
- Date_ID (primary key)
- Date
- Day
- Month
- Year
- Customer_Dim
Advantages of the Star Scheme
- Simplicity: The structure is easy to understand and navigate, both for developers and end users.
- Query Performance: Queries in a star schema are usually fast because the relationships between the fact table and dimension tables are simplified.
- Scalability: It is easy to expand by adding more dimensions or facts without reorganizing the entire structure.
Disadvantages of the Star Scheme
- Data Redundancy: Dimension tables may contain redundant data, which can increase storage space.
- Maintenance: Redundancy and data duplication can make maintenance more complicated.
Implementation of Tables in SQL
After designing the star schema, I proceeded to implement the tables in a SQL database. I used PostgreSQL for this purpose. Here is the SQL code to create the tables:
-- Create fact table
CREATE TABLE Sales_Fact (
Sales_ID SERIAL PRIMARY KEY,
Customer_ID INT,
Product_ID INT,
Location_ID INT,
Date_ID INT,
Quantity_Sold INT,
Total_Amount DECIMAL(10, 2),
Cost DECIMAL(10, 2)
);
-- Create dimensiones table
CREATE TABLE Customer_Dim (
Customer_ID SERIAL PRIMARY KEY,
Customer_Name VARCHAR(100),
Customer_Email VARCHAR(100)
);
CREATE TABLE Product_Dim (
Product_ID SERIAL PRIMARY KEY,
Product_Name VARCHAR(100),
Product_Category VARCHAR(100)
);
CREATE TABLE Location_Dim (
Location_ID SERIAL PRIMARY KEY,
Location_Name VARCHAR(100),
Location_City VARCHAR(100)
);
CREATE TABLE Date_Dim (
Date_ID SERIAL PRIMARY KEY,
Date DATE,
Day INT,
Month INT,
Year INT
);
Populating Tables with Simulated Data
To simulate data in the tables, I used the RANDOM() function of PostgreSQL. Below is an example of how I inserted the simulated data into the fact table:
-- Insert data in Date_Dim
INSERT INTO Date_Dim (Date_ID, Date, Day, Month, Quarter, Year) VALUES
(1, '2023-01-01', 1, 1, 1, 2023),
(2, '2023-01-02', 2, 1, 1, 2023);
-- Insert data in Product_Dim
INSERT INTO Product_Dim (Product_ID, Product_Name, Category, Price) VALUES
(1, 'Burger', 'Food', 5.99),
(2, 'Fries', 'Food', 2.99);
-- Insert data in Customer_Dim
INSERT INTO Customer_Dim (Customer_ID, Customer_Name, Gender, Age, City, Country) VALUES
(1, 'John Doe', 'M', 30, 'New York', 'USA'),
(2, 'Jane Smith', 'F', 25, 'Los Angeles', 'USA');
-- Insert data in Location_Dim
INSERT INTO Location_Dim (Location_ID, Store_Name, City, Region, Country) VALUES
(1, 'Main Street', 'New York', 'Northeast', 'USA'),
(2, 'Second Avenue', 'Los Angeles', 'West', 'USA');
-- Insert data in Sales_Fact
INSERT INTO Sales_Fact (Sale_ID, Date_ID, Product_ID, Customer_ID, Location_ID, Quantity_Sold, Total_Amount) VALUES
(1, 1, 1, 1, 1, 2, 11.98),
(2, 2, 2, 2, 2, 3, 8.97);
Perform SQL queries for analysis
Once the tables were populated, I ran several SQL queries to analyze different aspects of the business. For example:
-- Sales Analysis by Customer
SELECT C.Customer_Name, SUM(S.Total_Amount) AS Total_Spent
FROM Sales_Fact S
JOIN Customer_Dim C ON S.Customer_ID = C.Customer_ID
GROUP BY C.Customer_Name
ORDER BY Total_Spent DESC;
-- Total Sales by Product
SELECT P.Product_Name, SUM(S.Quantity_Sold) AS Total_Quantity, SUM(S.Total_Amount) AS Total_Sales FROM Sales_Fact S JOIN Product_Dim P ON S.Product_ID = P.Product_ID GROUP BY P.Product_Name;
-- Total Sales per Month
SELECT D.Month, D.Year, SUM(S.Total_Amount) AS Total_Sales
FROM Sales_Fact S
JOIN Date_Dim D ON S.Date_ID = D.Date_ID
GROUP BY D.Month, D.Year
ORDER BY D.Year, D.Month;
-- Sales by Location
SELECT L.Store_Name, L.City, SUM(S.Total_Amount) AS Total_Sales
FROM Sales_Fact S
JOIN Location_Dim L ON S.Location_ID = L.Location_ID
GROUP BY L.Store_Name, L.City;
The use of the star schema facilitates the organization and analysis of data in a data warehouse, enabling the generation of efficient SQL queries. This demonstrates how a restaurant’s data can be structured and manipulated to obtain valuable information to support decision making and improve business performance.
Export the database to .csv
To import the data into Power BI, I first exported the database tables to .csv files. I used the following instructions in PostgreSQL:
COPY Sales_Fact TO '/path/to/sales_fact.csv' DELIMITER ',' CSV HEADER;
COPY Customer_Dim TO '/path/to/customer_dim.csv' DELIMITER ',' CSV HEADER;
COPY Product_Dim TO '/path/to/product_dim.csv' DELIMITER ',' CSV HEADER;
COPY Location_Dim TO '/path/to/location_dim.csv' DELIMITER ',' CSV HEADER;
COPY Date_Dim TO '/path/to/date_dim.csv' DELIMITER ',' CSV HEADER;
If your PostgreSQL has versioning or permissions problems, one solution is to do it by Console. You will need to install PSQL:
➜ psql -h localhost -d restaurants -U postgres
Enter your User and Password and once inside the “Restaurants” database do the query for each table in SQL:
restaurants=# \copy customer_dim TO '/Users/javierladino/Documents/Javier/sql/customer_dim.csv' DELIMITER ',' CSV HEADER;
Dashboard creation in Power BI
For a restaurant, the goal of the dashboard is to provide critical information to help make strategic decisions, optimize operations and improve customer satisfaction. Based on the star model and simulated data, we defined the measures and visualizations that would be the most important to include in a Power BI dashboard.
We imported the .csv files and configured the relationships between the tables according to the star schema. Then, we created several visualizations to check the data usage, such as:
- Total Sales by Customer: A bar chart showing total sales by each customer.
- Sales by Product Category: A pie chart showing the distribution of sales by product category.
- Sales by Location: A map showing sales in different locations.
- Sales Trends by Month: A line chart showing the sales trend over time.
But what questions could the restaurant manager ask the data analyst to know the main measures of his business? (This point could be at the beginning, but this is where I think it takes validity).
The restaurant manager can ask a series of questions to better understand the key measures of their business and make informed decisions. Some key questions the owner could ask, followed by the query in DAX (Power BI Data Analysis Expressions) to create the measures or columns respectively:
Sales and Performance Questions
- What are the restaurant’s total sales for a specific period (month, quarter, year)?
Measure: Total Sales (SUM(Sales_Fact[Total_Amount])) - What is the restaurant’s net profit after subtracting costs?
Size: Profit (SUM(Sales_Fact[Total_Amount]) – SUM(Sales_Fact[Cost])) - How do sales vary over time (day, week, month)?
Visualization: Line Graph for Total Sales per Month - What is the restaurant’s profit margin?
Size: Profit Margin (DIVIDE([Profit], [Total_Sales], 0))
Questions about Products and Categories
- What are the best selling products?
Metric: Sales by Product (SUM(Sales_Fact[Total_Amount]) grouped by Product_Dim[Product_Name]) - Which product categories generate the most revenue?
Metric: Sales by Category (SUM(Sales_Fact[Total_Amount]) grouped by Product_Dim[Category]) - How many products are sold on average per customer?
Metric: Average Products Sold per Customer (AVERAGE(Sales_Fact[Quantity_Sold]))
Questions about Customers
- What is the total consumption for each customer?
Measurement/Column: Total Consumption per Customer (SUM(Sales_Fact[Total_Amount]) filtered by Customer_ID) - What is the average ticket per customer?
Metric: Average Sales per Customer (AVERAGE(Sales_Fact[Total_Amount])) - Which customers spend the most?
Visualization: Table or Bar Chart with Customer_Name and Total_Consumption_per_Customer
Operations and Efficiency Questions
- During which hours of the day are the most sales generated?
Metric: Sales per Hour (SUM(Sales_Fact[Total_Amount]) grouped by Date_Dim[Hour]) - What is the sales performance per restaurant location (if multiple locations)?
Metric: Sales by Location (SUM(Sales_Fact[Total_Amount]) grouped by Location_Dim[Location_Name]) - Which days of the week have the most sales?
Metric: Sales by Day of the Week (SUM(Sales_Fact[Total_Amount]) grouped by Date_Dim[DayOfWeek])
Questions about Promotions and Discounts
- How do promotions affect sales?
Measure: Compare sales during promotional periods vs. non-promotional periods. - Which discounts generate more sales?
Metric: Sales by Discount Type (SUM(Sales_Fact[Total_Amount]) grouped by Promotion_Dim[Promotion_Type])
Examples of Detailed Questions
- “What has been the best-selling product in the last quarter and how much has it generated in sales?”
- “What is the average profit margin on our main dishes?”
- “Who are our most valuable customers and how much have they spent in total?”
- “What days of the week should we focus more on promotions to increase sales?”
- “How have our sales varied during different seasons of the year?”
- “Which restaurant location (if multiple) has the highest revenue and which has the lowest?”
- “What is the impact of our promotions on total sales and profit margin?”
- “What are the peak sales times during the day?”
- “What percentage of our total sales comes from our best-selling products?”
- “What is the average spend per customer and how does it vary between repeat and new customers?”
By including these measures and visualizations in your Power BI dashboard, you’ll get a comprehensive view of restaurant performance, covering aspects such as sales, profits, profit margins, and trends over time. They will allow you to identify opportunities for improvement, monitor performance, and make informed decisions for business growth.
What I did was to create a Dashboard in 5 sections: Home (Dashboard), Sales, Products, Customers and Operations. Some metrics or values are repeated, but I did it intentionally to improve the user experience by filtering the items in Power BI.
You will be able to interact with the dashboard by logging into your Power BI account:
Conclusions
The answers to these questions will provide the owner with a clear view of key areas of the business, helping them make strategic decisions to improve operations, optimize the menu, target promotions and increase profitability. These questions also help identify trends and patterns in customer behavior and restaurant performance.
Personally, this project allowed me to demonstrate my skills in designing and implementing a star schema, using SQL to manipulate and analyze data, and creating effective visualizations in Power BI. Key findings include for example:
- Identifying Key Customers: The data showed which customers contributed the most to total sales.
- Top Selling Products: Identifying the most popular product categories helped make informed inventory decisions.
- Performance by Location: Analyzing sales by location revealed the best performing areas.
In summary, this project not only showed my technical ability, but also provided valuable information for the management of the restaurant. Doing this kind of projects mixing several tools, making mistakes, researching and relying on all resources is the basis for further progress.
I hope you like it and can replicate it to practice, and do not hesitate to tell me what you think, if there are mistakes or discover new ways to analyze and visualize data. Greetings!