THE CHALLENGE

Group all stores into segments based on their characteristics, and predict sales for both existing and new retail stores.

MY ROLE

Sole analyst. Capstone project of the Udacity Business Analyst Nanodegree.

TOOLS

Alteryx and Tableau

CLUSTERING

A retail company currently has 85 grocery stores and is planning to open 10 more at the beginning of 2016. Up until now, the company has treated all stores equally, shipping the same amount of product to each store. This is beginning to cause problems, as stores are suffering from product surpluses in some categories and shortages in others.

I was asked to provide analytical support for decisions about store formats and inventory planning. I was given a dataset of store sales per category for the last 4 years and decided to use the most recent year recorded, 2015, for the analysis. Using this data, I determined what percentage of total sales each category contributes in each store and 

Once the data was ready, I decided to use the K-Centroid method to create the clusters. The number of clusters had to stay low, because creating too many clusters would complicate things for the stores and raise costs, so I analyzed the possibility of having from 2 to 6 clusters.
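
As a rough illustration of these two steps, the sketch below converts each store's category sales into shares of its total and then groups the stores with a plain k-means, one variant of the K-Centroid family. The project itself was built in Alteryx, so the Python function names (category_shares, kmeans), the store figures, and the fixed random seed are all assumptions made for illustration.

```python
import random


def category_shares(sales):
    """Convert one store's category sales into fractions of its total sales."""
    total = sum(sales.values())
    return {category: amount / total for category, amount in sales.items()}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    # Start from k distinct points picked at random (hypothetical choice).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical yearly sales per category for four made-up stores.
stores = {
    "S1": {"produce": 60, "meat": 20, "bakery": 20},
    "S2": {"produce": 55, "meat": 25, "bakery": 20},
    "S3": {"produce": 20, "meat": 60, "bakery": 20},
    "S4": {"produce": 25, "meat": 55, "bakery": 20},
}
categories = ["produce", "meat", "bakery"]
shares = {name: category_shares(sales) for name, sales in stores.items()}
points = [[shares[name][c] for c in categories] for name in stores]
labels = kmeans(points, k=2)
```

Working with shares rather than raw sales keeps large stores from dominating the distance calculation. Running the same routine for each k from 2 to 6 would mirror the range considered here.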

After the analysis, the optimal number of clusters was found to be 3 and the optimal method K-Means, based on the Adjusted Rand and Calinski-Harabasz indices. Finally, every existing store was assigned to its respective cluster. These clusters were:
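
For readers unfamiliar with the Calinski-Harabasz index, here is a minimal sketch of how it can be computed for a given clustering. It is the ratio of between-cluster dispersion to within-cluster dispersion, each scaled by its degrees of freedom, and higher values indicate tighter, better-separated clusters. The function name and the toy one-dimensional data are assumptions for illustration. The Adjusted Rand index is not reproduced here.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k))."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    overall_mean = [sum(col) / n for col in zip(*points)]
    between = 0.0  # B: size-weighted squared distance of centroids to the overall mean
    within = 0.0   # W: squared distance of points to their own centroid
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        between += len(members) * sum(
            (a - b) ** 2 for a, b in zip(centroid, overall_mean)
        )
        within += sum(
            sum((a - b) ** 2 for a, b in zip(p, centroid)) for p in members
        )
    return (between / (k - 1)) / (within / (n - k))


# Toy data (hypothetical): two obvious groups on a line.
toy_points = [[0.0], [2.0], [10.0], [12.0]]
good_score = calinski_harabasz(toy_points, [0, 0, 1, 1])
bad_score = calinski_harabasz(toy_points, [0, 1, 0, 1])
```

The grouping that matches the natural structure scores far higher than the mixed-up one. The same comparison can be repeated across candidate cluster counts to pick the count with the highest score.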

What Makes Cluster 1 Special

FIRST CLUSTER

The first cluster has significantly higher general merchandise (GM) sales and lower bakery sales than the other clusters. This cluster could be named “No Cake, Just Things” so that it can be described more easily.

What Makes Cluster 2 Special

SECOND CLUSTER

This cluster has significantly higher floral and produce sales than the other two clusters, but it also has significantly lower dry grocery sales. This cluster could be named “Organic please!”.

What Makes Cluster 3 Special

THIRD CLUSTER

The final cluster is the “Meat & Sandwich Lover” cluster. Here the sales of meat and deli items are high when compared with the other stores.

A more complete visualization of the data can be found on Tableau Public. Right Here!

ASSIGNING CLUSTERS TO NEW STORES

After creating the clusters for the existing stores, I had to assign every new store to its respective cluster. However, since no sales data is available for the new stores, I used demographic data to assign each one to a cluster.

By creating several Decision Tree, Forest, and Boosted models based on the demographic data of the existing stores, I was able to choose the one that fit the data best without overfitting or being biased towards a particular cluster. In the end, the Boosted model fit the data best. Using the demographic data of the new stores and the Boosted model, I obtained the expected clusters for the 10 new stores.
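
One way to check for bias towards a particular cluster is to look at a per-cluster score alongside overall accuracy on held-out stores. The sketch below assumes F1 as that per-cluster score; the original comparison was done in Alteryx and its exact metrics are not stated here. The label lists are made up and do not reproduce the project's results.

```python
def accuracy(actual, predicted):
    """Share of stores whose predicted cluster matches the actual one."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def per_cluster_f1(actual, predicted):
    """F1 score per cluster; a low score flags a cluster the model neglects."""
    scores = {}
    for cluster in sorted(set(actual) | set(predicted)):
        pairs = list(zip(actual, predicted))
        tp = sum(a == cluster and p == cluster for a, p in pairs)
        fp = sum(a != cluster and p == cluster for a, p in pairs)
        fn = sum(a == cluster and p != cluster for a, p in pairs)
        denominator = 2 * tp + fp + fn
        scores[cluster] = 2 * tp / denominator if denominator else 0.0
    return scores


# Hypothetical validation labels and predictions from two candidate models.
actual = [1, 1, 2, 2, 3, 3]
tree_predicted = [1, 1, 2, 2, 2, 2]      # never predicts cluster 3
boosted_predicted = [1, 1, 2, 3, 3, 3]   # one mistake, spread more evenly

tree_accuracy = accuracy(actual, tree_predicted)
boosted_accuracy = accuracy(actual, boosted_predicted)
tree_f1 = per_cluster_f1(actual, tree_predicted)
boosted_f1 = per_cluster_f1(actual, boosted_predicted)
```

In this made-up case the first model has a reasonable overall accuracy but an F1 of zero for cluster 3, which is the kind of bias that overall accuracy alone would hide.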

PREDICTING SALES

Of all the products the retail company offers, produce is one of the most affected by short shelf life and expiration dates. In order to give customers fresh produce and avoid overstocking, I had to predict produce sales for 2016 so that the company can harvest and distribute it more precisely.

I created several ARIMA and ETS models, taking into account the trends and seasonality embedded in the data. After testing against a holdout sample and iterating to get a more accurate result, I picked the model that represented the data best and performed best against the holdout sample: an ETS(M,N,M) model.
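
To make the notation concrete: ETS(M,N,M) means multiplicative error, no trend, and multiplicative seasonality. The sketch below runs the corresponding level and seasonal updates with fixed smoothing parameters and scores the forecast against a holdout sample using MAPE. It is only an illustration: forecasting tools normally estimate the parameters and initial states from the data, whereas here alpha, gamma, the crude first-season initialization, the season length, and the toy series are all assumptions.

```python
def ets_mnm_forecast(series, period, horizon, alpha=0.3, gamma=0.1):
    """Point forecasts from ETS(M,N,M) recursions with fixed parameters."""
    # Crude initial states (assumption): level is the first-season mean,
    # seasonal indices are first-season values divided by that level.
    level = sum(series[:period]) / period
    seasonals = [y / level for y in series[:period]]
    for t, y in enumerate(series):
        position = t % period
        fitted = level * seasonals[position]
        relative_error = (y - fitted) / fitted  # multiplicative error term
        level = level * (1 + alpha * relative_error)
        seasonals[position] = seasonals[position] * (1 + gamma * relative_error)
    n = len(series)
    # No trend: every forecast is the last level times the matching seasonal index.
    return [level * seasonals[(n + h) % period] for h in range(horizon)]


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


# Toy series (hypothetical): a flat level with a repeating four-step season.
history = [80, 100, 120, 100] * 3
train, holdout = history[:8], history[8:]
forecast = ets_mnm_forecast(train, period=4, horizon=len(holdout))
holdout_error = mape(holdout, forecast)
```

Comparing holdout_error across candidate models, fitted on the same training window, is the kind of holdout check described above.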

The results were:

Using these predictive techniques, the client can minimize risk and know how high a profit they can expect.

Did you enjoy looking at this project? Here are some other projects you might enjoy as well.

Peipeile! - Web Design

Online sales platform for Peipeile dough outside of China.

Menu Rollout - A/B Testing

Decide whether or not a chain restaurant should launch a new menu.

Ticket Sauce - Web Design

UI and UX redesign of Ticket Sauce app.