Kaan Kıvılcım
Kaan Kıvılcım - 21 October 2022
Junior Data Scientist

In today’s world, in the wake of the pandemic, the e-commerce sector has grown rapidly. With this rise, the websites of almost every brand, from relatively small businesses to the largest ones, have become far more popular, and their traffic has increased by almost 50%. According to the International Trade Administration (2021), e-commerce revenue is forecast to grow by an average of 19% after the pandemic (26% for Food & Personal Care products).

These growth figures tell us one significant thing: businesses should devote more attention (and more budget) to their e-commerce and marketing departments and operations.

When e-commerce is mentioned, the first thing that comes to mind is the website. Products are presented on websites in ways that help business owners sell more and earn more revenue from their most valuable resource: customers.

There are dozens of ways to attract customers’ attention and encourage them to buy more. In this article, I will try to explain one of them, a method called “Basket Analysis”.

What is Basket Analysis?

Basket Analysis is a method that studies the baskets (carts) of customers on a website and analyses them in order to offer meaningful, customised product suggestions. Before getting to the technical part of the analysis, a few points are worth mentioning.

  • Every customer is different, and so is their purchase behaviour.
  • Every product is different. However, some of them are used and bought together.
  • In some situations, seemingly unrelated products are sold together, and the human eye cannot always spot these combinations.

This is where artificial intelligence and machine learning enter the stage.

Let’s dive deeper into the algorithms.

Market Basket Analysis

Apriori Algorithm

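The Apriori algorithm was introduced in 1994. It finds frequent item sets in a dataset so that boolean association rules can be derived from them. The algorithm is called Apriori because it uses prior knowledge of frequent item set properties: every subset of a frequent item set must itself be frequent, so a candidate that contains an infrequent subset can be discarded without being counted.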
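To make the "prior knowledge" idea concrete, here is a minimal pure-Python sketch of the level-wise search. It is an illustration only, not the optimised implementation used later in the article; the function name `frequent_itemsets` and the sample baskets are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(baskets, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    def support(itemset):
        # Share of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single products.
    items = {item for b in baskets for item in b}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    result = dict(current)
    k = 2
    while current:
        # Join step: build size-k candidates from frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (the "prior knowledge"): drop a candidate if any of its
        # (k-1)-subsets is not frequent, without counting it in the data.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


# Hypothetical baskets, for illustration only.
baskets = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B", "D"}]
found = frequent_itemsets(baskets, min_support=0.5)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this toy data the candidate {A, B, C} is never counted, because its subset {B, C} is already known to be infrequent.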

As the term “frequent item sets” suggests, the dataset must contain products that are bought frequently. The data we need to apply this algorithm includes the following columns:

  • Transaction ID (Basket ID)
  • Product SKU (Product ID)
  • Product Category
  • Quantity

After we obtain the necessary data, the magic starts.

This algorithm can be implemented and applied in R (for example in RStudio), Python, etc.

Since our data is broken down by product and thus has duplicate transaction IDs, we first need to group the data by transaction ID and find every distinct product that was sold (and therefore added to the basket before the payment step) in that specific transaction.

After this is done, we create a dummy variable, that is, a new column, for each product. For every unique transaction row, the quantity of each product is written to the cell in that product’s column. A sample of the processed data can be seen below:

Transaction ID   Product A   Product B   Product C   Product D
123abc           7           3           0           0
456def           2           0           1           1
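The grouping and dummy-column steps can be sketched in pandas as follows. This is one possible way to do it, not necessarily the exact code behind the article; the column names `transaction_id`, `product` and `quantity` and the line-level rows are assumptions made up to reproduce the sample table.

```python
import pandas as pd

# Hypothetical line-level data: one row per product line in a basket,
# so the same transaction ID appears several times.
lines = pd.DataFrame(
    {
        "transaction_id": ["123abc", "123abc", "123abc", "456def", "456def", "456def"],
        "product": ["Product A", "Product B", "Product A", "Product A", "Product C", "Product D"],
        "quantity": [4, 3, 3, 2, 1, 1],
    }
)

# Group by transaction and product, sum the quantities, then spread the
# products into columns; products missing from a basket get 0.
basket = (
    lines.groupby(["transaction_id", "product"])["quantity"]
    .sum()
    .unstack(fill_value=0)
)
print(basket)
```

Each row of `basket` is now one transaction and each column one product, as in the sample table.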

Machine learning has its own rules, obviously. 

In order to analyse this data and gain meaningful insights, we need to encode the cells as 1 or 0 to indicate whether a product was added to the basket and bought in that specific transaction. The reason is that the Apriori algorithm takes only 1 and 0 values, so it determines the association between products without bias from irrelevant quantities. Remember that we are only interested in which products are sold together.

Finally, before we apply the model, we obtain the data below:

Transaction ID   Product A   Product B   Product C   Product D
123abc           1           1           0           0
456def           1           0           1           1
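The 1-0 encoding is a one-line operation in pandas. The sketch below rebuilds the quantity table from the sample above by hand (the variable names are hypothetical) and converts every positive quantity to 1.

```python
import pandas as pd

# Quantity table from the sample above, rebuilt by hand for illustration.
basket = pd.DataFrame(
    {
        "Product A": [7, 2],
        "Product B": [3, 0],
        "Product C": [0, 1],
        "Product D": [0, 1],
    },
    index=pd.Index(["123abc", "456def"], name="Transaction ID"),
)

# Any quantity above zero becomes 1, everything else becomes 0.
encoded = (basket > 0).astype(int)
print(encoded)
```

Recent mlxtend releases recommend boolean columns, in which case `basket > 0` can be used directly without the `.astype(int)` step.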

We use the “frequent_patterns” module of the “mlxtend” library and import its “apriori” and “association_rules” functions to apply the model in an optimised and fast way.

After the parameters are adjusted to the specific dataset and purpose, we get the results as a table like the one below:

Antecedents   Consequents   Antecedent Support   Consequent Support   Support   Confidence   Lift   Leverage   Conviction
Product A     Product B     0.4                  0.6                  0.5       0.83         2.78   0.022      1.67
Product A     Product C     0.4                  0.3                  0.45      0.65         2.11   0.1        1.12
Product B     Product D     0.6                  0.5                  0.5       0.62         1.98   -0.32      1.43

Note: The values are randomly generated for privacy reasons.

Results: OK, But What Do They Mean?

Here, the most important metrics to consider are the “support” and “confidence” values. However, the explanations below cover all of the metrics for a better understanding.

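  • Antecedent Support: The share of all baskets that contain the antecedent product.
  • Consequent Support: The share of all baskets that contain the consequent product.
  • Support: The share of all baskets that contain the antecedent and the consequent product together.
  • Confidence: The share of baskets containing the antecedent product that also contain the consequent product (support divided by antecedent support).
  • Lift: Confidence over expected confidence, that is, confidence divided by consequent support. A value above 1 means the products appear together more often than they would if they were independent.
  • Leverage: The difference between the observed support and the support expected if the antecedent and consequent were statistically independent. A value of 0 indicates independence.
  • Conviction: (1 - consequent support) divided by (1 - confidence). It gets higher when the consequent product is highly dependent on the antecedent product.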
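The metrics above can be checked by hand on a small example. The sketch below computes them for one rule using the standard definitions; the baskets and the helper names `support` and `rule_metrics` are hypothetical.

```python
# Hypothetical baskets, for illustration only.
baskets = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "D"},
    {"A", "B"},
]


def support(items):
    """Share of baskets that contain every product in `items`."""
    items = set(items)
    return sum(items <= b for b in baskets) / len(baskets)


def rule_metrics(antecedent, consequent):
    """Compute the rule metrics for antecedent -> consequent."""
    a = support(antecedent)
    c = support(consequent)
    both = support(set(antecedent) | set(consequent))
    confidence = both / a
    return {
        "antecedent_support": a,
        "consequent_support": c,
        "support": both,
        "confidence": confidence,
        "lift": confidence / c,
        "leverage": both - a * c,
        # Conviction is infinite when the rule never fails.
        "conviction": float("inf") if confidence == 1 else (1 - c) / (1 - confidence),
    }


m = rule_metrics({"A"}, {"B"})
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

In this toy data the rule A -> B has a confidence of 0.75 but a lift below 1, which shows why confidence alone can overstate a relationship when the consequent is popular on its own.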

Once we have the result table, we can start to analyse it. A simple approach is to determine a threshold value for the “confidence” metric and split the rows into two groups: meaningful or not.

When the confidence value is above the threshold, say 0.6, we can conclude that the relationship between the products is meaningful and that customers frequently buy these products together.
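The split can be sketched as below, using the confidence values from the sample result table. The function name `split_by_confidence` is hypothetical, and the right threshold depends on the dataset and the business goal.

```python
# Rules taken from the sample result table (confidence column only).
rules = [
    {"antecedent": "Product A", "consequent": "Product B", "confidence": 0.83},
    {"antecedent": "Product A", "consequent": "Product C", "confidence": 0.65},
    {"antecedent": "Product B", "consequent": "Product D", "confidence": 0.62},
]


def split_by_confidence(rules, threshold):
    """Split rules into (meaningful, not meaningful) by confidence."""
    meaningful = [r for r in rules if r["confidence"] > threshold]
    not_meaningful = [r for r in rules if r["confidence"] <= threshold]
    return meaningful, not_meaningful


meaningful, rest = split_by_confidence(rules, threshold=0.6)
print(len(meaningful), "meaningful,", len(rest), "not meaningful")

# A stricter threshold keeps fewer rules.
strict, _ = split_by_confidence(rules, threshold=0.7)
print([(r["antecedent"], r["consequent"]) for r in strict])
```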

Where to Use These Results?


Deciding where to use these results is another challenge. Businesses usually use this information for suggestion algorithms and shelf design.
For instance, Product B is suggested to a customer who has just added Product A to their basket, because the confidence of this pair is higher than our threshold value. This lowers the chance that the customer misses, forgets or simply overlooks Product B, and so we are able to direct the customer to buy Product B as well.
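A suggestion step along these lines could be sketched as follows. This is a simplified illustration, not a production recommender; the function name `suggest` is hypothetical, and the rules reuse the confidence values from the sample result table.

```python
# Rules with the confidence values from the sample result table.
rules = [
    {"antecedent": {"Product A"}, "consequent": "Product B", "confidence": 0.83},
    {"antecedent": {"Product A"}, "consequent": "Product C", "confidence": 0.65},
    {"antecedent": {"Product B"}, "consequent": "Product D", "confidence": 0.62},
]


def suggest(basket, rules, threshold=0.6):
    """Return products to suggest, most confident first."""
    basket = set(basket)
    best = {}
    for rule in rules:
        if rule["confidence"] <= threshold:
            continue  # rule is not considered meaningful
        if not rule["antecedent"] <= basket:
            continue  # antecedent products are not all in the basket
        product = rule["consequent"]
        if product in basket:
            continue  # nothing to suggest, the customer already has it
        best[product] = max(best.get(product, 0.0), rule["confidence"])
    return sorted(best, key=best.get, reverse=True)


print(suggest({"Product A"}, rules))
print(suggest({"Product A", "Product B"}, rules))
```

Products already in the basket are skipped, so a customer holding both Product A and Product B is only shown the remaining consequents.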
Secondly, shelf design (product listing pages in our case) can be planned and applied to our website according to the results. For instance, Product A and Product B are placed near each other to remind customers that they can buy them together (because they generally do, don’t they?!).
Thirdly, campaign scenarios can be set up for customers. For instance, Product B is offered at a discounted price to customers who add Product A to their baskets or who have bought it before.
Last but not least, the results of this analysis can help business owners and marketers design the shelves of their offline (physical) stores. Just like product listing pages on websites, store shelves can be arranged so that customers see related and frequently bought products together.
In these ways, sales volume, order volume, revenue, website traffic and similar key performance indicators may be increased. Besides, product and marketing budgets can be allocated according to the results.
