Customer Lifetime Value, Present and Future
Customer lifetime value for a business is the monetary value associated with the relationship with a customer, although there have been attempts to include non monetary value as well. It's an important metric to have for any marketing initiative, e.g., customer retention. The metric is also useful if preferential treatment is to be given to high value customers during various interactions, e.g., customer service.
In this post we will cover a Hadoop based solution for customer lifetime value. The solution starts with customer transaction history and computes a customer lifetime value score using multiple Map Reduce jobs. The solution is part of the open source project visitante. Some of the Map Reduce jobs in the solution workflow come from another open source project of mine called chombo.
Components of Customer Lifetime Value
There is no general consensus on what constitutes customer lifetime value (CLV). Often it's defined as the projected potential value of a customer over a time horizon going into the future. The current value of a customer, based on a time horizon into the past, is sometimes included in CLV. Sometimes the present value is treated as a separate metric called customer profitability (CP).
Our approach will be to include both the present and the future value in CLV as separate components. It's along the lines of the model proposed here. As we will see later, the final Map Reduce job computes a weighted average of the different components for the final CLV score. If the interest is only in the present value of a customer or only in the future value, it can be obtained by appropriately setting some of the weights to 0.
Here are the 3 different components of CLV in our solution. The first two are related to the present value and are purely deterministic. The third component is used for future value computation and is based on some simple statistical techniques.
- Average time gap between successive transactions
- Average transaction amount
- How recent the latest transactions are
For the future value, we are using a simple solution based on the recency of the latest transactions. There are other, more complex models, e.g., discounted monetary value over a future time horizon.
Map Reduce Workflow
The workflow requires the 4 Map Reduce jobs listed below, executed in sequence. The output of one Map Reduce job goes as input to the next.
| Map Reduce job | Description |
| --- | --- |
| TransactionFrequencyRecencyValue | Generates average transaction time gap, transaction recency and average transaction amount |
| TransactionRecencyScore | Converts the transaction recency time gap to a score, based on a normal distribution of the transaction time gap |
| Normalizer | Normalizes all fields, so that they are all on the same scale |
| WeightedAverage | Calculates the weighted average of all fields, which is the final CLV score |
Transaction Attributes Map Reduce
This Map Reduce job takes the transaction history of all customers and computes, for each customer, the average time gap between transactions, the time elapsed since the most recent transaction and the average amount spent in a transaction. Here is some sample input.
0W34D5Z9SXOYUX,TN6B4RCY5T,2015-08-07 17:56:31,144.11
ASTJCL972T7OOM,NRA5F2LMOC,2015-08-07 18:21:31,65.81
L27Z1PYH4QG713,WLRZQCJOW9,2015-08-07 18:46:31,72.14
66ORZ67JU9I6BE,N12LS0CI3L,2015-08-07 19:15:31,71.83
5819CWQN2DSOX5,NRA5F2LMOC,2015-08-07 19:43:31,57.43
The fields in the input are
- Transaction ID
- Customer ID
- Transaction date time
- Transaction amount
The output of this Map Reduce is as follows.
Y44442W65X,18,29,61.70
Y4PM4B56DS,15,20,63.63
YBF22E213G,14,31,59.29
YBQ8B66J3L,18,8,66.81
YQD63K9USS,11,21,62.40
The fields in the output are as follows; a rough sketch of the per customer computation follows the list.
- Customer ID
- Average transaction time gap
- Transaction recency
- Average transaction amount
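As a rough sketch of what the per customer computation might look like (not the actual visitante implementation; the class, method and variable names are hypothetical), the reducer side aggregation could be along these lines, assuming one customer's transactions arrive sorted by time:

```java
import java.util.List;

// Hypothetical sketch of the per customer aggregation; not taken from visitante.
public class CustomerTransactionStats {
    /**
     * Computes average transaction time gap, recency (time elapsed since the most
     * recent transaction) and average transaction amount for one customer.
     * Transaction times are assumed sorted ascending, in the unit the job uses.
     */
    public static double[] aggregate(List<Long> txTimes, List<Double> txAmounts, long horizonEnd) {
        int numTx = txTimes.size();
        long gapSum = 0;
        for (int i = 1; i < numTx; ++i) {
            gapSum += txTimes.get(i) - txTimes.get(i - 1);
        }
        double avgGap = numTx > 1 ? (double) gapSum / (numTx - 1) : 0;

        // simple recency: time elapsed since the most recent transaction
        double recency = horizonEnd - txTimes.get(numTx - 1);

        double amountSum = 0;
        for (double amount : txAmounts) {
            amountSum += amount;
        }
        double avgAmount = amountSum / numTx;

        return new double[] {avgGap, recency, avgAmount};
    }
}
```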
The transaction recency computation requires some explanation. A high recency value may indicate a churning customer, or a customer who is making transactions less frequently and hence has less future potential value.
However, the recency value may not simply be the time elapsed since the most recent transaction, depending on how the configuration parameter trf.recency.count is set. If it is set to 1, it will be the time elapsed since the most recent transaction.
If we consider only the most recent transaction, this quantity will have a large variance, since the reference time, or time horizon end time, could fall anywhere with respect to the most recent transaction. It could be right after the most recent transaction or long after it.
It's recommended that the parameter trf.recency.count be set to 2 or higher to reduce the variance. When the parameter is 2 or higher, the recency is a function of the time elapsed since the most recent transaction and also of the time gaps between some of the recent transactions. It can also be thought of as an effective elapsed time.
tee = (current time - time of the r-th most recent transaction) / r
where
tee = effective elapsed time
r = 1 for the most recent transaction, 2 for the second most recent and so forth, i.e., the value of the parameter trf.recency.count
Using a higher value of r has the effect of nudging the effective elapsed time more towards the average time gap and hence reduces the variance.
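A minimal sketch of the effective elapsed time, assuming the transaction times of a customer are available sorted in ascending order; the names are illustrative and not from the actual code base:

```java
// Illustrative sketch; r corresponds to the trf.recency.count parameter.
public class EffectiveElapsedTime {
    public static double compute(long[] txTimesAscending, long horizonEnd, int r) {
        // r = 1 uses the most recent transaction, r = 2 the second most recent, and so on
        int numTx = txTimesAscending.length;
        long rthRecentTime = txTimesAscending[numTx - r];
        return (double) (horizonEnd - rthRecentTime) / r;
    }
}
```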
The time horizon for the analysis starts with the date time of the earliest transaction and ends with the date time of the latest transaction among all customers. This Map Reduce job should be run as close as possible to the latest transaction time in the data set. Otherwise we will introduce a large bias in the recency value.
If we cannot control the execution time of the Map Reduce job, the configuration parameter trf.recency.ref.date should be set to the end date time of the time horizon. If this parameter is not set, the current date time is used as the time horizon end.
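A small sketch of how the time horizon end might be resolved, assuming the parameter value is a date time string in the same format as the input data; the fallback to the current time mirrors the behavior described above:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

// Illustrative sketch; the date format and class name are assumptions.
public class HorizonEnd {
    public static long resolve(String refDateConfig) throws ParseException {
        if (refDateConfig == null || refDateConfig.isEmpty()) {
            // trf.recency.ref.date not set: use the current date time
            return System.currentTimeMillis();
        }
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        return fmt.parse(refDateConfig).getTime();
    }
}
```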
Recency Score Map Reduce
This Map Reduce job has only a mapper, and its purpose is to convert the recency value computed in the previous Map Reduce to a recency score. As discussed earlier, a higher value of recency, or effective elapsed time, implies a lower future potential value.
The more time that elapses since the most recent transaction of a customer, the lower the probability that the customer will come back for another transaction. We make use of the probability density function of the transaction time gap and assume it to be a normal distribution.
The probability of a customer returning and making another transaction is the area under the probability density function above tee, as shown below; this is the recency score. A code sketch of the computation follows the definitions.
P(tee) = ∫ p(t) dt, integrated from t = tee to ∞
where
P(tee) = probability of a transaction after tee has elapsed since the most recent transaction
p(t) = probability density function of the transaction time gap
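A minimal sketch of this computation, assuming Apache Commons Math is available for the normal distribution and that the mean and standard deviation of the transaction time gap come from the global statistics:

```java
import org.apache.commons.math3.distribution.NormalDistribution;

// Illustrative sketch of the recency score as the upper tail probability
// of a normal distribution of transaction time gaps.
public class RecencyScore {
    public static double normal(double tee, double gapMean, double gapStdDev) {
        NormalDistribution gapDistr = new NormalDistribution(gapMean, gapStdDev);
        // P(t > tee): probability of another transaction after tee has elapsed
        return 1.0 - gapDistr.cumulativeProbability(tee);
    }
}
```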
The output of this Map Reduce is similar to the last one, except that the recency value is replaced by the recency score computed above.
The probability density function used above is based on the transaction time gap mean and standard deviation of the whole customer population. It would likely be more accurate to segment the customers and use a segment specific normal distribution instead of the global distribution.
Instead of the normal distribution, which is the default for the transaction time gap, we could also use any of the following alternatives. Currently, only the normal and exponential distributions are supported. The distribution type can be chosen by setting the configuration parameter trs.trans.gap.prob.distr to either normal or exponential; a sketch of the exponential case follows the list.
- Exponential distribution.
- Non parametric distribution based on the histogram of transaction time gap.
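For the exponential case the upper tail probability has a closed form; a minimal sketch, assuming the rate is taken as the reciprocal of the mean transaction time gap:

```java
// Illustrative sketch of the exponential alternative for the recency score.
public class ExponentialRecencyScore {
    public static double compute(double tee, double gapMean) {
        // rate parameter as the reciprocal of the mean transaction time gap
        double rate = 1.0 / gapMean;
        // survival function of the exponential distribution: P(t > tee) = exp(-rate * tee)
        return Math.exp(-rate * tee);
    }
}
```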
Normalization Map Reduce
This Map Reduce job normalizes all 3 fields (average transaction time gap, recency score, average transaction amount) in the output of the previous Map Reduce, so that they are all on the same scale.
Two normalization strategies are supported, minmax and zscore. The zscore strategy is more immune to outliers. For our use case, we have used the minmax strategy, which uses the value range to normalize.
This Map Reduce job can do some limited data transformation before normalization. We needed to take the additive inverse of the average transaction time gap, because a lower value of the transaction time gap should contribute towards a higher CLV score. The transformation is as follows, with a code sketch after the definitions:
ttg = ttgmax - ttg
where
ttg = average time gap between successive transactions
ttgmax = maximum value of the average transaction time gap across all customers
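A minimal sketch of the min max normalization combined with the additive inverse transformation; the scale parameter stands for the configured output scale, and the min and max are assumed to be the field statistics across all customers:

```java
// Illustrative sketch; names are assumptions, not the Normalizer internals.
public class TimeGapNormalizer {
    public static double normalize(double ttg, double ttgMin, double ttgMax, double scale) {
        // additive inverse, so that a smaller time gap yields a larger value
        double inverted = ttgMax - ttg;
        // min max normalization onto [0, scale]; inverted lies in [0, ttgMax - ttgMin]
        return scale * inverted / (ttgMax - ttgMin);
    }
}
```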
It's noteworthy that the normalization is done in one pass in this Map Reduce. The stats results are made available to the reducer before the data, with appropriate use of a composite mapper output key.
The output is similar to the last Map Reduce, except that all field values are scaled and lie between 0 and whatever is set for the scale through configuration.
Weighted Average Map Reduce
This is a general purpose Map Reduce job that calculates the weighted average of a set of fields. Using this Map Reduce, we calculate the final weighted CLV score based on the weights assigned to each of the 3 fields listed below; a sketch of the calculation follows the list.
- normalized average transaction time gap
- normalized recency score
- normalized average transaction amount
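A minimal sketch of the final score computation; in the actual job the weights would come from configuration:

```java
// Illustrative sketch of the weighted average CLV score over the three normalized fields.
public class ClvScore {
    public static double compute(double normTimeGap, double normRecencyScore,
            double normAvgAmount, double[] weights) {
        double weighted = weights[0] * normTimeGap
                + weights[1] * normRecencyScore
                + weights[2] * normAvgAmount;
        double weightSum = weights[0] + weights[1] + weights[2];
        return weighted / weightSum;
    }
}
```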
The present value of a customer can be obtained by setting the weight for the normalized recency score to 0. Similarly, if only the future value of a customer is desired, it can be obtained by setting the weights for the other two fields to 0. Here is some sample output. It contains the customer ID followed by the CLV score.
Z52WSE7X0Q,83
F8KV2611X4,80
822XH1ZV14,70
74S0SP0PL2,69
405B240B40,67
The output is sorted in descending order of CLV score. What's shown is a set of customers with high CLV scores, starting with the customer with the highest score.
Variance of Transaction Time Gap and Amount
In our solution, we have used average values for the transaction time gap and the transaction amount. However, variance could also be an important factor. We all know that businesses don't like uncertainty, and a lower variance in customer behavior reduces uncertainty.
A lower variance in transaction time gap and amount implies better predictability of business revenue and hence should translate to a better CLV score. These two additional factors could easily be incorporated with some enhancement to the first Map Reduce job.
Life Time or Life Cycle
We have loosely used the term lifetime. However, the lifetime of a customer's relationship with a business may contain multiple life cycles. A customer may return after a long absence. A life cycle is defined as a series of customer activity where the time gap between activities is less than some time interval threshold. A customer's behavior might also change between life cycles, especially if there was a large gap between two successive life cycles because of changing circumstances.
Any significant time gap, e.g., 6 months, in a customer's activity may indicate a boundary between two life cycles. There is an implied assumption in our analysis that the transaction data being used is for the current life cycle.
Some pre processing tool should be used to discard all transaction data unless it belongs to the current active life cycle. It should also discard data for churning customers, i.e., customers who had life cycles in the past but have no current active life cycle. I am working on implementing such a tool based on Map Reduce; basically it's a temporal clustering problem, where each cluster represents a life cycle.
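As a rough illustration of the core idea behind such a tool (not the Map Reduce implementation itself), a customer's transactions could be segmented into life cycles by starting a new cycle whenever the gap to the previous transaction exceeds a threshold:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of life cycle segmentation by a time gap threshold.
public class LifeCycleSegmenter {
    public static List<List<Long>> split(List<Long> txTimesAscending, long maxGap) {
        List<List<Long>> lifeCycles = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long txTime : txTimesAscending) {
            if (!current.isEmpty() && txTime - current.get(current.size() - 1) > maxGap) {
                // gap exceeds the threshold: close the current life cycle, start a new one
                lifeCycles.add(current);
                current = new ArrayList<>();
            }
            current.add(txTime);
        }
        if (!current.isEmpty()) {
            lifeCycles.add(current);
        }
        return lifeCycles;
    }
}
```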
Alternative Model for Future Value
We have used a model for future value based on transaction recency, which is a function of the elapsed time of some recent transactions. As mentioned earlier, another popular model is the discounted value over a future time horizon. It's a function of the following quantities; a rough sketch follows the list.
- Time horizon size
- How far into the life cycle a customer is
- Probability distribution of life cycle size
- Discount rate
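As a rough, hypothetical sketch of what such a model could look like (this is not implemented in visitante), the future value could be approximated by discounting an expected per period transaction amount over the remaining periods of the time horizon, weighted by the probability that the customer's life cycle is still active in each period:

```java
// Hypothetical sketch of discounted future value; the survival probabilities per
// period would come from the life cycle size distribution.
public class DiscountedFutureValue {
    public static double compute(double expectedAmountPerPeriod,
            double[] survivalProbByPeriod, double discountRate) {
        double value = 0;
        for (int period = 0; period < survivalProbByPeriod.length; ++period) {
            double discount = Math.pow(1.0 + discountRate, period + 1);
            value += survivalProbByPeriod[period] * expectedAmountPerPeriod / discount;
        }
        return value;
    }
}
```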
I have plans to implement this model as an alternative for the future value component of CLV. This is a better model for computing the future value of customers of subscription based services.
Customer Churn
If we compute CLV based only on the future value, using the recency model, it's a good indicator of a churning customer. A lower future value indicates a churning customer.
However, it predicts customer churn only as the churn is in progress, so its value may be limited. It's preferable to predict churn as early as possible, using other behavioral attributes of a customer, before the onset of the actual churn.
Summing Up
Customer lifetime value is an important metric that any retailer will be interested in for any kind of planning, marketing or otherwise. We have gone through a Hadoop Map Reduce based solution and its implementation. The use case in this post can be run following the steps outlined in the tutorial document.