Loaded Dice and Weighted Coins

Predicting App User Behavior at Localytics

As any app publisher can tell you, user churn is a major concern. And even with a robust, data-driven marketing solution, churn can be a particularly tricky problem to diagnose and treat because it defeats the logic of descriptive analytics: once you’ve observed that users have churned, it’s already too late to save them.

Earlier this month we introduced Localytics Predictions to solve this problem. With Predictions, our customers have the ability to forecast how app users are likely to behave in the future, and preemptively communicate with users likely to churn based on that forecasted behavior. We describe here some of the approaches we take to predicting user behavior.

Likelihood Segments

Localytics Predictions groups active users into High, Medium, and Low likelihood segments that reflect how likely each user is to behave in a specific way. The methodology used to generate likelihood segments follows a two-step process that combines techniques from both statistical modeling and machine learning.

As with any prediction, model accuracy is not guaranteed. We do, however, optimize all modeling towards minimizing the root-mean-square error (RMSE) between the number of event occurrences our model predicts users will have and the actual number of event occurrences we observe users to have. As we’ll see later on, this prediction of the number of event occurrences per user is the core modeling output that we use to determine which users fall into the High, Medium, and Low likelihood segments.
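
As a minimal sketch of that error metric, the snippet below computes RMSE between predicted and observed event counts. The function name and the sample numbers are illustrative assumptions, not values from our models.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and observed event counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical data: predicted vs. observed Checkout counts for three users.
predicted_counts = [2.0, 0.5, 4.0]
actual_counts = [3, 0, 4]
score = rmse(predicted_counts, actual_counts)
print(round(score, 4))
```

A lower score means the predicted counts sit closer to what users actually did.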

Dead or Alive?

The first step of our prediction methodology looks at historical user behavior. Specifically, we look at the recency, frequency, and lifetime of relevant events performed by each user. The intuition here is that users who have acted more recently, more frequently, and over a longer lifetime are more likely to remain active than less recent, less frequent, and newer users. Recency is always judged against a user’s usual rhythm: a user who made a purchase at least once every week for 24 weeks, but last purchased 3 weeks ago, has likely churned. On the other hand, a user who made a purchase once per month in the last quarter, but last purchased 3 weeks ago, is likely still alive.

Here’s an example: let’s define churn as a user not performing a Checkout event for 30 consecutive days. We’ll look at the first occurrence, recency, and frequency of Checkout events performed by each user who was active 31-60 days ago, and then assign each user a binary value of either ‘active’ or ‘churned’ by looking at whether or not they performed the Checkout event 0-30 days ago.
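
The labeling step above can be sketched roughly as follows. Everything here is an assumption made for illustration: events are represented as "days ago" integers, recency and lifetime are measured back from the 30-day boundary, and the names `TrainingRow` and `build_training_row` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TrainingRow:
    recency_days: int   # days between the last historical event and the 30-day boundary
    frequency: int      # number of historical events (older than 30 days)
    lifetime_days: int  # days between the first event and the 30-day boundary
    label: str          # 'active' or 'churned', taken from the most recent 30 days

def build_training_row(days_ago: List[int], boundary: int = 30,
                       window_end: int = 60) -> Optional[TrainingRow]:
    """Build one training row from a user's Checkout events (as days-ago offsets)."""
    # Only users with at least one event 31-60 days ago are eligible.
    if not any(boundary < d <= window_end for d in days_ago):
        return None
    history = [d for d in days_ago if d > boundary]
    performed_recently = any(0 <= d <= boundary for d in days_ago)
    return TrainingRow(
        recency_days=min(history) - boundary,
        frequency=len(history),
        lifetime_days=max(history) - boundary,
        label="active" if performed_recently else "churned",
    )

# Hypothetical user: events 90, 52, 45, and 10 days ago.
row = build_training_row([45, 52, 90, 10])
print(row)
```

A user whose only events fall outside the 31-60 day window returns `None` and is left out of the training set.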

 

We then fit a beta distribution, which is particularly convenient for modeling binary outcomes, to users’ observed recency, frequency, and lifetime data. The result is a specific beta distribution which is applied to current active users’ recency, frequency, and lifetime data to determine the probability that they will be active in the next 30 days. This step is roughly equivalent to a coin flip -- only weighted towards heads or tails (i.e. churned or active) depending on each user’s past behavior.
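
To make the weighted-coin idea concrete, here is a deliberately simplified sketch. It is not the fitting procedure described above: it assumes a plain beta-binomial setup in which a beta distribution is fit by the method of moments to hypothetical per-user activity rates, and is then updated with one user's own history to give that user's coin weight. All names and numbers are illustrative.

```python
import statistics

def fit_beta_by_moments(rates):
    """Estimate Beta(alpha, beta) parameters from rates in (0, 1) via method of moments."""
    mean = statistics.fmean(rates)
    var = statistics.pvariance(rates)
    if var <= 0 or var >= mean * (1 - mean):
        raise ValueError("variance is incompatible with a beta distribution")
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common

def probability_active(alpha, beta, active_periods, total_periods):
    """Posterior mean of the coin weight after observing one user's history."""
    return (alpha + active_periods) / (alpha + beta + total_periods)

# Hypothetical share of past periods in which each historical user was active.
historical_rates = [0.2, 0.4, 0.6, 0.8]
alpha, beta = fit_beta_by_moments(historical_rates)

# A current user who was active in 3 of their last 4 periods.
p_active = probability_active(alpha, beta, active_periods=3, total_periods=4)
print(round(alpha, 3), round(beta, 3), round(p_active, 3))
```

The more history a user has, the more their own behavior outweighs the population-level prior in the final coin weight.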

 

For each user, we also observe a discrete non-negative value which corresponds to the number of Checkout events they performed from 0-30 days ago.

 

The observed number of event occurrences per user is used to fit a Poisson distribution, which is convenient for modeling discrete count outcomes, to the same users’ recency, frequency, and lifetime data. The result is a specific Poisson distribution that can be applied to current active users to give us a value for how many Checkout events each user is expected to perform in the next 30 days. This step is roughly equivalent to a dice roll -- only with a die loaded towards specific sides (e.g. 6 expected future event occurrences) depending on each user’s past behavior.
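
The loaded-dice intuition can be illustrated with a much simpler setup than the one described above. This sketch assumes a single Poisson rate fit to a group of similar users (the maximum-likelihood estimate is just the mean count) rather than a model conditioned on recency, frequency, and lifetime. The counts are made up.

```python
import math

def fit_poisson_rate(counts):
    """Maximum-likelihood Poisson rate: the mean of the observed counts."""
    if not counts:
        raise ValueError("need at least one observed count")
    return sum(counts) / len(counts)

def poisson_pmf(k, rate):
    """Probability of observing exactly k events under a Poisson(rate) model."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

# Hypothetical Checkout counts over 30 days for a group of similar users.
observed_counts = [4, 6, 7, 5, 9]
rate = fit_poisson_rate(observed_counts)

# The "loaded die": probabilities for 0..12 events in the next 30 days.
probs = [poisson_pmf(k, rate) for k in range(13)]
most_likely = max(range(13), key=lambda k: probs[k])
print(rate, most_likely)
```

The fitted rate doubles as the expected number of future events, and the probabilities cluster around it, which is what loads the die towards particular sides.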

 

We then combine the results of both the weighted coin toss and loaded dice roll to get a final, continuous value for each user. Rather than representing something concrete like probability of churning or expected number of future event occurrences, this final value is abstract and serves to rank users along a continuous range so that they can be split into more interpretable segments.
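
The exact combination is not spelled out here, so the sketch below assumes one plausible choice purely for illustration: multiplying the probability of being active by the expected event count. The user IDs and values are hypothetical.

```python
def combined_score(p_active, expected_events):
    """Assumed combination: expected events discounted by the chance of being active."""
    return p_active * expected_events

# Hypothetical (probability active, expected events) pairs per user.
users = {
    "u1": (0.9, 6.0),
    "u2": (0.2, 8.0),
    "u3": (0.6, 1.0),
}

scores = {uid: combined_score(p, n) for uid, (p, n) in users.items()}

# Rank users from highest to lowest score; only the ordering matters downstream.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Note that u2 is expected to perform the most events if active, but still ranks below u1 because the coin is weighted heavily against it.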

Determining Optimal Splits

Now that we have a continuous range on which we can plot users, we need to find the optimal thresholds to split users into High, Medium, and Low segments. To do this, we find a primary threshold that splits users into two groups such that we minimize impurity, giving us a nice, natural balance between group size and sensitivity.
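
A rough sketch of a primary split follows. The impurity measure is not named above, so this assumes Gini impurity computed on binary outcomes (1 = churned, 0 = active), as in a one-level decision tree. The function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_threshold(scores, labels):
    """Return the score threshold that minimizes size-weighted Gini impurity."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    best_impurity, best_cut = float("inf"), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical scores
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best_impurity:
            best_impurity = weighted
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cut

# Hypothetical scores and observed outcomes (1 = churned, 0 = active).
scores = [0.1, 0.4, 0.5, 2.0, 3.0, 5.0]
labels = [1, 1, 1, 0, 0, 0]
threshold = best_threshold(scores, labels)
print(threshold)
```

Weighting each side's impurity by its size is what keeps the split from carving off a tiny but perfectly pure group.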

Each of these two groups is then split again into another two groups, for a final total of four groups. The secondary splits are simply based on whether users fall above or below the median value of each of the two original groups.

 

In the case of churn predictions, the two groups with the greatest average values combine to form our Low likelihood segment. The remaining two groups form our Medium and High likelihood segments. For conversion predictions, the two groups with the lowest average values combine to form our Low likelihood segment. The remaining groups form our Medium and High likelihood segments. In this way, we emphasize the granularity of those segments most vulnerable to churn or least likely to convert.
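
Putting the secondary splits and the segment mapping together might look like the sketch below. It assumes the primary threshold is already known, and it assumes that for churn the lowest-value group is the High segment and the next group is Medium (for conversion, the highest-value group is High). The function name and data are hypothetical.

```python
import statistics

def assign_segments(scores, primary_threshold, kind="churn"):
    """Split scores into four groups, then map groups to likelihood segments."""
    lower = [s for s in scores if s < primary_threshold]
    upper = [s for s in scores if s >= primary_threshold]
    if not lower or not upper:
        raise ValueError("primary threshold must leave users on both sides")
    lower_median = statistics.median(lower)
    upper_median = statistics.median(upper)

    def group(score):
        # Groups 0..3 are ordered from lowest to highest values.
        if score < primary_threshold:
            return 0 if score < lower_median else 1
        return 2 if score < upper_median else 3

    if kind == "churn":
        names = {0: "High", 1: "Medium", 2: "Low", 3: "Low"}
    else:  # conversion
        names = {0: "Low", 1: "Low", 2: "Medium", 3: "High"}
    return [names[group(s)] for s in scores]

# Hypothetical combined scores and a primary threshold of 1.25.
scores = [0.1, 0.4, 0.5, 0.8, 2.0, 3.0, 5.0, 6.0]
churn_segments = assign_segments(scores, 1.25, kind="churn")
conversion_segments = assign_segments(scores, 1.25, kind="conversion")
print(churn_segments)
print(conversion_segments)
```

In both cases, the side of the range that matters most to a marketer keeps two distinct segments while the other side is merged into one.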

Towards a More Predictable Future

The techniques we’ve employed to date have been fairly basic, but accurate. In fact, they’re rooted in the same methodologies that have powered lifetime value predictions for some of the world’s largest organizations for decades. It turns out that analytics - like many things - directionally follows an 80/20 rule where the majority of the value is derived from a minority of the work and complexity. 90% of the value of analytics is in counting things; the residual 10% value comes from fancy things like forecasts and advanced analyses.

We believe that residual value still represents a huge opportunity for our customers. The early success we’ve observed from churn prediction is pushing us to invest more in data science and machine learning techniques. Marketers have a tremendous amount to gain from easily being able to turn on things like forecasts of time-series data, detection of anomalous behavioral patterns, and automated clustering of their users.

If you’re excited by the prospect of working with some of the largest publishers in the app stores and the billions of data points they send Localytics every day, and you want to be a part of this future, join us! We’re hiring data scientists.
