Measuring Cross-Product Adoption Using dbt_set_similarity | by Matthew Senick | Dec, 2024

Enhancing cross-product insights inside dbt workflows

For multi-product companies, one critical metric is often what is called “cross-product adoption” (i.e., understanding how users engage with multiple offerings in a given product portfolio).

One measure suggested for calculating cross-product or cross-feature usage in the popular book Hacking Growth [1] is the Jaccard Index. Traditionally used to measure the similarity between two sets, the Jaccard Index can also serve as a powerful tool for assessing product adoption patterns. By quantifying the overlap in users between products, it lets you identify cross-product synergies and growth opportunities.

The dbt package dbt_set_similarity is designed to simplify the calculation of set similarity metrics directly within an analytics workflow. It provides a way to calculate Jaccard indices within SQL transformation workloads.

To import this package into your dbt project, add the following to the packages.yml file. We will also need dbt_utils for the purposes of this article's example. Then run dbt deps within your project to install the packages.

packages:
  - package: Matts52/dbt_set_similarity
    version: 0.1.1
  - package: dbt-labs/dbt_utils
    version: 1.3.0

The Jaccard Index, also known as the Jaccard Similarity Coefficient, is a metric used to measure the similarity between two sets. It is defined as the size of the intersection of the sets divided by the size of their union.

Mathematically, it may be expressed as:

The Jaccard Index represents the “Intersection” over the “Union” of two sets (image by author)

Where:

  • A and B are two sets (e.g., users of product A and users of product B)
  • The numerator represents the number of elements in both sets
  • The denominator represents the total number of distinct elements across both sets
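
Written out with these symbols, the definition reads as follows. The name J(A, B) is notation introduced here for convenience, and the second form follows from inclusion-exclusion on the union, which is handy when only the counts are known:

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
```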

The Jaccard Index is particularly useful in the context of cross-product adoption because:

  • It focuses on the overlap between two sets, making it ideal for understanding shared user bases
  • It accounts for differences in the total size of the sets, ensuring that results are proportional and not skewed by outliers

For example:

  • If 100 users adopt Product A and 50 adopt Product B, with 25 users adopting both, the Jaccard Index is 25 / (100 + 50 - 25) = 0.2, indicating a 20% overlap between the two user bases.
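
As a quick sanity check outside dbt, the arithmetic in this example can be reproduced with plain Python sets. This is only a minimal sketch: the jaccard_index helper and the user ID ranges are illustrative assumptions, chosen so that the set sizes match the numbers above.

```python
def jaccard_index(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as having no overlap.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical user IDs: 100 users of Product A, 50 of Product B,
# with IDs 75..99 (25 users) appearing in both sets.
product_a_users = set(range(0, 100))
product_b_users = set(range(75, 125))

overlap = jaccard_index(product_a_users, product_b_users)
print(overlap)  # 0.2
```

The empty-set guard is a choice made for this sketch; it says nothing about how the dbt package handles empty or null arrays.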

The example dataset we will be using comes from a fictional SaaS company that offers storage space as a product for consumers. This company provides two distinct storage products: document storage (doc_storage) and photo storage (photo_storage). Each of these columns is either true, indicating the product has been adopted, or false, indicating the product has not been adopted.

Additionally, the demographics (user_category) that this company serves are either tech enthusiasts or homeowners.

For the sake of this example, we will read this CSV file in as a “seed” model named seed_example within the dbt project.

Now, let's say we want to calculate the Jaccard index (cross-adoption) between our document storage and photo storage products. In the first CTE, we create an array (list) of the users who have the document storage product, alongside an array of the users who have the photo storage product. In the second step, the final select, we apply the jaccard_coef function from the dbt_set_similarity package to compute the Jaccard coefficient between the two arrays of user IDs.

with product_users as (
    -- One row holding two arrays: the user IDs that adopted each product
    select
        array_agg(user_id) filter (where doc_storage = true)
            as doc_storage_users,
        array_agg(user_id) filter (where photo_storage = true)
            as photo_storage_users
    from {{ ref('seed_example') }}
)

select
    doc_storage_users,
    photo_storage_users,
    -- Jaccard coefficient between the two arrays of user IDs
    {{
        dbt_set_similarity.jaccard_coef(
            'doc_storage_users',
            'photo_storage_users'
        )
    }} as cross_product_jaccard_coef
from product_users

Output from the above dbt model (image by author)

From this output, just over half (60%) of the users who have adopted either product have adopted both. We can graphically verify our result by placing the user ID sets into a Venn diagram, where we see that three users have adopted both products, among five total users: 3/5 = 0.6.

What the collection of user IDs and product adoption would look like, verifying our result (image by author)
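
The same check can be sketched in plain Python. The rows below are hypothetical, not the article's actual seed file: they are made up to be consistent with the reported result of three users with both products among five users in total, and the user IDs and category labels are assumptions.

```python
# Hypothetical rows: (user_id, user_category, doc_storage, photo_storage)
rows = [
    (1, "tech_enthusiast", True, True),
    (2, "tech_enthusiast", True, False),
    (3, "tech_enthusiast", False, True),
    (4, "homeowner", True, True),
    (5, "homeowner", True, True),
]

# Mirror the two array_agg(...) filter (...) aggregations as Python sets.
doc_storage_users = {user_id for user_id, _, doc, _ in rows if doc}
photo_storage_users = {user_id for user_id, _, _, photo in rows if photo}

both = doc_storage_users & photo_storage_users    # adopted both products
either = doc_storage_users | photo_storage_users  # adopted at least one

cross_product_jaccard_coef = len(both) / len(either)
print(sorted(both), sorted(either), cross_product_jaccard_coef)
# [1, 4, 5] [1, 2, 3, 4, 5] 0.6
```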

Using the dbt_set_similarity package, creating segmented Jaccard indices for our different user categories should be fairly natural. We will follow the same pattern as before; however, we will group our aggregations by the user category that a user belongs to.

with product_users as (
    -- One row per user category, each with its own pair of user ID arrays
    select
        user_category,
        array_agg(user_id) filter (where doc_storage = true)
            as doc_storage_users,
        array_agg(user_id) filter (where photo_storage = true)
            as photo_storage_users
    from {{ ref('seed_example') }}
    group by user_category
)

select
    user_category,
    doc_storage_users,
    photo_storage_users,
    -- Jaccard coefficient computed separately for each category
    {{
        dbt_set_similarity.jaccard_coef(
            'doc_storage_users',
            'photo_storage_users'
        )
    }} as cross_product_jaccard_coef
from product_users

Output from the above dbt model (image by author)

We can see from the data that, judging by the Jaccard indices, cross-product adoption is higher among homeowners. As shown in the output, all homeowners who have adopted at least one of the products have adopted both. Meanwhile, only one-third of the tech enthusiasts who have adopted at least one product have adopted both. Thus, in our very small dataset, cross-product adoption is higher among homeowners than among tech enthusiasts.

We can graphically verify the output by again creating Venn diagrams:

Venn diagrams split by the two segments (image by author)
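
The segmented figures can also be cross-checked with a short Python sketch that groups users by category before comparing the sets. As before, the rows, user IDs, and category labels are hypothetical, chosen only to agree with the reported results (all adopting homeowners have both products, one-third of adopting tech enthusiasts do).

```python
from collections import defaultdict

# Hypothetical rows: (user_id, user_category, doc_storage, photo_storage)
rows = [
    (1, "tech_enthusiast", True, True),
    (2, "tech_enthusiast", True, False),
    (3, "tech_enthusiast", False, True),
    (4, "homeowner", True, True),
    (5, "homeowner", True, True),
]

# Equivalent of "group by user_category": one pair of sets per category.
doc_users = defaultdict(set)
photo_users = defaultdict(set)
for user_id, category, doc, photo in rows:
    if doc:
        doc_users[category].add(user_id)
    if photo:
        photo_users[category].add(user_id)

# Only categories with at least one adopter appear here, so the union
# in the denominator is never empty.
jaccard_by_category = {}
for category in sorted(set(doc_users) | set(photo_users)):
    a, b = doc_users[category], photo_users[category]
    jaccard_by_category[category] = len(a & b) / len(a | b)

for category, coef in jaccard_by_category.items():
    print(category, round(coef, 3))
# homeowner 1.0
# tech_enthusiast 0.333
```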

dbt_set_similarity provides a straightforward and efficient way to calculate cross-product adoption metrics such as the Jaccard Index directly within a dbt workflow. By applying this method, multi-product companies can gain valuable insights into user behavior and adoption patterns across their product portfolio. In our example, we demonstrated the calculation of overall cross-product adoption as well as segmented adoption for distinct user categories.

Using the package for cross-product adoption is just one simple application. In reality, there are countless other potential applications of this technique, for example:

  • Feature usage analysis
  • Marketing campaign impact analysis
  • Support analysis

Additionally, this style of analysis is certainly not limited to just SaaS, but can apply to virtually any industry. Happy Jaccard-ing!

References

[1] Sean Ellis and Morgan Brown, Hacking Growth (2017)

Resources

dbt package hub