Keywords

1 Introduction

We aim to summarize customer journeys. A customer journey is a collection of interactions (or touchpoints) between a customer and a firm. A journey can be as simple as a single activity (e.g., ‘looking at a product’), but can also involve complex interactions. Concretely, a challenge faced by many practitioners is to make sense of the–potentially infinite–combination of activities that exist in order to consume a service. As a response, new methods to understand, design, and analyze customer journeys are emerging from the industry and are becoming increasingly popular among researchers. One of these methods, which is the focus of this paper, is the Customer Journey Map (CJM). A CJM is a conceptual tool used for better understanding customers’ journeys when they are consuming a service. A CJM depicts typical journeys experienced by the customers across several touchpoints. To represent a CJM, we use a visual chart showing the basic components of a CJM; namely, the touchpoints, and the journeys. It possesses two axis: the y-axis lists the touchpoints in alphabetical order, while the x-axis represents the ordering of the sequence of touchpoints. Dots connected with a smooth line represent a journey.

Fig. 1.
figure 1

A fraction of the dataset–612 sequences out of 123’706–displayed on a CJM, and , a CJM that summarizes the entire dataset using three representatives.

The left part of Fig. 1 display a CJM containing all the observed journeys from customers while the right part shows how our algorithm helps in reducing a CJM’s complexity by building three representative journeys that best summarize the entire dataset. Summarizing thousands of journeys using few representatives has many compelling applications. First, it allows a business analyst to discover the common customers’ behaviors. Second, the representative journeys extracted from data can also serve as a basis to fuel the discussion during workshops and complement strategic CJM built by internal stakeholders. Third, by assigning each journey to its closest representative, we turn a complex type of input data into a categorical one. The latter can be used to complete the traditional types of data that are used to perform behavior segmentation (e.g., usage volume, loyalty to brand).

Our work contributes by: (1) clarifying the customer journey discovery activity which has, to the best of our knowledge, never been defined, (2) proposing ground truth datasets, which are particularly suited for evaluating this activity, and (3) introducing a genetic algorithm to discover representative journeys. Using the proposed datasets and existing cluster analysis techniques, we demonstrate that our approach outperforms state-of-the-art approaches used in social sciences to summarize categorical sequences.

The paper is organized as follows. Section 2 defines the customer journey discovery activity while Sect. 3 provides an overview of existing techniques closely related to this task. Section 4 introduces our genetic algorithm. In Sect. 5, we evaluate the results. Finally, Sect. 6 concludes the paper.

2 Customer Journey Discovery

The customer journey discovery activity can be described with the following definition: given a set of actual journeys, find a reasonable amount of representative journeys that well summarize the data.

Definition 1

(Touchpoint): a touchpoint is an interaction between a customer and a company’s products or services [2]. ‘Sharing on social network’ or ‘ordering a product on the website’ are two typical examples of touchpoints in a retail context. Let t be a touchpoint, and let T be the finite set of all touchpoints.

Definition 2

(Journey): A journey J is a sequence of touchpoints. For instance, J = {\(\langle \)‘Visiting the shop’, ‘Testing the product’, ‘Sharing on social network’\(\rangle \)} is a journey with three touchpoints. For the sake of brevity, we replace the touchpoints with alphabetical characters so that J becomes \(\langle \)ABC\(\rangle \).

Definition 3

(Actual Journey): An event log \(J_a\) is the set of all actual journeys observed from customers.

Definition 4

(Event Logs): Let \(J_{\mathcal {A}}\) be an event log, the set of all actual journeys observed by customers.

Definition 5

(Representative Journey): A representative journey, \(J_{r}\), is a journey that summarizes a subset of actual journeys \(\in J_{\mathcal {A}}\).

Definition 6

(Customer Journey Map): A CJM summarizes customer journeys through the use of representative journeys. Let a customer journey map \(J_{\mathcal {R}}\) be the set of all the \(J_{r}\mathrm{s}\) summarizing \(J_{\mathcal {A}}\). Let \(k_{\mathcal {R}}\) denotes the number of journeys in a map (i.e., \(|J_{\mathcal {R}}|\)). Typically, the part of Fig. 1 is a CJM, \(J_{\mathcal {R}}\), containing three representative journeys, \(J_{r}\) (\(k_{\mathcal {R}} = 3\)), summarizing an event log \(J_{\mathcal {A}}\).

Discovering \(J_{\mathcal {R}}\) from \(J_{\mathcal {A}}\) is an unsupervised clustering task that entails interesting challenges. First, there is no work in the literature that deals with the optimal number of \(k_{\mathcal {R}}\). Second, the sequence that best summarizes its assigned actual journeys needs to be found.

3 Related Work

In [1], we propose a web-based tool to navigate CJMs, which is called CJM-ex (CJM-explorer). It uses a hierarchical structure so that the first layers show only the most representative journeys, abstracting from less representative ones.

Fig. 2.
figure 2

Expected BPM and CJM when L = \([\langle \textsc {abab}\rangle ,\langle \textsc {dce}\rangle ,\langle \textsc {cde}\rangle ]\).

In [2], we show that the process mining framework as well as the input data are relevant in a customer journey analytics context. Similar to the process discovery activity in process mining, this work focuses on the discovery of a model from event logs. Our work was inspired by the approach in [3, 6, 14] where the authors propose discovering business process models (BPMs) using a genetic algorithm. A BPM and a CJM is not used for the same purpose. Fundamentally, the goal of a BPM is to find models that best describe how the processes are handled. In contrast, the aim of a CJM is to help internal stakeholders to put themselves in their customers’ shoes. Figure 2 shows that these models convey different type of information. A CJM depicts journeys as experienced by customers while a BPM shows the available combination of activities using advanced constructs such as exclusive or parallel gateways. Overall, CJMs are used to supplement but not to replace BPMs [12].

In fact, our work is closer to the summarization of categorical sequences used in social sciences. In particular, in [8, 9], Gabadinho et al. propose summarizing a list of observed sequences (i.e., actual journeys) using representative (i.e., representative journey). They define a representative set as “a set of non-redundant ‘typical’ sequences that largely, though not necessarily exhaustively, cover the spectrum of observed sequences” [8]. We argue that their definition matches our definition of a representative sequence summarizing a set of sequences. The authors propose five ways to select a representative. The ‘frequency’ (1), where the most frequent event is used as the representative. The ‘neighborhood density’ (2), which consists of counting the number of sequences within the neighborhood of each candidate sequence. The most representative is the one with the largest number of sequences in a defined neighborhood diameter. The ‘mean state frequency’ (3): the transversal frequencies of the successive states is used to find a representative using the following equation:

$$\begin{aligned} MSF(s) = \frac{1}{\ell } \sum _{i=1}^{\ell }{f_{si}} \end{aligned}$$
(1)

where

figure a

The sum of the state frequencies divided by the sequence length becomes the mean state frequency. Its value is bounded by 0 and 1 where 1 describes a situation where there is only one single distinct sequence [8]. The sequence with the highest score is the representative. The ‘centrality’ (4): the representative – or medoid – can be found using the centrality. Being the most central object, the representative is the sequence with the minimal sum of distances to all other sequences. Finally, the ‘sequence likelihood’ (5): the sequence likelihood of a sequence derived from the first-order Markov model can also be used to determine the representative. In the evaluation section, we compare our genetic approach with these five techniques using their own implementation of the package Traminer available in R [7].

Fig. 3.
figure 3

Overview of the genetic algorithm to discover CJMs

4 Genetic Algorithm for CJM Discovery

As an introduction, Fig. 3 provides intuition on how the genetic algorithm builds a CJM such as the one visible in Fig. 1 (part ). A set of actual journeys, \(J_{a}\), is provided to the algorithm. Then, during Generation 1, we build p number of \(J_{\mathcal {R}}\), where p is the population size; i.e., the number of CJMs that will be evaluated after each generation. In our example \(p=3\). Each \(J_{\mathcal {R}}\) is evaluated and we keep the e best \(J_{\mathcal {R}}\mathrm{s}\), called the elite set while we discard the other (\(p-e\)) \(J_{\mathcal {R}}\mathrm{s}\). Then, we move to Generation 2, where we keep an untouched version of the e number of \(J_{\mathcal {R}}\mathrm{s}\) in the elite set. For instance, \(J_{\mathcal {R}2}\) was kept in Generation 2 because it has the best average quality in Generation 1, i.e., \(J_{\mathcal {R}2}\) is intact in Generation 2. We then apply some transformation (to be discussed next) to generate (\(p-e\)) new \(J_{\mathcal {R}}\mathrm{s}\) that will, in turn, be evaluated. In our case, we generate \(J_{\mathcal {R}4}\) and \(J_{\mathcal {R}5}\). We recursively transform and evaluate the p number of \(J_{\mathcal {R}}\mathrm{s}\) until a stop** criterion is met. Once a stop** criterion is met, we return the best \(J_{\mathcal {R}}\). The best \(J_{\mathcal {R}}\) can be interpreted as the best set of representative journeys, \(J_r\), representing a set of journeys from \(J_{\mathcal {A}}\) that have been found given a certain evaluation criterion. The next section describes how we generate the initial population, what various types of operations we apply on each \(J_{\mathcal {R}}\) to transform them, and how we evaluate each one of them given a set of journeys in \(J_{\mathcal {A}}\).

4.1 Preprocessing

To gain in efficiency, we make the assumption that \(J_{\mathcal {R}}\) will be close to the frequent patterns observed in \(J_{a}\). Let \(Top_{\ell _n}\) be the n most occurring pattern of length \(\ell \) and \(Top_n \supseteq Top_{\ell _{[2,m]}}\) be the superset of all the most occurring patterns of lengths 2 to m. \(Top_n\) is used later to form the initial population of \(J_{\mathcal {R}}\) (described in Sect. 4.2), and to add a random journey to \(J_{\mathcal {R}}\). Using \(Top_n\) we avoid generating journeys by picking a random number of touchpoints from T. According to our experiments, using \(Top_n\) reduces the execution time by two to get an output \(J_{\mathcal {R}}\) of the same average quality.

4.2 Initial Population

The initial population is generated by adding a sequence randomly picked from \(Top_n\) (defined in Sect. 4.1).

4.3 Assign Actual Journeys

The quality of a representative journey can only be measured when knowing which actual journeys it represents. Hence, a first step toward evaluating the quality of \(J_{\mathcal {R}}\) is to assign each journey \(J_a \in J_{\mathcal {A}}\) to its closest journey in \(J_r \in J_{\mathcal {R}}\). To characterize the closeness between \(J_a\) and \(J_r\), we use the Levensthein distance borrowed from [11]. It is a metric particularly well suited to measure the distance between sequences. The Levensthein distance counts the number of edit operations that are necessary to transform one sequence into another one. There are three types of operations: deletions, insertions, and substitutions. For instance, the distance between \(\langle \textsc {abc} \rangle \) and \(\langle \textsc {acce} \rangle \) is 2 since one substitution and one insertion are required to match them. We define the closest representative as the one having the smallest Levensthein distance with the actual journeys. Note that if a tie occurs between multiple best representatives, we assign the \(J_a\) to the \(J_r\) having the smallest amount of actual journeys already assigned to it. Once each actual journey has been assigned to its closest representative, we can evaluate \(J_{\mathcal {R}}\) using the criteria described in the next section.

4.4 CJM Evaluation Criteria

This section introduces the evaluation criteria used to determine the quality of each \(J_{\mathcal {R}}\), namely, (1) the fitness, (2) the number of representatives, and (3) the average quality.

Fitness. The fitness measures the distance between each sequence of activities \(J_{a}\) and its closest representative \(J_{\mathcal {R}}\) using the Levenshtein distance [11].

$$\begin{aligned} Fitness(J_{a}, J_{\mathcal {R}}) = 1-\frac{\sum _{i=1}^{|J_{a}|}min_{j=1}^{|J_{\mathcal {R}}|}({Levensthein(\mathbf {\sigma _{\mathcal {A}_i}}}; \mathbf {\sigma _{\mathcal {R}_j}}))}{\sum _{i=1}^{|J_{a}|}{Length(\sigma _{\mathcal {A}_i})}} \end{aligned}$$
(2)

where

figure b

A fitness of 1 means that the representative journey perfectly catches the behavior of the actual journeys assigned to it. In contrast, a fitness close to 0 implies that many edit operations are necessary to match the sequences.

Number of Representatives. If we maximize the fitness without trying to keep a low \(k_{\mathcal {R}}\), the CJM will become unreadable because too many representative journeys will be displayed in it. In other words, \(J_{\mathcal {R}}\) overfits. Hence, the goal is to find a \(k_{\mathcal {R}}\) that offers a good compromise between underfitting and overfitting. Finding the optimal number of clusters is a recurrent challenge when clustering data. We propose integrating traditional ways of determining the optimal number of clusters, such as the Bayesian information criterion [13], or the Calinski-Harabasz index [5]. The idea is to evaluate a range of solutions (e.g., from 2 to 10 journeys) and to keep the best solution. Let \(k_{h}\) be the optimal number of clusters returned by one of the techniques mentioned above. By integrating \(k_{h}\) into the evaluation, we can guide the solution toward a \(k_{\mathcal {R}}\) that is statistically relevant. To evaluate the quality, we measure the distance between \(k_{\mathcal {R}}\) and \(k_{h}\). To do this, we propose the following distribution function:

$$\begin{aligned} Number Of Representatives({k_{\mathcal {R}}}, {k_{h}}, {x_0}) = \frac{1}{1 + (\frac{|k_{\mathcal {R}}-k_{h}|}{x_0})^{2}} \end{aligned}$$
(3)

where

figure c

The parameter \(x_0\) determines where the midpoint of the curve is. Concretely, if \(x_0=5\), \(k_{\mathcal {R}}=11\) will result in a quality of 0.5 because the absolute distance from \(k_{h}\) is 5. We set \(x_{0}=5\) for all our experiments. Intuitively, \(x_0\) guide the number of representatives that will be found. We set it to 5 as we believe that it is a reasonable amount of journeys to display on a single CJM. Because the number of representatives is not the only criteria to assess the quality of a CJM, the final CJM might contain more or less journeys if it increases the average quality.

Average Quality. We assign weights to the fitness and the number of representatives qualities to adjust their relative importance. According to our experiments and in line with the work from Buijs et al. [3], the results tend to be best if more weight is given to the fitness quality. Typically, weights \(w_f=3\), and \(w_{k_h}=1\) lead to the best results. Then, we get the overall quality by averaging the weighted qualities using the arithmetic mean. In the next section, we determine the stop** criterion of the algorithm.

4.5 Stop** Criterion

Before starting a new generation, we check if a stop** criterion is met. There are three ways we can use to stop the algorithm taken from [3, 6]. (1) The algorithm could stop after a certain number of generations. (2) One could stop the algorithm when a certain number of generations have been created without improving the average quality. (3) We could stop the algorithm when a certain quality threshold is reached for one of the evaluation criteria. Because it is difficult to predict the quality level that can be reached, we believe that stop** the algorithm using a threshold is not advisable. For this reason, we used a combination of approaches 1 and 2 for our experiments. Once the stop** criteria have been evaluated, either the algorithm stops, or, we generate new candidates by applying genetic operators described in the next section.

4.6 Genetic Operations

Once all the CJMs have been evaluated, we rank them by their average quality and copy a fraction (i.e., e) of the best ones in elite. Because we keep an untouched version of the e number of \(J_{\mathcal {R}}\mathrm{s}\), we make sure that the overall quality will only increase or stay steady. Then, we generate \((p-e)\) new \(J_{\mathcal {R}}\mathrm{s}\) as follows. We pick one random \(J_{\mathcal {R}}\) from elite, and perform one or multiple of these four operators. (1) Add a journey: A sequence is randomly picked from \(Top_n\) and added to \(J_{\mathcal {R}}\). (2) Delete a journey: A random journey is removed from \(J_{\mathcal {R}}\). Nothing happens if \(J_{\mathcal {R}}\) contains only one journey. (3) Add a touchpoint: A touchpoint from T is added to one of the journeys from \(J_{\mathcal {R}}\) at a random position. (4) Delete a touchpoint: A touchpoint is removed from \(J_{\mathcal {R}}\) unless removing this touchpoint would result in an empty set of touchpoints. As described in Fig. 3, once new \(J_{\mathcal {R}}\mathrm{s}\) have been created, we go back to the evaluation phase where the new \(J_{\mathcal {R}}\mathrm{s}\) are evaluated until one stop** criterion is met. Once such a criterion is met, we return the best \(J_{\mathcal {R}}\mathrm{s}\) of the last generation and the algorithm stops.

5 Evaluation

5.1 Datasets

We produced several event logs that simulate journeys. Generating the event logs ourselves means that we know the ground truth represented by the generative journeys and therefore the objective is to recover these journeys from a set of actual ones we produce. A generative journey is a known list of touchpoints from which we generate the event logs. Let \(J_{\mathcal {G}}\) be a set of \(k_{\mathcal {G}}\) generative journeys used to generate a dataset composed of 1,000 actual journeys.

If we were to use only these generative journeys to generate 1,000 journeys, we would obtain only \(k_{\mathcal {G}}\) distinct journeys. For instance, if we use \(J_{g1}\) = \(\langle \textsc {abc} \rangle \) and \(J_{g2}\) = \(\langle \textsc {abbd} \rangle \) to generate 1,000 journeys equally distributed, we obtain \(J_{a} = \big \{J_{g1}^{500}, J_{g2}^{500}\big \}\). A more realistic situation would depict a scenario where each group of customers can be described by a representative sequence of activities, but the actual journeys within the group can deviate from the representative one. Hence, to produce more realistic data, we inject noise for a fraction of the journeys. For instance, if the noise level is set to 50%, \(J_a = J_g\) is true for half of the data. For the other half, we add noise by removing or adding touchpoints, or by swap** the ordering of activities.

We generated 8 datasets of varying characteristics. The characteristics are distinct in terms of: number of \(k_{\mathcal {G}}\), number of touchpoints, average length of the journeys, and the standard deviation of the number of journeys assigned to each generative journey. For each dataset, we gradually apply 5 levels of noise, resulting in 40 datasets. The datasets, the detailed characteristics of them as well as the procedure followed to produce them are available at the following url: http://customer-journey.unil.ch/datasets/.

5.2 Metrics

We use both external and internal evaluation metrics. On the one hand, the external ones evaluate the results relative to the generative journeys. On the other hand, the internal evaluation uses cluster analysis techniques to assess the results. The aim is to account for the fact that the ground truth might not be the optimal solution because of the noise that was added. This section introduces these metrics. For the internal evaluation metrics, we borrowed them from [9].

External Evaluation - Jaccard Index. To evaluate the similarity between the sequences of activities from the generative journeys (\(J_{\mathcal {G}}\)) and the discovered representative journeys (\(J_{\mathcal {R}}\)), we propose to use the Jaccard index where a score of 1 indicates a perfect match.

$$\begin{aligned} JaccardIndex({J_{\mathcal {R}}}, J_{\mathcal {G}}) = \frac{|{J_{\mathcal {R}} \cap J_{\mathcal {G}}}|}{|{J_{\mathcal {R}} \cup J_{\mathcal {G}}}|} \end{aligned}$$
(4)

External Evaluation - Distance in Number of Journeys. This metric measures the distance between the number of generative journeys and the number of representative journeys returned by the algorithm. We propose:

$$\begin{aligned} NbJourneysDistance({k_{\mathcal {G}}}, {k_{\mathcal {R}}}) = abs({k_{\mathcal {G}}} - {k_{\mathcal {R}}}) \end{aligned}$$
(5)

Internal Evaluation - Mean Distance [9]. The mean distance i returns the average Levensthein distance between the representative sequence i and the sequence of actual journeys that have been assigned to i, \(k_i\) being the number of actual journeys assigned to the representative journey i.

$$\begin{aligned} MeanDistanceScore_i = \frac{\sum ^{k_i}_{j=1}{D(J_{r_{i}}}, J_{a_{ij}})}{k_i} \end{aligned}$$
(6)

Internal Evaluation - Coverage [9]. The coverage indicates the proportion of actual journeys that are within the neighborhood n of a representative.

$$\begin{aligned} Coverage_i = \frac{\sum ^{k_i}_{j=1}{(D(J_{r_{i}}}, J_{a_{ij}}) < n)}{k_i} \end{aligned}$$
(7)

Internal Evaluation - Distance Gain [9]. The distance gain measures the gain in using a representative journey instead of the true center of the set, C (i.e., the medoid of the whole dataset). In other words, it measures the gain obtained in using multiple representative journeys instead of a single one.

$$\begin{aligned} DistGain_i = \frac{\sum ^{k_i}_{j=1}{D(C(J_{a})}, J_{a_{ij}})-\sum ^{k_i}_{j=1}{D(J_{r_{i}}}, J_{a_{ij}})}{\sum ^{k_i}_{j=1}{D(C(J_{a})}, J_{a_{ij}})} \end{aligned}$$
(8)

5.3 Settings

We evaluate our genetic algorithm (approach 1) against an approach using Kmedoids clustering (approach 2) and the approaches proposed by Gabadinho et al. [7] (approach 3). Approach 1 is using our genetic algorithm with a fitness weight set to 3 and a weight for the number of representatives set to 1. Due to the non-deterministic nature of the genetic algorithm, we run it ten times. In approach 2, we build a distance matrix of the edit-distance between sequences. We then create k (found using the Calinski-Harabasz index) clusters using an implementation, [4], of the k-medoids algorithm. Finally, the medoid of each cluster becomes the representative. In approach 3 we build the same distance matrix as then one used in approach 2. Then, we used an agglomerative hierarchical clustering to define k clusters. Finally, we return the best representatives of each cluster using the frequency, the neighborhood density, the mean state frequency, the centrality, or the likelihood using the package Traminer available in R [7] (Fig. 4).

Fig. 4.
figure 4

Three approaches used to find the representative journeys.

5.4 Results

We present the results of the external evaluation metrics, then the internal ones and we conclude by mentioning the execution times for several datasets sizes. The external evaluation metrics are shown in Fig. 5. Here, we can see that the solution that reduces the Jaccard index the most and which is closest to the ground truth in terms of number of journeys is the genetic approach. The internal evaluation in Fig. 6 shows that the genetic algorithm outperforms the other approaches. Finally, the execution time to for 100 actual journeys is much faster using the kmedoids or using the techniques implemented in Traminer [7]. We observe that when we increase the datasets’ size the performance of the genetic algorithm tend to be comparable to the kmedoids implementation and faster than the techniques implemented in Traminer (Fig. 7).

Fig. 5.
figure 5

External evaluation

Fig. 6.
figure 6

Internal evaluation. The genetic algorithm perform best.

Fig. 7.
figure 7

Comparing the execution time per dataset for 100, 1’000, and 10’000 journeys.

6 Conclusion

In this paper, we propose two basic quality criteria to guide the evolution process of discovering the best representative journeys for a given set of actual journeys that otherwise would be unreadable. We demonstrate that they perform well on synthetic datasets. We show that techniques from social sciences are also useful for studying customer journeys. As suggested by Gabadinho et al., “The methods are by no way limited to social science data and should prove useful in many other domains” [8]. This present study supports this claim and highlights how research from social science can benefit our understanding of customers. At a time when a customer-centric culture has become a matter of survival according to [10], we anticipate that research at the crossroads between data science, marketing, and social sciences will be key to a full understanding of customer experiences.