library(tidyverse) # data wrangling
library(lubridate) # dates and times
library(janitor) # cleaning up and duplicate checks
library(vtable) # balance table
Tutorial 8: Causal Analytics & Improving User Sign Up Experiences
By the end of this tutorial, students should be able to:
- Describe how experiments help evaluate product or design changes.
- Join and prepare experimental and event-level datasets to compute key outcomes such as funnel completion rates.
- Compare and interpret results across variations, expressing differences in both percentage points and relative terms, and evaluating whether they are meaningful in a business context.
- Communicate analytic findings in clear, decision-oriented language suitable for a managerial audience, using one concise table or figure.
The Business Challenge
A fast-growing fintech company has just completed an A/B test to evaluate a new sign-up process that existing users must complete before accessing a new feature release in their account dashboard. The goal of the experiment was to introduce this new process without creating excessive friction or drop-off during sign-up.
The company rolled out a Test variation of the sign-up flow, updating the layout and copy of key steps, while a Control group kept the original version. You’ve received detailed event-level data showing when each user started, moved through the funnel, and confirmed. The product team wants to know not only whether the Test increased completions, but why — and whether it created any new friction for users.
In modern tech firms, analysts are embedded within product teams and play a central role in decision-making. Their job is to evaluate changes to the user experience (UX) using data rather than intuition.
Small design choices—such as layout, wording, or the number of steps—can meaningfully affect user behaviour at scale. To assess these changes, firms rely on A/B testing, comparing different versions of a feature on real users.
Common metrics include:
- Completion rates.
- Drop-off at each step of a funnel.
- Time to complete key actions.
- Downstream engagement and retention.
The stakes are commercial. Even modest improvements in conversion or retention can compound into large gains in revenue and growth.
The analyst’s role is not just to determine whether a change worked, but to explain why, for whom, and with what trade-offs. This requires combining causal inference with careful behavioural analysis.
Your Task
- Quantify the effect of the new variation on completion rates.
- Interpret the magnitude of the improvement — is it large, small, or somewhere in between?
- Explore additional metrics to understand user behaviour inside the sign-up process.
- Prepare an executive summary explaining whether the new sign-up process should be rolled out to all customers.
Loading R Packages
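Here are the R packages we will need to complete the exercises (loaded in the code at the top of this tutorial): tidyverse for data wrangling, lubridate for dates and times, janitor for cleaning and duplicate checks, and vtable for the balance table.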
The data comes in multiple files that we will load as we need them.
Prepare these Exercises before Class
Prepare these exercises before coming to class. Plan to spend 45 minutes on these exercises.
Exercise 1: Understanding the Business Context and Why We Use Experiments
Before analysing the data, take a step back and think about the business motivation behind this experiment.
The company has introduced a new personalised insights dashboard — but to use it, existing customers must complete an additional sign-up process. This is a risky move. Each additional step in the customer journey can introduce friction and reduce completion.
Completion is not just a UX metric — it is directly tied to financial performance. If fewer users complete the process, fewer access the new feature, reducing engagement, retention, and ultimately revenue. At scale, even small drops in completion can translate into large losses in customer lifetime value.
In product analytics, we often represent a sign-up process as a funnel — a sequence of steps where users either continue or drop out at each stage (for example, from start to confirm). Each step is a decision point.
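To make the funnel idea concrete, here is a minimal sketch on made-up data. The `toy_events` table and its column names (`client_id`, `step`) are assumptions for illustration only and are not the tutorial's actual data.

```r
library(tidyverse)

# Hypothetical event log: one row per client per step reached (not the real data)
toy_events <- tribble(
  ~client_id, ~step,
  1, "start",
  1, "step_1",
  1, "confirm",
  2, "start",
  2, "step_1",
  3, "start",
  4, "start",
  4, "step_1",
  4, "confirm"
)

funnel <- toy_events |>
  # Order the steps as they occur in the (assumed) process
  mutate(step = factor(step, levels = c("start", "step_1", "confirm"))) |>
  group_by(step) |>
  summarise(clients = n_distinct(client_id), .groups = "drop") |>
  arrange(step) |>
  mutate(
    share_of_starters   = clients / first(clients), # reach relative to the first step
    continued_from_prev = clients / lag(clients)    # step-to-step continuation rate
  )

funnel
```

In this toy funnel, four clients start, three reach `step_1`, and two confirm, so the largest proportional loss occurs between `step_1` and `confirm`.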
If we observe differences in completion rates or drop-off across steps, an important question arises: are these differences caused by the new design, or by other factors?
(a) Why might the company choose to test a new sign-up flow rather than launch it for all users immediately?
(b) What outcomes or trade-offs should the company pay attention to when evaluating the new flow (think about success from the users’ perspective vs the company’s perspective)?
(c) What can a funnel reveal about customer behaviour that a simple completion rate cannot?
(d) Rather than run an experiment, the company could just implement the new sign-up process for all users going forward and then compare completion rates before and after this change. Why is this problematic?
(e) How does random assignment in an A/B test help us isolate the effect of the new design?
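As a starting point for thinking about random assignment, the simulation below is a rough sketch under stated assumptions: the users, their `age` values, and the 50/50 coin-flip assignment are all invented for illustration. It shows that when assignment ignores user characteristics, the two groups end up with very similar averages.

```r
library(tidyverse)

set.seed(123) # make the simulated assignment reproducible

# Hypothetical users with one simulated characteristic (not the real data)
toy_users <- tibble(
  client_id = 1:10000,
  age       = rnorm(10000, mean = 40, sd = 10)
) |>
  # Assign each user to a variation by chance alone
  mutate(variation = sample(c("Control", "Test"), size = n(), replace = TRUE))

balance <- toy_users |>
  group_by(variation) |>
  summarise(n = n(), mean_age = mean(age), .groups = "drop")

balance
```

Because chance alone decides who sees which version, any characteristic, measured or not, should be similarly distributed across groups in a large enough sample.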
Exercise 2: Exploring the Experimental Data
It’s time to look at the data from the experiment.
You have two datasets:
- ab_test_demographics: information about each client.
- ab_test_experiment_clients: information on which clients were assigned to the Control or Test variation.
Our goal is to merge these datasets and check whether allocation to treatment and control groups is balanced across demographic characteristics.
(a) Load the two datasets into R by completing the code below:
demogs <- YOUR_CODE
treatment <-
  YOUR_CODE |>
  clean_names()
What problem does clean_names() solve, and how might inconsistent column names affect later steps?
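The small example below may help when thinking about `clean_names()`. The `messy` table and its column names are invented for illustration and are not taken from the tutorial's files.

```r
library(tidyverse)
library(janitor)

# Hypothetical table with awkward column names (spaces and capital letters)
messy <- tibble(
  `Client ID` = 1:2,
  `Variation` = c("Test", "Control")
)

tidy <- messy |>
  clean_names() # converts names to lower-case snake_case

names(messy)
names(tidy)
```

After cleaning, the columns are called `client_id` and `variation`, so they can be typed without backticks and matched reliably against other tables.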
(b) Merge the treatment indicator variable into the demogs data by completing the code below. We recommend you use an inner_join() to do this.
demogs <-
  demogs |>
  YOUR_CODE
What does a join do in this context, and why must these datasets be combined before analysing the experiment?
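To see how an `inner_join()` behaves, consider this sketch on two tiny made-up tables. The table names (`toy_demogs`, `toy_assign`) and their values are hypothetical, and the key column `client_id` is assumed to exist in both.

```r
library(tidyverse)

# Hypothetical demographic table and assignment table (not the real data)
toy_demogs <- tibble(client_id = c(1, 2, 3), age = c(34, 51, 27))
toy_assign <- tibble(
  client_id = c(2, 3, 4),
  variation = c("Test", "Control", "Test")
)

# Keep only clients that appear in BOTH tables
joined <- toy_demogs |>
  inner_join(toy_assign, by = "client_id")

joined
```

Client 1 has no assignment and client 4 has no demographics, so neither appears in the result. An inner join silently drops such rows, which is worth checking by comparing row counts before and after.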
(c) Next, we want to inspect and clean the treatment variable. First, use distinct() to find the values that variation takes in our dataset.
demogs |>
  YOUR_CODE
What is this step checking, and why is it important to verify these values before proceeding?
Second, use drop_na() to remove rows with missing treatment labels.
demogs <-
  demogs |>
  YOUR_CODE
Why do we drop these observations?
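The following sketch shows the two functions on a made-up table. The `toy` data and its missing labels are invented, and the column name `variation` is an assumption based on the text above.

```r
library(tidyverse)

# Hypothetical assignment table with some missing labels (not the real data)
toy <- tibble(
  client_id = 1:5,
  variation = c("Test", "Control", NA, "Test", NA)
)

# Which values does the label take? NA shows up as its own value.
label_values <- toy |>
  distinct(variation)

# Keep only rows where the label is present
toy_clean <- toy |>
  drop_na(variation)

label_values
toy_clean
```

Here `distinct()` reveals three values (Test, Control, and NA), and `drop_na()` removes the two clients whose group is unknown, since they cannot be compared with either variation.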
(d) In a well-designed experiment, each variation should include comparable types of users. How does this connect back to the idea of random assignment?
(e) Complete the code below to compare demographic compositions of users across treatments.
demogs |>
  select(-client_id) |>
  st(group = YOUR_CODE)
What is this table helping us assess, and why do we check this before comparing outcomes?
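For intuition about what a balance table contains, the sketch below builds a hand-rolled version with dplyr rather than vtable. The `toy_demogs` table and the columns `age` and `tenure_years` are hypothetical.

```r
library(tidyverse)

# Hypothetical client characteristics by variation (not the real data)
toy_demogs <- tibble(
  variation    = c("Control", "Control", "Control", "Test", "Test", "Test"),
  age          = c(30, 40, 50, 32, 41, 47),
  tenure_years = c(2, 8, 5, 3, 6, 6)
)

# Group means side by side: similar values suggest comparable groups
balance <- toy_demogs |>
  group_by(variation) |>
  summarise(
    across(c(age, tenure_years), mean),
    n = n(),
    .groups = "drop"
  )

balance
```

In this toy example both groups have the same average age and tenure, which is the pattern we hope to see, up to chance variation, when assignment was random.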
# Write your answer here
Exercise 3: Working with Web Log Data
Now that you’ve checked that users were randomly assigned to the Test and Control groups, it’s time to examine what they actually did inside the experiment.
In this step, you’ll work with event-level data — information about each time a user interacted with the sign-up process.
(a) Load and inspect the web log data. The data are split across two files. Run the code below to combine them into one dataset.
weblogs_files <-
  c('data/ab_test_web_data_pt_1.txt',
    'data/ab_test_web_data_pt_2.txt')
weblog <-
  weblogs_files |>
  map_df(read_csv)
glimpse(YOUR_CODE)
What problem does map_df() solve here, and why might the data be stored across multiple files?
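The pattern above can be tried on two small temporary files. Everything in this sketch (the file names, columns, and values) is made up for illustration; only `map_df()` and `read_csv()` come from the tutorial code.

```r
library(tidyverse)

# Write two small hypothetical CSV files to a temporary folder
paths <- file.path(tempdir(), c("toy_part_1.csv", "toy_part_2.csv"))
write_csv(tibble(client_id = 1:2, step = c("start", "confirm")), paths[1])
write_csv(tibble(client_id = 3:4, step = c("start", "start")), paths[2])

# Read each file and stack the results into one data frame
combined <- paths |>
  map_df(read_csv, show_col_types = FALSE)

combined
```

`map_df()` applies `read_csv()` to each path in turn and row-binds the results, so the same code works whether there are two files or twenty.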
(b) Explain the structure of the data. What kind of information does each row represent? How is this different from the demographic dataset used earlier?
(c) A common problem when working with event log data is duplicate rows, where the same information is recorded multiple times.
Clean the data to remove duplicate rows by completing the code below:
weblog_clean <-
  weblog |>
  YOUR_CODE
What might happen if we failed to remove duplicates before computing outcomes?
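To illustrate the duplicate problem, the sketch below uses a made-up log in which one event was recorded twice. The `toy_log` table is hypothetical; `get_dupes()` from janitor is one way to inspect duplicates, and `distinct()` is one way to remove them.

```r
library(tidyverse)
library(janitor)

# Hypothetical log where client 1's "confirm" event appears twice
toy_log <- tibble(
  client_id = c(1, 1, 1, 2),
  step      = c("start", "confirm", "confirm", "start")
)

# Inspect fully duplicated rows (all columns are used when none are named)
dupes <- toy_log |>
  get_dupes()

# Keep one copy of each distinct row
deduped <- toy_log |>
  distinct()

# Event counts differ before and after cleaning
confirms_before <- sum(toy_log$step == "confirm")
confirms_after  <- sum(deduped$step == "confirm")
```

Before cleaning, the log suggests two confirm events even though only one client confirmed, so any count of events would be inflated.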
(d) Merge the cleaned web log data with the dataset that contains information on which clients are assigned to each treatment. Drop any observations with missing treatment assignments.
weblog_clean <-
  weblog_clean |>
  YOUR_CODE |>
  YOUR_CODE
Why do we restrict the dataset to users with valid treatment assignments before analysing behaviour?
In-Class Exercises
You will discuss these exercises in class with your peers in small groups and with your tutor. These exercises build on the ones you prepared above, so you will get the most value from the class if you have completed those before coming to class.
Exercise 4: Measuring Funnel Completion
Now that we’ve linked event-level data with the experimental information, let’s create our first outcome measure: funnel completion.
Each client goes through a series of process steps (for example, start, step_1, step_2, confirm).
A funnel completion occurs if the client reaches the final step, "confirm".
(a) Compute a client-level indicator for whether each client completed the funnel by completing the code below.
completion_by_visitor <-
  YOUR_CODE |>
  group_by(YOUR_CODE) |>
  YOUR_CODE
    completed = any(YOUR_CODE == "confirm"),
    .groups = "drop"
What does any() do here, and why must we aggregate from event-level data to client-level outcomes?
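The role of `any()` can be seen in a minimal sketch on made-up events. The `toy_events` table and its column names (`client_id`, `variation`, `step`) are assumptions for illustration and may not match the real data.

```r
library(tidyverse)

# Hypothetical event log: several rows per client (not the real data)
toy_events <- tribble(
  ~client_id, ~variation, ~step,
  1, "Control", "start",
  1, "Control", "confirm",
  2, "Control", "start",
  3, "Test",    "start",
  3, "Test",    "step_1",
  3, "Test",    "confirm",
  4, "Test",    "start",
  4, "Test",    "confirm"
)

# Collapse many event rows into one row per client
by_client <- toy_events |>
  group_by(client_id, variation) |>
  summarise(
    completed = any(step == "confirm"), # TRUE if at least one row is "confirm"
    .groups = "drop"
  )

by_client
```

Each client now contributes exactly one row, so a client who generated many events carries no more weight than a client who generated few.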
(b) We want to see if the new sign-up process is more effective. Therefore, we first need to summarise completion rates by treatment.
completion_summary <-
  YOUR_CODE |>
  YOUR_CODE |>
  summarise(
    completes = YOUR_CODE,
    total = YOUR_CODE,
    pct = completes / total * 100,
    .groups = "drop"
  )
What do completes, total, and pct represent, and why do we summarise at the treatment-group level?
(c) Create a simple visual to compare completion rates across the two variations. Use a bar chart to display the percentage of users who completed the funnel.
completion_summary |>
  YOUR_CODE +
  YOUR_CODE +
  geom_text(aes(label = round(pct, 1)), vjust = -0.5) +
  labs(
    title = "Funnel Completion Rate by Variation",
    x = "Variation",
    y = "Completion Rate (%)"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
What does this visual add beyond the table, and why is a simple chart often preferred in a business setting?
(d) Complete the code below to compute and interpret the ‘effect’ of the new sign-up process on completion. Use lag() to compare the Test group to the previous row after ordering the summary table.
completion_diff <-
  completion_summary |>
  arrange(YOUR_CODE) |>
  mutate(diff_pct_points = YOUR_CODE) |>
  summarise(diff_pct_points = YOUR_CODE)
completion_summary |>
  arrange(YOUR_CODE) |>
  mutate(
    pct_change = YOUR_CODE
  )
Because the Control row has no previous row, the lag() result for Control will be missing. Report the Test row’s difference.
By how many percentage points did the new sign-up process change completions? What is the difference between a change in percentage points and a percent change, and why do we typically report percentage points in this setting?
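The distinction between the two measures is easiest to see with round numbers. The completion rates below (50% and 55%) are invented for illustration and are not results from the experiment.

```r
library(tidyverse)

# Hypothetical completion rates (not the experiment's results)
toy_summary <- tibble(
  variation = c("Control", "Test"),
  pct       = c(50, 55)
)

effects <- toy_summary |>
  arrange(variation) |> # Control first, so lag() on the Test row refers to Control
  mutate(
    diff_pct_points = pct - lag(pct),                   # absolute gap between rates
    pct_change      = (pct - lag(pct)) / lag(pct) * 100 # gap relative to Control
  )

effects
```

With these made-up rates, the Test row shows a difference of 5 percentage points, which is a 10 percent relative increase over Control. The same 5-point gap would be a much larger relative change if the Control rate were 10% rather than 50%.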
(e) Complete the code below to express the effect size in standard deviation units.
diff_rate <- completion_diff |>
  pull(diff_pct_points) / 100
control_sd <-
  completion_by_visitor |>
  YOUR_CODE |>
  YOUR_CODE
    sd_rate = YOUR_CODE
  pull(sd_rate)
diff_sd_units <- diff_rate / control_sd
print(diff_sd_units)
What does dividing by the standard deviation tell us about the size of the effect? What does this help us better understand?
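A worked toy calculation may help with the scale of this measure. The numbers are assumptions chosen for easy arithmetic: a hypothetical Control group of 100 clients in which half completed, and a hypothetical 5 percentage point difference.

```r
library(tidyverse)

# Hypothetical Control group: 50 of 100 clients completed (not the real data)
toy_control <- tibble(completed = c(rep(TRUE, 50), rep(FALSE, 50)))

# Standard deviation of the 0/1 completion indicator in Control
toy_control_sd <- toy_control |>
  summarise(sd_rate = sd(as.numeric(completed))) |>
  pull(sd_rate)

# Hypothetical difference of 5 percentage points, expressed as a proportion
toy_diff_rate <- 0.05

toy_diff_sd_units <- toy_diff_rate / toy_control_sd
toy_diff_sd_units
```

For a 0/1 outcome the standard deviation is close to sqrt(p * (1 - p)), which is about 0.5 when p = 0.5, so in this toy case a 5-point difference corresponds to roughly 0.1 standard deviations.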
(f) Would you consider this a small, moderate, or large effect in a typical online experiment? Explain your answer.
Exercise 5: Communicating Experimental Insights
Now we need to translate your analytical results into a short executive brief written for a business audience.
Your goal is to communicate what was tested, what happened, and what the company should do next — clearly and persuasively, without technical detail.
Length: 400 words.
Exhibit: 1 table or figure.
Write this as an executive brief to be read and understood by managers.
- Bullets must be written as full sentences.
- Use the writing principles from Write Like an Amazonian — short, declarative sentences; clarity before style.
- No visible code in the brief.
- Do not produce new analysis, tables, or figures in this section — use the results already created in Exercise 4.
INSERT YOUR FIGURE OR TABLE HERE
(For example: your completion-rate comparison chart or summary table from Exercise 4.)
Executive Summary
Summarise in 3-4 sentences:
- The business problem
- The intervention
- The key result
- Why it matters (economic significance)
Key Insights
- Up to 3 bullets describing what the data and analysis show.
- Use clear, declarative statements to describe what the data show.
- Report results in percentage points.
- Avoid technical language.
Business Implications
- Up to 3 bullets on why these findings matter for the organisation.
- Connect the analytics result back to strategic goals.
Recommended Actions
- Up to 3 bullets outlining clear, actionable next steps.
- Recommendations should be specific, realistic, and aligned with business priorities.
Exercise 6: Understanding Mid-Funnel Outcomes
After reviewing your executive brief, one manager comments:
“This tells me how many users finished the sign-up flow — but I want to understand what happened in the middle.
Where exactly are we gaining or losing customers?”
(a) What does the manager mean by “mid-funnel outcomes”? How are these different from overall completion rates?
(b) Why might focusing only on completions hide important insights about customer experience or friction points?
(c) Using the same event-level data, how could you explore mid-funnel performance? Describe two or three possible approaches (e.g. visual summaries or simple metrics).
How does the structure of event-level data (multiple rows per user) enable this type of analysis?
(d) What kinds of business questions could mid-funnel analysis help answer that top-line completion rates cannot?
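One possible mid-funnel metric is the time clients spend between steps. The sketch below is only an illustration under stated assumptions: the `toy_log` table, its timestamps, and the column names (`client_id`, `variation`, `step`, `date_time`) are all made up and may differ from the real web log.

```r
library(tidyverse)
library(lubridate)

# Hypothetical timestamped events for two clients (not the real data)
toy_log <- tribble(
  ~client_id, ~variation, ~step,     ~date_time,
  1, "Control", "start",   "2024-01-01 10:00:00",
  1, "Control", "step_1",  "2024-01-01 10:00:30",
  1, "Control", "confirm", "2024-01-01 10:02:00",
  2, "Test",    "start",   "2024-01-01 11:00:00",
  2, "Test",    "step_1",  "2024-01-01 11:00:20",
  2, "Test",    "confirm", "2024-01-01 11:01:00"
) |>
  mutate(date_time = ymd_hms(date_time)) # parse text into date-times

step_times <- toy_log |>
  group_by(client_id) |>
  arrange(date_time, .by_group = TRUE) |>
  # Seconds from each event to the same client's next event
  mutate(secs_to_next = as.numeric(
    difftime(lead(date_time), date_time, units = "secs")
  )) |>
  ungroup() |>
  filter(!is.na(secs_to_next)) |> # a client's last event has no next event
  group_by(variation, step) |>
  summarise(median_secs = median(secs_to_next), .groups = "drop")

step_times
```

This works only because the log keeps multiple timestamped rows per client. A table with one row per client would record whether someone finished, but not where they slowed down.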