Analyse User Flow in GA4 and BigQuery
You may already be observing users' journeys through the Explore module in GA4. Let's kick it up a notch using BigQuery.
This query gives you a sequence of events fired by the given user (Client ID) on your website.
You may want to find out how your most valuable customers shop compared to the newly acquired cohort of low spenders. Or how about finding out what sort of content serves as a gateway to purchase? Possibilities are limitless.
How It Works
Before we begin, it's important to note what Client ID we want to track. You will find the Client ID denotated as user_pseudo_id in BQ. For the sake of simplicity, you'll find it as a special variable at the beginning of the query.
The query breaks down the timestamp field into date and time. Then it unearths the page_location dimension from the nested table.
The rest is about selecting event-scoped and user-scoped dimensions of source, medium and campaign.
declare cid string;
set cid = '1103212246.1623307268'; -- Insert Client ID you want to track
extract (date from timestamp_micros (event_timestamp) ) as date,
extract (time from timestamp_micros (event_timestamp) ) as time,
(select value.string_value from unnest (event_params) where key = 'page_location') as page,
concat(traffic_source.source, ' / ', traffic_source.medium) as user_source_medium,
traffic_source.name as user_campaign_name,
concat ( (select value.string_value from unnest (event_params) where key = 'source'), ' / ', (select value.string_value from unnest (event_params) where key = 'medium') ) as event_source_medium,
(select value.string_value from unnest (event_params) where key = 'campaign') as event_campaign
from `projectId.analytics_123456789.events_*` -- Change this to your source table
where user_pseudo_id = cid
order by event_timestamp asc
I deliberately added user-scoped as well as event-scoped dimensions to this query because it will help you to see the stark difference between GA4 attribution models on the event vs. user level.
It also showcases how powerful the event-based data model is. With a couple of lines of code, you see how many campaigns touched your customer throughout the set date period.
Side Note on Attribution
Universal Analytics uses the session-based last touch non-direct attribution model. GA4 utilises different attribution models depending on scope (event, session, user). For more info, look into Witold Wrodarczyk's article over at Search Engine Land.
Going forward, you might observe discrepancies in attribution between UA and GA4. They will be especially noticeable in standard reports.
Moral of the story: learn to use BigQuery for reporting. It will pay off.
- Don't confuse Client ID and User ID. Client ID relies heavily on browser cookies, whereas User ID relies on your own means of identification. Test which identifier suits your analysis better.
- Take some time to find users that are representative of the segment you are analysing.
- Use this report to fuel your further quantitative and qualitative research efforts.
Jiří makes data engineering look easy. His long experience with building ETL pipelines comes in handy when he writes about BigQuery and automation processes.