Why Do I Need a Lead ID?
Lead ID is a cornerstone of conversion tracking and an essential tool for customer analytics. This article sheds light on what it is and why it's important.
Map of The Problematique
Every analytics platform gathers information about its visitors. When an event happens on the website, there are three main pieces of information that must be known.
1) Who triggered the event?
GA stores this information under client_id. For more details about this wonderful dimension, check out Julius' descriptive article.
If your site has a client zone or a customer profile, your analytics platform may be tracking the user_id dimension. That gets populated when users log in to their accounts.
2) During which session has the event been triggered?
That's your session_id. GA defines how it counts sessions in its documentation. There are some important differences between sessions in GA4 and in the now-deprecated Universal Analytics.
3) When exactly did the event occur?
That's your timestamp, plain and simple. You can be fancy and implement custom timestamp tracking from Simo. We highly recommend it. The reason is that GA sends events in batches and the timestamp you see in the BigQuery export is the timestamp of the batch, not the event itself.
user_id is great for cross-device analysis, but an out-of-the-box GA implementation isn't suited to running it. For the analysis to make sense, you need a special data model.
Refresher
Unless you are completely new to how GA processes events and attributes them to sessions, clients and users, this diagram shouldn't be anything new.
No matter how hard you look, there is no Lead ID, is there? That's because GA doesn't natively collect it. It's something we've added to our data model to make our lives easier.
Enter The Lead ID
When you collect Lead ID, it's much easier to build reporting tables like this one.
You already know that you may not send any PII to Google Analytics. However, this report shows e-mail addresses of leads who submitted a form. It also attributes that particular conversion to our custom channel grouping. That's not all!
Key Benefits of Lead ID
- Makes report creation easier and faster, thereby saving you money.
- Allows you to report on individual leads, not just on sessions.
- Effectively circumvents the "no PII in GA" rule without the need to hash information.
- Serves as a connector between the website front-end and your CRM software.
- Promotes clarity - you can uncover data discrepancies between your website and CRM really fast.
Implementation - Developers
Most developers are going to understand when you show them this diagram.
The idea is to create an identifier and let it persist in the user's browser using localStorage or sessionStorage. Alternatively, you can save it as a first-party cookie. Ideally, do all three and build logic that keeps the values in all these storage spaces in sync.
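The persistence logic could look roughly like the sketch below. It is not the only way to do it: the storage key lead_id, the ID format and the function names are assumptions made for illustration. The stores are passed in as arguments, so the same function works with localStorage, sessionStorage or a small cookie wrapper.

```typescript
// Minimal key-value interface; localStorage and sessionStorage both satisfy it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical storage key; use whatever your naming convention dictates.
const LEAD_ID_KEY = "lead_id";

// Assumed ID format: timestamp plus a random suffix.
// Any sufficiently unique string would do.
function createLeadId(): string {
  const random = Math.random().toString(36).slice(2, 10);
  return `${Date.now()}.${random}`;
}

// Returns the first Lead ID found in any store (or creates a new one),
// then writes it back to every store so they all hold the same value.
function getOrCreateLeadId(stores: KeyValueStore[]): string {
  let leadId: string | null = null;
  for (const store of stores) {
    const value = store.getItem(LEAD_ID_KEY);
    if (value) {
      leadId = value;
      break;
    }
  }
  const resolved = leadId ?? createLeadId();
  for (const store of stores) {
    store.setItem(LEAD_ID_KEY, resolved);
  }
  return resolved;
}
```

In the browser, you would call it as getOrCreateLeadId([localStorage, sessionStorage]). A cookie can join the list once it is wrapped in an object exposing the same two methods.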
When a conversion occurs, the developers should check for the Lead ID and send it into a dataLayer object. That way you'll be able to see the Lead ID value in GA.
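As a rough sketch, the push itself can be this small. The event name generate_lead and the key lead_id match the naming used in the GTM section of this article, but your dataLayer specification may differ; the function name is made up for illustration.

```typescript
// One dataLayer entry; GTM accepts plain objects with arbitrary keys.
type DataLayerEvent = Record<string, unknown>;

// Builds the conversion event and appends it to the dataLayer array.
// A missing Lead ID is passed through as-is so it can be caught downstream.
function pushLeadEvent(
  dataLayer: DataLayerEvent[],
  leadId: string | null | undefined
): DataLayerEvent {
  const event: DataLayerEvent = { event: "generate_lead", lead_id: leadId };
  dataLayer.push(event);
  return event;
}
```

On a real page, the first argument would be window.dataLayer.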
Once the conversion has been validated and written into the database, send the same Lead ID along with all form data to BigQuery using its API.
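One possible way to do this is BigQuery's streaming tabledata.insertAll REST method, which expects a body containing a rows array of insertId and json pairs. The sketch below only builds that body on the server; authentication and the HTTP call are left out. The form fields (email, submitted_at) and the function name are hypothetical.

```typescript
// Hypothetical shape of one validated form submission.
interface LeadRow {
  lead_id: string;
  email: string;
  submitted_at: string; // ISO 8601 timestamp
}

// Request body shape for the tabledata.insertAll method.
interface InsertAllBody {
  rows: { insertId: string; json: LeadRow }[];
}

// Wraps validated leads into an insertAll request body.
function buildInsertAllBody(leads: LeadRow[]): InsertAllBody {
  return {
    rows: leads.map((lead) => ({
      // Reusing the Lead ID as insertId lets BigQuery de-duplicate
      // retried inserts on a best-effort basis.
      insertId: lead.lead_id,
      json: lead,
    })),
  };
}
```

Using the Lead ID as the insertId is a design choice, not a requirement. It assumes one row per lead.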
Use the Lead ID as a primary key to pair the data from Google Analytics to the data the devs sent you straight to BQ.
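In practice, this pairing would be a join in BigQuery. The logic is sketched here in TypeScript on in-memory rows to show what happens to each lead. All field names except lead_id are assumptions.

```typescript
// Hypothetical row from the GA export: one attributed conversion.
interface GaLead {
  lead_id: string;
  channel: string;
}

// Hypothetical row sent by the developers straight to BigQuery.
interface CrmLead {
  lead_id: string;
  email: string;
}

interface JoinedLead {
  lead_id: string;
  email: string;
  channel: string | null; // null means GA never saw this lead
}

// Left join from the form data to the GA data, keyed on lead_id.
function joinOnLeadId(gaRows: GaLead[], crmRows: CrmLead[]): JoinedLead[] {
  const channelById = new Map<string, string>();
  for (const row of gaRows) {
    channelById.set(row.lead_id, row.channel);
  }
  return crmRows.map((row) => ({
    lead_id: row.lead_id,
    email: row.email,
    channel: channelById.get(row.lead_id) ?? null,
  }));
}
```

Rows that come back with a null channel are leads the back end recorded but GA did not, which is exactly the kind of discrepancy the Lead ID helps you uncover.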
Implementation - GTM
This is way easier than it seems. Half the battle is to get the Lead ID into the dataLayer. Unless the dev team screwed up badly, you should be fine now.
Variable
Find out what the developers are calling the Lead ID in the dataLayer object. Yes, documentation comes in handy. If you have our dataLayer, chances are it's simply called lead_id.
Create a Data Layer Variable in GTM, use the value lead_id as the DL Variable Name and name the variable according to your naming convention. You do have a naming convention, right?
To be extra sure, convert null and undefined to the error string value. Just in case the developers send a wrong value or their logic defaults to null or undefined, it's good to have this covered.
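GTM handles this conversion in the variable's Format Value settings, so no code is needed there. For anyone who wants to see the logic spelled out, it amounts to something like this sketch (the function and constant names are made up):

```typescript
// The placeholder that replaces missing Lead IDs.
const ERROR_VALUE = "error";

// Maps null and undefined to the error string; everything else
// is passed through as a string.
function normalizeLeadId(value: unknown): string {
  if (value === null || value === undefined) {
    return ERROR_VALUE;
  }
  return String(value);
}
```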
Trigger
Once you've got the variable set up, it's trigger time. Since we are tracking the Lead ID, ensure it fires on events that send leads to GA. Again, it depends on your dataLayer set-up, but we'll stick with the event name generate_lead.
You could now exclude the error string, filtering out all the faulty leads. Don't do that. If you do, you won't find out how many errors the devs sent you. Discrepancies are going to pile up and the blame game will start before you can say "otorhinolaryngology".
It's good practice to send errors to your data warehouse and clean them up there. Plus, you've got a nice feedback loop with the dev team. The second you spot an error, you can let them know to go investigate.
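Once the errors land in the warehouse, measuring them is trivial. A minimal sketch of such a check, assuming the error placeholder is the string "error" and the Lead IDs have already been pulled into an array:

```typescript
interface LeadIdSummary {
  total: number;
  errors: number;
  errorShare: number; // errors divided by total, 0 when there are no rows
}

// Counts how many collected Lead IDs carry the error placeholder.
function summarizeLeadIds(
  leadIds: string[],
  errorValue = "error"
): LeadIdSummary {
  const total = leadIds.length;
  const errors = leadIds.filter((id) => id === errorValue).length;
  return { total, errors, errorShare: total === 0 ? 0 : errors / total };
}
```

A rising errorShare is your cue to ping the dev team.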
Tag
This is the easiest one of all. Add a parameter called lead_id to your GA tag among the User Properties. If you have enough parameters left, you can track the Lead ID in Event Parameters too.
Finished!
Test out your implementation and iron out the kinks. If you are feeling particularly masochistic and want to look at your website's data in the Google Analytics UI, don't forget to register the lead_id as an event parameter and a user property in the GA admin section.
You now have a fully functioning Lead ID in GA as well as in BigQuery. This is a starting point for customer analytics and CRM integration.
In case something doesn't make sense or doesn't work, reach out via the contact form, our LinkedIn page or straight to Honza Felt.
Data Engineer
Jiří makes data engineering look easy. His long experience with building ETL pipelines comes in handy when he writes about BigQuery and automation processes.