Split Strings in Google Data Studio Using RegEx

Here’s a quick and dirty tip for Google Data Studio. We are going to be splitting strings using regular expressions and turning one custom dimension into two different values.

Situation

Let’s say you are sending a string from an ad platform to GA custom dimension. It looks like this:

lineItem=USA~ACQ~AppNexus~PD~interscroller~D_M~lifestyle~35541382|bannerDomain=marieclaire.com.

It’s easier to send those dimensions to GA as separate custom dimensions, so why bother with this? Well, sometimes you don’t have the luxury of owning GA 360, and in the free version custom dimensions are scarce.

You want to separate values of lineItem and bannerDomain into two different fields. Google Data Studio will serve as a visualisation platform. Your tool of choice to crack this nut is going to be RegEx. Some of you are thinking of the quote by Jamie Zawinski:

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

You are absolutely right.

Solution

There are way too many ways to approach this problem, but here’s the one I used.

1) Create two new fields in GDS

One will serve as a placeholder for Line Item and the other for Banner Domain values. That’s the easy part.

2) Write up formulas

Let’s start with the Line Item field.

REGEXP_EXTRACT(Custom Dimension 1, '([^=]*)\\|')

This extracts the portion of the string between the “=” character and the “|” character. It is possible that the pipe at the end of the extracted string will remain as a part of the new string. Don’t worry, GDS will most likely cut it and not visualise it.

You can test the formula at regex101, a tool I highly recommend using.

On to the second formula that goes to the Banner Domain field. Here it gets tricky, because you need to extract the back part of the string. In the end, I went for a combination of RegEx and some modifications on the GDS side.

REGEXP_REPLACE(
  REGEXP_EXTRACT(Custom Dimension 1, 'bannerDomain=(.+)'),
    'bannerDomain=', ''
)

Almost too easy, right? We take every character that supersedes the pipe character, including the “bannerDomain=” string. That’s what the REGEXP_EXTRACT is for. We then wrap this code in a replacement function REGEX_REPLACE that finds the string “bannerDomain=” and replaces it with a string as empty as my life’s purpose. Done.

Check and test your formula again at regex101.

3) Visualise the data

You have newly acquired a treasure trove of data, so you can run different analyses and help your PPC team to optimise campaigns in a much more efficient fashion. I’m sure they’ll be thrilled about that.

Caveat

You might have noticed that some characters are escaped using a double backslash instead of the single backslash. That’s because Google Data Studio uses RE2 (perfect bedtime reading).

Another caveat is that you can’t perform more complex calculations inside of a CASE function. This makes Power BI or Tableau look better and better, doesn’t it?

We occasionally send out a newsletter for specialists and business decision makers alike. Interested?

No spam, no annoying sales pitches.

Honza Felt
Honza Felt

Managing Director

Honza Felt is a performance marketing specialist who turned into a marketing consultant after a mysterious accident. He spends his days leading a bunch of misfits at CF Agency. Drinks rum and knows things.