RuffRuff App RuffRuff App by Tsun

[Looker Studio] How to Use the REGEXP_EXTRACT Function and Practical Examples | Calculated Fields

[Looker Studio] How to Use the  REGEXP_EXTRACT Function and Practical Examples | Calculated Fields

[Looker Studio] How to Use the  REGEXP_EXTRACT Function and Practical Examples | Calculated Fields

This article provides a clear explanation of how to use the REGEXP_EXTRACT function in Looker Studio and specific examples of its practical applications. Let's utilize the functions available in Looker Studio to process and visualize data.

When it comes to processing data from GA4 (Google Analytics 4) and Search Console, many operations can be performed using calculated fields without needing Big Query. Additionally, the REGEXP_EXTRACT function is a regex-based function. In both GA4 and Looker Studio, using regular expressions can significantly expand the possibilities for data visualization and analysis. I will explain specific examples using GA4 data through connectors, so please make use of this information.

What are Calculated Fields in Looker Studio?

Calculated Fields in Looker Studio are a handy feature that allows you to create custom fields by using operators (such as addition, subtraction, multiplication, division), functions, and regular expressions based on existing items for use in reports.

Also, for basic usage of Looker Studio, please refer to the "How to Use Looker Studio" guide. Looker Studio is a very convenient tool that is free to use and allows for the creation of easy-to-understand reports by connecting to various data sources, so let's make active use of it.

What is the REGEXP_EXTRACT Function?

The REGEXP_EXTRACT function is a function used to extract the first substring that matches a specific string pattern using regular expressions within a text field. Please note that if multiple specified string patterns exist, the substring corresponding to the pattern appearing earlier will be extracted.

Syntax

The syntax for the REGEXP_EXTRACT function is as follows:

REGEXP_EXTRACT("text containing the string to extract", "string using regular expressions")

In "text containing the string to extract", you specify a text field to determine if it contains a specific string pattern. In "string using regular expressions", you specify the string you want to extract using regular expressions.

 What are Regular Expressions?

Regular expressions are special strings used to describe specific patterns within text strings. By using regular expressions, you can specify strings such as the following:

  • A string that starts with "abc" and has five characters can be represented by the regular expression ^abc.{2}$
  • A string containing a hyphen between a three-digit number and a four-digit number can be represented by the regular expression ^\d{3}-\d{4}$
  • A string that ends with "abc" can be represented by the regular expression .*abc$

The key characteristic of regular expressions is their ability to specify not just specific strings but patterns of strings, which allows for a high degree of flexibility in the strings that can be specified. For instance, if there are two strings, "abc" and "abcde", a normal search specifying "abcde" would not include "abc". However, by using regular expressions, both can be included in the search results.

 For a detailed description of how to write regular expressions, please refer to the official Looker Studio Regular Expressions help.

How to Use the REGEXP_EXTRACT Function

To use the REGEXP_EXTRACT function, you need to create a calculated field in Looker Studio. There are two types of calculated fields: data source calculated fields and chart-specific calculated fields. This article will focus on creating a data source calculated field, but for more detailed differences between the two, please see the differences between data source calculated fields and chart-specific calculated fields.

Use case: Extracting the second level of the page URL in GA4 using Looker Studio

In this instance, let's extract only the second level from the URL directory to make it easier to categorize pages.

For example, in the URL https://inhouse-plus.jp/reporting/dataportal-templates/, the hierarchical structure is as shown in the table below, and the second level is the "reporting" part.

 Level URL example
first level inhouse-plus.jp
second level reporting
third level dataportal-templates

 

First, set up the calculation fields as follows. 

 

lookerstudio-regexp-extract-setting

(Quote:Looker Studio)

① Field Name: Please enter any field name.

② Formula: REGEXP_EXTRACT(Page location,'^https://[^/]+/[^/]+/([^/]+)/')

This regular expression specifies the second-level directory of a URL that begins with https://.

③ Save: Once you have completed entering the information, click save.

second level field has been created as follows.

lookerstudio-regexp-extract-metric

(Quote:Looker Studio)

 Next, since the fields created earlier are now available for use in reports, we will add them to the table. Add a second level to the dimensions as follows. 

lookerstudio-regexp-extract-graph

(Quote:Looker Studio) 

We were able to visualize the number of views by the second level hierarchy in this way.

In this case, we used the number of views as the metric, but caution is necessary when using the number of sessions or users. Even though the dimensions are grouped by the second level hierarchy, the numbers of sessions and users are not unique.

For example,

https://function.com/abc/123/x had 10 views and 8 users

https://function.com/abc/123/y had 5 views and 5 users

When we have access data like this, adding the two, the number of views for the second level hierarchy 123 is correctly summed to 15, but the number of users should not simply be summed to 13. This happens if the same users accessed both pages. If two users visited both pages, the number of views for the second level hierarchy 123 is 15, but the number of users would be 11. However, in the example usage given, using the REGEXP_EXTRACT function at the second level, the number of sessions and users are not uniquely counted but simply summed up to 13, so be careful with the scenes and metrics you use.

If you want to see the numbers for each level of pages where the number of sessions and users is uniquely counted, it will be necessary to set up content groups using Google Tag Manager.

Relevant Looker Studio Official Documentation

Looker Studio Official Help : About calculated fields

Looker Studio Official Help : Function list

Back to blog

Featured collection