[Looker Studio] How to Use the REGEXP_REPLACE Function and Practical Examples | Calculated Fields
This article provides a clear explanation of how to use the REGEXP_REPLACE function in Looker Studio and specific examples of its practical applications. Let's utilize the functions available in Looker Studio to process and visualize data.
When it comes to processing data from GA4 (Google Analytics 4) and Search Console, many operations can be performed using calculated fields without needing Big Query. Additionally, the REGEXP_REPLACE function is a regex-based function. In both GA4 and Looker Studio, using regular expressions can significantly expand the possibilities for data visualization and analysis. I will explain specific examples using GA4 data through connectors, so please make use of this information.
What are Calculated Fields in Looker Studio?
Calculated Fields in Looker Studio are a handy feature that allows you to create custom fields by using operators (such as addition, subtraction, multiplication, division), functions, and regular expressions based on existing items for use in reports.
Also, for basic usage of Looker Studio, please refer to the "How to Use Looker Studio" guide. Looker Studio is a very convenient tool that is free to use and allows for the creation of easy-to-understand reports by connecting to various data sources, so let's make active use of it.
What is the REGEXP_REPLACE Function?
The REGEXP_REPLACE function is used to replace substrings that match a specific pattern, defined by a regular expression, within a text field with another string.
Syntax
The syntax for the REGEXP_REPLACE function is as follows
REGEXP_REPLACE("text containing the string to be replaced", "string using regular expressions", "replacement text")
In "text containing the string to be replaced", specify the text field that includes the specific string pattern you want to replace.
In "string using regular expressions", specify the string you want to replace using regular expressions. In "replacement text", specify the text to replace with. The string pattern that matches the specified regular expression will be replaced with the text specified here.
What are Regular Expressions?
Regular expressions are special strings used to describe specific patterns within text strings. By using regular expressions, you can specify strings such as the following:
- A string that starts with "abc" and has five characters can be represented by the regular expression
^abc.{2}$
- A string containing a hyphen between a three-digit number and a four-digit number can be represented by the regular expression
^\d{3}-\d{4}$
- A string that ends with "abc" can be represented by the regular expression
.*abc$
The key characteristic of regular expressions is their ability to specify not just specific strings but patterns of strings, which allows for a high degree of flexibility in the strings that can be specified. For instance, if there are two strings, "abc" and "abcde", a normal search specifying "abcde" would not include "abc". However, by using regular expressions, both can be included in the search results.
For a detailed description of how to write regular expressions, please refer to the official Looker Studio Regular Expressions help.
How to Use the REGEXP_REPLACE Function
To use the REGEXP_REPLACE function, you need to create a calculated field in Looker Studio. There are two types of calculated fields: data source calculated fields and chart-specific calculated fields. This article will focus on creating a data source calculated field, but for more detailed differences between the two, please see the differences between data source calculated fields and chart-specific calculated fields.
Use case: Removing Query Parameters from GA4 Page URLs in Looker Studio
Cleanse your data by removing query parameters from URLs. Although Looker Studio's GA4 connector includes a "page path" dimension that already excludes hostnames and query parameters, there are cases where you may want to identify pages including the hostname, such as for sites with subdomains. In such cases, you can remove only the query parameters from the URL that contains the hostname, page path, and query.
First, set up a calculated field as follows.
(Quote:Looker Studio)
① Field Name: Please enter any field name.
② Formula: REGEXP_REPLACE(Full page URL,"\\?.*$","")
③ Save: Once you have completed entering the information, click save.
A field called the Page URL (excluding parameters) has been created as follows.
(Quote:Looker Studio)
Now that the field we created earlier can be used in the report, let's add it to the table. Add the page URL (excluding parameters) to the dimensions as shown below.
(Quote:Looker Studio)
When using the REGEXP_REPLACE function, there are some points to be aware of. Although dimensions are grouped by parameter-stripped URLs, the number of sessions and users are not unique.
For example,
If the access data shows 10 page views and 8 users for https://function.com/abc/123
and 5 page views and 5 users for https://function.com/abc/123?x=y
, the total page views for https://function.com/abc/123
can be simply added to get 15. However, the number of users cannot be summed up to 13 in the same way. This is because the same users might be included in both URLs. If 2 users accessed both URLs, the total page views for https://function.com/abc/123
would be 15, but the number of users would be 11. When using the REGEXP_REPLACE function as in the example provided, the session and user counts for the page URL will be displayed as a simple total of 13, without deduplication. So, be cautious about this.
Relevant Looker Studio Official Documentation
Looker Studio Official Help : About calculated fields
Looker Studio Official Help : Function list