During coaching sessions I’ve found that the Group by and Aggregate features in Bubble.io are both somewhat overlooked, and often misunderstood. I’m not really surprised – Bubble’s documentation is not inaccurate, but the feature itself is different from the visual way we’re used to set up stuff in Bubble. In some ways both the setup and the result behaves more like code, and it can be tricky to wrap your head around what exactly the result is.
In this article, we’ll look into what the Group by feature is for, and in this case, what it’s not for – ‘cause that’s where the misunderstandings come in.
What the Group by feature in Bubble is for
In short, the Group by feature is for producing a table of aggregated data from your database. It can perform calculations based on a given set of constraints, such as the total sum of sales in a given month or city. The result is a list of data points that you can then present in a repeating group or in a graph.
Let’s set up a quick example: your company has a CRM set up where you keep a list of clients in the US. You would like to know how many clients you have per city. There are three ways you could solve this:
- Set up a repeating group of all cities, and then perform a search within each row where you count all the clients in that city
- Save the number of clients as a separate field on the city data type
- Use the Group by feature to return a list of all cities along with the number of clients in each city
The first option would be the less performant one: each city would require a separate database search that may slow down the page significantly as the number grows. The second option would be faster to load, but require more upkeep. You would have to count the number of clients in each city every time one is created, deleted or moved by using backend triggers or some other method.
The third option is both performant and requires no upkeep, but comes with its own set of pros and cons.
Setting up a Group by aggregation in Bubble.io
Many assume that the Group by feature is used to set up multiple lists that are grouped by some given condition, such as listing all the Companies in each city. Wait, wasn’t that what you just said it was for?
Yes and no: yes, we can separate Companies by City, but what we get back is not a list of Companies – it’s an array of aggregated data such as the number of Companies per city.
What can be confusing about the Group by feature is that it only returns the database records by which it is grouped. Returning to our CRM example, if you set up the data source of a Repeating Group to contain a Group by expression, you will see that the Type of content changes to Grouping.
From then on we can fetch the data stored on the City (such as the city’s name), but we can’t get any data on the Company, since this is aggregated. In other words, what we get back is a calculation that we specify, such as the number of Companies in each city, but we don’t get the actual list of cities.
First, let’s set up what a Group by expression looks like.
In this search, we’re looking for Companies, and then choosing to group those Companies by City. As you can see in the bottom rectangle, we’re setting up an aggregation of the Count. This means that what we’ll get back is a list of Cities with the aggregated total number of Companies in each city represented by a number. In a repeating group, the result would look like this:
Another typical use case is to set up charts, as illustrated below:
To understand how this is all structured, it’s helpful to have a look at our old friend Chrome Developer Tools and use the Network tab to investigate what kind of data the Bubble server is actually sending to our browser. Let’s have a look:
- First, we navigate to the Network tab…
- … and then filter the network activity by Fetch/XHR, which lets us isolate the network activity related to the database
- The Group by query is named group_by. As you can see from the highlighted tooltip, the Group by feature uses the Elasticsearch API, just like other Bubble database queries.
- Clicking the Preview tab, we can see the actual data being sent.
Each of the numbered rows in this JSON-code represents one Grouping. The Grouping in our case is a City, since we specified that we want to group by City. You can see the Cities represented by their Unique ID and then the aggregated data (number of companies) preceded by agg0 and then the numerical value. What can we learn from this?
The first important point is that we only get actual database data for each Grouping (City) and not for the actual Companies. Technically, the browser is still oblivious to what Companies we’re working with, as it only knows the total number of Companies in each city. When you think about it, this makes sense: each row represents multiple companies (aggregated with a count) and not a single one. In other words, we can’t reference a specific Company, since all we’re looking at is a number.
A second point is that each City does not then send a list of those companies. The only thing the Group by operator can do is to group and aggregate data. If you’re looking to list all Companies in a given city, you will have to resort to using a nested Repeating Group
Group by multiple groupings
As you can see in the inspector window for the Group by feature, you can set up more than one Grouping in each search expression. This will split the search further into every possible combination of groupings and allow you to again aggregate based on each grouping combination.
Let’s expand our example from earlier and introduce a Category to each company: Finance and Health Care. By adding the Category as a second Grouping, Bubble generates a list of all possible combinations. The expression would then look like this:
Visualizing this in a Repeating Group, we can see that Bubble now generates one row of each possible combination, like New York – Finance and Los Angeles – Health Care. This way, you can dig further into the data, but still rely on a single search expression and not a complex combination of nested repeating groups and searches:
Having another look in DevTools to see the data, we can see that Bubble now returns two lookups (the City (grouping0) and Category (grouping1)), represented by their Unique ID’s. You can also see the aggregated number 2, which is the total number of Companies in New York under the Finance Category:
Different kinds of aggregations
So far we’ve simply counted the number of Companies. This doesn’t refer to any specific field, but simply counts all the records saved in the database that matches the constraint (City/Category).
This is where it gets interesting: we can also perform calculations on fields saved on that Thing in the database. Let’s say for instance that we’re no longer looking at Companies, but on Orders delivered to a specific city. In this case we want to know the total sum of orders to one City, and to do that we use the aggregate function to calculate the total sum from a given field.
Highlighted in red you can see the Aggregation is set to Sum. Bubble then asks which field we would like to apply the calculation to. In this case we’re going with the price.
The result in a repeating group is as above: each City has a row with the aggregated sum of sales. Sum is just one of the calculations we can perform. In the Group by expression we can specify other kinds:
As we’ve seen, the Group by feature is used to create tables of aggregated data with a single search expression through the Elasticsearch API. Since we don’t have to resort to adding nested searches within the cell of a repeating group, we get a result that performs very well and that can perform calculations on any field we specify, like generating a sales report for a specific area or product.