Formula reference

Introduction

Custom formulas can be used to generate custom metrics based on the undelying data. There are following types of formulas (in the order of evaluation):

Field formulas and table aggregations can be applied in two levels: first part level and second report level to the values received from part level (including formulas and aggregations).

Custom formulas are applied last to the whole report after generating the output from field formulas and aggregations. While custom formulas are most powerful there are some considerations when using them. First, make as many data manipulations using row formulas an table aggregations as possible, and apply custom formulas to the aggregated data.

The row formulas and aggregations are tehcnically vectorized and provide fastest and most efficient way of working with large amounts of data (tens of millions of records). It is also recommended to apply as many filters as possible to the data being processes: part filters, dimension filters or context filters.

Field formulas

Field formulas are applied to data before any aggregations are made. Aggregations can be defined using report builder for each field: Sum, CountDistinct, Avg, Count, Max, Min, StringConcat, UniqueList, TakeFirst

General syntax for writing custom field formulas:

Fn.name(argument[, argument...])

Function argument can be another function which return required data type, or expression. Expressions must be placed inside curly brackets and can contain python code including contants, arithmetic operations and API calls. All arithmetics must be composed using the available functions. Regular operators such as +, -, *, / are not supported.

Existing field value can be referenced using the row variable using field alias:

row['alias']

Full example of a formula for calculating change percentage based on the canonical equation (a-b)/b:

Fn.div(Fn.minus(row['a_alias'],row['b_alias']),row['b_alias'])

Full example of a formula for calculating the daily average of an aggregated column:

Fn.div(row['some_sum'],{context.API.range(context.start,context.end,unit='days')|length}

Dynamic variables are are passed to the function interpreter via the context variable. the context object has the following properties:

The following functions are available in custom field (function name, arguments and return type):

Fn.when(a any,b any,c any) -> any b|c (alias Fn.if (deprecated))

In 'when' the first variable can be Boolean, numeric or string value. Boolean false, empty string and numeric 0 are considered false, any other value is true. Return type is inferred from the types of b and c.

Custom formulas

Custom formulas are evaluated to the intermediate after all field formulas and aggegrations are applied in the same order as they are defined. This means that any previous formula is available in subsequent formulas.

There are three types of formulas:

Aggregate functions

Aggregate functions take field alias (field's data type is specified instead in the list below) as first argument to apply respective aggregation to. Return value is new field containing the aggregation result.

The optional groupby argument takes list of field aliases. The returned field's values are the result of given aggregation within each specified group.

The Quantile function takes field alias as an argument and the percentile as second argument. Returned field contains repective percentile value.



Mutating functions

Mutating functions are special functions which change the layout (order or structure) of the data.

Example:

Fn.Group(['field','another_field'], aggregator='Sum')

Groups rows by column and another column, and applies Sum aggregator to all remaining fields. Returns an error if the operation cannot be completed (e.g. trying to apply numeric aggregator to string field). 

Fn.Group(['column','another_column'], aggregators={'alias_to_sum':'Sum','to_count':'Count'})

Groups rows by column and another column, and applies Sum and Count aggregators to respective fields. All remaining fields are dropped (report will display an empty column).  Returns an error if the operation cannot be completed (e.g. trying to apply Sum aggregator to a string field).