# Formula reference

## Introduction

Custom formulas can be used to generate custom metrics based on the undelying data. There are following types of formulas (in the order of evaluation):

Field formulas

Field aggragations

Custom formulas

Field formulas and table aggregations can be applied in two levels: first part level and second report level to the values received from part level (including formulas and aggregations).

Custom formulas are applied last to the whole report after generating the output from field formulas and aggregations. While custom formulas are most powerful there are some considerations when using them. First, make as many data manipulations using row formulas an table aggregations as possible, and apply custom formulas to the aggregated data.

The row formulas and aggregations are tehcnically vectorized and provide fastest and most efficient way of working with large amounts of data (tens of millions of records). It is also recommended to apply as many filters as possible to the data being processes: part filters, dimension filters or context filters.

## Field formulas

Field formulas are applied to data before any aggregations are made. Aggregations can be defined using report builder for each field: Sum, CountDistinct, Avg, Count, Max, Min, StringConcat, UniqueList, TakeFirst

General syntax for writing custom field formulas:

Fn.name(argument[, argument...])

Function argument can be another function which return required data type, or expression. Expressions must be placed inside curly brackets and can contain python code including contants, arithmetic operations and API calls. All arithmetics must be composed using the available functions. Regular operators such as +, -, *, / are not supported.

Existing field value can be referenced using the row variable using field alias:

row['alias']

Full example of a formula for calculating change percentage based on the canonical equation (a-b)/b:

Fn.div(Fn.minus(row['a_alias'],row['b_alias']),row['b_alias'])

Full example of a formula for calculating the daily average of an aggregated column:

Fn.div(row['some_sum'],{context.API.range(context.start,context.end,unit='days')|length}

Dynamic variables are are passed to the function interpreter via the context variable. the context object has the following properties:

context.start - report start date

context.end - report end date

context.API - additional builtin helper functions providing facilities for manipulating other values such as time shifts, or subset generation, for example. Refer the context API documentation for available functions.

context.config - access to site configuration values.

context.tz - Site time zone.

context.dim - Access dynamic context variables. Refer the dynamic dimensions API for details.

context.today - Current date according to site's time zone in 'YYYY-MM-DD' format.

context.boy - January 1st of current year in 'YYYY-MM-DD HH:mm:ss' format.

The following functions are available in custom field (function name, arguments and return type):

Fn.when(a any,b any,c any) -> any b|c (alias Fn.if (deprecated))

In 'when' the first variable can be Boolean, numeric or string value. Boolean false, empty string and numeric 0 are considered false, any other value is true. Return type is inferred from the types of b and c.

Fn.mult(a float|int, b float|int) -> a*b = float|int

Fn.minus(a float|int, b float|int) -> a-b = float|int

Fn.plus(a float|int, b float|int) -> a+b = float|int (alias Fn.add)

Fn.div(a float|int, b float|int) -> a/b = float|int The division is "safe" i.e. it returns 0 instead of an error in cases such as division by 0.

Fn.str(a any) -> string or '' if the value cannot be converted to string.

Fn.float(a any) -> float or 0 if the value cannot be converted to number.

Fn.int(a any) -> int or 0 if the value cannot be converted to number.

Fn.ceil(a float) -> int

Fn.floor(a float) -> int

Fn.round(a float) -> int

Fn.max(a float, b float) -> float if a>b then a else b

Fn.min(a float, b float) -> float if a<b then a else b

Fn.eq(a float, b float) -> bool if a=b then true else false

Fn.ne(a float, b float) -> bool

Fn.gt(a float, b float) -> bool

Fn.lt(a float, b float) -> bool

Fn.ge(a float, b float) -> bool

Fn.le(a float, b float) -> bool

Fn.abs(a float|int) -> float|int

Fn.isodateformat(a DateTime, format str) -> str

Fn.dateformat(a DateTime, format str) -> str

Fn.asdate(a str) -> DateTime

Fn.monthformat(a DateTime) -> str

Fn.includes(haystack str, pattern regex) -> bool

Fn.has(haystack array, element any) -> bool

Fn.extract(haystack str, pattern regex) -> str

Fn.trim(string str) -> str

Fn.combine(a array, b array) -> array

Fn.substring(a str, start int, end int) -> str

Fn.length(a array) -> int

Fn.concat(a str, b str) -> str

Fn.and(a any, b any) -> bool

Fn.or(a any, b any) -> bool

Fn.not(a any, b any) -> bool

Fn.join(a array, separator str) -> str

## Custom formulas

Custom formulas are evaluated to the intermediate after all field formulas and aggegrations are applied in the same order as they are defined. This means that any previous formula is available in subsequent formulas.

There are three types of formulas:

Row formulas take input from other rows or aggregations and produce potentially different result for each row. Row formulas are the same as abovementioned field formulas. In addition to the field formula syntax the row formulas can be expressed as Python lambda functions, e.g. (row['a_alias'] - row['b_alias']) / row['b_alias']. The all row and aggregate functions and can be used as well.

Aggregate functions take a column to aggregate, and return same value for each row.

In addition to returning same aggregate for each row, aggregate formulas can be used to reduce or group rows. This is achieved by adding list of gouping fields using groupby parameter to the formula e.g. Fn.Sum('some_field', groupby=['otherfield','anotherfield']). Instead of returning total sum for each row the function returns sum over the defined group.

Some functions can produce cumulative aggregates based on an input column, for example date. They are useful for calculating year-to-date totals for example. For timeseries cumulative aggregations the time field must be presorted. Example for calculating YTD sales for each product type: Fn.CumSum('date', groupby=['product_type']). The data can be sorted using part definition or mutator functions.

Mutator functions change the shape of the report by reordering or reducing rows or columns. For example Fn.Sort('a_field'[, asc=True]) sorts the data based on given field, and optionally reverses the result. The function returns rank number. Another function is Fn.Reduce which reduces the number of rows by applying an aggregate function or functions to fields not used in grouping.

### Aggregate functions

Aggregate functions take field alias (field's data type is specified instead in the list below) as first argument to apply respective aggregation to. Return value is new field containing the aggregation result.

The optional groupby argument takes list of field aliases. The returned field's values are the result of given aggregation within each specified group.

The Quantile function takes field alias as an argument and the percentile as second argument. Returned field contains repective percentile value.

Fn.Max(a float|int[, groupby=[string]]) -> float|int

Fn.Min(a float|int[, groupby=[string]]) -> float|int

Fn.Count(a float|int[, groupby=[string]]) -> float|int

Fn.Sum(a float|int[, groupby=[string]]) -> float|int

Fn.CumSum(a float|int[, groupby=[string]]) -> float|int

Fn.Avg(a float|int[, groupby=[string]]) -> float|int

Fn.Median(a float|int[, groupby=[string]]) -> float|int

Fn.Mode(a any[, groupby=[string]]) -> any

Fn.Quantile(a float|int, b float) -> int

Fn.CountDistinct(a any[, groupby=[string]]) -> int

Fn.StringConcat(a any[, groupby=[string]]) -> [string]

Fn.UniqueList(a any[, groupby=[string]]) -> [any]

Fn.TakeFirst(a any[, groupby=[string]]) -> any

Fn.StdDev(a any[, groupby=[string]]) -> float

### Mutating functions

Mutating functions are special functions which change the layout (order or structure) of the data.

Fn.Sort(a any[, asc=Boolean]) -> int Sort the data using the alias field, and optionally reverse the result. Returns new column containing the rank value of the sort.

Fn.Group(by string[][, aggregator=string][, aggregators=Map<string, string>]) -> None Reduce the number of rows by applying aggregation to multiple columns at once. The aggregator argument applies same aggregation to all fields, default aggregator is Count. If the aggregator argument is specified then each specified field will have a separate aggregator. Return value for this function and all fields not specified in either groupby or reducers argument is None.

Example:

Fn.Group(['field','another_field'], aggregator='Sum')

Groups rows by column and another column, and applies Sum aggregator to all remaining fields. Returns an error if the operation cannot be completed (e.g. trying to apply numeric aggregator to string field).

Fn.Group(['column','another_column'], aggregators={'alias_to_sum':'Sum','to_count':'Count'})

Groups rows by column and another column, and applies Sum and Count aggregators to respective fields. All remaining fields are dropped (report will display an empty column). Returns an error if the operation cannot be completed (e.g. trying to apply Sum aggregator to a string field).