InfluxDB: Flux - Aggregate the Sum over Time
AggregateWindow() with sum() = unexpected behaviour
The Flux function aggregateWindow() is commonly used for time series data, for example in a graph visualization.
Usually you use the aggregation function mean() inside aggregateWindow() and everything behaves as you expect, but if you use the aggregation function sum() inside aggregateWindow(), the behaviour is a bit strange (at least it was for me).
The "problem", or rather the unexpected behaviour, occurs when you have irregular data points.
"Problem" explanation / Difference between mean() and sum()
For example, you want the sum of all VMs' memory over time. -> This example is simplified to only one time series.
Assume you have the following time series data (memory in GB):
_time, vm, _value
2021-08-13 08:00:00 GMT+2, vm01, 4GB
2021-08-13 08:01:00 GMT+2, vm01, 4GB
2021-08-13 08:02:00 GMT+2, vm01, 4GB
2021-08-13 08:02:30 GMT+2, vm01, 4GB
2021-08-13 08:03:00 GMT+2, vm01, 4GB
Result of AggregateWindow() with mean() "|> aggregateWindow(every: 1m, fn: mean)"
2021-08-13 08:00:00 GMT+2, vm01, 4GB
2021-08-13 08:01:00 GMT+2, vm01, 4GB
2021-08-13 08:02:00 GMT+2, vm01, 4GB
2021-08-13 08:03:00 GMT+2, vm01, 4GB
Result of AggregateWindow() with sum() "|> aggregateWindow(every: 1m, fn: sum)"
2021-08-13 08:00:00 GMT+2, vm01, 4GB
2021-08-13 08:01:00 GMT+2, vm01, 4GB
2021-08-13 08:02:00 GMT+2, vm01, 8GB
2021-08-13 08:03:00 GMT+2, vm01, 4GB
In the example above everything still looks logical: the 8GB value appears because two raw points (08:02:00 and 08:02:30) fall into the same 1-minute window, so sum() adds them while mean() averages them. But if you have multiple time series (for example multiple VMs) and want the sum of all VMs' memory over time, the behaviour becomes really strange, as sketched below.
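For illustration, a naive query of this kind (measurement, field, and tag names are assumptions) shows the strange behaviour: after grouping, all VM series land in one table and sum() adds every point that falls into a window, so a VM that reports twice within one window is counted twice:

from(bucket: "example-bucket")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "vm_metrics" and r._field == "memory_gb")
    |> group(columns: ["_measurement", "_field"]) //merge all VM series into one table
    |> aggregateWindow(every: 1m, fn: sum, createEmpty: false) //VMs with irregular points are counted more than once per window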
Workaround to prevent the unexpected behaviour
In the following workaround we use the same example: you want the sum of all VMs' memory over time.
//define variables
bucket = "<YOUR_BUCKET>"
field = "<YOUR_FIELD>"
window = <YOUR_WINDOW> //a duration, for example 1m

//create a custom function for your use case
customsum = (tables=<-, column="_value") => tables
    |> mean() //calculates the mean to prevent unexpected behaviour with irregular time series
    |> drop(columns: ["host"]) //drops "host" because we want the overall sum and a VM can switch hosts; we use drop() because group() removes some timestamps
    |> unique(column: "VMName") //removes duplicated time series (for example the VM is on multiple hosts in the time window)
    |> drop(columns: ["VMName"]) //drops "VMName" because we want the overall sum
    |> sum(column: column) //calculates the sum

from(bucket: bucket)
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "hyperv_health" and r._field == field)
    |> aggregateWindow(every: window, fn: customsum, createEmpty: false)
    |> fill(column: "_value", usePrevious: true)
    |> toInt() //rounds the result
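Note: the mean() at the start of the custom function collapses the irregular points of each series inside a window to a single value, so the following sum() adds exactly one value per VM. If you use the query in a Grafana panel, window would typically be the dashboard variable v.windowPeriod (an assumption depending on your setup):

window = v.windowPeriod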
Important Notes (with aggregateWindow() and sum() inside)
If gathering the data takes some time (a few minutes or more), your graph may look weird because your windowing time is shorter than this gathering time.
This means some windows contain more data points than others.
Solution
Increase your windowing time (it has to be bigger than the gathering time) so that every window covers at least one full gathering cycle.
Example where gathering takes on average 15 minutes:
... |> aggregateWindow(every: duration(v: int(v: v.windowPeriod) + int(v: 15m)), fn: customsum, createEmpty: false)
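The int()/duration() round trip is used because Flux duration values cannot simply be added with +: int(v: 15m) converts the 15-minute gather time to nanoseconds, this is added to the nanoseconds of v.windowPeriod, and duration() converts the result back into a duration that aggregateWindow() accepts.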