InfluxDB: Flux - Aggregate the Sum over Time
aggregateWindow() with sum() = unexpected behaviour
The Flux function aggregateWindow() is often used on time series data, for example in a graph visualization.
Mostly you use the aggregation function mean() inside aggregateWindow(), and then everything behaves as you expect. If you use the aggregation function sum() inside aggregateWindow() instead, the behaviour is a little bit strange (at least for me).
The "problem", or rather the unexpected behaviour, occurs if you have irregular time series, that is, points that are not evenly spaced in time.
"Problem" explanation / Difference between mean() and sum()
For example, you have the following time series data (memory in GB):
_time, vm, _value
2021-08-13 08:00:00 GMT+2, vm01, 4GB
2021-08-13 08:01:00 GMT+2, vm01, 4GB
2021-08-13 08:02:00 GMT+2, vm01, 4GB
2021-08-13 08:02:30 GMT+2, vm01, 4GB
2021-08-13 08:03:00 GMT+2, vm01, 4GB
Result of aggregateWindow() with mean(), "|> aggregateWindow(every: 1m, fn: mean)":
2021-08-13 08:00:00 GMT+2, vm01, 4GB
2021-08-13 08:01:00 GMT+2, vm01, 4GB
2021-08-13 08:02:00 GMT+2, vm01, 4GB
2021-08-13 08:03:00 GMT+2, vm01, 4GB
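Result of aggregateWindow() with sum(), "|> aggregateWindow(every: 1m, fn: sum)":
2021-08-13 08:00:00 GMT+2, vm01, 4GB
2021-08-13 08:01:00 GMT+2, vm01, 4GB
2021-08-13 08:02:00 GMT+2, vm01, 8GB
2021-08-13 08:03:00 GMT+2, vm01, 4GB
The 08:02 window contains two points (08:02:00 and 08:02:30). mean() averages them and still returns 4GB, but sum() adds them up and returns 8GB, although the VM never had more than 4GB.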
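The difference between the two results can be reproduced outside of InfluxDB. The following is only a small Python sketch that imitates 1-minute windowing on the sample data above; it is not Flux and not how aggregateWindow() is implemented. The helper name aggregate_window is made up for this illustration, and windows are labelled by their start time, as in the tables above.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Sample points from the table above: (timestamp, value in GB) for vm01.
points = [
    (datetime(2021, 8, 13, 8, 0, 0), 4),
    (datetime(2021, 8, 13, 8, 1, 0), 4),
    (datetime(2021, 8, 13, 8, 2, 0), 4),
    (datetime(2021, 8, 13, 8, 2, 30), 4),  # the irregular extra point
    (datetime(2021, 8, 13, 8, 3, 0), 4),
]


def aggregate_window(points, fn):
    """Group points into 1-minute windows and apply fn to the values of each window."""
    windows = defaultdict(list)
    for ts, value in points:
        # Truncating to the full minute gives the start of the window.
        windows[ts.replace(second=0, microsecond=0)].append(value)
    return {start: fn(values) for start, values in sorted(windows.items())}


window_means = aggregate_window(points, mean)
window_sums = aggregate_window(points, sum)

for start in window_means:
    print(start.strftime("%H:%M"), "mean:", window_means[start], "sum:", window_sums[start])
```

Only the 08:02 window differs between the two aggregations, because it is the only window that holds more than one point.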
//define variables
bucket = "${dynamicbucket}"
fields = /Health_.+/ //Regex that matches both fields
window = ${window}
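//custom aggregate: mean per series within each window, then sum across the series
customsum = (tables=<-, column="_value") =>
    tables
        |> mean(column: column)
        |> drop(columns: ["host", "_field"]) //remove host and _field so the remaining series are summed together
        |> sum(column: column)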
from(bucket: bucket)
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "hyperv_health" and r._field =~ fields)
    |> aggregateWindow(every: window, fn: customsum, createEmpty: false)
    |> fill(column: "_value", usePrevious: true)
    |> toInt() //convert the result to an integer (toInt() truncates the decimals, it does not round)
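The idea behind customsum is to first take the mean of every series inside a window and only then sum across the series, so that an extra point in one series no longer inflates the total. The Python sketch below imitates that arithmetic and compares it with a plain sum. The host names and values are hypothetical, the function names custom_sum and plain_sum are made up for this illustration, and the series are distinguished by host only (the query above also separates them by _field).

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical samples: (timestamp, host, value in GB).
samples = [
    (datetime(2021, 8, 13, 8, 2, 0), "host01", 4),
    (datetime(2021, 8, 13, 8, 2, 30), "host01", 4),  # irregular extra point
    (datetime(2021, 8, 13, 8, 2, 0), "host02", 2),
]


def window_start(ts):
    """Start of the 1-minute window that ts falls into."""
    return ts.replace(second=0, microsecond=0)


def plain_sum(samples):
    """Sum every point in a window, regardless of which series it belongs to."""
    totals = defaultdict(float)
    for ts, _host, value in samples:
        totals[window_start(ts)] += value
    return dict(totals)


def custom_sum(samples):
    """Mean per series within each window, then sum of those means per window."""
    per_series = defaultdict(list)
    for ts, host, value in samples:
        per_series[(window_start(ts), host)].append(value)
    totals = defaultdict(float)
    for (start, _host), values in per_series.items():
        totals[start] += mean(values)
    return dict(totals)


window = datetime(2021, 8, 13, 8, 2)
print("plain sum: ", plain_sum(samples)[window])
print("custom sum:", custom_sum(samples)[window])
```

With this data the plain sum counts host01 twice, while the custom sum counts each host once per window.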