InfluxDB: Flux - Distinct() vs. Unique(): Unterschied zwischen den Versionen
Admin (Diskussion | Beiträge) |
Admin (Diskussion | Beiträge) K |
||
(19 dazwischenliegende Versionen des gleichen Benutzers werden nicht angezeigt) | |||
Zeile 1: | Zeile 1: | ||
− | ''Because I was confused several times about which function is the right one (distinct() or unique()) I write this site.'' | + | ''Because I was confused several times about which function is the right one (distinct() or unique()) I write this site.''<br> |
+ | |||
+ | Long Story Short - The only difference between distinct() and unique(): '''Distinct() removes all other columns then the specified column, unique() does not do this.''' | ||
+ | |||
== Base Data == | == Base Data == | ||
Zeile 31: | Zeile 34: | ||
''The distinct() function '''returns the unique values for a given column'''. Null is considered its own distinct value if it is present.'' <br> | ''The distinct() function '''returns the unique values for a given column'''. Null is considered its own distinct value if it is present.'' <br> | ||
''The _value of each output record is set to '''only the specified column'''. This means '''all other columns will be removed'''. Distinct() is a selector function.'' <br> | ''The _value of each output record is set to '''only the specified column'''. This means '''all other columns will be removed'''. Distinct() is a selector function.'' <br> | ||
− | ''The function | + | ''The function distinct() by default uses the column <code>_value</code>.'' <br> |
+ | ''For example I use distinct() mainly for '''Grafana template variables'''.'' <br> | ||
− | So if you use distinct() without | + | Function documentation: https://docs.influxdata.com/flux/v0.x/stdlib/universe/distinct/ |
+ | |||
+ | === Example 1 - without parameter === | ||
+ | So if you use distinct() without parameter like following query: <br> | ||
<pre> | <pre> | ||
.. | .. | ||
Zeile 42: | Zeile 49: | ||
'''Output''' <br> | '''Output''' <br> | ||
[[Datei:02-distinct without parameter.png]] <br> | [[Datei:02-distinct without parameter.png]] <br> | ||
− | |||
This means following rows will be removed: <br> | This means following rows will be removed: <br> | ||
Zeile 48: | Zeile 54: | ||
+ | === Example 2 - with parameter "host" === | ||
+ | So if you use distinct() with parameter like following query: <br> | ||
+ | <pre> | ||
+ | .. | ||
+ | array.from(rows: var) | ||
+ | |> distinct(column: "host) | ||
+ | </pre> | ||
+ | |||
+ | '''Output''' <br> | ||
+ | [[Datei:03-distinct with parameter.png]] <br> | ||
+ | |||
+ | This means following rows will be removed: <br> | ||
+ | [[Datei:01-base-table after distinct with parameter.png]]<br> | ||
Zeile 53: | Zeile 72: | ||
== Unique() == | == Unique() == | ||
+ | ''The unique() function returns '''all records containing unique values in a specified column. Group keys, record columns, and values are not modified'''. Unique() is a selector function.'' <br> | ||
+ | ''Important to know: If unique() finds multiple times the same value in the specified column, then '''it selects the first/oldest data'''.'' | ||
+ | ''The function unique() by default uses the column <code>_value</code>.'' <br> | ||
+ | ''For example I use unique() mainly for tables that I have to '''deduplicate on a specified column'''.'' <br> | ||
+ | |||
+ | Function documentation: https://docs.influxdata.com/flux/v0.x/stdlib/universe/unique/ | ||
+ | |||
+ | === Example 1 - without parameter === | ||
+ | So if you use unique() without parameter like following query: <br> | ||
+ | <pre> | ||
+ | .. | ||
+ | array.from(rows: var) | ||
+ | |> unique() | ||
+ | </pre> | ||
+ | |||
+ | '''Output''' <br> | ||
+ | [[Datei:01-unique without parameter.png]] <br> | ||
+ | |||
+ | This means following rows will be removed: <br> | ||
+ | [[Datei:01-base-table after distinct without parameter.png]]<br> | ||
+ | |||
+ | |||
+ | === Example 2 - with parameter "host" === | ||
+ | So if you use unique() with parameter like following query: <br> | ||
+ | <pre> | ||
+ | .. | ||
+ | array.from(rows: var) | ||
+ | |> unique(column: "host) | ||
+ | </pre> | ||
+ | |||
+ | '''Output''' <br> | ||
+ | [[Datei:03-unique with parameter.png]] <br> | ||
+ | |||
+ | This means following rows will be removed: <br> | ||
+ | [[Datei:01-base-table after distinct with parameter.png]]<br> | ||
+ | |||
+ | |||
+ | |||
+ | == Unique() - select the last/newest values == | ||
+ | Because of - ''if unique() finds multiple times the same value in the specified column, then '''it selects the first/oldest data'''.'' - here is an example how to use unique() but select the last/newest values. | ||
+ | |||
+ | You only have to add the '''sort() function before unique()''': | ||
+ | <pre> | ||
+ | .. | ||
+ | array.from(rows: var) | ||
+ | |> sort(columns: ["_time"], desc: true) | ||
+ | |> unique(column: "host") | ||
+ | </pre> | ||
+ | |||
+ | '''Output''' <br> | ||
+ | [[Datei:01-unique with sort.png]] <br> | ||
+ | |||
+ | This means instead of removing the yellow rows, the red rows will be removed: <br> | ||
+ | [[Datei:02-baste-table after unique with sort.png]] <br> | ||
Aktuelle Version vom 10. Dezember 2021, 11:57 Uhr
Because I was confused several times about which function is the right one (distinct() or unique()) I write this site.
Long Story Short - The only difference between distinct() and unique(): Distinct() removes all other columns then the specified column, unique() does not do this.
Inhaltsverzeichnis
Base Data
All the the examples from this site are based from the following data:
Flux query to generate this data
import "array" var = [ {_time: time(v: "2021-12-10T00:01:00Z"), host: "HOST01", _field: "cpu_usage", _value: 37}, {_time: time(v: "2021-12-10T00:02:00Z"), host: "HOST02", _field: "cpu_usage", _value: 72}, {_time: time(v: "2021-12-10T00:03:00Z"), host: "HOST03", _field: "cpu_usage", _value: 88}, {_time: time(v: "2021-12-10T00:04:00Z"), host: "HOST03", _field: "cpu_usage", _value: 11}, {_time: time(v: "2021-12-10T00:05:00Z"), host: "HOST01", _field: "cpu_usage", _value: 37}, {_time: time(v: "2021-12-10T00:06:00Z"), host: "HOST02", _field: "cpu_usage", _value: 90}, {_time: time(v: "2021-12-10T00:07:00Z"), host: "HOST04", _field: "cpu_usage", _value: 90}, {_time: time(v: "2021-12-10T00:08:00Z"), host: "HOST03", _field: "cpu_usage", _value: 77}, {_time: time(v: "2021-12-10T00:09:00Z"), host: "HOST05", _field: "cpu_usage", _value: 57}, {_time: time(v: "2021-12-10T00:10:00Z"), host: "HOST01", _field: "cpu_usage", _value: 13} ] array.from(rows: var)
Distinct()
The distinct() function returns the unique values for a given column. Null is considered its own distinct value if it is present.
The _value of each output record is set to only the specified column. This means all other columns will be removed. Distinct() is a selector function.
The function distinct() by default uses the column _value
.
For example I use distinct() mainly for Grafana template variables.
Function documentation: https://docs.influxdata.com/flux/v0.x/stdlib/universe/distinct/
Example 1 - without parameter
So if you use distinct() without parameter like following query:
.. array.from(rows: var) |> distinct()
This means following rows will be removed:
Example 2 - with parameter "host"
So if you use distinct() with parameter like following query:
.. array.from(rows: var) |> distinct(column: "host)
This means following rows will be removed:
Unique()
The unique() function returns all records containing unique values in a specified column. Group keys, record columns, and values are not modified. Unique() is a selector function.
Important to know: If unique() finds multiple times the same value in the specified column, then it selects the first/oldest data.
The function unique() by default uses the column _value
.
For example I use unique() mainly for tables that I have to deduplicate on a specified column.
Function documentation: https://docs.influxdata.com/flux/v0.x/stdlib/universe/unique/
Example 1 - without parameter
So if you use unique() without parameter like following query:
.. array.from(rows: var) |> unique()
This means following rows will be removed:
Example 2 - with parameter "host"
So if you use unique() with parameter like following query:
.. array.from(rows: var) |> unique(column: "host)
This means following rows will be removed:
Unique() - select the last/newest values
Because of - if unique() finds multiple times the same value in the specified column, then it selects the first/oldest data. - here is an example how to use unique() but select the last/newest values.
You only have to add the sort() function before unique():
.. array.from(rows: var) |> sort(columns: ["_time"], desc: true) |> unique(column: "host")
This means instead of removing the yellow rows, the red rows will be removed: