InfluxDB: Flux - Distinct() vs. Unique(): Unterschied zwischen den Versionen

Aus Wiki-WebPerfect
Wechseln zu: Navigation, Suche
K
 
(29 dazwischenliegende Versionen des gleichen Benutzers werden nicht angezeigt)
Zeile 1: Zeile 1:
''Because I was confused several times about which function is the right one (distinct() or unique()) I write this site.''
+
''Because I was confused several times about which function is the right one (distinct() or unique()) I write this site.''<br>
 +
 
 +
Long Story Short - The only difference between distinct() and unique(): '''Distinct() removes all other columns then the specified column, unique() does not do this.'''
 +
 
  
 
== Base Data ==
 
== Base Data ==
Zeile 29: Zeile 32:
  
 
== Distinct() ==
 
== Distinct() ==
The function [https://docs.influxdata.com/flux/v0.x/stdlib/universe/distinct/ distinct()] by default uses the column ''_value''. <br>
+
''The distinct() function '''returns the unique values for a given column'''. Null is considered its own distinct value if it is present.'' <br>
So if you use distinct() without parameters this is the output:<br>
+
''The _value of each output record is set to '''only the specified column'''. This means '''all other columns will be removed'''. Distinct() is a selector function.'' <br>
 +
''The function distinct() by default uses the column <code>_value</code>.'' <br>
 +
''For example I use distinct() mainly for '''Grafana template variables'''.'' <br>
 +
 
 +
Function documentation: https://docs.influxdata.com/flux/v0.x/stdlib/universe/distinct/
 +
 
 +
=== Example 1 - without parameter ===
 +
So if you use distinct() without parameter like following query: <br>
 
<pre>
 
<pre>
 
..
 
..
 
 
array.from(rows: var)
 
array.from(rows: var)
 
   |> distinct()
 
   |> distinct()
 
</pre>
 
</pre>
 +
 +
'''Output''' <br>
 
[[Datei:02-distinct without parameter.png]] <br>
 
[[Datei:02-distinct without parameter.png]] <br>
  
 +
This means following rows will be removed: <br>
 +
[[Datei:01-base-table after distinct without parameter.png]]<br>
  
 +
 +
=== Example 2 - with parameter "host" ===
 +
So if you use distinct() with parameter like following query: <br>
 +
<pre>
 +
..
 +
array.from(rows: var)
 +
  |> distinct(column: "host)
 +
</pre>
 +
 +
'''Output''' <br>
 +
[[Datei:03-distinct with parameter.png]] <br>
 +
 +
This means following rows will be removed: <br>
 +
[[Datei:01-base-table after distinct with parameter.png]]<br>
  
  
Zeile 45: Zeile 72:
  
 
== Unique() ==
 
== Unique() ==
 +
''The unique() function returns '''all records containing unique values in a specified column. Group keys, record columns, and values are not modified'''. Unique() is a selector function.'' <br>
 +
''Important to know: If unique() finds multiple times the same value in the specified column, then '''it selects the first/oldest data'''.''
 +
''The function unique() by default uses the column <code>_value</code>.'' <br>
 +
''For example I use unique() mainly for tables that I have to '''deduplicate on a specified column'''.'' <br>
 +
 +
Function documentation: https://docs.influxdata.com/flux/v0.x/stdlib/universe/unique/
 +
 +
=== Example 1 - without parameter ===
 +
So if you use unique() without parameter like following query: <br>
 +
<pre>
 +
..
 +
array.from(rows: var)
 +
  |> unique()
 +
</pre>
 +
 +
'''Output''' <br>
 +
[[Datei:01-unique without parameter.png]] <br>
 +
 +
This means following rows will be removed: <br>
 +
[[Datei:01-base-table after distinct without parameter.png]]<br>
 +
 +
 +
=== Example 2 - with parameter "host" ===
 +
So if you use unique() with parameter like following query: <br>
 +
<pre>
 +
..
 +
array.from(rows: var)
 +
  |> unique(column: "host)
 +
</pre>
 +
 +
'''Output''' <br>
 +
[[Datei:03-unique with parameter.png]] <br>
 +
 +
This means following rows will be removed: <br>
 +
[[Datei:01-base-table after distinct with parameter.png]]<br>
 +
 +
 +
 +
== Unique() - select the last/newest values ==
 +
Because of - ''if unique() finds multiple times the same value in the specified column, then '''it selects the first/oldest data'''.'' - here is an example how to use unique() but select the last/newest values.
 +
 +
You only have to add the '''sort() function before unique()''':
 +
<pre>
 +
..
 +
array.from(rows: var)
 +
  |> sort(columns: ["_time"], desc: true)
 +
  |> unique(column: "host")
 +
</pre>
 +
 +
'''Output''' <br>
 +
[[Datei:01-unique with sort.png]] <br>
 +
 +
This means instead of removing the yellow rows, the red rows will be removed: <br>
 +
[[Datei:02-baste-table after unique with sort.png]] <br>
  
  

Aktuelle Version vom 10. Dezember 2021, 11:57 Uhr

Because I was confused several times about which function is the right one (distinct() or unique()) I write this site.

Long Story Short - The only difference between distinct() and unique(): Distinct() removes all other columns then the specified column, unique() does not do this.


Base Data

All the the examples from this site are based from the following data:
01-base-table.png

Flux query to generate this data

import "array"

var = [
  {_time: time(v: "2021-12-10T00:01:00Z"), host: "HOST01", _field: "cpu_usage", _value: 37},
  {_time: time(v: "2021-12-10T00:02:00Z"), host: "HOST02", _field: "cpu_usage", _value: 72},
  {_time: time(v: "2021-12-10T00:03:00Z"), host: "HOST03", _field: "cpu_usage", _value: 88},
  {_time: time(v: "2021-12-10T00:04:00Z"), host: "HOST03", _field: "cpu_usage", _value: 11},
  {_time: time(v: "2021-12-10T00:05:00Z"), host: "HOST01", _field: "cpu_usage", _value: 37},
  {_time: time(v: "2021-12-10T00:06:00Z"), host: "HOST02", _field: "cpu_usage", _value: 90},
  {_time: time(v: "2021-12-10T00:07:00Z"), host: "HOST04", _field: "cpu_usage", _value: 90},
  {_time: time(v: "2021-12-10T00:08:00Z"), host: "HOST03", _field: "cpu_usage", _value: 77},
  {_time: time(v: "2021-12-10T00:09:00Z"), host: "HOST05", _field: "cpu_usage", _value: 57},
  {_time: time(v: "2021-12-10T00:10:00Z"), host: "HOST01", _field: "cpu_usage", _value: 13}
]

array.from(rows: var)



Distinct()

The distinct() function returns the unique values for a given column. Null is considered its own distinct value if it is present.
The _value of each output record is set to only the specified column. This means all other columns will be removed. Distinct() is a selector function.
The function distinct() by default uses the column _value.
For example I use distinct() mainly for Grafana template variables.

Function documentation: https://docs.influxdata.com/flux/v0.x/stdlib/universe/distinct/

Example 1 - without parameter

So if you use distinct() without parameter like following query:

..
array.from(rows: var)
  |> distinct()

Output
02-distinct without parameter.png

This means following rows will be removed:
01-base-table after distinct without parameter.png


Example 2 - with parameter "host"

So if you use distinct() with parameter like following query:

..
array.from(rows: var)
  |> distinct(column: "host)

Output
03-distinct with parameter.png

This means following rows will be removed:
01-base-table after distinct with parameter.png



Unique()

The unique() function returns all records containing unique values in a specified column. Group keys, record columns, and values are not modified. Unique() is a selector function.
Important to know: If unique() finds multiple times the same value in the specified column, then it selects the first/oldest data. The function unique() by default uses the column _value.
For example I use unique() mainly for tables that I have to deduplicate on a specified column.

Function documentation: https://docs.influxdata.com/flux/v0.x/stdlib/universe/unique/

Example 1 - without parameter

So if you use unique() without parameter like following query:

..
array.from(rows: var)
  |> unique()

Output
01-unique without parameter.png

This means following rows will be removed:
01-base-table after distinct without parameter.png


Example 2 - with parameter "host"

So if you use unique() with parameter like following query:

..
array.from(rows: var)
  |> unique(column: "host)

Output
03-unique with parameter.png

This means following rows will be removed:
01-base-table after distinct with parameter.png


Unique() - select the last/newest values

Because of - if unique() finds multiple times the same value in the specified column, then it selects the first/oldest data. - here is an example how to use unique() but select the last/newest values.

You only have to add the sort() function before unique():

..
array.from(rows: var)
  |> sort(columns: ["_time"], desc: true)
  |> unique(column: "host")

Output
01-unique with sort.png

This means instead of removing the yellow rows, the red rows will be removed:
02-baste-table after unique with sort.png