This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Metrics

Detailed reference documentation about maiLink

1 - Metrics Explorer

Familiarize yourself with the Metrics Explorer.

The Metrics Explorer allows you to create useful time-sequence plots for evaluating metrics that your deployed devices send to the Cloud. The Metrics Explorer provides both the ability to select which metrics are to be uses, but also to preprocess the data using Query Functions and Query Operators, as well as alter the time period and sampling of data displayed.

Queries

To create a time-sequence plot in the Metrics Explorer in maiLink SRM you have to create a query in a language known as PromQL. maiLink SRM leverages VictoriaMetrics, which in turn relies on the Prometheus database. PromQL is the query language of Prometheus.

Raw Time-Sequence Queries

■ For a Specific Device

To plot the raw values (straight from the database) for a specified metric and a specified device, use a query that refers to the true Device ID:

my_metric{device="my_device_id"}
# example: heartbeat{device="123317-01A"}
■ For the Current Device

To plot the raw values (straight from the database) for a specified metric on the current device, use a query that subsitutes $device.id for the true Device ID (the current Device ID will be appropriately substituted when the query is performed):

my_metric{device="$device.id"}
# example: heartbeat{device="$device.id"}

Using $device.id lets a plot be created at the Device Type level and, when visualized for a specific device, have the data from this device plotted.

Accumulated Time-Sequence Queries

Calculations such as sums can be performed during the query in order to accumulate values over a time period. This helps reduce the number of data points and simplifies the plotting of data.

sum_over_time(my_metric{device="my_device_id"}[my_time_period])
# example: sum_over_time(ProbeTemp_C{device="$device.id"}[1m])

For the example shown above, assume a heartbeat comes every 10 seconds. The example queiry will create a value of about 6 because it is summing over a 1 minute time period (one should expect about 6 heartbeats to arrive at a rate of one every 10 seconds). Note that there may be some sampling error in this data (because of clock offsets), so you would get between 5 and 7 heartbeats per minute in reality.

■ Specifying Time Periods

Time period are specified using concatenated numbers and units, goverened by the regex expression [0-9]+([ywdhms]|ms) for each subsection. The units in each section should be one of:

Unit Description
ms milliseconds
s seconds
m minutes
h hours
d days (with 24 hours)
w weeks (with 7 days)
y years (with 365 days)

When periods are mixed, such as 1h30m, they should be constructed logically (largest units first).

Rolling Average of Accumulated Time-Sequence Queries

Rolling averages can be produced by averaging data over time. In the examples below tp represents a time period.

avg_over_time(sum_over_time(my_metric{device="my_device"}[tp1])[tp2])
# example: avg_over_time(sum_over_time(heartbeat{device="$device.id"}[1m])[2m30s])

In the above example the syntax of the function nesting means that the [1m] time period applies to the sum_over_time() function, and the [2m30s] time period applies to the avg_over_time() function. Also, simpler to the prior example, let’s assume the heartbeat comes every 10 seconds. As discussed above, the sum_over_time() value will be between 5 and 7. But if you do a rolling average using avg_over_time() you can expect the data to be smoothed to 6 heartbeats per minute, one every 10 seconds.

Additional Resources

Further information is available at:

2 - Query Basics

Familiarize yourself with the PromQL Query Syntax.

maiLink Telemetry supports the PromQL query syntax. Below is a basic guide to querying in maiLink Telemetry. Apply what you learn here in the maiLink Telemetry Metrics Explorer. This page is derived from PromQL documentation – it is possible to find the latest updates here.

maiLink Telemetry uses a functional query language called PromQL (Prometheus Query Language) that lets the user select and aggregate time series data in real time. The result of an expression is shown as a graph within maiLink SRM.

Expression language data types

In Prometheus’s expression language, an expression or sub-expression can evaluate to one of four types:

Type Description
Instant vector A set of time series containing a single sample for each time series, all sharing the same timestamp
Range vector A set of time series containing a range of data points over time for each time series
Scalar A simple numeric floating point value
String A simple string value; currently unused

Depending on the use-case (e.g. when graphing vs. displaying the output of an expression), only some of these types are legal as the result from a user-specified expression. For example, an expression that returns an instant vector is the only type that can be directly graphed.

Literals

■ String literals

Strings may be specified as literals in single quotes, double quotes or backticks.

PromQL follows the same escaping rules as Go. In single or double quotes a backslash begins an escape sequence, which may be followed by a, b, f, n, r, t, v or \. Specific characters can be provided using octal (\nnn) or hexadecimal (\xnn, \unnnn and \Unnnnnnnn).

No escaping is processed inside backticks. Unlike Go, PromQL does not discard newlines inside backticks.

Example:

"this is a string"
'these are unescaped: \n \\ \t'
`these are not unescaped: \n ' " \t`

■ Float literals

Scalar float values can be written as literal integer or floating-point numbers in the format (whitespace only included for better readability):

[-+]?(
    [0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?
  | 0[xX][0-9a-fA-F]+
  | [nN][aA][nN]
  | [iI][nN][fF]
)

Examples:

23
-2.43
3.4e-9
0x8f
-Inf
NaN

Time series Selectors

■ Instant vector selectors

Instant vector selectors allow the selection of a set of time series and a single sample value for each at a given timestamp (instant): in the simplest form, only a metric name is specified. This results in an instant vector containing elements for all time series that have this metric name.

This example selects all time series that have the http_requests_total metric name:

http_requests_total

It is possible to filter these time series further by appending a comma separated list of label matchers in curly braces ({}).

This example selects only those time series with the http_requests_total metric name that also have the job label set to prometheus and their group label set to canary:

http_requests_total{job="prometheus",group="canary"}

It is also possible to negatively match a label value, or to match label values against regular expressions. The following label matching operators exist:

Operator Description
=: Select labels that are exactly equal to the provided string.
!=: Select labels that are not equal to the provided string.
=~: Select labels that regex-match the provided string.
!~: Select labels that do not regex-match the provided string.

Regex matches are fully anchored. A match of env=~“foo” is treated as env=~"^foo$".

For example, this selects all http_requests_total time series for staging, testing, and development environments and HTTP methods other than GET.

http_requests_total{environment=~"staging|testing|development",method!="GET"}

Label matchers that match empty label values also select all time series that do not have the specific label set at all. It is possible to have multiple matchers for the same label name.

Vector selectors must either specify a name or at least one label matcher that does not match the empty string. The following expression is illegal:

{job=~".*"} # Bad!

In contrast, these expressions are valid as they both have a selector that does not match empty label values.

{job=~".+"}              # Good!
{job=~".*",method="get"} # Good!

Label matchers can also be applied to metric names by matching against the internal __name__ label. For example, the expression http_requests_total is equivalent to {__name__=“http_requests_total”}. Matchers other than = (!=, =~, !~) may also be used. The following expression selects all metrics that have a name starting with “job:":

{\_\_name__=~"job:.*"}

The metric name must not be one of the keywords bool, on, ignoring, group_left and group_right. The following expression is illegal:

on{} # Bad!

A workaround for this restriction is to use the __name__ label:

{__name__="on"} # Good!

All regular expressions in Prometheus use RE2 syntax.

■ Range Vector Selectors

Range vector literals work like instant vector literals, except that they select a range of samples back from the current instant. Syntactically, a time duration is appended in square brackets ([]) at the end of a vector selector to specify how far back in time values should be fetched for each resulting range vector element.

In this example, we select all the values we have recorded within the last 5 minutes for all time series that have the metric name http_requests_total and a job label set to prometheus:

http_requests_total{job=“prometheus”}[5m]

■ Time Durations

Time durations are specified as a number, followed immediately by one of the following units:

Duration Description
ms milliseconds
s seconds
m minutes
h hours
d days - assuming a day has always 24h
w weeks - assuming a week has always 7d
y years - assuming a year has always 365d

Time durations can be combined, by concatenation. Units must be ordered from the longest to the shortest. A given unit must only appear once in a time duration.

Here are some examples of valid time durations:

5h
1h30m
5m
10s

■ Offset modifier

The offset modifier allows changing the time offset for individual instant and range vectors in a query.

For example, the following expression returns the value of http_requests_total 5 minutes in the past relative to the current query evaluation time:

http_requests_total offset 5m

Note that the offset modifier always needs to follow the selector immediately, i.e. the following would be correct:

sum(http_requests_total{method="GET"} offset 5m) // GOOD.

While the following would be incorrect:

sum(http_requests_total{method="GET"}) offset 5m // INVALID.

The same works for range vectors. This returns the 5-minute rate that http_requests_total had a week ago:

rate(http_requests_total[5m] offset 1w)

For comparisons with temporal shifts forward in time, a negative offset can be specified:

rate(http_requests_total[5m] offset -1w)

Note that this allows a query to look ahead of its evaluation time.

■ @ modifier

The @ modifier allows changing the evaluation time for individual instant and range vectors in a query. The time supplied to the @ modifier is a unix timestamp and described with a float literal.

For example, the following expression returns the value of http_requests_total at 2021-01-04T07:40:00+00:00:

http_requests_total @ 1609746000

Note that the @ modifier always needs to follow the selector immediately, i.e. the following would be correct:

While the following would be incorrect:

sum(http_requests_total{method="GET"}) @ 1609746000 // INVALID.

The same works for range vectors. This returns the 5-minute rate that http_requests_total had at 2021-01-04T07:40:00+00:00:

rate(http_requests_total[5m] @ 1609746000)

The @ modifier supports all representation of float literals described above within the limits of int64. It can also be used along with the offset modifier where the offset is applied relative to the @ modifier time irrespective of which modifier is written first. These 2 queries will produce the same result.

# offset after @
http_requests_total @ 1609746000 offset 5m
# offset before @
http_requests_total offset 5m @ 1609746000

Additionally, start() and end() can also be used as values for the @ modifier as special values.

For a range query, they resolve to the start and end of the range query respectively and remain the same for all steps.

For an instant query, start() and end() both resolve to the evaluation time.

http_requests_total @ start()
rate(http_requests_total[5m] @ end())

Note that the @ modifier allows a query to look ahead of its evaluation time.

Subquery

Subquery allows you to run an instant query for a given range and resolution. The result of a subquery is a range vector.

Syntax: <instant_query> ‘[’ <range> ‘:’ [<resolution>] ‘]’ [ @ <float_literal> ] [ offset <duration> ]

  • <resolution> is optional. Default is the global evaluation interval.

Operators

Prometheus supports many binary and aggregation operators. These are described in detail in the expression language operators page.

Functions

Prometheus supports several functions to operate on data. These are described in detail in the expression language functions page.

Comments

PromQL supports line comments that start with #. Example:

# This is a comment

Gotchas

■ Staleness

When queries are run, timestamps at which to sample data are selected independently of the actual present time series data. This is mainly to support cases like aggregation (sum, avg, and so on), where multiple aggregated time series do not exactly align in time. Because of their independence, Prometheus needs to assign a value at those timestamps for each relevant time series. It does so by simply taking the newest sample before this timestamp.

If a target scrape or rule evaluation no longer returns a sample for a time series that was previously present, that time series will be marked as stale. If a target is removed, its previously returned time series will be marked as stale soon afterwards.

If a query is evaluated at a sampling timestamp after a time series is marked stale, then no value is returned for that time series. If new samples are subsequently ingested for that time series, they will be returned as normal.

If no sample is found (by default) 5 minutes before a sampling timestamp, no value is returned for that time series at this point in time. This effectively means that time series “disappear” from graphs at times where their latest collected sample is older than 5 minutes or after they are marked stale.

Staleness will not be marked for time series that have timestamps included in their scrapes. Only the 5 minute threshold will be applied in that case.

■ Avoiding slow queries and overloads

If a query needs to operate on a very large amount of data, graphing it might time out or overload the server or browser. Thus, when constructing queries over unknown data, always start building the query in the tabular view of Prometheus’s expression browser until the result set seems reasonable (hundreds, not thousands, of time series at most). Only when you have filtered or aggregated your data sufficiently, switch to graph mode. If the expression still takes too long to graph ad-hoc, pre-record it via a recording rule.

This is especially relevant for Prometheus’s query language, where a bare metric name selector like api_http_requests_total could expand to thousands of time series with different labels. Also keep in mind that expressions which aggregate over many time series will generate load on the server even if the output is only a small number of time series. This is similar to how it would be slow to sum all values of a column in a relational database, even if the output value is only a single number.

Additional Resources

Further information is available at:

3 - Query Examples

Some simple examples to help you understand Metrics Queries.

maiLink Telemetry supports the PromQL query syntax. Below is a set of examples of querying in maiLink Telemetry. Apply what you learn here in the maiLink Telemetry Metrics Explorer.

But First …

■ Specifying Devices

In maiLink Telemetry queries, you can use the device= syntax to specify the device context for plotting.

To display data for a specific device use:

deviceID="12345"

where “12345” is the unique Device Identifier for that device.

To display data for the “current” device use, and be able to use the very same query across other devices of the same type, use:

deviceID="$device.id"

■ Beware of Data Overload

The choices you make when querying data can impact performance. For instance, if you make the selected period too long, you may try and retrieve too much data and slow performance. If you make the sample rate too small, you may similarly try and retrieve too much data and slow performance.

■ Beware of Sampling Error

If you try and retreive too little data, you can run into other issues. For instance, if your selected period is too short, you might not observe important events within your data. Or, if you make the sample rate too long, then you might inadvertently wind up with aliasing, where you miss important details in the data.

Smoothing Data

You can easily smooth the data with functions like avg_over_time():

avg_over_time(OvenTemperature__C{device=$device.id})[5m]

In this instance the temperature is averaged over time, creating a rolling average of your temperature. This smooths the data to let you observe more general trends in the data. It gets rid of the “noise” by effectively applying a low-pass filter.

Query Examples

■ Example 1

Suppose you have a been sending a metric called “OvenTemperature__C” to maiLink Metrics. To plot the time series you can use the query:

OvenTemperature__C{device=$device.id}

This will display the raw data from the time series … for data over the selected period, sampled at the selected sample rate. Be aware that there is a balance you must make.

■ Example 2

Suppose you have a been sending a software heartbeat to maiLink SRM once per minute. In theory you should be receiving 60 heartbeats per hour. The raw matric you could plot is:

MySoftwareHeartbeat{device=$device.id}

This should plot as a horizontal line, which isn’t very interesting. Instead, perhaps you might plot the number of heartbeats received per hour. This would allow you to see if the rate ever changes, like taking the derivative of the data. You can do this by summing the data over a period of an hour:

sum_over_time(MySoftwareHeartbeat{device=$device.id})[1h]

Additional Resources

Further information is available at:

4 - Query Functions

This document provides comprehensive descriptions of all the built-in functions of PromQL.

maiLink Telemetry supports the PromQL query syntax, which uses the functions described below. This page is derived from PromQL documentation, which may be more recent here.

Some functions have default arguments, e.g. year(v=vector(time()) instant-vector). This means that there is one argument v which is an instant vector, which if not provided it will default to the value of the expression vector(time()).

Regular Functions

Regular functions operate on input vectors from the Telemetry database. See also Summary Functions and Trigonemetric Functions, below.

abs()

abs(v instant-vector) returns the input vector with all sample values converted to their absolute value.

absent()

absent(v instant-vector) returns an empty vector if the vector passed to it has any elements and a 1-element vector with the value 1 if the vector passed to it has no elements.

This is useful for alerting on when no time series exist for a given metric name and label combination.

absent(nonexistent{job="myjob"})
# => {job="myjob"}
absent(nonexistent{job="myjob",instance=~".*"})
# => {job="myjob"}
absent(sum(nonexistent{job="myjob"}))
\# => {}

In the first two examples, absent() tries to be smart about deriving labels of the 1-element output vector from the input vector.

absent_over_time()

absent_over_time(v range-vector) returns an empty vector if the range vector passed to it has any elements and a 1-element vector with the value 1 if the range vector passed to it has no elements.

This is useful for alerting on when no time series exist for a given metric name and label combination for a certain amount of time.

absent_over_time(nonexistent{job="myjob"}[1h])
# => {job="myjob"}
absent_over_time(nonexistent{job="myjob",instance=~".*"}[1h])
# => {job="myjob"}
absent_over_time(sum(nonexistent{job="myjob"})[1h:])
# => {}

In the first two examples, absent_over_time() tries to be smart about deriving labels of the 1-element output vector from the input vector.

ceil()

ceil(v instant-vector) rounds the sample values of all elements in v up to the nearest integer.

changes()

For each input time series, changes(v range-vector) returns the number of times its value has changed within the provided time range as an instant vector.

clamp()

clamp(v instant-vector, min scalar, max scalar) clamps the sample values of all elements in v to have a lower limit of min and an upper limit of max. Special cases are:

Returns an empty vector if min > max
Returns NaN if min or max is NaN

clamp_max()

clamp_max(v instant-vector, max scalar) clamps the sample values of all elements in v to have an upper limit of max.

clamp_min()

clamp_min(v instant-vector, min scalar) clamps the sample values of all elements in v to have a lower limit of min.

day_of_month()

day_of_month(v=vector(time()) instant-vector) returns the day of the month for each of the given times in UTC. Returned values are from 1 to 31.

day_of_week()

day_of_week(v=vector(time()) instant-vector) returns the day of the week for each of the given times in UTC. Returned values are from 0 to 6, where 0 means Sunday etc.

day_of_year()

day_of_year(v=vector(time()) instant-vector) returns the day of the year for each of the given times in UTC. Returned values are from 1 to 365 for non-leap years, and 1 to 366 in leap years.

days_in_month()

days_in_month(v=vector(time()) instant-vector) returns number of days in the month for each of the given times in UTC. Returned values are from 28 to 31.

delta()

delta(v range-vector) calculates the difference between the first and last value of each time series element in a range vector v, returning an instant vector with the given deltas and equivalent labels. The delta is extrapolated to cover the full time range as specified in the range vector selector, so that it is possible to get a non-integer result even if the sample values are all integers.

The following example expression returns the difference in CPU temperature between now and 2 hours ago:

delta(cpu_temp_celsius{host="zeus"}[2h])

delta should only be used with maiLink Telemetry metrics.

deriv()

deriv(v range-vector) calculates the per-second derivative of the time series in a range vector v, using simple linear regression.

deriv should only be used with maiLink Telemetry metrics.

exp()

exp(v instant-vector) calculates the exponential function for all elements in v. Special cases are:

Exp(+Inf) = +Inf
Exp(NaN) = NaN

floor()

floor(v instant-vector) rounds the sample values of all elements in v down to the nearest integer.

histogram_quantile()

histogram_quantile(φ scalar, b instant-vector) calculates the φ-quantile (0 ≤ φ ≤ 1) from the buckets b of a histogram. (See histograms and summaries for a detailed explanation of φ-quantiles and the usage of the histogram metric type in general.) The samples in b are the counts of observations in each bucket. Each sample must have a label le where the label value denotes the inclusive upper bound of the bucket. (Samples without such a label are silently ignored.) The histogram metric type automatically provides time series with the _bucket suffix and the appropriate labels.

Use the rate() function to specify the time window for the quantile calculation.

Example: A histogram metric is called http_request_duration_seconds. To calculate the 90th percentile of request durations over the last 10m, use the following expression:

histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[10m]))

The quantile is calculated for each label combination in http_request_duration_seconds. To aggregate, use the sum() aggregator around the rate() function. Since the le label is required by histogram_quantile(), it has to be included in the by clause. The following expression aggregates the 90th percentile by job:

histogram_quantile(0.9, sum by (job, le) (rate(http_request_duration_seconds_bucket[10m])))

To aggregate everything, specify only the le label:

histogram_quantile(0.9, sum by (le) (rate(http_request_duration_seconds_bucket[10m])))

The histogram_quantile() function interpolates quantile values by assuming a linear distribution within a bucket. The highest bucket must have an upper bound of +Inf. (Otherwise, NaN is returned.) If a quantile is located in the highest bucket, the upper bound of the second highest bucket is returned. A lower limit of the lowest bucket is assumed to be 0 if the upper bound of that bucket is greater than 0. In that case, the usual linear interpolation is applied within that bucket. Otherwise, the upper bound of the lowest bucket is returned for quantiles located in the lowest bucket.

If b has 0 observations, NaN is returned. If b contains fewer than two buckets, NaN is returned. For φ < 0, -Inf is returned. For φ > 1, +Inf is returned. For φ = NaN, NaN is returned.

holt_winters()

holt_winters(v range-vector, sf scalar, tf scalar) produces a smoothed value for time series based on the range in v. The lower the smoothing factor sf, the more importance is given to old data. The higher the trend factor tf, the more trends in the data is considered. Both sf and tf must be between 0 and 1.

holt_winters should only be used with maiLink Telemetry metrics.

hour()

hour(v=vector(time()) instant-vector) returns the hour of the day for each of the given times in UTC. Returned values are from 0 to 23.

idelta()

idelta(v range-vector) calculates the difference between the last two samples in the range vector v, returning an instant vector with the given deltas and equivalent labels.

idelta should only be used with maiLink Telemetry metrics.

increase()

increase(v range-vector) calculates the increase in the time series in the range vector. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for. The increase is extrapolated to cover the full time range as specified in the range vector selector, so that it is possible to get a non-integer result even if a counter increases only by integer increments.

The following example expression returns the number of HTTP requests as measured over the last 5 minutes, per time series in the range vector:

increase(http_requests_total{job="api-server"}[5m])

increase should only be used with counters. It is syntactic sugar for rate(v) multiplied by the number of seconds under the specified time range window, and should be used primarily for human readability. Use rate in recording rules so that increases are tracked consistently on a per-second basis.

irate()

irate(v range-vector) calculates the per-second instant rate of increase of the time series in the range vector. This is based on the last two data points. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for.

The following example expression returns the per-second rate of HTTP requests looking up to 5 minutes back for the two most recent data points, per time series in the range vector:

irate(http_requests_total{job="api-server"}[5m])

irate should only be used when graphing volatile, fast-moving counters. Use rate for alerts and slow-moving counters, as brief changes in the rate can reset the FOR clause and graphs consisting entirely of rare spikes are hard to read.

Note that when combining irate() with an aggregation operator (e.g. sum()) or a function aggregating over time (any function ending in _over_time), always take a irate() first, then aggregate. Otherwise irate() cannot detect counter resets when your target restarts.

label_join()

For each timeseries in v, label_join(v instant-vector, dst_label string, separator string, src_label_1 string, src_label_2 string, …) joins all the values of all the src_labels using separator and returns the timeseries with the label dst_label containing the joined value. There can be any number of src_labels in this function.

This example will return a vector with each time series having a foo label with the value a,b,c added to it:

label_join(up{job="api-server",src1="a",src2="b",src3="c"}, "foo", ",", "src1", "src2", "src3")

label_replace()

For each timeseries in v, label_replace(v instant-vector, dst_label string, replacement string, src_label string, regex string) matches the regular expression regex against the value of the label src_label. If it matches, the value of the label dst_label in the returned timeseries will be the expansion of replacement, together with the original labels in the input. Capturing groups in the regular expression can be referenced with $1, $2, etc. If the regular expression doesn’t match then the timeseries is returned unchanged.

This example will return timeseries with the values a:c at label service and a at label foo:

label_replace(up{job="api-server",service="a:c"}, "foo", "$1", "service", "(.*):.*")

ln()

ln(v instant-vector) calculates the natural logarithm for all elements in v. Special cases are:

ln(+Inf) = +Inf
ln(0) = -Inf
ln(x < 0) = NaN
ln(NaN) = NaN

log2()

log2(v instant-vector) calculates the binary logarithm for all elements in v. The special cases are equivalent to those in ln.

log10()

log10(v instant-vector) calculates the decimal logarithm for all elements in v. The special cases are equivalent to those in ln.

minute()

minute(v=vector(time()) instant-vector) returns the minute of the hour for each of the given times in UTC. Returned values are from 0 to 59.

month()

month(v=vector(time()) instant-vector) returns the month of the year for each of the given times in UTC. Returned values are from 1 to 12, where 1 means January etc.

predict_linear()

predict_linear(v range-vector, t scalar) predicts the value of time series t seconds from now, based on the range vector v, using simple linear regression.

predict_linear should only be used with maiLink Telemetry metrics.

rate()

rate(v range-vector) calculates the per-second average rate of increase of the time series in the range vector. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for. Also, the calculation extrapolates to the ends of the time range, allowing for missed scrapes or imperfect alignment of scrape cycles with the range’s time period.

The following example expression returns the per-second rate of HTTP requests as measured over the last 5 minutes, per time series in the range vector:

rate(http_requests_total{job="api-server"}[5m])<p>

rate should only be used with counters. It is best suited for alerting, and for graphing of slow-moving counters.

Note that when combining rate() with an aggregation operator (e.g. sum()) or a function aggregating over time (any function ending in _over_time), always take a rate() first, then aggregate. Otherwise rate() cannot detect counter resets when your target restarts.

resets()

For each input time series, resets(v range-vector) returns the number of counter resets within the provided time range as an instant vector. Any decrease in the value between two consecutive samples is interpreted as a counter reset.

resets should only be used with counters.

round()

round(v instant-vector, to_nearest=1 scalar) rounds the sample values of all elements in v to the nearest integer. Ties are resolved by rounding up. The optional to_nearest argument allows specifying the nearest multiple to which the sample values should be rounded. This multiple may also be a fraction.

scalar()

Given a single-element input vector, scalar(v instant-vector) returns the sample value of that single element as a scalar. If the input vector does not have exactly one element, scalar will return NaN.

sgn()

sgn(v instant-vector) returns a vector with all sample values converted to their sign, defined as this: 1 if v is positive, -1 if v is negative and 0 if v is equal to zero.

sort()

sort(v instant-vector) returns vector elements sorted by their sample values, in ascending order.

sort_desc()

Same as sort, but sorts in descending order.

sqrt()

sqrt(v instant-vector) calculates the square root of all elements in v.

time()

time() returns the number of seconds since January 1, 1970 UTC. Note that this does not actually return the current time, but the time at which the expression is to be evaluated.

timestamp()

timestamp(v instant-vector) returns the timestamp of each of the samples of the given vector as the number of seconds since January 1, 1970 UTC.

vector()

vector(s scalar) returns the scalar s as a vector with no labels.

year()

year(v=vector(time()) instant-vector) returns the year for each of the given times in UTC.

Summary Functions

Summary functions provide operations across a time series. The names are always appended with “_over_time”.

<aggregation>_over_time()

The following functions allow aggregating each series of a given range vector over time and return an instant vector with per-series aggregation results:

Function Description
avg_over_time(range-vector) The average value of all points in the specified interval.
min_over_time(range-vector) The minimum value of all points in the specified interval.
max_over_time(range-vector) The maximum value of all points in the specified interval.
sum_over_time(range-vector) The sum of all values in the specified interval.
count_over_time(range-vector) The count value of all values in the specified interval.
quantile_over_time(range-vector) The φ-quantile (0 ≤ φ ≤ 1) of the values in the specified interval.
stddev_over_time(range-vector) The population standard deviation of the values in the specified interval.
stdvar_over_time(range-vector) The population standard variance of the values in the specified interval.
last_over_time(range-vector) The most recent point value in specified interval
present_over_time(range-vector) The value 1 for any series in the specified interval.

Note that all values in the specified interval have the same weight in the aggregation even if the values are not equally spaced throughout the interval.

Trigonometric Functions

The trigonometric functions work in radians:

Function Description
acos(v instant-vector) Calculates the arccosine of all elements in v (special cases).
acosh(v instant-vector) Calculates the inverse hyperbolic cosine of all elements in v (special cases).
asin(v instant-vector) Calculates the arcsine of all elements in v (special cases).
asinh(v instant-vector) Calculates the inverse hyperbolic sine of all elements in v (special cases).
atan(v instant-vector) Calculates the arctangent of all elements in v (special cases).
atanh(v instant-vector) Calculates the inverse hyperbolic tangent of all elements in v (special cases).
cos(v instant-vector) Calculates the cosine of all elements in v (special cases).
cosh(v instant-vector) Calculates the hyperbolic cosine of all elements in v (special cases).
sin(v instant-vector) Calculates the sine of all elements in v (special cases).
sinh(v instant-vector) Calculates the hyperbolic sine of all elements in v (special cases).
tan(v instant-vector) Calculates the tangent of all elements in v (special cases).
tanh(v instant-vector) Calculates the hyperbolic tangent of all elements in v (special cases).

Angle Conversion Functions

The following additional functions are useful when working with Trigonometric Functions:

Function Description
deg(v instant-vector) Converts radians to degrees for all elements in v.
pi() Returns pi.
rad(v instant-vector) Converts degrees to radians for all elements in v.

Additional Resources

Further information is available at:

5 - Query Operators

Learn about the operators available in PromQL.

maiLink Telemetry supports the PromQL query syntax, which uses the operators described below. This page is derived from PromQL documentation, which may be more recent here.

Operators

There are several kinds of operators available in PromQL that support complex queries.

Binary operators

PromQL supports basic logical and arithmetic operators. For operations between two instant vectors, the matching behavior can be modified.

■ Arithmetic binary operators

The following binary arithmetic operators exist in Prometheus:

+   addition
-   subtraction
*   multiplication
/   division
%   modulo
^   power/exponentiation

Binary arithmetic operators are defined between scalar/scalar, vector/scalar, and vector/vector value pairs.

Between two scalars, the behavior is obvious: they evaluate to another scalar that is the result of the operator applied to both scalar operands.

Between an instant vector and a scalar, the operator is applied to the value of every data sample in the vector. E.g. if a time series instant vector is multiplied by 2, the result is another vector in which every sample value of the original vector is multiplied by 2. The metric name is dropped.

Between two instant vectors, a binary arithmetic operator is applied to each entry in the left-hand side vector and its matching element in the right-hand vector. The result is propagated into the result vector with the grouping labels becoming the output label set. The metric name is dropped. Entries for which no matching entry in the right-hand vector can be found are not part of the result.

■ Trigonometric binary operators

The following trigonometric binary operators, which work in radians, exist in Prometheus:

atan2   returns the arc tangent of y/x

Trigonometric operators allow trigonometric functions to be executed on two vectors using vector matching, which isn’t available with normal functions. They act in the same manner as arithmetic operators.

■ Comparison binary operators

The following binary comparison operators exist in Prometheus:

==   equal
!=   not-equal
>    greater-than
<    less-than
>=   greater-or-equal
<=   less-or-equal

Comparison operators are defined between scalar/scalar, vector/scalar, and vector/vector value pairs. By default they filter. Their behavior can be modified by providing bool after the operator, which will return 0 or 1 for the value rather than filtering.

Between two scalars, the bool modifier must be provided and these operators result in another scalar that is either 0 (false) or 1 (true), depending on the comparison result.

Between an instant vector and a scalar, these operators are applied to the value of every data sample in the vector, and vector elements between which the comparison result is false get dropped from the result vector. If the bool modifier is provided, vector elements that would be dropped instead have the value 0 and vector elements that would be kept have the value 1. The metric name is dropped if the bool modifier is provided.

Between two instant vectors, these operators behave as a filter by default, applied to matching entries. Vector elements for which the expression is not true or which do not find a match on the other side of the expression get dropped from the result, while the others are propagated into a result vector with the grouping labels becoming the output label set. If the bool modifier is provided, vector elements that would have been dropped instead have the value 0 and vector elements that would be kept have the value 1, with the grouping labels again becoming the output label set. The metric name is dropped if the bool modifier is provided.

■ Logical/set binary operators

These logical/set binary operators are only defined between instant vectors:

and     intersection
or       union
unless   complement

vector1 and vector2 results in a vector consisting of the elements of vector1 for which there are elements in vector2 with exactly matching label sets. Other elements are dropped. The metric name and values are carried over from the left-hand side vector.

vector1 or vector2 results in a vector that contains all original elements (label sets + values) of vector1 and additionally all elements of vector2 which do not have matching label sets in vector1.

vector1 unless vector2 results in a vector consisting of the elements of vector1 for which there are no elements in vector2 with exactly matching label sets. All matching elements in both vectors are dropped.

■ Binary operator precedence

The following list shows the precedence of binary operators in Prometheus, from highest to lowest.

  1. ^
  2. *, /, %, atan2
  3. +, -
  4. ==, !=, <=, <, >=, >
  5. and, unless
  6. or

Operators on the same precedence level are left-associative. For example, 2 * 3 % 2 is equivalent to (2 * 3) % 2. However ^ is right associative, so 2 ^ 3 ^ 2 is equivalent to 2 ^ (3 ^ 2).#### Vector matching operators Operations between vectors attempt to find a matching element in the right-hand side vector for each entry in the left-hand side. There are two basic types of matching behavior: One-to-one and many-to-one/one-to-many.

Vector matching operators

Operations between vectors attempt to find a matching element in the right-hand side vector for each entry in the left-hand side. There are two basic types of matching behavior: One-to-one and many-to-one/one-to-many.

■ One-to-one vector matches

One-to-one finds a unique pair of entries from each side of the operation. In the default case, that is an operation following the format vector1 <operator> vector2. Two entries match if they have the exact same set of labels and corresponding values. The ignoring keyword allows ignoring certain labels when matching, while the on keyword allows reducing the set of considered labels to a provided list:

<vector expr> <bin-op> ignoring(<label list>) <vector expr>
<vector expr> <bin-op> on(<label list>) <vector expr>

Example input:

method_code:http_errors:rate5m{method="get", code="500"} 24
method_code:http_errors:rate5m{method="get", code="404"} 30
method_code:http_errors:rate5m{method="put", code="501"} 3
method_code:http_errors:rate5m{method="post", code="500"} 6
method_code:http_errors:rate5m{method="post", code="404"} 21
method:http_requests:rate5m{method="get"} 600
method:http_requests:rate5m{method="del"} 34
method:http_requests:rate5m{method="post"} 120

Example query:

method_code:http_errors:rate5m{code="500"} / ignoring(code) method:http_requests:rate5m

This returns a result vector containing the fraction of HTTP requests with status code of 500 for each method, as measured over the last 5 minutes. Without ignoring(code) there would have been no match as the metrics do not share the same set of labels. The entries with methods put and del have no match and will not show up in the result:

{method="get"}  0.04            //  24 / 600
{method="post"} 0.05            //   6 / 120
■ Many-to-one and one-to-many vector matches

Many-to-one and one-to-many matchings refer to the case where each vector element on the “one”-side can match with multiple elements on the “many”-side. This has to be explicitly requested using the group_left or group_right modifier, where left/right determines which vector has the higher cardinality.

<vector expr> <bin-op> ignoring(<label list>) group_left(<label list>) <vector expr><p>
<vector expr> <bin-op> ignoring(<label list>) group_right(<label list>) <vector expr><p>
<vector expr> <bin-op> on(<label list>) group_left(<label list>) <vector expr><p>
<vector expr> <bin-op> on(<label list>) group_right(<label list>) <vector expr><p>

The label list provided with the group modifier contains additional labels from the “one”-side to be included in the result metrics. For on a label can only appear in one of the lists. Every time series of the result vector must be uniquely identifiable.

Note: Grouping modifiers can only be used for comparison and arithmetic. Operations as and, unless and or operations match with all possible entries in the right vector by default.

Example query:

method_code:http_errors:rate5m / ignoring(code) group_left method:http_requests:rate5m In this case the left vector contains more than one entry per method label value. Thus, we indicate this using group_left. The elements from the right side are now matched with multiple elements with the same method label on the left:

{method="get", code="500"}  0.04            //  24 / 600
{method="get", code="404"}  0.05            //  30 / 600
{method="post", code="500"} 0.05            //   6 / 120
{method="post", code="404"} 0.175           //  21 / 120

Many-to-one and one-to-many matching are advanced use cases that should be carefully considered. Often a proper use of ignoring() provides the desired outcome.

Aggregation operators

Prometheus supports the following built-in aggregation operators that can be used to aggregate the elements of a single instant vector, resulting in a new vector of fewer elements with aggregated values:

Operator Description
sum Calculate sum over dimensions
min Select minimum over dimensions
max Select maximum over dimensions
avg Calculate the average over dimensions
group All values in the resulting vector are 1
stddev Calculate population standard deviation over dimensions
stdvar Calculate population standard variance over dimensions
count Count number of elements in the vector
count_values Count number of elements with the same value. Outputs one time series per unique sample value. Each series has an additional label. The name of that label is given by the aggregation parameter, and the label value is the unique sample value. The value of each time series is the number of times that sample value was present.
bottomk Smallest k elements by sample value. Different from other aggregators in that a subset of the input samples, including the original labels, are returned in the result vector. by and without are only used to bucket the input vector.
topk Largest k elements by sample value. Different from other aggregators in that a subset of the input samples, including the original labels, are returned in the result vector. by and without are only used to bucket the input vector.
quantile Calculates the φ-quantile, the value that ranks at number φ*N among the N metric values of the dimensions aggregated over. φ is provided as the aggregation parameter. For example, quantile(0.5, …) calculates the median, quantile(0.95, …) the 95th percentile. For φ = NaN, NaN is returned. For φ < 0, -Inf is returned. For φ > 1, +Inf is returned.

These operators can either be used to aggregate over all label dimensions or preserve distinct dimensions by including a without or by clause. These clauses may be used before or after the expression.

<aggr-op> [without|by (<label list>)] ([parameter,] <vector expression>)

or

<aggr-op>([parameter,] <vector expression>) [without|by (<label list>)]

label list is a list of unquoted labels that may include a trailing comma, i.e. both (label1, label2) and (label1, label2,) are valid syntax.

without removes the listed labels from the result vector, while all other labels are preserved in the output. by does the opposite and drops labels that are not listed in the by clause, even if their label values are identical between all elements of the vector.

parameter is only required for count_values, quantile, topk and bottomk.

Example:

If the metric http_requests_total had time series that fan out by application, instance, and group labels, we could calculate the total number of seen HTTP requests per application and group over all instances via:

sum without (instance) (http_requests_total)

Which is equivalent to:

sum by (application, group) (http_requests_total)

If we are just interested in the total of HTTP requests we have seen in all applications, we could simply write:

sum(http_requests_total)

To count the number of binaries running each build version we could write:

count_values("version", build_version)

To get the 5 largest HTTP requests counts across all instances we could write:

topk(5, http_requests_total)

Additional Resources

Further information is available at: