Nomad Autoscaler Telemetry

The Nomad Autoscaler agent collects various runtime metrics about the performance of different libraries and subsystems. These metrics are aggregated on a ten-second interval and retained for one minute. To configure the telemetry output, see the agent configuration.

This data can be accessed through the /v1/metrics HTTP endpoint, by sending a signal to the Nomad Autoscaler process, or through a number of integrations.
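
As a minimal sketch of the HTTP route, the Python snippet below requests /v1/metrics and flattens the returned gauges into a dictionary. The agent address http://127.0.0.1:8080 and the JSON layout (a top-level `Gauges` list whose entries carry `Name` and `Value` keys) are assumptions that this page does not confirm, and the function names are hypothetical. Check both against your own agent before relying on them.

```python
import json
from urllib.request import urlopen

# Assumed agent address; adjust it to match your agent's HTTP configuration.
AUTOSCALER_ADDR = "http://127.0.0.1:8080"


def gauges_from_summary(summary):
    """Map gauge name to value, assuming a 'Gauges' list of Name/Value objects."""
    return {g["Name"]: g["Value"] for g in summary.get("Gauges") or []}


def fetch_metrics(addr=AUTOSCALER_ADDR, timeout=5.0):
    """GET /v1/metrics from the agent and decode the JSON body."""
    with urlopen(f"{addr}/v1/metrics", timeout=timeout) as resp:
        return json.load(resp)
```

Against a running agent, `gauges_from_summary(fetch_metrics())` would return the current gauge values keyed by metric name, provided the assumed response layout holds.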

To view this data by signal, send the Nomad Autoscaler process USR1 on Unix or BREAK on Windows. Once the Nomad Autoscaler receives the signal, it dumps the current telemetry information to the agent's stderr.
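
A minimal Python sketch of the Unix case follows. The function name `request_telemetry_dump` is hypothetical, and you must supply the agent's process ID yourself. The dump appears on the agent's stderr, not in the output of the calling script.

```python
import os
import signal


def request_telemetry_dump(pid):
    """Send SIGUSR1 to the process with the given PID (Unix only)."""
    os.kill(pid, signal.SIGUSR1)
```

For example, `request_telemetry_dump(12345)` would signal a process whose PID is 12345; that PID is a placeholder for illustration.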

This telemetry information can be used for debugging or for getting a better view of what the Nomad Autoscaler is doing.

Below is sample output of a telemetry dump:

[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.sys_bytes': 74793216.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.malloc_count': 219856.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.free_count': 183613.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_pause_ns': 348822.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_runs': 5.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.num_goroutines': 12.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.policy.total_num': 0.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.alloc_bytes': 4316568.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.heap_objects': 36243.000
[2020-08-25 10:01:20 +0100 BST][S] 'nomad-autoscaler.runtime.gc_pause_ns': Count: 5 Min: 38083.000 Mean: 69764.400 Max: 122291.000 Stddev: 31487.808 Sum: 348822.000 LastUpdated: 2020-08-25 10:01:26.574809 +0100 BST m=+1.241576679
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.alloc_bytes': 4370504.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.malloc_count': 220853.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.free_count': 183613.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.policy.total_num': 0.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.num_goroutines': 12.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_pause_ns': 348822.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_runs': 5.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.sys_bytes': 74793216.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.heap_objects': 37240.000
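
If you capture a dump from stderr, the gauge lines are regular enough to parse with a short script. The sketch below is one possible approach, not an official tool: it assumes the line layout matches the sample above exactly, handles only gauge (`[G]`) lines, and uses the hypothetical helper name `parse_gauges`.

```python
import re

# Matches gauge lines such as:
# [2020-08-25 10:01:20 +0100 BST][G] 'name': 74793216.000
GAUGE_LINE = re.compile(
    r"^\[(?P<interval>[^\]]+)\]\[G\] '(?P<name>[^']+)': (?P<value>-?\d+(?:\.\d+)?)$"
)


def parse_gauges(dump):
    """Return {interval: {metric_name: value}} for every gauge line in a dump."""
    intervals = {}
    for line in dump.splitlines():
        match = GAUGE_LINE.match(line.strip())
        if match is None:
            continue  # Skip sample ([S]) lines and anything else that is not a gauge.
        gauges = intervals.setdefault(match["interval"], {})
        gauges[match["name"]] = float(match["value"])
    return intervals


# Three lines copied from the sample dump: two gauges and one sample.
sample = """\
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.num_goroutines': 12.000
[2020-08-25 10:01:20 +0100 BST][S] 'nomad-autoscaler.runtime.gc_pause_ns': Count: 5 Min: 38083.000 Mean: 69764.400 Max: 122291.000 Stddev: 31487.808 Sum: 348822.000 LastUpdated: 2020-08-25 10:01:26.574809 +0100 BST m=+1.241576679
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.heap_objects': 37240.000
"""

for interval, gauges in parse_gauges(sample).items():
    print(interval, gauges)
```

Grouping by the bracketed timestamp keeps each ten-second aggregation interval separate, so values from consecutive intervals can be compared.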

Runtime Metrics

Runtime metrics help you understand the memory usage and load pressure of the Nomad Autoscaler agent.

| Metric | Description | Type |
| --- | --- | --- |
| nomad-autoscaler.runtime.num_goroutines | Number of running goroutines | Gauge |
| nomad-autoscaler.runtime.alloc_bytes | The number of allocated heap bytes | Gauge |
| nomad-autoscaler.runtime.sys_bytes | The total bytes of memory obtained from the OS | Gauge |
| nomad-autoscaler.runtime.malloc_count | Cumulative count of heap objects allocated | Gauge |
| nomad-autoscaler.runtime.free_count | Cumulative count of heap objects freed | Gauge |
| nomad-autoscaler.runtime.heap_objects | Number of allocated heap objects | Gauge |
| nomad-autoscaler.runtime.total_gc_pause_ns | Cumulative nanoseconds in GC stop-the-world pauses | Gauge |
| nomad-autoscaler.runtime.total_gc_runs | Number of completed GC cycles | Gauge |
| nomad-autoscaler.runtime.gc_pause_ns | Number of nanoseconds to complete the last GC cycle | Timer |
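
Several of these gauges are related, and the relationships can be checked against the sample dump. The sketch below uses hypothetical helper names and values copied from the 10:01:20 interval: the difference between malloc_count and free_count matches heap_objects, and total_gc_pause_ns divided by total_gc_runs matches the Mean reported for gc_pause_ns.

```python
def live_heap_objects(malloc_count, free_count):
    """Objects allocated minus objects freed, i.e. objects still on the heap."""
    return malloc_count - free_count


def mean_gc_pause_ns(total_gc_pause_ns, total_gc_runs):
    """Average stop-the-world pause per completed GC cycle, in nanoseconds."""
    if total_gc_runs == 0:
        return 0.0  # No GC cycle has completed yet.
    return total_gc_pause_ns / total_gc_runs


# Values taken from the 10:01:20 interval of the sample dump.
print(live_heap_objects(219856, 183613))  # 36243, matching heap_objects
print(mean_gc_pause_ns(348822, 5))        # 69764.4, matching the gc_pause_ns Mean
```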

Policy Metrics

Policy metrics provide insights into the performance of the Nomad Autoscaler's policy handling.

| Metric | Description | Type | Labels |
| --- | --- | --- | --- |
| nomad-autoscaler.policy.total_num | The number of policies currently held within the autoscaler | Gauge | |
| nomad-autoscaler.policy.source.error_count | Tracks the number of errors generated by the policy sources | Counter | policy_source |

Scaling Metrics

Scaling metrics provide insight into the performance of scaling actions as well as overall success and failure counters.

| Metric | Description | Type | Labels |
| --- | --- | --- | --- |
| nomad-autoscaler.scale.evaluate_ms | The time taken to evaluate the checks within a single policy | Timer | policy_id, target_name |
| nomad-autoscaler.scale.invoke_ms | The time taken to invoke scaling based on the scaling evaluations | Timer | policy_id, target_name |
| nomad-autoscaler.scale.invoke.success_count | Tracks the number of successful scaling actions triggered | Counter | |
| nomad-autoscaler.scale.invoke.error_count | Tracks the number of unsuccessful scaling actions triggered | Counter | |
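
One way to use the two counters together is to derive a failure ratio for scaling actions. The helper below is a hypothetical sketch, and the input values in the usage line are made up for illustration rather than taken from a real agent.

```python
def scale_error_ratio(success_count, error_count):
    """Fraction of triggered scaling actions that were unsuccessful."""
    total = success_count + error_count
    if total == 0:
        return 0.0  # No scaling actions have been triggered yet.
    return error_count / total


# Illustrative values only: 18 successful and 2 unsuccessful scaling actions.
print(scale_error_ratio(18, 2))  # 0.1
```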

Plugin Metrics

Plugin metrics provide insight into the performance of Nomad Autoscaler plugins and help identify potential bottlenecks or latency issues.

| Metric | Description | Type | Labels |
| --- | --- | --- | --- |
| nomad-autoscaler.plugin.manager.access_ms | The time taken to dispense a plugin | Timer | |
| nomad-autoscaler.target.status.invoke_ms | The time taken to perform the target plugin status call | Timer | policy_id, plugin_name |
| nomad-autoscaler.target.scale.invoke_ms | The time taken to perform the target plugin scale call | Timer | policy_id, plugin_name |
| nomad-autoscaler.apm.query.invoke_ms | The time taken to perform the APM plugin query call | Timer | policy_id, plugin_name |
| nomad-autoscaler.strategy.run.invoke_ms | The time taken to perform the strategy plugin run call | Timer | policy_id, plugin_name |