Netdata

01 May, 2024

What is Netdata?

Netdata collects metrics per second and presents them in beautiful low-latency dashboards. It is designed to run on all of your physical and virtual servers, cloud deployments, Kubernetes clusters, and edge/IoT devices, to monitor your systems, containers, and applications.

It scales nicely from just a single server to thousands of servers, even in complex multi/mixed/hybrid cloud environments, and given enough disk space it can keep your metrics for years.

Netdata Features

Collects metrics from 800+ integrations: Operating system metrics, container metrics, virtual machines, hardware sensors, application metrics, OpenMetrics exporters, StatsD, and logs.
Real-Time, Low-Latency, High-Resolution: All metrics are collected per second and are on the dashboard immediately after data collection. Netdata is designed to be fast.
Unsupervised Anomaly Detection: Trains multiple Machine-Learning (ML) models for each metric collected and detects anomalies based on the past behavior of each metric individually.
Powerful Visualization: Clear and precise visualization that allows you to quickly understand any dataset, but also to filter, slice and dice the data directly on the dashboard, without the need to learn any query language.
Out-of-the-box Alerts: Comes with hundreds of alerts out of the box to detect common issues and pitfalls, revealing issues that can easily go unnoticed. It supports several notification methods to let you know when your attention is needed.
systemd Journal Logs Explorer: Provides a systemd journal logs explorer, to view, filter and analyze system and application logs by directly accessing systemd journal files on individual hosts and infrastructure-wide logs centralization servers.
Low Maintenance: Fully automated in every aspect: automated dashboards, out-of-the-box alerts, auto-detection and auto-discovery of metrics, zero-touch machine-learning, easy scalability and high availability, and CI/CD friendly.
Open and Extensible: Netdata is a modular platform that can be extended in all possible ways and it also integrates nicely with other monitoring solutions.

What’s New and Coming?

What	Description	When	Status
WebRTC	Browser to Agent communication via WebRTC.	later	POC
Advanced Troubleshooting	Expanded view of dashboard charts integrating Metrics Correlations, Anomaly Advisor, and many more.	later	interrupted
Easy Custom Dashboards	Drag and drop charts to create custom dashboards on the fly, while troubleshooting!	soon	planned
More Customizability	Set default settings for all charts and views!	soon	planned
UCUM Units	Migrate all metrics to the Unified Code for Units of Measure.	soon	in progress
Click to Activate	Configure Alerts and Data Collectors from the UI!	soon	in progress
Netdata Cloud On-Prem	Netdata Cloud available for On-Prem installation!	available	fill this form
`systemd` journal	View the `systemd` journal logs of your systems on the dashboard.	Oct 2023	v1.43
Integrations	Netdata Integrations Marketplace!	Aug 2023	v1.42
New Agent UI	Now Netdata Cloud and Netdata Agent share the same dashboard!	Jul 2023	v1.41
Summary Dashboards	High level tiles everywhere!	Jun 2023	v1.40
Machine Learning	Multiple ML models per metric.	Jun 2023	v1.40
SSL	Netdata Agent gets a new SSL layer.	Jun 2023	v1.40
New Cloud UI	Filter, slice and dice any dataset from the UI! ML-first!	May 2023	v1.39
Microsoft Windows	Monitor Windows hosts and apps!	May 2023	v1.39
Virtual Nodes	Go collectors can now be assigned to virtual nodes!	May 2023	v1.39
DBENGINE v2	Faster, more reliable, far more scalable!	Feb 2023	v1.38
Netdata Functions	Netdata beyond metrics! Monitoring anything!	Feb 2023	v1.38
Events Feed	Live feed of events about topology changes and alerts.	Feb 2023	v1.38
Role-Based Access Control	More roles, offering finer control over access to infrastructure.	Feb 2023	v1.38
Infinite Scalability	Streaming compression. Replication. Active-active clustering.	Nov 2022	v1.37
Grafana Plugin	Netdata Cloud as a data source for Grafana.	Nov 2022	v1.37
PostgreSQL	Completely rewritten, to reveal all the info, even at the table level.	Nov 2022	v1.37
Metrics Correlations	Advanced algorithms to find the needle in the haystack.	Aug 2022	v1.36
Database Tiering	Netdata gets unlimited retention!	Aug 2022	v1.36
Kubernetes	Monitor your Kubernetes workloads.	Aug 2022	v1.36
Machine Learning	Anomaly Rate information on every chart.	Aug 2022	v1.36
Machine Learning	Anomaly Advisor! Bottom-up unsupervised anomaly detection.	Jun 2022	v1.35
Machine Learning	Metrics Correlation on the Agent.	Jun 2022	v1.35

Getting Started

1. Install Netdata everywhere :v:

Netdata can be installed on all Linux, macOS, and FreeBSD systems. We provide binary packages for the most popular operating systems and package managers.

Install on Ubuntu, Debian, CentOS, Fedora, SUSE, Red Hat, Arch, Alpine, Gentoo, even BusyBox.
Install with Docker. Netdata is a Verified Publisher on DockerHub and our users enjoy free unlimited DockerHub pulls :heart_eyes:.
Install on macOS :metal:.
Install on FreeBSD and pfSense.
Install from source
For Kubernetes deployments check here.

Check also the Netdata Deployment Strategies to decide how to deploy it in your infrastructure.

By default, a local dashboard is available immediately after installation. Netdata starts a web server for its dashboard at port 19999. Open your web browser of choice and navigate to http://NODE:19999, replacing NODE with the IP address or hostname of your Agent. If Netdata is installed on localhost, you can access it at http://localhost:19999.
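
As a minimal sketch of the step above, the following Python builds the dashboard URL for a node and checks whether anything accepts TCP connections on the port. Only port 19999 and the `http://NODE:19999` form come from this README; the helper names `dashboard_url` and `is_reachable` are hypothetical, and a successful TCP connection only suggests, but does not prove, that a Netdata Agent is the service listening.

```python
import socket
from urllib.parse import urlunsplit

DEFAULT_PORT = 19999  # dashboard port named in the text above


def dashboard_url(node: str, port: int = DEFAULT_PORT) -> str:
    """Return http://NODE:PORT/ for a hostname or IP address (hypothetical helper)."""
    # Bare IPv6 literals need brackets so the port separator stays unambiguous.
    host = f"[{node}]" if ":" in node and not node.startswith("[") else node
    return urlunsplit(("http", f"{host}:{port}", "/", "", ""))


def is_reachable(node: str, port: int = DEFAULT_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to node:port succeeds within the timeout."""
    try:
        with socket.create_connection((node, port), timeout=timeout):
            return True
    except OSError:
        return False


print(dashboard_url("localhost"))
```

Calling `is_reachable("localhost")` before opening the browser is a quick way to tell a firewall or service problem apart from a browser problem.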

2. Configure Collectors :boom:

Netdata auto-detects and auto-discovers most operating system data sources and applications. However, many data sources require some manual configuration, usually to allow Netdata to get access to the metrics.

For a detailed list of the 800+ collectors available, check this guide.
To monitor Windows servers and applications use this guide.
To monitor SNMP devices check this guide.
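
For custom application metrics, StatsD is one of the collection paths this README mentions. The sketch below formats a metric in the common StatsD line format (`name:value|type`) and sends it over UDP. It assumes a StatsD listener on `127.0.0.1:8125`, which is the conventional StatsD port and is not stated in this README; the metric name `myapp.queue_depth` and both helper names are made up for illustration.

```python
import socket

# Widely used StatsD type codes: counter, gauge, timer, histogram, set.
STATSD_TYPES = {"c", "g", "ms", "h", "s"}


def statsd_line(name: str, value: float, metric_type: str = "g") -> str:
    """Format one StatsD datagram as <name>:<value>|<type>."""
    if metric_type not in STATSD_TYPES:
        raise ValueError(f"unsupported StatsD type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one datagram over UDP (fire and forget; host and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


print(statsd_line("myapp.queue_depth", 42))
# To actually emit the metric: send_metric(statsd_line("myapp.queue_depth", 42))
```

Because UDP is connectionless, `send_metric` does not confirm delivery, so verify on the dashboard that the metric actually appears.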

3. Configure Alert Notifications :bell:

Netdata comes with hundreds of pre-configured alerts that automatically check your metrics as soon as they start being collected.

Netdata can dispatch alert notifications to multiple third party systems, including: email, Alerta, AWS SNS, Discord, Dynatrace, flock, gotify, IRC, Matrix, MessageBird, Microsoft Teams, ntfy, OPSgenie, PagerDuty, Prowl, PushBullet, PushOver, RocketChat, Slack, SMS tools, Syslog, Telegram, Twilio.

By default, Netdata will send e-mail notifications, if there is a configured MTA on the system.

4. Configure Netdata Parents :family:

Optionally, configure one or more Netdata Parents. A Netdata Parent is a Netdata Agent that has been configured to accept streaming connections from other Netdata agents.

Netdata Parents provide:

Infrastructure level dashboards, at http://parent.server.ip:19999/.

Each Netdata Agent has an API listening at TCP port 19999 of each server. When you hit that port with a web browser (e.g. http://server.ip:19999/), the Netdata Agent UI is presented. When the Netdata Agent is also a Parent, the UI of the Parent includes data for all nodes that stream metrics to that Parent.

Increased retention for all metrics of all your nodes.

Each Netdata Agent maintains its own database of metrics. But Parents can be given additional resources to maintain a much longer database than individual Netdata Agents.

Central configuration of alerts and dispatch of notifications.

Using Netdata Parents, all the alert notification integrations can be configured only once, at the Parent, and they can be disabled at the Netdata Agents.
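
To get a rough feel for how much disk a Parent might need for longer retention, here is a back-of-the-envelope sketch. It assumes per-second collection (as stated in this README) and uses 1 byte per sample as an upper bound, based on the README's dbengine figure of less than 1 byte per sample on disk. The node and metric counts are hypothetical, and the estimate ignores database tiering, so treat the result as an order of magnitude only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # one sample per metric per second


def parent_disk_bytes(children: int, metrics_per_child: int,
                      retention_days: int, bytes_per_sample: float = 1.0) -> float:
    """Estimate bytes needed to keep every sample of every child at full resolution."""
    samples_per_day = children * metrics_per_child * SECONDS_PER_DAY
    return samples_per_day * retention_days * bytes_per_sample


def human_readable(num_bytes: float) -> str:
    """Format a byte count using decimal units (1 KB = 1000 B)."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if num_bytes < 1000 or unit == "TB":
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1000


# Hypothetical fleet: 50 children, 2000 metrics each, 30 days of retention.
print(human_readable(parent_disk_bytes(50, 2000, 30)))
```

For the hypothetical fleet above, the arithmetic is 50 × 2000 × 86,400 × 30 = 259.2 billion samples, or about 259.2 GB at 1 byte per sample.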

You can also use Netdata Parents to:

Offload your production systems (the parents run ML, alerts, queries, etc. for all their children)
Secure your production systems (the parents accept user connections, for all their children)

5. Connect to Netdata Cloud :cloud:

Optionally, sign in to Netdata Cloud and claim your Netdata Agents and Parents. If you connect your Netdata Parents, there is no need to connect your Netdata Agents. They will be connected via the Parents.

When your Netdata nodes are connected to Netdata Cloud, you can (on top of the above):

Access your Netdata agents from anywhere
Access sensitive Netdata agent features (like “Netdata Functions”: processes, systemd-journal)
Organize your infra in spaces and rooms
Create, manage, and share custom dashboards
Invite your team and assign roles to them (Role Based Access Control - RBAC)
Get infinite horizontal scalability (multiple independent Netdata Agents are viewed as one infra)
Configure alerts from the UI (coming soon)
Configure data collection from the UI (coming soon)
Netdata Mobile App notifications (coming soon)

:love_you_gesture: Netdata Cloud does not prevent you from using your Netdata Agents and Parents directly, and vice versa.

:ok_hand: Your metrics are still stored in your network when you connect your Netdata Agents and Parents to Netdata Cloud.

How it works

Netdata is built around a modular metrics processing pipeline.

Each Netdata Agent can perform the following functions:

COLLECT metrics from their sources: Uses internal and external plugins to collect data from their sources.

Netdata auto-detects and collects almost everything from the operating system, including CPU, Interrupts, Memory, Disks, Mount Points, Filesystems, Network Stack, Network Interfaces, Containers, VMs, Processes, systemd units, Linux Performance Metrics, Linux eBPF, Hardware Sensors, IPMI, and more.

It collects metrics from applications: PostgreSQL, MySQL/MariaDB, Redis, MongoDB, Nginx, Apache, and hundreds more.

Netdata also collects your custom application metrics by scraping OpenMetrics exporters, or via StatsD.

It can convert web server log files to metrics and apply ML and alerts to them, in real-time.

It also supports synthetic tests / white box tests, so you can ping servers, check API responses, or even check filesystem files and directories to generate metrics, train ML and run alerts and notifications on their status.

STORE metrics to a database: Uses database engine plugins to store the collected data, in memory and/or on disk. We have developed our own dbengine for storing the data in a very efficient manner, allowing Netdata to have less than 1 byte per sample on disk and amazingly fast queries.
LEARN the behavior of metrics (ML): Trains multiple Machine-Learning (ML) models per metric to learn the behavior of each metric individually. Netdata uses the k-means algorithm and creates by default a model per metric per hour, based on the values collected for that metric over the last 6 hours. The trained models are persisted to disk.
DETECT anomalies in metrics (ML): Uses the trained machine learning (ML) models to detect outliers and mark collected samples as anomalies. Netdata stores anomaly information together with each sample and also streams it to Netdata Parents so that the anomaly is also available at query time for the whole retention of each metric.
CHECK metrics and trigger alert notifications: Uses its configured alerts (you can configure your own) to check the metrics for common issues and uses notifications plugins to send alert notifications.
STREAM metrics to other Netdata Agents: Push metrics in real-time to Netdata Parents.
ARCHIVE metrics to 3rd party databases: Export metrics to industry standard time-series databases, like Prometheus, InfluxDB, OpenTSDB, Graphite, etc.
QUERY metrics and present dashboards: Provide an API to query the data and present interactive dashboards to users.
SCORE metrics to reveal similarities and patterns: Score the metrics according to the given criteria, to find the needle in the haystack.
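
The LEARN and DETECT steps above can be illustrated with a toy sketch. This is not Netdata's implementation: it runs plain k-means on raw scalar values with two clusters, uses the largest training distance as the outlier threshold, and flags a sample only when every trained model agrees. All of those choices, and the function names, are assumptions made for illustration. The README itself only states that k-means is used and that multiple models are trained per metric.

```python
from statistics import mean


def kmeans_1d(values, k=2, iterations=20):
    """Plain Lloyd's k-means on scalar values; returns the sorted centroids."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def distance(value, centroids):
    """Distance from a value to its nearest centroid."""
    return min(abs(value - c) for c in centroids)


def train_model(window, k=2):
    """Train one model on a window of samples: (centroids, outlier threshold)."""
    centroids = kmeans_1d(window, k)
    threshold = max(distance(v, centroids) for v in window)  # assumed rule
    return centroids, threshold


def is_anomalous(value, models):
    """Flag a sample only if all models consider it an outlier (assumed rule)."""
    return all(distance(value, c) > t for c, t in models)


# Two hypothetical training windows of a metric that alternates between two levels.
models = [
    train_model([10, 11, 9, 10, 50, 51, 49, 50]),
    train_model([10, 12, 11, 9, 48, 50, 52, 51]),
]
print(is_anomalous(30, models), is_anomalous(10, models))
```

Requiring agreement across models trained on different windows is one way to reduce false positives when a metric's behavior shifts gradually over time.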

When using Netdata Parents, all the functions of a Netdata Agent (except data collection) can be delegated to Parents to offload production systems.

The core of Netdata is developed in C. We have our own libnetdata, that provides:

DICTIONARY: A high-performance algorithm to maintain both indexed and ordered pools of structures Netdata needs. It uses JudyHS arrays for indexing, although it is modular: any hashtable or tree can be integrated into it. Despite being in C, dictionaries follow object-oriented programming principles, so there are constructors, destructors, automatic memory management, garbage collection, and more. For more see here.
ARAL: ARray ALlocator (ARAL) is used to minimize the system allocations made by Netdata. ARAL is optimized for peak performance when multi-threaded. It also allows all structures that use it to be allocated in memory-mapped files (shared memory) instead of RAM. For more see here.
PROCFILE: A high-performance /proc (but also any) file parser and text tokenizer. It achieves its performance by keeping files open and adjusting its buffers to read the entire file in one call (which is also required by the Linux kernel). For more see here.
STRING: A string interning mechanism, for string deduplication and indexing (using JudyHS arrays), optimized for multi-threaded usage. For more see here.
ARL: Adaptive Resortable List (ARL) is a very fast list iterator that keeps the expected items on the list in the same order they are found in the input list. So, the first iteration is somewhat slower, but all the following iterations are perfectly aligned for best performance. For more see here.
BUFFER: A flexible text buffer management system that allows Netdata to automatically handle dynamically sized text buffer allocations. The same mechanism is used for generating consistent JSON output by the Netdata APIs. For more see here.
SPINLOCK: Like POSIX MUTEX and RWLOCK but a lot faster, based on atomic operations, with significantly smaller memory impact, while being portable.
PGC: A caching layer that can be used to cache any kind of time-related data, with automatic indexing (based on a tree of JudyL arrays), memory management, evictions, flushing, pressure management. This is extensively used in dbengine. For more see here.
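
The ARL idea described in the list above can be sketched in a few lines of Python. This is a simplified illustration, not libnetdata's C implementation: the class name, its methods, and the `relinks` counter are hypothetical. It shows why the first pass over an input is slower (expected names are moved to match the input order) and why later passes hit the fast path.

```python
class AdaptiveResortableList:
    """Toy ARL: expected names are reordered to follow the order of the input."""

    def __init__(self, expected_names):
        self.order = list(expected_names)  # resorted as input is observed
        self.values = {}
        self.cursor = 0
        self.relinks = 0  # how many times an expected name had to be moved

    def begin(self):
        """Start a new iteration over the input."""
        self.cursor = 0

    def check(self, name, value):
        """Offer one input item; return True if it was an expected name."""
        if self.cursor >= len(self.order):
            return False  # every expected name was already found this pass
        if self.order[self.cursor] == name:
            # Fast path: the item is exactly the one expected next.
            self.values[name] = value
            self.cursor += 1
            return True
        try:
            # Slow path: search the expected names not yet seen in this pass.
            found = self.order.index(name, self.cursor)
        except ValueError:
            return False  # not an expected name, ignore it
        # Move the name to the cursor so the next pass takes the fast path.
        self.order.insert(self.cursor, self.order.pop(found))
        self.relinks += 1
        self.values[name] = value
        self.cursor += 1
        return True


# Hypothetical /proc/meminfo-like input, parsed twice.
arl = AdaptiveResortableList(["MemFree", "MemTotal", "Cached"])
lines = [("MemTotal", 100), ("MemFree", 40), ("Buffers", 5), ("Cached", 20)]
for _ in range(2):
    arl.begin()
    for name, value in lines:
        arl.check(name, value)
print(arl.order, arl.relinks)
```

In this example only the first pass needs to move an entry; the second pass matches every expected name at the cursor position.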

The above, and many more, allow Netdata developers to work on the application fast and with confidence. Most of the business logic in Netdata is a work of mixing the above.
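
As one example of these building blocks, string interning (the STRING component above) can be sketched as follows. This toy version uses a Python dictionary guarded by a lock instead of JudyHS arrays, and the reference counting is an assumption made for illustration; the class and method names are hypothetical.

```python
import threading


class StringInterner:
    """Toy string deduplication: equal strings share one canonical object."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # text -> [canonical object, reference count]

    def intern(self, text):
        """Return the canonical copy of text and take one reference to it."""
        with self._lock:
            entry = self._entries.get(text)
            if entry is None:
                entry = self._entries[text] = [text, 0]
            entry[1] += 1
            return entry[0]

    def release(self, text):
        """Drop one reference; forget the string when nobody uses it."""
        with self._lock:
            entry = self._entries[text]
            entry[1] -= 1
            if entry[1] == 0:
                del self._entries[text]

    def __len__(self):
        return len(self._entries)


interner = StringInterner()
# Build two equal but distinct string objects at runtime.
first = interner.intern("".join(["system", ".cpu"]))
second = interner.intern("".join(["system", ".cpu"]))
print(first is second, len(interner))
```

Since equal strings resolve to the same object, comparisons can be done by identity and each distinct string is stored only once.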

Netdata data collection plugins can be developed in any language. Most of our application collectors, though, are developed in Go.