Federating Prometheus Effectively

Efficiency and focus are the keys to success

Hayk Davtyan
Level Up Coding


Image by: AIexVector

Federation allows a Prometheus server to scrape selected time series from another Prometheus server. Prometheus federation can be used to scale to hundreds of clusters or to pull related metrics from one service’s Prometheus into another. It supports hierarchical and cross-service federation, which are explained in more detail in the official documentation.

Configuring federation the simple way

On any given Prometheus server, the /federate endpoint allows retrieving the current value of a selected set of time series from that server. Imagine you need to write a federation job that collects the time series of another Prometheus server. When you search the internet for “Prometheus federation configuration examples”, you will often come across configurations like the one below.

A simple example of a Prometheus job for federation
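A minimal sketch of what such a job usually looks like (the simple-federation job name and the child-prometheus1:31090 target come from this story; the remaining values are only illustrative):

scrape_configs:
  - job_name: 'simple-federation'
    # Keep the labels of the federated series instead of overwriting them
    honor_labels: true
    # Federation is served on the /federate endpoint of the child Prometheus
    metrics_path: '/federate'
    params:
      # Pull every time series the child instance has
      'match[]':
        - '{__name__=~".+"}'
    static_configs:
      - targets:
          - 'child-prometheus1:31090'   # child instance from this story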

This configuration is bad, and you will probably run into problems with federation if you have, say, ≈100K or more time series in your child-prometheus1:31090 instance. You’ll soon find out why this configuration is bad.

What problems can you have?

If you’ve ever seen the following error message in the logs of your Prometheus server, then this topic will definitely interest you.

level=error ts=2021-09-21T11:30:33.163676493Z caller=federate.go:163 component=web msg="federation failed" err="write tcp 192.168.22.145:9090->10.0.0.12:31090: write: broken pipe"

As discussed in this conversation thread on the prometheus-users mailing list, federation is not intended to pull an unbounded number of time series or to replicate data. The error message above appears when large amounts of data are federated: scrape_timeout is exceeded on the caller, which causes it to hang up. As the connection is closed, the client runs into a write error (broken pipe) because the other end disconnected.

Some engineers say that federation is not a good idea when collecting more than 10K time series, but I’d like to offer our own practice as a counterexample: we run a hierarchical federation with many endpoints in production, where we collect up to 400K time series on average from each endpoint. Note that the amount of ingested data also depends on the values of scrape_interval and scrape_timeout. You may ask, how have we achieved this?

Configuring federation the advanced way

Since we called the simple-federation job a bad configuration above, let’s understand step by step why it is so bad.

The main reason is the {__name__=~".+"} expression under the match[] section, which means that the Prometheus server will collect all the time series from the child-prometheus1:31090 instance. Ask yourself: do you really need all the time series of your child Prometheus instance? Most likely there is no point in collecting all of them. Here is another federation example, configured in a more advanced way than the previous one.

An advanced example of a Prometheus job for federation
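A sketch of this more targeted job (the kube_/node_/container_ prefixes, the id="static-agent" drop rule, and the advanced-federation job name follow this story; the interval and timeout values are illustrative):

scrape_configs:
  - job_name: 'advanced-federation'
    honor_labels: true
    metrics_path: '/federate'
    # Illustrative values; keep scrape_timeout below scrape_interval
    scrape_interval: 1m
    scrape_timeout: 50s
    params:
      'match[]':
        # Only federate series whose names start with kube_, node_ or container_
        - '{__name__=~"kube_.+|node_.+|container_.+"}'
    static_configs:
      - targets:
          - 'child-prometheus1:31090'
    metric_relabel_configs:
      # Drop series labeled id="static-agent" after they have been scraped
      - source_labels: [id]
        regex: 'static-agent'
        action: drop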

Here you need to focus on two important points. The first one is the match[] parameter. As you have already guessed, this way your Prometheus will only collect time series whose metric names have the kube_ , node_ , or container_ prefix. This means you avoid collecting unnecessary time series and keep only the time series you need. The selector expression under the match[] param can use any label you have, e.g. job , id , etc.

The second important point is metric_relabel_configs . In general, it is a useful part of the job configuration. In this example, you drop the time series that have the id="static-agent" key-value pair. Sometimes this can solve problems such as deduplication (e.g. if you have time series with the same metric name).

Important to know.
Using metric_relabel_configs does not guarantee that fewer time series are ingested. Remember that the Prometheus server collects all time series that match the match[] query parameter. Prometheus only drops the time series matched by the regex keyword after they have been collected, and only then are the remaining series stored in the TSDB, the persistent storage.

Some best practices.

To avoid ingesting huge amounts of data via federation, there are some solutions you can apply using the tricks you have already learned in this story. The solutions are about data separation, which can be done either at the job level or at the database level. Let’s discover what data separation means.

Job-level data separation.
When we speak about job-level data separation, it means you can write more than one federation job for the same endpoint, each ingesting a unique slice of the data. It’s preferred to keep the same values for scrape_interval and scrape_timeout across the jobs. See the example below.

Example of job-level separated Prometheus federation jobs
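A sketch of how the split could look, assuming the kube_/node_ and container_ prefixes are divided between the two jobs (the exact split and the interval/timeout values are illustrative):

scrape_configs:
  - job_name: 'advanced-federation1'
    honor_labels: true
    metrics_path: '/federate'
    # Keep the same interval/timeout in both jobs (illustrative values)
    scrape_interval: 1m
    scrape_timeout: 50s
    params:
      'match[]':
        - '{__name__=~"kube_.+|node_.+"}'
    static_configs:
      - targets:
          - 'child-prometheus1:31090'

  - job_name: 'advanced-federation2'
    honor_labels: true
    metrics_path: '/federate'
    scrape_interval: 1m
    scrape_timeout: 50s
    params:
      'match[]':
        # The remaining series, so the two jobs ingest disjoint data
        - '{__name__=~"container_.+"}'
    static_configs:
      - targets:
          - 'child-prometheus1:31090'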

Here we simply split the advanced-federation job into two separate jobs, named advanced-federation1 and advanced-federation2 .

Database-level separation.

Suppose your child-prometheus1 instance is running on a Kubernetes cluster. You can easily add a new Deployment object, e.g. named child-prometheus2 , which differs only by its ConfigMap. In this case, you will need to move some of the jobs from the child-prometheus1 instance to child-prometheus2 . This way, the Prometheus server will scrape data from two endpoints, as shown below.

Example of database-level separated Prometheus federation jobs
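A sketch of the scrape side of this setup (the child-prometheus2:31090 address is illustrative; the point is that the same federation job now scrapes two endpoints, each serving a smaller share of the series):

scrape_configs:
  - job_name: 'advanced-federation'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__=~"kube_.+|node_.+|container_.+"}'
    static_configs:
      - targets:
          - 'child-prometheus1:31090'   # original child instance
          - 'child-prometheus2:31090'   # new instance holding the moved jobs (illustrative address)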

Conclusion.

The last two examples work fine, but my advice is to use job-level data separation instead of database-level data separation, because you avoid having an additional endpoint in your infrastructure. Obviously, the first approach is the easier, more practical, and more professional solution.

Thanks for reading. I hope this story was helpful. If you are interested, check out my other Medium articles.


