Kubernetes InitContainers are a neat way to run arbitrary code before your main container starts. They ensure that certain preconditions are met before your app is up and running. For example, they allow you to:
- run database migrations with Django or Rails before your app starts
- ensure a microservice or API you depend on is running
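As a refresher, here is a minimal sketch of what such a Pod might look like. The names, image, and migration command are placeholders for illustration, not part of any real deployment:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp   # hypothetical app name
spec:
  initContainers:
    # Runs to completion before the main container starts;
    # if it fails, the Pod never becomes ready.
    - name: run-migrations
      image: myapp:latest                           # placeholder image
      command: ['python', 'manage.py', 'migrate']   # e.g. Django migrations
  containers:
    - name: myapp
      image: myapp:latest   # placeholder image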
Unfortunately, InitContainers can fail, and when that happens you probably want to be notified, because your app will never start. Kube-state-metrics exposes plenty of Kubernetes cluster metrics for Prometheus. By combining the two, we can monitor and alert whenever we discover container problems. Recently, a pull request was merged that adds InitContainer data.
The metric `kube_pod_init_container_status_last_terminated_reason` tells us why a specific InitContainer failed to run, whether it timed out or ran into errors.
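As an illustration, a failed InitContainer might show up in the kube-state-metrics output roughly like this (the pod and container names are made up, and the exact label set can vary between kube-state-metrics versions):

```
kube_pod_init_container_status_last_terminated_reason{namespace="default",pod="myapp-5d4f8",container="run-migrations",reason="Error"} 1
```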
To use the InitContainer metrics, deploy Prometheus and kube-state-metrics. Then add kube-state-metrics as a target in your Prometheus scrape_configs so that all the cluster metrics are pulled into Prometheus:
```yaml
- job_name: 'kube-state-metrics'
  static_configs:
    - targets: ['kube-state-metrics:8080']
```
`kube_pod_init_container_status_last_terminated_reason` carries a `reason` label that can take one of five values:
- Completed
- OOMKilled
- Error
- ContainerCannotRun
- DeadlineExceeded
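Before wiring up an alert, you can check for failures ad hoc in the Prometheus expression browser with a query along these lines (the grouping labels are just one reasonable choice):

```
sum by (namespace, pod, container, reason) (
  kube_pod_init_container_status_last_terminated_reason{reason!="Completed"}
)
```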
We want to be alerted whenever a sample with a reason other than Completed is scraped, because that means an InitContainer has failed to run. Here is an example alerting rule:
```yaml
groups:
- name: Init container failure
  rules:
  - alert: InitContainersFailed
    expr: kube_pod_init_container_status_last_terminated_reason{reason!="Completed"} == 1
    annotations:
      summary: '{{ $labels.container }} init failed'
      description: '{{ $labels.container }} has not completed init containers with the reason {{ $labels.reason }}'
```
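For the rule to take effect, Prometheus needs to load it. Assuming the rule above is saved as `init-container-alerts.yml` (a hypothetical filename), reference it in the `rule_files` section of your Prometheus configuration and reload Prometheus:

```yaml
rule_files:
  - 'init-container-alerts.yml'   # hypothetical filename for the rule file above
```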
Happy monitoring!