CloudBees Core on modern cloud platforms administration guide


Alerting

CloudBees Core automatically monitors its infrastructure elements and all of the Jenkins masters that it manages, and emits alerts when a health check fails. These alerts can be helpful in troubleshooting.

Infrastructure element monitoring in CloudBees Core covers Operations Center and Managed Masters. For each infrastructure node, it monitors the following metrics:

  • Available disk space

  • CPU utilization for the most recent 5 minutes

  • RAM utilization for the most recent 5 minutes

If any data point for these metrics reaches 90% or higher, CloudBees Core emits an alert. This threshold currently cannot be changed. For example:

Health checks failing: [worker-14: Disk util at 95%, worker-9: Worker down]
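
When alerts like this are collected in logs, it can help to split the message into per-node entries. The following is a minimal sketch, assuming the alert text always follows the format of the example above. `parse_health_alert` is a hypothetical helper written for illustration and is not part of CloudBees Core.

```python
import re

# Assumed shape, based only on the example alert in this section:
#   Health checks failing: [<node>: <message>, <node>: <message>, ...]
ALERT_PATTERN = re.compile(r"^Health checks failing: \[(?P<body>.*)\]$")


def parse_health_alert(alert: str) -> dict[str, list[str]]:
    """Map each node name to the list of failure messages reported for it."""
    match = ALERT_PATTERN.match(alert.strip())
    if match is None:
        raise ValueError(f"unrecognized alert format: {alert!r}")

    failures: dict[str, list[str]] = {}
    body = match.group("body").strip()
    if not body:
        return failures

    # Entries are assumed to be separated by ", " and each entry by ": ".
    for entry in body.split(", "):
        node, _, message = entry.partition(": ")
        failures.setdefault(node.strip(), []).append(message.strip())
    return failures


example = "Health checks failing: [worker-14: Disk util at 95%, worker-9: Worker down]"
print(parse_health_alert(example))
```

Grouping messages by node makes it easier to see whether a single node is failing several checks at once.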

The following table shows the possible error messages and their descriptions.

Table 1. Possible Failure Messages

Message | Description
Disk util at <number>% | Disk utilization reaches 90% or higher
RAM util at <number>% | RAM utilization reaches 90% or higher for five or more minutes
CPU util at <number>% | Total CPU utilization reaches 90% or higher for five or more minutes. The percent utilization is normalized to 100% across all CPUs on the node.
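
The rules in the table can be sketched in a few lines of code. This is an illustration only, not the CloudBees Core implementation. It assumes that "for five or more minutes" means every sample in the most recent five-minute window is at or above the threshold, and that CPU normalization is a simple average of per-CPU utilization. All function names are hypothetical.

```python
THRESHOLD_PERCENT = 90.0  # fixed threshold described in this section


def normalized_cpu_percent(per_cpu_percent: list[float]) -> float:
    """Average per-CPU utilization so the node total is on a 0-100% scale.

    Assumption: normalization is a plain mean across all CPUs on the node.
    """
    if not per_cpu_percent:
        raise ValueError("at least one CPU reading is required")
    return sum(per_cpu_percent) / len(per_cpu_percent)


def sustained(window_percent: list[float]) -> bool:
    """True if every sample in the window is at or above the threshold.

    Assumption: the window holds the samples from the most recent five minutes.
    """
    return bool(window_percent) and all(p >= THRESHOLD_PERCENT for p in window_percent)


def failure_messages(
    disk_percent: float, ram_window: list[float], cpu_window: list[float]
) -> list[str]:
    """Build failure messages in the format shown in Table 1."""
    messages = []
    if disk_percent >= THRESHOLD_PERCENT:  # disk has no duration requirement
        messages.append(f"Disk util at {disk_percent:.0f}%")
    if sustained(ram_window):
        messages.append(f"RAM util at {ram_window[-1]:.0f}%")
    if sustained(cpu_window):
        messages.append(f"CPU util at {cpu_window[-1]:.0f}%")
    return messages


# Two samples from a hypothetical four-CPU node, normalized to a 0-100% scale.
cpu_window = [
    normalized_cpu_percent(sample)
    for sample in ([100.0, 90.0, 95.0, 91.0], [92.0, 96.0, 94.0, 90.0])
]
print(failure_messages(95.0, [70.0, 72.0, 71.0], cpu_window))
```

Here the disk check fails immediately, the CPU check fails because both normalized samples stay at or above 90%, and the RAM check passes.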

Additional monitoring is available with the Elasticsearch Reporter plugin.