metrics monitor alert
CPU, memory, disk space, processes
Error and success rates, service failures and restarts, performance and latency of responses, resource usage
Connectivity, error rates and packet loss, latency, bandwidth utilization
Pooled resource usage, scaling adjustment indicators, degraded instances
Service status and availability, success and error rates, run rate and operational costs, resource exhaustion
To measure CPU, the following measurements might be appropriate:
Latency: average or maximum delay in the CPU scheduler; Traffic: CPU utilization; Errors: processor-specific error events, faulted CPUs; Saturation: run queue length
Latency: the time to complete requests; Traffic: number of requests served per second; Errors: application errors that occur when processing client requests or accessing resources; Saturation: the percentage or amount of resources currently in use
Latency: the time it takes to receive a response from the service or to provision new resources from a provider; Traffic: amount of work pushed to an external service, such as the number of requests made to an external API; Errors: error rates for service requests; Saturation: amount of account-restricted resources used (instances, API requests, acceptable cost, etc.)
Latency: the time to complete user requests; Traffic: number of user requests per second; Errors: errors that occur when processing client requests or accessing resources; Saturation: the percentage or amount of resources currently in use
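As a rough illustration of the four measurements above, the sketch below derives latency, traffic, errors, and saturation from one window of request records. The `Request` record, the `four_signals` function, and the `used`/`capacity` arguments are hypothetical names chosen for this example, not part of any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One observed request (hypothetical record shape)."""
    duration_ms: float
    ok: bool


def four_signals(requests, window_seconds, used, capacity):
    """Summarize one observation window as latency, traffic, errors, saturation."""
    count = len(requests)
    durations = [r.duration_ms for r in requests]
    failures = sum(1 for r in requests if not r.ok)
    return {
        # Latency: average and maximum time to complete a request.
        "latency_avg_ms": sum(durations) / count if count else 0.0,
        "latency_max_ms": max(durations) if count else 0.0,
        # Traffic: requests served per second over the window.
        "traffic_rps": count / window_seconds,
        # Errors: fraction of requests that failed.
        "error_rate": failures / count if count else 0.0,
        # Saturation: share of a limited resource currently in use.
        "saturation": used / capacity,
    }
```

For example, `four_signals([Request(100.0, True), Request(300.0, False)], window_seconds=2, used=3, capacity=4)` summarizes two requests seen in a two-second window against a resource that is three-quarters used.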
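Either a pull or a push model can be used. With push, the agent needs to know where the server is and sends data to it. With pull, the agent exposes its metrics on a specific endpoint and the server comes to fetch them.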
For push-based systems, the metrics ingress endpoint is a central location on the network where each monitoring agent or stats aggregator sends its collected data. The endpoint should be able to authenticate and receive data from a large number of hosts simultaneously. Ingress endpoints for metrics systems are often load balanced or distributed at scale both for reliability and to keep up with high volumes of traffic.
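A minimal sketch of the push model, assuming a plain HTTP ingress that authenticates agents with a shared bearer token and keeps samples in memory. The token value, the `/ingest` path, the JSON payload shape, and the names `IngressHandler`, `start_ingress`, and `push_metric` are all assumptions made for illustration. A real ingress would sit behind a load balancer and write to durable storage.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

API_TOKEN = "example-token"  # hypothetical shared secret
RECEIVED = []                # in-memory stand-in for a metrics store


class IngressHandler(BaseHTTPRequestHandler):
    """Central endpoint that agents push their samples to."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # Authenticate the agent before accepting its data.
        if self.headers.get("Authorization") != f"Bearer {API_TOKEN}":
            self.send_response(401)
            self.end_headers()
            return
        RECEIVED.append(json.loads(body))
        self.send_response(204)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_ingress():
    """Start the ingress on a free local port, one thread per connection."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), IngressHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def push_metric(host, port, token, name, value):
    """Agent side: the agent must know where the ingress lives."""
    payload = json.dumps({"name": name, "value": value})
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("POST", "/ingest", body=payload, headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        })
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()
```

The threaded server is only a stand-in for the "large number of hosts simultaneously" requirement; it handles concurrent agents but does nothing for reliability.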
For pull-based systems, the corresponding component is the polling mechanism that reaches out and parses the metrics endpoints exposed on individual hosts. This has some of the same requirements, but some responsibilities are reversed. For instance, if individual hosts implement authentication, the metrics gathering process must be able to provide the correct credentials to log in and access the secure endpoint.
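The reversed responsibilities can be sketched the same way: here the host exposes a protected endpoint and the collector supplies the credentials. This is an illustration only, assuming HTTP Basic authentication and a simple `name value` text format; the credentials, the metric values, and the names `MetricsHandler`, `start_host`, and `scrape` are hypothetical.

```python
import base64
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

USER, PASSWORD = "scraper", "example-password"  # hypothetical credentials
METRICS = {"cpu_utilization": 0.42, "run_queue_length": 3.0}  # sample values


def basic_auth(user, password):
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


class MetricsHandler(BaseHTTPRequestHandler):
    """Host side: expose current metrics, one 'name value' pair per line."""

    def do_GET(self):
        # The request path is not checked in this sketch.
        if self.headers.get("Authorization") != basic_auth(USER, PASSWORD):
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="metrics"')
            self.end_headers()
            return
        body = "".join(f"{k} {v}\n" for k, v in METRICS.items()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_host():
    """Start the metrics endpoint on a free local port."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def scrape(host, port, user, password):
    """Collector side: reach out, authenticate, and parse the response."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/metrics",
                     headers={"Authorization": basic_auth(user, password)})
        response = conn.getresponse()
        text = response.read().decode()
        if response.status != 200:
            raise PermissionError(f"scrape failed with HTTP {response.status}")
    finally:
        conn.close()
    pairs = (line.split() for line in text.splitlines() if line)
    return {name: float(value) for name, value in pairs}
```

In this arrangement the host needs no knowledge of the collector; the collector needs the host's address and valid credentials.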