
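Prometheus + Grafana + Micrometer Monitoring Stack Explained

This combination is one of the most widely used monitoring solutions in the Java ecosystem, and it is particularly well suited to monitoring microservices in cloud-native environments. This article covers it from technical implementation through to best practices.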

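I. Stack Components and How They Work Together

1. Component Roles

Component    Role                             Key capabilities
Micrometer   Application metrics facade       Unified metrics collection API that can publish to multiple monitoring systems
Prometheus   Time-series database + scraper   Metric storage, querying, and alerting rule evaluation
Grafana      Visualization platform           Dashboards and visual data analysis

2. Data Flow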

[Sequence diagram. Participants: App, Micrometer, Prometheus, Grafana]

  1. Generate metric data (JVM, HTTP, etc.)
  2. Expose the /metrics endpoint
  3. Pull metric data periodically
  4. Store and aggregate data
  5. Query data
  6. Render visual charts
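To make the scrape step more concrete, the sketch below renders a sample in the Prometheus text exposition format, which is what Prometheus reads when it pulls the endpoint. It uses only the JDK and is an illustration, not Micrometer's implementation; the class name `ExpositionSketch`, the label values, and the sample value are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class ExpositionSketch {

    /** Renders one sample line in the form: name{label="value",...} value */
    static String sampleLine(String name, Map<String, String> labels, double value) {
        StringJoiner joined = new StringJoiner(",", "{", "}");
        joined.setEmptyValue(""); // no braces at all when there are no labels
        labels.forEach((k, v) -> joined.add(k + "=\"" + escape(v) + "\""));
        return name + joined + " " + value;
    }

    /** Escapes backslash, double quote and newline inside a label value. */
    static String escape(String v) {
        return v.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("application", "order-service"); // hypothetical application name
        labels.put("type", "online");
        System.out.println("# HELP orders_total Total number of orders");
        System.out.println("# TYPE orders_total counter");
        System.out.println(sampleLine("orders_total", labels, 42.0));
    }
}
```

Each scrape returns the current value of every registered meter in this line-oriented form; Prometheus attaches the scrape timestamp and stores the result as a time series.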

II. Integrating Micrometer

1. Spring Boot Configuration

Maven dependencies

    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>

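application.yml configuration

    management:
      endpoints:
        web:
          exposure:
            include: health,info,prometheus
      metrics:
        export:
          prometheus:
            enabled: true   # Spring Boot 2.x key; Spring Boot 3.x uses management.prometheus.metrics.export.enabled
        tags:
          application: ${spring.application.name}   # add an application tag to every metric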

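2. Custom Metrics Example

Collecting business metrics

    import io.micrometer.core.instrument.Counter;
    import io.micrometer.core.instrument.MeterRegistry;
    import io.micrometer.core.instrument.Timer;
    import org.springframework.stereotype.Service;

    import java.util.concurrent.TimeUnit;

    @Service
    public class OrderService {

        private final Counter orderCounter;
        private final Timer orderProcessingTimer;

        public OrderService(MeterRegistry registry) {
            // Counter: exported to Prometheus as orders_total
            orderCounter = Counter.builder("orders.total")
                    .description("Total number of orders")
                    .tag("type", "online")
                    .register(registry);

            // Timer: exported to Prometheus as orders_processing_time_seconds_*
            orderProcessingTimer = Timer.builder("orders.processing.time")
                    .description("Order processing time")
                    .publishPercentiles(0.5, 0.95)   // client-side P50 and P95
                    .publishPercentileHistogram()    // also export _bucket series for histogram_quantile()
                    .register(registry);
        }

        // Option 1: time the call manually
        public void processOrder(Order order) {
            long start = System.nanoTime();
            try {
                // business logic...
                orderCounter.increment();
            } finally {
                orderProcessingTimer.record(System.nanoTime() - start, TimeUnit.NANOSECONDS);
            }
        }

        // Option 2: let the timer wrap a lambda (use one option per code path, not both)
        public void processOrderWithLambda(Order order) {
            orderProcessingTimer.record(() -> {
                // business logic...
                orderCounter.increment();
            });
        }
    }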
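The meter names in the service use dots (`orders.total`, `orders.processing.time`), while the Prometheus queries later in this article use underscores and unit suffixes. The sketch below approximates that mapping with plain JDK code. It is a simplified stand-in for the naming convention applied by Micrometer's Prometheus registry, which handles more cases; `PrometheusNameSketch` and `Kind` are hypothetical names.

```java
final class PrometheusNameSketch {

    enum Kind { COUNTER, TIMER, GAUGE }

    /** Approximates how a dotted Micrometer meter name appears in Prometheus. */
    static String toPrometheusName(String meterName, Kind kind) {
        // Dots and dashes are not valid in Prometheus metric names
        String name = meterName.replace('.', '_').replace('-', '_');
        switch (kind) {
            case COUNTER:
                // Counters carry a _total suffix, added only if missing
                return name.endsWith("_total") ? name : name + "_total";
            case TIMER:
                // Timers are exported in seconds, with _count, _sum and _bucket series on top
                return name.endsWith("_seconds") ? name : name + "_seconds";
            default:
                return name;
        }
    }

    public static void main(String[] args) {
        System.out.println(toPrometheusName("orders.total", Kind.COUNTER));         // orders_total
        System.out.println(toPrometheusName("orders.processing.time", Kind.TIMER)); // orders_processing_time_seconds
    }
}
```

Knowing the exported name up front helps when writing PromQL: a query against `orders.total` finds nothing, while `orders_total` matches.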

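III. Tuning the Prometheus Configuration

1. Scrape Configuration Example

    # prometheus.yml
    global:
      scrape_interval: 15s
      evaluation_interval: 30s

    scrape_configs:
      - job_name: 'spring-apps'
        metrics_path: '/actuator/prometheus'
        scrape_interval: 10s   # scrape the applications more often than the global default
        static_configs:
          - targets: ['app1:8080', 'app2:8080']
        relabel_configs:
          - source_labels: [__address__]
            target_label: instance
          # __meta_* labels are only set by a service discovery mechanism;
          # with static_configs this source label is empty
          - source_labels: [__meta_service_name]
            target_label: service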

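2. Key Tuning Parameters

Storage and query settings

    # prometheus.yml: TSDB behavior
    storage:
      tsdb:
        out_of_order_time_window: 1h   # accept out-of-order samples up to 1h old

    # Command-line flags (in most Prometheus releases these are flags, not prometheus.yml keys)
    --storage.tsdb.retention.time=15d   # data retention period
    --query.lookback-delta=5m           # how far back a query looks for the latest sample
    --query.max-concurrency=20          # cap on concurrently executing queries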
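The retention period directly drives disk usage. The Prometheus storage documentation gives a rule of thumb: needed disk space = retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. The sketch below applies that formula; the ingestion figure (100,000 active series scraped every 10 seconds) is an assumed workload, not a measurement, and `RetentionSizingSketch` is a hypothetical name.

```java
import java.time.Duration;

final class RetentionSizingSketch {

    /** Rule-of-thumb estimate: retention seconds * samples per second * bytes per sample. */
    static long estimateDiskBytes(long retentionSeconds, double samplesPerSecond, double bytesPerSample) {
        return Math.round(retentionSeconds * samplesPerSecond * bytesPerSample);
    }

    public static void main(String[] args) {
        long retentionSeconds = Duration.ofDays(15).getSeconds(); // 15d retention
        double samplesPerSecond = 100_000 / 10.0;                 // assumed: 100k series, 10s scrape interval
        double bytesPerSample = 2.0;                              // upper end of the 1-2 byte range
        long bytes = estimateDiskBytes(retentionSeconds, samplesPerSecond, bytesPerSample);
        System.out.println(bytes + " bytes"); // 25920000000 bytes, about 26 GB
    }
}
```

Halving the scrape frequency or the number of active series halves the estimate, which is why label cardinality matters as much as retention.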

IV. Grafana Dashboard Design

1. Core Monitoring Dashboards

JVM monitoring panels

    Panel 1: Heap Memory Usage
      Query: sum(jvm_memory_used_bytes{area="heap"}) by (instance) / sum(jvm_memory_max_bytes{area="heap"}) by (instance)
      Visualization: Time series, unit Percent (0.0-1.0)

    Panel 2: GC Pause Time
      Query: rate(jvm_gc_pause_seconds_sum[1m])
      Visualization: Heatmap

    Panel 3: Thread States
      Query: jvm_threads_states_threads{instance=~"$instance"}
      Visualization: Stacked bar chart
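Several queries in this article rely on `rate()`, so it helps to see roughly what it computes. The sketch below is a simplified version under stated assumptions: it sums the increases between consecutive counter samples, treats any drop as a counter reset, and divides by the time between the first and last sample. The real PromQL function also extrapolates to the edges of the range window, which this sketch leaves out; `RateSketch` and the sample data are hypothetical.

```java
final class RateSketch {

    /**
     * @param timestampsSeconds sample times in seconds, ascending
     * @param values            counter values observed at those times
     * @return average per-second increase, or NaN with fewer than two samples
     */
    static double simpleRate(double[] timestampsSeconds, double[] values) {
        if (values.length < 2) {
            return Double.NaN;
        }
        double increase = 0.0;
        for (int i = 1; i < values.length; i++) {
            double delta = values[i] - values[i - 1];
            // A drop means the counter restarted from zero (for example after an application restart)
            increase += (delta >= 0) ? delta : values[i];
        }
        double elapsed = timestampsSeconds[timestampsSeconds.length - 1] - timestampsSeconds[0];
        return increase / elapsed;
    }

    public static void main(String[] args) {
        double[] t = {0, 15, 30, 45, 60};
        double[] v = {100, 130, 160, 10, 40}; // reset between t=30 and t=45
        System.out.println(simpleRate(t, v)); // increases 30 + 30 + 10 + 30 = 100 over 60s
    }
}
```

Because resets are absorbed this way, `rate()` should be applied to counters and `_sum`/`_count` series, not to gauges.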

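2. Visualizing Business Metrics

Order dashboard

    {
      "panels": [
        {
          "title": "Orders per Second",
          "targets": [
            {
              "expr": "rate(orders_total[1m])",
              "legendFormat": "{{instance}}"
            }
          ],
          "type": "graph",
          "yaxes": [{"format": "ops"}]
        },
        {
          "title": "Processing Time (P95)",
          "targets": [
            {
              "expr": "histogram_quantile(0.95, sum by (le) (rate(orders_processing_time_seconds_bucket[1m])))",
              "legendFormat": "P95"
            }
          ],
          "type": "stat",
          "unit": "s"
        }
      ]
    }

rate() returns a per-second value, which matches the ops (operations per second) axis format. The P95 query depends on the _bucket series, which Micrometer exports only when the timer is registered with publishPercentileHistogram().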
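The P95 panel uses `histogram_quantile()`, which estimates a quantile from cumulative bucket counts rather than from raw observations. The sketch below shows the core idea for classic histograms under simplifying assumptions: find the bucket containing the target rank, then interpolate linearly inside it. It omits edge cases the real function handles; `HistogramQuantileSketch` and the bucket data are hypothetical.

```java
final class HistogramQuantileSketch {

    /**
     * @param q                quantile in (0, 1]
     * @param upperBounds      bucket upper bounds (the "le" label), ascending, last one +Infinity
     * @param cumulativeCounts observations less than or equal to each upper bound
     */
    static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        double total = cumulativeCounts[n - 1];
        if (n < 2 || total == 0) {
            return Double.NaN;
        }
        double rank = q * total;
        int b = 0;
        while (b < n - 1 && cumulativeCounts[b] < rank) {
            b++;
        }
        if (b == n - 1) {
            // Rank falls into the +Inf bucket: the best available answer is the highest finite bound
            return upperBounds[n - 2];
        }
        double start = (b == 0) ? 0.0 : upperBounds[b - 1];
        double below = (b == 0) ? 0.0 : cumulativeCounts[b - 1];
        double inBucket = cumulativeCounts[b] - below;
        // Assume observations are spread evenly within the bucket
        return start + (upperBounds[b] - start) * ((rank - below) / inBucket);
    }

    public static void main(String[] args) {
        double[] le = {0.1, 0.5, 1.0, Double.POSITIVE_INFINITY};
        double[] counts = {50, 90, 100, 100};
        System.out.println(quantile(0.95, le, counts)); // rank 95 lies halfway through the (0.5, 1.0] bucket
    }
}
```

The result can never be more precise than the bucket layout: if most requests land in one wide bucket, the reported P95 is an interpolation inside that bucket, not a measured value.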

V. Production Best Practices

1. Metric Naming Conventions

Type                   Suffix      Example
Counter                _total      http_requests_total
Gauge                  _current    queue_size_current
Timer                  _seconds    api_latency_seconds
Distribution summary   _summary    response_size_summary

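2. Label Usage Principles

  • Avoid high-cardinality labels such as user IDs or order numbers.
  • Keep label names consistent across the team (for example, pick one of env or environment).
  • Tag the important dimensions: region, az, service_version.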
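The reason high-cardinality labels are risky is multiplicative: each distinct combination of label values becomes its own time series. The sketch below computes the worst-case series count for one metric as the product of distinct values per label. The label names and counts are made-up illustrations, and `CardinalitySketch` is a hypothetical name; real series counts are usually lower because not every combination occurs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CardinalitySketch {

    /** Upper bound on time series for one metric: product of distinct values per label. */
    static long maxSeries(Map<String, Integer> distinctValuesPerLabel) {
        long product = 1;
        for (int distinct : distinctValuesPerLabel.values()) {
            product = Math.multiplyExact(product, (long) distinct); // fail loudly on overflow
        }
        return product;
    }

    public static void main(String[] args) {
        Map<String, Integer> labels = new LinkedHashMap<>();
        labels.put("instance", 10);
        labels.put("status", 5);
        labels.put("uri", 20);
        System.out.println(maxSeries(labels)); // 1000

        labels.put("user_id", 100_000);        // one high-cardinality label
        System.out.println(maxSeries(labels)); // 100000000
    }
}
```

A single user-ID label turns a thousand series into a hundred million, which is why such identifiers belong in logs or traces rather than in metric labels.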

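3. Resource Optimization Tips

Micrometer configuration

    @Bean
    MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
        // Fall back to a placeholder so that a missing variable does not produce a null tag value
        String env = System.getenv("AWS_REGION");
        String region = (env != null) ? env : "unknown";
        return registry -> registry.config()
                .meterFilter(MeterFilter.deny(id ->
                        // Drop meters that are not needed. Micrometer names use dots (jvm.classes.loaded);
                        // underscores only appear after export to Prometheus.
                        id.getName().startsWith("jvm.classes")))
                .commonTags("region", region);
    }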

Prometheus resource limits

    # Set resource limits when deploying in a container
    resources:
      limits:
        memory: 8Gi
      requests:
        cpu: 2
        memory: 4Gi

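VI. Advanced Features

1. Custom Collector

    import io.prometheus.client.Collector;

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Uses the Prometheus Java simpleclient API (io.prometheus.client)
    public class CustomMetricsCollector extends Collector {

        @Override
        public List<MetricFamilySamples> collect() {
            List<MetricFamilySamples> samples = new ArrayList<>();
            // Add a custom gauge with one label
            samples.add(new MetricFamilySamples(
                    "custom_metric",
                    Type.GAUGE,
                    "Custom metric description",
                    Collections.singletonList(
                            new MetricFamilySamples.Sample(
                                    "custom_metric",
                                    List.of("label1"),   // label names
                                    List.of("value1"),   // label values
                                    getCurrentValue()))));
            return samples;
        }

        // Placeholder: return the value to expose at scrape time
        private double getCurrentValue() {
            return 0.0;
        }
    }

    // Register the collector on the default CollectorRegistry
    new CustomMetricsCollector().register();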

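2. Alerting Rule Examples

    groups:
      - name: application-alerts
        rules:
          - alert: HighErrorRate
            # Share of requests answered with a 5xx status over the last 5 minutes.
            # Spring Boot Actuator exposes request counts as http_server_requests_seconds_count.
            expr: |
              sum by (instance) (rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
                /
              sum by (instance) (rate(http_server_requests_seconds_count[5m]))
                > 0.05
            for: 10m
            labels:
              severity: critical
            annotations:
              summary: "High error rate on {{ $labels.instance }}"
              description: "Error rate is {{ $value }}"
          - alert: GCTooLong
            # More than 10% of wall-clock time spent in GC pauses, averaged over the last hour
            expr: rate(jvm_gc_pause_seconds_sum[1h]) > 0.1
            labels:
              severity: warning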
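The `for: 10m` clause means the condition must hold at every rule evaluation for ten minutes before the alert moves from pending to firing; a single false evaluation resets the clock. The sketch below models that behavior with plain JDK code, assuming a fixed evaluation interval. It is an illustration of the semantics, not Prometheus's rule engine, and `AlertForSketch` is a hypothetical name.

```java
import java.util.Arrays;

final class AlertForSketch {

    /**
     * @param conditionPerEval    whether the alert expression matched at each evaluation
     * @param evalIntervalSeconds time between evaluations
     * @param forSeconds          the "for" duration of the rule
     * @return index of the first evaluation at which the alert fires, or -1 if it never does
     */
    static int firstFiringIndex(boolean[] conditionPerEval, int evalIntervalSeconds, int forSeconds) {
        int activeSince = -1; // index at which the alert became pending
        for (int i = 0; i < conditionPerEval.length; i++) {
            if (!conditionPerEval[i]) {
                activeSince = -1; // condition cleared: pending state is dropped
                continue;
            }
            if (activeSince < 0) {
                activeSince = i;
            }
            if ((i - activeSince) * evalIntervalSeconds >= forSeconds) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        boolean[] evals = new boolean[25];
        Arrays.fill(evals, true);
        // 30s evaluation interval, for: 10m
        System.out.println(firstFiringIndex(evals, 30, 600)); // 20

        evals[10] = false; // one healthy evaluation in the middle
        System.out.println(firstFiringIndex(evals, 30, 600)); // -1
    }
}
```

This is why a `for` duration filters out short spikes, and also why a flapping condition may never fire; the GCTooLong rule has no `for` clause and fires on the first matching evaluation.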

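The advantages of this monitoring stack are:

  1. Cloud-native friendly: it fits Kubernetes environments well.
  2. Low intrusiveness: Micrometer acts as an abstraction layer and reduces coupling between application code and the monitoring backend.
  3. Efficient storage: the Prometheus TSDB compresses samples efficiently.
  4. Rich visualization: the Grafana community provides a large number of ready-made dashboards.

Suggested rollout path:

  1. Start with basic monitoring (JVM and HTTP metrics).
  2. Add business metrics step by step.
  3. Finish with custom alerts and automated handling.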
