Description
After upgrading to Cadence v1.3.6, we're seeing a large volume of warnings from the Prometheus reporter. Metrics emitted from the scope that registers second are silently dropped, resulting in incomplete monitoring data.
```
a previously registered descriptor with the same fully-qualified name as <metric>
has different label names or a different help string
```
Affected metrics include `persistence_latency`, `persistence_requests`, `persistence_requests_per_shard`, `persistence_latency_per_shard`, `cache_count`, `cache_evict`, `cache_latency`, and others.
Root cause: Cadence emits the same metric name from multiple scopes that carry different tag sets. For example, it emits `persistence_latency_per_shard` from two scopes:
- `shardOperationsMetricsScope`, tagged with `{operation, domain, shard, is_retry}` (receives `additionalTags`).
- `shardOverallMetricsScope`, tagged with `{operation, domain, shard}` (does not receive `additionalTags`).
The `additionalTags` argument (e.g. `is_retry`) is passed in from callers, but applied to only one of the two scopes.
The `prometheus/client_golang` library requires that all descriptors registered under the same metric name have the same label-name set. When the second scope tries to register with a different label set, registration fails. The Tally Prometheus reporter then returns a `noopMetric{}`, silently discarding all data from that scope.
This pattern works with reporters that impose no registration or label-consistency requirement, but it is incompatible with the Prometheus reporter.
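The failure can be modeled with a minimal, stdlib-only sketch of the registry's consistency check. This is illustrative only, not the actual client_golang code; `registry` and `register` are hypothetical names:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// registry is a simplified, stdlib-only model of the consistency check that
// prometheus/client_golang performs at registration time. It is NOT the real
// implementation; it only illustrates the failure mode described above.
type registry struct {
	labelSets map[string]string // metric name -> canonical label-name set
}

func newRegistry() *registry {
	return &registry{labelSets: make(map[string]string)}
}

// register accepts a metric only if its label-name set matches any
// previously registered descriptor with the same name (order-insensitive).
func (r *registry) register(name string, labels []string) error {
	sorted := append([]string(nil), labels...)
	sort.Strings(sorted)
	key := strings.Join(sorted, ",")
	if prev, ok := r.labelSets[name]; ok && prev != key {
		return fmt.Errorf("descriptor %q has different label names than a previously registered one", name)
	}
	r.labelSets[name] = key
	return nil
}

func main() {
	r := newRegistry()
	// shardOperationsMetricsScope registers first, with is_retry.
	fmt.Println(r.register("persistence_latency_per_shard",
		[]string{"operation", "domain", "shard", "is_retry"}))
	// shardOverallMetricsScope then registers without is_retry and is
	// rejected; Tally swallows this error and hands back a no-op metric.
	fmt.Println(r.register("persistence_latency_per_shard",
		[]string{"operation", "domain", "shard"}))
}
```

Note that whichever scope registers first "wins"; the other scope's data is dropped for the lifetime of the process, which is why the missing series varies between restarts.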
Steps to Reproduce / How to Trigger
- Deploy Cadence v1.3.6 with the Prometheus metrics reporter enabled.
- Run any workload that exercises persistence operations (e.g. workflow starts, activity completions, or timer firings).
- Observe warnings in the Cadence server logs matching "error in prometheus reporter" with "has different label names or a different help string".
Expected Behavior
All metrics should be successfully registered and reported to Prometheus with complete data across all scopes and label sets.
Actual Behavior
- The Prometheus registry rejects the second (and subsequent) registration(s) of a metric name when the label-name set differs from the first.
- The Tally reporter returns a `noopMetric{}` for the rejected scope, silently dropping all data from it.
- Operational dimensions like `is_retry`, specific operation values, or per-shard breakdowns become invisible in Prometheus.
- A high volume of warning-level log lines is emitted.
Logs / Screenshots
```
{"level":"warn","ts":"2026-03-05T04:01:24.714Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"cache_count\", help: \"cache_count gauge\", constLabels: {}, variableLabels: [shard_id cadence_service operation cache_type]} has different label names or a different help string","logging-call-at":"metrics.go:151"}
{"level":"warn","ts":"2026-03-05T10:13:44.267Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"cache_count\", help: \"cache_count gauge\", constLabels: {}, variableLabels: [cadence_service shard_id operation cache_type]} has different label names or a different help string","logging-call-at":"metrics.go:151"}
{"level":"warn","ts":"2026-03-05T10:10:45.013Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"persistence_requests_per_shard\", help: \"persistence_requests_per_shard counter\", constLabels: {}, variableLabels: [cadence_service operation shard_id]} has different label names or a different help string","logging-call-at":"metrics.go:151"}
{"level":"warn","ts":"2026-03-05T05:21:49.657Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"cache_evict\", help: \"cache_evict counter\", constLabels: {}, variableLabels: [cadence_service operation shard_id]} has different label names or a different help string","logging-call-at":"metrics.go:151"}
{"level":"warn","ts":"2026-03-05T02:45:42.982Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"persistence_latency\", help: \"persistence_latency summary\", constLabels: {}, variableLabels: [cadence_service operation is_retry task_category]} has different label names or a different help string","logging-call-at":"metrics.go:151"}
```
Environment
- Cadence server version: v1.3.6
- Cadence SDK language and version (if applicable): N/A
- Cadence web version (if applicable): N/A
- DB & version: Apache Cassandra
- Scale: Production
Suggested Fix
Ensure all emission paths for a given metric name use the same label-name set. Where additional context like `is_retry` is needed, either:
- Always include the label with a default value (e.g. `""` or `"false"`) on scopes where it doesn't apply, or
- Use a distinct metric name for the variant (e.g. `persistence_latency_per_shard_with_retry`).
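As a sketch of the first option, a hypothetical `withDefaults` helper (not actual Cadence code) could normalize the tag map on every emission path, so that every scope registers the metric with an identical label-name set:

```go
package main

import "fmt"

// withDefaults returns a tag map guaranteed to contain every label name in
// required, filling absent ones with a default value. Routing every scope's
// tags through a helper like this keeps the label-name set identical on all
// emission paths for a metric. The helper name and default are illustrative.
func withDefaults(tags map[string]string, required []string, def string) map[string]string {
	out := make(map[string]string, len(required))
	for _, name := range required {
		if v, ok := tags[name]; ok {
			out[name] = v
		} else {
			out[name] = def
		}
	}
	return out
}

func main() {
	required := []string{"operation", "domain", "shard", "is_retry"}

	// Path that receives additionalTags (carries is_retry):
	withRetry := withDefaults(map[string]string{
		"operation": "GetWorkflowExecution", "domain": "d1", "shard": "7", "is_retry": "true",
	}, required, "false")

	// Path that does not: is_retry is padded with the default, so both
	// paths register persistence_latency_per_shard with the same labels.
	overall := withDefaults(map[string]string{
		"operation": "GetWorkflowExecution", "domain": "d1", "shard": "7",
	}, required, "false")

	fmt.Println(withRetry["is_retry"], overall["is_retry"]) // prints "true false"
}
```

Either option satisfies the client_golang invariant; the default-value approach preserves existing metric names and dashboards, while the distinct-name approach avoids inflating cardinality on scopes that never retry.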