How Istio generates golden signals automatically -- and why missing trace propagation headers gives you 0% complete traces.
Know the 4 golden signals Istio provides automatically. Understand why distributed tracing requires application header propagation. Know what randomSamplingPercentage does.
Build Grafana dashboards from Istio metrics. Debug trace propagation issues. Tune sampling rates for production. Know cardinality limits and why high-cardinality labels are dangerous.
Design the observability strategy for the platform: metrics cardinality budget, trace sampling policy, log retention. Define trace propagation as a platform contract all services must implement.
Istio installed with Jaeger enabled -- team expects full distributed traces
Jaeger shows traces -- but all traces are single-span, no parent-child relationships
Investigation: Envoy correctly adds x-b3-traceid/spanid headers on every request
Root cause: app HTTP clients not forwarding trace headers to downstream calls
30 services updated to propagate B3 headers -- full traces appear in Jaeger
The question this raises
How does Istio generate observability data, what does it do automatically vs what requires application cooperation, and why is distributed tracing the hardest part to get right?
Your team enables Istio tracing with Jaeger. After a week, you check Jaeger and see hundreds of traces -- but every trace shows only one span with no child spans. Services are definitely calling each other. What is the root cause?
Lesson outline
Istio generates L7 golden signals for every service pair -- automatically
Because every request flows through Envoy, Istio can measure request rate, error rate, and latency (P50/P95/P99) for every source-destination pair. These metrics cover three of SRE's four golden signals (traffic, errors, and latency; the fourth, saturation, comes from infrastructure metrics such as CPU and memory rather than the mesh) and are available in Prometheus/Grafana without any changes to application code.
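These signals map directly onto PromQL queries against Istio's standard metrics. A few illustrative queries (using the standard `istio_requests_total` and `istio_request_duration_milliseconds` metrics and their `destination_service` label):

```promql
# Request rate (traffic) per destination service
sum(rate(istio_requests_total[5m])) by (destination_service)

# Error rate: share of non-2xx responses
sum(rate(istio_requests_total{response_code!~"2.."}[5m]))
  / sum(rate(istio_requests_total[5m]))

# P99 latency per destination service
histogram_quantile(0.99,
  sum(rate(istio_request_duration_milliseconds_bucket[5m])) by (le, destination_service))
```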
Automatic observability from Envoy sidecars
```
Service A  -->  Service B  -->  Service C  -->  Service D
(frontend)      (orders)        (inventory)     (database)

What Istio does automatically at each hop:
  A's sidecar: generates span-1, adds trace headers to the request to B
  B's sidecar: receives span-1 context, generates span-2 for the B-to-C call

THE PROBLEM: B's sidecar generates span-2 using the INCOMING request context.
But when B's app makes the outbound call to C, it uses its OWN HTTP client.
If that HTTP client does not copy the incoming trace headers to the outbound
call, the outgoing request has NO parent trace context.

Result:
  Span-1: A to B            <- traced
  Span-2: B to C (orphan)   <- unconnected -- NEW root trace
  Span-3: C to D (orphan)   <- unconnected -- NEW root trace

Fix: Applications MUST copy these headers from incoming to outgoing requests:
  x-request-id
  x-b3-traceid
  x-b3-spanid
  x-b3-parentspanid
  x-b3-sampled
  x-b3-flags
  x-ot-span-context (if using OpenTracing)
```
Trace propagation is the application's responsibility
Istio cannot automatically propagate trace context through your application code. The incoming x-b3-* headers must be extracted from the inbound HTTP request and attached to all outbound HTTP calls. This requires a one-time change to your HTTP client wrapper in every service. Without it, tracing shows isolated one-hop spans, not end-to-end traces.
Making traces end-to-end in your services
1. Identify the 7 B3 trace headers your framework must propagate: x-request-id, x-b3-traceid, x-b3-spanid, x-b3-parentspanid, x-b3-sampled, x-b3-flags, x-ot-span-context
2. Create a centralized HTTP client wrapper that extracts incoming headers from the request context
3. Pass extracted headers to all outbound HTTP calls made during request processing
4. Alternatively, use the OpenTelemetry SDK with auto-instrumentation -- it handles propagation automatically
5. Validate in Jaeger: send a test request and confirm the trace graph shows A -> B -> C -> D (not 4 separate root spans)
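The wrapper from the steps above can be sketched in a few lines. This is a framework-agnostic sketch, not a specific library's API: the header names are the B3 headers Envoy uses, but the function names and the example values are hypothetical.

```python
# B3 / tracing headers that Istio expects applications to propagate.
TRACE_HEADERS = [
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "x-ot-span-context",
]

def extract_trace_headers(incoming_headers):
    """Copy the trace headers (case-insensitively) from an inbound request."""
    lowered = {k.lower(): v for k, v in incoming_headers.items()}
    return {h: lowered[h] for h in TRACE_HEADERS if h in lowered}

def outbound_headers(incoming_headers, extra=None):
    """Build headers for a downstream call: propagated trace context + extras."""
    headers = extract_trace_headers(incoming_headers)
    headers.update(extra or {})
    return headers

# Hypothetical inbound request carrying Envoy-generated trace context:
inbound = {
    "Host": "orders.svc",
    "X-B3-TraceId": "463ac35c9f6413ad48485a3953bb6124",
    "X-B3-SpanId": "a2fb4a1d1a96d312",
    "X-B3-Sampled": "1",
}
print(outbound_headers(inbound))  # only the trace headers survive
```

Every outbound call made while handling the inbound request should go through `outbound_headers`; that is the "one-time change to your HTTP client wrapper" the lesson describes.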
```shell
# Check if tracing is configured in the mesh
kubectl get configmap istio -n istio-system -o jsonpath='{.data.mesh}' | grep -A5 tracing

# Check Envoy's tracing config for a specific pod
istioctl proxy-config bootstrap my-pod.my-namespace | grep -A10 tracing

# Check Prometheus metrics Istio generates (golden signals)
# Rate of requests per second by service
kubectl exec -n monitoring prometheus-0 -- \
  curl -sg 'http://localhost:9090/api/v1/query?query=rate(istio_requests_total[5m])'

# P99 latency per destination service
# Query: histogram_quantile(0.99, rate(istio_request_duration_milliseconds_bucket[5m]))

# Enable access logging for a workload
kubectl apply -f - <<EOF
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: enable-access-log
  namespace: my-namespace
spec:
  accessLogging:
  - providers:
    - name: envoy
EOF
```
```yaml
# Istio Telemetry API -- configure tracing per namespace
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: tracing-config
  namespace: production
spec:
  tracing:
  - providers:
    - name: jaeger
    randomSamplingPercentage: 1.0  # 1% sampling -- adjust for traffic volume
    # Use 100% only in development -- at 1000 RPS, 100% = 1000 spans/second to Jaeger

# Increase sampling for a specific workload during debugging
---
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: debug-tracing
  namespace: production
spec:
  selector:
    matchLabels:
      app: payment-svc  # only this workload
  tracing:
  - providers:
    - name: jaeger
    randomSamplingPercentage: 100.0  # 100% for debugging only
```
Blast radius of observability misconfig
100% trace sampling in production -- storage and CPU overload
```yaml
# WRONG: Telemetry with 100% sampling in production
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: tracing
  namespace: production
spec:
  tracing:
  - providers:
    - name: jaeger
    randomSamplingPercentage: 100.0  # WRONG for production
    # At 500 RPS: 500 spans/second written to Jaeger
    # Jaeger storage fills within hours
    # Envoy incurs tracing overhead for every span
```

```yaml
# RIGHT: production uses 1% sampling (sample 1 in 100 requests)
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: tracing
  namespace: production
spec:
  tracing:
  - providers:
    - name: jaeger
    randomSamplingPercentage: 1.0  # 1% = 5 spans/sec at 500 RPS
    # For error tracing: configure head-based sampling
    # or use Jaeger's adaptive sampling to auto-tune
    # Override to 100% only for specific debug sessions:
    # kubectl annotate pod <pod> sidecar.istio.io/traceSampling=100
```

1% sampling captures 1 in 100 requests -- sufficient to see latency distributions and catch common errors. For rare errors (< 1%), use tail-based sampling or increase the rate temporarily via pod annotation. Never run 100% in production at scale.
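The trade-off above is simple arithmetic, and it is worth sanity-checking before picking a rate. A quick sketch (note: this counts spans from a single proxy's perspective; in a real mesh every hop in a sampled trace emits additional spans, so total volume is higher):

```python
def spans_per_second(rps, sampling_percentage):
    """Spans exported per second at a given request rate and
    randomSamplingPercentage (a value from 0 to 100)."""
    return rps * sampling_percentage / 100.0

print(spans_per_second(500, 100.0))  # 100% sampling at 500 RPS
print(spans_per_second(500, 1.0))    # 1% sampling at 500 RPS
```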
| Signal | Source | Automatic? | App changes needed | Key metric/config |
|---|---|---|---|---|
| Request rate | Envoy sidecar -> Prometheus | Yes | None | istio_requests_total |
| Error rate | Envoy sidecar -> Prometheus | Yes | None | istio_requests_total{response_code!~"2.."} |
| Latency | Envoy sidecar -> Prometheus | Yes | None | istio_request_duration_milliseconds |
| Distributed traces | Envoy -> Jaeger/Zipkin | Partial | Must propagate B3 headers | randomSamplingPercentage |
| Access logs | Envoy -> stdout/Loki | Yes (disabled by default) | None | Telemetry API or MeshConfig |
What Istio generates automatically
📖 What the exam expects
Istio automatically generates request rate, error rate, and latency metrics for every service-to-service communication. These metrics are available in Prometheus without application code changes.
Asked when the interviewer wants to test real-world Istio experience. "What observability does Istio give you for free?" is the lead-in. The follow-up "what does it NOT give you for free?" reveals depth.
Strong answer: Immediately mentions header propagation limitation, knows B3 header names, has built Grafana dashboards from istio_requests_total, knows Telemetry API.
Red flags: Thinks distributed tracing is fully automatic, does not know about B3 header propagation, thinks 100% sampling is fine in production.