# Custom Metrics
OptiPod supports custom metrics providers beyond Prometheus and metrics-server. This guide shows you how to integrate your own metrics source.
## Custom Metrics Provider Interface

OptiPod expects metrics providers to implement a simple HTTP API that returns resource usage data.
### API Specification

#### Endpoint: `GET /api/v1/metrics`

Query Parameters:

- `workload`: Workload name
- `namespace`: Namespace
- `container`: Container name (optional)
- `metric`: Metric type (`cpu` or `memory`)
- `start`: Start time (RFC3339)
- `end`: End time (RFC3339)
- `step`: Sample interval (duration)
Response Format:

```json
{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": {
          "container": "my-app",
          "pod": "my-app-xxx",
          "namespace": "default"
        },
        "values": [
          [1706097600, "0.25"],
          [1706097900, "0.30"],
          [1706098200, "0.28"]
        ]
      }
    ]
  }
}
```

Values Format:
- First element: Unix timestamp
- Second element: Metric value as string
  - CPU: cores (e.g., `"0.25"` = 250m)
  - Memory: bytes (e.g., `"536870912"` = 512Mi)
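To make the unit conventions concrete, here is a short, hypothetical Python helper (the function names are illustrative, not part of OptiPod) that converts values from this format into millicores and Mi:

```python
def parse_cpu_value(value: str) -> int:
    """Convert a CPU sample in cores (string) to millicores."""
    return round(float(value) * 1000)

def parse_memory_value(value: str) -> float:
    """Convert a memory sample in bytes (string) to Mi."""
    return float(value) / (1024 * 1024)

sample = [1706097600, "0.25"]           # [unix timestamp, value-as-string]
print(parse_cpu_value(sample[1]))       # 250 (i.e. 250m)
print(parse_memory_value("536870912"))  # 512.0 (i.e. 512Mi)
```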
## Configuration

### Basic Configuration

```yaml
apiVersion: optipod.optipod.io/v1alpha1
kind: OptimizationPolicy
metadata:
  name: custom-metrics-policy
spec:
  mode: Recommend
  selector:
    workloadSelector:
      matchLabels:
        optipod.io/enabled: "true"
  metricsConfig:
    provider: custom
    rollingWindow: 7d
  resourceBounds:
    cpu:
      min: 10m
      max: 4000m
    memory:
      min: 64Mi
      max: 8Gi
  updateStrategy:
    strategy: webhook
```

Note: Custom metrics provider configuration (endpoint, authentication, etc.) is configured externally via Helm values or environment variables, not in the CRD. See the Implementation Examples section below for how to build a custom metrics provider that OptiPod can query.
### Authentication

Custom metrics providers should implement their own authentication. OptiPod can be configured to pass authentication credentials via Helm values:

```yaml
# Helm values for OptiPod with custom metrics provider
customMetrics:
  enabled: true
  endpoint: http://custom-metrics.monitoring.svc:8080
  auth:
    type: bearer  # or basic
    token: "your-api-token"  # pragma: allowlist secret
    # Or for basic auth:
    # username: "user"
    # password: "pass"  # pragma: allowlist secret
```

Your custom metrics provider should validate these credentials on each request.
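A minimal, framework-agnostic sketch of the provider-side check for the bearer scheme configured above; the function name and token constant are illustrative assumptions. In the Flask examples below, you would run this against `request.headers.get("Authorization")` before serving metrics:

```python
import hmac

EXPECTED_TOKEN = "your-api-token"  # pragma: allowlist secret

def is_authorized(authorization_header: str) -> bool:
    """Return True if the header carries the expected bearer token."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme != "Bearer" or not token:
        return False
    # Constant-time comparison to avoid leaking token bytes via timing
    return hmac.compare_digest(token, EXPECTED_TOKEN)

print(is_authorized("Bearer your-api-token"))  # True
print(is_authorized("Bearer wrong-token"))     # False
print(is_authorized(""))                       # False
```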
## Implementation Examples

### Example 1: Datadog Metrics

```python
from flask import Flask, request, jsonify
import requests
from datetime import datetime

app = Flask(__name__)

DATADOG_API_KEY = "your-api-key"  # pragma: allowlist secret
DATADOG_APP_KEY = "your-app-key"  # pragma: allowlist secret

@app.route('/api/v1/metrics', methods=['GET'])
def get_metrics():
    workload = request.args.get('workload')
    namespace = request.args.get('namespace')
    metric_type = request.args.get('metric')
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so normalize it for older versions
    start = int(datetime.fromisoformat(
        request.args.get('start').replace('Z', '+00:00')).timestamp())
    end = int(datetime.fromisoformat(
        request.args.get('end').replace('Z', '+00:00')).timestamp())

    # Map to Datadog metric
    if metric_type == 'cpu':
        query = f"avg:kubernetes.cpu.usage{{kube_deployment:{workload},kube_namespace:{namespace}}}"
    else:
        query = f"avg:kubernetes.memory.usage{{kube_deployment:{workload},kube_namespace:{namespace}}}"

    # Query Datadog
    response = requests.get(
        'https://api.datadoghq.com/api/v1/query',
        params={'query': query, 'from': start, 'to': end},
        headers={
            'DD-API-KEY': DATADOG_API_KEY,
            'DD-APPLICATION-KEY': DATADOG_APP_KEY,
        },
    )

    # Transform to OptiPod format (Datadog timestamps are in milliseconds)
    datadog_data = response.json()
    values = [
        [point[0] / 1000, str(point[1])]
        for point in datadog_data['series'][0]['pointlist']
    ]

    return jsonify({
        'status': 'success',
        'data': {
            'resultType': 'matrix',
            'result': [{
                'metric': {'container': workload, 'namespace': namespace},
                'values': values,
            }],
        },
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```

### Example 2: New Relic Metrics
```javascript
const express = require('express');
const axios = require('axios');

const app = express();
const NEW_RELIC_API_KEY = process.env.NEW_RELIC_API_KEY;

app.get('/api/v1/metrics', async (req, res) => {
  const { workload, namespace, metric, start, end } = req.query;

  // Build NRQL query
  const metricName = metric === 'cpu'
    ? 'k8s.container.cpuUsedCores'
    : 'k8s.container.memoryUsedBytes';

  const nrql = `
    SELECT average(${metricName})
    FROM Metric
    WHERE k8s.deploymentName = '${workload}'
      AND k8s.namespaceName = '${namespace}'
    SINCE ${new Date(start).getTime()} UNTIL ${new Date(end).getTime()}
    TIMESERIES 5 minutes
  `;

  // Query New Relic
  const response = await axios.post(
    'https://insights-api.newrelic.com/v1/accounts/YOUR_ACCOUNT_ID/query',
    { nrql },
    { headers: { 'X-Query-Key': NEW_RELIC_API_KEY } }
  );

  // Transform to OptiPod format
  const values = response.data.results[0].timeSeries.map(point => [
    point.beginTimeSeconds,
    point.results[0].average.toString(),
  ]);

  res.json({
    status: 'success',
    data: {
      resultType: 'matrix',
      result: [{
        metric: { container: workload, namespace: namespace },
        values: values,
      }],
    },
  });
});

app.listen(8080, () => {
  console.log('Custom metrics provider listening on port 8080');
});
```

### Example 3: CloudWatch Metrics
```go
package main

import (
	"encoding/json"
	"net/http"
	"strconv"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudwatch"
)

type MetricsResponse struct {
	Status string `json:"status"`
	Data   Data   `json:"data"`
}

type Data struct {
	ResultType string   `json:"resultType"`
	Result     []Result `json:"result"`
}

type Result struct {
	Metric map[string]string `json:"metric"`
	Values [][]interface{}   `json:"values"`
}

func metricsHandler(w http.ResponseWriter, r *http.Request) {
	workload := r.URL.Query().Get("workload")
	namespace := r.URL.Query().Get("namespace")
	metricType := r.URL.Query().Get("metric")
	start, _ := time.Parse(time.RFC3339, r.URL.Query().Get("start"))
	end, _ := time.Parse(time.RFC3339, r.URL.Query().Get("end"))

	// Create CloudWatch client
	sess := session.Must(session.NewSession())
	cw := cloudwatch.New(sess)

	// Determine metric name
	var metricName string
	if metricType == "cpu" {
		metricName = "CPUUtilization"
	} else {
		metricName = "MemoryUtilization"
	}

	// Query CloudWatch
	input := &cloudwatch.GetMetricStatisticsInput{
		Namespace:  aws.String("AWS/ECS"),
		MetricName: aws.String(metricName),
		Dimensions: []*cloudwatch.Dimension{
			{
				Name:  aws.String("ServiceName"),
				Value: aws.String(workload),
			},
		},
		StartTime:  aws.Time(start),
		EndTime:    aws.Time(end),
		Period:     aws.Int64(300),
		Statistics: []*string{aws.String("Average")},
	}

	result, err := cw.GetMetricStatistics(input)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	// Transform to OptiPod format (metric values must be strings)
	values := make([][]interface{}, len(result.Datapoints))
	for i, dp := range result.Datapoints {
		values[i] = []interface{}{
			dp.Timestamp.Unix(),
			strconv.FormatFloat(*dp.Average, 'f', -1, 64),
		}
	}

	response := MetricsResponse{
		Status: "success",
		Data: Data{
			ResultType: "matrix",
			Result: []Result{{
				Metric: map[string]string{
					"container": workload,
					"namespace": namespace,
				},
				Values: values,
			}},
		},
	}

	json.NewEncoder(w).Encode(response)
}

func main() {
	http.HandleFunc("/api/v1/metrics", metricsHandler)
	http.ListenAndServe(":8080", nil)
}
```

## Deployment
### Deploy as Kubernetes Service

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-metrics-provider
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app: custom-metrics-provider
  template:
    metadata:
      labels:
        app: custom-metrics-provider
    spec:
      containers:
        - name: provider
          image: your-registry/custom-metrics-provider:latest
          ports:
            - containerPort: 8080
          env:
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: metrics-provider-secret
                  key: api-key
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  name: custom-metrics-provider
  namespace: monitoring
spec:
  selector:
    app: custom-metrics-provider
  ports:
    - port: 8080
      targetPort: 8080
```

## Testing
### Test Metrics Endpoint

```bash
# Test directly
curl "http://custom-metrics-provider.monitoring.svc:8080/api/v1/metrics?workload=my-app&namespace=default&metric=cpu&start=2026-01-24T00:00:00Z&end=2026-01-24T23:59:59Z&step=5m"

# Test with authentication
curl -H "Authorization: Bearer your-token" \
  "http://custom-metrics-provider.monitoring.svc:8080/api/v1/metrics?..."
```

### Validate Response Format

```bash
# Check response structure
curl ... | jq '.data.result[0].values[0]'
# Should output: [timestamp, "value"]

# Verify timestamps are Unix timestamps
curl ... | jq '.data.result[0].values[0][0]'
# Should output: 1706097600

# Verify values are strings
curl ... | jq '.data.result[0].values[0][1] | type'
# Should output: "string"
```

## Monitoring
### Provider Health Check

Add a health endpoint to your provider:

```python
@app.route('/health', methods=['GET'])
def health():
    return jsonify({'status': 'healthy'}), 200
```

Monitor it with Kubernetes probes:

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30

readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```

### Metrics Provider Metrics
Expose metrics about your provider:

```python
from prometheus_client import Counter, Histogram, generate_latest

requests_total = Counter('metrics_requests_total', 'Total requests')
request_duration = Histogram('metrics_request_duration_seconds', 'Request duration')

@app.route('/metrics', methods=['GET'])
def metrics():
    return generate_latest()
```

Instrument your handlers with these collectors, for example by calling `requests_total.inc()` at the top of each request and wrapping the handler body in `request_duration.time()`.

## Troubleshooting
### OptiPod Can't Reach Provider

```bash
# Test from OptiPod operator pod
kubectl exec -n optipod-system <operator-pod> -- \
  curl -v http://custom-metrics-provider.monitoring.svc:8080/health

# Check service
kubectl get svc -n monitoring custom-metrics-provider

# Check endpoints
kubectl get endpoints -n monitoring custom-metrics-provider
```

### Authentication Failures
```bash
# Verify secret exists
kubectl get secret custom-metrics-auth -n default

# Check secret contents
kubectl get secret custom-metrics-auth -n default -o jsonpath='{.data.token}' | base64 -d

# Test with authentication
curl -H "Authorization: Bearer $(kubectl get secret custom-metrics-auth -n default -o jsonpath='{.data.token}' | base64 -d)" \
  http://custom-metrics-provider.monitoring.svc:8080/api/v1/metrics?...
```

### Invalid Response Format

```bash
# Validate JSON structure
curl ... | jq '.data.result[0]'

# Check for required fields
curl ... | jq 'has("status") and has("data")'

# Verify values array format
curl ... | jq '.data.result[0].values[] | length == 2'
```

## Best Practices
- Caching: Cache metrics to reduce load on upstream provider
- Rate limiting: Implement rate limiting to prevent abuse
- Error handling: Return appropriate HTTP status codes
- Logging: Log all requests for debugging
- Monitoring: Expose metrics about your provider
- High availability: Run multiple replicas
- Authentication: Always use authentication
- Validation: Validate all input parameters
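To illustrate the caching practice above, here is a minimal sketch of an in-memory TTL cache keyed by query parameters, so repeated OptiPod queries do not hammer the upstream provider. `TTLCache`, `cached_query`, and the 60-second TTL are illustrative assumptions, not OptiPod APIs; production setups may prefer a shared cache such as Redis when running multiple replicas.

```python
import time

class TTLCache:
    """Tiny in-memory cache where entries expire after a fixed TTL."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]
        self._store.pop(key, None)  # evict expired entry
        return None

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

cache = TTLCache(ttl_seconds=60)

def cached_query(workload, namespace, metric, fetch_metrics):
    """Return cached metrics when fresh, else call fetch_metrics and cache."""
    key = (workload, namespace, metric)
    result = cache.get(key)
    if result is None:
        result = fetch_metrics(workload, namespace, metric)
        cache.set(key, result)
    return result
```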