- Add app/health.py with HealthChecker and MetricsCollector classes
- Implement a composite /health endpoint that checks the database and HTTP client
- Add /metrics endpoint with Prometheus exposition format
- Add webhook request metrics (counters, histograms, gauges)
- Create app/http_client.py for shared AsyncClient management
- Update the app/main.py lifespan to initialize and close the shared HTTP client
- Add comprehensive tests in tests/test_health.py

These changes provide observability: health checks report system status,
and metrics track webhook processing performance for monitoring.
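A minimal sketch of how a composite health check and a Prometheus text exposition could fit together, using only the standard library. `HealthChecker.register`, the per-check timeout, the `ok`/`degraded` status values, `MetricsCollector.observe`, and the metric names are all hypothetical and are not the actual API in app/health.py. The prometheus_client package provides equivalent counters and histograms if a library is preferred.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

# Hypothetical check signature: an async callable that returns True when healthy.
Check = Callable[[], Awaitable[bool]]


class HealthChecker:
    """Runs named checks concurrently and aggregates them (sketch)."""

    def __init__(self, timeout: float = 2.0) -> None:
        self.timeout = timeout  # assumed per-check timeout in seconds
        self.checks: Dict[str, Check] = {}

    def register(self, name: str, check: Check) -> None:
        self.checks[name] = check

    async def _run(self, check: Check) -> bool:
        try:
            return bool(await asyncio.wait_for(check(), self.timeout))
        except Exception:
            # A timeout or raised error counts as an unhealthy component.
            return False

    async def run(self) -> dict:
        names = list(self.checks)
        results = await asyncio.gather(*(self._run(self.checks[n]) for n in names))
        components = {n: ("ok" if r else "fail") for n, r in zip(names, results)}
        return {"status": "ok" if all(results) else "degraded", "components": components}


class MetricsCollector:
    """Counter plus histogram rendered in Prometheus text format (sketch)."""

    def __init__(self, buckets: Tuple[float, ...] = (0.1, 0.5, 1.0, 5.0)) -> None:
        self.buckets = buckets
        self.requests: Dict[str, int] = {}
        self.bucket_counts = [0] * len(buckets)
        self.latency_sum = 0.0
        self.latency_count = 0

    def observe(self, status: str, seconds: float) -> None:
        self.requests[status] = self.requests.get(status, 0) + 1
        # Buckets are cumulative: count the observation in every bucket it fits.
        for i, bound in enumerate(self.buckets):
            if seconds <= bound:
                self.bucket_counts[i] += 1
        self.latency_sum += seconds
        self.latency_count += 1

    def render(self) -> str:
        name = "webhook_request_duration_seconds"
        lines = ["# HELP webhook_requests_total Webhook requests processed.",
                 "# TYPE webhook_requests_total counter"]
        for status, n in sorted(self.requests.items()):
            lines.append(f'webhook_requests_total{{status="{status}"}} {n}')
        lines += [f"# HELP {name} Webhook processing latency.", f"# TYPE {name} histogram"]
        for bound, n in zip(self.buckets, self.bucket_counts):
            lines.append(f'{name}_bucket{{le="{bound}"}} {n}')
        lines.append(f'{name}_bucket{{le="+Inf"}} {self.latency_count}')
        lines.append(f"{name}_sum {self.latency_sum}")
        lines.append(f"{name}_count {self.latency_count}")
        return "\n".join(lines) + "\n"
```

Running the checks concurrently with a per-check timeout keeps one slow dependency from stalling the whole /health response, and reporting `degraded` with per-component detail tells operators which dependency failed.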

- Add app/utils/retry.py with a configurable async retry decorator
- Update the DeliveryLog model to track attempt_count and latency_seconds
- Apply @http_retry to the engine._exec_forward and engine._exec_notify methods
- Update save_logs to record retry metadata
- Add comprehensive unit tests for the retry decorator
- Support configuration via RETRY_* environment variables

This improves the reliability of downstream HTTP calls by automatically
retrying transient failures with exponential backoff and jitter.
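A minimal sketch of an async retry decorator with exponential backoff and "full jitter" (each sleep drawn uniformly from zero up to the capped exponential delay). `async_retry`, `RetryConfig`, `backoff_delay`, and the variable names `RETRY_MAX_ATTEMPTS`, `RETRY_BASE_DELAY`, and `RETRY_MAX_DELAY` are hypothetical; the actual `@http_retry` in app/utils/retry.py and its RETRY_* keys may differ.

```python
import asyncio
import functools
import os
import random
from dataclasses import dataclass
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RetryConfig:
    max_attempts: int = 3    # total tries, including the first call
    base_delay: float = 0.5  # seconds; the backoff ceiling doubles per failed attempt
    max_delay: float = 10.0  # cap on any single backoff sleep

    @classmethod
    def from_env(cls) -> "RetryConfig":
        # Hypothetical variable names; the real RETRY_* keys may differ.
        return cls(
            max_attempts=int(os.getenv("RETRY_MAX_ATTEMPTS", "3")),
            base_delay=float(os.getenv("RETRY_BASE_DELAY", "0.5")),
            max_delay=float(os.getenv("RETRY_MAX_DELAY", "10.0")),
        )


def backoff_delay(attempt: int, cfg: RetryConfig) -> float:
    """Full jitter: uniform in [0, min(max_delay, base_delay * 2**attempt)]."""
    return random.uniform(0, min(cfg.max_delay, cfg.base_delay * 2 ** attempt))


def async_retry(cfg: RetryConfig,
                retry_on: Tuple[Type[BaseException], ...] = (Exception,)):
    """Retry the wrapped coroutine on the given exception types."""
    def decorator(fn: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs) -> T:
            for attempt in range(cfg.max_attempts):
                try:
                    return await fn(*args, **kwargs)
                except retry_on:
                    if attempt == cfg.max_attempts - 1:
                        raise  # out of attempts: surface the last error
                    await asyncio.sleep(backoff_delay(attempt, cfg))
            raise RuntimeError("max_attempts must be >= 1")
        return wrapper
    return decorator
```

Narrowing `retry_on` to transient errors (for example connection errors and timeouts) avoids retrying requests that will never succeed, and the jitter spreads retries from many callers so they do not hit a recovering service in lockstep.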