Monitoring
Service UIs
| Tool | URL (dev) | Purpose |
|---|---|---|
| Temporal UI | localhost:8088 | Workflow execution history, task queue status, running/failed workflows |
| Redpanda Console | localhost:8080 | Topic inspection, consumer group lag, message browsing |
| Meilisearch Dashboard | localhost:7700 | Search index stats, document counts |
| MinIO Console | localhost:9001 | Object storage bucket inspection |
| Strawberry GraphQL IDE | localhost:8000/graphql | Interactive API testing |
Application Logging
Vectis uses Python's standard logging module. All services log to stdout for container-native collection.
Log Levels
| Level | When Used |
|---|---|
| INFO | Request handling, order creation, state transitions |
| WARNING | Non-critical issues: stale cache, missing optional config |
| ERROR | Failures: database errors, payment gateway failures, unhandled exceptions |
| DEBUG | Detailed tracing: SQL queries, event payloads (development only) |
Configure with the LOG_LEVEL environment variable (default: INFO).
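As an illustration of the setup described above, here is a minimal sketch of reading LOG_LEVEL and sending all log output to stdout with the standard logging module. The function name `configure_logging`, the log format, and the `vectis.orders` logger name are assumptions made for this example, not taken from the Vectis codebase.

```python
import logging
import os
import sys


def configure_logging(default_level: str = "INFO") -> int:
    """Configure root logging to stdout from the LOG_LEVEL environment variable."""
    level_name = os.environ.get("LOG_LEVEL", default_level).upper()
    # getLevelName returns an int for known names and a string otherwise,
    # so unknown values fall back to INFO.
    level = logging.getLevelName(level_name)
    if not isinstance(level, int):
        level = logging.INFO
    logging.basicConfig(
        stream=sys.stdout,  # stdout, so the container runtime can collect logs
        level=level,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",  # assumed format
        force=True,  # replace handlers installed earlier (Python 3.8+)
    )
    return level


if __name__ == "__main__":
    configure_logging()
    # Hypothetical logger name, used only for illustration.
    logging.getLogger("vectis.orders").info("order created")
```

Running a service with LOG_LEVEL=DEBUG would then enable the detailed tracing listed in the table, while an unset or unrecognized value keeps the INFO default.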
Temporal Workflows
Monitor long-running business processes in the Temporal UI:
- OrderLifecycleWorkflow: tracks an order from creation through fulfillment
- RecurringOrderWorkflow: places scheduled subscription orders
- ApprovalWorkflow: handles B2B account registration and order approval
- ImportWorkflow: runs bulk data imports and migrations
Check the Task Queues tab to verify workers are connected and processing tasks.
Warning
If workflows accumulate in the "Running" state without making progress, check that the Temporal worker is running and connected: start it with make worker, or verify that the temporal-worker Docker service is up.
Redpanda Events
Key topics to monitor:
| Topic | Normal Volume | Alert If |
|---|---|---|
| vectis.orders | Proportional to order volume | Consumer lag > 1000 messages |
| vectis.inventory | Proportional to stock adjustments | Consumer lag growing steadily |
| vectis.accounts | Low (account creation/updates) | Any consumer errors |
Use the Redpanda Console to check consumer group lag and browse recent messages for debugging.
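The two lag conditions in the table can be expressed as small pure functions. The sketch below assumes you already have per-partition end offsets and committed offsets from whatever client or tooling you use. The function names, the choice to treat a missing committed offset as 0, and the "strictly increasing over at least three samples" definition of steady growth are assumptions for illustration.

```python
from typing import Mapping, Sequence

LAG_ALERT_THRESHOLD = 1000  # messages, matching the vectis.orders row


def total_lag(end_offsets: Mapping[int, int], committed: Mapping[int, int]) -> int:
    """Sum per-partition lag: latest offset minus the group's committed offset."""
    return sum(
        # Assumption: a partition with no committed offset counts from 0.
        max(end - committed.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    )


def is_growing_steadily(samples: Sequence[int], min_samples: int = 3) -> bool:
    """True if each lag sample is strictly larger than the previous one."""
    if len(samples) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


if __name__ == "__main__":
    lag = total_lag({0: 1500, 1: 900}, {0: 400, 1: 900})
    print(lag, lag > LAG_ALERT_THRESHOLD)
    print(is_growing_steadily([10, 20, 35]))
```

A fixed threshold suits a high-volume topic such as vectis.orders, while the trend check fits vectis.inventory, where the absolute lag matters less than whether consumers are keeping up.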
Key Metrics
For production monitoring, expose and track:
| Metric | Source | Threshold |
|---|---|---|
| API response time (p95) | Uvicorn access logs | < 500ms |
| Database connection pool utilization | SQLAlchemy pool stats | < 80% |
| Redis memory usage | Redis INFO command | < available memory |
| Temporal workflow failure rate | Temporal metrics | < 1% |
| Redpanda consumer lag | Consumer group offset | < 1000 messages |
| Meilisearch index freshness | Last indexed timestamp | < 5 min lag |
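To make two of these thresholds concrete, the following sketch computes a p95 from a list of response times and a pool utilization ratio from raw counts. It uses the nearest-rank percentile definition, which is one of several common conventions. The function names and the sample values are assumptions, and how you collect the inputs (log parsing, pool statistics) is left out.

```python
import math
from typing import Sequence

P95_THRESHOLD_MS = 500        # from the table above
POOL_UTILIZATION_LIMIT = 0.8  # 80%, from the table above


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct * n / 100)."""
    if not samples:
        raise ValueError("percentile requires at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def pool_utilization(checked_out: int, pool_size: int) -> float:
    """Fraction of pooled connections currently in use."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return checked_out / pool_size


if __name__ == "__main__":
    response_times_ms = [120, 80, 450, 300, 700]  # made-up sample data
    p95 = percentile(response_times_ms, 95)
    print(p95, p95 < P95_THRESHOLD_MS)
    print(pool_utilization(8, 10) < POOL_UTILIZATION_LIMIT)
```

With small sample counts the nearest-rank p95 is simply the largest value, so the check only becomes meaningful over a reasonably large window of requests.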
Alerting Recommendations
- API 5xx rate > 1%: check application logs for stack traces
- Database connections exhausted: increase the pool size or investigate slow queries
- Temporal task queue backlog: add worker replicas
- Redpanda consumer lag increasing: check whether the event consumer has crashed or is overwhelmed
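The first rule above, a 5xx rate above 1%, can be sketched as a check over a window of HTTP status codes. The function names and the idea of evaluating a fixed window of recent responses are assumptions for illustration. In practice this rule would usually live in your metrics or alerting system.

```python
from typing import Iterable

ERROR_RATE_THRESHOLD = 0.01  # 1%, from the first rule above


def server_error_rate(status_codes: Iterable[int]) -> float:
    """Share of responses in the window with a 5xx status code."""
    codes = list(status_codes)
    if not codes:
        return 0.0  # no traffic in the window, so nothing to alert on
    errors = sum(1 for code in codes if 500 <= code <= 599)
    return errors / len(codes)


def should_alert(status_codes: Iterable[int],
                 threshold: float = ERROR_RATE_THRESHOLD) -> bool:
    """True when the 5xx rate is strictly above the threshold."""
    return server_error_rate(status_codes) > threshold


if __name__ == "__main__":
    window = [200] * 98 + [500, 503]  # made-up sample window
    print(server_error_rate(window), should_alert(window))
```

The comparison is strict, so a window at exactly 1% does not fire, which matches the "> 1%" wording of the rule.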