
Monitoring

Service UIs

| Tool | URL (dev) | Purpose |
| --- | --- | --- |
| Temporal UI | localhost:8088 | Workflow execution history, task queue status, running/failed workflows |
| Redpanda Console | localhost:8080 | Topic inspection, consumer group lag, message browsing |
| Meilisearch Dashboard | localhost:7700 | Search index stats, document counts |
| MinIO Console | localhost:9001 | Object storage bucket inspection |
| Strawberry GraphQL IDE | localhost:8000/graphql | Interactive API testing |
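A quick way to confirm the dev UIs are reachable is to probe their ports. The following is a minimal sketch using only the standard library; the host/port pairs are copied from the table above, while `SERVICE_UIS`, `is_port_open`, and `check_service_uis` are illustrative names, not part of Vectis. An open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Dev endpoints taken from the Service UIs table (host, port).
SERVICE_UIS = {
    "Temporal UI": ("localhost", 8088),
    "Redpanda Console": ("localhost", 8080),
    "Meilisearch Dashboard": ("localhost", 7700),
    "MinIO Console": ("localhost", 9001),
    "Strawberry GraphQL IDE": ("localhost", 8000),
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def check_service_uis(services=SERVICE_UIS):
    """Map each service name to whether its port accepts TCP connections."""
    return {
        name: is_port_open(host, port)
        for name, (host, port) in services.items()
    }
```

Calling `check_service_uis()` returns a dict such as `{"Temporal UI": True, ...}` that can be printed or asserted on in a smoke test.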

Application Logging

Vectis uses Python's standard logging module. All services log to stdout for container-native collection.

Log Levels

| Level | When Used |
| --- | --- |
| INFO | Request handling, order creation, state transitions |
| WARNING | Non-critical issues: stale cache, missing optional config |
| ERROR | Failures: database errors, payment gateway failures, unhandled exceptions |
| DEBUG | Detailed tracing: SQL queries, event payloads (development only) |

Configure with the LOG_LEVEL environment variable (default: INFO).
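As an illustration of the setup described above, the sketch below reads `LOG_LEVEL`, defaults to INFO, and sends everything to stdout using the standard `logging` module. It is not taken from the Vectis codebase: the function names, the log format string, and the fallback to INFO for unrecognized values are all assumptions.

```python
import logging
import os
import sys
from typing import Optional


def resolve_log_level(name: Optional[str]) -> int:
    """Translate a level name such as "debug" into a logging constant.

    Assumption: missing or unrecognized names fall back to INFO.
    """
    if not name:
        return logging.INFO
    level = getattr(logging, name.strip().upper(), None)
    return level if isinstance(level, int) else logging.INFO


def configure_logging() -> None:
    """Route all log records to stdout at the level named by LOG_LEVEL."""
    logging.basicConfig(
        stream=sys.stdout,
        level=resolve_log_level(os.environ.get("LOG_LEVEL")),
        # Hypothetical format; adjust to whatever the log collector expects.
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
        force=True,  # replace handlers installed earlier (Python 3.8+)
    )
```

A service would call `configure_logging()` once at startup, after which `logging.getLogger(__name__).info("order created")` writes to stdout.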

Temporal Workflows

Monitor long-running business processes in the Temporal UI:

  • OrderLifecycleWorkflow — tracks order from creation through fulfillment
  • RecurringOrderWorkflow — scheduled subscription order placement
  • ApprovalWorkflow — B2B account registration and order approval
  • ImportWorkflow — bulk data imports and migrations

Check the Task Queues tab to verify workers are connected and processing tasks.

Warning

If workflows accumulate in the "Running" state without progress, check that the Temporal worker is running and connected. The worker is started with make worker, or runs as the temporal-worker Docker service.

Redpanda Events

Key topics to monitor:

| Topic | Normal Volume | Alert If |
| --- | --- | --- |
| vectis.orders | Proportional to order volume | Consumer lag > 1000 messages |
| vectis.inventory | Proportional to stock adjustments | Consumer lag growing steadily |
| vectis.accounts | Low (account creation/updates) | Any consumer errors |

Use the Redpanda Console to check consumer group lag and browse recent messages for debugging.
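The two lag conditions in the table can be made concrete with a little arithmetic. The sketch below assumes per-partition offsets have already been fetched (for example, read off the Redpanda Console); it does not talk to Redpanda itself. The function names are hypothetical, and treating "growing steadily" as "strictly increasing over at least three samples" is one possible interpretation, not a Vectis rule.

```python
from typing import Dict, Sequence

ORDERS_LAG_THRESHOLD = 1000  # messages, from the vectis.orders row


def total_consumer_lag(end_offsets: Dict[int, int],
                       committed: Dict[int, int]) -> int:
    """Sum per-partition lag: log end offset minus committed offset."""
    lag = 0
    for partition, end in end_offsets.items():
        # Assumption: a partition with no committed offset counts from 0.
        lag += max(end - committed.get(partition, 0), 0)
    return lag


def lag_is_growing(samples: Sequence[int], min_samples: int = 3) -> bool:
    """True if each lag sample is larger than the one before it."""
    if len(samples) < min_samples:
        return False
    return all(later > earlier
               for earlier, later in zip(samples, samples[1:]))


def orders_lag_alert(end_offsets: Dict[int, int],
                     committed: Dict[int, int]) -> bool:
    """Apply the vectis.orders rule: alert when lag exceeds 1000 messages."""
    return total_consumer_lag(end_offsets, committed) > ORDERS_LAG_THRESHOLD
```

For `vectis.inventory`, `lag_is_growing` would be fed lag values sampled at a fixed interval; the interval and sample count are tuning choices.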

Key Metrics

For production monitoring, expose and track:

| Metric | Source | Threshold |
| --- | --- | --- |
| API response time (p95) | Uvicorn access logs | < 500 ms |
| Database connection pool utilization | SQLAlchemy pool stats | < 80% |
| Redis memory usage | Redis INFO command | < available memory |
| Temporal workflow failure rate | Temporal metrics | < 1% |
| Redpanda consumer lag | Consumer group offset | < 1000 |
| Meilisearch index freshness | Last indexed timestamp | < 5 min lag |
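To show what the p95 row means in practice, here is a small sketch that computes a 95th percentile from a list of response times using the nearest-rank method and compares it with the 500 ms threshold. It assumes the durations have already been extracted from the access logs as milliseconds; the parsing step, the function names, and the choice of nearest-rank over an interpolating percentile are assumptions.

```python
from typing import Sequence

P95_THRESHOLD_MS = 500  # from the Key Metrics table


def p95(values_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of the given samples."""
    if not values_ms:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(values_ms)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid
    # floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def response_time_ok(values_ms: Sequence[float]) -> bool:
    """True while the p95 response time stays under the threshold."""
    return p95(values_ms) < P95_THRESHOLD_MS
```

With 100 samples the result is the 95th smallest value, so a handful of slow outliers does not trip the check but a sustained slowdown does.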

Alerting Recommendations

  • API 5xx rate > 1% — check application logs for stack traces
  • Database connections exhausted — increase pool size or investigate slow queries
  • Temporal task queue backlog — add worker replicas
  • Redpanda consumer lag increasing — event consumer crashed or overwhelmed