
Goals

🎯 Primary Goals

The main objective is to design and implement a highly available, centralized logging system for our on-premises Kubernetes infrastructure that ensures:

  1. Consistent log collection across multiple services and nodes.
  2. High Availability (HA) during pod, node, or network failures.
  3. Scalability, both in terms of log volume and system components.
  4. Structured logs with rich context for better observability.
  5. Search and visualization capabilities through Grafana or Kibana.
  6. Secure and auditable access for internal teams.
  7. Storage and retention using on-premises solutions such as MinIO or NFS.

✅ Success Criteria

A solution will be considered successful if it meets all of the following:

  • ✅ Logs from Node.js, Go, and Python services are collected consistently.
  • ✅ The system continues operating when one or more Kubernetes nodes go down.
  • ✅ Structured JSON logs with fields such as request_id, timestamp, service, and severity are supported.
  • ✅ Query performance remains acceptable under load.
  • ✅ Helm-based deployment is available and repeatable for on-prem K8s.
  • ✅ No single point of failure exists in the ingestion or query pipeline.
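
To make the structured-log criterion concrete, here is a minimal sketch of a Python service emitting one JSON object per line with the fields named above. It uses only the standard library. The `JsonFormatter` and `build_logger` names, the `message` field, and the example service name `checkout-api` are illustrative assumptions, not part of any chosen stack.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service  # hypothetical service name, set once per process

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # ISO 8601 timestamp in UTC, derived from the record's creation time
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "service": self.service,
            "severity": record.levelname,
            # request_id is expected to be passed through `extra=`; None if absent
            "request_id": getattr(record, "request_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(service: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stderr when None)."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

A call such as `build_logger("checkout-api").info("order created", extra={"request_id": "req-123"})` would write one JSON line containing all four fields plus the message. Equivalent formatters would be needed for the Node.js and Go services so that field names stay identical across languages.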

📋 Acceptance Criteria

Criteria | Requirement
🔌 Logging Stack | Support for Loki, ELK, or Graylog
☸️ Kubernetes Native | Components deployed using Helm, StatefulSet, DaemonSet, Ingress
📦 Storage | Compatible with MinIO, NFS, or CephFS
🔄 HA & Scalability | Ingesters and query services must scale horizontally
📄 Log Format | Must support JSON format with standardized fields
🔍 Search & Alerting | Must integrate with Grafana or Kibana for dashboards and alert rules
🔐 Security | Access control via RBAC or reverse proxy auth
📁 Persistence | Each stateful component has persistent volumes configured
🧪 Fault Tolerance | System tolerates node or pod crashes without data loss
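
The "standardized fields" requirement can be checked mechanically. The sketch below validates a single log line against the four fields listed under the success criteria. The function name `validate_log_line`, the allowed severity set, and the ISO 8601 timestamp rule are assumptions made for illustration; the actual conventions would need to be agreed on by the teams involved.

```python
import json
from datetime import datetime

# Field names taken from the success criteria above.
REQUIRED_FIELDS = ("request_id", "timestamp", "service", "severity")
# Assumed severity vocabulary; adjust to whatever the services agree on.
ALLOWED_SEVERITIES = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def validate_log_line(line: str) -> list[str]:
    """Return a list of problems found in one log line; empty means it conforms."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(entry, dict):
        return ["not a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]

    if "severity" in entry:
        severity = entry["severity"]
        if not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES:
            problems.append(f"unknown severity: {severity}")

    if "timestamp" in entry:
        try:
            # Assumes ISO 8601 strings such as 2024-01-01T12:00:00+00:00
            datetime.fromisoformat(entry["timestamp"])
        except (TypeError, ValueError):
            problems.append("timestamp is not ISO 8601")

    return problems
```

A check like this could run in CI against sample output from each service, or as a spot check on lines pulled from the log store, to catch field-name drift between the Node.js, Go, and Python services.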

📌 Out of Scope

  • Managed logging solutions such as Datadog, Amazon CloudWatch, or Google Cloud Logging
  • Multi-cloud or hybrid-cloud scenarios
  • Full SIEM integration (optional, not a requirement)

🧑‍💻 Stakeholders

  • Backend Team: for log visibility and debugging
  • DevOps / Infra Team: for deployment, scaling, and resilience
  • Security / Compliance: for audit logs and access control