Kubernetes

Master this essential documentation concept

Quick Definition

An open-source container orchestration platform used to automate the deployment, scaling, and management of containerized applications, commonly referenced in DevOps and cloud documentation.

How Kubernetes Works

```mermaid
graph TD
    Dev[Developer] -->|kubectl apply| API[API Server]
    API -->|stores state| ETCD[(etcd)]
    API -->|schedules pods| SCHED[Scheduler]
    SCHED -->|assigns to node| NODE1[Worker Node 1]
    SCHED -->|assigns to node| NODE2[Worker Node 2]
    NODE1 -->|runs| POD1[Pod: nginx:1.21]
    NODE2 -->|runs| POD2[Pod: nginx:1.21]
    POD1 -->|exposes| SVC[Service: LoadBalancer]
    POD2 -->|exposes| SVC
    CM[Controller Manager] -->|monitors & reconciles| API
    SVC -->|routes traffic| INGRESS[Ingress Controller]
    INGRESS -->|serves| USER[End User]
```

Understanding Kubernetes

Kubernetes groups containers into pods and schedules them across a cluster of worker nodes. The control plane (API server, etcd, scheduler, and controller manager) continuously compares the cluster's actual state with the desired state you declare in manifests and reconciles any drift: restarting failed containers, rescheduling pods off unhealthy nodes, and scaling workloads up or down. Services and Ingress controllers then route user traffic to the healthy pods, as shown in the diagram above.

Key Features

  • Declarative configuration with automated reconciliation of desired state
  • Self-healing: failed containers are restarted and unhealthy pods are replaced
  • Horizontal scaling, manual or automatic via the Horizontal Pod Autoscaler
  • Built-in service discovery, load balancing, and rolling updates

Benefits for Documentation Teams

  • Reduces repetitive documentation tasks
  • Improves content consistency
  • Enables better content reuse
  • Streamlines review processes

See how Docsie helps with on-premise knowledge base

Looking for a better way to handle Kubernetes in your organization? Docsie's On-Premise Knowledge Base solution helps teams streamline their workflows and improve documentation quality.

Real-World Documentation Use Cases

Zero-Downtime Rolling Deployments for a Payment Processing API

Problem

Engineers manually SSH into servers to update a payment API, causing 15-30 minute outages during each release cycle that violate SLA commitments and erode customer trust.

Solution

Kubernetes RollingUpdate deployment strategy incrementally replaces old pods with new ones, maintaining minimum available replicas throughout the update so traffic is never fully interrupted.

Implementation

1. Define a Deployment manifest with strategy.type: RollingUpdate, setting maxUnavailable: 1 and maxSurge: 1 to control the pod replacement pace.
2. Configure a readinessProbe on the container pointing to the /health endpoint so Kubernetes only routes traffic to pods that pass the health check.
3. Run kubectl rollout status deployment/payment-api to monitor the rollout in CI/CD pipelines and block the pipeline on failure.
4. Use kubectl rollout undo deployment/payment-api to instantly revert to the previous ReplicaSet if error rates spike post-deployment.
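A minimal sketch of the Deployment described in the steps above; the image name, port, and replica count are illustrative assumptions, not values from a real system:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most one pod down during the rollout
      maxSurge: 1         # at most one extra pod above the desired count
  selector:
    matchLabels:
      app: payment-api
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      containers:
        - name: payment-api
          image: registry.example.com/payment-api:v2.3.0  # illustrative image
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /health   # pod receives traffic only after this passes
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
```

With maxUnavailable: 1 and maxSurge: 1, Kubernetes replaces one pod at a time, waiting for each new pod's readiness probe before evicting the next old one.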

Expected Outcome

Deployment downtime drops from 15-30 minutes to zero, with rollbacks completing in under 60 seconds, enabling teams to ship multiple times per day safely.

Autoscaling a Machine Learning Inference Service During Traffic Spikes

Problem

An ML inference service running on fixed VM infrastructure gets overwhelmed during business hours, causing p99 latency to spike from 200ms to 8 seconds and dropping requests when GPU utilization exceeds 90%.

Solution

Kubernetes Horizontal Pod Autoscaler (HPA) combined with Cluster Autoscaler automatically scales both pod count and underlying node count based on custom GPU utilization metrics from Prometheus.

Implementation

1. Deploy the Prometheus Adapter to expose custom GPU utilization metrics to the Kubernetes metrics API under the custom.metrics.k8s.io endpoint.
2. Create an HPA manifest targeting the inference Deployment with a custom metric threshold of 70% GPU utilization, setting minReplicas: 2 and maxReplicas: 20.
3. Configure Cluster Autoscaler on the node group with GPU instances so new nodes provision automatically when pods remain in Pending state due to insufficient resources.
4. Set PodDisruptionBudgets to ensure at least 50% of inference pods remain available during scale-down events to prevent latency spikes during node removal.
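An HPA manifest along the lines of step 2 might look like the following sketch; the Deployment name and the gpu_utilization metric name are assumptions that depend on how your Prometheus Adapter rules are configured:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference             # illustrative Deployment name
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: gpu_utilization # custom metric served by the Prometheus Adapter
        target:
          type: AverageValue
          averageValue: "70"    # scale out when average utilization exceeds ~70%
```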

Expected Outcome

p99 latency stays below 300ms even at 10x normal traffic, infrastructure costs drop 40% during off-peak hours due to scale-down, and zero manual intervention is required during traffic events.

Multi-Tenant Namespace Isolation for a SaaS Platform Serving Enterprise Clients

Problem

A SaaS company runs all customer workloads in a shared cluster without isolation, causing a noisy-neighbor situation where one client's batch job consumes all cluster CPU, degrading response times for other paying customers.

Solution

Kubernetes Namespaces combined with ResourceQuotas, LimitRanges, and NetworkPolicies create hard boundaries between tenants, guaranteeing resource allocation and preventing cross-tenant network access.

Implementation

1. Create a dedicated Namespace per enterprise client (e.g., tenant-acme, tenant-globex) and apply a ResourceQuota limiting CPU requests to 16 cores and memory to 64Gi per namespace.
2. Define a LimitRange in each namespace to set default container CPU limits to 500m and memory to 512Mi, preventing unbounded resource consumption by misconfigured pods.
3. Apply a default-deny NetworkPolicy in each namespace and whitelist only ingress from the shared ingress controller namespace, ensuring tenants cannot query each other's pod IPs.
4. Use RBAC RoleBindings to grant each tenant's CI/CD service account deploy permissions only within their own namespace, preventing accidental cross-tenant deployments.
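Steps 1 and 3 could be sketched as the following pair of manifests for one tenant; the ingress-nginx namespace label is an assumption about where your shared ingress controller runs:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-acme
spec:
  hard:
    requests.cpu: "16"       # hard cap on total CPU requests in this namespace
    requests.memory: 64Gi
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: tenant-acme
spec:
  podSelector: {}            # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx  # assumed ingress namespace
```

Because the empty podSelector matches all pods and the only allowed source is the ingress namespace, pods in tenant-acme cannot be reached from any other tenant's namespace.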

Expected Outcome

Noisy-neighbor incidents are eliminated, SLA compliance rises to 99.95%, and onboarding a new enterprise tenant is reduced from 2 days of manual setup to a 10-minute automated namespace provisioning script.

Secrets Management for a Microservices Application Migrating from Hardcoded Credentials

Problem

A team discovers database passwords and API keys hardcoded in Docker images and environment variables in YAML files committed to a public GitHub repository, creating a critical security vulnerability requiring immediate credential rotation.

Solution

Kubernetes Secrets integrated with HashiCorp Vault via the Vault Agent Injector dynamically injects short-lived credentials into pods at runtime, eliminating static secrets from source code and container images entirely.

Implementation

1. Deploy HashiCorp Vault with the Kubernetes auth method enabled, allowing pods to authenticate using their ServiceAccount JWT tokens bound to specific Vault policies.
2. Annotate application Deployments with vault.hashicorp.com/agent-inject: 'true' and specify the Vault secret path so the sidecar agent writes credentials to an in-memory tmpfs volume at /vault/secrets/.
3. Remove all hardcoded environment variables and secretKeyRef references from Deployment manifests, replacing them with application logic that reads credentials from the mounted file path.
4. Enable Vault's dynamic database secrets engine to issue unique, time-limited PostgreSQL credentials per pod, so compromised credentials automatically expire within 1 hour.
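The pod template from step 2 could be annotated roughly as follows; the service name, Vault role, and secret path are illustrative and must match the roles and policies configured in your Vault deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-service                  # illustrative service name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders-service
  template:
    metadata:
      labels:
        app: orders-service
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "orders-service"  # Vault role bound to this ServiceAccount
        # assumed path under the dynamic database secrets engine
        vault.hashicorp.com/agent-inject-secret-db-creds: "database/creds/orders"
    spec:
      serviceAccountName: orders-service
      containers:
        - name: app
          image: registry.example.com/orders-service:v1.0.0  # illustrative image
```

The injected sidecar writes the rendered credentials to /vault/secrets/db-creds inside the pod, so the application reads them from that file rather than from environment variables.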

Expected Outcome

All static credentials are eliminated from the codebase and images, credential rotation happens automatically without pod restarts, and the security audit passes with zero hardcoded secret findings.

Best Practices

Always Define CPU and Memory Requests and Limits on Every Container

Kubernetes uses resource requests for scheduling decisions and limits for runtime enforcement. Without requests, the scheduler may place too many pods on a single node causing OOMKill events; without limits, a single runaway process can starve neighboring pods of memory. Setting both ensures predictable scheduling and prevents cascading failures in shared clusters.

✓ Do: Set requests based on measured baseline consumption from Prometheus metrics (e.g., requests: {cpu: '250m', memory: '256Mi'}) and limits at 2-3x the request value to absorb traffic spikes without allowing unbounded growth.
✗ Don't: Do not omit resource definitions or set CPU limits identical to requests in latency-sensitive services, as CPU throttling at the limit boundary will cause p99 latency spikes even when nodes have available capacity.
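A container-level fragment illustrating the Do guidance above, with limits set at roughly 3x the measured request:

```yaml
resources:
  requests:
    cpu: 250m        # baseline measured from Prometheus metrics
    memory: 256Mi
  limits:
    cpu: 750m        # ~3x the request to absorb traffic spikes
    memory: 512Mi    # hard ceiling; exceeding it triggers an OOMKill
```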

Use Liveness and Readiness Probes Independently for Different Health Signals

Liveness probes restart containers that are deadlocked or in an unrecoverable state, while readiness probes temporarily remove pods from Service endpoints when they cannot handle traffic. Conflating the two by using the same probe for both purposes causes unnecessary pod restarts during temporary overload conditions, worsening the situation instead of shedding load gracefully.

✓ Do: Configure readinessProbe to check application-level readiness such as database connectivity or cache warmup completion, and configure livenessProbe to check only that the process itself is alive using a lightweight /healthz endpoint that never calls downstream dependencies.
✗ Don't: Do not point the livenessProbe at an endpoint that calls downstream services like databases or external APIs, because a dependency outage will trigger cascading pod restarts across your entire fleet, turning a partial outage into a total one.
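A sketch of the two probes kept independent, as described above; the port and the /ready endpoint name are assumptions about the application:

```yaml
livenessProbe:
  httpGet:
    path: /healthz     # process-only check: never calls downstream dependencies
    port: 8080
  periodSeconds: 10
  failureThreshold: 3  # restart only after sustained failure
readinessProbe:
  httpGet:
    path: /ready       # assumed endpoint checking DB connectivity and cache warmup
    port: 8080
  periodSeconds: 5     # quickly remove the pod from Service endpoints when overloaded
```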

Pin Container Image Tags to Immutable Digests in Production Manifests

Using mutable tags like :latest or :stable in Deployment manifests means a pod restart or node replacement can silently pull a different image version than what was originally deployed, making debugging production incidents extremely difficult. Image digest pinning (e.g., nginx@sha256:abc123) guarantees every replica runs the exact same binary regardless of when or where it is scheduled.

✓ Do: Use your CI/CD pipeline to automatically replace image tags with their SHA256 digest after pushing to the registry (e.g., myapp:v1.4.2@sha256:3f9d...) and commit the updated digest to the GitOps repository for full auditability.
✗ Don't: Do not use :latest or branch-name tags like :main in any Deployment, StatefulSet, or DaemonSet manifest deployed to staging or production environments, as this breaks reproducibility and makes rollbacks unreliable.
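In a manifest, a digest-pinned image reference looks like this fragment (image name and digest are illustrative):

```yaml
containers:
  - name: myapp
    # The tag is kept for human readability; when a digest is present,
    # the kubelet pulls by digest, so every replica runs the same binary.
    image: registry.example.com/myapp:v1.4.2@sha256:3f9d...  # truncated illustrative digest
```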

Implement Pod Disruption Budgets Before Enabling Cluster Autoscaler or Node Upgrades

Without PodDisruptionBudgets (PDBs), Kubernetes node drain operations during cluster upgrades or Cluster Autoscaler scale-down events can simultaneously evict all replicas of a Deployment, causing complete service unavailability. PDBs enforce a minimum availability guarantee during voluntary disruptions, ensuring the control plane respects your SLA requirements when making scheduling decisions.

✓ Do: Create a PodDisruptionBudget for every customer-facing Deployment specifying minAvailable: 50% or maxUnavailable: 1, and validate PDB coverage across namespaces using kubectl get pdb -A before initiating any node pool upgrade.
✗ Don't: Do not set minAvailable equal to the total replica count (e.g., minAvailable: 3 with 3 replicas), as this prevents nodes from ever draining, blocking cluster upgrades and preventing Cluster Autoscaler from reclaiming underutilized nodes.
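A minimal PDB following the Do guidance above, assuming a Deployment labeled app: payment-api:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-api-pdb
spec:
  minAvailable: 50%    # voluntary evictions may never drop below half the replicas
  selector:
    matchLabels:
      app: payment-api
```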

Structure RBAC Roles Around Least-Privilege Principles with Namespace-Scoped Bindings

Granting ClusterAdmin to CI/CD service accounts or developer namespaces is a common shortcut that creates serious security exposure, allowing a compromised pipeline to delete production workloads or exfiltrate secrets cluster-wide. Namespace-scoped Roles with only the specific verbs and resources required for each use case dramatically reduce the blast radius of credential compromise.

✓ Do: Create dedicated ServiceAccounts per application and bind them to Roles (not ClusterRoles) with only the minimum required permissions such as {verbs: [get, list, watch], resources: [pods, configmaps]} within their own namespace, audited quarterly using kubectl auth can-i --list.
✗ Don't: Do not bind the default ServiceAccount in any namespace to ClusterRole cluster-admin, and do not reuse a single CI/CD service account token across multiple applications or environments, as this prevents meaningful audit trails and violates the principle of least privilege.
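A least-privilege sketch of the Role and RoleBinding described above; the team-a namespace and app-ci ServiceAccount names are illustrative:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-reader
  namespace: team-a              # namespace-scoped, not a ClusterRole
rules:
  - apiGroups: [""]
    resources: ["pods", "configmaps"]
    verbs: ["get", "list", "watch"]   # read-only; no create/delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-reader-binding
  namespace: team-a
subjects:
  - kind: ServiceAccount
    name: app-ci                 # dedicated per-application ServiceAccount
    namespace: team-a
roleRef:
  kind: Role
  name: app-reader
  apiGroup: rbac.authorization.k8s.io
```

Verify the effective permissions with kubectl auth can-i --list --as=system:serviceaccount:team-a:app-ci.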

How Docsie Helps with Kubernetes

Build Better Documentation with Docsie

Join thousands of teams creating outstanding documentation

Start Free Trial