Kubernetes Server Setup Guide
Introduction and Scope
This guide shows how to plan, build, secure, and maintain a production-ready Kubernetes cluster from scratch. It focuses on practical steps you can follow on your own hardware or virtual machines. You will learn about the control plane, worker nodes, networking, storage, security, monitoring, backups, and routine maintenance.
This is not a copy-paste cookbook for every environment. Instead, it explains principles and gives clear commands and tool recommendations so you can adapt them to your operating system, cloud provider, or bare-metal setup.
Prerequisites and Environment Planning
Before you start, make decisions and prepare resources. Good planning saves time and reduces surprises.
- Decide where the cluster will run: on-prem, VMs, or cloud.
- Choose a Kubernetes distribution or tool: kubeadm for custom clusters, kops, or managed services (EKS/GKE/AKS) if you want less operational work.
- Pick a container runtime: containerd is now the common choice.
- Choose a CNI plugin: Calico, Cilium, Flannel, or Weave. Each offers different features (network policies, eBPF, simplicity).
- Plan cluster size: at minimum 3 control-plane nodes for HA, and enough worker nodes to handle expected workloads plus buffer for maintenance.
- Plan networking: pod CIDR, service CIDR, MTU considerations, and whether a load balancer will front the API servers.
- Storage plan: dynamic provisioning via CSI drivers (Rook/Ceph, Longhorn, cloud provider CSI) is recommended.
- Security and compliance goals: encryption, audit logging retention, RBAC policies, and network segmentation.
- Backup and recovery strategy: etcd snapshots and resource backups (e.g., Velero) with offsite copies.
Minimum hardware suggestion for small production cluster:
- Control plane node: 2 vCPU, 8 GB RAM, fast disk (SSD)
- Worker node: 2–4 vCPU, 8–16 GB RAM, disk sized for containers and logs
- etcd/storage: ensure low-latency disks and backups
Cluster Architecture and Component Overview
Kubernetes runs several core pieces. Know what they are and why they matter.
- API Server: the cluster control endpoint. All changes go through it.
- etcd: key-value store that keeps cluster state. Back it up often.
- Controller Manager: runs controllers that reconcile desired state (deployments, endpoints).
- Scheduler: assigns pods to nodes based on resource needs and constraints.
- kubelet: runs on each node and maintains pods.
- kube-proxy: manages Service routing on each node (or is replaced by CNI eBPF features, as in Cilium).
- Container runtime: runs container images (containerd, CRI-O).
- CNI plugin: provides networking between pods and implements network policies.
- CSI drivers: provide access to external storage and dynamic volume provisioning.
High Availability considerations:
- Run an odd number of etcd members (3 or 5) on separate nodes if possible.
- Run multiple API server instances behind a load balancer or virtual IP.
- Keep control plane and etcd hosted on different disks and ideally on separate nodes for resilience.
Hardware and Operating System Preparation
Prepare OS and hardware settings before installing Kubernetes. These steps prevent many common problems; a consolidated script for Ubuntu follows the list.
- Use a recent, LTS Linux distribution (Ubuntu LTS, CentOS/Alma/Rocky, Debian).
- Disable swap and ensure it is off permanently:
- swapoff -a
- Remove swap entry from /etc/fstab
- Enable required kernel settings for networking:
- sysctl net.ipv4.ip_forward=1
- sysctl net.bridge.bridge-nf-call-iptables=1
- sysctl net.bridge.bridge-nf-call-ip6tables=1
- Persist in /etc/sysctl.d/k8s.conf
- Set maximum number of open files and pid limits as needed for heavy workloads.
- Ensure time sync (chrony or ntpd); certificate validation and scheduling depend on correct time.
- Open required firewall ports or configure firewalls for control plane and node traffic:
- API server (6443), etcd (2379-2380), kubelet (10250), NodePort range (30000-32767) if used.
- Prepare container runtime (containerd recommended). Example quick steps to install containerd on Ubuntu:
- apt-get update && apt-get install -y containerd
- mkdir -p /etc/containerd && containerd config default > /etc/containerd/config.toml
- systemctl restart containerd
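The steps above can be combined into one node-preparation script. This is a minimal sketch for Ubuntu, assuming apt, systemd, and containerd's default config layout; adapt the package commands for other distributions.

```bash
#!/usr/bin/env bash
# Node preparation sketch for Ubuntu (assumes apt and systemd).
set -euo pipefail

# Disable swap now and comment it out of fstab so it stays off after reboot.
swapoff -a
sed -i '/\sswap\s/ s/^/#/' /etc/fstab

# Load the kernel modules required for container networking.
cat <<'EOF' >/etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter

# Persist the sysctl settings Kubernetes needs.
cat <<'EOF' >/etc/sysctl.d/k8s.conf
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl --system

# Install containerd and generate its default config.
apt-get update && apt-get install -y containerd
mkdir -p /etc/containerd
containerd config default >/etc/containerd/config.toml
# kubeadm expects the systemd cgroup driver on systemd-based hosts.
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl restart containerd && systemctl enable containerd
```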
Installing the Kubernetes Control Plane
You can use kubeadm to bootstrap the control plane. This example assumes a single control plane for clarity; for production, run multiple control plane nodes.
- Install kubeadm, kubelet, and kubectl:
- Add Kubernetes apt/yum repository and install packages.
- Pin package versions to avoid unexpected upgrades.
- Initialize the control plane with kubeadm:
- Choose a pod network CIDR that matches your CNI plugin (e.g., 10.244.0.0/16 for Flannel or 192.168.0.0/16 for Calico).
- Example:
- kubeadm init --pod-network-cidr=192.168.0.0/16 --control-plane-endpoint="LOAD_BALANCER:6443" --upload-certs
- Save the kubeadm join command output; you’ll use it to add nodes.
- Post-init steps:
- Configure kubectl for the admin user:
- mkdir -p $HOME/.kube
- sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
- sudo chown $(id -u):$(id -g) $HOME/.kube/config
- Install your chosen CNI plugin immediately (Calico, Cilium, Flannel, etc.).
- Verify control-plane components are Running:
- kubectl get nodes
- kubectl get pods -n kube-system
High-availability tips:
- Use a load balancer or virtual IP for the control-plane endpoint.
- Use stacked etcd or external etcd cluster. For external etcd, run it on dedicated machines or VMs.
- Ensure TLS certificates are rotated and backed up.
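Putting these tips together, here is a hedged sketch of an HA bootstrap with an explicit kubeadm config file; the endpoint, version, and pod subnet are placeholders to adapt to your environment.

```bash
# HA bootstrap sketch: an explicit kubeadm config keeps the load-balancer
# endpoint and pod subnet versioned alongside your other cluster assets.
cat <<'EOF' >kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.29.0                 # placeholder version
controlPlaneEndpoint: "LOAD_BALANCER:6443" # load balancer or virtual IP
networking:
  podSubnet: 192.168.0.0/16                # must match your CNI plugin
EOF
kubeadm init --config kubeadm-config.yaml --upload-certs
```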
Provisioning and Configuring Worker Nodes
Worker nodes run your application pods. Add them after the control plane is ready.
- Prepare each worker node:
- Apply the same OS prep: swap off, sysctl, container runtime installed.
- Make sure kubelet and kubeadm are installed and their versions match the control plane.
- Join a worker to the cluster:
- Use the kubeadm join command printed by kubeadm init:
- kubeadm join <CONTROL_PLANE_ENDPOINT>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
- Verify:
- kubectl get nodes
- kubectl get pods -n kube-system to see CNI and kube-proxy pods on nodes.
- Node labels and taints:
- Label nodes by role: kubectl label node <node-name> node-role.kubernetes.io/worker=
- Use taints to reserve nodes for special workloads.
- Runtime configuration:
- Tune kubelet flags (CPU manager, kube-reserved, system-reserved) in /var/lib/kubelet/config.yaml or via systemd drop-in.
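As a starting point, here is a sketch of reserved-resource and eviction settings for the kubelet config; the values are illustrative rather than recommendations, and the keys should be merged into the existing file, not duplicated if already set.

```bash
# Illustrative reservations for /var/lib/kubelet/config.yaml
# (kind: KubeletConfiguration). Merge these keys into the existing
# file; do not append duplicates of keys that are already present.
cat <<'EOF' >>/var/lib/kubelet/config.yaml
systemReserved:
  cpu: 500m
  memory: 512Mi
kubeReserved:
  cpu: 500m
  memory: 512Mi
evictionHard:
  memory.available: "200Mi"
  nodefs.available: "10%"
EOF
systemctl restart kubelet
```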
Networking, CNI Plugins, and Service Discovery
Network choices shape your cluster’s behavior and security.
- CNI options:
- Calico: network policies, BGP, and IP-in-IP options.
- Cilium: eBPF-based, high performance, advanced security features.
- Flannel: simple, good for small clusters.
- Weave: straightforward overlay for simple installs.
Install the CNI right after control plane initialization. Most CNIs are installed via YAML manifests or Helm charts.
Service discovery:
- Kubernetes uses kube-proxy and iptables/ipvs rules to route Service traffic.
- CoreDNS handles DNS inside the cluster — ensure CoreDNS is healthy:
- kubectl get pods -n kube-system -l k8s-app=kube-dns
- Consider ipvs mode in kube-proxy for better performance on high-traffic clusters, as sketched below.
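On kubeadm clusters, kube-proxy reads its mode from a ConfigMap; a sketch assuming that layout:

```bash
# Switch kube-proxy to ipvs mode (requires the ip_vs kernel modules on
# every node), then restart the kube-proxy pods to pick up the change.
kubectl -n kube-system edit configmap kube-proxy   # set: mode: "ipvs"
kubectl -n kube-system rollout restart daemonset kube-proxy
```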
Network policies:
- Implement NetworkPolicies to restrict pod-to-pod traffic.
- Default-deny policies for namespaces improve security posture; a minimal example follows.
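A minimal default-deny sketch for a single (hypothetical) namespace; it only takes effect with a CNI that enforces NetworkPolicy, such as Calico or Cilium:

```bash
# Deny all ingress and egress for pods in one namespace; traffic then
# flows only where an explicit allow policy exists.
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-app        # hypothetical namespace
spec:
  podSelector: {}          # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
EOF
```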
MTU and overlay considerations:
- If using overlay networks, lower the pod interface MTU to account for encapsulation overhead and avoid packet fragmentation.
- Match pod CIDR and host routing settings to your network architecture.
Storage, Persistent Volumes, and CSI Integration
Storage is essential for stateful apps. Use CSI drivers for dynamic provisioning.
- Choose storage backend:
- Cloud provider block storage (AWS EBS, GCE PD)
- Rook (Ceph) for on-prem distributed storage
- Longhorn for simple replicated block storage on local disks
- NFS for simple file-shares (less robust for heavy production)
- Install the appropriate CSI driver (most come as Helm charts or manifests).
- Define StorageClasses for dynamic volume provisioning and performance tiers (see the sketch after this list).
- PersistentVolumeClaim (PVC) usage:
- Apps request PVCs; the CSI driver provisions PersistentVolumes automatically.
- Backup strategies for PV data:
- Use snapshots supported by your storage (CSI snapshotter).
- Combine with application-aware backups (databases require consistent backups).
- Consider reclaim policies: Retain keeps the underlying volume when a claim is deleted; Delete cleans it up automatically.
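To make this concrete, a hedged sketch of a StorageClass plus a PVC that consumes it; the provisioner shown is Longhorn's, so substitute your CSI driver's name:

```bash
# A StorageClass for a replicated tier and a PVC that requests from it.
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-replicated
provisioner: driver.longhorn.io   # backend-specific; swap in your CSI driver
reclaimPolicy: Retain             # keep data when the PVC is deleted
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-replicated
  resources:
    requests:
      storage: 10Gi
EOF
```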
Security Best Practices and Role-Based Access Control
Security should be built in from the start.
- RBAC:
- Enable and enforce RBAC for API access.
- Start with least privilege: grant only the permissions each user or service needs (see the Role/RoleBinding sketch after this list).
- Authentication and MFA:
- Use identity providers (OIDC) or integrate with existing IAM where possible.
- Avoid using long-lived service account tokens for human access.
- Network policies:
- Use default-deny policies and then allow only required traffic.
- Pod security:
- Use PodSecurity admission to block privileged containers (PodSecurityPolicies were deprecated and removed in Kubernetes 1.25).
- Enforce read-only root file systems and drop unnecessary capabilities.
- Secrets management:
- Use Kubernetes Secrets, but enable encryption at rest for etcd and avoid storing secrets in plaintext.
- Consider external secret stores (Vault, AWS Secrets Manager) for extra controls.
- Audit logging:
- Enable API audit logs and store them centrally for analysis.
- TLS and certificates:
- Rotate certs before expiration.
- Use a certificate management tool (cert-manager) for workloads that need TLS.
- Image security:
- Use private registries, signed images (notary or cosign), and scan images for vulnerabilities.
- Node hardening:
- Limit SSH access, use intrusion detection, and keep OS patches current.
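To illustrate the least-privilege RBAC advice above, a sketch granting one hypothetical user read-only access to pods in a single namespace:

```bash
# Least-privilege sketch: a Role that can only read pods (and their logs)
# in one namespace, bound to a single user.
cat <<'EOF' | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: my-app          # hypothetical namespace
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: my-app
subjects:
  - kind: User
    name: jane               # hypothetical user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
EOF
```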
Monitoring, Logging, and Alerting
You must know when things go wrong. Build a monitoring and logging stack.
Monitoring stack:
- Prometheus for metrics collection (kube-state-metrics, node-exporter).
- Grafana for dashboards.
- Alertmanager for notifications.
- Install with Helm or operator patterns (Prometheus Operator).
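One common route, shown as a sketch: the community kube-prometheus-stack Helm chart, which bundles Prometheus, Grafana, Alertmanager, kube-state-metrics, and node-exporter via the Prometheus Operator.

```bash
# Install the bundled monitoring stack into its own namespace.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
```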
Logging:
- Fluentd or Fluent Bit to gather logs from nodes and forward to a backend.
- Backends: Elasticsearch, Loki, or cloud logging services.
- Structure logs (JSON) and retain according to compliance needs.
Alerts:
- Define alerts for node down, control plane high latency, etcd leader changes, disk pressure, pod restarts, and high error rates.
- Tune alerting rules to reduce noise and ensure critical alerts are reliable.
Tracing and profiling:
- Consider distributed tracing (Jaeger) for complex microservices debugging.
- Use pprof and performance tools for CPU/memory issues.
Backup, Disaster Recovery, and Upgrades
Backups and a tested recovery plan are essential.
Backups:
- etcd snapshots are the most important cluster-state backup.
- Use etcdctl snapshot save and store snapshots offsite (see the sketch after this list).
- Automate snapshot scheduling and retention.
- Resource backups:
- Use Velero to back up cluster objects (deployments, services) and optionally PV snapshots.
- Application backups:
- Use application-aware backup tools for databases (mysqldump, pg_basebackup, or vendor tools).
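A snapshot sketch for a kubeadm-managed (stacked) etcd; the certificate paths are kubeadm defaults, so adjust them for an external etcd cluster:

```bash
# Snapshot etcd on a control-plane node, verify it, then copy it offsite.
SNAP=/var/backups/etcd-$(date +%F-%H%M).db
ETCDCTL_API=3 etcdctl snapshot save "$SNAP" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
ETCDCTL_API=3 etcdctl snapshot status "$SNAP" --write-out=table
```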
Disaster recovery:
- Test restore procedures regularly on a non-production environment.
- Keep kubeadm config and TLS assets backed up.
- Document the steps to rebuild control plane and recover workloads.
Upgrades:
- Plan upgrades during low-traffic windows.
- Use kubeadm upgrade plan to check compatibility.
- Upgrade the control plane first, then kubelet/kube-proxy on the nodes (sequence sketched below).
- Drain nodes before upgrading: kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
- Test in a staging environment before production.
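A condensed upgrade sequence as a sketch; the version numbers are placeholders, and the apt version pins assume the pkgs.k8s.io package repositories:

```bash
# Upgrade one minor version at a time; start on the first control-plane node.
kubeadm upgrade plan                          # check available versions
apt-get update && apt-get install -y kubeadm='1.29.1-*'
kubeadm upgrade apply v1.29.1

# Then, one node at a time: drain, upgrade kubelet/kubectl, uncordon.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
apt-get install -y kubelet='1.29.1-*' kubectl='1.29.1-*'
systemctl daemon-reload && systemctl restart kubelet
kubectl uncordon <node-name>
```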
Troubleshooting, Maintenance, and Useful Tools
Expect problems. Know the right tools and methods.
Common troubleshooting steps:
- Check pod status and logs:
- kubectl get pods -A
- kubectl logs <pod-name> [-c <container-name>]
- Inspect node issues:
- kubectl describe node <node-name>
- journalctl -u kubelet
- crictl ps / crictl logs for container runtime issues
- Networking problems:
- Check CNI pod logs.
- Verify iptables/ipvs rules and routing tables.
- Test DNS with busybox or curl inside a pod (see the sketch after this list).
- API server or etcd issues:
- kubectl get componentstatuses (deprecated since v1.19; check the kube-system pods directly instead)
- etcdctl member list and etcdctl endpoint health
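For the DNS check mentioned above, a one-liner sketch using a throwaway busybox pod:

```bash
# Run a temporary pod and resolve a well-known Service name; the pod is
# removed automatically when the command exits.
kubectl run dns-test --rm -it --image=busybox:1.36 --restart=Never \
  -- nslookup kubernetes.default.svc.cluster.local
```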
Maintenance practices:
- Regularly rotate certificates and credentials.
- Patch OS and container runtime vulnerabilities.
- Prune unused images and containers to free disk space.
- Monitor disk pressure and set eviction thresholds.
Useful tools:
- kubectl: cluster management
- kubeadm: bootstrap tool
- helm: package manager for Kubernetes apps
- k9s: terminal UI for troubleshooting
- kubectx/kubens: fast context and namespace switching
- crictl: container runtime debugging
- etcdctl: manage etcd
- Prometheus / Grafana: monitoring and dashboards
- Velero: backups and restores
Closing tips:
- Automate routine tasks (backups, upgrades, monitoring) to reduce human error.
- Keep documentation for your setup and recovery steps nearby.
- Start small, learn, and evolve the cluster as workloads demand.
This guide gives you a clear path from planning to running and maintaining a Kubernetes cluster. Adapt the choices and commands to your specific environment, test changes in non-production, and keep backups and observability in place before scaling up.