High CPU Usage on Server: How to Fix
Introduction and scope
High CPU usage on a server can slow apps, raise latency, and trigger outages. This guide shows practical, step-by-step ways to find the cause and fix it. It covers quick emergency moves, deeper diagnosis, OS tuning, app fixes, container and hardware considerations, and long-term monitoring. Use the parts that match your environment — Linux, Windows, containers, or virtual machines.
Understanding CPU usage metrics and tools
CPU metrics tell different stories. “user” is time spent in application code. “system” is time spent in kernel code on the process’s behalf. “idle” is unused CPU. “iowait” is idle time while disk I/O is outstanding. “steal” is time the hypervisor gave to other guests instead of your VM. Load average is the average number of tasks that are runnable or in uninterruptible sleep (often disk I/O); it is not a percentage.
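These counters can be read directly from /proc/stat. A minimal, Linux-specific sketch that computes whole-machine busy percentage from two samples:

```shell
# Minimal sketch: compute overall CPU busy% from two /proc/stat samples.
# /proc/stat "cpu" line fields: user nice system idle iowait irq softirq steal
sample() {
  awk '/^cpu /{busy=$2+$3+$4+$7+$8+$9; total=busy+$5+$6; print busy, total}' /proc/stat
}

s1=$(sample); sleep 1; s2=$(sample)
b1=${s1% *}; t1=${s1#* }
b2=${s2% *}; t2=${s2#* }

# Busy share of the interval, all cores combined
busy_pct=$(( (b2 - b1) * 100 / (t2 - t1) ))
echo "CPU busy: ${busy_pct}%"
```

This is essentially what top and mpstat do at higher resolution; the percentages in those tools are deltas between samples, not cumulative values.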
Common tools and what they show:
- top / htop — live per-process CPU and memory, per-thread view (htop).
- ps, pstree — point-in-time process list and parent/child relations.
- pidstat, mpstat, vmstat, iostat, sar — historical and per-CPU stats.
- perf, eBPF tools (bcc, bpftrace) — function-level CPU hotspots and stacks.
- strace / truss — system calls from a process.
- flamegraphs / async-profiler / py-spy / pprof — sampling profilers for app-level hotspots.
Remember: in top’s default per-process view, 100% means one core fully used; a single process can show up to N × 100% on an N-core server, and N × 100% summed across processes is full machine utilization.
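The same normalization applies to load average: compare it to the core count, not to 1. A small sketch:

```shell
# Minimal sketch: compare the 1-minute load average against the core count.
cores=$(nproc)
load1=$(awk '{print $1}' /proc/loadavg)
echo "load=${load1} cores=${cores} (full utilization in top terms: $((cores * 100))%)"
```

A 1-minute load of 8 is alarming on a 2-core box and routine on a 32-core one.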
Common causes of high CPU on servers
High CPU can come from many sources. Common ones:
- Legitimate heavy computation (batch jobs, image processing, cryptography).
- Busy-wait or tight loops in code (polling instead of blocking).
- Excessive system calls or context switches.
- High garbage collection or JIT compilation activity.
- Too many short-lived processes or threads.
- Background tasks running at the same time (backups, scans, cron jobs).
- Network or disk interrupts forcing CPU work (high interrupt rate).
- Misconfigured virtualization (CPU steal) or noisy neighbors on shared hosts.
- Bad deployments or infinite loops introduced by code changes.
- Excessive logging or tracing that serializes work.
Each cause changes the profile of CPU usage. For example, high system% plus many interrupts points to driver or I/O work.
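One quick way to see which profile you have is to break cumulative CPU time into its components. A minimal sketch using /proc/stat (shares since boot):

```shell
# Minimal sketch: break cumulative CPU time into user/system/iowait/steal
# shares; a high system share alongside many interrupts hints at driver
# or I/O work rather than application code.
breakdown=$(awk '/^cpu /{
  total=$2+$3+$4+$5+$6+$7+$8+$9
  printf "user=%.1f%% system=%.1f%% iowait=%.1f%% steal=%.1f%%\n",
         100*($2+$3)/total, 100*($4+$7+$8)/total, 100*$6/total, 100*$9/total
}' /proc/stat)
echo "$breakdown"
```

High user% points at application code; high steal% points at the hypervisor; high iowait% points at storage.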
Diagnosing live CPU spikes and intermittent issues
Start with low-cost observations, then increase detail only as needed.
- Capture a baseline:
  - Run top or htop to see the current top processes.
  - Use mpstat -P ALL 1 5 for a per-core view.
  - Check dmesg for kernel errors.
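The baseline steps above can be bundled into a small collector script. This is a hypothetical sketch (paths and filenames are placeholders) that snapshots cheap system state for later comparison:

```shell
# Hypothetical baseline collector: snapshot cheap system state into a
# timestamped directory so later spikes can be compared against "normal".
dir="/tmp/cpu-baseline-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$dir"

cat /proc/loadavg   > "$dir/loadavg"
head -n 1 /proc/stat > "$dir/stat"
# Top 10 CPU consumers right now (header line sorts to the bottom)
ps -eo pid,pcpu,comm | sort -k2 -rn | head -n 10 > "$dir/top-cpu"
# Recent kernel messages, if we have permission to read them
command -v dmesg >/dev/null && dmesg 2>/dev/null | tail -n 50 > "$dir/dmesg" || true

echo "baseline saved to $dir"
```

Run it from cron every few minutes during quiet periods and you have something concrete to diff against when a spike hits.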
- When you see a spike:
  - Use pidstat -u -p ALL 1 5 to record per-process CPU over time.
  - Run top -H -p <PID> to see thread-level CPU usage for a process.
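Thread-level CPU can also be read straight from procfs. A minimal sketch that ranks a process’s threads by CPU time consumed so far (it assumes thread names contain no spaces, which would shift the field offsets):

```shell
# Minimal sketch: rank a process's threads by CPU time consumed so far
# (utime + stime, in jiffies, from /proc/<pid>/task/<tid>/stat).
pid=$$   # example target: this shell itself
threads=$(for s in /proc/"$pid"/task/*/stat; do
  awk '{print $14 + $15, "tid:" $1, "state:" $3}' "$s"
done | sort -rn)
echo "$threads"
```

The thread IDs it prints match the ones shown by top -H, so you can feed them to a profiler or cross-reference a Java thread dump (Java dumps show native TIDs in hex).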
- For intermittent spikes:
  - Enable continuous sampling: perf record -a -g -- sleep 60, then perf script to inspect stacks.
  - Use eBPF tools (from the bcc collection) like runqlat, execsnoop, or profile to capture low-overhead info.
  - Collect flamegraphs from perf, or from async-profiler for Java.
- For native apps:
  - Attach perf top -p <PID> to see hot functions.
  - Use gdb to get thread backtraces when safe (gdb --batch -p <PID> -ex "thread apply all bt").
- For high syscall or I/O-induced CPU:
  - Run strace -tt -p <PID> for a short period to see frequent syscalls.
  - Use iostat -x 1 5 and vmstat 1 5 to correlate I/O and CPU.
- For web services:
  - Correlate CPU spikes with incoming traffic (nginx logs, load balancer metrics).
  - Capture a small request dump (tcpdump with filters) if you suspect malformed clients.
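Correlating traffic with CPU can be as simple as bucketing the access log per second. A minimal sketch (/tmp/access.log is a stand-in path with a tiny fabricated sample; point it at your real nginx log):

```shell
# Minimal sketch: bucket combined-format access-log requests per second
# so a traffic surge can be lined up against a CPU spike.
cat > /tmp/access.log <<'EOF'
203.0.113.9 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 512
203.0.113.9 - - [10/Oct/2024:13:55:36 +0000] "GET /a HTTP/1.1" 200 128
203.0.113.9 - - [10/Oct/2024:13:55:37 +0000] "GET /b HTTP/1.1" 200 64
EOF

# Split on the [ and ] around the timestamp, count hits per second
rates=$(awk -F'[][]' '{split($2, t, " "); n[t[1]]++} END {for (s in n) print n[s], s}' /tmp/access.log | sort -rn)
echo "$rates"
```

If the busiest seconds in the log line up with the CPU spikes, the load is traffic-driven; if they do not, look inward at background work or a runaway process.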
Always prefer sampling profilers over heavy tracing in production. Sampling picks CPU stacks periodically and has low overhead.
Identifying resource-hungry processes and threads
Find the exact process or thread causing load.
- Use top or htop to list top CPU processes. Sort by %CPU.
- Use top -H to expand threads or htop’s thread view.
- On Linux, pidstat -t -u -p <PID> 1 shows per-thread CPU for a process.
- For Java apps: jstack or jcmd Thread.print to get thread dumps; use async-profiler for flamegraphs.
- For Python: py-spy top --pid <PID> or py-spy record to get sampling profiles without modifying the app.
- For Node.js: 0x, clinic, or node’s built-in profiler.
- Use lsof and ss (or netstat) to see file or network handles held by the suspect process.
- If process names are generic, map PID to container or systemd unit to find ownership.
Collect a few samples during the spike, not just one. Multiple samples reduce noise and reveal consistent hotspots.
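Mapping a PID back to its owner, as suggested above, is a one-liner against procfs. A minimal sketch:

```shell
# Minimal sketch: map a PID back to its cgroup, which names the systemd
# unit or container that owns it.
pid=$$   # example target: this shell
owner=$(cat /proc/"$pid"/cgroup)
echo "PID $pid belongs to: $owner"
```

On a systemd host the cgroup path ends in a unit name like foo.service; on Kubernetes nodes it embeds the pod and container IDs, which you can match against crictl or kubectl output.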
OS-level tuning and kernel parameter adjustments
System settings can reduce unnecessary CPU work.
- Scheduler and CPU:
  - Set the CPU governor to performance for latency-sensitive systems (echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor).
  - Use CPU affinity (taskset) or cpuset cgroups to pin processes to cores if needed.
  - Run irqbalance to spread interrupts across cores.
- Memory and swapping:
  - Tune vm.swappiness (lower it to prefer RAM over swap).
  - Reduce vm.dirty_ratio and vm.dirty_background_ratio to avoid long synchronous write spikes.
- Networking:
  - Increase net.core.somaxconn and net.ipv4.tcp_max_syn_backlog for busy servers.
  - Enable GRO/TSO on NICs and use appropriate driver settings.
- I/O:
  - Choose an I/O scheduler suited to SSDs (none or mq-deadline on modern multi-queue kernels) to lower CPU overhead.
  - Use async or direct I/O where supported to avoid blocking CPU waits.
- Limits and isolation:
  - Use cgroups v2 to limit CPU and guarantee quotas for critical services.
  - Use systemd’s CPUQuota or CPUWeight (CPUShares on cgroups v1) to prevent background jobs from starving important services.
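One way to apply those systemd limits is a drop-in file; this sketch caps a hypothetical backup.service (the unit name and values are placeholders to adapt):

```ini
# /etc/systemd/system/backup.service.d/50-cpu.conf  (hypothetical unit)
[Service]
# Hard cap: at most half of one core's worth of CPU time
CPUQuota=50%
# Relative weight under contention (cgroups v2; default is 100)
CPUWeight=50
```

After adding the drop-in, run systemctl daemon-reload and restart the unit for the limits to take effect.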
Change kernel parameters carefully and document values. Test in staging before production.
Application-level fixes and code optimization
Fixing code often yields the biggest win.
- Profile and find hot functions before changing code.
- Replace O(n^2) algorithms with linear or log-linear ones.
- Avoid busy-wait loops; use blocking I/O, epoll/select, or condition variables.
- Batch work and reduce per-item overhead (group writes, requests, or DB calls).
- Limit concurrency: too many threads can increase context switches and locks.
- Tune garbage collectors and heap sizes for JVM/.NET to reduce GC CPU spikes.
- Reduce logging, especially expensive string formatting at high volume.
- Cache expensive computations and use memoization where safe.
- Use optimized libraries for heavy math or string work (C-extensions, SIMD libraries).
- Add back-pressure or rate-limiting to prevent overload.
Always measure before and after. Small code changes can have large system effects.
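As an illustration of the caching point above, even shell jobs can memoize expensive steps. A minimal sketch with a file-backed cache (memo and slow_square are illustrative names, not real tools):

```shell
# Minimal sketch: file-backed memoizer keyed by a hash of the command
# and its arguments; the expensive command runs at most once per key.
memo() {
  key=$(printf '%s' "$*" | md5sum | cut -d' ' -f1)
  f="/tmp/memo-$key"
  [ -f "$f" ] || "$@" > "$f"   # run once, cache stdout
  cat "$f"
}

slow_square() { sleep 0.2; echo $(( $1 * $1 )); }

memo slow_square 7   # computes and caches
memo slow_square 7   # served from the cache
```

The same idea scales up: an in-process LRU cache or an external cache like Redis trades a little memory for a lot of repeated CPU work.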
Managing background jobs, cron tasks, and scheduled work
Background tasks are frequent culprits.
- Inventory all scheduled jobs (crontab, at, systemd timers, Kubernetes CronJobs).
- Stagger heavy jobs across time and hosts to avoid overlapping peaks.
- Use flock or lock files to prevent concurrent runs of the same job.
- Assign lower priority with nice and ionice for non-critical background tasks.
- For distributed systems, use leader election or a dedicated scheduler to avoid duplication.
- For one-off maintenance jobs, run them on a dedicated host if possible.
Document schedules and notify teams when changing timing to avoid surprises.
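The flock pattern mentioned above looks like this in practice; a minimal sketch where /tmp/nightly-job.lock is a stand-in lock path:

```shell
# Minimal sketch: use flock so overlapping runs of the same job skip
# instead of piling up and multiplying CPU load.
run_once() {
  (
    flock -n 9 || { echo "skipped: previous run still active"; exit 0; }
    echo "ran"
    # ... real job work goes here ...
  ) 9>/tmp/nightly-job.lock
}

result=$(run_once)
echo "$result"
```

Wrapping the crontab entry itself (flock -n /path/to/lock command …) achieves the same thing without editing the job script.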
Container and virtualization considerations
Containers and VMs add layers that change CPU behavior.
- Understand cgroups: requests vs limits in Kubernetes matter. Hitting a CPU limit throttles the container and adds latency even when the node has idle CPU.
- Use CPU requests to reserve capacity; use limits to cap usage and avoid noisy neighbors.
- Watch for CPU steal in VMs (steal% in top or vmstat). High steal means the host is oversubscribed.
- Be aware that containerized apps may mis-report CPU if they read host-wide counters instead of their cgroup’s, or skip per-core normalization.
- Use node-level monitoring for container hosts and container-level metrics (cadvisor, kubelet metrics).
- Avoid running high-CPU system daemons and batch jobs on the same nodes as latency-sensitive pods.
- For real-time tasks, consider privileged containers or pinning threads with cpuset.cpus.
In shared clouds, noisy neighbors and oversubscription are common; choose instance types accordingly.
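Throttling from CPU limits shows up in cgroup v2 counters. A minimal sketch for checking them (the path varies, and cgroup v1 hosts expose different files):

```shell
# Minimal sketch: read cgroup v2 CPU counters for the current cgroup;
# a growing nr_throttled means the cgroup keeps hitting its CPU limit.
f=/sys/fs/cgroup/cpu.stat
if [ -r "$f" ]; then
  stat_out=$(grep -E 'throttl|usec' "$f" 2>/dev/null || echo "no counters found")
else
  stat_out="cgroup v2 cpu.stat not available at $f"
fi
echo "$stat_out"
```

In Kubernetes the same signal surfaces as the container_cpu_cfs_throttled_periods_total metric from cAdvisor.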
Hardware, scaling and capacity planning
Sometimes the fix is more capacity.
- Collect baseline metrics over time: CPU, memory, IO, network, load average. Use them for planning.
- Decide scale-up (bigger CPU) vs scale-out (more nodes) based on app design.
- For single-threaded bottlenecks, higher clock speed or fewer, faster cores help.
- For parallelizable workloads, more cores and network bandwidth are better.
- Consider NUMA: place memory and threads on the same socket to avoid cross-node penalties.
- Use modern NICs with offloads (RSS, RX/TX checksum offload) to lower CPU for networking.
- If storage is CPU-bound, consider NVMe or offloading to hardware (compression, RAID controllers).
Capacity planning should include reserve headroom for spikes and maintenance.
Monitoring, alerting and long-term prevention strategies
Good observability prevents surprises.
- Monitor CPU usage per host, per core, per container, and per process when possible.
- Correlate CPU with latency, error rates, request rate, and I/O to find root causes.
- Capture periodic stack samples (flamegraphs) and store them for when alerts fire.
- Set sensible alert thresholds: short, high bursts might be fine; sustained high CPU is important.
- Include runbooks with alerts that list quick checks and safe mitigation steps.
- Keep historical metrics to detect slow trends from code changes or data growth.
- Use anomaly detection or rolling baselines to reduce noisy alerts.
A monitoring toolset (Prometheus + Grafana, Datadog, New Relic) plus retained traces and profiles is ideal.
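As one sketch of the "sustained, not burst" alerting idea above, a Prometheus rule using node_exporter metrics might look like this (thresholds and labels are placeholders to tune):

```yaml
# Hypothetical Prometheus rule: alert on sustained, not bursty, CPU.
groups:
  - name: cpu
    rules:
      - alert: SustainedHighCPU
        # Fraction of non-idle CPU, averaged per instance over 5m windows
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9
        for: 15m          # must hold for 15 minutes before firing
        labels:
          severity: warning
        annotations:
          summary: "CPU above 90% for 15m on {{ $labels.instance }}"
```

The for: clause is what filters out short bursts; without it, every batch job would page someone.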
Emergency steps, mitigation and rollback procedures
When CPU problems threaten availability, act fast and safe.
- Quick containment:
  - Identify the top CPU consumers (top/htop).
  - If non-critical, lower priority: renice -n 19 -p <PID>; ionice -c3 -p <PID> for I/O-heavy tasks.
  - If it is a container, consider scaling it down or evicting it.
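The renice step above is safe to rehearse, since raising a process's nice value never requires privileges. A minimal sketch (targeting this shell itself as a harmless example):

```shell
# Minimal sketch: deprioritize a CPU-hungry but non-critical process.
pid=$$   # example target: this shell itself
renice -n 19 -p "$pid" >/dev/null
new_nice=$(ps -o ni= -p "$pid" | tr -d ' ')
echo "PID $pid now runs at nice=$new_nice"
```

At nice 19 the process still runs, but only gets CPU the scheduler would otherwise leave idle, which is usually enough to restore latency for critical services.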
- Throttle traffic:
  - Reduce incoming requests at the load balancer or reverse proxy.
  - Put the service in read-only or maintenance mode if safe.
- Revert recent changes:
  - Roll back the last deploy if the spike coincides with a new release.
  - Disable new feature flags or integrations that might be causing load.
- Isolate and restart safely:
  - Restart only the affected service, not the whole host.
  - If restarting, follow graceful shutdown and drain procedures.
- Scale temporarily:
  - Add more instances or increase CPU capacity temporarily.
  - Use autoscaling to handle traffic surges where available.
- Collect forensic data:
  - Before killing processes, capture perf or stack samples and logs for post-mortem.
- Post-mitigation:
  - Run a root-cause analysis. Document cause, fix, and actions to prevent recurrence.
  - Update runbooks and monitoring thresholds based on findings.
Have a rollback and mitigation checklist available with access instructions for on-call staff.
Final notes
Diagnosing high CPU requires both short-term fixes and long-term fixes. Start with simple observations, capture evidence during spikes, use low-overhead profilers, and address either system or application causes depending on findings. Combine configuration fixes, code changes, scheduling discipline, and capacity planning to prevent repeat incidents. Keep monitoring and runbooks up to date so the next spike is easier to resolve.
About Jack Williams
Jack Williams is a WordPress and server management specialist at Moss.sh, where he helps developers automate their WordPress deployments and streamline server administration for crypto platforms and traditional web projects. With a focus on practical DevOps solutions, he writes guides on zero-downtime deployments, security automation, WordPress performance optimization, and cryptocurrency platform reviews for freelancers, agencies, and startups in the blockchain and fintech space.