Low Available Memory on Host (TrueScan-12)
Overview
An alert was triggered indicating that host TrueScan-12 has low available memory.
The alert fires when the system.mem.pct_usable metric averages below 10% usable memory over the last 5 minutes.
This condition typically means that running processes are consuming most of the system RAM, leaving insufficient memory for additional workloads or spikes in demand.
Symptoms
· Slower application performance due to paging or swapping
· Increased service latency
· Service crashes or unexpected restarts
· Background job failures
· System instability or degraded responsiveness
· Out-of-memory (OOM) kill events in logs
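To confirm the last symptom, kernel log output can be scanned for OOM-killer messages. The following is a minimal sketch, assuming the kernel logs lines of the form "Out of memory: Killed process <pid> (<name>)" (older kernels print "Kill process" instead). The function name find_oom_kills is hypothetical, and the exact message wording can vary between kernel versions.

```python
import re

# Matches kernel OOM-killer lines such as:
#   "Out of memory: Killed process 1234 (java) total-vm:..."
# Older kernels print "Kill process" rather than "Killed process".
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return a list of (pid, process_name) for each OOM-kill line found."""
    kills = []
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

The input can be any iterable of lines, for example the saved output of dmesg or journalctl -k on the host.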
Impact
If not resolved, low memory can impact user-facing applications, backend services, scheduled jobs, monitoring agents, and overall host stability.
Alert Details
Metric: system.mem.pct_usable
Threshold: Less than 10%
Evaluation Window: 5-minute average
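The alert condition can be illustrated with a short sketch: average the samples that fall inside the evaluation window and compare the result to the threshold. This is not Datadog's implementation. The function name is_alerting and the sample format are assumptions made for illustration, and values are assumed to be percentages (0 to 100); if the metric is reported as a 0 to 1 fraction, the threshold would be 0.10 instead.

```python
from statistics import mean

THRESHOLD_PCT = 10.0   # from the alert definition: less than 10%
WINDOW_SECONDS = 300   # 5-minute evaluation window


def is_alerting(samples, now, threshold=THRESHOLD_PCT, window=WINDOW_SECONDS):
    """Return True if the windowed average of usable memory is below threshold.

    samples: iterable of (unix_timestamp, pct_usable) pairs.
    now:     unix timestamp marking the end of the evaluation window.
    """
    recent = [value for ts, value in samples if now - window <= ts <= now]
    if not recent:
        # Assumption: an empty window is treated as "not alerting" in this sketch.
        return False
    return mean(recent) < threshold
```

Because the comparison uses the average, a single sample below 10% does not trigger the alert unless the rest of the window is low enough to pull the mean under the threshold.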
Initial Troubleshooting Steps
1. Review Metrics in Datadog
Navigate to Infrastructure → Host Map → Select Host → Metrics Tab.
Review:
- system.mem.pct_usable
- system.mem.total
- system.mem.used
- system.swap.used
- process.mem.rss
- process.mem.real
2. Check Live Processes
Use Infrastructure → Live Processes and sort by memory usage (RSS) to identify high memory-consuming processes.
3. SSH Into the Host
Run the following commands:
· top -o %MEM
· htop
· free -m
· ps aux --sort=-%mem | head -10
· vmstat 1 5
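When the same checks need to be scripted, the output of these tools can be parsed directly. The sketch below assumes a Linux host: it reads text in /proc/meminfo format to estimate the percentage of available memory, and ranks processes from the output of ps -eo pid,rss,comm --no-headers. The function names are hypothetical, and MemAvailable is only an approximation of what the monitoring agent reports as usable memory.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def pct_available(fields):
    """Percentage of total RAM the kernel estimates is available."""
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]


def top_by_rss(ps_output, n=5):
    """Return the n largest processes as (pid, rss_kb, command).

    ps_output: text produced by `ps -eo pid,rss,comm --no-headers`.
    """
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[1]), int(parts[0]), parts[2]))
    rows.sort(reverse=True)  # largest RSS first
    return [(pid, rss_kb, comm) for rss_kb, pid, comm in rows[:n]]
```

On the host, parse_meminfo(open("/proc/meminfo").read()) supplies the input for pct_available, and the ps output can be captured with subprocess.run(..., capture_output=True, text=True).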
Common Causes & Resolutions
1. Memory leak or runaway process: Restart or stop the problematic process.
2. Memory-intensive workload: Scale vertically (increase RAM) or redistribute workloads.
3. Concurrent heavy jobs: Stagger or reschedule batch tasks.
4. Misconfigured JVM/application heap: Tune heap size, buffers, or memory limits.
5. Insufficient instance size: Resize instance or migrate to higher memory class.
6. Swap disabled or insufficient: Enable/configure swap (short-term mitigation only).
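For the first cause, a rough way to tell a leak from a one-off spike is to look at how a process's RSS changes across successive samples. The following heuristic is only a sketch: the function name looks_like_leak and both default thresholds are hypothetical, and a process that is still warming up its caches can look the same, so a positive result is a hint to investigate rather than proof of a leak.

```python
def looks_like_leak(rss_samples_kb, min_samples=6, min_growth_pct=20.0):
    """Heuristic leak check over RSS samples taken at regular intervals.

    Returns True when RSS never drops between consecutive samples and the
    overall growth from first to last sample is at least min_growth_pct.
    """
    if len(rss_samples_kb) < min_samples or rss_samples_kb[0] <= 0:
        return False
    pairs = zip(rss_samples_kb, rss_samples_kb[1:])
    non_decreasing = all(later >= earlier for earlier, later in pairs)
    first, last = rss_samples_kb[0], rss_samples_kb[-1]
    growth_pct = 100.0 * (last - first) / first
    return non_decreasing and growth_pct >= min_growth_pct
```

A workload that peaks and then releases memory fails the non-decreasing test, which points toward causes 2 or 3 rather than a leak.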
Preventive Measures
· Configure early warning alerts at 20–25% usable memory.
· Implement autoscaling where applicable.
· Enable process-level memory monitoring.
· Conduct regular capacity planning reviews.
· Monitor OOM events and logs.
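The early-warning measure amounts to adding a second, higher threshold above the existing 10% one. A minimal sketch of that tiering follows, assuming a warning level of 25% (the upper end of the suggested range) and values expressed as percentages; the function name classify_usable_memory is hypothetical.

```python
def classify_usable_memory(pct_usable, warning_pct=25.0, critical_pct=10.0):
    """Map a usable-memory percentage to a severity level.

    critical: below critical_pct (the existing alert threshold)
    warning:  below warning_pct but not critical (early warning)
    ok:       at or above warning_pct
    """
    if pct_usable < critical_pct:
        return "critical"
    if pct_usable < warning_pct:
        return "warning"
    return "ok"
```

With these defaults, a host at 18% usable memory is reported as a warning, which leaves time to act before the 10% alert fires.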