Linux Performance Troubleshooting

When a Linux server is having performance issues, it’s important to determine the root cause.  Linux Performance Troubleshooting involves determining whether the load is:

  • CPU-bound (processes waiting for CPU resources)
  • RAM-bound (memory pressure forcing pages out to swap)
  • I/O-bound (processes fighting for disk or network I/O)

Troubleshooting Server bottlenecks

  • Use top/iotop to identify which resource you are running out of (CPU, RAM or disk I/O).
  • Identify which processes are consuming the most of that resource (CPU, RAM or disk I/O).

CPU-Bound Load

CPU load is equal to the number of processes in a runnable or uninterruptible state.  Linux system tools show us the “load average”, which is this count averaged over several periods of time (the 1, 5 and 15 minute load averages).

Example: If you have one CPU present on a server and that core is at 100% utilization, the system is at a load of 1.

From a LAMP perspective, web servers are usually CPU bound.  If you receive a burst of traffic, you will see a spike as the Apache processes compete for system resources.  As the burst subsides and the Apache processes complete their requests, the load comes back down.

How do you find the CPU load of a Linux server?

1) uptime (1, 5 and 15 minute load averages).

[root@test ~]$ uptime
 19:58:05 up 270 days, 2:38, 2 users, load average: 2.55, 2.37, 1.87

Tip: Systems seem to be more responsive under CPU-bound load than when under I/O-bound load.

2) top

top reads the load average from /proc/loadavg

[root@test ~]$ cat /proc/loadavg
1.43 2.04 1.80 1/664 10619

When you run top, the load average is displayed in the top right corner:

[Screenshot: Linux top command]

However, the load average doesn’t tell us much unless we also consider the total number of cores.  Generally you want the load number to be less than the number of CPUs/cores you have.  When the load average is over the CPU core count, the CPUs are fully utilized and work is being queued.

Finding the number of CPU cores

1) Run top and press 1.  This gives you a breakdown of the individual cores and their usage.

[Screenshot: Linux top command showing multiple cores]

2) You can also use the nproc command (http://www.cyberciti.biz/faq/linux-get-number-of-cpus-core-command/):

[root@test ~]$ nproc
16

In the example above, we have 16 cores.  If the load is over 16, we know there is queuing and the CPUs are maxed.
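
As a quick sanity check from the shell, you can compare the 1-minute load average against the core count (a minimal sketch using standard tools; the messages printed are arbitrary):

cores=$(nproc)
load1=$(cut -d ' ' -f 1 /proc/loadavg)
awk -v l="$load1" -v c="$cores" 'BEGIN { if (l+0 > c+0) print "CPUs saturated: work is queuing"; else print "load is within CPU capacity" }'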

Which process is causing the high CPU load?

If the load is CPU-bound, use top to display which processes are consuming the most CPU.  top sorts processes by CPU usage by default.  Hit the F key to switch to a screen where you can choose the sort column.
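
If you prefer a non-interactive view (for example, to paste into a ticket), ps can produce the same ranking; a minimal example:

ps aux --sort=-%cpu | head -n 10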

If a web server is experiencing heavy load and the Apache processes are consuming the most CPU, you can use a profiling tool like New Relic to determine what those processes are doing and which requests are the most expensive.

RAM-Bound Load

When free memory on a server drops too low, there is no more room in RAM and the system begins to swap.  Swapping is when a page in memory is written to disk to free up memory.  Compared to RAM, hard disks are very slow; because accessing data in memory is far faster than reading it back from disk, the system usually slows down considerably.  The more swapping, the slower the system becomes.

Because swapping leads to high I/O, when you see high I/O it’s important to determine whether the system is swapping or whether reads/writes to the disks themselves are high (an I/O-bound workload).

Using top, you can see how much memory the system has and how much is free.  Use the Mem row to see the total, used and free memory on a system.

[root@test ~]$ top
top - 19:51:48 up 270 days, 2:31, 2 users, load average: 2.11, 1.77, 1.50
Tasks: 440 total, 1 running, 439 sleeping, 0 stopped, 0 zombie
Cpu(s): 8.6%us, 1.1%sy, 0.0%ni, 90.1%id, 0.1%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 24591916k total, 19192916k used, 5399000k free, 441868k buffers
Swap: 2097144k total, 506732k used, 1590412k free, 6147300k cached

You can also use the free -m command to see memory information (including buffers/cache).  To find out how much RAM is actually being used by processes, subtract the buffers and file cache from the used figure.

[root@test ~]# free -m
             total       used       free     shared    buffers     cached
Mem:           498         93        405          0         15         32
Low:           498         93        405
High:            0          0          0
-/+ buffers/cache:         44        453
Swap:         1023          0       1023
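
Using the numbers above: 93 MB used minus 15 MB of buffers and 32 MB of cache leaves roughly 46 MB actually held by processes, which is what the -/+ buffers/cache row reports as used (the small difference is due to free -m rounding to whole megabytes).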

Is the system currently swapping?

You can use vmstat to determine if a server is currently swapping.  Look at the two columns under “swap”:

  • “si” shows the amount of memory swapped in from disk per second
  • “so” shows the amount of memory swapped out to disk per second

Both numbers should be zero.  If they are consistently greater than zero, the system is swapping.

[Screenshot: Linux vmstat command run every 1 second]
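
For reference, the command behind the screenshot above is simply vmstat with a one-second interval (the count of 5 is arbitrary):

vmstat 1 5    # watch the si/so columns under "swap"; they should stay at 0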

Out-of-memory (OOM)

When memory is dangerously low, the Linux kernel runs the out-of-memory (OOM) killer, which starts killing processes to free memory.

Any action taken by the OOM killer is logged in the /var/log/messages* log files:

grep -i kill /var/log/messages*
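
OOM kills are also recorded in the kernel ring buffer; the exact message wording varies by kernel version, but a similar search usually finds them:

dmesg | grep -i kill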

Which process is causing the high Memory load?

If the load is RAM-bound, use top to display which processes are consuming the most memory.  Hit the M key to sort by RAM usage, or hit the F key to switch to a screen where you can choose the sort column.
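
As with CPU, ps gives a non-interactive version of the same ranking:

ps aux --sort=-%mem | head -n 10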

I/O-Bound Load

When a system is I/O bound, it spends a large amount of CPU time waiting for I/O (either network or disk).  If the I/O wait percentage reported by top is low, you can rule out I/O as the reason for the performance issues.

From a LAMP perspective, database servers are usually I/O bound.  To keep I/O on a database server to a minimum, try to keep your working set of data in memory.  If this becomes difficult on one server, you may have to consider sharding to spread your data across multiple servers, which scales out both reads and writes.

top (I/O wait)

Use top to determine the current percentage of CPU time spent in I/O wait (the %wa field):

[root@test ~]$ top
top - 20:38:26 up 14 min, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 103 total, 1 running, 102 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.6%us, 0.3%sy, 0.0%ni, 98.9%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1922320k total, 564172k used, 1358148k free, 10916k buffers
Swap: 1254392k total, 0k used, 1254392k free, 236116k cached

Iostat

iostat – a Linux tool used to determine which disk or partition has the highest I/O activity.

iostat -x 1 - Run iostat every 1 second with extended statistics (see the example after this list).  The number under %iowait tells us how much I/O wait the system is experiencing.
tps - Transfers per second (a transfer is an I/O request sent to the device).
Blk_read/s - Blocks read from the device per second.
Blk_wrtn/s - Blocks written to the device per second.
Blk_read - Total number of blocks read from the device.
Blk_wrtn - Total number of blocks written to the device.
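
A typical invocation focused on one device looks like this (assuming sda is your disk – run lsblk to confirm; the interval and count are arbitrary).  With -x, high values in the %util and await columns point to the busiest device:

iostat -x sda 1 5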

Which process is causing the high I/O?

[root@test ~]# iotop

Note: iotop is not installed by default on most RHEL machines.  To install:
yum install iotop

[Screenshot: Linux iotop command]

iotop sorts processes by their I/O utilization, with the heaviest consumers at the top.

If a process is waiting for I/O, you can use ps to view its process state.  A “D” state (uninterruptible sleep) implies it is waiting for I/O:

[root@test ~]# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
apache    1256  0.0  0.4 493136  8120 ?        D    20:24   0:00 myprocess
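
To list every process currently stuck in uninterruptible sleep, here is a small sketch using standard ps and awk:

ps -eo pid,stat,comm | awk '$2 ~ /^D/'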

You can then use that process id to see its I/O stats:

[root@test ~]# cat /proc/1256/io
rchar: 600
wchar: 0
syscr: 3
syscw: 0
read_bytes: 10000
write_bytes: 10000
cancelled_write_bytes: 0
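
If the sysstat package (used below for sar) is installed, pidstat also gives a per-process disk I/O view without iotop:

pidstat -d 1    # kB read and written per second, per process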

To view the files opened by a process ID, use lsof:

[root@test ~]$ lsof -p 12107
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
php 12107 root cwd DIR 253,1 4096 9439252 /root
php 12107 root rtd DIR 253,1 4096 2 /
php 12107 root txt REG 253,1 3871784 14687350 /usr/bin/php
php     12107 root    1w   REG      253,1 961407489    4221807 /var/www/html/website/tmp/logs/some_log.log

The Performance Troubleshooting Process

  • First look at I/O wait (see the quick command sketch after this list).
    • If I/O wait is high:
      • If the system is swapping:
        • Determine which processes are consuming the most memory.
      • If the system is NOT swapping:
        • Determine which processes are consuming the most I/O.
    • If I/O wait is low:
      • If the idle percentage is low:
        • Determine which processes are consuming the most CPU.
      • If the idle percentage is high:
        • If memory is low, determine which processes are consuming the most memory.
        • Otherwise, look for network issues or other problems.
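
A quick way to gather the numbers this flow needs in one pass (both commands are standard; the interval and counts are arbitrary):

top -b -n 1 | head -n 5    # load average, %wa (I/O wait), %id (idle), memory summary
vmstat 1 3                 # si/so columns show whether the system is swapping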

Other Helpful Disk commands

fdisk -l - Display partition information.
/etc/fstab - System filesystem mount configuration.
pvdisplay - Display LVM physical volumes.

df -h - List all mounted filesystems along with their size and usage.
df /tmp - Determine which filesystem the /tmp directory lives on.
du -sh - Summarize disk usage of the current directory.
du -ckx | sort -n > /tmp/large_directories - Track down the largest directories.
ls -lShr - List files sorted by size, largest last.
sudo sh -c '> /var/log/messages' - Truncate a file.

Troubleshoot Historical Load Issues

sar is an excellent tool that stores historical system information.  It requires the sysstat package on RHEL:

yum install sysstat
Configuration lives in /etc/sysconfig/sysstat (RHEL) or /etc/default/sysstat (Debian/Ubuntu).

Data is collected by the /etc/cron.d/sysstat cron job and stored in /var/log/sa (RHEL) or /var/log/sysstat (Debian/Ubuntu).

sar command

sar - With no options, outputs CPU statistics for the current day.
sar -r - Display memory (RAM) statistics for the current day.
sar -b - Display I/O and transfer rate statistics.
sar -A - Output everything: load average, CPU, RAM, disk I/O, network I/O and other statistics.
sar -s 20:00:00 -e 20:30:00 - Restrict output to a start and end time range.
sar -f /var/log/sysstat/sa04 - Pull statistics from the fourth of the month.
sar -r -s 13:00:00 -e 13:20:00 - Memory utilization for a start and end time range.
sar -q -s 13:10:00 -e 13:20:00 - Run queue length and load averages for a start and end time range.
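
sar can also sample live instead of reading the daily file (the interval and count below are arbitrary):

sar 1 5       # CPU utilization every second, five samples
sar -r 1 5    # the same sampling for memory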

Mongo on AWS

Here are some best practices when running Mongo on AWS:

  • Always use XFS or Ext4 filesystems when running Mongo.  Both support consistent snapshots for backups.
  • Use SSD volumes – General Purpose SSD (GP2) or Provisioned IOPS SSD (IO1).  Magnetic volumes provide far fewer IOPS than SSDs.
  • For best performance, use High I/O (I2) or Dense-storage (D2) instances.  The more RAM the better, since it keeps your working set in memory instead of hitting disk.
  • Use EBS-Optimized instances.  These instances use an “optimized configuration stack and provides additional, dedicated capacity for Amazon EBS I/O” (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html).
  • Mount the drives used by Mongo with access-time updates disabled (noatime).  You can remount manually or add the options to /etc/fstab so they persist across restarts:
    • sudo mount -o remount,noatime,noexec  /var/lib/mongo
    • /etc/fstab: /dev/xvde1 /var/lib/mongo/ ext4 noatime,noexec 0 2
  • Raise the ulimits for Mongo.  The defaults are usually low and not suitable for a production Mongo server.  You can view current limits with ulimit -a.  Recommended settings are listed below (https://docs.mongodb.org/v2.4/reference/ulimit/); see the limits.conf sketch after this list:
    • -f (file size): unlimited
    • -t (cpu time): unlimited
    • -v (virtual memory): unlimited
    • -n (open files): 64000
    • -m (memory size): unlimited
    • -u (processes/threads): 64000
  • Set block read ahead to 32.  Example (assuming /dev/xvdfe is your block device for mount – run lsblk to confirm):
    • sudo blockdev --setra 32 /dev/xvdfe
  • Pre-warm snapshots.  This only applies if your volume was created from a snapshot.  When you create an AWS EBS volume that is restored from a snapshot, it is not initialized (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html).  This causes “an increase in the latency of an I/O operation the first time each block is accessed”.  You can touch all blocks on a freshly restored EBS volume by running:
    • sudo fio --filename=/dev/xvde --rw=randread --bs=128k --iodepth=32 --ioengine=libaio --direct=1 --name=volume-initialize
  • Pre-warm documents into memory.  If you are promoting a SECONDARY to PRIMARY, the working set should be pre-warmed first.  If not, you will see I/O degradation, with faults climbing in mongostat until the working set has been read from disk.  Ideally you want all hot documents in memory without touching disk (disk access is much slower than RAM).  You can pre-warm Mongo documents by replaying live production queries to load them into memory, or by leaving a fresh instance as a SECONDARY for a day or two before promoting it to PRIMARY.
  • Run each replica-set node in a different Availability Zone (AZ).
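
To make the ulimit recommendations above persistent, one common approach is /etc/security/limits.conf (a sketch only; it assumes the mongod process runs as the mongod user and that pam_limits is active):

# /etc/security/limits.conf entries for the mongod user, using the values above
mongod soft nofile 64000
mongod hard nofile 64000
mongod soft nproc 64000
mongod hard nproc 64000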

Above: Image from the “Mongo on AWS” whitepaper.

 


Setup a Simple Test Mongo Replicaset

Use the hostname Linux command to determine the server’s hostname.

Example: hostname = test.hostname.com

mkdir /tmp/mongo1

mkdir /tmp/mongo2

mkdir /tmp/mongo3

mongod --port 27021 --dbpath /tmp/mongo1 --fork --logpath /tmp/mongod1.log --replSet rs0
mongod --port 27022 --dbpath /tmp/mongo2 --fork --logpath /tmp/mongod2.log --replSet rs0
mongod --port 27023 --dbpath /tmp/mongo3 --fork --logpath /tmp/mongod3.log --replSet rs0

rsconf = {
  _id: "rs0",
  members: [
    {
      _id: 0,
      host: "test.hostname.com:27021"
    }
  ]
}

rs.initiate( rsconf );
rs.conf();
rs.add("test.hostname.com:27022");
rs.add("test.hostname.com:27023");
rs.status();

 

Other mongo commands:

db.printSlaveReplicationInfo();

 

In order to read from the secondaries, you need to run rs.slaveOk() on the secondary.  Otherwise you will see:

Thu Feb 26 16:05:19 uncaught exception: count failed: { "errmsg" : "not master", "ok" : 0 }
SECONDARY> rs.slaveOk()