- 3rd January 2017
- Posted by: Stephen Downie
- Category: DevOps, Performance
System performance more often than not relies heavily on the speed of the underlying platform. Of course, bad code will still run slowly, but things can be done to optimise server performance to at least give it a chance. In this article we’ll run through some of those tricks.
What are the main components of a server?
- Processor (CPU)
- Memory (RAM)
- Disks (IO)
- Operating System
This hasn’t changed greatly in a long time, but the way we use servers these days has. It used to be the case that we’d have a rack of database servers, a rack of web servers and so on, and we’d optimise each server and its software so they worked in harmony. In some cases this still holds true, but these days, with containers and the huge range of workloads employed in businesses, we often find that each server runs many different services, so you can’t optimise it for a single workload.
Developers and end users often assume that throwing more processors at a server will magically make processes run faster. If you are maxing out your cores this may be the case, but more often than not servers are IO limited.
Unless you plan to run all your applications in memory, which is highly unlikely, ensure you have the fastest disks your budget will allow. SSDs are commonplace and are brilliant for storing and retrieving data. Of course, you might not want to store everything on more expensive SSDs, so you could put your operating system onto spinning disks, but if you have data or processes that need fast reads and writes, invest in an array of SSDs.
Next up is RAM, where processes keep the data they are working on. The faster the better, of course, but how much you need depends on what the server will be used for. For example, if you can persist data to RAM instead of to disk, access will be much quicker than if that data were written to your hard drives. On the other hand, if you run so many memory-hungry processes that you run out of RAM, your server will start swapping them out to virtual memory on your disk, assuming you have swap space enabled; otherwise the kernel will simply start killing processes! Once a server starts swapping, the affected processes slow to a crawl, so it needs to be avoided at all reasonable costs.
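As a quick sketch, you can see how close a Linux box is to swapping straight from procfs, and check the kernel's eagerness to swap (the sysctl change is shown commented out because it needs root and is a judgment call for your workload):

```shell
# How much RAM and swap does the system have, and how much is left?
grep -E '^(MemTotal|MemAvailable|SwapTotal|SwapFree):' /proc/meminfo

# vm.swappiness (0-100, default 60) controls how eagerly the kernel swaps;
# lowering it keeps processes in RAM longer:
cat /proc/sys/vm/swappiness 2>/dev/null || echo "swappiness not exposed here"
#   sudo sysctl vm.swappiness=10
```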
Finally, the CPU and core count. There is a wide variety of CPUs on the market, and which one you choose often depends on budget. Intel rules the roost when it comes to server CPUs, but AMD offers alternatives, and depending on your workload there are also ARM processors and ARM cloud vendors. Most CPUs have a number of cores, each of which can run a thread (or two, with hyper-threading helping to speed things up). How an application is written dictates how it scales across cores. As a very basic example, a MySQL query occupies one core while it runs; it can’t distribute that single query across several cores. But if another user starts a second query, it doesn’t have to wait for the first; it starts executing on a second core, assuming one is available.
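To see how many of those parallel execution slots your server actually has, you can ask the OS directly (both commands count logical CPUs, i.e. cores multiplied by hardware threads):

```shell
# Logical CPU count as seen by the scheduler:
nproc
# The same figure straight from the kernel:
grep -c '^processor' /proc/cpuinfo
```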
Outside of the core components, optimisation might also include tuning your networking stack, the filesystems on your disks, your kernel or your applications.
SOME KEY CONCEPTS
CPU UTILISATION
Pretty self-explanatory, but worth reminding users: CPU utilisation is the usage of the processing resources offered by a specific CPU. The tool you use to report CPU usage will affect how it is displayed, but more often than not it is returned on a per-core basis.
CPU LOAD AVERAGE
CPU load average is a great way to find out the load on your system over time. CPU utilisation tells you the current usage, but what has the load been over a period of time? The load average helps you figure that out.
If you open top or a similar tool you will see a load average label with three numbers: the average load over the last minute, the last 5 minutes and the last 15 minutes. The acceptable level for these values differs depending on how many CPUs you have. On a single-CPU system a load average of 1 means that your CPU was fully loaded for that period of time.
So on a single-CPU system, 1, 0.7, 4.09 would mean your CPU was 100% loaded over the past minute, 30% idle on average over the past 5 minutes, but 309% overloaded on average over the last 15 minutes. This gives you good insight into how much capacity and headroom your server has at a given point in time.
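Since the raw numbers only make sense relative to your core count, a small sketch like this normalises them, so 1.00 always means "fully loaded" whatever the hardware:

```shell
# Read the 1, 5 and 15 minute load averages from the kernel:
read one five fifteen rest < /proc/loadavg
cores=$(nproc)
# Divide by core count so the figures are comparable across machines:
awk -v l1="$one" -v l5="$five" -v l15="$fifteen" -v c="$cores" \
    'BEGIN { printf "per-core load: 1m=%.2f 5m=%.2f 15m=%.2f\n", l1/c, l5/c, l15/c }'
```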
DISK SCHEDULERS
There are four types of disk scheduler; which one you choose depends a lot on your own testing, but we can explain their operation somewhat here.
The default scheduler is the Completely Fair Queuing (CFQ) scheduler. CFQ places requests submitted by processes into a number of queues, and each queue is given a timeslice to access the disk depending on its I/O priority.
The deadline scheduler guarantees a start service time for each request. It maintains two kinds of queues: sorted queues, ordered by sector to minimise seeking, and deadline queues, ordered by expiry time. When preparing its next request the scheduler normally serves from the sorted queues, preferring reads, but first checks the deadline queues for requests about to expire.
The anticipatory scheduler attempted to improve performance by anticipating future read operations, but it was largely superseded by CFQ and has since been removed from the Linux kernel. Finally there is the noop scheduler, a simple FIFO that does no reordering at all; it is often the best choice for SSDs and virtualised guests, where the underlying storage already does its own scheduling.
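You can see which scheduler each block device is using, and switch it at runtime, through sysfs (the switch is commented out because it needs root, and `sda` is a placeholder for your own device):

```shell
# Show the active I/O scheduler (the name in brackets) for each block device:
for f in /sys/block/*/queue/scheduler; do
  [ -r "$f" ] || continue
  dev=${f#/sys/block/}
  dev=${dev%/queue/scheduler}
  printf '%s: %s\n' "$dev" "$(cat "$f")"
done

# Switching a device to the deadline scheduler takes effect immediately:
#   echo deadline | sudo tee /sys/block/sda/queue/scheduler
```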
FILESYSTEMS
Over the years we have seen many Linux filesystems come and go, but the EXT variants have always remained. ReiserFS and JFS have both come and gone; ReiserFS’s demise was somewhat self-inflicted.
EXT4 is the latest iteration and is largely the default across the major Linux distributions. One of the first things you should do when bootstrapping a new Linux server is to add the noatime option to the /etc/fstab entries. This stops the filesystem updating a file’s metadata every time the file is accessed. Most Linux servers make no use of this information, so you should be able to switch it off safely.
Some sites suggest enabling delayed allocation (the delalloc option), which defers writing to disk until the last possible moment. If data corruption on power failure or similar isn’t a worry for you (it is for most people), then feel free to enable it. Personally I like my files intact, so I leave it alone. You can also tweak the data option: the default, data=ordered, is what the EXT developers see as the best trade-off between performance and data protection, but if you need quicker filesystem access you can set data=writeback.
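Put together, tuned /etc/fstab entries for ext4 data volumes might look like this (the devices and mount points are illustrative, and the writeback line only belongs on a volume where losing recently written data on a crash is acceptable):

```
# noatime skips access-time updates on every read;
# data=writeback trades crash safety for speed (data=ordered is the safer default)
/dev/sdb1  /data  ext4  defaults,noatime                 0  2
/dev/sdb2  /fast  ext4  defaults,noatime,data=writeback  0  2
```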
Another very good option on more modern systems like Ubuntu Xenial is to use XFS for some of the mount points in your installation. XFS originated on SGI’s IRIX and has been in the mainline Linux kernel for many years. It excels at parallel IO and scales massively; as long as you don’t need more than 8 exbibytes of storage, you’re fine! XFS also supports snapshots, guaranteed-rate IO and disk quotas.
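A quick sketch of putting a data volume on XFS (the format and mount commands are shown commented out because they are destructive, and /dev/sdb1 and /data are placeholders):

```shell
# Check whether the running kernel can mount XFS (read-only, safe):
grep -qw xfs /proc/filesystems && echo "xfs supported" || echo "xfs module not loaded"

# Formatting and mounting a dedicated data volume as XFS:
#   mkfs.xfs /dev/sdb1
#   mount -o noatime /dev/sdb1 /data
```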
RAID
RAID has been around for a long time and offers a number of enhancements over writing to a single disk. There are seven core RAID levels, 0 through 6, plus two nested types, 0+1 and 1+0.
- RAID 0: Striping without mirroring or parity. This splits your data across all the disks in the set. With no mirroring or parity you get maximum performance, because IO is balanced across all your disks, but if a drive fails the whole set is lost until a new disk is inserted and the data restored from backup.
- RAID 1: A plain mirror. This setup copies all data to every drive in the set, ensuring data integrity, but write performance is limited by the slowest drive in the set.
- RAID 2: Bit-level striping with Hamming-code parity. No longer used.
- RAID 3: Byte-level striping with a dedicated parity disk. No longer commonly used.
- RAID 4: Block-level striping with a dedicated parity disk. Improved IO over RAID 2 and 3, with the parity operation ensuring data integrity.
- RAID 5: Block-level striping with distributed parity. Requires at least three disks, distributes the parity across all of them and can survive a single disk failure. The issue with RAID 5 is that every remaining drive must be read in full during a rebuild, which could trigger a second drive failure and bring down the whole array.
- RAID 6: Block-level striping with double distributed parity. Similar to RAID 5, but tolerates two failed drives, which mitigates the rebuild risk described for RAID 5.
- RAID 10: A striped set of mirrored drives. This type of array can sustain multiple drive losses, as long as no single mirror loses all of its drives.
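On Linux these arrays are usually built with mdadm. A sketch of creating the RAID 10 set described above from four disks (the create commands are shown commented out because they destroy data, and the device names are placeholders for your own drives):

```shell
# Inspect any existing software RAID arrays (read-only, safe):
cat /proc/mdstat 2>/dev/null || echo "no md arrays configured"

# Build a 4-disk RAID 10 array and put a filesystem on it:
#   mdadm --create /dev/md0 --level=10 --raid-devices=4 \
#         /dev/sdb /dev/sdc /dev/sdd /dev/sde
#   mkfs.ext4 /dev/md0
#   mount /dev/md0 /mnt/data
```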
TOP
Standard on all Linux distributions, top lets systems administrators see what’s consuming their resources, along with various important server statistics.
HTOP
Personally, I install htop almost as soon as I have installed the OS. It provides a much more user-friendly interface for administrators looking to pinpoint resource usage, processes or per-process consumption.
IOTOP
iotop is very useful for diagnosing IO issues. It has a top-like interface giving you an overview of what is sucking up all your precious IO bandwidth.
IPTRAF
IPTraf is another great console utility for viewing network traffic on your server. It shows a lot of useful information, such as connections, packets, interface stats and the traffic types going over those connections, letting you find out what’s going up and down the pipes quickly and easily.
IFTOP
iftop is yet another top-style interface, this time for bandwidth usage on a network interface. It’s a great tool for finding out what’s consuming your bandwidth.
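A quick way to see which of these tools you already have, with the install command sketched for Debian/Ubuntu systems (package names are assumptions; on newer releases IPTraf is packaged as iptraf-ng):

```shell
# Check which of the tools above are on the PATH:
for t in top htop iotop iptraf iftop; do
  command -v "$t" >/dev/null 2>&1 && echo "$t: installed" || echo "$t: missing"
done

# Pull in the missing ones (needs root):
#   sudo apt-get install htop iotop iptraf-ng iftop
```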
PSACCT
Spy on your system’s users with psacct (process accounting) and make sure they’re not doing anything naughty on the command line.
MONIT
Let your server monitor itself with Monit and get instant web-based feedback on your services and their activity.
MONITORIX
Monitorix is another lightweight web-based project designed to provide operational oversight of your servers with as little effort as possible. It monitors a lot of services and system resources out of the box and can be easily extended.
SURICATA
Intrusion detection is a big thing these days. Suricata provides real-time intrusion detection, inline intrusion prevention and network security monitoring. Not bad for an open-source tool!
OVER AND OUT
So, there are some hints and tips on how to speed up your server, and some helpful tools for finding out where your bottlenecks are. If you want to find out more, or want to set up a call to discuss your requirements, just schedule a meeting via my calendar; I look forward to discussing server deployments with you.