lecture: Inspecting a multi-everything Linux machine


Come to this talk if you want to better understand how a multi-core, multi-disk Linux system operates and how you can inspect its operation to measure utilisation, whether for capacity planning or just for fun.

Long gone are the days when a commodity server had one single-core CPU and one disk, and understanding system utilisation was as easy as checking that the load average was below 1. Yet some still think of a system as highly utilised, if not saturated, as soon as its load average climbs above 2-3.
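As a rough illustration of why a load average of 2-3 says little on its own, the sketch below divides the load average by the number of logical CPUs. The helper name `load_per_core` is hypothetical, and the per-core figure is only a first approximation: on Linux the load average also counts tasks in uninterruptible sleep (typically waiting on I/O), so it is not a pure CPU measure.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Return the load average divided by the number of logical CPUs.

    Hypothetical helper for illustration only; a value near 1.0 roughly
    corresponds to the old single-core rule of thumb.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg / cores


def main() -> None:
    # os.getloadavg() returns the 1, 5 and 15 minute load averages.
    one_min, five_min, fifteen_min = os.getloadavg()
    # os.cpu_count() may return None, so fall back to a single CPU.
    cores = os.cpu_count() or 1
    for label, value in (("1 min", one_min), ("5 min", five_min), ("15 min", fifteen_min)):
        print(f"{label}: load {value:.2f}, per core {load_per_core(value, cores):.2f}")


if __name__ == "__main__":
    main()
```

On a 16-core machine, for example, a load average of 3 works out to under 0.2 per core, which is far from saturation by the single-core rule of thumb.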

What does load average really mean on a multi-core, multi-disk server? Why are my tasks still slow if the CPU is never more than 15% utilised? How many more disk resources do I have to spare? Why can't my application read data any faster even though we now have a 50-disk array? You will be able to answer these and many similar questions after this talk.

I will begin this talk by looking at the broad picture of how the key elements of the system come into play: how a multi-component system operates at a high level and what that means for the system user (and the system administrator, of course). Then I will discuss the tools we can use to better understand what our system is really doing at any given moment, and what that tells us about system utilisation.