Keywords
Linux terminal server, LTSP, xrdp, kernel, scheduler, cgroups
Summary
The default Linux scheduling configuration divides CPU time on a per-process basis. On a system that accepts multiple interactive users this is not desirable, since a user running many processes will get a larger CPU share than the others. It is possible, however, to configure fair share user scheduling on Linux using cgroups together with some scripting, distributing CPU time, as well as other resources, in a flexible, user-based manner. NFS I/O escapes the control group mechanisms, but we will provide a method for handling such situations as well. The motivation for this article is the need to ensure that, on a correctly configured server, a single user is unable to cause system-wide performance degradation.
Introduction
Once upon a time there was a kernel option called FAIR_USER_SCHED that divided the CPU time slices on a per-user basis. For some time later on, that option was called just USER_SCHED. Eventually it was removed because its existence meant a lot of code to maintain. As a result, every Google search for "fair share user scheduling" now brings up old discussion threads and college homework assignments. But as this option was removed, a hot replacement called cgroups was ready to rock... well, sort of.

In system administration nothing is ever that simple. A kernel option that would automatically set the scheduler to work on a per-user basis was replaced by a much more general process group approach that needs manual configuration and user-space real-time process classification to achieve the same effect. Less code to maintain in the kernel - lots of homemade scripts to be written worldwide, and no documentation on how to do it properly. There is, however, a great thing in the new approach: it is not only the CPU that can be fairly shared. We can now control the scheduling of other resources such as I/O on block devices and network activity.
Resource sharing on terminal servers
There are different systems that can benefit from this, such as mass hosting web servers running Apache with suEXEC, large university development machines that take thousands of SSH logins, and terminal servers. We will focus on terminal servers in this article.

The case for terminal servers is a rather trivial one in terms of system administration efficiency, but configuration and maintenance require a deep understanding of many aspects of OS and network operation. Part of that will be the subject of a future article. For now, we would like to stress that when talking about terminal servers we don't mean a single application in a remote session running on a simplified desktop (or no desktop at all). Nor do we mean a virtual remote single-user PC. We are talking about a complete remote desktop server where N users deal simultaneously with KDE, Firefox, LibreOffice, Gwenview, GIMP, Okular, Ark, rdesktop and others, as if they had a local machine with all that software installed.
From time to time Linus Torvalds tells the media that one of the remaining challenges for the Linux kernel is desktop workloads. So, the greatest possible challenge must be having N desktop users on the same machine sharing the same Linux kernel. We will now show you how to optimize the scheduling for fair sharing of CPU and block I/O. Everything described in this article was successfully tested on Ubuntu 12.04 LTS using XRDP as the remote desktop server.
Kernel control groups
Control groups, aka cgroups, are sets of processes to which the Linux kernel assigns specific resource shares. When you install cgroups on Ubuntu with

apt-get install cgroup-bin libcgroup1 libpam-cgroup

you will get a default configuration where all the running processes belong to the same cgroup. That is a particular configuration in which the kernel behaves just as if the cgroup feature was not enabled - since there is a single group, it takes all the resources and shares them internally on a per-process basis. From that point on we can start building a custom configuration.
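Before creating anything, you can check this default state for yourself. A quick sanity check, assuming the usual Ubuntu layout under /sys/fs/cgroup:

# list the mounted cgroup controllers and where they are mounted
lssubsys -am
# show which cgroup the current shell belongs to (initially the root group)
cat /proc/$$/cgroup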
1. The first step is creating one cgroup per user. Essentially, this is done by listing the users with

getent group REMOTEUSERS

and pushing the list to the cgcreate command, which belongs to libcgroup.
There are some details related to the fact that the cgroups must be refreshed from time to time to keep the terminal server in sync with the list of users stored on the authentication server. Since there may be thousands of users and, in that case, we don't want to recreate the already existing cgroups, we will provide a script that takes care of such details.
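The core of the idea, stripped of those details, is a loop like the following sketch (it assumes a directory-backed group called REMOTEUSERS, as used below):

# rebuild the list of remote users from the authentication directory
USERLIST=$(getent group REMOTEUSERS | cut -d: -f4 | tr ',' ' ')
for i in $USERLIST; do
    # skip users whose cgroups already exist, create only the missing ones
    [ -d /sys/fs/cgroup/cpu/$i ] || cgcreate -g cpu:/$i -g blkio:/$i
done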
2. The second step is classifying processes into the proper cgroups. At login time that is handled by pam_cgroup, enabled in the PAM session stack:

[...]
session required pam_script.so
session optional pam_cgroup.so
session sufficient pam_winbind.so
session optional pam_ck_connector.so nox11
[...]

The classification rules themselves live in /etc/cgrules.conf:

@REMOTEUSERS cpu,blkio %U
root cpu,blkio system
xrdp cpu,blkio system
* cpu,blkio others

With this configuration users belonging to the group REMOTEUSERS will each have an individual cgroup. On the other hand, users that are allowed to log in but don't belong to that group will compete for resources within a group called "others". Processes belonging to the users root and xrdp will have a reserved share via a group called "system".
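One detail worth knowing: pam_cgroup and the cgred daemon classify processes at login and creation time, so anything that was already running when these rules were put in place may need to be moved by hand with cgclassify, also from libcgroup. For example, for a hypothetical user:

# move USERNAME's existing processes into the matching cgroup
cgclassify -g cpu,blkio:USERNAME $(ps -u USERNAME -o pid=)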
3. The final step is allocating the available resources to the created groups.
Something like this:
for i in $USERLIST; do
    cgcreate -g cpu:/$i
    cgcreate -g blkio:/$i
    cgset -r blkio.weight=$USERBLKIO /$i
done
# other users that have processes running
if [ ! -e /sys/fs/cgroup/cpu/others ]; then
    cgcreate -g cpu:/others
    cgcreate -g blkio:/others
    cgset -r blkio.weight=$USERBLKIO /others
fi
# for anything that runs as root and anything else that cgrules.conf puts in this group
if [ ! -e /sys/fs/cgroup/cpu/system ]; then
    cgcreate -g cpu:/system
    cgcreate -g blkio:/system
    # reserved for root ssh actions - default shares value is 1024
    cgset -r cpu.shares=$SYSCPU /system
    # default value is 500, max value is 1000
    cgset -r blkio.weight=$SYSBLKIO /system
fi
Note: if you are seeing pointless cgroup messages in the syslog, add

LOG="--no-log"

to /etc/cgred.conf.
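We don't set a per-user cpu.shares in the script above because every new group gets the default value of 1024, which already results in equal CPU weights between users. Should you ever want to favour a particular user, a single cgset call is enough (hypothetical group name):

# double the CPU weight of one user's group - the default is 1024
cgset -r cpu.shares=2048 /SOMEPOWERUSER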
Automating setup and maintenance
To automate the synchronization between the terminal server and the authentication directory it connects to (be it Active Directory, LDAP or Samba...) we suggest this script. It makes cgroup management a much easier task and can be edited to suit your needs.
Testing the setup
After the system is configured you should see, under /sys/fs/cgroup/cpu/ and /sys/fs/cgroup/blkio/, one directory named after each user. Inside each of those directories there will be a file called "tasks" that lists the processes of the corresponding user. You can compare the content of that file to the output of "ps -u USERNAME" and check that it matches. You will also see that, as another user logs in, its processes are automatically inserted into both /sys/fs/cgroup/cpu/ANOTHERUSER/tasks and /sys/fs/cgroup/blkio/ANOTHERUSER/tasks.

You can query the properties of each cgroup with commands such as the following:

cgget USERNAME
cgget system
cgget others

Once you are sure that group configuration and process classification are working you can run CPU and block I/O sharing tests.
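If you prefer a systematic comparison over eyeballing ps output, you can diff the kernel's view against ps directly. Note that the tasks file lists thread IDs, so ps must report threads too (hypothetical user name):

# compare the cgroup task list with the user's threads as seen by ps
sort -n /sys/fs/cgroup/cpu/USERNAME/tasks > /tmp/cgroup.tids
ps -u USERNAME -L -o lwp= | sort -n > /tmp/ps.tids
diff /tmp/cgroup.tids /tmp/ps.tids && echo "classification OK"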
Imagine that you have a single core VM. If a certain CPU intensive test process takes 1 minute to complete for user A while pushing the virtual CPU to 100%, two instances of the same process - one for user A and another for user B - will take 2 minutes to complete, and each user will wait 2 minutes for the result. Now, what if user B decides to run 2 instances of the same process? The answer is easy: it will take a total of 3 minutes for all the processes to finish. But here you will see the important difference: without our cgroup configuration both users would wait 3 minutes for all their processes to complete, whereas with our configuration user A would wait 2 minutes and B would wait 3. That is, each user gets half of the CPU time regardless of the number of processes they run. When user A's process is done, after 2 minutes, user B will get the whole CPU and finish the remaining work in a single minute. The evolution of CPU shares over time is summarized in the following table.
Without fair sharing:

      1st minute   2nd minute   3rd minute
A     1/3          1/3          1/3
B     2/3          2/3          2/3

With fair sharing:

      1st minute   2nd minute   3rd minute
A     1/2          1/2          0
B     1/2          1/2          1
We should stress that for this behaviour to be verifiable the test must fully occupy the available CPUs. On a single core machine that is easy to do with a single test process. To perform this simple test on a multicore machine you need to run extra CPU intensive processes just to keep all but one of the cores at 100%, thus enabling competition between the test processes - cgroups have no effect if there is no resource scarcity.
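The same scenario can be scripted with cgexec, which the block I/O test below also uses. A minimal single core sketch, with hypothetical group names; adjust the dd count so that a solo run takes about a minute on your hardware:

# a fixed amount of CPU work: compress zeros and throw the result away
JOB='dd if=/dev/zero bs=1M count=512 2>/dev/null | bzip2 -9 > /dev/null'
# user A runs one instance, user B runs two
( time cgexec -g cpu:USERA sh -c "$JOB" ) &
( time cgexec -g cpu:USERB sh -c "$JOB" ) &
( time cgexec -g cpu:USERB sh -c "$JOB" ) &
wait
# with fair sharing, USERA should report ~2 minutes and USERB ~3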
To test block I/O resource sharing the procedure is similar. We suggest that you saturate disk read access by using dd to read through large files. Running concurrent dd processes must not allow their owner to read at a higher total rate than other users. After preparing a couple of large files with

dd if=/dev/zero of=/tmp/zerofile1 bs=1024 count=4096000
cp -f /tmp/zerofile1 /tmp/zerofile2
cp -f /tmp/zerofile1 /tmp/zerofile3

one simple test would be executing something like

clear
sync
echo 3 > /proc/sys/vm/drop_caches
cgexec -g blkio:ONEUSER dd if=/tmp/zerofile1 of=/dev/null &
cgexec -g blkio:ANOTHERUSER dd if=/tmp/zerofile2 of=/dev/null &
cgexec -g blkio:ANOTHERUSER dd if=/tmp/zerofile3 of=/dev/null &

Just as with the CPU test, you would conclude that ANOTHERUSER does not gain an advantage by running 2 concurrent processes.
CPU and block I/O can be monitored with the atop utility - use shift+d to sort by DSK activity and shift+c to sort by CPU usage. If you'd like to see CPU usage grouped per user you can use this script.
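As a rough stand-in for that script, summing the %CPU column of ps per user gives a quick approximation:

# approximate per-user CPU usage by summing %CPU over each user's processes
ps -eo user:20,pcpu --no-headers | \
awk '{cpu[$1] += $2} END {for (u in cpu) printf "%-20s %5.1f\n", u, cpu[u]}' | \
sort -k2 -rn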
NFS - not fair shareable
If everything above is correctly set up you should have a system where, if pressure mounts, each user becomes bound to a certain amount of CPU and block I/O. Thus, the impact of misbehaviour or accidental misuse is limited.

Still, problems may arise if you are using NFS to store home directories. If you are doing that, probably due to having several terminal servers balanced across a large group of users, you should be aware that NFS usage is not fair shareable. This suggests a new funky acronym for NFS, but that is purely accidental and we won't go further that way.
The problem is that NFS I/O does not count as block I/O - NFS is a network filesystem, not a block device. Furthermore, network transfers on NFS are done as root, so the network-related cgroup mechanisms can't be used either.
To mitigate this problem we developed a simple I/O governor that permanently monitors suspect processes - which must be defined on a case-by-case basis - and calms them down if they are taking too much I/O. By calming them down, we mean sending them SIGSTOP and, after a while, SIGCONT. What the governor actually does to an I/O intensive process, after leaving it alone for a configurable grace period, is put it to sleep for a decreasing fraction of its running time until its I/O activity reaches a specific pre-configured limit.
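To make the mechanism concrete, here is a minimal sketch of that stop/continue loop; the pattern, thresholds and timings are illustrative only, and the real governor mentioned below is more elaborate:

#!/bin/bash
# Watch suspect processes and stop/continue the ones doing too much I/O.
PATTERN=kio_file          # suspect processes to watch
LIMIT=$((50*1024*1024))   # bytes of I/O per interval before acting
INTERVAL=5                # sampling period in seconds
NAP=2                     # how long a noisy process is kept stopped

declare -A last
while :; do
    for pid in $(pgrep "$PATTERN"); do
        # rchar+wchar also count NFS traffic, unlike the blkio statistics
        now=$(awk '/^(rchar|wchar):/ {s+=$2} END {print s}' "/proc/$pid/io" 2>/dev/null)
        [ -n "$now" ] || continue
        if [ -n "${last[$pid]}" ] && [ $((now - last[$pid])) -gt $LIMIT ]; then
            kill -STOP "$pid"; sleep "$NAP"; kill -CONT "$pid"
        fi
        last[$pid]=$now
    done
    sleep "$INTERVAL"
done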
The usual suspects are processes that tend to saturate the I/O bandwidth to the physical storage by reading or writing at a very high rate, causing large delays to the short reads and writes of other applications. For example, we wouldn't want a user copying a 1GB folder to a shared network drive to degrade the startup time of LibreOffice or Firefox for everyone else. But that could happen, since NFS is not fair shareable, and that's precisely what the I/O governor avoids. The original idea came from this CPU governor (which we don't need, since cgroups are doing the CPU governing job) and the working logic is probably not very different from that of the Process Lasso Windows tool that we mentioned here.
We usually monitor kio_file processes, which are responsible for handling file copy operations performed with Dolphin on KDE. That, of course, needs to be adapted to the particular use of each terminal server. You can take a look at the governor here. Please note that it should be running at all times and that it depends on the helper script iolim.sh, which you can find here.
Conclusion
Resource governing can be achieved on terminal servers by using a combination of the cgroups kernel feature and a couple of bash scripts. Performance loss situations, which could become stressful to the users, can be effectively avoided, and runaway processes that consume too much CPU, block I/O or NFS I/O can be controlled automatically.

Here is how an 8 VCPU virtual machine, backed by 8 real CPU cores, handles 15 concurrent full desktop users. Note that we are, on average, delivering half a core and 400MB of RAM to each user with excellent performance. The number of users could still grow significantly without causing work delays.
References
http://www.janoszen.com/2013/02/06/limiting-linux-processes-cgroups-explained
http://utcc.utoronto.ca/~cks/space/blog/linux/CGroupsPerUser
https://www.kernel.org/doc/Documentation/cgroups