
Linux processor affinity: a rant

Update in September 2007: Google Analytics tells me that people are continually finding this page while searching for terms like “linux processor affinity”. You should know that I created the Portable Linux Processor Affinity project to address the problems stated in this blog entry. Please go there after reading this entry. Thanks.


This is a technical rant that can be summarized quickly: the current state of Linux processor affinity sucks.

There are essentially three different variants of the API (that I can find); which one you have depends on a combination of several factors:

  • your Linux distribution/vendor
  • what version of kernel you are using
  • what version of glibc you are using

Annoyingly, regardless of which variant of the API you have on your system, the man page for sched_setaffinity(2) and sched_getaffinity(2) is the same. It looks like this one man page has been copied everywhere and never updated to match what you actually have on your system. So you have, at best, a 1 in 3 shot of having these functions correctly documented.

As far as I can tell, here’s what the three variants are:

  1. int sched_setaffinity(pid_t pid, unsigned int len, unsigned long *mask);

    This originated in 2.5 kernels (which we won’t worry about) and some distros back-ported it to their 2.4 kernels. It’s unknown (to me) if this appears in any 2.6 kernels.

  2. int sched_setaffinity (pid_t __pid, size_t __cpusetsize, const cpu_set_t *__cpuset);

    This appears to be in recent 2.6 kernels (confirmed in Gentoo 2.6.11). I don’t know when #1 changed into #2. However, this prototype is nice: the cpu_set_t type is accompanied by fd_set-like CPU_ZERO(), CPU_SET(), CPU_ISSET(), etc. macros.

  3. int sched_setaffinity (pid_t __pid, const cpu_set_t *__mask);

    (note the missing len parameter) This is in at least some Linux distros (e.g., MDK 10.0 with a 2.6.3 kernel, and SGI Altix; the Altix uses a 2.4-based kernel, so SGI likely back-ported the 2.5 work and modified it for their needs). As with #2, the cpu_set_t type is accompanied by fd_set-like CPU_ZERO(), CPU_SET(), CPU_ISSET(), etc. macros.
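With three incompatible prototypes, application code ends up choosing one at compile time. The following is a minimal sketch, assuming a configure script has defined exactly one of the hypothetical macros AFFINITY_VARIANT_1, AFFINITY_VARIANT_2 or AFFINITY_VARIANT_3 (names invented here for illustration). If none is defined, it falls back to variant #2, which is the form current glibc uses; the other two branches are only compiled on systems that actually have those prototypes.

```c
#define _GNU_SOURCE          /* expose cpu_set_t and the CPU_* macros in glibc */
#include <sched.h>

/* Hypothetical configure results: exactly one of these would normally be
 * passed on the compiler command line.  Default to variant #2. */
#if !defined(AFFINITY_VARIANT_1) && !defined(AFFINITY_VARIANT_3)
#define AFFINITY_VARIANT_2 1
#endif

/* Restrict the calling process to a single CPU using whichever
 * sched_setaffinity() prototype the build was configured for.
 * Returns 0 on success, -1 (with errno set) on failure. */
static int bind_to_cpu(int cpu)
{
#if defined(AFFINITY_VARIANT_1)
    /* Variant #1: raw unsigned long mask plus a byte length.
     * This sketch only handles CPUs that fit in a single word. */
    unsigned long mask = 1UL << cpu;
    return sched_setaffinity(0, sizeof(mask), &mask);
#elif defined(AFFINITY_VARIANT_3)
    /* Variant #3: cpu_set_t with no length argument. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, &set);
#else
    /* Variant #2: cpu_set_t plus its size in bytes. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
#endif
}
```

On a system with the #3 prototype, the same file would be built with something like -DAFFINITY_VARIANT_3; the rest of the program only ever calls bind_to_cpu().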

Also note that at least some distros of Linux have a broken CPU_ZERO macro (a pair of typos in /usr/include/bits/sched.h). MDK 9.2 is the screaming example, but it’s pretty old and probably only matters because I use that as a compilation machine :-) (it also appears to have been fixed in MDK 10.0, but they also changed from #2 to #3 — arrgh!). However, there’s no way of knowing where these typos came from and if they exist elsewhere. So it seems safest to have a configure script to check for a bad CPU_ZERO macro.
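Such a configure check only needs to compile and run a tiny program. A minimal sketch, assuming the test's job is to confirm that CPU_ZERO really clears every byte of a cpu_set_t and that CPU_SET/CPU_ISSET agree afterwards; the function name cpu_zero_works() is hypothetical, and a real configure script would turn the result into the program's exit status. A macro with outright typos would most likely fail to compile, which configure would also record as "broken".

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>

/* Returns 1 if CPU_ZERO/CPU_SET/CPU_ISSET behave sanely, 0 otherwise. */
static int cpu_zero_works(void)
{
    cpu_set_t set;
    const unsigned char *bytes = (const unsigned char *) &set;

    /* Fill the set with garbage so a CPU_ZERO that does nothing is caught. */
    memset(&set, 0xff, sizeof(set));
    CPU_ZERO(&set);

    for (size_t i = 0; i < sizeof(set); i++)
        if (bytes[i] != 0)
            return 0;               /* CPU_ZERO left bits behind */

    /* After zeroing, setting CPU 1 must make exactly that CPU visible. */
    CPU_SET(1, &set);
    if (!CPU_ISSET(1, &set) || CPU_ISSET(0, &set))
        return 0;

    return 1;
}
```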

Glibc itself shares a bunch of the blame. Case in point — look at this implementation of sched_setaffinity from Glibc 2.3.2:

int
sched_setaffinity (pid, len, mask)
     pid_t pid;
     unsigned int len;
     unsigned long int *mask;
{
  __set_errno (ENOSYS);
  return -1;
}

Why even have the function there if all it’s going to do is return an error? It’s better to not have it at all (because we already have to have a complex configure script to figure out which one to use) than to provide one that is simply broken. Arrrggggghhhh!!
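Because a stub like this compiles and links without complaint, the only way to notice it is at run time. A minimal sketch, assuming the variant #2 prototype: call sched_getaffinity() once and treat ENOSYS as "affinity not available". The helper name affinity_available() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Returns 1 if sched_getaffinity() works, 0 if it is an ENOSYS stub,
 * and -1 if a real implementation exists but the call failed. */
static int affinity_available(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) == 0)
        return 1;
    if (errno == ENOSYS)
        return 0;       /* stub: the function exists but does nothing */
    return -1;
}
```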

Finally, note that even the syscall() interface won’t help: apparently the back-end kernel function has changed the number and type of parameters multiple times (so that may not actually be Glibc’s fault). So there appears to be no portable way to use sched_setaffinity() and sched_getaffinity() without a complex configure script and multiple implementations in your code. That totally, totally sucks.
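For comparison, this is what going around glibc looks like on a current kernel. A minimal sketch, assuming the three-argument kernel entry point (pid, mask length in bytes, pointer to an array of unsigned longs) used by modern kernels; as the paragraph above notes, older kernels differed, so this is not portable to them. One detail worth knowing: the raw system call returns the number of mask bytes the kernel copied out, not 0 as the glibc wrapper does. MASK_WORDS is an arbitrary size chosen for the example.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

#define MASK_WORDS 16   /* arbitrary: 1024 CPUs on a 64-bit machine */

/* Fetch the calling process's affinity mask straight from the kernel.
 * Returns the number of bytes the kernel wrote into mask, or -1 on error. */
static long raw_getaffinity(unsigned long mask[MASK_WORDS])
{
    return syscall(SYS_sched_getaffinity, 0,
                   sizeof(unsigned long) * MASK_WORDS, mask);
}
```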

This rant is therefore an open appeal for the Linux development community to get its act together and figure this darn thing out once and for all, and standardize on a single API.



Comments (7)

Ming:

I wonder if anyone can give me some advice.
I use this sched_setaffinity call on a Pentium 4 PC with the gentoo-Linux 2.6.14-r5 kernel and GNU C Library 2.3.5, and it works as expected, as the man page says. But when I use it on a Xeon dual-processor PC with a Red Hat Linux 2.4.21-37 kernel and glibc 2.4.20, I get an error: “EFAULT: A supplied memory address was invalid.” (The man page does not really explain what this error means. Apparently it refers to ‘mask’, but I can’t see how I could screw up the declaration of a long int pointer for ‘mask’.) Apparently Red Hat back-ported this setaffinity thing to its 2.4 kernel. I know nothing about kernel programming. I wonder whether this 2.4 kernel uses a faulty setaffinity() as mentioned above, i.e. one that returns an error regardless. How does ENOSYS relate to EFAULT? Jeff, are you saying that both the OS and the C library can provide the definition of a call, in this case sched_setaffinity()? And is there a way to select which one to use if both the OS and libc provide their own versions of the call (“system call” vs “library call”)? Is there any general rule as to which one is the default? And if libc provides a library call, should it have an entry for that call in section 3 of “man”?

Ming:

Just to correct something I said in my last post.
Now it seems to me that the sched_setaffinity() call on my Xeon system (Linux 2.4, glibc 2.3.2, gcc 3.2.3) is NOT provided as a system call (i.e. Red Hat does not appear to have back-ported it to the 2.4 kernel) but rather as a C library call. I used the system call parameter types, i.e. unsigned long* for the third parameter ‘mask’, and I received the EFAULT. But when I changed the parameter types to what is described in the glibc manual, i.e. cpu_set_t * for the third parameter ‘cpuset’, I got no compile error. Therefore I conclude that the sched_setaffinity() on my Xeon system comes from glibc. However, when I try to use CPU_SETSIZE, CPU_SET, etc., the compiler complains that “`CPU_SETSIZE’ undeclared” etc., even though I did #include <sched.h> (and #define __USE_GNU) in my C program and those macros are defined in my system’s <sched.h>. And it seems I have to use CPU_SET to enable some CPU, because the cpu_set is 0 unless initialized otherwise. With the cpu_set empty, the setaffinity call returns EFAULT. This is another mystery: according to the glibc manual section 22.3.5, EINVAL should have been returned, because EINVAL means “the affinity set might not leave a processor for the process or thread to run on”, which is exactly the case when the bitset is 0.
In conclusion, the hurdle for me now is the failure to use macros like CPU_SET. Any help is appreciated.
Jeff, I don’t know why you said glibc 2.3.2 defines sched_setaffinity() the way you did in your original post. At least according to Section 22.3.5 of the glibc 2.3.x manual (http://docs.biostat.wustl.edu/cgi-bin/info2html?(libc.info.gz)Top), the call is defined as
int sched_setaffinity (pid_t PID, size_t CPUSETSIZE, const cpu_set_t *CPUSET). And the glibc 2.3.2 on my system seems to confirm that.
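For reference, on a current Linux system with the variant #2 prototype, the empty-mask case behaves the way the manual describes: the kernel rejects a mask with no CPUs in it with EINVAL. A minimal sketch that checks this; the helper name is hypothetical, and it says nothing about what the 2.4 back-port on the Xeon box returned.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Try to set an empty affinity mask and return the resulting errno
 * (0 would mean the call was accepted, which it should not be). */
static int errno_for_empty_mask(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);                         /* no CPUs at all */
    if (sched_setaffinity(0, sizeof(set), &set) == 0)
        return 0;
    return errno;                           /* EINVAL on current kernels */
}
```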

Ming:

This is my last post tonight.
I copied the definitions of the macros from <sched.h> into my C program:
#define CPU_SETSIZE __CPU_SETSIZE
#define CPU_SET(cpu, cpusetp) __CPU_SET (cpu, cpusetp)
#define CPU_CLR(cpu, cpusetp) __CPU_CLR (cpu, cpusetp)
#define CPU_ISSET(cpu, cpusetp) __CPU_ISSET (cpu, cpusetp)
#define CPU_ZERO(cpusetp) __CPU_ZERO (cpusetp)
I do not know whether this is a good idea but now the compiler no longer complains that it cannot find those macros.
Now I have:
cpu_set_t cpu_set;
unsigned int len = sizeof(cpu_set);
CPU_ZERO(&cpu_set);
CPU_SET(1,&cpu_set);
ret = sched_setaffinity(0, len, &cpu_set);
and I got a run-time error: sched_setaffinity returns -1 with errno set to EFAULT (“The pointer CPUSET does not point to a valid object”).

I’ll answer your questions more-or-less in order…

  1. The sched_getaffinity() and sched_setaffinity() functions actually reside in glibc, but they are technically just small wrappers that invoke the back-end kernel functions. Hence, I believe they are referred to as “system calls” because the actual glibc code for them is quite small and only meant as a conduit to invoke the back-end kernel function.
  2. I do not know exactly what RH did in their 2.4 kernels to back port the processor affinity functionality.
  3. If you obtain the source for glibc 2.3.2 (from ftp://ftp.gnu.org/gnu/glibc/glibc-2.3.2.tar.gz) and look in both sysdeps/generic/sched_setaffinity.c and posix/sched.h, you’ll see exactly the implementation of sched_setaffinity() that I described in my original post. If your system has something different, then it is possible (likely?) that your distro vendor changed the implementation in glibc (it is not uncommon for distro vendors to put custom versions of packages in their software).
  4. Per the above bullet, glibc 2.3.2 does not have the CPU_* macros because it does not define the cpu_set_t type.
  5. It is definitely a bad idea to copy the CPU_* macros into your program.
  6. To solve all of these problems, we created the Portable Linux Processor Affinity (PLPA) software package as a spinoff of the Open MPI project ( http://www.open-mpi.org/ ). PLPA provides a consistent processor affinity interface. It avoids the glibc functions altogether (not just for the reasons described above, but also because some versions of glibc actually had buggy implementations) and directly invokes the back-end kernel functions. PLPA performs run-time probing to figure out which variant of the back-end kernel function to use and dispatches accordingly. You can find the PLPA here:

http://www.open-mpi.org/software/plpa/

This should solve all your problems. Contact us on the PLPA mailing lists if you have problems.
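To give a flavour of what run-time probing of the kernel interface means, here is one of the simplest possible probes. This is a minimal sketch, not PLPA's actual code, assuming a current kernel whose sched_getaffinity system call fails with EINVAL when the buffer is too small for its CPU mask; the loop doubles the buffer until the kernel accepts it. The function name probe_mask_bytes() and the 1 MiB upper limit are hypothetical choices for the example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns the smallest power-of-two buffer size in bytes (starting at one
 * unsigned long) that the kernel's sched_getaffinity accepts, or -1. */
static long probe_mask_bytes(void)
{
    for (size_t len = sizeof(unsigned long); len <= 1024 * 1024; len *= 2) {
        unsigned long *buf = calloc(1, len);
        if (buf == NULL)
            return -1;
        long ret = syscall(SYS_sched_getaffinity, 0, len, buf);
        int saved = errno;          /* keep errno before calling free() */
        free(buf);
        if (ret > 0)
            return (long) len;      /* kernel accepted this length */
        if (saved != EINVAL)
            return -1;              /* some other failure: give up */
    }
    return -1;
}
```

A library built this way can then size every later affinity call from the probed length instead of trusting a compile-time constant.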

Ming:

>>>>>>>>>
If you obtain the source for glibc 2.3.2 and look in both sysdeps/generic/sched_setaffinity.c and posix/sched.h, you’ll see exactly the implementation of sched_setaffinity() that I described in my original post. If your system has something different, then it is possible (likely?) that your distro vendor changed the implementation in glibc …
>>>>>>>>>
The distro vendor may have changed the glibc implementation (does that violate the GNU license?), but they cannot change the glibc manual. How come the prototypes of the functions in the manual are different from the actual glibc code (from the ftp site)? Perhaps one possible explanation is that the GNU developers changed the implementation of the functions but did not update the Manual.
Another strange thing is that I looked at the NEWS of glibc: http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/NEWS?rev=1.159&content-type=text/x-cvsweb-markup&cvsroot=glibc It does not even mention these affinity functions. Maybe they think this is not significant enough?
>>>>>>>>>
Per the above bullet, glibc 2.3.2 does not have the CPU_* macros because it does not define the cpu_set_t type.
It is definitely a bad idea to copy the CPU_* macros into your program.
>>>>>>>>>
So it is likely that the distro vendor put the CPU_* macros there in sched.h. But anyway, it is in my /usr/include and I should be able to use it. I even added “-I/usr/include” to my gcc command, and the compiler still complains that those macros are not declared. I just want to know why.

> The distro vendor may have changed the glibc implementation (does that violate the GNU license?),

No, that does not violate the GPL (strictly speaking, glibc is under the LGPL). Both licenses are all about distributing source and letting people modify it.

> but they cannot change the glibc manual.

They cannot change the glibc manual that is hosted on other web servers, correct.

> How come the prototypes of the functions in the manual are different from the actual glibc code (from the ftp site)?

I have no idea.

> Perhaps one possible explanation is that the GNU developers changed the implementation of the functions but did not update the Manual.

Did a copy of the glibc manual come in your distribution? You might want to look in the local copy. But they could have forgotten to update that as well. [shrug]

> Another strange thing is that I looked at the NEWS of glibc: http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/NEWS?rev=1.159&content-type=text/x-cvsweb-markup&cvsroot=glibc It does not even mention these affinity functions. Maybe they think this is not significant enough?

I have no idea.

> So it is likely that the distro vendor put the CPU_* macros there in sched.h. But anyway it is in my /usr/include and I should be able to use it. I even include “-I/usr/include” in my gcc command and the compiler still complains that those macros are not declared. I just want to know why.

It is not a good idea to put -I/usr/include in your compile command line because /usr/include is pretty much guaranteed to already be in the compiler’s header file search path. If you put it in yourself, you may disturb the ordering.

If the preprocessor cannot find those macros, then there is something else in sched.h preventing those macros from being defined — perhaps a conditional of some form…?
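One conditional that produces exactly this symptom in glibc: the CPU_* macros in <sched.h> are only visible when __USE_GNU is set, and <features.h> clears all the __USE_* macros itself and then derives __USE_GNU from _GNU_SOURCE. Defining __USE_GNU by hand therefore has no effect, while defining _GNU_SOURCE before the first #include does. Whether that is the conditional on the Red Hat system in question is an assumption. A minimal sketch, assuming a glibc with the variant #2 prototype, that lists the CPUs the process may run on:

```c
#define _GNU_SOURCE     /* must come before ANY #include; __USE_GNU won't work */
#include <sched.h>
#include <stdio.h>

/* Print the CPUs the calling process is allowed to run on and return
 * how many there are, or -1 on error. */
static int print_allowed_cpus(void)
{
    cpu_set_t set;
    int count = 0;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_getaffinity");
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set)) {
            printf("CPU %d\n", cpu);
            count++;
        }
    }
    return count;
}
```

The same effect can be had with -D_GNU_SOURCE on the gcc command line, which guarantees it is seen before any header.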

Regardless, I strongly encourage you to use PLPA ( http://www.open-mpi.org/software/plpa/ ) — it was designed to solve exactly these problems and provide a consistent, reliable processor affinity API in Linux.

M. Martin:

RHEL also has this problem: missing macros.

About

This page contains a single entry from the blog posted on October 18, 2005 9:52 AM.
