Two important fileserver fixes are available for OpenAFS 1.4.11, both of which address intermittent fileserver crashes. Source code patches are available in the OpenAFS git source code repository and are in the pipeline for the next release of OpenAFS.
The first patch fixes an error in the handling of multi-homed client hosts. An OpenAFS client host may have multiple interfaces, and hence multiple IP addresses. The fileserver attempts to associate these IP addresses with the host in memory. This multi-home tracking has been improved in recent releases of OpenAFS; however, a subtle error was introduced around OpenAFS 1.4.8. When the last address associated with a host was removed, the callback connection for that host was also removed. In some cases that connection object was still in use by other threads, and its premature removal led to a server crash when the fileserver attempted to access a null pointer.
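As a rough illustration of this class of bug, the sketch below uses reference counting so that removing a host's last address drops only the host's own reference to its callback connection, and the object stays valid for anyone still holding it. This is not the actual OpenAFS patch: the names `cb_conn`, `client_host`, `conn_hold`, `conn_release` and `host_remove_addr` are hypothetical, and the sketch is single-threaded, so a real fileserver would need to protect the counter with a lock or atomic operations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a callback connection; not the real rx type. */
struct cb_conn {
    int refcount;   /* number of holders, including the owning host */
    int id;
};

/* Hypothetical, heavily simplified client host entry. */
struct client_host {
    int naddrs;             /* addresses currently associated with the host */
    struct cb_conn *conn;   /* callback connection, or NULL */
};

/* Allocate a connection holding one reference (the host's own). */
static struct cb_conn *conn_new(int id)
{
    struct cb_conn *c = malloc(sizeof *c);
    if (c != NULL) {
        c->refcount = 1;
        c->id = id;
    }
    return c;
}

/* Take an extra reference before using the connection elsewhere. */
static struct cb_conn *conn_hold(struct cb_conn *c)
{
    c->refcount++;
    return c;
}

/* Drop one reference; free on the last one. Returns 1 if freed. */
static int conn_release(struct cb_conn *c)
{
    assert(c->refcount > 0);
    if (--c->refcount == 0) {
        free(c);
        return 1;
    }
    return 0;
}

/*
 * Remove one address from the host. When the last address goes away,
 * the host gives up only its own reference; other holders keep a
 * valid object. Returns 1 if the connection was actually freed.
 */
static int host_remove_addr(struct client_host *h)
{
    int freed = 0;

    assert(h->naddrs > 0);
    if (--h->naddrs == 0 && h->conn != NULL) {
        freed = conn_release(h->conn);
        h->conn = NULL;
    }
    return freed;
}
```

In this sketch, a caller that took a reference with `conn_hold` before the last address was removed can keep using the connection and frees it later with its own `conn_release`.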
The second fix is for an insidious and long-standing bug in the host package of the fileserver. Several cases were found where the fileserver could be using a host object that had already been freed. This bug could manifest in a number of terrible ways. Sometimes it led to corruption of the internal list of client hosts, in which case the fileserver could crash, or even hang while trying to traverse a linked list that looped on itself. In other cases, the fileserver heap could be corrupted, and the fileserver would crash when calling malloc or when attempting to free an object that had already been freed.
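To show why a list that loops on itself makes a plain traversal hang, the sketch below detects such a loop with Floyd's two-pointer cycle check. It is only an illustration of the symptom, not part of the fix or of the OpenAFS source; `host_node`, `host_list_has_cycle` and `host_list_count` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; a real host entry carries far more state. */
struct host_node {
    struct host_node *next;
};

/*
 * Floyd's cycle check: advance one pointer by one node and another by
 * two. They can only meet if the list loops back on itself.
 * Returns 1 for a looping list, 0 for a list that ends at NULL.
 */
static int host_list_has_cycle(const struct host_node *head)
{
    const struct host_node *slow = head;
    const struct host_node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return 1;
    }
    return 0;
}

/*
 * Count the entries in the list. A plain walk like this never ends on
 * a looping list, which is the hang described above, so the sketch
 * checks the precondition first.
 */
static size_t host_list_count(const struct host_node *head)
{
    size_t n = 0;

    assert(!host_list_has_cycle(head));
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

A check like this costs a full pass over the list, so it is better suited to debugging or a consistency audit than to every traversal.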
The fixes are available in the OpenAFS git repository, and are mirrored on bm1vsrv05.sinenomine.net:
- viced-null-callback-rxcon-20091022 eliminates the premature removal of the connection object
- viced-avoid-using-released-hosts-20091102 fixes the host package bug where the host list could be corrupted