Bugzilla – Bug 714510
DRBD lose connection complaining about BAD BarrierAck
Last modified: 2011-10-07 22:11:51 UTC
Created attachment 448005
Log of primary and secondary server.

User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.215 Safari/535.1

Primary server: openSUSE 11.3
Linux version 2.6.34.7-0.7-xen (geeko@buildhost) (gcc version 4.5.0 20100604 [gcc-4_5-branch revision 160292] (SUSE Linux) ) #1 SMP 2010-12-13 11:13:53 +0100

Secondary server: openSUSE 11.3 upgraded to 11.4 via "zypper dup"
Linux version 2.6.37.6-0.7-xen (geeko@buildhost) (gcc version 4.5.1 20101208 [gcc-4_5-branch revision 167585] (SUSE Linux) ) #1 SMP 2011-07-21 02:17:24 +0200

During normal operation the connection is dropped and then reestablished. The log shows:

Aug 26 22:04:54 primaryserver kernel: [121494.328079] block drbd0: BAD! BarrierAck #1092185342 received, expected #1092185341!

During reconnection (1-2 seconds) the system freezes because the disk is missing. The problem occurs at completely random intervals: now, then after one minute, then after two hours, then after ten minutes, and so on.

I use a primary/secondary configuration: on reconnection the primary updates the secondary. In a primary/primary configuration I think this would cause a split brain.

The logs of the primary and secondary servers are attached.

Reproducible: Always

Actual Results:
DRBD complains about a bad BarrierAck and drops the network connection.

Expected Results:
No errors, no lost connection.

I've seen something similar in http://www.mail-archive.com/drbd-user@lists.linbit.com/msg02980.html
I've compiled and installed, on the secondary node, the 8.3.10 module from http://oss.linbit.com/drbd/ and the matching userland tools from http://download.opensuse.org/repositories/Base:/System/openSUSE_11.4/ So far I have seen no more disconnections.
That's good, but we can't accept the out-of-tree DRBD code into the 11.4 kernel at this time, sorry. The DRBD developers are pushing that code into the main kernel tree, so hopefully this will be resolved that way in time for 12.1.
Just as a clarification, the working DRBD code is already in the upstream version.