Bug 714510 - DRBD lose connection complaining about BAD BarrierAck
Summary: DRBD lose connection complaining about BAD BarrierAck
Status: RESOLVED WONTFIX
Alias: None
Product: openSUSE 11.4
Classification: openSUSE
Component: Kernel
Version: Final
Hardware: x86-64 openSUSE 11.4
Importance: P3 - Medium : Critical with 5 votes
Target Milestone: ---
Assignee: E-mail List
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-08-26 21:14 UTC by Valerio Granato
Modified: 2011-10-07 22:11 UTC

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Log of primary and secondary server. (11.24 KB, text/plain)
2011-08-26 21:14 UTC, Valerio Granato

Description Valerio Granato 2011-08-26 21:14:41 UTC
Created attachment 448005 [details]
Log of primary and secondary server.

User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.215 Safari/535.1

Primary server: openSUSE 11.3 
Linux version 2.6.34.7-0.7-xen (geeko@buildhost) (gcc version 4.5.0 20100604 [gcc-4_5-branch revision 160292] (SUSE Linux) ) #1 SMP 2010-12-13 11:13:53 +0100

Secondary server: openSUSE 11.3 upgraded to 11.4 via "zypper dup"
Linux version 2.6.37.6-0.7-xen (geeko@buildhost) (gcc version 4.5.1 20101208 [gcc-4_5-branch revision 167585] (SUSE Linux) ) #1 SMP 2011-07-21 02:17:24 +0200

During normal operation the connection is dropped and then re-established. The log shows:
Aug 26 22:04:54 primaryserver kernel: [121494.328079] block drbd0: BAD! BarrierAck #1092185342 received, expected #1092185341!

During reconnection (1-2 seconds) the system freezes because the disk is missing.
The problem occurs at completely random intervals: immediately, then after one minute, then after two hours, then after ten minutes, and so on.
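
To put numbers on how irregular the drops are, the log can be scanned for the message quoted above. The following is only a minimal sketch: the function names are hypothetical, the regular expression assumes the syslog line format shown in this report, and the year is passed in because syslog timestamps do not include one.

```python
import re
from datetime import datetime

# Matches the kernel message format quoted in this report (assumed syslog layout).
BAD_ACK = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:.*"
    r"block (?P<dev>drbd\d+): BAD! BarrierAck #(?P<got>\d+) received, "
    r"expected #(?P<want>\d+)!"
)


def find_bad_barrier_acks(lines, year=2011):
    """Return (timestamp, device, received, expected) for each matching line."""
    events = []
    for line in lines:
        m = BAD_ACK.search(line)
        if m is None:
            continue  # not a BarrierAck complaint
        # Syslog omits the year, so it is supplied by the caller.
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        events.append((when, m["dev"], int(m["got"]), int(m["want"])))
    return events


def gaps_in_seconds(events):
    """Seconds elapsed between consecutive BarrierAck complaints."""
    times = [event[0] for event in events]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
```

Passing an open log file (for example the attached log, or /var/log/messages if that is where kernel messages end up on the node) to find_bad_barrier_acks and then calling gaps_in_seconds on the result gives the spacing between drops.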

I use a primary/secondary configuration: on reconnection the primary updates the secondary.
In a primary/primary configuration I think this would cause a split brain.


The logs of the primary and secondary servers are attached.


Reproducible: Always

Actual Results:  
DRBD complains about a bad BarrierAck and drops the network connection.

Expected Results:  
No errors, no lost connection.

I've seen something similar in
http://www.mail-archive.com/drbd-user@lists.linbit.com/msg02980.html
Comment 1 Valerio Granato 2011-09-01 10:44:56 UTC
I've compiled and installed, on the secondary node, the DRBD 8.3.10 module from http://oss.linbit.com/drbd/ and the matching userland tools from
http://download.opensuse.org/repositories/Base:/System/openSUSE_11.4/

At the moment I see no more disconnections.
Comment 2 Greg Kroah-Hartman 2011-09-01 23:22:51 UTC
That's good, but we can't accept the out-of-tree drbd code into the 11.4 kernel at this time, sorry.

The drbd developers are pushing that code into the main kernel tree, so hopefully in time for 12.1 this will be resolved that way.
Comment 3 Gavin Jones 2011-10-07 22:11:51 UTC
Just as a clarification, the working DRBD code is already in the upstream version.