[Ocfs2-users] Node 8 doesn't mount / Wrong slot map assignment?
Michael Ulbrich
mul at rentapacs.de
Wed Sep 13 00:54:22 PDT 2017
Hi all,
we've a small (?) problem with a 2-node cluster on Debian 8:
Linux h1b 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u2 (2017-06-26)
x86_64 GNU/Linux
ocfs2-tools 1.6.4-3
Two ocfs2 filesystems (drbd0 600 GB w/ 8 slots and drbd1 6 TB w/ 6
slots) are created on top of drbd w/ 4k block and cluster size,
'max_features' enabled.
cluster.conf assigns sequential node numbers 1 - 8. Nodes 1, 2 are the
hypervisors. Nodes 3, 4, 5 are VMs on node 1. Nodes 6, 7, 8 are the
corresponding VMs on node 2.
VMs all run Debian 8 as well:
Linux srv2 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1 (2016-12-30) x86_64
GNU/Linux
When mounting drbd0 in order of increasing node numbers and concurrently
watching the 'hb' output from debugfs.ocfs2 we get a clean slot map (?):
hb
node: node seq generation checksum
1: 1 0000000059b8d94a fa60f0d8423590d9 edec9643
2: 2 0000000059b8d94c aca059df4670f467 994e3458
3: 3 0000000059b8d949 f03dc9ba8f27582c d4473fc2
4: 4 0000000059b8d94b df5bbdb756e757f8 12a198eb
5: 5 0000000059b8d94a 1af81d94a7cb681b 91fba906
6: 6 0000000059b8d94b 104538f30cdb35fa 8713e798
7: 7 0000000059b8d94b 195658c9fb8ca7f9 5e54edf6
8: 8 0000000059b8d949 dc6bfb46b9cf1ac3 de7a8757
Device drbd1 in contrast yields the following table after mounting on
nodes 1, 2:
hb
node: node seq generation checksum
8: 1 0000000059b8d9ba 73a63eb550a33095 f4e074d1
16: 2 0000000059b8d9b9 5c7504c05637983e 07d696ec
Proceeding with the drbd1 mounts on nodes 3, 5, 6 leads us to:
hb
node: node seq generation checksum
3: 3 0000000059b8da3b 9443b4b209b16175 f2cc87ec
5: 5 0000000059b8da3c 4b742f709377466f 3ac41cf3
6: 6 0000000059b8da3b d96e2de0a55514f6 335a4d90
8: 1 0000000059b8da3c 73a63eb550a33095 2312c1c4
16: 2 0000000059b8da3d 5c7504c05637983e 659571a1
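The mismatch between row index and node number in the drbd1 table can be
spotted mechanically. What follows is only a minimal Python sketch and not
part of ocfs2-tools: the function name mismatched_slots is made up for
illustration, and it assumes every data row has the five-column form
"index: node seq generation checksum" seen in the output above.

```python
# 'hb' output for drbd1 as pasted above (after mounting on nodes 1, 2, 3, 5, 6).
SAMPLE = """hb
node: node seq generation checksum
3: 3 0000000059b8da3b 9443b4b209b16175 f2cc87ec
5: 5 0000000059b8da3c 4b742f709377466f 3ac41cf3
6: 6 0000000059b8da3b d96e2de0a55514f6 335a4d90
8: 1 0000000059b8da3c 73a63eb550a33095 2312c1c4
16: 2 0000000059b8da3d 5c7504c05637983e 659571a1
"""


def mismatched_slots(hb_text):
    """Return (index, node) pairs where the row index differs from the node number."""
    mismatches = []
    for line in hb_text.splitlines():
        fields = line.split()
        # Data rows look like "8: 1 <seq> <generation> <checksum>".
        if len(fields) != 5 or not fields[0].endswith(":"):
            continue
        try:
            index = int(fields[0][:-1])
            node = int(fields[1])
        except ValueError:
            continue  # header row: "node: node seq generation checksum"
        if index != node:
            mismatches.append((index, node))
    return mismatches


print(mismatched_slots(SAMPLE))  # [(8, 1), (16, 2)]
```

Only the two hypervisor nodes are flagged; the VM nodes 3, 5 and 6 sit at
the index matching their node number.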
The problem arises when trying to mount drbd1 on node 8, since its slot is
already occupied by node 1:
kern.log node 1:
(o2hb-0AEE381A14,50990,4):o2hb_check_own_slot:582 ERROR: Another node is
heartbeating on device (drbd1): expected(1:0x18acf7b0b3e5544c,
0x59b8445c), ondisk(8:0xb91302db72a65364, 0x59b8445b)
kern.log node 8:
ocfs2: Mounting device (254,16) on (node 8, slot 7) with ordered data mode.
(o2hb-0AEE381A14,518,1):o2hb_check_own_slot:582 ERROR: Another node is
heartbeating on device (vdc): expected(8:0x18acf7b0b3e5544c,
0x59b8445c), ondisk(1:0x18acf7b0b3e5544c, 0x59b8445c)
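To compare the two error messages side by side, the expected/ondisk tuples
can be pulled out with a regular expression. This is a small Python sketch
only: the helper name parse_own_slot_error is hypothetical, and the field
labels node, generation and seq are my reading of the message layout, i.e.
an assumption.

```python
import re

# Matches "expected(N:0xGEN, 0xSEQ), ondisk(N:0xGEN, 0xSEQ)".
# \s* also tolerates the line wrap introduced by the mail client.
PATTERN = re.compile(
    r"expected\((\d+):(0x[0-9a-f]+),\s*(0x[0-9a-f]+)\),\s*"
    r"ondisk\((\d+):(0x[0-9a-f]+),\s*(0x[0-9a-f]+)\)"
)


def parse_own_slot_error(message):
    """Extract both tuples from an o2hb_check_own_slot error, or return None."""
    m = PATTERN.search(message)
    if m is None:
        return None
    values = m.groups()
    keys = ("node", "generation", "seq")  # labels are an assumption

    def to_record(triple):
        node, generation, seq = triple
        return dict(zip(keys, (int(node), int(generation, 16), int(seq, 16))))

    return {"expected": to_record(values[:3]), "ondisk": to_record(values[3:])}


# kern.log excerpt from node 8, copied from above.
NODE8_LOG = """(o2hb-0AEE381A14,518,1):o2hb_check_own_slot:582 ERROR: Another node is
heartbeating on device (vdc): expected(8:0x18acf7b0b3e5544c,
0x59b8445c), ondisk(1:0x18acf7b0b3e5544c, 0x59b8445c)"""

info = parse_own_slot_error(NODE8_LOG)
print(info["expected"]["node"], info["ondisk"]["node"])  # 8 1
print(info["expected"]["generation"] == info["ondisk"]["generation"])  # True
```

For the node 8 message, the second and third values of both tuples are
identical and only the node number differs (8 expected, 1 on disk).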
This can be "fixed" by exchanging node numbers 1 and 8 in cluster.conf.
Node 8 is then assigned slot 8, node 2 stays in slot 16, and nodes 3 to 7
land where expected. No node 16 is configured, so there is no conflict.
However, since we are also seeing other, so far unexplained instabilities
with this ocfs2 device / system during operation, we decided to address
this issue first.
Somehow the failure is reminiscent of a bit-shift or masking problem:
1 << 3 = 8
2 << 3 = 16
But then again - what do I know ...
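The arithmetic can be checked against the drbd1 table. This is just an
illustrative Python sketch of the hypothesis, not a statement about what
the ocfs2 code actually does; the name classify and the default shift of 3
are assumptions taken from the two lines above.

```python
# node number -> row index observed in the drbd1 'hb' output
OBSERVED = {1: 8, 2: 16, 3: 3, 5: 5, 6: 6}


def classify(observed, shift=3):
    """Label each node by how its observed index relates to its node number."""
    result = {}
    for node, index in sorted(observed.items()):
        if index == node:
            result[node] = "identity"   # index equals node number
        elif index == node << shift:
            result[node] = "shifted"    # index equals node number << shift
        else:
            result[node] = "other"
    return result


print(classify(OBSERVED))
# {1: 'shifted', 2: 'shifted', 3: 'identity', 5: 'identity', 6: 'identity'}
```

So a left shift by 3 (a multiplication by 8) fits nodes 1 and 2, while
nodes 3, 5 and 6 show up unshifted.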
Tried so far:
A. Create offending file system with 8 slots instead of 6 -> same issue.
B. Set features to 'default' (disables feature 'extended-slotmap') ->
same issue.
We'd very much appreciate any comments on this. Has anything similar
ever been experienced before? Are we completely missing something
important here?
If a fix for this is already out, any pointers (src files / commits)
to where to look would be greatly appreciated.
Thanks in advance + Best regards ... Michael U.