[Ocfs2-devel] [PATCH 0/4] re-enable non-clustered mount & add MMP support
Heming Zhao
heming.zhao at suse.com
Sat Jul 30 01:14:07 UTC 2022
This patch series re-enables the ocfs2 non-clustered mount feature.
The earlier commit c80af0c250c8 (Revert "ocfs2: mount shared volume
without ha stack") reverted Gang's non-clustered mount patch; this
series restores it.
The key difference between local mount and non-clustered mount:
the local mount feature (tunefs.ocfs2 --fs-features=[no]local) cannot
be converted without an ha stack, whereas the non-clustered mount
feature runs entirely without one.
To avoid data corruption when non-clustered and clustered mounts
happen at the same time, this series also introduces an MMP
feature. The MMP (Multiple Mount Protection) idea comes from ext4 MMP
(fs/ext4/mmp.c), which protects a filesystem from being mounted more
than once. Because ocfs2 is a clustered filesystem and must stay
compatible with the existing slotmap feature, I made some optimizations
and modifications when porting MMP from ext4 to ocfs2.
The related userspace code supporting MMP has been submitted on GitHub
for review:
- https://github.com/markfasheh/ocfs2-tools/pull/58
Enabling MMP with ocfs2-tools and checking its status:
```
# enable MMP
tunefs.ocfs2 --fs-features=mmp /dev/vdb
# check the command result
tunefs.ocfs2 -Q "%H\n" /dev/vdb | grep MMP
# activate MMP on nocluster mount
mount -t ocfs2 -o nocluster /dev/vdb /mnt
# check slotmap info
echo slotmap | PAGER=cat debugfs.ocfs2 /dev/vdb
```
=== Test cases for the patches ===
<1> non-clustered mount vs local mount
1.1 tunefs.ocfs2 can't convert local/nolocal mount without ha stack.
```
(on ha stack env)
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=local /dev/vdb (<== success)
tunefs.ocfs2 --fs-features=nolocal /dev/vdb (<== success)
(on another node without ha stack)
tunefs.ocfs2 --fs-features=local /dev/vdb (<== failure)
```
1.2 non-clustered mount feature can run without ha stack.
```
(on ha stack env)
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
(on another node without ha stack)
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== success)
```
<2> do clustered & non-clustered mount on same node
2.1 non-clustered mount => clustered mount
```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 -o nocluster /dev/vdb /mnt
mount -t ocfs2 /dev/vdb /mnt (<=== failure)
```
2.2 clustered mount => non-clustered mount
```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 /dev/vdb /mnt
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<=== failure)
```
<3> one node does clustered mount, another does non-clustered mount
test rule: clustered mount and non-clustered mount cannot coexist.
3.1 clustered mount @node1 => [no]clustered mount @node2
```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 /dev/vdb /mnt
node2:
mount -t ocfs2 /dev/vdb /mnt (<== success)
umount /mnt
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== failure)
```
3.2 enable mmp, repeat case 3.1
```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=mmp /dev/vdb (<== enable mmp)
mount -t ocfs2 /dev/vdb /mnt
node2:
mount -t ocfs2 /dev/vdb /mnt (<== wait ~22s [*] for mmp, then success)
umount /mnt
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== failure)
```
[*] 22s: the mount calls schedule_timeout_interruptible twice, sleeping
(OCFS2_MMP_MIN_CHECK_INTERVAL * 2 + 1) seconds each time.
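The ~22s figure can be reproduced with a little arithmetic. The sketch below assumes OCFS2_MMP_MIN_CHECK_INTERVAL is 5 seconds, mirroring ext4's EXT4_MMP_MIN_CHECK_INTERVAL; that value and the helper name mmp_wait_seconds are assumptions for illustration, not taken from the patch.

```shell
# Hypothetical helper: total MMP wait in seconds.
#   $1 = check interval in seconds (assumed 5, as in ext4)
#   $2 = number of sleep passes before the mount proceeds
mmp_wait_seconds() {
    interval=$1
    passes=$2
    echo $(( (interval * 2 + 1) * passes ))
}

mmp_wait_seconds 5 2   # two passes, as in case 3.2: prints 22
```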
3.3 noclustered mount @node1 => [no]clustered mount @node2
```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 -o nocluster /dev/vdb /mnt
node2:
mount -t ocfs2 /dev/vdb /mnt (<== failure)
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== success, without mmp enabled)
umount /mnt (<== will ZERO out slotmap area while node1 is still mounted)
```
3.4 enable mmp, repeat case 3.3
```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=mmp /dev/vdb (<== enable mmp)
mount -t ocfs2 -o nocluster /dev/vdb /mnt
node2:
mount -t ocfs2 /dev/vdb /mnt (<== failure)
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== failure, denied by mmp)
```
<4> simulate mounting after machine crash
info:
- all steps below are run on one node
- address 287387648 is the '//slot_map' extent address.
- rule under test: if the last mount was not unmounted cleanly (e.g. after
a crash), the next mount MUST be of the same mount type.
4.0 how to calculate '//slot_map' extent address
```
# PAGER=cat debugfs.ocfs2 -R "stats" /dev/vdb | grep "Block Size Bits"
Block Size Bits: 12 Cluster Size Bits: 12
# PAGER=cat debugfs.ocfs2 -R "stat //slot_map" /dev/vdb | grep -A1 "Block#"
## Offset Clusters Block# Flags
0 0 1 70163 0x0
```
70163 * (1<<12) = 70163 * 4096 = 287387648
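The same calculation as a small shell helper, in case the block number or block size differs on another volume. The function name slotmap_byte_offset is made up for this sketch; its inputs are the Block# and "Block Size Bits" values printed by debugfs.ocfs2 above.

```shell
# Hypothetical helper: byte address of an extent.
#   $1 = Block# reported by 'stat //slot_map'
#   $2 = "Block Size Bits" reported by 'stats'
slotmap_byte_offset() {
    echo $(( $1 * (1 << $2) ))
}

slotmap_byte_offset 70163 12   # prints 287387648
```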
4.1 clustered mount => crash => non-clustered mount fails => clean
slotmap => non-clustered mount succeeds
```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 /dev/vdb /mnt
dd if=/dev/vdb bs=1 count=32 skip=287387648 of=/root/slotmap.cluster.mnted (<== backup slot info)
umount /mnt
dd if=/root/slotmap.cluster.mnted of=/dev/vdb seek=287387648 bs=1 count=32 (<== overwrite)
mount -t ocfs2 -o nocluster /dev/vdb /mnt <== failure
mount -t ocfs2 /dev/vdb /mnt && umount /mnt <== clean slot 0
mount -t ocfs2 -o nocluster /dev/vdb /mnt <== success
```
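The backup and overwrite steps in 4.1 can be wrapped in two helpers so the offset is typed only once. This is a sketch, not part of the patch set: the function names are hypothetical, and conv=notrunc is added so the same helpers can be tried against a regular image file without truncating it (it should make no difference on a block device).

```shell
# Hypothetical helpers around the dd commands used in case 4.1.
#   $1 = device or image file
#   $2 = byte offset of the //slot_map extent
#   $3 = backup file
slotmap_save() {
    dd if="$1" of="$3" bs=1 count=32 skip="$2" 2>/dev/null
}

# Write the saved 32 bytes back to the same offset, simulating a crash
# that left the slot marked as in use.
slotmap_restore() {
    dd if="$3" of="$1" bs=1 count=32 seek="$2" conv=notrunc 2>/dev/null
}

# Usage, matching case 4.1:
#   slotmap_save /dev/vdb 287387648 /root/slotmap.cluster.mnted
#   umount /mnt
#   slotmap_restore /dev/vdb 287387648 /root/slotmap.cluster.mnted
```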
4.2 non-clustered mount => crash => clustered mount fails => clean
slotmap => clustered mount succeeds
```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 -o nocluster /dev/vdb /mnt
dd if=/dev/vdb bs=1 count=32 skip=287387648 of=/root/slotmap.nocluster.mnted
umount /mnt
dd if=/root/slotmap.nocluster.mnted of=/dev/vdb seek=287387648 bs=1 count=32
mount -t ocfs2 /dev/vdb /mnt <== failure
mount -t ocfs2 -o nocluster /dev/vdb /mnt && umount /mnt <== clean slot 0
mount -t ocfs2 /dev/vdb /mnt <== success
```
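Cases 4.1 and 4.2 together can be read as a small decision table. The sketch below only models the outcomes expected by those two cases; it is not the kernel logic, and the function name and the state labels (clean, cluster, nocluster) are invented for illustration.

```shell
# Hypothetical model of the rule under test in section <4>.
#   $1 = mount type left in the slotmap by the last mount:
#        clean (unmounted normally), cluster, or nocluster
#   $2 = requested mount type: cluster or nocluster
mount_expected() {
    if [ "$1" = "clean" ] || [ "$1" = "$2" ]; then
        echo success
    else
        echo failure
    fi
}

mount_expected cluster nocluster     # case 4.1 after the crash: prints failure
mount_expected nocluster nocluster   # same type as the stale entry: prints success
```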
<5> MMP test
5.1 node1 noclustered mount => node 2 noclustered mount
with mmp disabled
```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 -o nocluster /dev/vdb /mnt
node2:
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== success)
```
with mmp enabled
```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=mmp /dev/vdb
mount -t ocfs2 -o nocluster /dev/vdb /mnt
node2:
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== wait ~12s [*], then denied by mmp)
```
[*] 12s: the mount sleeps (OCFS2_MMP_MIN_CHECK_INTERVAL * 2 + 1) seconds,
then detects that mmp_seq has changed and fails.
5.2 node1 clustered mount => node 2 clustered mount
see case 3.2
5.3 node1 noclustered mount => node 2 noclustered mount
see case 3.4
5.4 remount test
5.4.1 non-clustered mount (run commands on same node)
```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=mmp /dev/vdb
mount -t ocfs2 -o nocluster /dev/vdb /mnt
ps axj | grep kmmpd (<== will show kmmpd)
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb (<== show 'OCFS2_MMP_SEQ')
mount -o remount,ro,nocluster /dev/vdb /mnt (<== kmmpd will stop)
ps axj | grep kmmpd (<== won't show kmmpd)
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb (<== show 'OCFS2_MMP_SEQ_CLEAN')
mount -o remount,rw,nocluster /dev/vdb /mnt (<== kmmpd will start)
ps axj | grep kmmpd (<== will show kmmpd)
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb (<== show 'OCFS2_MMP_SEQ')
```
5.4.2 clustered mount
```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=mmp /dev/vdb
mount -t ocfs2 /dev/vdb /mnt (<== clustered mount won't create kmmpd)
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb (<== show 'OCFS2_VALID_CLUSTER')
mount -o remount,ro /dev/vdb /mnt
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb (<== show 'OCFS2_VALID_CLUSTER')
mount -o remount,rw /dev/vdb /mnt (<== waits ~22s while mmp starts)
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb (<== show 'OCFS2_VALID_CLUSTER')
```
Heming Zhao (4):
  ocfs2: Fix freeing uninitialized resource on ocfs2_dlm_shutdown
  ocfs2: add mlog ML_WARNING support
  re-enable "ocfs2: mount shared volume without ha stack"
  ocfs2: introduce ext4 MMP feature

 fs/ocfs2/cluster/masklog.c |   3 +
 fs/ocfs2/cluster/masklog.h |   9 +-
 fs/ocfs2/dlmglue.c         |   3 +
 fs/ocfs2/ocfs2.h           |   6 +-
 fs/ocfs2/ocfs2_fs.h        |  13 +-
 fs/ocfs2/slot_map.c        | 479 +++++++++++++++++++++++++++++++++++--
 fs/ocfs2/slot_map.h        |   3 +
 fs/ocfs2/super.c           |  42 +++-
 8 files changed, 527 insertions(+), 31 deletions(-)
--
2.37.1