[Ocfs2-devel] [PATCH 2/2] ocfs2: fix for local alloc window restore unconditionally
Heming Zhao
heming.zhao at suse.com
Sat May 21 10:14:16 UTC 2022
When la state is ENABLE, ocfs2_recalc_la_window restores la window
unconditionally. The logic is wrong.
Let's image below path.
1. la state (->local_alloc_state) is set THROTTLED or DISABLED.
2. About 30s (OCFS2_LA_ENABLE_INTERVAL), delayed work is triggered,
ocfs2_la_enable_worker set la state to ENABLED directly.
3. a write IOs thread run:
```
ocfs2_write_begin
...
ocfs2_lock_allocators
ocfs2_reserve_clusters
ocfs2_reserve_clusters_with_limit
ocfs2_reserve_local_alloc_bits
ocfs2_local_alloc_slide_window // [1]
+ ocfs2_recalc_la_window(osb, OCFS2_LA_EVENT_SLIDE) // [2]
+ ...
+ ocfs2_local_alloc_new_window
ocfs2_claim_clusters // [3]
```
[1]: will be called when la window bits used up.
[2]: under la state is ENABLED (eg OCFS2_LA_ENABLE_INTERVAL delayed work
happened), it unconditionally restores la window to default value.
[3]: will use default la window size to search clusters. IMO the timing
is O(n^4). The timing O(n^4) will cost huge time to scan global
bitmap. It makes write IOs (eg user space 'dd') become dramatically
slow.
i.e.
an ocfs2 partition size: 1.45TB, cluster size: 4KB,
la window default size: 106MB.
The partition is fragmentation by creating & deleting huge mount of
small file.
the timing should be (the number got from real world):
- la window size change order (size: MB):
106, 53, 26.5, 13, 6.5, 3.25, 1.6, 0.8
only 0.8MB succeed, 0.8MB also triggers la window to disable.
ocfs2_local_alloc_new_window retries 8 times, first 7 times totally
runs in worst case.
- group chain number: 242
ocfs2_claim_suballoc_bits calls for-loop 242 times
- each chain has 49 block group
ocfs2_search_chain calls while-loop 49 times
- each bg has 32256 blocks
ocfs2_block_group_find_clear_bits calls while-loop for 32256 bits.
for ocfs2_find_next_zero_bit uses ffz() to find zero bit, let's use
(32256/64) for timing calucation.
So the loop times: 7*242*49*(32256/64) = 41835024 (~42 million times)
In the worst case, user space writes 100MB data will trigger 42M scanning
times, and if the write can't finish within 30s (OCFS2_LA_ENABLE_INTERVAL),
the write IO will suffer another 42M scanning times. It makes the ocfs2
partition keep pool performance all the time.
The fix method:
1. la restores double la size once.
current code logic decrease la window with half size once, but directly
restores default_bits one time. It bounces the la window between '<1M'
and default_bits. This patch makes restoring process more smoothly.
eg.
la default window is 106MB, current la window is 13MB.
when there is a free action to release one block group space, la should
roll back la size to 26MB (by 13*2).
if there are many free actions to release many block group space, la
will smoothly roll back to default window (106MB).
2. introduced a new state: OCFS2_LA_RESTORE.
Current code uses OCFS2_LA_ENABLED to mark a new big space available.
the state overwrite OCFS2_LA_THROTTLED, it makes la window forget
it's already in throttled status.
'->local_alloc_state' should keep OCFS2_LA_THROTTLED until la window
restore to default_bits.
Signed-off-by: Heming Zhao <heming.zhao at suse.com>
---
fs/ocfs2/localalloc.c | 30 +++++++++++++++++++++---------
fs/ocfs2/ocfs2.h | 18 +++++++++++-------
fs/ocfs2/suballoc.c | 2 +-
3 files changed, 33 insertions(+), 17 deletions(-)
diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index c4426d12a2ad..28acea717d7f 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -205,20 +205,21 @@ void ocfs2_la_set_sizes(struct ocfs2_super *osb, int requested_mb)
static inline int ocfs2_la_state_enabled(struct ocfs2_super *osb)
{
- return (osb->local_alloc_state == OCFS2_LA_THROTTLED ||
- osb->local_alloc_state == OCFS2_LA_ENABLED);
+ return osb->local_alloc_state & OCFS2_LA_ACTIVE;
}
void ocfs2_local_alloc_seen_free_bits(struct ocfs2_super *osb,
unsigned int num_clusters)
{
spin_lock(&osb->osb_lock);
- if (osb->local_alloc_state == OCFS2_LA_DISABLED ||
- osb->local_alloc_state == OCFS2_LA_THROTTLED)
+ if (osb->local_alloc_state & (OCFS2_LA_DISABLED |
+ OCFS2_LA_THROTTLED | OCFS2_LA_RESTORE)) {
if (num_clusters >= osb->local_alloc_default_bits) {
cancel_delayed_work(&osb->la_enable_wq);
- osb->local_alloc_state = OCFS2_LA_ENABLED;
+ osb->local_alloc_state &= ~OCFS2_LA_DISABLED;
+ osb->local_alloc_state |= OCFS2_LA_RESTORE;
}
+ }
spin_unlock(&osb->osb_lock);
}
@@ -228,7 +229,10 @@ void ocfs2_la_enable_worker(struct work_struct *work)
container_of(work, struct ocfs2_super,
la_enable_wq.work);
spin_lock(&osb->osb_lock);
- osb->local_alloc_state = OCFS2_LA_ENABLED;
+ if (osb->local_alloc_state & OCFS2_LA_DISABLED) {
+ osb->local_alloc_state &= ~OCFS2_LA_DISABLED;
+ osb->local_alloc_state |= OCFS2_LA_ENABLED;
+ }
spin_unlock(&osb->osb_lock);
}
@@ -1067,7 +1071,7 @@ static int ocfs2_recalc_la_window(struct ocfs2_super *osb,
* reason to assume the bitmap situation might
* have changed.
*/
- osb->local_alloc_state = OCFS2_LA_THROTTLED;
+ osb->local_alloc_state |= OCFS2_LA_THROTTLED;
osb->local_alloc_bits = bits;
} else {
osb->local_alloc_state = OCFS2_LA_DISABLED;
@@ -1083,8 +1087,16 @@ static int ocfs2_recalc_la_window(struct ocfs2_super *osb,
* risk bouncing around the global bitmap during periods of
* low space.
*/
- if (osb->local_alloc_state != OCFS2_LA_THROTTLED)
- osb->local_alloc_bits = osb->local_alloc_default_bits;
+ if (osb->local_alloc_state & OCFS2_LA_RESTORE) {
+ bits = osb->local_alloc_bits * 2;
+ if (bits > osb->local_alloc_default_bits) {
+ osb->local_alloc_bits = osb->local_alloc_default_bits;
+ osb->local_alloc_state = OCFS2_LA_ENABLED;
+ } else {
+ /* keep RESTORE state & set new bits */
+ osb->local_alloc_bits = bits;
+ }
+ }
out_unlock:
state = osb->local_alloc_state;
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 337527571461..1764077e3229 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -245,14 +245,18 @@ struct ocfs2_alloc_stats
enum ocfs2_local_alloc_state
{
- OCFS2_LA_UNUSED = 0, /* Local alloc will never be used for
- * this mountpoint. */
- OCFS2_LA_ENABLED, /* Local alloc is in use. */
- OCFS2_LA_THROTTLED, /* Local alloc is in use, but number
- * of bits has been reduced. */
- OCFS2_LA_DISABLED /* Local alloc has temporarily been
- * disabled. */
+ /* Local alloc will never be used for this mountpoint. */
+ OCFS2_LA_UNUSED = 1 << 0,
+ /* Local alloc is in use. */
+ OCFS2_LA_ENABLED = 1 << 1,
+ /* Local alloc is in use, but number of bits has been reduced. */
+ OCFS2_LA_THROTTLED = 1 << 2,
+ /* In throttle state, Local alloc meets contig big space. */
+ OCFS2_LA_RESTORE = 1 << 3,
+ /* Local alloc has temporarily been disabled. */
+ OCFS2_LA_DISABLED = 1 << 4,
};
+#define OCFS2_LA_ACTIVE (OCFS2_LA_ENABLED | OCFS2_LA_THROTTLED | OCFS2_LA_RESTORE)
enum ocfs2_mount_options
{
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 166c8918c825..b0df1ab2d6dd 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -1530,7 +1530,7 @@ static int ocfs2_cluster_group_search(struct inode *inode,
* of bits. */
if (min_bits <= res->sr_bits)
search = 0; /* success */
- else if (res->sr_bits) {
+ if (res->sr_bits) {
/*
* Don't show bits which we'll be returning
* for allocation to the local alloc bitmap.
--
2.34.1
More information about the Ocfs2-devel
mailing list