Re: [PATCH] Btrfs: avoid allocating too many data chunks on massive concurrent write

Filipe Manana wrote on 2019-05-03 18:53:
On Fri, Apr 26, 2019 at 12:10 PM robbieko <robbieko@xxxxxxxxxxxx> wrote:

From: Robbie Ko <robbieko@xxxxxxxxxxxx>

I found an issue where btrfs allocates much more space than it actually
needs on massive concurrent writes. That may consume all free space, and
when btrfs then needs to allocate more space for metadata, the result is
ENOSPC.

I reproduced the issue by running 5000 concurrent dd processes as a write stress test.

The space info after ENOSPC:
Overall:
Device size: 926.91GiB
Device allocated: 926.91GiB
Device unallocated: 1.00MiB
Device missing: 0.00B
Used: 211.59GiB
Free (estimated): 714.18GiB (min: 714.18GiB)
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 0.00B)

Data,single: Size:923.77GiB, Used:209.59GiB
/dev/devname 923.77GiB

Metadata,DUP: Size:1.50GiB, Used:1022.50MiB
/dev/devname 3.00GiB

System,DUP: Size:72.00MiB, Used:128.00KiB
/dev/devname 144.00MiB

So can you provide more details? What parameters did you give to the dd processes?

I tried to reproduce it like this:

for ((i = 0; i < 5000; i++)); do
  dd if=/dev/zero of=/mnt/sdi/dd$i bs=2M oflag=dsync &
  dd_pids[$i]=$!
done

wait ${dd_pids[@]}

But after it finished, the used data space was about the same as the
allocated data space (it was on a 200G fs).
So I didn't observe the problem you mention, even though at first
glance the patch looks ok.

Thanks.


I use the following script :

#!/bin/sh
for (( i=1; i<=1000000; i++ ))
do
	dd if=/dev/urandom of="/mnt/dir/sa${i}" bs=1k count=1024 status=none &
	remain=$(( $i % 5000))
	if [ $remain -eq 0 ]; then
		echo "i = ${i}"
		wait
	fi
done

The problem occurred after the script had run for one day.

Thanks.



We can see that the Metadata space (1022.50MiB used + 512.00MiB global
reserve) is almost full, but Data has allocated much more space
(923.77GiB) than it actually uses (209.59GiB).

When the data space is not enough, each of these 5000 dd processes will
call do_chunk_alloc() to allocate more space.

In the while loop of do_chunk_alloc(), the variable force is changed to
CHUNK_ALLOC_FORCE on the second iteration, and should_alloc_chunk()
always returns true when force is CHUNK_ALLOC_FORCE. That means every
concurrent dd process ends up allocating a chunk in do_chunk_alloc().
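
For context, the loop in question looks roughly like the following (a
simplified, paraphrased sketch of the do_chunk_alloc() loop around the
hunk further below, not a verbatim copy of extent-tree.c):

	do {
		spin_lock(&space_info->lock);
		/*
		 * 'force' is only ever raised here; once it becomes
		 * CHUNK_ALLOC_FORCE it keeps that value for all later
		 * iterations of this loop.
		 */
		if (force < space_info->force_alloc)
			force = space_info->force_alloc;
		should_alloc = should_alloc_chunk(fs_info, space_info, force);
		if (space_info->full) {
			/* No unallocated device space is left. */
			...
		} else if (!should_alloc) {
			spin_unlock(&space_info->lock);
			return 0;
		} else if (space_info->chunk_alloc) {
			/*
			 * Someone else is already allocating: wait on
			 * chunk_mutex and loop to re-evaluate.  Because
			 * 'force' is never reset, should_alloc_chunk()
			 * keeps returning true and every waiting dd ends
			 * up allocating a chunk of its own.
			 */
			wait_for_alloc = true;
			spin_unlock(&space_info->lock);
			mutex_lock(&fs_info->chunk_mutex);
			mutex_unlock(&fs_info->chunk_mutex);
		} else {
			/* Proceed with the allocation ourselves. */
			space_info->chunk_alloc = 1;
			wait_for_alloc = false;
			spin_unlock(&space_info->lock);
		}
	} while (wait_for_alloc);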

Fix this by keeping the original value of force and re-assigning it to
force at the end of the loop.

Signed-off-by: Robbie Ko <robbieko@xxxxxxxxxxxx>
---
 fs/btrfs/extent-tree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index c5880329ae37..73856f70db31 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4511,6 +4511,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
        bool wait_for_alloc = false;
        bool should_alloc = false;
        int ret = 0;
+       int orig_force = force;

        /* Don't re-enter if we're already allocating a chunk */
        if (trans->allocating_chunk)
@@ -4544,6 +4545,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
                         */
                        wait_for_alloc = true;
                        spin_unlock(&space_info->lock);
+                       force = orig_force;
                        mutex_lock(&fs_info->chunk_mutex);
                        mutex_unlock(&fs_info->chunk_mutex);
                } else {
--
2.17.1




