Filipe Manana wrote on 2019-05-03 18:53:
On Fri, Apr 26, 2019 at 12:10 PM robbieko <robbieko@xxxxxxxxxxxx>
wrote:
From: Robbie Ko <robbieko@xxxxxxxxxxxx>
I found an issue where btrfs allocates much more space than it actually
needs under massive concurrent writes. That may consume all free space,
and when it then needs to allocate more space for metadata, the result
is ENOSPC.
I reproduced the issue by running 5000 concurrent dd processes as a
write stress test.
The space info after ENOSPC:
Overall:
    Device size:          926.91GiB
    Device allocated:     926.91GiB
    Device unallocated:     1.00MiB
    Device missing:           0.00B
    Used:                 211.59GiB
    Free (estimated):     714.18GiB  (min: 714.18GiB)
    Data ratio:                1.00
    Metadata ratio:            2.00
    Global reserve:       512.00MiB  (used: 0.00B)

Data,single: Size:923.77GiB, Used:209.59GiB
   /dev/devname  923.77GiB

Metadata,DUP: Size:1.50GiB, Used:1022.50MiB
   /dev/devname    3.00GiB

System,DUP: Size:72.00MiB, Used:128.00KiB
   /dev/devname  144.00MiB
So can you provide more details? What parameters did you give to the dd
processes?
I tried to reproduce that like this:
for ((i = 0; i < 5000; i++)); do
    dd if=/dev/zero of=/mnt/sdi/dd$i bs=2M oflag=dsync &
    dd_pids[$i]=$!
done
wait ${dd_pids[@]}
But after it finished, the used data space was about the same as the
allocated data space (it was on a 200G fs).
So I didn't observe the problem you mention, even though at first
glance the patch looks ok.
Thanks.
I use the following script:

#!/bin/sh
for (( i=1; i<=1000000; i++ ))
do
    dd if=/dev/urandom of="/mnt/dir/sa${i}" bs=1k count=1024 status=none &
    remain=$(( $i % 5000 ))
    if [ $remain -eq 0 ]; then
        echo "i = ${i}"
        wait
    fi
done
The problem occurred after the script had run for one day.
Thanks.
We can see that the metadata space (1022.50MiB + 512.00MiB) is almost
full, but data has been allocated much more space (923.77GiB) than it
actually needs (209.59GiB).
When data space runs low, all 5000 dd processes call do_chunk_alloc()
to allocate more space.
In the while loop of do_chunk_alloc(), the variable force is changed to
CHUNK_ALLOC_FORCE on the second iteration, and should_alloc_chunk()
always returns true when force is CHUNK_ALLOC_FORCE. That means every
concurrent dd process ends up allocating a chunk in do_chunk_alloc().
Fix this by keeping the original value of force and re-assigning it to
force at the end of the loop.
Signed-off-by: Robbie Ko <robbieko@xxxxxxxxxxxx>
---
fs/btrfs/extent-tree.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index c5880329ae37..73856f70db31 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4511,6 +4511,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
 	bool wait_for_alloc = false;
 	bool should_alloc = false;
 	int ret = 0;
+	int orig_force = force;

 	/* Don't re-enter if we're already allocating a chunk */
 	if (trans->allocating_chunk)
@@ -4544,6 +4545,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
 		 */
 		wait_for_alloc = true;
 		spin_unlock(&space_info->lock);
+		force = orig_force;
 		mutex_lock(&fs_info->chunk_mutex);
 		mutex_unlock(&fs_info->chunk_mutex);
 	} else {
--
2.17.1