On Wed, Apr 08, 2020 at 11:36:04AM +0100, fdmanana@xxxxxxxxxx wrote:
> From: Filipe Manana <fdmanana@xxxxxxxx>
>
> While running generic/457, I saw fsx consume a lot of CPU time without
> making any progress for over an hour. Attaching gdb to the fsx process
> showed that fsx was stuck in the loop that generates the ranges for a
> clone operation: the loop never seemed to end because the range defined
> by 'offset2' kept overlapping with the range defined by 'offset'. So far
> this has happened twice in one of my test VMs with generic/457.
>
> Fix this by breaking out of the loop after trying 30 times, like we
> currently do for dedupe operations, which results in logging the operation
> as skipped.
>
> Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
> ---

Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>

> ltp/fsx.c | 28 ++++++++++++++++++----------
> 1 file changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/ltp/fsx.c b/ltp/fsx.c
> index fa383c94..5949ebf0 100644
> --- a/ltp/fsx.c
> +++ b/ltp/fsx.c
> @@ -2004,16 +2004,24 @@ test(void)
>  		keep_size = random() % 2;
>  		break;
>  	case OP_CLONE_RANGE:
> -		TRIM_OFF_LEN(offset, size, file_size);
> -		offset = offset & ~(block_size - 1);
> -		size = size & ~(block_size - 1);
> -		do {
> -			offset2 = random();
> -			TRIM_OFF(offset2, maxfilelen);
> -			offset2 = offset2 & ~(block_size - 1);
> -		} while (range_overlaps(offset, offset2, size) ||
> -			 offset2 + size > maxfilelen);
> -		break;
> +		{
> +			int tries = 0;
> +
> +			TRIM_OFF_LEN(offset, size, file_size);
> +			offset = offset & ~(block_size - 1);
> +			size = size & ~(block_size - 1);
> +			do {
> +				if (tries++ >= 30) {
> +					size = 0;
> +					break;
> +				}
> +				offset2 = random();
> +				TRIM_OFF(offset2, maxfilelen);
> +				offset2 = offset2 & ~(block_size - 1);
> +			} while (range_overlaps(offset, offset2, size) ||
> +				 offset2 + size > maxfilelen);
> +			break;
> +		}
>  	case OP_DEDUPE_RANGE:
>  		{
>  			int tries = 0;
> --
> 2.11.0
>
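
For anyone following along, here is a minimal standalone sketch of the
bounded retry that the patch adds to the clone case. It is not the fsx
code: overlaps() is a hypothetical stand-in for range_overlaps(),
pick_clone_dest() and MAX_TRIES are names made up for illustration, and
rand() replaces random() plus TRIM_OFF() to keep the sketch portable.
Under this stand-in overlap test, one way the unbounded loop can fail to
terminate is when offset < size and offset + 2 * size > maxlen, because
then no aligned destination is both non-overlapping and in bounds.

```c
#include <stdlib.h>

#define MAX_TRIES 30	/* same cap the patch uses for clone and dedupe */

/*
 * Hypothetical stand-in for fsx's range_overlaps(): nonzero when
 * [off0, off0 + size) and [off1, off1 + size) intersect.
 */
static int overlaps(unsigned long off0, unsigned long off1,
		    unsigned long size)
{
	unsigned long delta = off0 > off1 ? off0 - off1 : off1 - off0;

	return delta < size;
}

/*
 * Pick a block-aligned destination offset for cloning 'size' bytes from
 * 'offset'. Assumes maxlen > 0 and block_size is a power of two.
 * Returns 1 and stores the offset in *dest on success, or 0 after
 * MAX_TRIES failed attempts so the caller can skip the operation.
 */
static int pick_clone_dest(unsigned long offset, unsigned long size,
			   unsigned long maxlen, unsigned long block_size,
			   unsigned long *dest)
{
	unsigned long off2;
	int tries = 0;

	do {
		if (tries++ >= MAX_TRIES)
			return 0;	/* give up instead of spinning forever */
		off2 = (unsigned long)rand() % maxlen;
		off2 &= ~(block_size - 1);
	} while (overlaps(offset, off2, size) || off2 + size > maxlen);

	*dest = off2;
	return 1;
}
```

With offset = 0, size = 8192, maxlen = 12288 and block_size = 4096, the
only aligned candidates are 0, 4096 and 8192; the first two overlap the
source and the third runs past maxlen, so the call gives up after 30
tries instead of looping indefinitely. The patch signals the same
outcome by setting size to 0, which gets the operation logged as skipped.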