Re: btrfs filesystem freeze

thanks, 

i tried it and have been running my tests for some hours now - looks really good. no crashes, no freezes.

still, some "minor" glitches remain.

i was watching the "ls -la /btrfs" output via "watch ls -la..", and by chance i saw this for a moment:

dr-xr-xr-x  1 root root     126 Dec 22 22:19 snap96
dr-xr-xr-x  1 root root     126 Dec 22 22:19 snap97
dr-xr-xr-x  1 root root     126 Dec 22 22:19 snap98
dr-xr-xr-x  1 root root     110 Dec 22 22:19 snap99
-rw-r--r--  1 root root 1048576 Dec 22 22:49 test.dat
-?????????  ? ?    ?          ?            ? test.tmp
-rw-r--r--  1 root root 7020046 Dec 22 22:48 testfsx
-rw-r--r--  1 root root       0 Dec 22 22:23 testfsx.fsxgood
-rw-r--r--  1 root root       0 Dec 22 22:23 testfsx.fsxlog

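the entry for test.tmp looks weird: ls shows the name but "?" for every attribute, which usually means the stat of that entry failed at that moment.
the file is constantly recreated by copying test.dat back and forth (i.e. cp test.dat test.tmp; md5sum test.tmp; rm test.dat; mv test.tmp test.dat in a loop, to check file consistency)

should i worry about this?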


furthermore, after several hours, i got this (also only once):

Tue Dec 23 00:40:33 CET 2008 59987747e2568fa81bb38603706eff07  test.tmp
Tue Dec 23 00:42:59 CET 2008 59987747e2568fa81bb38603706eff07  test.tmp
Tue Dec 23 00:43:17 CET 2008 59987747e2568fa81bb38603706eff07  test.tmp
Tue Dec 23 00:44:28 CET 2008 59987747e2568fa81bb38603706eff07  test.tmp
Tue Dec 23 00:49:43 CET 2008 59987747e2568fa81bb38603706eff07  test.tmp
Tue Dec 23 00:50:37 CET 2008 59987747e2568fa81bb38603706eff07  test.tmp
Tue Dec 23 00:50:57 CET 2008 59987747e2568fa81bb38603706eff07  test.tmp
Tue Dec 23 00:56:44 CET 2008 59987747e2568fa81bb38603706eff07  test.tmp
Tue Dec 23 00:59:33 CET 2008 59987747e2568fa81bb38603706eff07  test.tmp
Tue Dec 23 01:00:03 CET 2008 md5sum: test.tmp: Input/output error
cp: reading `test.dat': Input/output error
Tue Dec 23 01:01:44 CET 2008 ca6bbc6a8aa5ec080a2a10d727ecc563  test.tmp
Tue Dec 23 01:08:05 CET 2008 ca6bbc6a8aa5ec080a2a10d727ecc563  test.tmp

how can this happen?
"cp test.dat test.tmp" _always_ runs after "mv test.tmp test.dat", and there's a "sync" in between. note that the md5sum is different after the error (ca6bbc6a... instead of 59987747...).

looks like a race condition!?
did the mv command return before btrfs had completed the move, so that the next command did not yet see the moved file!?
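
for reference, the test loop described above can be sketched roughly as follows. this is a minimal sketch and not the exact script used: the function name consistency_loop, its DIR and COUNT arguments and the error messages are assumptions made for illustration. each step reports its own failure, which makes it easier to see which command hit the error.

```shell
#!/bin/sh
# Illustrative sketch of the copy/verify/rename loop (names are hypothetical).
# usage: consistency_loop DIR COUNT
# DIR must already contain test.dat.
consistency_loop() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # copy the data file to a temporary name
        cp "$dir/test.dat" "$dir/test.tmp" || { echo "$(date) cp failed" >&2; return 1; }
        # checksum the copy; run md5sum inside DIR so the log shows "test.tmp"
        sum=$(cd "$dir" && md5sum test.tmp) || { echo "$(date) md5sum failed" >&2; return 1; }
        echo "$(date) $sum"
        # replace the original with the copy
        rm "$dir/test.dat" || { echo "$(date) rm failed" >&2; return 1; }
        mv "$dir/test.tmp" "$dir/test.dat" || { echo "$(date) mv failed" >&2; return 1; }
        # sync between the mv and the next cp
        sync
        i=$((i + 1))
    done
}
```

called as "consistency_loop /btrfs 1000", it prints one timestamped md5sum line per pass, similar to the log above, and stops at the first step that fails.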


i don't have timestamps in dmesg, so i cannot tell whether these are related to the glitches above, but here are some messages from dmesg:

device fsid 7c4ee06dc0149bc8-44e376a69f9aa08b devid 1 transid 9 /dev/sdb1
btrfs: use compression
btrfs: unlinked 1 orphans
btrfs: unlinked 1 orphans
btrfs: unlinked 1 orphans
btrfs: unlinked 1 orphans
btrfs: unlinked 1 orphans
btrfs: unlinked 1 orphans
btrfs: unlinked 1 orphans
btrfs: unlinked 1 orphans
btrfs: unlinked 1 orphans
btrfs csum failed ino 39502 off 524288 csum 1261748817 private 1813441608
btrfs csum failed ino 39502 off 524288 csum 1261748817 private 1813441608
btrfs csum failed ino 39502 off 524288 csum 1261748817 private 1813441608
btrfs csum failed ino 39502 off 524288 csum 1261748817 private 1813441608
btrfs csum failed ino 39502 off 524288 csum 1261748817 private 1813441608
btrfs csum failed ino 39502 off 524288 csum 1261748817 private 1813441608
btrfs csum failed ino 39502 off 524288 csum 1261748817 private 1813441608
btrfs csum failed ino 39502 off 524288 csum 1261748817 private 1813441608
btrfs: unlinked 1 orphans
btrfs: unlinked 1 orphans

so apparently a checksum was wrong, too!? all 8 failures are for the same inode and offset.
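
with longer logs, the csum messages can be grouped by inode and offset to see how many distinct blocks are affected. a minimal sketch, assuming the message format shown above; the function name summarize_csum_failures is made up for illustration:

```shell
#!/bin/sh
# Illustrative sketch (hypothetical helper name).
# Reads kernel log text on stdin and prints "COUNT INODE OFFSET"
# for every distinct (inode, offset) pair in "btrfs csum failed" lines.
summarize_csum_failures() {
    awk '/btrfs csum failed ino/ {
        # pick the values that follow the "ino" and "off" keywords
        for (i = 1; i < NF; i++) {
            if ($i == "ino") ino = $(i + 1)
            if ($i == "off") off = $(i + 1)
        }
        seen[ino " " off]++
    }
    END { for (k in seen) print seen[k], k }'
}
```

usage would be "dmesg | summarize_csum_failures". the inode number can then be mapped back to a file name with "find /btrfs -inum 39502"; with snapshots present this may print more than one path, since a snapshot can contain the same inode number.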


ah, and one more thing (please forgive me - this is all to make btrfs better!)

i ran the pjd-fstest posix regression test suite (http://www.ntfs-3g.org/pjd-fstest.html), which also reported some problems.

i'm not sure whether these are false positives, because the test suite does not officially support btrfs.

i'm posting the output here for review.

linux-uqw0:/btrfs/pjd-fstest-20080917-RC # prove -r .
tests/chflags/00.....ok
tests/chflags/01.....ok
tests/chflags/02.....ok
tests/chflags/03.....ok
tests/chflags/04.....ok
tests/chflags/05.....ok
tests/chflags/06.....ok
tests/chflags/07.....ok
tests/chflags/08.....ok
tests/chflags/09.....ok
tests/chflags/10.....ok
tests/chflags/11.....ok
tests/chflags/12.....ok
tests/chflags/13.....ok
tests/chmod/00.......ok
tests/chmod/01.......ok
tests/chmod/02.......ok
tests/chmod/03.......ok
tests/chmod/04.......ok
tests/chmod/05.......ok
tests/chmod/06.......ok
tests/chmod/07.......ok
tests/chmod/08.......ok
tests/chmod/09.......ok
tests/chmod/10.......ok
tests/chmod/11.......ok
tests/chown/00.......ok
tests/chown/01.......ok
tests/chown/02.......ok
tests/chown/03.......ok
tests/chown/04.......ok
tests/chown/05.......ok
tests/chown/06.......ok
tests/chown/07.......ok
tests/chown/08.......ok
tests/chown/09.......ok
tests/chown/10.......ok
tests/link/00........FAILED tests 56, 63
        Failed 2/82 tests, 97.56% okay
tests/link/01........ok
tests/link/02........ok
tests/link/03........ok
tests/link/04........ok
tests/link/05........ok
tests/link/06........ok
tests/link/07........ok
tests/link/08........ok
tests/link/09........ok
tests/link/10........ok
tests/link/11........ok
tests/link/12........ok
tests/link/13........ok
tests/link/14........ok
tests/link/15........ok
tests/link/16........ok
tests/link/17........ok
tests/mkdir/00.......ok
tests/mkdir/01.......ok
tests/mkdir/02.......ok
tests/mkdir/03.......ok
tests/mkdir/04.......ok
tests/mkdir/05.......ok
tests/mkdir/06.......ok
tests/mkdir/07.......ok
tests/mkdir/08.......ok
tests/mkdir/09.......ok
tests/mkdir/10.......ok
tests/mkdir/11.......ok
tests/mkdir/12.......ok
tests/mkfifo/00......ok
tests/mkfifo/01......ok
tests/mkfifo/02......ok
tests/mkfifo/03......ok
tests/mkfifo/04......ok
tests/mkfifo/05......ok
tests/mkfifo/06......ok
tests/mkfifo/07......ok
tests/mkfifo/08......ok
tests/mkfifo/09......ok
tests/mkfifo/10......ok
tests/mkfifo/11......ok
tests/mkfifo/12......ok
tests/open/00........ok
tests/open/01........ok
tests/open/02........ok
tests/open/03........ok
tests/open/04........ok
tests/open/05........ok
tests/open/06........ok
tests/open/07........ok
tests/open/08........ok
tests/open/09........ok
tests/open/10........ok
tests/open/11........ok
tests/open/12........ok
tests/open/13........ok
tests/open/14........ok
tests/open/15........ok
tests/open/16........ok
tests/open/17........ok
tests/open/18........ok
tests/open/19........ok
tests/open/20........ok
tests/open/21........ok
tests/open/22........ok
tests/open/23........ok
tests/rename/00......ok
tests/rename/01......ok
tests/rename/02......ok
tests/rename/03......ok
tests/rename/04......ok
tests/rename/05......ok
tests/rename/06......ok
tests/rename/07......ok
tests/rename/08......ok
tests/rename/09......ok
tests/rename/10......ok
tests/rename/11......ok
tests/rename/12......ok
tests/rename/13......ok
tests/rename/14......ok
tests/rename/15......ok
tests/rename/16......ok
tests/rename/17......ok
tests/rename/18......ok
tests/rename/19......ok
tests/rename/20......ok
tests/rmdir/00.......ok
tests/rmdir/01.......ok
tests/rmdir/02.......ok
tests/rmdir/03.......ok
tests/rmdir/04.......ok
tests/rmdir/05.......ok
tests/rmdir/06.......ok
tests/rmdir/07.......ok
tests/rmdir/08.......ok
tests/rmdir/09.......ok
tests/rmdir/10.......ok
tests/rmdir/11.......ok
tests/rmdir/12.......ok
tests/rmdir/13.......ok
tests/rmdir/14.......ok
tests/rmdir/15.......ok
tests/symlink/00.....ok
tests/symlink/01.....ok
tests/symlink/02.....ok
tests/symlink/03.....ok
tests/symlink/04.....ok
tests/symlink/05.....ok
tests/symlink/06.....ok
tests/symlink/07.....ok
tests/symlink/08.....ok
tests/symlink/09.....ok
tests/symlink/10.....ok
tests/symlink/11.....ok
tests/symlink/12.....ok
tests/truncate/00....FAILED test 15
        Failed 1/21 tests, 95.24% okay
tests/truncate/01....ok
tests/truncate/02....ok
tests/truncate/03....ok
tests/truncate/04....ok
tests/truncate/05....ok
tests/truncate/06....ok
tests/truncate/07....ok
tests/truncate/08....ok
tests/truncate/09....ok
tests/truncate/10....ok
tests/truncate/11....ok
tests/truncate/12....ok
tests/truncate/13....ok
tests/truncate/14....ok
tests/unlink/00......ok
tests/unlink/01......ok
tests/unlink/02......ok
tests/unlink/03......ok
tests/unlink/04......ok
tests/unlink/05......ok
tests/unlink/06......ok
tests/unlink/07......ok
tests/unlink/08......ok
tests/unlink/09......ok
tests/unlink/10......ok
tests/unlink/11......ok
tests/unlink/12......ok
tests/unlink/13......ok
tests/xacl/00........FAILED test 2
        Failed 1/42 tests, 97.62% okay
tests/xacl/01........FAILED tests 2, 22
        Failed 2/32 tests, 93.75% okay
tests/xacl/02........FAILED tests 2, 41
        Failed 2/80 tests, 97.50% okay
tests/xacl/03........FAILED tests 2, 31, 35, 38, 43, 47
        Failed 6/57 tests, 89.47% okay
tests/xacl/04........FAILED tests 2, 52
        Failed 2/53 tests, 96.23% okay
tests/xacl/05........ok
tests/xacl/06........FAILED tests 16, 26-27, 30, 33-34, 36-38, 40-42
        Failed 12/42 tests, 71.43% okay
Failed Test         Stat Wstat Total Fail  List of Failed
-------------------------------------------------------------------------------
tests/link/00.t                   82    2  56 63
tests/truncate/00.t               21    1  15
tests/xacl/00.t                   42    1  2
tests/xacl/01.t                   32    2  2 22
tests/xacl/02.t                   80    2  2 41
tests/xacl/03.t                   57    6  2 31 35 38 43 47
tests/xacl/04.t                   53    2  2 52
tests/xacl/06.t                   42   12  16 26-27 30 33-34 36-38 40-42
Failed 8/191 test scripts. 28/2284 subtests failed.
Files=191, Tests=2284, 2762 wallclock secs (18.56 cusr + 179.63 csys = 198.19 CPU)
Failed 8/191 test programs. 28/2284 subtests failed.
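
to look at the failures in detail, the failed scripts can be pulled out of the summary table and rerun one at a time. a minimal sketch, assuming the summary format printed above; the function name failed_scripts is made up for illustration:

```shell
#!/bin/sh
# Illustrative sketch (hypothetical helper name).
# Reads `prove` output on stdin and prints the paths of the failed test
# scripts, taken from the summary table (lines whose first field is tests/...t).
failed_scripts() {
    awk '$1 ~ /^tests\/.*\.t$/ { print $1 }'
}
```

usage would be "prove -r . 2>&1 | failed_scripts", followed by e.g. "prove -v tests/link/00.t" to see the result of each individual subtest of one script.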



besides all that, i'm really impressed by btrfs and i think it's in really good condition and quite stable.
i think i'll dare to use it on a server for some pre-production testing....

Keep up the good work!

regards
roland





> devzero@xxxxxx wrote:
> > thank you.
> > 
> > i tried your patch and did another test run.
> > 
> > first, it looked better as i could do many more snapshots than before, but then it froze again.
> > 
> > i don't really have a clue if your patch improved anything, as my test setup isn't exactly reproducible for now and i did not check for exact "testing lab conditions".
> > 
> > after /btrfs froze again, i tried to unmount by forcibly unloading btrfs module. 
> > 
> > after reloading the module and trying to mount again, it failed with the following  kernel message:
> > 
> I hope the new patch can solve the problem.
> 
> Yan Zheng 
> 
> ---
> diff -urp 1/fs/btrfs/inode.c 2/fs/btrfs/inode.c
> --- 1/fs/btrfs/inode.c	2008-12-18 08:09:16.062111805 +0800
> +++ 2/fs/btrfs/inode.c	2008-12-22 08:47:06.000000000 +0800
> @@ -2891,7 +2891,7 @@ void btrfs_delete_inode(struct inode *in
>  	btrfs_wait_ordered_range(inode, 0, (u64)-1);
>  
>  	btrfs_i_size_write(inode, 0);
> -	trans = btrfs_start_transaction(root, 1);
> +	trans = btrfs_join_transaction(root, 1);
>  
>  	btrfs_set_trans_block_group(trans, inode);
>  	ret = btrfs_truncate_inode_items(trans, root, inode, inode->i_size, 0);
> diff -urp 1/fs/btrfs/transaction.c 2/fs/btrfs/transaction.c
> --- 1/fs/btrfs/transaction.c	2008-12-13 12:35:29.487886730 +0800
> +++ 2/fs/btrfs/transaction.c	2008-12-21 19:09:09.000000000 +0800
> @@ -804,7 +804,7 @@ static noinline int finish_pending_snaps
>  
>  	parent_inode = pending->dentry->d_parent->d_inode;
>  	parent_root = BTRFS_I(parent_inode)->root;
> -	trans = btrfs_start_transaction(parent_root, 1);
> +	trans = btrfs_join_transaction(parent_root, 1);
>  
>  	/*
>  	 * insert the directory item
> 

