Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance

On 2015-09-18 18:26, Stéphane Lesimple wrote:
On 2015-09-18 12:15, Stéphane Lesimple wrote:
On 2015-09-18 09:36, Stéphane Lesimple wrote:
Sure, I did a quota disable / quota enable before running the snapshot
debug procedure, so the qgroups were clean again when I started:

qgroupid          rfer          excl     max_rfer     max_excl parent child
--------          ----          ----     --------     -------- ------ -----
0/5              16384         16384         none         none --- ---
0/1906   1657848029184 1657848029184         none         none --- ---
0/1909    124950921216  124950921216         none         none --- ---
0/1911   1054587293696 1054587293696         none         none --- ---
0/3270     23727300608   23727300608         none         none --- ---
0/3314     23221784576   23221784576         none         none --- ---
0/3341      7479275520    7479275520         none         none --- ---
0/3367     24185790464   24185790464         none         none --- ---
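
For reference, the reset amounts to something like this (/tank is the
mount point shown later in this thread, so the exact path is
illustrative):

------
# Drop all qgroup data, then re-enable accounting from scratch.
btrfs quota disable /tank
btrfs quota enable /tank
# Check the fresh state.
btrfs qgroup show -prce --raw /tank
------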

The test is running; I expect to post the results within an hour or two.

Well, my system crashed twice while running the procedure...
By "crashed" I mean : the machine no longer pings, and nothing is
logged in kern.log unfortunately :

[ 7096.735731] BTRFS info (device dm-3): qgroup scan completed
(inconsistency flag cleared)
[ 7172.614851] BTRFS info (device dm-3): qgroup scan completed
(inconsistency flag cleared)
[ 7242.870259] BTRFS info (device dm-3): qgroup scan completed
(inconsistency flag cleared)
[ 7321.466931] BTRFS info (device dm-3): qgroup scan completed
(inconsistency flag cleared)
[    0.000000] Initializing cgroup subsys cpuset

The even stranger part is that the last 2 stdout dump files exist but
are empty:

-rw-r--r-- 1 root root   21 Sep 18 10:29 snap32.step5
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap32.step6
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap33.step1
-rw-r--r-- 1 root root 3.3K Sep 18 10:29 snap33.step3
-rw-r--r-- 1 root root   21 Sep 18 10:30 snap33.step5
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap33.step6
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap34.step1
-rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step3 <==
-rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step5 <==

The mentioned steps are as follows (a sketch of the full loop follows the list):

0) Rsync data from the next ext4 "snapshot" to the subvolume
1) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
2) Create the needed readonly snapshot on btrfs
3) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
4) Avoid doing IO if possible until step 6
5) Do 'btrfs quota rescan -w' and save the output <==
6) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
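
Roughly, the whole run is a loop like this (paths, subvolume names, and
the snapshot count are illustrative, not the exact script):

------
for i in $(seq 1 40); do
    rsync -a /mnt/ext4/snap$i/ /tank/data/                    # step 0
    sync; btrfs qgroup show -prce --raw /tank > snap$i.step1  # step 1
    btrfs subvolume snapshot -r /tank/data /tank/snap$i       # step 2
    sync; btrfs qgroup show -prce --raw /tank > snap$i.step3  # step 3
    # step 4: avoid other IO until after step 6
    btrfs quota rescan -w /tank > snap$i.step5                # step 5
    sync; btrfs qgroup show -prce --raw /tank > snap$i.step6  # step 6
done
------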

The resulting files are available here:
http://speed47.net/tmp2/qgroup.tar.gz
run2 is the more complete one; during run1 the machine crashed
even faster.
It's interesting to note, however, that it seems to have crashed the
same way and at the same step in the process.

Your data really helps a lot!!

And the good news is, the qgroup accounting part is working as expected.
Although I only checked the step 1/3/6 outputs of about 5 snapshots, they are all OK.

I can make a script to cross-check them, but from the last few results, I think qgroup accounting works fine.

I'm now fairly confident the negative numbers are a result of deleted subvolumes, and the real problem is that such qgroups are not handled well by qgroup rescan.

I'll try to add a hotfix for this case if needed.
But right now, I don't have a good solution until Mark's subtree rescan work is done.

Maybe I can add a new option to btrfs-progs to automatically remove such qgroups and trigger a rescan?
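
Until then, the manual equivalent would look like this (0/1234 stands
for one of the stale qgroup ids; both it and /tank are placeholders):

------
# Remove a stale qgroup left behind by a deleted subvolume,
# then rebuild the accounting numbers.
btrfs qgroup destroy 0/1234 /tank
btrfs quota rescan -w /tank
------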

Actually, about that: I forgot that I had set up netconsole before
starting the second run, after the first "muted" crash, and it did
work: even though I have no logs in kern.log, netconsole managed to
send them to my other machine before it went down, so here it is:

---
[ 5738.172692] BUG: unable to handle kernel NULL pointer dereference at
00000000000001f0
[ 5738.172702] IP: [<ffffffffc03150db>] start_transaction+0x1b/0x580
[btrfs]
[ 5738.172719] PGD c0aa7067 PUD c0aa6067 PMD 0
[ 5738.172723] Oops: 0000 [#1] SMP
[ 5738.172726] Modules linked in: netconsole configfs xts gf128mul drbg
ansi_cprng xt_multiport xt_comment xt_conntrack xt_nat xt_tcpudp
nfnetlink_queue nfnetlink_log nfnetlink nf_conntrack_ftp
nf_conntrack_sane iptable_security iptable_filter iptable_mangle
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack iptable_raw ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs
cmac dm_crypt rfcomm bnep lockd grace sunrpc fscache binfmt_misc
intel_rapl snd_hda_codec_realtek iosf_mbi x86_pkg_temp_thermal
intel_powerclamp kvm_intel snd_hda_codec_generic snd_hda_intel
snd_hda_codec kvm eeepc_wmi asus_wmi snd_hda_core btusb sparse_keymap
btrtl snd_hwdep btbcm snd_pcm btintel 8021q bluetooth snd_seq_midi
dm_multipath snd_seq_midi_event garp snd_rawmidi mrp snd_seq stp llc
snd_seq_device snd_timer crct10dif_pclmul crc32_pclmul snd
ghash_clmulni_intel cryptd serio_raw soundcore mei_me mei lpc_ich shpchp
mac_hid parport_pc ppdev nct6775 hwmon_vid coretemp lp parport btrfs
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq raid0 multipath linear nbd raid1 i915 e1000e
i2c_algo_bit drm_kms_helper syscopyarea ptp sysfillrect sysimgblt
fb_sys_fops psmouse ahci drm libahci pps_core wmi video [last unloaded:
netconsole]
[ 5738.172831] CPU: 1 PID: 10932 Comm: kworker/u4:14 Not tainted
4.3.0-rc1 #1
[ 5738.172833] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
01/06/2014
[ 5738.172843] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper
[btrfs]
[ 5738.172845] task: ffff8800c7010000 ti: ffff88006acf4000 task.ti:
ffff88006acf4000
[ 5738.172847] RIP: 0010:[<ffffffffc03150db>]  [<ffffffffc03150db>]
start_transaction+0x1b/0x580 [btrfs]
[ 5738.172855] RSP: 0018:ffff88006acf7ca8  EFLAGS: 00010282
[ 5738.172856] RAX: 0000000000000004 RBX: 0000000000000201 RCX:
0000000000000002
[ 5738.172857] RDX: 0000000000000201 RSI: 0000000000000001 RDI:
0000000000000000
[ 5738.172858] RBP: ffff88006acf7cf0 R08: ffff88010990eab0 R09:
00000001801c0017
[ 5738.172860] R10: 000000000990e701 R11: ffffea0004264380 R12:
0000000000000000
[ 5738.172861] R13: ffff8800c73a6e08 R14: ffff880027963800 R15:
0000160000000000
[ 5738.172862] FS:  0000000000000000(0000) GS:ffff88011fb00000(0000)
knlGS:0000000000000000
[ 5738.172863] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5738.172864] CR2: 00000000000001f0 CR3: 0000000027a65000 CR4:
00000000000406e0
[ 5738.172866] Stack:
[ 5738.172867]  ffff8800c73a6e08 ffff880027963800 0000160000000000
ffff88006acf7ce8
[ 5738.172871]  00000000000000be 00000000fffffffc ffff8800c73a6e08
ffff880027963800
[ 5738.172875]  0000160000000000 ffff88006acf7d00 ffffffffc031565b
ffff88006acf7dc0
[ 5738.172879] Call Trace:
[ 5738.172887]  [<ffffffffc031565b>] btrfs_start_transaction+0x1b/0x20
[btrfs]
[ 5738.172896]  [<ffffffffc0378038>]
btrfs_qgroup_rescan_worker+0x388/0x5a0 [btrfs]

Your netconsole backtrace is also of great value.
It implies that my rework introduced some stupid bug
(yeah, I always make such bugs), or exposed an existing latent rescan bug.

Would you please use gdb to show the code at "btrfs_qgroup_rescan_worker+0x388"?
(This needs the kernel debug info.)
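
Something like the following should work; the module path below is the
usual location for an in-tree build, so adjust it for your distro:

------
# Resolve the crash offset to a source line (needs btrfs.ko with debug info).
gdb /lib/modules/$(uname -r)/kernel/fs/btrfs/btrfs.ko
(gdb) list *(btrfs_qgroup_rescan_worker+0x388)
------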

My guess is the following line (pretty sure, but not 100%):
------
        /*
         * Only update status, since the previous part has already
         * updated the qgroup info.
         */
        trans = btrfs_start_transaction(fs_info->quota_root, 1); <<<<<
        if (IS_ERR(trans)) {
                err = PTR_ERR(trans);
                btrfs_err(fs_info,
                          "fail to start transaction for status update: %d\n",
                          err);
                goto done;
        }
------

But that means that at rescan time, fs_info->quota_root is still NULL, which is quite weird. I can add an extra check to avoid the NULL pointer dereference for now, but it's better to review the existing rescan workflow, as I think there is a race in how quota_root gets initialized.

You can also try the following hotfix patch to see if it works:
http://pastebin.com/966GQXPk

My concern is that this may cause the qgroup rescan to exit without updating its accounting info...

So I still need your help.
Alternatively, I can use your reproducer script to test it next Monday.


[ 5738.172904]  [<ffffffffc03444e0>] normal_work_helper+0xc0/0x270 [btrfs]
[ 5738.172912]  [<ffffffffc03448a2>]
btrfs_qgroup_rescan_helper+0x12/0x20 [btrfs]
[ 5738.172915]  [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
[ 5738.172917]  [<ffffffff8109192a>] worker_thread+0x11a/0x470
[ 5738.172919]  [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
[ 5738.172921]  [<ffffffff81097059>] kthread+0xc9/0xe0
[ 5738.172923]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60
[ 5738.172926]  [<ffffffff817aac4f>] ret_from_fork+0x3f/0x70
[ 5738.172928]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60
[ 5738.172929] Code: 49 c1 e9 5c ff ff ff 66 0f 1f 84 00 00 00 00 00 0f
1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 89 d3 48 83
ec 20 <48> 8b 87 f0 01 00 00 48 8b 90 60 0e 00 00 83 e2 01 0f 85 86 00
[ 5738.172973] RIP  [<ffffffffc03150db>] start_transaction+0x1b/0x580
[btrfs]
[ 5738.172981]  RSP <ffff88006acf7ca8>
[ 5738.172982] CR2: 00000000000001f0
[ 5738.172984] ---[ end trace 9feb85def1327ee9 ]---
[ 5738.173010] BUG: unable to handle kernel paging request at
ffffffffffffffd8
[ 5738.173012] IP: [<ffffffff810977d0>] kthread_data+0x10/0x20
[ 5738.173015] PGD 1c13067 PUD 1c15067 PMD 0
[ 5738.173019] Oops: 0000 [#2] SMP
[ 5738.173021] Modules linked in: netconsole configfs xts gf128mul drbg
---

Clearly this is during a rescan.

As the machine is now, qgroups seem OK:

~# btrfs qgroup show -pcre --raw /tank/
qgroupid          rfer          excl     max_rfer     max_excl parent child
--------          ----          ----     --------     -------- ------ -----
0/5              32768         32768         none         none --- ---
0/1906   3315696058368 3315696058368         none         none --- ---
0/1909    249901842432  249901842432         none         none --- ---
0/1911   2109174587392 2109174587392         none         none --- ---
0/3270     47454601216   47454601216         none         none --- ---
0/3314     46408499200         32768         none         none --- ---
0/3341     14991097856         32768         none         none --- ---
0/3367     48371580928   48371580928         none         none --- ---
0/5335     56523751424     280592384         none         none --- ---
0/5336     60175253504    2599960576         none         none --- ---
0/5337     45751746560     250888192         none         none --- ---
0/5338     45804650496     186531840         none         none --- ---
0/5339     45875167232     190521344         none         none --- ---
0/5340     45933486080        327680         none         none --- ---
0/5341     45933502464        344064         none         none --- ---
0/5342     46442815488      35454976         none         none --- ---
0/5343     46442520576      30638080         none         none --- ---
0/5344     46448312320      36495360         none         none --- ---
0/5345     46425235456      86204416         none         none --- ---
0/5346     46081941504     119398400         none         none --- ---
0/5347     46402715648      55615488         none         none --- ---
0/5348     46403534848      50528256         none         none --- ---
0/5349     45486301184      91463680         none         none --- ---
0/5351     46414635008        393216         none         none --- ---
0/5352     46414667776        294912         none         none --- ---
0/5353     46414667776        294912         none         none --- ---
0/5354     46406148096      24829952         none         none --- ---
0/5355     46415986688      33103872         none         none --- ---
0/5356     46406262784      23216128         none         none --- ---
0/5357     46408245248      17408000         none         none --- ---
0/5358     46416052224      25280512         none         none --- ---
0/5359     46406336512      23158784         none         none --- ---
0/5360     46408335360      25157632         none         none --- ---
0/5361     46406402048      24395776         none         none --- ---
0/5362     46415273984      32260096         none         none --- ---
0/5363     46408499200         32768         none         none --- ---
0/5364     14949441536     139812864         none         none --- ---
0/5365     14996299776     176889856         none         none --- ---
0/5366     14958616576     143065088         none         none --- ---
0/5367     14919172096     100171776         none         none --- ---
0/5368     14945968128     142409728         none         none --- ---
0/5369     14991097856         32768         none         none --- ---


But I'm pretty sure I can get that (u64)-1 value again by deleting
snapshots. Shall I? Or do you have something else for me to run
before that?

You have already done a great job in helping qgroups mature.
The negative numbers and 0 excl values are somewhat expected for deleted snapshots.

The good news is: 1) it doesn't affect valid (non-orphan) qgroups,
and 2) Mark is already working on it.

I'll try to add a btrfs-progs hotfix for you to delete such qgroups and rescan, to avoid this problem (essentially automating the destroy-and-rescan commands sketched above).


So, as a quick summary of this big thread, it seems I've been hitting
3 bugs, all reproducible:
- kernel BUG on balance (this original thread)

For this one, I can't provide much help, as extent backref bugs are quite hard to debug, unless a developer takes an interest and finds a stable way to reproduce it.

The other two are explained or have the hotfixes mentioned above.

Thanks,
Qu

- negative or zero "excl" qgroups
- hard freezes without kernel trace when playing with snapshots and quota

Still available to dig deeper where needed.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html