The bug just happened again. Attached is a log since the time I
mounted the FS right after the fsck.
Note the only things between the message I got while mounting:
[216798.144518] BTRFS info (device sdc1): disk space caching is enabled
and the beginning of the crash dump:
[241534.760651] ------------[ cut here ]------------
is this:
[218266.098344] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
[233647.332085] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
I am not sure why those resets happen, though. I bought a few cables
and experimented with them, and the usb ports themselves are located
directly on the motherboard.
Also, they happened some considerable time before the crash dump. So
I'm not sure they're even related. Especially given that I was copying
a lot of very small files, and they all copied onto the disk fine all
the time between the last usb reset and the crash dump, which is
roughly two and a half hours. In fact I pressed ctrl-z on a move
operation and then wrote something like sleep $(echo '60*60*3' | bc) ;
fg and ran it just past 9 am, so the mv resumed past 12 pm, so as
things add up the last usb reset happened even before the mv was
resumed with fg.
I unmounted the fs and re-mounted the it to make it writeable again.
This showed up in dmesg:
[241766.485365] BTRFS error (device sdc1): cleaner transaction attach
returned -30
[241770.115897] BTRFS info (device sdc1): disk space caching is enabled
this time there was no "info" line about the free space cache file. So
maybe it wasn't important for the bug to occur at all.
The new output of btrfs fi df -g is:
Data, single: total=2080.01GiB, used=2078.80GiB
System, DUP: total=0.01GiB, used=0.00GiB
System, single: total=0.00GiB, used=0.00GiB
Metadata, DUP: total=5.50GiB, used=3.73GiB
Metadata, single: total=0.01GiB, used=0.00GiB
GlobalReserve, single: total=0.50GiB, used=0.00GiB
I could swap this disk onto sata and the other disk back onto usb to
see if the usb resets have anything to do with this. But I'm skeptic.
Also maybe btrfs has some other issues related to just the disk being
on usb, resets or not, and this way if the bug doesn't trigger on sata
we'll think "aha it was the resets, buggy hardware etc" but instead
it'll have been something else that just has to do with the disk being
on usb operating normally.
On Mon, Jan 11, 2016 at 2:11 PM, cheater00 . <cheater00@xxxxxxxxx> wrote:
> On Mon, Jan 11, 2016 at 2:05 PM, Austin S. Hemmelgarn
> <ahferroin7@xxxxxxxxx> wrote:
>> On 2016-01-09 16:07, cheater00 . wrote:
>>>
>>> Would like to point out that this can cause data loss. If I'm writing
>>> to disk and the disk becomes unexpectedly read only - that data will
>>> be lost, because who in their right mind makes their code expect this
>>> and builds a contingency (e.g. caching, backpressure, etc)...
>>
>> If a data critical application (mail server, database server, anything
>> similar) can't gracefully handle ENOSPC, then that application is broken,
>> not the FS. As an example, set up a small VM with an SMTP server, then
>> force the FS the server uses for queuing mail read-only, and see if you can
>> submit mail, then go read the RFCs for SMTP and see what clients are
>> supposed to do when they can't submit mail. A properly designed piece of
>> software is supposed to be resilient against common failure modes of the
>> resources it depends on (which includes ENOSPC and read-only filesystems for
>> anything that works with data on disk).
>>>
>>>
>>> There's no loss of data on the disk because the data doesn't make it
>>> to disk in the first place. But it's exactly the same as if the data
>>> had been written to disk, and then lost.
>>>
>> No, it isn't. If you absolutely need the data on disk, you should be
>> calling fsync or fdatasync, and then assuming if those return an error that
>> none of the data written since the last call has gotten to the disk (some of
>> it might have, but you need to assume it hasn't). Every piece of software
>> in wide usage that requires data to be on the disk does this, because
>> otherwise it can't guarantee that the data is on disk.
>
> I agree that a lot of stuff goes right in a perfect world. But most of
> the time what you're running isn't a mail server used by billions of
> users, but instead a bash script someone wrote once that's supposed to
> do something, and no one knows how it works.
[216798.144518] BTRFS info (device sdc1): disk space caching is enabled
[218266.098344] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
[233647.332085] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
[241534.760651] ------------[ cut here ]------------
[241534.760675] WARNING: CPU: 0 PID: 19845 at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:2851 btrfs_run_delayed_refs+0x227/0x250 [btrfs]()
[241534.760677] BTRFS: Transaction aborted (error -28)
[241534.760679] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c nls_utf8 isofs pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) cuse ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables usblp snd_hda_codec_hdmi hp_wmi sparse_keymap snd_hda_codec_idt snd_hda_codec_generic intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp dm_multipath snd_hda_intel snd_hda_codec coretemp snd_hda_core kvm_intel radeon snd_hwdep hid_logitech_hidpp kvm snd_pcm i915 snd_seq_midi crc32_pclmul snd_seq_midi_event snd_rawmidi aesni_intel aes_i586 ttm xts snd_seq lrw drm_kms_helper snd_seq_device gf128mul snd_timer ablk_helper drm joydev cryptd bnep rfcomm snd input_leds i2c_algo_bit fb_sys_fops rtsx_pci_ms bluetooth serio_raw soundcore syscopyarea memstick sysfillrect sysimgblt hp_accel mei_me lis3lv02d lpc_ich wmi shpchp input_polldev mei video mac_hid nfsd auth_rpcgss nfs_acl parport_pc nfs ppdev lockd lp grace sunrpc parport fscache binfmt_misc hid_generic hid_logitech_dj usbhid hid btrfs xor uas usb_storage raid6_pq rtsx_pci_sdmmc ahci r8169 sdhci_pci psmouse libahci sdhci rtsx_pci mii fjes
[241534.760775] CPU: 0 PID: 19845 Comm: btrfs-transacti Tainted: G W OE 4.3.0-040300rc7-generic #201510260712
[241534.760776] Hardware name: Hewlett-Packard HP Pavilion dv6 Notebook PC/17FA, BIOS F.02 10/03/2011
[241534.760778] 00000000 00000000 e4c9ddd8 c13610e8 e4c9de18 e4c9de08 c1068107 f8ae4190
[241534.760782] e4c9de34 00004d85 f8ae3ff0 00000b23 f8a45a77 f8a45a77 ea674078 ffffffe4
[241534.760786] cafaf400 e4c9de20 c1068173 00000009 e4c9de18 f8ae4190 e4c9de34 e4c9de5c
[241534.760790] Call Trace:
[241534.760796] [<c13610e8>] dump_stack+0x41/0x59
[241534.760800] [<c1068107>] warn_slowpath_common+0x87/0xc0
[241534.760810] [<f8a45a77>] ? btrfs_run_delayed_refs+0x227/0x250 [btrfs]
[241534.760818] [<f8a45a77>] ? btrfs_run_delayed_refs+0x227/0x250 [btrfs]
[241534.760820] [<c1068173>] warn_slowpath_fmt+0x33/0x40
[241534.760828] [<f8a45a77>] btrfs_run_delayed_refs+0x227/0x250 [btrfs]
[241534.760836] [<f8a467f0>] btrfs_write_dirty_block_groups+0x170/0x2a0 [btrfs]
[241534.760848] [<f8adb3c8>] commit_cowonly_roots+0x1e9/0x26a [btrfs]
[241534.760859] [<f8a5b6ba>] btrfs_commit_transaction+0x87a/0xe90 [btrfs]
[241534.760869] [<f8a5bd4d>] ? start_transaction+0x7d/0x5b0 [btrfs]
[241534.760878] [<f8a56865>] transaction_kthread+0x215/0x230 [btrfs]
[241534.760887] [<f8a56650>] ? btrfs_cleanup_transaction+0x490/0x490 [btrfs]
[241534.760890] [<c1083c3b>] kthread+0x9b/0xb0
[241534.760894] [<c1743581>] ret_from_kernel_thread+0x21/0x30
[241534.760897] [<c1083ba0>] ? kthread_create_on_node+0x110/0x110
[241534.760899] ---[ end trace dc3cf6814526c7ca ]---
[241534.760923] BTRFS: error (device sdc1) in btrfs_run_delayed_refs:2851: errno=-28 No space left
[241534.760927] BTRFS info (device sdc1): forced readonly
[241534.761198] BTRFS warning (device sdc1): Skipping commit of aborted transaction.
[241534.761201] BTRFS: error (device sdc1) in cleanup_transaction:1741: errno=-28 No space left
# lsusb
Bus 004 Device 007: ID 05e3:0608 Genesys Logic, Inc. Hub
Bus 004 Device 006: ID 046d:c31d Logitech, Inc. Media Keyboard K200
Bus 004 Device 005: ID 046d:c52b Logitech, Inc. Unifying Receiver
Bus 004 Device 004: ID 05e3:0608 Genesys Logic, Inc. Hub
Bus 004 Device 003: ID 067b:2773 Prolific Technology, Inc. PL2773 SATAII bridge controller
Bus 004 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 04fc:0c05 Sunplus Technology Co., Ltd
Bus 001 Device 002: ID 04e8:3252 Samsung Electronics Co., Ltd
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 003: ID 138a:0018 Validity Sensors, Inc. Fingerprint scanner
Bus 003 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
# lsusb -t
/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/6p, 480M
|__ Port 1: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 480M
|__ Port 2: Dev 4, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 1: Dev 5, If 0, Class=Human Interface Device, Driver=usbhid, 12M
|__ Port 1: Dev 5, If 1, Class=Human Interface Device, Driver=usbhid, 12M
|__ Port 1: Dev 5, If 2, Class=Human Interface Device, Driver=usbhid, 12M
|__ Port 2: Dev 6, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
|__ Port 2: Dev 6, If 1, Class=Human Interface Device, Driver=usbhid, 1.5M
|__ Port 3: Dev 7, If 0, Class=Hub, Driver=hub/4p, 480M
/: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/6p, 480M
|__ Port 1: Dev 3, If 0, Class=Vendor Specific Class, Driver=, 12M
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
|__ Port 1: Dev 2, If 0, Class=Printer, Driver=usblp, 480M
|__ Port 2: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 480M