I triggered the bug again, attaching log. There were some usb resets,
but they happened 23 minutes before the fs crashed.

At mount, the output of btrfs fi df -g was like this:

Data, single: total=2080.01GiB, used=2078.80GiB
System, DUP: total=0.01GiB, used=0.00GiB
System, single: total=0.00GiB, used=0.00GiB
Metadata, DUP: total=5.50GiB, used=3.73GiB
Metadata, single: total=0.01GiB, used=0.00GiB
GlobalReserve, single: total=0.50GiB, used=0.00GiB

Now it is:

Data, single: total=2094.01GiB, used=2092.26GiB
System, DUP: total=0.01GiB, used=0.00GiB
System, single: total=0.00GiB, used=0.00GiB
Metadata, DUP: total=5.50GiB, used=3.79GiB
Metadata, single: total=0.01GiB, used=0.00GiB
GlobalReserve, single: total=0.50GiB, used=0.00GiB

The file being copied at the time was 954 MB.

On Mon, Jan 11, 2016 at 3:10 PM, Austin S. Hemmelgarn <ahferroin7@xxxxxxxxx> wrote:
> On 2016-01-11 08:11, cheater00 . wrote:
>>
>> On Mon, Jan 11, 2016 at 2:05 PM, Austin S. Hemmelgarn
>> <ahferroin7@xxxxxxxxx> wrote:
>>>
>>> On 2016-01-09 16:07, cheater00 . wrote:
>>>>
>>>> Would like to point out that this can cause data loss. If I'm writing
>>>> to disk and the disk becomes unexpectedly read only - that data will
>>>> be lost, because who in their right mind makes their code expect this
>>>> and builds a contingency (e.g. caching, backpressure, etc)...
>>>
>>> If a data critical application (mail server, database server, anything
>>> similar) can't gracefully handle ENOSPC, then that application is broken,
>>> not the FS. As an example, set up a small VM with an SMTP server, then
>>> force the FS the server uses for queuing mail read-only, and see if you
>>> can submit mail, then go read the RFCs for SMTP and see what clients are
>>> supposed to do when they can't submit mail. A properly designed piece of
>>> software is supposed to be resilient against common failure modes of the
>>> resources it depends on (which includes ENOSPC and read-only filesystems
>>> for anything that works with data on disk).
>>>>
>>>> There's no loss of data on the disk because the data doesn't make it
>>>> to disk in the first place. But it's exactly the same as if the data
>>>> had been written to disk, and then lost.
>>>>
>>> No, it isn't. If you absolutely need the data on disk, you should be
>>> calling fsync or fdatasync, and then assuming if those return an error
>>> that none of the data written since the last call has gotten to the disk
>>> (some of it might have, but you need to assume it hasn't). Every piece of
>>> software in wide usage that requires data to be on the disk does this,
>>> because otherwise it can't guarantee that the data is on disk.
>>
>> I agree that a lot of stuff goes right in a perfect world. But most of
>> the time what you're running isn't a mail server used by billions of
>> users, but instead a bash script someone wrote once that's supposed to
>> do something, and no one knows how it works.
>>
> And that's why no sane person does stuff like that on enterprise level
> systems. And even then, if the person writing the bash script actually
> knows what they're doing, they will be using the 'sync' command to ensure
> data integrity when they actually need it, or they will write their script
> in such a way that it gracefully handles a partial run.

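
For reference, the fsync pattern described in the quoted reply could look roughly like the Python sketch below. This is only an illustration, not code from anyone in this thread: the names durable_write and try_durable_write are hypothetical, and treating ENOSPC and EROFS as "back off and retry later" is an assumption about what a caller would want.

```python
import errno
import os


def durable_write(path, data):
    """Write data to path and return only after fsync has succeeded.

    Raises OSError if the write or the fsync fails. In that case the
    caller has to assume that none of the data reached the disk.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to flush this file's data to the device.
        os.fsync(fd)
    finally:
        os.close(fd)


def try_durable_write(path, data):
    """Return True if data is on disk, False on ENOSPC or a read-only FS.

    Assumption for this sketch: the caller keeps the data around and
    retries later (or reports the failure) when False comes back.
    """
    try:
        durable_write(path, data)
    except OSError as exc:
        if exc.errno in (errno.ENOSPC, errno.EROFS):
            return False
        raise
    return True
```

A caller would check the return value of try_durable_write before discarding its own copy of the data, which is the contingency the thread is arguing about. Note that this only syncs the file itself; making a newly created directory entry durable would need an additional fsync on the containing directory.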
[241770.115897] BTRFS info (device sdc1): disk space caching is enabled
[242773.777365] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
[248064.722181] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
[249457.369166] ------------[ cut here ]------------
[249457.369215] WARNING: CPU: 4 PID: 7358 at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:6360 __btrfs_free_extent+0x354/0xe70 [btrfs]()
[249457.369220] BTRFS: Transaction aborted (error -28)
[249457.369224] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c nls_utf8 isofs pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) cuse ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables usblp snd_hda_codec_hdmi hp_wmi sparse_keymap snd_hda_codec_idt snd_hda_codec_generic intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp dm_multipath snd_hda_intel snd_hda_codec coretemp snd_hda_core kvm_intel radeon snd_hwdep hid_logitech_hidpp kvm snd_pcm i915 snd_seq_midi crc32_pclmul snd_seq_midi_event snd_rawmidi aesni_intel aes_i586 ttm xts snd_seq lrw drm_kms_helper snd_seq_device gf128mul snd_timer ablk_helper drm joydev cryptd bnep rfcomm snd input_leds i2c_algo_bit fb_sys_fops rtsx_pci_ms bluetooth serio_raw soundcore syscopyarea memstick sysfillrect sysimgblt hp_accel mei_me lis3lv02d lpc_ich wmi shpchp input_polldev mei video mac_hid nfsd auth_rpcgss nfs_acl parport_pc nfs ppdev lockd lp grace sunrpc parport fscache binfmt_misc hid_generic hid_logitech_dj usbhid hid btrfs xor uas usb_storage raid6_pq rtsx_pci_sdmmc ahci r8169 sdhci_pci psmouse libahci sdhci rtsx_pci mii fjes
[249457.369455] CPU: 4 PID: 7358 Comm: btrfs-transacti Tainted: G W OE 4.3.0-040300rc7-generic #201510260712
[249457.369460] Hardware name: Hewlett-Packard HP Pavilion dv6 Notebook PC/17FA, BIOS F.02 10/03/2011
[249457.369464] 00000000 00000000 d6d1bc40 c13610e8 d6d1bc80 d6d1bc70 c1068107 f8ae4190
[249457.369490] d6d1bc9c 00001cbe f8ae3ff0 000018d8 f8a3d8d4 f8a3d8d4 ea42f2a0 ffffffe4
[249457.369503] 00000000 d6d1bc88 c1068173 00000009 d6d1bc80 f8ae4190 d6d1bc9c d6d1bd4c
[249457.369516] Call Trace:
[249457.369530] [<c13610e8>] dump_stack+0x41/0x59
[249457.369542] [<c1068107>] warn_slowpath_common+0x87/0xc0
[249457.369574] [<f8a3d8d4>] ? __btrfs_free_extent+0x354/0xe70 [btrfs]
[249457.369610] [<f8a3d8d4>] ? __btrfs_free_extent+0x354/0xe70 [btrfs]
[249457.369620] [<c1068173>] warn_slowpath_fmt+0x33/0x40
[249457.369655] [<f8a3d8d4>] __btrfs_free_extent+0x354/0xe70 [btrfs]
[249457.369666] [<c10d6001>] ? ktime_get+0x41/0x120
[249457.369715] [<f8aad26b>] ? btrfs_delayed_ref_lock+0x2b/0x200 [btrfs]
[249457.369749] [<f8a42370>] __btrfs_run_delayed_refs+0x970/0x1110 [btrfs]
[249457.369763] [<c11674a1>] ? set_page_dirty+0x31/0x70
[249457.369814] [<f8a837cc>] ? set_extent_buffer_dirty+0x7c/0xd0 [btrfs]
[249457.369847] [<f8a458bd>] btrfs_run_delayed_refs+0x6d/0x250 [btrfs]
[249457.369879] [<f8a467f0>] btrfs_write_dirty_block_groups+0x170/0x2a0 [btrfs]
[249457.369926] [<f8adb3c8>] commit_cowonly_roots+0x1e9/0x26a [btrfs]
[249457.369974] [<f8a5b6ba>] btrfs_commit_transaction+0x87a/0xe90 [btrfs]
[249457.370012] [<f8a5bd4d>] ? start_transaction+0x7d/0x5b0 [btrfs]
[249457.370026] [<c10a5060>] ? wake_atomic_t_function+0x70/0x70
[249457.370066] [<f8a56865>] transaction_kthread+0x215/0x230 [btrfs]
[249457.370101] [<f8a56650>] ? btrfs_cleanup_transaction+0x490/0x490 [btrfs]
[249457.370113] [<c1083c3b>] kthread+0x9b/0xb0
[249457.370125] [<c1743581>] ret_from_kernel_thread+0x21/0x30
[249457.370136] [<c1083ba0>] ? kthread_create_on_node+0x110/0x110
[249457.370144] ---[ end trace dc3cf6814526c7cb ]---
[249457.370203] BTRFS: error (device sdc1) in __btrfs_free_extent:6360: errno=-28 No space left
[249457.370211] BTRFS info (device sdc1): forced readonly
[249457.370220] BTRFS: error (device sdc1) in btrfs_run_delayed_refs:2851: errno=-28 No space left
[249457.419978] BTRFS warning (device sdc1): Skipping commit of aborted transaction.
[249457.419984] BTRFS: error (device sdc1) in cleanup_transaction:1741: errno=-28 No space left
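
The "23 minutes" figure can be checked against the timestamps in the log. The Python sketch below is just a worked calculation: the three log lines are copied from the log above, while the helper names (parse_stamps, minutes_between) are made up for this example and are not part of any tool.

```python
import re

# Three lines copied from the kernel log above.
LOG = """\
[242773.777365] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
[248064.722181] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
[249457.370211] BTRFS info (device sdc1): forced readonly
"""

# dmesg prefixes each line with seconds since boot in square brackets.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_stamps(log):
    """Return a list of (seconds_since_boot, message) tuples."""
    entries = []
    for line in log.splitlines():
        match = STAMP.match(line)
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def minutes_between(log, earlier_text, later_text):
    """Minutes from the LAST line containing earlier_text to the FIRST
    line containing later_text."""
    entries = parse_stamps(log)
    earlier = [t for t, msg in entries if earlier_text in msg][-1]
    later = [t for t, msg in entries if later_text in msg][0]
    return (later - earlier) / 60.0


gap = minutes_between(LOG, "reset high-speed USB", "forced readonly")
print(f"{gap:.1f} minutes between last USB reset and forced readonly")
```

The last reset is at 248064.722181 and the filesystem goes read-only at 249457.370211, a gap of about 1392.6 seconds, or roughly 23.2 minutes.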
