Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem

I triggered the bug again; the log is attached below. There were some USB
resets, but the last one happened about 23 minutes before the fs crashed.

At mount, the output of btrfs fi df -g was like this:
Data, single: total=2080.01GiB, used=2078.80GiB
System, DUP: total=0.01GiB, used=0.00GiB
System, single: total=0.00GiB, used=0.00GiB
Metadata, DUP: total=5.50GiB, used=3.73GiB
Metadata, single: total=0.01GiB, used=0.00GiB
GlobalReserve, single: total=0.50GiB, used=0.00GiB

Now it is:
Data, single: total=2094.01GiB, used=2092.26GiB
System, DUP: total=0.01GiB, used=0.00GiB
System, single: total=0.00GiB, used=0.00GiB
Metadata, DUP: total=5.50GiB, used=3.79GiB
Metadata, single: total=0.01GiB, used=0.00GiB
GlobalReserve, single: total=0.50GiB, used=0.00GiB

The file being copied at the time was 954 MB.
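
For reference, the change between the two readings above can be worked out
mechanically. This is only a rough sketch: the Data and Metadata lines are
copied from the two outputs, and the helper name `parse_df` is made up for
illustration, not part of any btrfs tooling.

```python
import re

# Data and Metadata lines copied from the two `btrfs fi df -g` outputs above.
AT_MOUNT = """\
Data, single: total=2080.01GiB, used=2078.80GiB
Metadata, DUP: total=5.50GiB, used=3.73GiB
"""

NOW = """\
Data, single: total=2094.01GiB, used=2092.26GiB
Metadata, DUP: total=5.50GiB, used=3.79GiB
"""

LINE = re.compile(r"^(\w+), (\w+): total=([\d.]+)GiB, used=([\d.]+)GiB$")


def parse_df(text):
    """Map (type, profile) -> (total_gib, used_gib) for each matching line."""
    result = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            kind, profile, total, used = match.groups()
            result[(kind, profile)] = (float(total), float(used))
    return result


before, after = parse_df(AT_MOUNT), parse_df(NOW)
for key in before:
    delta_total = after[key][0] - before[key][0]
    delta_used = after[key][1] - before[key][1]
    unused = after[key][0] - after[key][1]
    print(f"{key[0]}, {key[1]}: total {delta_total:+.2f} GiB, "
          f"used {delta_used:+.2f} GiB, "
          f"unused in allocated chunks now {unused:.2f} GiB")
```

By these numbers, Data grew by 14.00 GiB allocated and 13.46 GiB used between
the two readings, while Metadata stayed at 5.50 GiB allocated with 1.71 GiB of
that still shown as unused.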



On Mon, Jan 11, 2016 at 3:10 PM, Austin S. Hemmelgarn
<ahferroin7@xxxxxxxxx> wrote:
> On 2016-01-11 08:11, cheater00 . wrote:
>>
>> On Mon, Jan 11, 2016 at 2:05 PM, Austin S. Hemmelgarn
>> <ahferroin7@xxxxxxxxx> wrote:
>>>
>>> On 2016-01-09 16:07, cheater00 . wrote:
>>>>
>>>>
>>>> Would like to point out that this can cause data loss. If I'm writing
>>>> to disk and the disk becomes unexpectedly read only - that data will
>>>> be lost, because who in their right mind makes their code expect this
>>>> and builds a contingency (e.g. caching, backpressure, etc)...
>>>
>>>
>>> If a data critical application (mail server, database server, anything
>>> similar) can't gracefully handle ENOSPC, then that application is broken,
>>> not the FS.  As an example, set up a small VM with an SMTP server, then
>>> force the FS the server uses for queuing mail read-only, and see if you
>>> can submit mail, then go read the RFCs for SMTP and see what clients are
>>> supposed to do when they can't submit mail.  A properly designed piece of
>>> software is supposed to be resilient against common failure modes of the
>>> resources it depends on (which includes ENOSPC and read-only filesystems
>>> for anything that works with data on disk).
>>>>
>>>>
>>>>
>>>> There's no loss of data on the disk because the data doesn't make it
>>>> to disk in the first place. But it's exactly the same as if the data
>>>> had been written to disk, and then lost.
>>>>
>>> No, it isn't.  If you absolutely need the data on disk, you should be
>>> calling fsync or fdatasync, and then assuming if those return an error
>>> that none of the data written since the last call has gotten to the disk
>>> (some of it might have, but you need to assume it hasn't).  Every piece of
>>> software in wide usage that requires data to be on the disk does this,
>>> because otherwise it can't guarantee that the data is on disk.
>>
>>
>> I agree that a lot of stuff goes right in a perfect world. But most of
>> the time what you're running isn't a mail server used by billions of
>> users, but instead a bash script someone wrote once that's supposed to
>> do something, and no one knows how it works.
>>
> And that's why no sane person does stuff like that on enterprise level
> systems.  And even then, if the person writing the bash script actually
> knows what they're doing, they will be using the 'sync' command to ensure
> data integrity when they actually need it, or they will write their script
> in such a way that it gracefully handles a partial run.
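
For concreteness, the fsync-and-check pattern described in the quoted text,
combined with a write that tolerates a partial run, could look roughly like
the sketch below. It assumes a POSIX system; the function name
`durable_write`, the `.tmp` suffix and the set of errno values treated as
"not written" are illustrative choices, not taken from any particular program.

```python
import errno
import os

# Errors treated here as "the data did not make it to disk" (an assumption;
# a real program may want a different set).
WRITE_FAILURES = (errno.ENOSPC, errno.EROFS, errno.EIO)


def durable_write(path, data):
    """Return True only if data was written, fsynced and renamed into place.

    The data goes to a temporary file in the same directory first, so a
    failed run leaves any existing file at `path` untouched.
    """
    tmp = path + ".tmp"  # hypothetical naming scheme for the temporary file
    try:
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            view = memoryview(data)
            while view:
                # os.write may write fewer bytes than asked; loop until done.
                view = view[os.write(fd, view):]
            # Only a successful fsync means the bytes reached stable storage.
            os.fsync(fd)
        finally:
            os.close(fd)
        os.replace(tmp, path)
        # fsync the directory so the rename itself is durable.
        dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
        return True
    except OSError as exc:
        if exc.errno not in WRITE_FAILURES:
            raise
        # Assume nothing written since the last successful fsync is on disk.
        try:
            os.unlink(tmp)
        except OSError:
            pass  # e.g. the filesystem is read-only, or tmp is already gone
        return False
```

A caller that gets False back still holds the data in memory and can retry,
queue it elsewhere or report the failure, which is the behaviour being argued
for above when the filesystem fills up or goes read-only.
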
[241770.115897] BTRFS info (device sdc1): disk space caching is enabled
[242773.777365] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
[248064.722181] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
[249457.369166] ------------[ cut here ]------------
[249457.369215] WARNING: CPU: 4 PID: 7358 at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:6360 __btrfs_free_extent+0x354/0xe70 [btrfs]()
[249457.369220] BTRFS: Transaction aborted (error -28)
[249457.369224] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c nls_utf8 isofs pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) cuse ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables usblp snd_hda_codec_hdmi hp_wmi sparse_keymap snd_hda_codec_idt snd_hda_codec_generic intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp dm_multipath snd_hda_intel snd_hda_codec coretemp snd_hda_core kvm_intel radeon snd_hwdep hid_logitech_hidpp kvm snd_pcm i915 snd_seq_midi crc32_pclmul snd_seq_midi_event snd_rawmidi aesni_intel aes_i586 ttm xts snd_seq lrw drm_kms_helper snd_seq_device gf128mul snd_timer ablk_helper drm joydev cryptd bnep rfcomm snd input_leds i2c_algo_bit fb_sys_fops rtsx_pci_ms bluetooth serio_raw soundcore syscopyarea memstick sysfillrect sysimgblt hp_accel mei_me lis3lv02d lpc_ich wmi shpchp input_polldev mei video mac_hid nfsd auth_rpcgss nfs_acl parport_pc nfs ppdev lockd lp grace sunrpc parport fscache binfmt_misc hid_generic hid_logitech_dj usbhid hid btrfs xor uas usb_storage raid6_pq rtsx_pci_sdmmc ahci r8169 sdhci_pci psmouse libahci sdhci rtsx_pci mii fjes
[249457.369455] CPU: 4 PID: 7358 Comm: btrfs-transacti Tainted: G        W  OE   4.3.0-040300rc7-generic #201510260712
[249457.369460] Hardware name: Hewlett-Packard HP Pavilion dv6 Notebook PC/17FA, BIOS F.02 10/03/2011
[249457.369464]  00000000 00000000 d6d1bc40 c13610e8 d6d1bc80 d6d1bc70 c1068107 f8ae4190
[249457.369490]  d6d1bc9c 00001cbe f8ae3ff0 000018d8 f8a3d8d4 f8a3d8d4 ea42f2a0 ffffffe4
[249457.369503]  00000000 d6d1bc88 c1068173 00000009 d6d1bc80 f8ae4190 d6d1bc9c d6d1bd4c
[249457.369516] Call Trace:
[249457.369530]  [<c13610e8>] dump_stack+0x41/0x59
[249457.369542]  [<c1068107>] warn_slowpath_common+0x87/0xc0
[249457.369574]  [<f8a3d8d4>] ? __btrfs_free_extent+0x354/0xe70 [btrfs]
[249457.369610]  [<f8a3d8d4>] ? __btrfs_free_extent+0x354/0xe70 [btrfs]
[249457.369620]  [<c1068173>] warn_slowpath_fmt+0x33/0x40
[249457.369655]  [<f8a3d8d4>] __btrfs_free_extent+0x354/0xe70 [btrfs]
[249457.369666]  [<c10d6001>] ? ktime_get+0x41/0x120
[249457.369715]  [<f8aad26b>] ? btrfs_delayed_ref_lock+0x2b/0x200 [btrfs]
[249457.369749]  [<f8a42370>] __btrfs_run_delayed_refs+0x970/0x1110 [btrfs]
[249457.369763]  [<c11674a1>] ? set_page_dirty+0x31/0x70
[249457.369814]  [<f8a837cc>] ? set_extent_buffer_dirty+0x7c/0xd0 [btrfs]
[249457.369847]  [<f8a458bd>] btrfs_run_delayed_refs+0x6d/0x250 [btrfs]
[249457.369879]  [<f8a467f0>] btrfs_write_dirty_block_groups+0x170/0x2a0 [btrfs]
[249457.369926]  [<f8adb3c8>] commit_cowonly_roots+0x1e9/0x26a [btrfs]
[249457.369974]  [<f8a5b6ba>] btrfs_commit_transaction+0x87a/0xe90 [btrfs]
[249457.370012]  [<f8a5bd4d>] ? start_transaction+0x7d/0x5b0 [btrfs]
[249457.370026]  [<c10a5060>] ? wake_atomic_t_function+0x70/0x70
[249457.370066]  [<f8a56865>] transaction_kthread+0x215/0x230 [btrfs]
[249457.370101]  [<f8a56650>] ? btrfs_cleanup_transaction+0x490/0x490 [btrfs]
[249457.370113]  [<c1083c3b>] kthread+0x9b/0xb0
[249457.370125]  [<c1743581>] ret_from_kernel_thread+0x21/0x30
[249457.370136]  [<c1083ba0>] ? kthread_create_on_node+0x110/0x110
[249457.370144] ---[ end trace dc3cf6814526c7cb ]---
[249457.370203] BTRFS: error (device sdc1) in __btrfs_free_extent:6360: errno=-28 No space left
[249457.370211] BTRFS info (device sdc1): forced readonly
[249457.370220] BTRFS: error (device sdc1) in btrfs_run_delayed_refs:2851: errno=-28 No space left
[249457.419978] BTRFS warning (device sdc1): Skipping commit of aborted transaction.
[249457.419984] BTRFS: error (device sdc1) in cleanup_transaction:1741: errno=-28 No space left
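
As a sanity check on the 23 minute figure: the bracketed values in the log are
seconds since boot, so the gap between the last USB reset and the first
warning is a plain subtraction. Both timestamps are copied from the log above;
the variable names are only for illustration.

```python
# Timestamps (seconds since boot) copied from the log above.
last_usb_reset = 248064.722181  # second "reset high-speed USB device" line
first_warning = 249457.369166   # "cut here" line before the WARNING

gap_seconds = first_warning - last_usb_reset
print(f"{gap_seconds:.1f} s = {gap_seconds / 60:.1f} min")
```

This prints `1392.6 s = 23.2 min`.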

