Rough (re)start with btrfs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

as discussed in the other thread, I am migrating to BTRFS (again).
Unfortunately, I had a bit of a rough start
[Mo Apr 29 20:44:47 2019] INFO: task btrfs-transacti:10227 blocked for more than 120 seconds.
[...]
This repeated several times while moving data over. Full log of one instance below.

I am using btrfs-progs v4.20.2 and debian stretch with 4.19.0-0.bpo.2-amd64 (I think, this is the latest Kernel available in stretch. Please correct if I am wrong.

I understand that this was a 'timeout'. Is this caused by hardware?

In the SMART logs indeed, I see that errors happened (Full log below)
84 51 20 df 27 81 40 Error: ICRC, ABRT 32 sectors at LBA = 0x008127df = 8464351 But this happened after 44h. I am now at 82h, so that was long before the error -infact during my 'burn-in test'.

Is this a case for an RMA of the drive?

Greetings,
Hendrik












[Mo Apr 29 20:44:47 2019] INFO: task btrfs-transacti:10227 blocked for more than 120 seconds. [Mo Apr 29 20:44:47 2019] Not tainted 4.19.0-0.bpo.2-amd64 #1 Debian 4.19.16-1~bpo9+1 [Mo Apr 29 20:44:47 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mo Apr 29 20:44:47 2019] btrfs-transacti D    0 10227      2 0x80000000
[Mo Apr 29 20:44:47 2019] Call Trace:
[Mo Apr 29 20:44:47 2019]  ? __schedule+0x3f5/0x880
[Mo Apr 29 20:44:47 2019]  schedule+0x32/0x80
[Mo Apr 29 20:44:47 2019]  wait_for_commit+0x41/0x80 [btrfs]
[Mo Apr 29 20:44:47 2019]  ? remove_wait_queue+0x60/0x60
[Mo Apr 29 20:44:47 2019]  btrfs_commit_transaction+0x10b/0x8a0 [btrfs]
[Mo Apr 29 20:44:47 2019]  ? start_transaction+0x8f/0x3e0 [btrfs]
[Mo Apr 29 20:44:47 2019]  transaction_kthread+0x157/0x180 [btrfs]
[Mo Apr 29 20:44:47 2019]  kthread+0xf8/0x130
[Mo Apr 29 20:44:47 2019] ? btrfs_cleanup_transaction+0x500/0x500 [btrfs]
[Mo Apr 29 20:44:47 2019]  ? kthread_create_worker_on_cpu+0x70/0x70
[Mo Apr 29 20:44:47 2019]  ret_from_fork+0x35/0x40
[Mo Apr 29 20:44:47 2019] INFO: task kworker/u4:8:10576 blocked for more than 120 seconds. [Mo Apr 29 20:44:47 2019] Not tainted 4.19.0-0.bpo.2-amd64 #1 Debian 4.19.16-1~bpo9+1 [Mo Apr 29 20:44:47 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mo Apr 29 20:44:47 2019] kworker/u4:8    D    0 10576      2 0x80000000
[Mo Apr 29 20:44:47 2019] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
[Mo Apr 29 20:44:47 2019] Call Trace:
[Mo Apr 29 20:44:47 2019]  ? __schedule+0x3f5/0x880
[Mo Apr 29 20:44:47 2019]  schedule+0x32/0x80
[Mo Apr 29 20:44:47 2019]  wait_current_trans+0xb9/0xf0 [btrfs]
[Mo Apr 29 20:44:47 2019]  ? remove_wait_queue+0x60/0x60
[Mo Apr 29 20:44:47 2019]  start_transaction+0x201/0x3e0 [btrfs]
[Mo Apr 29 20:44:47 2019]  flush_space+0x14d/0x5e0 [btrfs]
[Mo Apr 29 20:44:47 2019]  ? __switch_to_asm+0x40/0x70
[Mo Apr 29 20:44:47 2019]  ? __switch_to_asm+0x34/0x70
[Mo Apr 29 20:44:47 2019]  ? __switch_to_asm+0x40/0x70
[Mo Apr 29 20:44:47 2019]  ? __switch_to_asm+0x34/0x70
[Mo Apr 29 20:44:47 2019]  ? __switch_to_asm+0x34/0x70
[Mo Apr 29 20:44:47 2019]  ? __switch_to_asm+0x34/0x70
[Mo Apr 29 20:44:47 2019]  ? __switch_to_asm+0x40/0x70
[Mo Apr 29 20:44:47 2019]  ? __switch_to_asm+0x34/0x70
[Mo Apr 29 20:44:47 2019]  ? __switch_to_asm+0x40/0x70
[Mo Apr 29 20:44:47 2019]  ? get_alloc_profile+0x72/0x180 [btrfs]
[Mo Apr 29 20:44:47 2019]  ? can_overcommit.part.63+0x55/0xe0 [btrfs]
[Mo Apr 29 20:44:47 2019] btrfs_async_reclaim_metadata_space+0xfb/0x4c0 [btrfs]
[Mo Apr 29 20:44:47 2019]  process_one_work+0x191/0x370
[Mo Apr 29 20:44:47 2019]  worker_thread+0x4f/0x3b0
[Mo Apr 29 20:44:47 2019]  kthread+0xf8/0x130
[Mo Apr 29 20:44:47 2019]  ? rescuer_thread+0x340/0x340
[Mo Apr 29 20:44:47 2019]  ? kthread_create_worker_on_cpu+0x70/0x70
[Mo Apr 29 20:44:47 2019]  ret_from_fork+0x35/0x40





SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0 2 Throughput_Performance 0x0004 128 128 054 Old_age Offline - 116 3 Spin_Up_Time 0x0007 198 198 024 Pre-fail Always - 349 (Average 317) 4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 26 5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0 7 Seek_Error_Rate 0x000a 100 100 067 Old_age Always - 0 8 Seek_Time_Performance 0x0004 128 128 020 Old_age Offline - 18 9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 82 10 Spin_Retry_Count 0x0012 100 100 060 Old_age Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 26 22 Helium_Level 0x0023 100 100 025 Pre-fail Always - 100 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 27 193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 27 194 Temperature_Celsius 0x0002 250 250 000 Old_age Always - 26 (Min/Max 24/55) 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0 197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 2

SMART Error Log Version: 1
ATA Error Count: 2
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 44 hours (1 days + 20 hours)
When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
84 51 20 df 27 81 40 Error: ICRC, ABRT 32 sectors at LBA = 0x008127df = 8464351

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 03 20 df 27 81 40 00      00:00:54.041  WRITE DMA EXT
  35 03 01 ff 27 81 40 00      00:00:54.040  WRITE DMA EXT
  25 03 20 02 00 00 40 00      00:00:54.040  READ DMA EXT
  25 03 20 02 00 00 40 00      00:00:54.040  READ DMA EXT
  25 03 01 af 2a 81 40 00      00:00:54.011  READ DMA EXT

Error 1 occurred at disk power-on lifetime: 44 hours (1 days + 20 hours)
When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  47 00 01 00 00 00 a0 08      00:00:37.434  READ LOG DMA EXT
  47 00 01 13 00 00 a0 08      00:00:37.433  READ LOG DMA EXT
  47 00 01 00 00 00 a0 08      00:00:37.432  READ LOG DMA EXT
ef 10 02 00 00 00 a0 08 00:00:37.430 SET FEATURES [Enable SATA feature] 27 00 00 00 00 00 e0 08 00:00:37.427 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux