Hello,
Short version: while running a scrub on a 5-disk btrfs filesystem, /dev/sdd
"failed", and there was also an error on another disk (/dev/sdh).
Because the filesystem still mounts, I assume I should run "btrfs device
delete /dev/sdd /mntpoint" and then restore the damaged files from backup.
Are all affected files listed in the journal? There are messages about "x
callbacks suppressed", so I'm not sure. If they aren't all listed, how do
I get a full list of damaged files?
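To show what I mean by "listed in the journal", here is a minimal sketch that collects the (root, inode) pairs and byte ranges from the "BTRFS: i/o error at logical ..." lines, assuming they always have the format seen in the journal excerpt in this mail. The function name is made up for illustration, and because of the suppressed callbacks the result can only be a lower bound, not the full list.

```python
import re
from collections import defaultdict

# Matches scrub messages of the form seen in the journal:
# "BTRFS: i/o error at logical N on dev D, sector N, root N, inode N, offset N, length N, ..."
IO_ERROR = re.compile(
    r"BTRFS: i/o error at logical (\d+) on dev (\S+), "
    r"sector (\d+), root (\d+), inode (\d+), offset (\d+), length (\d+)"
)

def damaged_ranges(journal_text):
    """Group the reported (offset, length) ranges by (root, inode)."""
    # Long lines get wrapped in mail, so collapse all whitespace first.
    flat = " ".join(journal_text.split())
    found = defaultdict(set)
    for m in IO_ERROR.finditer(flat):
        _logical, _dev, _sector, root, inode, offset, length = m.groups()
        found[(int(root), int(inode))].add((int(offset), int(length)))
    return {key: sorted(ranges) for key, ranges in found.items()}

sample = """
kernel: BTRFS: i/o error at logical 7386235424768 on dev /dev/sdd,
sector 2891849768, root 3034, inode 5633529, offset 11878400, length
4096, links 1 (path: [...])
kernel: BTRFS: i/o error at logical 7386235039744 on dev /dev/sdd,
sector 2891849016, root 3034, inode 5633529, offset 11493376, length
4096, links 1 (path: [...])
"""

for (root, inode), ranges in damaged_ranges(sample).items():
    print(root, inode, ranges)
```

Fed with the whole journal instead of the two sample messages, this would give one entry per affected inode that actually made it into the log.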
Also, are there any tools to recover partial file fragments and
reconstruct a file (with the missing fragments filled with nulls)?
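What I have in mind is roughly the following sketch: copy the file block by block and write zeros wherever a read fails with an I/O error, which is essentially what "dd conv=noerror,sync" does. The function names and the 4096-byte block size are my own assumptions, and I have not checked how btrfs reports checksum failures to a reader, so treat it as an illustration only.

```python
import io
import os

def salvage(read_at, size, out, block_size=4096):
    """Copy `size` bytes via read_at(offset, length); zero-fill failed blocks.

    Returns the byte offsets of the blocks that could not be read.
    """
    bad_offsets = []
    for offset in range(0, size, block_size):
        want = min(block_size, size - offset)
        try:
            data = read_at(offset, want)
        except OSError:
            data = b""
            bad_offsets.append(offset)
        # Pad failed or short reads so later data keeps its original offset.
        out.write(data.ljust(want, b"\0"))
    return bad_offsets

def salvage_file(src_path, dst_path, block_size=4096):
    """Same as salvage(), reading from a real file with os.pread."""
    size = os.path.getsize(src_path)
    fd = os.open(src_path, os.O_RDONLY)
    try:
        with open(dst_path, "wb") as out:
            return salvage(lambda off, n: os.pread(fd, n, off), size, out, block_size)
    finally:
        os.close(fd)

# Demonstration with a fake reader whose second block is unreadable.
def flaky_reader(offset, length):
    if offset == 4096:
        raise OSError(5, "Input/output error")
    return b"x" * length

buf = io.BytesIO()
print(salvage(flaky_reader, 10000, buf))
```

The returned offsets tell which parts of the copy are nulls rather than real data.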
I assume there's no point in running "btrfs check
--check-data-csum", because scrub already checks that?
From the journal:
kernel: drivers/scsi/mvsas/mv_sas.c 1863:Release slot [1] tag[1], task
[ffff88007efb8800]:
kernel: drivers/scsi/mvsas/mv_94xx.c 625:command active 00000002, slot [1].
kernel: sas: sas_ata_task_done: SAS error 8a
kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
kernel: sas: ata9: end_device-7:2: cmd error handler
kernel: sas: ata7: end_device-7:0: dev error handler
kernel: sas: ata14: end_device-7:7: dev error handler
kernel: ata9.00: exception Emask 0x0 SAct 0x800 SErr 0x0 action 0x0
kernel: ata9.00: failed command: READ FPDMA QUEUED
kernel: ata9.00: cmd 60/00:00:00:3d:a1/04:00:ab:00:00/40 tag 11 ncq 524288 in
res
41/40:00:48:40:a1/00:04:ab:00:00/00 Emask 0x409 (media error) <F>
kernel: ata9.00: status: { DRDY ERR }
kernel: ata9.00: error: { UNC }
kernel: ata9.00: configured for UDMA/133
kernel: sd 7:0:2:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00
driverbyte=0x08
kernel: sd 7:0:2:0: [sdd] tag#0 Sense Key : 0x3 [current] [descriptor]
kernel: sd 7:0:2:0: [sdd] tag#0 ASC=0x11 ASCQ=0x4
kernel: sd 7:0:2:0: [sdd] tag#0 CDB: opcode=0x28 28 00 ab a1 3d 00 00 04 00 00
kernel: blk_update_request: I/O error, dev sdd, sector 2879471688
kernel: ata9: EH complete
kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
kernel: drivers/scsi/mvsas/mv_sas.c 1863:Release slot [1] tag[1], task
[ffff88007efb9a00]:
kernel: drivers/scsi/mvsas/mv_94xx.c 625:command active 00000003, slot [1].
kernel: sas: sas_ata_task_done: SAS error 8a
kernel: sas: Enter sas_scsi_recover_host busy: 2 failed: 2
kernel: sas: trying to find task 0xffff8801e0cadb00
kernel: sas: sas_scsi_find_task: aborting task 0xffff8801e0cadb00
kernel: sas: sas_scsi_find_task: task 0xffff8801e0cadb00 is aborted
kernel: sas: sas_eh_handle_sas_errors: task 0xffff8801e0cadb00 is aborted
kernel: sas: ata9: end_device-7:2: cmd error handler
kernel: sas: ata8: end_device-7:1: cmd error handler
kernel: sas: ata7: end_device-7:0: dev error handler
kernel: sas: ata8: end_device-7:1: dev error handler
kernel: ata8.00: exception Emask 0x0 SAct 0x40000 SErr 0x0 action 0x6 frozen
kernel: ata8.00: failed command: READ FPDMA QUEUED
kernel: ata8.00: cmd 60/00:00:00:1b:36/04:00:bf:00:00/40 tag 18 ncq 524288 in
res
40/00:08:00:58:11/00:00:a6:00:00/40 Emask 0x4 (timeout)
kernel: ata8.00: status: { DRDY }
kernel: ata8: hard resetting link
kernel: sas: ata9: end_device-7:2: dev error handler
kernel: sas: ata14: end_device-7:7: dev error handler
kernel: ata9: log page 10h reported inactive tag 26
kernel: ata9.00: exception Emask 0x1 SAct 0x400000 SErr 0x0 action 0x6
kernel: ata9.00: failed command: READ FPDMA QUEUED
kernel: ata9.00: cmd 60/08:00:48:40:a1/00:00:ab:00:00/40 tag 22 ncq 4096 in
res
01/04:a8:40:40:a1/00:00:ab:00:00/40 Emask 0x3 (HSM violation)
kernel: ata9.00: status: { ERR }
kernel: ata9.00: error: { ABRT }
kernel: ata9: hard resetting link
kernel: sas: sas_form_port: phy1 belongs to port1 already(1)!
kernel: ata9.00: both IDENTIFYs aborted, assuming NODEV
kernel: ata9.00: revalidation failed (errno=-2)
kernel: drivers/scsi/mvsas/mv_sas.c 1428:mvs_I_T_nexus_reset for device[1]:rc= 0
kernel: ata8.00: configured for UDMA/133
kernel: ata8.00: device reported invalid CHS sector 0
kernel: ata8: EH complete
kernel: ata9: hard resetting link
kernel: ata9.00: both IDENTIFYs aborted, assuming NODEV
kernel: ata9.00: revalidation failed (errno=-2)
kernel: ata9: hard resetting link
kernel: ata9.00: both IDENTIFYs aborted, assuming NODEV
kernel: ata9.00: revalidation failed (errno=-2)
kernel: ata9.00: disabled
kernel: ata9: EH complete
kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
kernel: sd 7:0:2:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04
driverbyte=0x00
kernel: sd 7:0:2:0: [sdd] tag#0 CDB: opcode=0x28 28 00 ab a1 40 48 00 00 08 00
kernel: blk_update_request: I/O error, dev sdd, sector 2879471688
kernel: sd 7:0:2:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04
driverbyte=0x00
kernel: sd 7:0:2:0: [sdd] tag#0 CDB: opcode=0x28 28 00 ab a1 45 00 00 06 00 00
kernel: BTRFS: unable to fixup (regular) error at logical
7390602616832 on dev /dev/sdd
kernel: BTRFS: unable to fixup (regular) error at logical
7390602891264 on dev /dev/sdd
kernel: scsi_io_completion: 186117 callbacks suppressed
kernel: sd 7:0:2:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04
driverbyte=0x00
kernel: sd 7:0:2:0: [sdd] tag#0 CDB: opcode=0x2a 2a 00 00 14 78 c0 00 00 20 00
kernel: blk_update_request: 186156 callbacks suppressed
kernel: blk_update_request: I/O error, dev sdd, sector 1341632
kernel: sd 7:0:2:0: [sdd] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x04
driverbyte=0x00
kernel: sd 7:0:2:0: [sdd] tag#1 CDB: opcode=0x2a 2a 00 00 14 7a 80 00 00 20 00
kernel: blk_update_request: I/O error, dev sdd, sector 2879472896
kernel: BTRFS: i/o error at logical 7386235424768 on dev /dev/sdd,
sector 2891849768, root 3034, inode 5633529, offset 11878400, length
4096, links 1 (path: [...])
kernel: BTRFS: i/o error at logical 7386235039744 on dev /dev/sdd,
sector 2891849016, root 3034, inode 5633529, offset 11493376, length
4096, links 1 (path: [...])
kernel: btrfs_dev_stat_print_on_error: 78908 callbacks suppressed
kernel: BTRFS: bdev /dev/sdd errs: wr 347, rd 1644871, flush 0, corrupt 0, gen 0
kernel: BTRFS: bdev /dev/sdd errs: wr 356, rd 1644871, flush 0, corrupt 0, gen 0
kernel: BTRFS: error (device sdh) in write_all_supers:3454: errno=-5
IO failure (errors while submitting device barriers.)
kernel: BTRFS info (device sdh): forced readonly
kernel: BTRFS warning (device sdh): Skipping commit of aborted transaction.
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 5 PID: 3756 at fs/btrfs/super.c:260
__btrfs_abort_transaction+0x54/0x130 [btrfs]()
kernel: BTRFS: Transaction aborted (error -5)
kernel: Modules linked in: nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_tcpudp ip6t_rpfilter ip6t_REJECT [...]
kernel: nvidia(PO) tda8290 tuner aes_x86_64 lrw saa7134
snd_hda_codec_realtek gf128mul edac_core glue_helper [...]
kernel:
kernel: CPU: 5 PID: 3756 Comm: btrfs-transacti Tainted: P O
4.0.7-2-ARCH #1
kernel: Hardware name: Gigabyte Technology Co., Ltd.
GA-990FXA-UD3/GA-990FXA-UD3, BIOS FFe 11/08/2013
kernel: 0000000000000000 000000005f5d9ca7 ffff88006090fc18 ffffffff81574ec3
kernel: 0000000000000000 ffff88006090fc70 ffff88006090fc58 ffffffff81074e7a
kernel: 0000000000000000 ffff8800ce8e6c60 00000000fffffffb ffff8800bbaa4800
kernel: Call Trace:
kernel: [<ffffffff81574ec3>] dump_stack+0x4c/0x6e
kernel: [<ffffffff81074e7a>] warn_slowpath_common+0x8a/0xc0
kernel: [<ffffffff81074f05>] warn_slowpath_fmt+0x55/0x70
kernel: [<ffffffffa0253bb4>] __btrfs_abort_transaction+0x54/0x130 [btrfs]
kernel: [<ffffffffa0282ceb>] cleanup_transaction+0x7b/0x300 [btrfs]
kernel: [<ffffffff810b6ce0>] ? wake_atomic_t_function+0x60/0x60
kernel: [<ffffffffa0284162>] btrfs_commit_transaction+0x932/0xc10 [btrfs]
kernel: [<ffffffffa027f3a5>] transaction_kthread+0x1d5/0x240 [btrfs]
kernel: [<ffffffffa027f1d0>] ? btrfs_cleanup_transaction+0x5a0/0x5a0 [btrfs]
kernel: [<ffffffff810934b8>] kthread+0xd8/0xf0
kernel: [<ffffffff810933e0>] ? kthread_worker_fn+0x170/0x170
kernel: [<ffffffff8157a718>] ret_from_fork+0x58/0x90
kernel: [<ffffffff810933e0>] ? kthread_worker_fn+0x170/0x170
kernel: ---[ end trace 8ecc49ef203bd88c ]---
kernel: BTRFS: error (device sdh) in cleanup_transaction:1686:
errno=-5 IO failure
kernel: BTRFS info (device sdh): delayed_refs has NO entry
kernel: scrub_handle_errored_block: 92600 callbacks suppressed
kernel: BTRFS: i/o error at logical 7390928568320 on dev /dev/sdd,
sector 2892627456, root 3034, inode 5637106, offset 614400, length
4096, links 1 (path: [...])
kernel: BTRFS: i/o error at logical 7390928175104 on dev /dev/sdd,
sector 2892626688, root 3034, inode 5637106, offset 483328, length
4096, links 1 (path: [...])
kernel: scrub_handle_errored_block: 77404 callbacks suppressed
kernel: BTRFS: unable to fixup (regular) error at logical
7390928568320 on dev /dev/sdd
kernel: BTRFS: unable to fixup (regular) error at logical
7390928175104 on dev /dev/sdd
smartd[723]: Device: /dev/sdd [SAT], not capable of SMART self-check
smartd[723]: Device: /dev/sdd [SAT], failed to read SMART Attribute Data
smartd[723]: Device: /dev/sdd [SAT], Read SMART Self Test Log Failed
smartd[723]: Device: /dev/sdd [SAT], Read Summary SMART Error Log failed
kernel: scsi_io_completion: 8110 callbacks suppressed
kernel: sd 7:0:2:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04
driverbyte=0x00
kernel: sd 7:0:2:0: [sdd] tag#0 CDB: opcode=0x28 28 00 e8 e0 88 00 00 00 08 00
kernel: blk_update_request: 8115 callbacks suppressed
kernel: blk_update_request: I/O error, dev sdd, sector 3907028992
kernel: sd 7:0:2:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04
driverbyte=0x00
kernel: sd 7:0:2:0: [sdd] tag#0 CDB: opcode=0x28 28 00 e8 e0 88 00 00 00 08 00
kernel: blk_update_request: I/O error, dev sdd, sector 3907028992
kernel: Buffer I/O error on dev sdd, logical block 488378624, async page read
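To cross-check which sectors the "CDB:" lines above refer to, here is a small sketch that decodes them, assuming the standard 10-byte READ(10)/WRITE(10) layout (opcode 0x28 or 0x2a, big-endian LBA in bytes 2-5, transfer length in bytes 7-8). The helper name is made up.

```python
def decode_rw10(cdb_hex):
    """Decode a READ(10)/WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    op = "READ(10)" if cdb[0] == 0x28 else "WRITE(10)"
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: starting logical block
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: number of blocks
    return op, lba, count

for line in (
    "28 00 ab a1 3d 00 00 04 00 00",
    "28 00 ab a1 40 48 00 00 08 00",
    "2a 00 00 14 78 c0 00 00 20 00",
):
    op, lba, count = decode_rw10(line)
    print(op, "lba", lba, "blocks", count)
```

If I decode them right, the first read covers the failing sector 2879471688, the second one starts exactly at it, and the write starts at sector 1341632, which matches the blk_update_request lines.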
Long story:
I had a Seagate disk which died while still covered by warranty, so I
got a replacement. The disk they returned wasn't new but repaired.
I haven't used it much, but it seems it won't last long, as it has
developed uncorrectable sectors.
When I received it, I ran a full SMART test and checked all sectors;
everything passed and seemed fine. But after I copied my data to it
and used it for a while, I found:
smartd[592]: Device: /dev/sdd [SAT], 16 Currently unreadable (pending) sectors
smartd[592]: Device: /dev/sdd [SAT], 16 Offline uncorrectable sectors
Then I ran a scrub:
scrub status for 1ec5b839-acc6-4f70-be9d-6f9e6118c71c
scrub started at Sun Jul 12 13:36:11 2015 and was aborted after 02:43:21
total bytes scrubbed: 6.24TiB with 1648151 errors
error details: read=1648151
corrected errors: 704, uncorrectable errors: 1647447,
unverified errors: 0
It caused the drive to become unrecognizable by Linux, and it seems it
also caused an error on a different disk (/dev/sdh), which made the
filesystem go read-only; after that it would no longer mount:
kernel: sd 7:0:2:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04
driverbyte=0x00
kernel: sd 7:0:2:0: [sdd] tag#0 CDB: opcode=0x28 28 00 00 00 00 80 00 00 08 00
kernel: blk_update_request: I/O error, dev sdd, sector 128
kernel: BTRFS info (device sdh): enabling auto defrag
kernel: BTRFS info (device sdh): disk space caching is enabled
kernel: BTRFS: has skinny extents
kernel: BTRFS: failed to read chunk tree on sdh
mount[17625]: mount: wrong fs type, bad option, bad superblock on /dev/sdh,
mount[17625]: missing codepage or helper program, or other error
mount[17625]: In some cases useful info is found in syslog - try
mount[17625]: dmesg | tail or so.
kernel: BTRFS: open_ctree failed
kernel: sd 7:0:2:0: [sdd] Synchronizing SCSI cache
kernel: sd 7:0:2:0: [sdd] Synchronize Cache(10) failed: Result:
hostbyte=0x04 driverbyte=0x00
kernel: sd 7:0:2:0: [sdd] Stopping disk
kernel: sd 7:0:2:0: [sdd] Start/Stop Unit failed: Result:
hostbyte=0x04 driverbyte=0x00
I pulled out the /dev/sdd drive and plugged it back in:
kernel: mvsas 0000:07:00.0: Phy2 : No sig fis
kernel: sas: phy-7:2 added to port-7:2, phy_mask:0x4 ( 200000000000000)
kernel: sas: DOING DISCOVERY on port 2, pid:16744
kernel: sas: DONE DISCOVERY on port 2, pid:16744, result:0
kernel: sas: Enter sas_scsi_recover_host busy: 0 failed: 0
kernel: ata20.00: ATA-8: ST2000DM001-9YN164, CC9F, max UDMA/133
kernel: ata20.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
kernel: ata20.00: configured for UDMA/133
kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
kernel: scsi 7:0:8:0: Direct-Access ATA ST2000DM001-9YN1 CC9F
PQ: 0 ANSI: 5
kernel: sd 7:0:8:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
kernel: sd 7:0:8:0: [sdd] 4096-byte physical blocks
kernel: sd 7:0:8:0: [sdd] Write Protect is off
kernel: sd 7:0:8:0: [sdd] Mode Sense: 00 3a 00 00
kernel: sd 7:0:8:0: [sdd] Write cache: enabled, read cache: enabled,
doesn't support DPO or FUA
kernel: sd 7:0:8:0: [sdd] Attached SCSI disk
smartd[723]: Device: /dev/sdd [SAT], SMART Usage Attribute: 187
Reported_Uncorrect changed from 100 to 98
smartd[723]: Device: /dev/sdd [SAT], previous self-test completed with
error (read test element)
smartd[723]: Device: /dev/sdd [SAT], Self-Test Log error count
increased from 0 to 2
smartd[723]: Device: /dev/sdd [SAT], ATA error count increased from 0 to 2
Everything seems "ok" again. I ran a short SMART self-test, which
failed for the first time (but the disk's SMART status still says PASSED),
then resumed the scrub, and it completed:
scrub status for 1ec5b839-acc6-4f70-be9d-6f9e6118c71c
scrub device /dev/sdc (id 1) history
scrub resumed at Sun Jul 12 18:07:06 2015 and finished after 04:34:02
total bytes scrubbed: 2.35TiB with 0 errors
scrub device /dev/sdd (id 2) history
scrub resumed at Sun Jul 12 18:07:06 2015 and finished after 02:56:23
total bytes scrubbed: 1.44TiB with 1648151 errors
error details: read=1648151
corrected errors: 704, uncorrectable errors: 1647447,
unverified errors: 0
scrub device /dev/sde (id 3) history
scrub started at Sun Jul 12 13:36:11 2015 and finished after 02:35:46
total bytes scrubbed: 1.43TiB with 0 errors
scrub device /dev/sdg (id 4) history
scrub started at Sun Jul 12 13:36:11 2015 and finished after 02:40:01
total bytes scrubbed: 1.44TiB with 0 errors
scrub device /dev/sdh (id 5) history
scrub started at Sun Jul 12 13:36:11 2015 and finished after 01:14:34
total bytes scrubbed: 537.82GiB with 0 errors
"btrfs device stats" doesn't show any errors:
[/dev/sdc].write_io_errs 0
[/dev/sdc].read_io_errs 0
[/dev/sdc].flush_io_errs 0
[/dev/sdc].corruption_errs 0
[/dev/sdc].generation_errs 0
[/dev/sdd].write_io_errs 0
[/dev/sdd].read_io_errs 0
[/dev/sdd].flush_io_errs 0
[/dev/sdd].corruption_errs 0
[/dev/sdd].generation_errs 0
[/dev/sde].write_io_errs 0
[/dev/sde].read_io_errs 0
[/dev/sde].flush_io_errs 0
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0
[/dev/sdg].write_io_errs 0
[/dev/sdg].read_io_errs 0
[/dev/sdg].flush_io_errs 0
[/dev/sdg].corruption_errs 0
[/dev/sdg].generation_errs 0
[/dev/sdh].write_io_errs 0
[/dev/sdh].read_io_errs 0
[/dev/sdh].flush_io_errs 0
[/dev/sdh].corruption_errs 0
[/dev/sdh].generation_errs 0
The other disk, /dev/sdh, doesn't show any signs of having gone bad,
so most likely it was the controller's fault when sdd threw errors.
When scrub reports error counts, what exactly counts as an error: a
file fragment?
Also, is there an easy way to locate those unreadable sectors and
rewrite them so the HDD relocates them?
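For locating them, the arithmetic I would start from is sketched below: take the LBA of the first error from the SMART logs (0xaba14048), round it down to the 4096-byte physical sector, and convert it to a byte offset on the raw device. The 512/4096 sizes come from the SMART info further down; the function name is made up. My understanding, which may be wrong, is that the drive only gets a chance to remap a pending sector when the whole physical sector is overwritten, and overwriting obviously destroys whatever was stored there.

```python
LOGICAL = 512    # bytes per logical sector, per the SMART info
PHYSICAL = 4096  # bytes per physical sector, per the SMART info

def physical_span(lba):
    """Return (first LBA, last LBA, byte offset) of the physical sector holding lba."""
    per_physical = PHYSICAL // LOGICAL
    first = lba - lba % per_physical
    return first, first + per_physical - 1, first * LOGICAL

first, last, byte_offset = physical_span(0xABA14048)
print("logical sectors", first, "to", last, "start at byte", byte_offset)
```

With 16 pending sectors reported, that would be consistent with two bad physical sectors of 8 logical sectors each, but that is only a guess.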
Thanks :)
Here's the full SMART info for /dev/sdd:
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.14 (AF)
Device Model: ST2000DM001-9YN164
Serial Number: W2404VST
LU WWN Device Id: 5 000c50 044a7a68a
Firmware Version: CC9F
User Capacity: 2 000 398 934 016 bytes [2,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Jul 13 07:40:14 2015 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM level is: 128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: ( 592) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection
on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 254) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x3081) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-- 117 100 006 - 166724616
3 Spin_Up_Time PO---- 092 092 000 - 0
4 Start_Stop_Count -O--CK 100 100 020 - 626
5 Reallocated_Sector_Ct PO--CK 100 100 036 - 0
7 Seek_Error_Rate POSR-- 060 060 030 - 1306645
9 Power_On_Hours -O--CK 097 097 000 - 3154
10 Spin_Retry_Count PO--C- 100 100 097 - 0
12 Power_Cycle_Count -O--CK 100 100 020 - 433
183 Runtime_Bad_Block -O--CK 100 100 000 - 0
184 End-to-End_Error -O--CK 100 100 099 - 0
187 Reported_Uncorrect -O--CK 098 098 000 - 2
188 Command_Timeout -O--CK 100 099 000 - 4 4 4
189 High_Fly_Writes -O-RCK 100 100 000 - 0
190 Airflow_Temperature_Cel -O---K 070 058 045 - 30 (0 1 34 29 0)
191 G-Sense_Error_Rate -O--CK 100 100 000 - 0
192 Power-Off_Retract_Count -O--CK 100 100 000 - 335
193 Load_Cycle_Count -O--CK 096 096 000 - 9566
194 Temperature_Celsius -O---K 030 042 000 - 30 (128 0 0 0 0)
197 Current_Pending_Sector -O--C- 100 100 000 - 16
198 Offline_Uncorrectable ----C- 100 100 000 - 16
199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0
240 Head_Flying_Hours ------ 100 253 000 - 367h+26m+14.504s
241 Total_LBAs_Written ------ 100 253 000 - 38608136381115
242 Total_LBAs_Read ------ 100 253 000 - 7979572945843
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 5 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 SATA NCQ Queued Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa1 GPL,SL VS 20 Device vendor specific log
0xa2 GPL VS 4496 Device vendor specific log
0xa8 GPL,SL VS 20 Device vendor specific log
0xa9 GPL,SL VS 1 Device vendor specific log
0xab GPL VS 1 Device vendor specific log
0xb0 GPL VS 5067 Device vendor specific log
0xbd GPL VS 512 Device vendor specific log
0xbe-0xbf GPL VS 65535 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
Device Error Count: 2
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 2 [1] occurred at disk power-on lifetime: 3139 hours (130 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 ab a1 40 48 00 00 Error: UNC at LBA =
0xaba14048 = 2879471688
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 00 00 08 00 00 ab a1 40 48 40 00 02:54:39.784 READ FPDMA QUEUED
60 00 00 00 08 00 00 ab a1 40 40 40 00 02:54:39.783 READ FPDMA QUEUED
60 00 00 00 08 00 00 ab a1 40 38 40 00 02:54:39.783 READ FPDMA QUEUED
60 00 00 00 08 00 00 ab a1 40 30 40 00 02:54:39.782 READ FPDMA QUEUED
60 00 00 00 08 00 00 ab a1 40 28 40 00 02:54:39.782 READ FPDMA QUEUED
Error 1 [0] occurred at disk power-on lifetime: 3139 hours (130 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 ab a1 40 48 00 00 Error: UNC at LBA =
0xaba14048 = 2879471688
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 00 04 00 00 00 ab a0 14 00 40 00 02:54:36.512 READ FPDMA QUEUED
60 00 00 04 00 00 00 ab a0 10 00 40 00 02:54:36.500 READ FPDMA QUEUED
60 00 00 04 00 00 00 ab a0 0c 00 40 00 02:54:36.498 READ FPDMA QUEUED
60 00 00 04 00 00 00 ab a0 08 00 40 00 02:54:36.497 READ FPDMA QUEUED
60 00 00 04 00 00 00 ab 9f f9 00 40 00 02:54:36.402 READ FPDMA QUEUED
SMART Error Log Version: 1
ATA Error Count: 2
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 2 occurred at disk power-on lifetime: 3139 hours (130 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 02:54:39.784 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:54:39.783 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:54:39.783 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:54:39.782 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:54:39.782 READ FPDMA QUEUED
Error 1 occurred at disk power-on lifetime: 3139 hours (130 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 00 ff ff ff 4f 00 02:54:36.512 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 02:54:36.500 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 02:54:36.498 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 02:54:36.497 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 02:54:36.402 READ FPDMA QUEUED
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 3139 2879471688
# 2 Short offline Completed: read failure 90% 3139 2879471688
# 3 Short offline Completed without error 00% 3049 -
# 4 Conveyance offline Completed without error 00% 2996 -
# 5 Short offline Completed without error 00% 2239 -
# 6 Extended offline Completed without error 00% 2238 -
# 7 Short offline Completed without error 00% 1550 -
# 8 Short offline Completed without error 00% 1550 -
# 9 Short offline Completed without error 00% 69 -
#10 Short offline Completed without error 00% 9 -
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 3139 2879471688
# 2 Short offline Completed: read failure 90% 3139 2879471688
# 3 Short offline Completed without error 00% 3049 -
# 4 Conveyance offline Completed without error 00% 2996 -
# 5 Short offline Completed without error 00% 2239 -
# 6 Extended offline Completed without error 00% 2238 -
# 7 Short offline Completed without error 00% 1550 -
# 8 Short offline Completed without error 00% 1550 -
# 9 Short offline Completed without error 00% 69 -
#10 Short offline Completed without error 00% 9 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 522 (0x020a)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 30 Celsius
Power Cycle Min/Max Temperature: 29/34 Celsius
Lifetime Min/Max Temperature: 9/42 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Data Table command not supported
SCT Error Recovery Control command not supported
Device Statistics (GP/SMART Log 0x04) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x000a 2 1 Device-to-host register FISes sent due to a COMRESET
0x0001 2 0 Command failed due to ICRC error
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
And the SMART info for /dev/sdh:
=== START OF INFORMATION SECTION ===
Model Family: SAMSUNG SpinPoint F3
Device Model: SAMSUNG HD103SJ
Serial Number: S246JDWZ113593
LU WWN Device Id: 5 0024e9 002bf43c5
Firmware Version: 1AJ100E4
User Capacity: 1 000 204 886 016 bytes [1,00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Mon Jul 13 07:53:49 2015 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Disabled
APM feature is: Disabled
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 9420) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection
on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 157) minutes.
SCT capabilities: (0x003f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 100 100 051 - 1
2 Throughput_Performance -OS--K 055 055 000 - 8621
3 Spin_Up_Time PO---K 073 071 025 - 8314
4 Start_Stop_Count -O--CK 091 091 000 - 9745
5 Reallocated_Sector_Ct PO--CK 252 252 010 - 0
7 Seek_Error_Rate -OSR-K 252 252 051 - 0
8 Seek_Time_Performance --S--K 252 252 015 - 0
9 Power_On_Hours -O--CK 100 100 000 - 20675
10 Spin_Retry_Count -O--CK 252 252 051 - 0
11 Calibration_Retry_Count -O--CK 252 252 000 - 0
12 Power_Cycle_Count -O--CK 097 097 000 - 3297
191 G-Sense_Error_Rate -O---K 100 100 000 - 42
192 Power-Off_Retract_Count -O---K 252 252 000 - 0
194 Temperature_Celsius -O---- 064 043 000 - 32 (Min/Max 4/57)
195 Hardware_ECC_Recovered -O-RCK 100 100 000 - 0
196 Reallocated_Event_Count -O--CK 252 252 000 - 0
197 Current_Pending_Sector -O--CK 252 252 000 - 0
198 Offline_Uncorrectable ----CK 252 252 000 - 0
199 UDMA_CRC_Error_Count -OS-CK 100 100 000 - 2
200 Multi_Zone_Error_Rate -O-R-K 100 100 000 - 101
223 Load_Retry_Count -O--CK 252 252 000 - 0
225 Load_Cycle_Count -O--CK 100 100 000 - 9897
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 2 Comprehensive SMART error log
0x03 GPL R/O 2 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 2 Extended self-test log
0x08 GPL R/O 2 Power Conditions log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 SATA NCQ Queued Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xbb GPL VS 4 Device vendor specific log
0xbc GPL VS 2 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (2 sectors)
Device Error Count: 2
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 2 [1] occurred at disk power-on lifetime: 4244 hours (176 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
84 -- 51 93 e8 00 00 00 00 00 00 e0 00 Error: ICRC, ABRT 37864 sectors at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
35 00 00 01 00 00 00 61 18 92 e8 e0 08 00:00:01.927 WRITE DMA EXT
25 00 00 01 00 00 00 1b ce e8 60 e0 08 00:00:01.927 READ DMA EXT
25 00 00 01 00 00 00 1b ce e7 60 e0 08 00:00:01.927 READ DMA EXT
25 00 00 01 00 00 00 1b ce e6 60 e0 08 00:00:01.927 READ DMA EXT
25 00 00 01 00 00 00 1b ce e5 60 e0 08 00:00:01.927 READ DMA EXT
Error 1 [0] occurred at disk power-on lifetime: 2234 hours (93 days + 2 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
84 -- 51 e5 ee 00 00 00 00 00 00 e0 00 Error: ICRC, ABRT 58862 sectors at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
35 00 00 00 06 00 00 00 35 e5 e8 e0 08 00:00:17.173 WRITE DMA EXT
35 00 00 00 08 00 00 06 d5 77 10 e0 08 00:00:17.173 WRITE DMA EXT
35 00 00 00 03 00 00 00 82 12 48 e0 08 00:00:17.173 WRITE DMA EXT
35 00 00 00 07 00 00 06 d5 77 10 e0 08 00:00:17.171 WRITE DMA EXT
35 00 00 00 03 00 00 00 82 12 48 e0 08 00:00:17.171 WRITE DMA EXT
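As a cross-check of the two entries above, the register bytes can be decoded by hand. The sketch below is illustrative only: `decode_error_row` is a made-up helper, the bit tables are a partial subset of the usual ATA error/status bit names, and it assumes the column layout shown in the log (ER, a placeholder, ST, then two COUNT bytes).

```python
# Partial bit-name tables (assumption: common ATA naming, not exhaustive).
ERROR_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 3: "DRQ", 0: "ERR"}


def decode_bits(value, names):
    """List the names of the known bits set in value, highest bit first."""
    return [
        name
        for bit, name in sorted(names.items(), reverse=True)
        if value & (1 << bit)
    ]


def decode_error_row(row):
    """Decode ER, ST and COUNT from a register row of the extended error log.

    Hypothetical helper; expects e.g. '84 -- 51 93 e8 00 00 00 00 00 00 e0 00'.
    """
    fields = row.split()
    error = int(fields[0], 16)
    status = int(fields[2], 16)
    count = int(fields[3] + fields[4], 16)  # COUNT is two bytes, high byte first
    return decode_bits(error, ERROR_BITS), decode_bits(status, STATUS_BITS), count


if __name__ == "__main__":
    print(decode_error_row("84 -- 51 93 e8 00 00 00 00 00 00 e0 00"))
    print(decode_error_row("84 -- 51 e5 ee 00 00 00 00 00 00 e0 00"))
```

Both entries decode to ICRC plus ABRT, with counts of 37864 and 58862 sectors, matching what smartctl printed. ICRC is an interface CRC error, which usually points at the link (cable, backplane or controller) rather than the media, and the two entries line up with UDMA_CRC_Error_Count = 2. They are not the UNC media error shown in the journal.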
SMART Error Log Version: 1
No Errors Logged
SMART Extended Self-test Log Version: 1 (2 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 20661 -
# 2 Extended offline Completed without error 00% 19724 -
# 3 Short offline Completed without error 00% 19721 -
# 4 Short offline Aborted by host 90% 19404 -
# 5 Short offline Completed without error 00% 18910 -
# 6 Short offline Completed without error 00% 15792 -
# 7 Short offline Completed without error 00% 15792 -
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 20661 -
# 2 Extended offline Completed without error 00% 19724 -
# 3 Short offline Completed without error 00% 19721 -
# 4 Short offline Aborted by host 90% 19404 -
# 5 Short offline Completed without error 00% 18910 -
# 6 Short offline Completed without error 00% 15792 -
# 7 Short offline Completed without error 00% 15792 -
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Completed [00% left] (0-65535)
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 2
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 32 Celsius
Power Cycle Min/Max Temperature: 24/38 Celsius
Lifetime Min/Max Temperature: 7/57 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Temperature History Version: 2
Temperature Sampling Period: 5 minutes
Temperature Logging Interval: 5 minutes
Min/Max recommended Temperature: -5/80 Celsius
Min/Max Temperature Limit: -10/85 Celsius
Temperature History Size (Index): 128 (106)
Index Estimated Time Temperature Celsius
107 2015-07-12 21:15 35 ****************
108 2015-07-12 21:20 34 ***************
105 2015-07-13 07:45 33 **************
106 2015-07-13 07:50 32 *************
SCT Error Recovery Control:
Read: Disabled
Write: Disabled
Device Statistics (GP/SMART Log 0x04) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 4 0 Command failed due to ICRC error
0x0002 4 0 R_ERR response for data FIS
0x0003 4 0 R_ERR response for device-to-host data FIS
0x0004 4 0 R_ERR response for host-to-device data FIS
0x0005 4 0 R_ERR response for non-data FIS
0x0006 4 0 R_ERR response for device-to-host non-data FIS
0x0007 4 0 R_ERR response for host-to-device non-data FIS
0x0008 4 0 Device-to-host non-data FIS retries
0x0009 4 1 Transition from drive PhyRdy to drive PhyNRdy
0x000a 4 2 Device-to-host register FISes sent due to a COMRESET
0x000b 4 0 CRC errors within host-to-device FIS
0x000d 4 0 Non-CRC errors within host-to-device FIS
0x000f 4 0 R_ERR response for host-to-device data FIS, CRC
0x0010 4 0 R_ERR response for host-to-device data FIS, non-CRC
0x0012 4 0 R_ERR response for host-to-device non-data FIS, CRC
0x0013 4 0 R_ERR response for host-to-device non-data FIS, non-CRC
0x8e00 4 0 Vendor specific
0x8e01 4 0 Vendor specific
0x8e02 4 0 Vendor specific
0x8e03 4 0 Vendor specific
0x8e04 4 0 Vendor specific
0x8e05 4 0 Vendor specific
0x8e06 4 0 Vendor specific
0x8e07 4 0 Vendor specific
0x8e08 4 0 Vendor specific
0x8e09 4 0 Vendor specific
0x8e0a 4 0 Vendor specific
0x8e0b 4 0 Vendor specific
0x8e0c 4 0 Vendor specific
0x8e0d 4 0 Vendor specific
0x8e0e 4 0 Vendor specific
0x8e0f 4 0 Vendor specific
0x8e10 4 0 Vendor specific
0x8e11 4 0 Vendor specific