Hello guys,

I like btrfs and want to put it into production soon. One of the features I want to use is deduplication. I test duperemove on btrfs frequently and have seen this problem before.

I know that btrfs used to change mtime during deduplication, but after the dedup fixes from Mark (https://github.com/markfasheh) I tried comparing checksums before and after a dedup run. As far as I know, duperemove uses a kernel ioctl for deduplication, so this is not a duperemove issue: the kernel must keep the data consistent.

The file system is fresh, and btrfs check does not show any metadata corruption.

GitHub issue: https://github.com/markfasheh/duperemove/issues/91

System info:

    $ uname -a
    Linux titovetst-beplan 4.2.0-rc8-next-20150825-0959-ARCH #1 SMP Wed Aug 26 10:27:18 MSK 2015 x86_64 GNU/Linux

Mount options:

    rw,relatime,compress=lzo,space_cache,subvolid=257,subvol=/@home

Okay, here is how I found it:

    md5sum_recursive() {
        # Quote "$@" so that paths containing spaces are passed through intact.
        find "$@" -type f -exec md5sum {} \;
    }
    cp -av --reflink=always ~/<src> ~/<dest>
    md5sum_recursive ~/<dest> > ~/dedup.before
    duperemove -vhrdb 8k ~/<dest>
    md5sum_recursive ~/<dest> > ~/dedup.after
    diff -up ~/dedup.before ~/dedup.after

What I got (full diff attached):

    --- /home/nefelim4ag/dedup.after 2015-08-26 21:36:55.773452558 +0300
    +++ /home/nefelim4ag/dedup.before 2015-08-26 21:21:01.203600761 +0300
    @@ -25139,9 +25139,9 @@
     caf9d41036e46b85d90a9541e8bc9ce1 /home/ ....
    -0ccbc9c81a51f59dcf2ac0d102de37cb /home/nefelim4ag/L4D2/left4dead2/pak01_003.vpk
    +e665b502ee977dc1c619ecbd415c91b8 /home/nefelim4ag/L4D2/left4dead2/pak01_000.vpk
    ....

The file sizes did not change, and the affected files are larger than 1 MB. Every run gives me random data corruption. The only correlation I have found is that a smaller block size produces more corruption, and vice versa; i.e. the more data is deduplicated, the more is corrupted.

The disk's SMART data does not indicate any damage (attached).

What can I provide to help fix this issue? If needed, I can of course recompile the kernel with different parameters.

Thanks.

--
Have a nice day,
Timofey.
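P.S. The md5sum comparison only says which files changed, not where. A possible way to narrow that down is to compare each file in the deduplicated tree byte by byte against the original. This is only a sketch: compare_trees is a hypothetical helper name, it assumes the ~/<src> tree is still intact after the duperemove run, and the block index it prints simply assumes the 8k block size passed to duperemove above.

```shell
# Hypothetical helper: for every file under SRC, compare it with the file of
# the same relative path under DEST and report where they first differ.
# Usage: compare_trees SRC DEST [BLOCKSIZE]   (BLOCKSIZE defaults to 8192)
compare_trees() {
    src=$1
    dest=$2
    blocksize=${3:-8192}
    ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
        # cmp -s is silent and exits 0 when the two files are identical.
        cmp -s "$src/$rel" "$dest/$rel" && continue
        # cmp -l prints one line per differing byte; the first field is the
        # 1-based byte offset. awk turns that into a 0-based offset, the
        # index of the block containing it, and a count of differing bytes.
        cmp -l "$src/$rel" "$dest/$rel" 2>/dev/null |
            awk -v name="$rel" -v bs="$blocksize" '
                NR == 1 { first = $1 - 1 }
                END {
                    if (NR > 0)
                        printf "%s: first_diff_offset=%d block=%d differing_bytes=%d\n", name, first, int(first / bs), NR
                    else
                        printf "%s: missing in DEST or differs only in length\n", name
                }'
    done
}
```

It would be called as compare_trees ~/<src> ~/<dest> 8192. If the reported offsets cluster on block boundaries, that might hint at whether whole deduplicated extents or only parts of them are affected.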
Attachment:
diff.dedup
Description: Binary data
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.2.0-rc8-next-20150825-0959-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Blue Mobile
Device Model: WDC WD10JPCX-24UE4T0
Serial Number: WD-WX61AC3J6551
LU WWN Device Id: 5 0014ee 6599e2c1a
Firmware Version: 01.01A01
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Aug 26 22:28:49 2015 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM level is: 254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (18480) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 207) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x7035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   182   178   021    -    1883
  4 Start_Stop_Count        -O--CK   095   095   000    -    5413
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         POSR-K   200   200   051    -    0
  9 Power_On_Hours          -O--CK   091   091   000    -    7153
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   099   099   000    -    1340
192 Power-Off_Retract_Count -O--CK   200   200   000    -    190
193 Load_Cycle_Count        -O--CK   191   191   000    -    28327
194 Temperature_Celsius     -O---K   093   085   000    -    54
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   100   253   000    -    0
240 Head_Flying_Hours       -O--CK   091   091   000    -    6968
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb6  GPL,SL  VS       1  Device vendor specific log
0xb7       GPL,SL  VS      38  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               70%        78         -
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 54 Celsius
Power Cycle Min/Max Temperature: 30/55 Celsius
Lifetime Min/Max Temperature: 17/62 Celsius
Lifetime Average Temperature: 37 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 128 (37)
Index Estimated Time Temperature Celsius
38 2015-08-26 20:21 46 ***************************
39 2015-08-26 20:22 46 ***************************
40 2015-08-26 20:23 46 ***************************
41 2015-08-26 20:24 47 ****************************
... ..( 5 skipped). .. ****************************
47 2015-08-26 20:30 47 ****************************
48 2015-08-26 20:31 48 *****************************
... ..( 6 skipped). .. *****************************
55 2015-08-26 20:38 48 *****************************
56 2015-08-26 20:39 49 ******************************
... ..( 4 skipped). .. ******************************
61 2015-08-26 20:44 49 ******************************
62 2015-08-26 20:45 50 *******************************
... ..( 17 skipped). .. *******************************
80 2015-08-26 21:03 50 *******************************
81 2015-08-26 21:04 51 ********************************
... ..( 11 skipped). .. ********************************
93 2015-08-26 21:16 51 ********************************
94 2015-08-26 21:17 52 *********************************
95 2015-08-26 21:18 52 *********************************
96 2015-08-26 21:19 52 *********************************
97 2015-08-26 21:20 53 **********************************
... ..( 2 skipped). .. **********************************
100 2015-08-26 21:23 53 **********************************
101 2015-08-26 21:24 54 ***********************************
... ..( 9 skipped). .. ***********************************
111 2015-08-26 21:34 54 ***********************************
112 2015-08-26 21:35 55 ************************************
... ..( 15 skipped). .. ************************************
0 2015-08-26 21:51 55 ************************************
1 2015-08-26 21:52 54 ***********************************
... ..( 35 skipped). .. ***********************************
37 2015-08-26 22:28 54 ***********************************
SCT Error Recovery Control command not supported
Device Statistics (GP/SMART Log 0x04) not supported
SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           31  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            1  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4         9736  Vendor specific
