On Fri, Feb 15, 2019 at 09:18:03PM +0800, Qu Wenruo wrote:
> On 2019/2/15 9:10 PM, Nikolay Borisov wrote:
> > On 15.02.19 at 12:50, Qu Wenruo wrote:
> >> Patchset can be fetched from github:
> >> https://github.com/adam900710/linux/tree/write_time_tree_checker
> >> Which is based on v5.0-rc1 tag.
> >> Also there is no conflict rebasing the patchset to misc-next.
> >>
> >> This patchset has the following 3 features:
> >> - Tree block validation output enhancement
> >>   * Output validation failure timing (write time or read time)
> >>   * Always output tree block level/key mismatch error message
> >>   This part is already submitted and reviewed.
> >>
> >> - Write time tree block validation check
> >>   To catch memory corruption either from hardware or kernel.
> >>   Example output would be:
> >>
> >>   BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
> >>   BTRFS error (device dm-3): write time tree block corruption detected
> >
> > This is not good. Those two error messages should be collapsed into
> > one. Otherwise it's hard to actually match them up.
>
> That shouldn't be a problem, since the error won't happen so frequently
> there is no other error message that could interrupt these 2 lines.
>
> > Better output will
> > be "Corrupt leaf detected during writing: root=..." and eliminate "write
> > time tree block corruption detected" line. Is that feasible?
>
> Feasible, currently tree checker only get called in 3 locations:
> 1) read time full checker
> 2) mark dirty time basic checker
> 3) write time full checker
>
> And they all have different internal bool to indicate the timing, so
> it's possible to output the timing.
>
> But that needs to pass the internal bool down a long long way, for all
> the output help to accept an extra string.
> I'm not a big fan for that, and prefer a timing neutral tree checker.

I'd rather not merge the error messages, as we'll possibly add more
sanity checks to various functions so there could be a list of problems
and there's one final note about when it happened (read time/write
time).

Matching the lines together is desirable though, so if the block number
could be part of all messages, I hope this makes it usable for analysis.

Reading btree_readpage_end_io_hook, the message should be under the
err: label, as there are 3 other possible messages printed (bad block
start, fsid and level).
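
A rough userspace sketch of that layout, not the actual btrfs code:
each check prints its own message with the block number and jumps to a
shared err: label, where one final note reports the timing. The names
struct tree_block, enum check_timing, report() and validate_block() are
hypothetical stand-ins; the real btree_readpage_end_io_hook works on
extent buffers and uses the kernel's logging helpers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: when the check runs. */
enum check_timing { CHECK_READ_TIME, CHECK_WRITE_TIME };

/* Hypothetical, simplified stand-in for a tree block under validation. */
struct tree_block {
	uint64_t bytenr;	/* logical address the block should have */
	uint64_t found_start;	/* start recorded in the block header */
	bool fsid_matches;	/* header fsid matches the filesystem */
	int level;		/* level recorded in the header */
	int expected_level;	/* level the parent pointer promised */
};

/* Counts emitted lines so the behaviour can be checked. */
static int messages_printed;

/* Every message carries the block number so the lines can be matched up. */
static void report(const struct tree_block *b, const char *what)
{
	fprintf(stderr, "error: block=%llu: %s\n",
		(unsigned long long)b->bytenr, what);
	messages_printed++;
}

static int validate_block(const struct tree_block *b, enum check_timing timing)
{
	int ret = 0;

	if (b->found_start != b->bytenr) {
		report(b, "bad tree block start");
		ret = -EIO;
		goto err;
	}
	if (!b->fsid_matches) {
		report(b, "bad fsid");
		ret = -EIO;
		goto err;
	}
	if (b->level != b->expected_level) {
		report(b, "bad tree block level");
		ret = -EIO;
		goto err;
	}
err:
	/* Single final note about the timing, covering all failures above. */
	if (ret)
		report(b, timing == CHECK_WRITE_TIME ?
		       "write time tree block corruption detected" :
		       "read time tree block corruption detected");
	return ret;
}
```

With this shape the individual checks stay timing neutral, and only the
code under err: needs to know whether it runs at read or write time.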
