Re: Kernel config option which causes reiser4 to be instable

On 12/13/2012 07:56 PM, Ivan Shapovalov wrote:
> On 12 December 2012 07:23:53 Ivan Shapovalov wrote:
>> On 11 December 2012 22:49:47 Ivan Shapovalov wrote:
>>> On 11 December 2012 19:33:39 Edward Shishkin wrote:
>>>> On 12/11/2012 04:08 PM, Ivan Shapovalov wrote:
>>>>> Hello!
>>>>
>>>> Hello.
>>>>
>>>>> With the help of Dušan Čolić <dusanc@xxxxxxxxx>, who provided his
>>>>> kernel config diff, I've found a kernel option which, when disabled,
>>>>> greatly reduces (hopefully to zero, but I need time to verify it) the
>>>>> corruption rate in reiser4.
>>>>>
>>>>> It's CONFIG_TRANSPARENT_HUGEPAGE (or something which is used by it,
>>>>> like CONFIG_COMPACTION or CONFIG_MIGRATION).
>>>>> For now I'm testing it with CONFIG_TRANSPARENT_HUGEPAGE disabled
>>>>
>>>> How long?
>>>
>>> 12 hours of indexing, scanning, compiling, repeated execution of
>>> "find <mountpoint> -type f -exec grep wtf {} \;" and so on.
>>>
>>>>> on kernel 3.6.10, and everything seems to be OK so far (so the
>>>>> workaround is version-agnostic).
>>>>>
>>>>> Edward, are there any guesses on what can make reiser4 choke on
>>>>> hugepages/compaction/migration?
>>>>
>>>> TBH, no ideas. They (hugepages) are _transparent_.
>>>> It means we shouldn't suffer in theory ;)
>>>
>>> Maybe it's actually migration that does the damage? If we don't lock the
>>> pages properly and they are "stolen" by the migration code... If this is
>>> the case, I shall eventually get corruptions with the current setup
>>> (since migration/compaction is not disabled).
>>> If I get them, I'll rebuild without migration at all and see if the
>>> corruptions disappear completely. (They should, if the prediction is
>>> true.)
>>
>> ...So, the kernel did not pass the overnight testing: it produced the
>> usual "cluster corrupted" errors and so on (which is just as expected).
>>
>> I'm now rebuilding without CONFIG_COMPACTION and CONFIG_MIGRATION.
>
> So far the kernel built without CONFIG_MIGRATION has worked flawlessly. I
> gave it double the testing time of the previous attempt, that is, 2 days.
>
> Regarding the actual solution (as simply disabling kernel features doesn't
> count as one):
>
> I have a guess that the problem is related to the default ->migratepage()
> of struct address_space_operations (which, when left unset, is not a no-op
> but a "generic" implementation).

Hmm, I didn't know about this new aop :(

Right now I cannot say for sure that it is the default ->migratepage()
that caused the corruptions; however, a quick look showed that it works
incorrectly: reiser4_writepage() doesn't necessarily make the page clean.
So yes, it would be better to disable migration for our mappings for
now.

Thank you for the finding!

Edward.


> So I've just attempted to "quickfix" the problem by explicitly setting the
> said pointer to fail_migrate_page and building 3.7.0 with all three
> migration-related options enabled. I'll let the new kernel run overnight
> to see if it indeed fixes The Problem.
>
> Attaching the reiser4 patch for 3.7 (just the one for 3.6 rebased against
> the new kernel version; I spotted no apparent API changes) and that
> quickfix one-liner (completely untested as of now).
>
> Thanks,
> Ivan.
>
>>>>> I'm not even remotely familiar with the kernel internals.
>>>>>
>>>>> Thanks,
>>>>> Ivan.
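
For readers unfamiliar with the hook discussed in this thread: the quickfix rests on the observation that a filesystem which leaves ->migratepage unset gets a "generic" implementation rather than a no-op, whereas pointing the hook at fail_migrate_page makes migration refuse the page. Below is a minimal userspace sketch of that dispatch, not kernel code. Every mock_* name is hypothetical, a page is reduced to an integer id, and the -EIO return value is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct address_space_operations:
 * only the one hook discussed in the thread is modelled. */
struct mock_aops {
    int (*migratepage)(int page_id);
};

/* Stand-in for the generic implementation used when the hook is unset.
 * In this sketch it always "succeeds" in moving the page. */
static int mock_generic_migrate(int page_id)
{
    (void)page_id;
    return 0;
}

/* Stand-in for fail_migrate_page: always refuses to move the page.
 * The -EIO value is an assumption of this sketch. */
static int mock_fail_migrate_page(int page_id)
{
    (void)page_id;
    return -EIO;
}

/* Dispatch: an unset hook does not mean "never migrate";
 * it selects the generic path instead. */
static int mock_move_page(const struct mock_aops *ops, int page_id)
{
    assert(ops != NULL);
    if (ops->migratepage)
        return ops->migratepage(page_id);
    return mock_generic_migrate(page_id);
}

/* A filesystem that never set the hook (the situation before the quickfix). */
static const struct mock_aops mock_aops_default = { .migratepage = NULL };

/* A filesystem with the quickfix applied. */
static const struct mock_aops mock_aops_quickfix = {
    .migratepage = mock_fail_migrate_page,
};
```

In this model, mock_move_page(&mock_aops_default, 1) takes the generic path and returns 0, while mock_move_page(&mock_aops_quickfix, 1) returns -EIO, so the page stays where it is.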
