On Mon, Jan 30, 2012 at 3:41 PM, Vincent Vanackere
<vincent.vanackere@xxxxxxxxx> wrote:
> On Wed, Jan 25, 2012 at 20:03, Mitch Harder
> <mitch.harder@xxxxxxxxxxxxxxxx> wrote:
>> A user has encountered a NULL pointer kernel oops in btrfs when
>> encountering media errors. The problem has been identified
>> as an unhandled NULL pointer returned from find_get_page().
>> This modification simply checks for a NULL page, and returns
>> with an error if found (the extent_range_uptodate() function
>> returns 1 on errors).
>>
>> After testing this patch, the user reported that the error with
>> the NULL pointer oops was solved. However, there is still a
>> remaining problem with a thread becoming stuck in
>> wait_on_page_locked(page) in the read_extent_buffer_pages(...)
>> function in extent_io.c
>>
>>         for (i = start_i; i < num_pages; i++) {
>>                 page = extent_buffer_page(eb, i);
>>                 wait_on_page_locked(page);
>>                 if (!PageUptodate(page))
>>                         ret = -EIO;
>>         }
>>
>> This patch leaves the issue with the locked page yet to be resolved.
>>
>> Signed-off-by: Mitch Harder <mitch.harder@xxxxxxxxxxxxxxxx>
>> ---
>> fs/btrfs/extent_io.c | 2 ++
>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 9d09a4f..fcf77e1 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -3909,6 +3909,8 @@ int extent_range_uptodate(struct extent_io_tree *tree,
>>         while (start <= end) {
>>                 index = start >> PAGE_CACHE_SHIFT;
>>                 page = find_get_page(tree->mapping, index);
>> +               if (!page)
>> +                       return 1;
>>                 uptodate = PageUptodate(page);
>>                 page_cache_release(page);
>>                 if (!uptodate) {
>> --
>> 1.7.3.4
>>
>
>
> Hi,
>
> If any btrfs developer could have a look at it while I can still
> reproduce the situation (it won't last long, I'll send the disk to RMA
> next week), I'm still interested in solving the remaining part of the
> btrfs bug. Here is the trace I get with the current linux kernel
> (6bc2b95ee602659c1be6fac0f6aadeb0c5c29a5d) :
>
> [ 330.530015] btrfs bad tree block start 959241011200 959241011200
> [ 480.288046] INFO: task cat:2627 blocked for more than 120 seconds.
> [ 480.288050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 480.288052] cat D ffffffff8180c600 0 2627 2468 0x00000004
> [ 480.288057] ffff8801fe135618 0000000000000086 ffff8801fe1355d8 ffff880222061650
> [ 480.288062] ffff880215b5db80 ffff8801fe135fd8 ffff8801fe135fd8 ffff8801fe135fd8
> [ 480.288067] ffff8802241a16e0 ffff880215b5db80 ffff8801fe1355e8 ffff88022fd93e88
> [ 480.288071] Call Trace:
> [ 480.288080] [<ffffffff81114440>] ? __lock_page+0x70/0x70
> [ 480.288084] [<ffffffff8162c0af>] schedule+0x3f/0x60
> [ 480.288087] [<ffffffff8162c15f>] io_schedule+0x8f/0xd0
> [ 480.288091] [<ffffffff8111444e>] sleep_on_page+0xe/0x20
> [ 480.288094] [<ffffffff8162a96f>] __wait_on_bit+0x5f/0x90
> [ 480.288098] [<ffffffff811145b8>] wait_on_page_bit+0x78/0x80
> [ 480.288102] [<ffffffff81070c70>] ? autoremove_wake_function+0x40/0x40
> [ 480.288129] [<ffffffffa005d161>] read_extent_buffer_pages+0x471/0x4d0 [btrfs]
> [ 480.288142] [<ffffffffa00347b0>] ? verify_parent_transid+0x160/0x160 [btrfs]
> [ 480.288155] [<ffffffffa003513a>] btree_read_extent_buffer_pages.isra.99+0x8a/0xc0 [btrfs]
> [ 480.288169] [<ffffffffa00371e1>] read_tree_block+0x41/0x60 [btrfs]
> [ 480.288179] [<ffffffffa001d6a3>]
> read_block_for_search.isra.34+0xf3/0x3d0 [btrfs]
> [ 480.288190] [<ffffffffa001f930>] btrfs_search_slot+0x300/0x8a0 [btrfs]
> [ 480.288203] [<ffffffffa0031ab4>] btrfs_lookup_csum+0x74/0x170 [btrfs]
> [ 480.288216] [<ffffffffa0031d5f>] __btrfs_lookup_bio_sums+0x1af/0x3b0 [btrfs]
> [ 480.288228] [<ffffffffa0031fb6>] btrfs_lookup_bio_sums+0x16/0x20 [btrfs]
> [ 480.288242] [<ffffffffa003e650>] btrfs_submit_bio_hook+0x140/0x170 [btrfs]
> [ 480.288256] [<ffffffffa00405d0>] ? btrfs_real_readdir+0x720/0x720 [btrfs]
> [ 480.288272] [<ffffffffa00571aa>] submit_one_bio+0x6a/0xa0 [btrfs]
> [ 480.288287] [<ffffffffa005be64>] extent_readpages+0xe4/0x100 [btrfs]
> [ 480.288301] [<ffffffffa00405d0>] ? btrfs_real_readdir+0x720/0x720 [btrfs]
> [ 480.288315] [<ffffffffa003eebf>] btrfs_readpages+0x1f/0x30 [btrfs]
> [ 480.288319] [<ffffffff81120bef>] __do_page_cache_readahead+0x1af/0x250
> [ 480.288323] [<ffffffff81120ff1>] ra_submit+0x21/0x30
> [ 480.288326] [<ffffffff81121115>] ondemand_readahead+0x115/0x230
> [ 480.288330] [<ffffffff81137eb9>] ? __do_fault+0x419/0x530
> [ 480.288333] [<ffffffff81121311>] page_cache_sync_readahead+0x31/0x50
> [ 480.288337] [<ffffffff811167d8>] generic_file_aio_read+0x438/0x780
> [ 480.288342] [<ffffffff81173db2>] do_sync_read+0xd2/0x110
> [ 480.288346] [<ffffffff81294113>] ? security_file_permission+0x93/0xb0
> [ 480.288349] [<ffffffff81174231>] ? rw_verify_area+0x61/0xf0
> [ 480.288352] [<ffffffff81174710>] vfs_read+0xb0/0x180
> [ 480.288355] [<ffffffff8117482a>] sys_read+0x4a/0x90
> [ 480.288359] [<ffffffff81635229>] system_call_fastpath+0x16/0x1b

Jeff Mahoney has been working on a large overhaul of error
handling/BUG_ONs. It is difficult to say when it will be ready, or
if it will even address this specific problem.

I'd go ahead and return the disk. I doubt you'll be the last user to
have bad sectors, so there'll be more opportunities to see how this
issue is handled after the changes to error handling.
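
For anyone reading the archive later, here is a minimal userspace sketch of
the NULL-check pattern the quoted patch adds. It is not the kernel code: the
mock_* names, the fixed 8-slot "mapping", and the 4 KiB page shift are all
assumptions made up for illustration, and returning 0 for a page that is
present but not uptodate is my guess at the part of the loop the diff does
not show. The only behavior taken from the patch itself is "return 1 when the
page lookup comes back NULL".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAGE_SHIFT 12	/* assumed 4 KiB pages */
#define MOCK_NR_PAGES   8	/* arbitrary size for the mock cache */

/* Hypothetical stand-in for struct page: only tracks the uptodate bit. */
struct mock_page {
	int uptodate;
};

/* Hypothetical stand-in for a page-cache mapping; NULL slot = not cached. */
struct mock_mapping {
	struct mock_page *slots[MOCK_NR_PAGES];
};

/* Mock lookup: like the real lookup, it may return NULL. */
struct mock_page *mock_find_get_page(const struct mock_mapping *m,
				     uint64_t index)
{
	if (index >= MOCK_NR_PAGES)
		return NULL;
	return m->slots[index];
}

/*
 * Walk the byte range [start, end] one page at a time.
 * Returns 1 if every page found is uptodate, or if a lookup fails
 * (the convention the patch description gives for errors).
 * Returns 0 if a page is present but not uptodate (assumed).
 */
int mock_range_uptodate(const struct mock_mapping *m,
			uint64_t start, uint64_t end)
{
	assert(m != NULL);

	while (start <= end) {
		uint64_t index = start >> MOCK_PAGE_SHIFT;
		struct mock_page *page = mock_find_get_page(m, index);

		if (!page)		/* the check the patch adds */
			return 1;
		if (!page->uptodate)
			return 0;
		start += (uint64_t)1 << MOCK_PAGE_SHIFT;
	}
	return 1;
}
```

Without the NULL check, the sketch would dereference a NULL pointer on the
first missing slot, which is the userspace analogue of the oops in the
original report. It says nothing about the remaining hang in
wait_on_page_locked().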