Re: [PATCH] Properly init address_space->writeback_index | |
| [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] | |
On Thu, 14 Aug 2008 14:13:40 -0400
Chris Mason <chris.mason@xxxxxxxxxx> wrote:
> write_cache_pages uses i_mapping->writeback_index to pick up where it
> left off the last time a given inode was found by pdflush or
> balance_dirty_pages (or anyone else who sets wbc->range_cyclic)
>
> alloc_inode should set it to a sane value so that writeback doesn't start
> in the middle of a file. It is somewhat difficult to notice the bug since
> write_cache_pages will loop around to the start of the file and the
> elevator helps hide the resulting seeks.
>
> For whatever reason, Btrfs hits this often. Unpatched, untarring 30 copies of
> the linux kernel in series runs at 47MB/s on a single sata drive. With
> this fix, it jumps to 62MB/s.
>
> Signed-off-by: Chris Mason <chris.mason@xxxxxxxxxx>
>
> diff -r 67160ae21a58 fs/inode.c
> --- a/fs/inode.c Wed Aug 06 19:26:20 2008 -0700
> +++ b/fs/inode.c Thu Aug 14 10:15:49 2008 -0400
> @@ -166,6 +166,7 @@ static struct inode *alloc_inode(struct
> mapping_set_gfp_mask(mapping, GFP_HIGHUSER_PAGECACHE);
> mapping->assoc_mapping = NULL;
> mapping->backing_dev_info = &default_backing_dev_info;
> + mapping->writeback_index = 0;
>
> /*
> * If the block_device provides a backing_dev_info for client
>
inode_cachep's ctor memsets the whole inode* to zero.
Does btrfs have an ->alloc_inode handler? If so, perhaps it is btrfs
which needs to zap that field.
Formally, the way slab works (which is arguably silly from a CPU cache
efficiency POV) is that the object should be returned to slab
(kmem_cache_free()) in a "constructed" manner. Which means that this
zeroing should happen in destroy_inode().
otoh, the fs/inode.c inode creation/destruction code isn't designed
that way - it chose to initialise everything synchronously in
alloc_inode(), which is arguably wrong.
One wonders if that big memset in inode_init_once() actually does
anything useful.
Ho hum, so what to do? I'm suspecting that we should just zap the
kmem_cachep constructor altogether, use a synchronous call to
inode_init_once() to initialise all inode fields and then remove all
those open-coded zeroings.
Something like this?
From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---
fs/inode.c | 40 +++++++++-------------------------------
1 file changed, 9 insertions(+), 31 deletions(-)
diff -puN fs/inode.c~a fs/inode.c
--- a/fs/inode.c~a
+++ a/fs/inode.c
@@ -115,34 +115,24 @@ static struct inode *alloc_inode(struct
static const struct file_operations empty_fops;
struct inode *inode;
- if (sb->s_op->alloc_inode)
+ if (sb->s_op->alloc_inode) {
inode = sb->s_op->alloc_inode(sb);
- else
- inode = (struct inode *) kmem_cache_alloc(inode_cachep, GFP_KERNEL);
+ } else {
+ inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL);
+ if (inode)
+ inode_init_once(inode);
+ }
if (inode) {
struct address_space * const mapping = &inode->i_data;
inode->i_sb = sb;
inode->i_blkbits = sb->s_blocksize_bits;
- inode->i_flags = 0;
atomic_set(&inode->i_count, 1);
inode->i_op = &empty_iops;
inode->i_fop = &empty_fops;
inode->i_nlink = 1;
atomic_set(&inode->i_writecount, 0);
- inode->i_size = 0;
- inode->i_blocks = 0;
- inode->i_bytes = 0;
- inode->i_generation = 0;
-#ifdef CONFIG_QUOTA
- memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
-#endif
- inode->i_pipe = NULL;
- inode->i_bdev = NULL;
- inode->i_cdev = NULL;
- inode->i_rdev = 0;
- inode->dirtied_when = 0;
if (security_inode_alloc(inode)) {
if (inode->i_sb->s_op->destroy_inode)
inode->i_sb->s_op->destroy_inode(inode);
@@ -158,15 +148,13 @@ static struct inode *alloc_inode(struct
lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key);
init_rwsem(&inode->i_alloc_sem);
- lockdep_set_class(&inode->i_alloc_sem, &sb->s_type->i_alloc_sem_key);
+ lockdep_set_class(&inode->i_alloc_sem,
+ &sb->s_type->i_alloc_sem_key);
mapping->a_ops = &empty_aops;
mapping->host = inode;
- mapping->flags = 0;
mapping_set_gfp_mask(mapping, GFP_HIGHUSER_PAGECACHE);
- mapping->assoc_mapping = NULL;
mapping->backing_dev_info = &default_backing_dev_info;
- mapping->writeback_index = 0;
/*
* If the block_device provides a backing_dev_info for client
@@ -181,7 +169,6 @@ static struct inode *alloc_inode(struct
bdi = sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
mapping->backing_dev_info = bdi;
}
- inode->i_private = NULL;
inode->i_mapping = mapping;
}
return inode;
@@ -197,7 +184,6 @@ void destroy_inode(struct inode *inode)
kmem_cache_free(inode_cachep, (inode));
}
-
/*
* These are initializations that only need to be done
* once, because the fields are idempotent across use
@@ -222,16 +208,8 @@ void inode_init_once(struct inode *inode
mutex_init(&inode->inotify_mutex);
#endif
}
-
EXPORT_SYMBOL(inode_init_once);
-static void init_once(void *foo)
-{
- struct inode * inode = (struct inode *) foo;
-
- inode_init_once(inode);
-}
-
/*
* inode_lock must be held
*/
@@ -1400,7 +1378,7 @@ void __init inode_init(void)
0,
(SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
SLAB_MEM_SPREAD),
- init_once);
+ NULL);
register_shrinker(&icache_shrinker);
/* Hash may have been set up in inode_init_early */
_
Note that this assumes that a filesystem which implements
->alloc_inode() will call inode_init_once() within its ->alloc_inode().
Which means, I think, that we can just remove inode_init_once()
altogether and move its initialisations into alloc_inode() along with
all the existing ones.
What do you think?
I suppose that's all a fun project but doesn't preclude merging your
patchlet. I'll do that for now ;)
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
[Site Home] [Kernel Newbies] [UNIX Filesystems] [Share Photos] [Security] [Netfilter] [Bugtraq] [Rubini] [Photo] [Yosemite] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux Ext4 Filesystem] [Linux Cachefs] [Reiser Filesystem] [Linux RAID] [Samba] [Video 4 Linux] [Device Mapper] [Linux Resources]
![]() |