Re: [PATCH V2] btrfs: implement delayed inode items operation

Hi,

a few code comments below.

On Thu, Feb 17, 2011 at 01:48:40PM +0800, Miao Xie wrote:
> Compared with Ext3/4, the performance of file creation and deletion on btrfs
> is very poor. The reason is that btrfs must do a lot of b+ tree insertions,
> such as inode item, directory name item, directory name index and so on.
> 
> If we can do some delayed b+ tree insertion or deletion, we can improve the
> performance, so we made this patch which implemented delayed directory name
> index insertion/deletion and delayed inode update.
> 
> Implementation:
> - introduce a delayed root object into the filesystem, that use a list to
>   manage all the delayed node which is create for every file/directory.
                           ^^^^^^^^^^^^^^^^^^^^
(worth fixing, especially if this is to serve as the changelog entry):
                           nodes which are created
> - Every delayed node has two rb-tree, one is used to manage the directory name
>   index which is going to be inserted into b+ tree, and the other is used to
>   manage the directory name index which is going to be deleted from b+ tree.
> - introduce a worker to deal with the delayed operation. This worker is used
>   to deal with the works of the delayed directory name index items insertion
>   and deletion and the delayed inode update.
>   When the number of delayed items exceeds the lower limit, we create works
>   for some delayed nodes and insert them into the work queue of the worker,
>   and then go back.
>   When the number of delayed items exceeds the upper bound, we create works
>   for all the delayed nodes that haven't been dealt with, and insert them
>   into the work queue of the worker, and then wait until the number of
>   untreated items drops below some threshold value.
> - When we want to insert a directory name index into b+ tree, we just add the
>   information into the delayed inserting rb-tree.
>   And then we check the number of the delayed items and do delayed items
>   balance. (The balance policy is above.)
> - When we want to delete a directory name index from the b+ tree, we first
>   search for it in the inserting rb-tree. If we find it, we just drop it.
>   If not, we add its key into the delayed deleting rb-tree.
>   Similar to the delayed inserting rb-tree, we also check the number of the
>   delayed items and do delayed items balance.
>   (The same as for the inserting manipulation)
> - When we want to update the metadata of some inode, we cache the data of
>   the inode in the delayed node. The worker will flush it into the b+ tree
>   after dealing with the delayed insertion and deletion.
> - We move the delayed node to the tail of the list after we access the
>   delayed node. In this way, we can cache more delayed items and merge more
>   inode updates.
> - If we want to commit the transaction, we deal with all the delayed nodes.
> - The delayed node is freed when we free the btrfs inode.

Will the delayed node structure stay allocated even after its data has been
transferred to the b+ tree? I'm concerned about memory remaining allocated
when it is in fact no longer needed.

> - Before we log the inode items, we commit all the directory name index items
>   and the delayed inode update.
> 
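The balance policy described above seems to map onto the three thresholds
defined later in delayed-inode.c (LOWER=50, BACKGROUND=100, WRITEBACK=200).
If I read it right, it can be restated as a small decision function; the
enum names below are mine, not the patch's, so correct me if I got the
mapping wrong:

```c
#include <assert.h>

/* Thresholds as defined later in delayed-inode.c */
#define BTRFS_DELAYED_LOWER		50
#define BTRFS_DELAYED_BACKGROUND	100
#define BTRFS_DELAYED_WRITEBACK		200

/* My restatement of the balance policy, not code from the patch */
enum balance_action {
	BALANCE_NONE,		/* below the background limit: do nothing */
	BALANCE_ASYNC,		/* queue work for some nodes and return */
	BALANCE_SYNC_WAIT,	/* queue work for all nodes and wait until
				   the backlog drops below LOWER */
};

static enum balance_action delayed_balance_action(int nr_delayed_items)
{
	if (nr_delayed_items >= BTRFS_DELAYED_WRITEBACK)
		return BALANCE_SYNC_WAIT;
	if (nr_delayed_items >= BTRFS_DELAYED_BACKGROUND)
		return BALANCE_ASYNC;
	return BALANCE_NONE;
}
```

i.e. crossing BACKGROUND kicks off background work, crossing WRITEBACK makes
the caller queue everything and wait until the backlog falls under LOWER.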
> I did a quick test with the benchmark tool[1] and found that we can improve
> the performance of file creation by ~16%, and file deletion by ~19%.
> 
> Before applying this patch:
> Create files:
> 	Total files: 50000
> 	Total time: 1.113182
> 	Average time: 0.000022
> Delete files:
> 	Total files: 50000
> 	Total time: 1.544013
> 	Average time: 0.000031
> 
> After applying this patch:
> Create files:
> 	Total files: 50000
> 	Total time: 0.935185
> 	Average time: 0.000019
> Delete files:
> 	Total files: 50000
> 	Total time: 1.252672
> 	Average time: 0.000025
> 
> [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3
> 
> Signed-off-by: Miao Xie <miaox@xxxxxxxxxxxxxx>
> ---
> Changelog V1 -> V2
> - break up the global rb-tree, use a list to manage the delayed nodes,
>   which is created for every directory and file, and used to manage the
>   delayed directory name index items and the delayed inode item.
> - introduce a worker to deal with the delayed nodes.
> 
>  fs/btrfs/Makefile        |    2 +-
>  fs/btrfs/btrfs_inode.h   |    5 +
>  fs/btrfs/ctree.c         |   14 +-
>  fs/btrfs/ctree.h         |   22 +-
>  fs/btrfs/delayed-inode.c | 1546 ++++++++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/delayed-inode.h |  122 ++++
>  fs/btrfs/dir-item.c      |   32 +-
>  fs/btrfs/disk-io.c       |   26 +-
>  fs/btrfs/extent-tree.c   |   18 +-
>  fs/btrfs/inode.c         |  109 ++--
>  fs/btrfs/ioctl.c         |    2 +-
>  fs/btrfs/super.c         |    1 +
>  fs/btrfs/transaction.c   |    7 +-
>  fs/btrfs/tree-log.c      |    7 +
>  14 files changed, 1809 insertions(+), 104 deletions(-)
>  create mode 100644 fs/btrfs/delayed-inode.c
>  create mode 100644 fs/btrfs/delayed-inode.h
> 
> diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
> index 31610ea..a8411c2 100644
> --- a/fs/btrfs/Makefile
> +++ b/fs/btrfs/Makefile
> @@ -7,4 +7,4 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
>  	   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
>  	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
>  	   export.o tree-log.o acl.o free-space-cache.o zlib.o lzo.o \
> -	   compression.o delayed-ref.o relocation.o
> +	   compression.o delayed-ref.o relocation.o delayed-inode.o
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index ccc991c..ef3baa8 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -22,6 +22,7 @@
>  #include "extent_map.h"
>  #include "extent_io.h"
>  #include "ordered-data.h"
> +#include "delayed-inode.h"
>  
>  /* in memory btrfs inode */
      in-memory

>  struct btrfs_inode {
> @@ -159,9 +160,13 @@ struct btrfs_inode {
>  	 */
>  	unsigned force_compress:4;
>  
> +	struct btrfs_delayed_node *delayed_node;
> +
>  	struct inode vfs_inode;
>  };
>  
> +extern unsigned char btrfs_filetype_table[];
> +
>  static inline struct btrfs_inode *BTRFS_I(struct inode *inode)
>  {
>  	return container_of(inode, struct btrfs_inode, vfs_inode);
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index b5baff0..6e3e4c3 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -38,11 +38,6 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
>  			      struct extent_buffer *src_buf);
>  static int del_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>  		   struct btrfs_path *path, int level, int slot);
> -static int setup_items_for_insert(struct btrfs_trans_handle *trans,
> -			struct btrfs_root *root, struct btrfs_path *path,
> -			struct btrfs_key *cpu_key, u32 *data_size,
> -			u32 total_data, u32 total_size, int nr);
> -
>  
>  struct btrfs_path *btrfs_alloc_path(void)
>  {
> @@ -3688,11 +3683,10 @@ out:
>   * to save stack depth by doing the bulk of the work in a function
>   * that doesn't call btrfs_search_slot
>   */
> -static noinline_for_stack int
> -setup_items_for_insert(struct btrfs_trans_handle *trans,
> -		      struct btrfs_root *root, struct btrfs_path *path,
> -		      struct btrfs_key *cpu_key, u32 *data_size,
> -		      u32 total_data, u32 total_size, int nr)
> +int setup_items_for_insert(struct btrfs_trans_handle *trans,
> +			   struct btrfs_root *root, struct btrfs_path *path,
> +			   struct btrfs_key *cpu_key, u32 *data_size,
> +			   u32 total_data, u32 total_size, int nr)
>  {
>  	struct btrfs_item *item;
>  	int i;
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 7219537..7f6fa3c 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -859,6 +859,7 @@ struct btrfs_block_group_cache {
>  struct reloc_control;
>  struct btrfs_device;
>  struct btrfs_fs_devices;
> +struct btrfs_delayed_root;
>  struct btrfs_fs_info {
>  	u8 fsid[BTRFS_FSID_SIZE];
>  	u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
> @@ -885,7 +886,10 @@ struct btrfs_fs_info {
>  	/* logical->physical extent mapping */
>  	struct btrfs_mapping_tree mapping_tree;
>  
> -	/* block reservation for extent, checksum and root tree */
> +	/*
> +	 * block reservation for extent, checksum, root tree and
> +	 * delayed dir index item
> +	 */
>  	struct btrfs_block_rsv global_block_rsv;
>  	/* block reservation for delay allocation */
>  	struct btrfs_block_rsv delalloc_block_rsv;
> @@ -1012,6 +1016,7 @@ struct btrfs_fs_info {
>  	 * for the sys_munmap function call path
>  	 */
>  	struct btrfs_workers fixup_workers;
> +	struct btrfs_workers delayed_workers;
>  	struct task_struct *transaction_kthread;
>  	struct task_struct *cleaner_kthread;
>  	int thread_pool_size;
> @@ -1069,6 +1074,8 @@ struct btrfs_fs_info {
>  
>  	/* filesystem state */
>  	u64 fs_state;
> +
> +	struct btrfs_delayed_root *delayed_root;
>  };
>  
>  /*
> @@ -2085,6 +2092,13 @@ static inline bool btrfs_mixed_space_info(struct btrfs_space_info *space_info)
>  }
>  
>  /* extent-tree.c */
> +static inline u64 btrfs_calc_trans_metadata_size(struct btrfs_root *root,
> +						 int num_items)
> +{
> +	return (root->leafsize + root->nodesize * (BTRFS_MAX_LEVEL - 1)) *
> +		3 * num_items;

Can this calculation be documented?
 
> +}
> +
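For what it's worth, plugging in the default 4 KiB leaf/node size and
BTRFS_MAX_LEVEL == 8 gives ~96 KiB reserved per item. A standalone
restatement below; my reading is that it reserves room to COW one
full-height path in up to three trees, hence the factor of 3, but that's a
guess and exactly what the comment should spell out:

```c
#include <assert.h>
#include <stdint.h>

#define BTRFS_MAX_LEVEL 8	/* value from ctree.h */

/* Standalone restatement of btrfs_calc_trans_metadata_size(); my
 * assumption: one leaf plus (BTRFS_MAX_LEVEL - 1) nodes per tree path,
 * times 3 trees, times the number of items. */
static uint64_t calc_trans_metadata_size(uint64_t leafsize,
					 uint64_t nodesize, int num_items)
{
	return (leafsize + nodesize * (BTRFS_MAX_LEVEL - 1)) *
		3 * num_items;
}
```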
>  void btrfs_put_block_group(struct btrfs_block_group_cache *cache);
>  int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
>  			   struct btrfs_root *root, unsigned long count);
> @@ -2285,6 +2299,10 @@ static inline int btrfs_del_item(struct btrfs_trans_handle *trans,
>  	return btrfs_del_items(trans, root, path, path->slots[0], 1);
>  }
>  
> +int setup_items_for_insert(struct btrfs_trans_handle *trans,
> +			   struct btrfs_root *root, struct btrfs_path *path,
> +			   struct btrfs_key *cpu_key, u32 *data_size,
> +			   u32 total_data, u32 total_size, int nr);
>  int btrfs_insert_item(struct btrfs_trans_handle *trans, struct btrfs_root
>  		      *root, struct btrfs_key *key, void *data, u32 data_size);
>  int btrfs_insert_some_items(struct btrfs_trans_handle *trans,
> @@ -2346,7 +2364,7 @@ int btrfs_set_root_node(struct btrfs_root_item *item,
>  /* dir-item.c */
>  int btrfs_insert_dir_item(struct btrfs_trans_handle *trans,
>  			  struct btrfs_root *root, const char *name,
> -			  int name_len, u64 dir,
> +			  int name_len, struct inode *dir,
>  			  struct btrfs_key *location, u8 type, u64 index);
>  struct btrfs_dir_item *btrfs_lookup_dir_item(struct btrfs_trans_handle *trans,
>  					     struct btrfs_root *root,
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> new file mode 100644
> index 0000000..a377d3b
> --- /dev/null
> +++ b/fs/btrfs/delayed-inode.c
> @@ -0,0 +1,1546 @@
> +/*
> + * Copyright (C) 2011 Fujitsu.  All rights reserved.
> + * Written by Miao Xie <miaox@xxxxxxxxxxxxxx>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; if not, write to the
> + * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
> + * Boston, MA 021110-1307, USA.
> + */
> +
> +#include "delayed-inode.h"
> +#include "disk-io.h"
> +#include "transaction.h"
> +
> +#define BTRFS_DELAYED_LOWER		50
> +#define BTRFS_DELAYED_WRITEBACK		200
> +#define BTRFS_DELAYED_BACKGROUND	100
> +
> +static inline void btrfs_init_delayed_node(
> +				struct btrfs_delayed_node *delayed_node,
> +				u64 root_id, u64 inode_id)
> +{
> +	delayed_node->root_id = root_id;
> +	delayed_node->inode_id = inode_id;
> +	atomic_set(&delayed_node->refs, 0);
> +	delayed_node->count = 0;
> +	delayed_node->in_list = 0;
> +	delayed_node->inode_dirty = 0;
> +	delayed_node->delayed_doing = 0;
> +	delayed_node->ins_root = RB_ROOT;
> +	delayed_node->del_root = RB_ROOT;
> +	mutex_init(&delayed_node->mutex);
> +	delayed_node->index_cnt = 0;
> +	INIT_LIST_HEAD(&delayed_node->list);
> +	spin_lock_init(&delayed_node->lock);
> +	delayed_node->bytes_reserved = 0;
> +	delayed_node->block_rsv = NULL;
> +}
> +
> +static inline int btrfs_is_continuous_delayed_item(
> +					struct btrfs_delayed_item *item1,
> +					struct btrfs_delayed_item *item2)
> +{
> +	if (item1->key.type == BTRFS_DIR_INDEX_KEY &&
> +	    item1->key.objectid == item2->key.objectid &&
> +	    item1->key.type == item2->key.type &&
> +	    item1->key.offset + 1 == item2->key.offset)
> +		return 1;
> +	return 0;
> +}
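To spell out the predicate: two delayed items are "continuous" when both
are dir index keys of the same directory inode with adjacent offsets. A
userspace restatement with a minimal stand-in for struct btrfs_key (the
key type value 96 is BTRFS_DIR_INDEX_KEY from ctree.h):

```c
#include <assert.h>
#include <stdint.h>

#define BTRFS_DIR_INDEX_KEY 96	/* key type value from ctree.h */

/* Minimal stand-in for struct btrfs_key */
struct key { uint64_t objectid; uint8_t type; uint64_t offset; };

/* Restatement of btrfs_is_continuous_delayed_item(): adjacent dir
 * index keys belonging to the same directory inode. */
static int keys_continuous(const struct key *a, const struct key *b)
{
	return a->type == BTRFS_DIR_INDEX_KEY &&
	       a->objectid == b->objectid &&
	       a->type == b->type &&
	       a->offset + 1 == b->offset;
}
```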
> +
> +static inline struct btrfs_delayed_root *btrfs_get_delayed_root(
> +							struct btrfs_root *root)
> +{
> +	return root->fs_info->delayed_root;
> +}
> +
> +static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node(
> +							struct inode *inode)
> +{
> +	struct btrfs_delayed_node *node;
> +	struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
> +	struct btrfs_root *root = btrfs_inode->root;
> +
> +	if (btrfs_inode->delayed_node) {
> +		node = btrfs_inode->delayed_node;
> +		atomic_inc(&node->refs);	/* can be accessed */
> +		return node;
> +	}
> +
> +	node = kmalloc(sizeof(*node), GFP_NOFS);

What about putting btrfs_delayed_node into a kmem cache? That could speed
up rapid insert/delete sequences.

> +	if (!node)
> +		return ERR_PTR(-ENOMEM);
> +	btrfs_init_delayed_node(node, root->objectid, inode->i_ino);
> +
> +	btrfs_inode->delayed_node = node;
> +	node->delayed_root = btrfs_get_delayed_root(root);
> +	atomic_inc(&node->refs);	/* cached in the btrfs inode */
> +	atomic_inc(&node->refs);	/* can be accessed */
> +
> +	return node;
> +}
> +
> +/* Call it when holding delayed_node->mutex */
> +static void btrfs_queue_delayed_node(struct btrfs_delayed_root *root,
> +				     struct btrfs_delayed_node *node)
> +{
> +	spin_lock(&root->lock);
> +	if (node->in_list)
> +		list_move_tail(&node->list, &root->node_list);
> +	else {
> +		list_add_tail(&node->list, &root->node_list);
> +		atomic_inc(&node->refs);	/* inserted into rb-tree */
> +		root->nodes++;
> +		node->in_list = 1;
> +	}
> +	spin_unlock(&root->lock);
> +}
> +
> +/* Call it when holding delayed_node->mutex */
> +static void btrfs_dequeue_delayed_node(struct btrfs_delayed_root *root,
> +				       struct btrfs_delayed_node *node)
> +{
> +	spin_lock(&root->lock);
> +	if (node->in_list) {
> +		root->nodes--;
> +		atomic_dec(&node->refs);	/* not in the rb-tree */
> +		list_del_init(&node->list);
> +		node->in_list = 0;
> +	}
> +	spin_unlock(&root->lock);
> +}
> +
> +static void btrfs_release_delayed_node(struct btrfs_delayed_node *delayed_node)
> +{
> +	struct btrfs_delayed_root *delayed_root;
> +
> +	if (!delayed_node)
> +		return;
> +
> +	delayed_root = delayed_node->delayed_root;
> +
> +	mutex_lock(&delayed_node->mutex);
> +	if (delayed_node->count)
> +		btrfs_queue_delayed_node(delayed_root, delayed_node);
> +	else
> +		btrfs_dequeue_delayed_node(delayed_root, delayed_node);
> +	mutex_unlock(&delayed_node->mutex);
> +
> +	if (atomic_dec_and_test(&delayed_node->refs))
> +		kfree(delayed_node);
> +}
> +
> +struct btrfs_delayed_node *btrfs_first_delayed_node(
> +			struct btrfs_delayed_root *delayed_root)
> +{
> +	struct list_head *p;
> +	struct btrfs_delayed_node *node = NULL;
> +
> +	spin_lock(&delayed_root->lock);
> +	if (list_empty(&delayed_root->node_list))
> +		goto out;
> +
> +	p = delayed_root->node_list.next;
> +	node = list_entry(p, struct btrfs_delayed_node, list);
> +	atomic_inc(&node->refs);
> +out:
> +	spin_unlock(&delayed_root->lock);
> +
> +	return node;
> +}
> +
> +struct btrfs_delayed_node *btrfs_next_delayed_node(
> +						struct btrfs_delayed_node *node)
> +{
> +	struct btrfs_delayed_root *delayed_root = node->delayed_root;
> +	struct list_head *p;
> +	struct btrfs_delayed_node *next = NULL;
> +
> +	spin_lock(&delayed_root->lock);
> +	if (!node->in_list) {	/* not in the list */
> +		if (list_empty(&delayed_root->node_list))
> +			goto out;
> +		p = delayed_root->node_list.next;
> +	} else if (list_is_last(&node->list, &delayed_root->node_list))
> +		goto out;
> +	else
> +		p = node->list.next;
> +
> +	next = list_entry(p, struct btrfs_delayed_node, list);
> +	atomic_inc(&next->refs);
> +out:
> +	spin_unlock(&delayed_root->lock);
> +
> +	return next;
> +}
> +
> +struct btrfs_delayed_item *btrfs_alloc_delayed_item(int data_len)

'unsigned int data_len', and then you can drop the (u32) cast below

> +{
> +	struct btrfs_delayed_item *item;
> +	item = kmalloc(sizeof(*item) + data_len, GFP_NOFS);
> +	if (item) {
> +		item->data_len = (u32)data_len;
> +		item->ins_or_del = 0;
> +		item->bytes_reserved = 0;
> +		item->block_rsv = NULL;
> +		item->delayed_node = NULL;
> +	}
> +	return item;
> +}
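A side note on the allocation pattern here: header plus payload come from a
single kmalloc(), with the payload stored inline after the struct. A
userspace sketch of the same pattern, using a C99 flexible array member
(the names below are mine, not the patch's):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace sketch of the single-allocation pattern used by
 * btrfs_alloc_delayed_item(): header and payload in one allocation,
 * payload in a trailing flexible array member. */
struct delayed_item {
	unsigned int data_len;
	char data[];		/* C99 flexible array member */
};

static struct delayed_item *alloc_delayed_item(const void *src,
					       unsigned int data_len)
{
	struct delayed_item *item = malloc(sizeof(*item) + data_len);
	if (!item)
		return NULL;
	item->data_len = data_len;
	memcpy(item->data, src, data_len);
	return item;
}
```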
> +
> +/*
> + * __btrfs_lookup_delayed_item - look up the delayed item by key
> + * @delayed_node: pointer of the delayed node
                             to

> + * @key:	  the key to search
                             look up

> + * @prev:	  used to store the prev item if the right item isn't found
> + * @next:	  used to store the next item if the right item isn't found
> + *
> + * Note: if we don't find the right item, we will return the prev item and
> + * the next item.
> + */
> +static struct btrfs_delayed_item *__btrfs_lookup_delayed_item(
> +				struct rb_root *root,
> +				struct btrfs_key *key,
> +				struct btrfs_delayed_item **prev,
> +				struct btrfs_delayed_item **next)
> +{
> +	struct rb_node *node, *prev_node = NULL;
> +	struct btrfs_delayed_item *delayed_item = NULL;
> +	int ret = 0;
> +
> +	node = root->rb_node;
> +
> +	while (node) {
> +		delayed_item = rb_entry(node, struct btrfs_delayed_item,
> +					rb_node);
> +		prev_node = node;
> +		ret = btrfs_comp_cpu_keys(&delayed_item->key, key);
> +		if (ret < 0)
> +			node = node->rb_right;
> +		else if (ret > 0)
> +			node = node->rb_left;
> +		else
> +			return delayed_item;
> +	}
> +
> +	if (prev) {
> +		if (!prev_node)
> +			*prev = NULL;
> +		else if (ret < 0)
> +			*prev = delayed_item;
> +		else if ((node = rb_prev(prev_node)) != NULL) {
> +			*prev = rb_entry(node, struct btrfs_delayed_item,
> +					 rb_node);
> +		} else
> +			*prev = NULL;
> +	}
> +
> +	if (next) {
> +		if (!prev_node)
> +			*next = NULL;
> +		else if (ret > 0)
> +			*next = delayed_item;
> +		else if ((node = rb_next(prev_node)) != NULL) {
> +			*next = rb_entry(node, struct btrfs_delayed_item,
> +					 rb_node);
> +		} else
> +			*next = NULL;
> +	}
> +	return NULL;
> +}
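On the prev/next contract: on a miss the function hands back the nearest
neighbours on either side of the key. The same contract over a sorted
array, just to illustrate the intended semantics (my own illustration, not
the rb-tree code itself):

```c
#include <assert.h>

/* Illustration of the __btrfs_lookup_delayed_item() contract over a
 * sorted int array: on a hit, return the index; on a miss, return -1
 * and report the nearest smaller (prev) and larger (next) elements. */
static int lookup_with_neighbours(const int *arr, int n, int key,
				  int *prev, int *next)
{
	int lo = 0, hi = n - 1;

	*prev = *next = -1;		/* -1 means "no such neighbour" */
	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;

		if (arr[mid] < key) {
			*prev = arr[mid];	/* nearest smaller so far */
			lo = mid + 1;
		} else if (arr[mid] > key) {
			*next = arr[mid];	/* nearest larger so far */
			hi = mid - 1;
		} else {
			return mid;		/* exact match */
		}
	}
	return -1;			/* miss: prev/next carry neighbours */
}
```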
> +
> +struct btrfs_delayed_item *__btrfs_lookup_delayed_insertion_item(
> +					struct btrfs_delayed_node *delayed_node,
> +					struct btrfs_key *key)
> +{
> +	struct btrfs_delayed_item *item;
> +
> +	item = __btrfs_lookup_delayed_item(&delayed_node->ins_root, key,
> +					   NULL, NULL);
> +	return item;
> +}
> +
> +struct btrfs_delayed_item *__btrfs_lookup_delayed_deletion_item(
> +					struct btrfs_delayed_node *delayed_node,
> +					struct btrfs_key *key)
> +{
> +	struct btrfs_delayed_item *item;
> +
> +	item = __btrfs_lookup_delayed_item(&delayed_node->del_root, key,
> +					   NULL, NULL);
> +	return item;
> +}
> +
> +struct btrfs_delayed_item *__btrfs_search_delayed_insertion_item(
> +					struct btrfs_delayed_node *delayed_node,
> +					struct btrfs_key *key)
> +{
> +	struct btrfs_delayed_item *item, *next;
> +
> +	item = __btrfs_lookup_delayed_item(&delayed_node->ins_root, key,
> +					   NULL, &next);
> +	if (!item)
> +		item = next;
> +
> +	return item;
> +}
> +
> +struct btrfs_delayed_item *__btrfs_search_delayed_deletion_item(
> +					struct btrfs_delayed_node *delayed_node,
> +					struct btrfs_key *key)
> +{
> +	struct btrfs_delayed_item *item, *next;
> +
> +	item = __btrfs_lookup_delayed_item(&delayed_node->del_root, key,
> +					   NULL, &next);
> +	if (!item)
> +		item = next;
> +
> +	return item;
> +}
> +
> +static int __btrfs_add_delayed_item(struct btrfs_delayed_node *delayed_node,
> +				    struct btrfs_delayed_item *ins,
> +				    int action)
> +{
> +	struct rb_node **p, *node;
> +	struct rb_node *parent_node = NULL;
> +	struct rb_root *root;
> +	struct btrfs_delayed_item *item;
> +	int cmp;
> +
> +	if (action == BTRFS_DELAYED_INSERTION_ITEM)
> +		root = &delayed_node->ins_root;
> +	else if (action == BTRFS_DELAYED_DELETION_ITEM)
> +		root = &delayed_node->del_root;
> +	else
> +		BUG();
> +	p = &root->rb_node;
> +	node = &ins->rb_node;
> +
> +	while (*p) {
> +		parent_node = *p;
> +		item = rb_entry(parent_node, struct btrfs_delayed_item,
> +				 rb_node);
> +
> +		cmp = btrfs_comp_cpu_keys(&item->key, &ins->key);
> +		if (cmp < 0)
> +			p = &(*p)->rb_right;
> +		else if (cmp > 0)
> +			p = &(*p)->rb_left;
> +		else
> +			return -EEXIST;
> +	}
> +
> +	rb_link_node(node, parent_node, p);
> +	rb_insert_color(node, root);
> +	ins->delayed_node = delayed_node;
> +	ins->ins_or_del = action;
> +
> +	if (ins->key.type == BTRFS_DIR_INDEX_KEY &&
> +	    action == BTRFS_DELAYED_INSERTION_ITEM &&
> +	    ins->key.offset >= delayed_node->index_cnt)
> +			delayed_node->index_cnt = ins->key.offset + 1;
> +
> +	delayed_node->count++;
> +	atomic_inc(&delayed_node->delayed_root->items);
> +	return 0;
> +}
> +
> +static int __btrfs_add_delayed_insertion_item(struct btrfs_delayed_node *node,
> +					      struct btrfs_delayed_item *item)
> +{
> +	return __btrfs_add_delayed_item(node, item,
> +					BTRFS_DELAYED_INSERTION_ITEM);
> +}
> +
> +static int __btrfs_add_delayed_deletion_item(struct btrfs_delayed_node *node,
> +					     struct btrfs_delayed_item *item)
> +{
> +	return __btrfs_add_delayed_item(node, item,
> +					BTRFS_DELAYED_DELETION_ITEM);
> +}
> +
> +static void __btrfs_remove_delayed_item(struct btrfs_delayed_item *delayed_item)
> +{
> +	struct rb_root *root;
> +	struct btrfs_delayed_root *delayed_root;
> +
> +	delayed_root = delayed_item->delayed_node->delayed_root;
> +
> +	BUG_ON(!delayed_root);
> +	BUG_ON(delayed_item->ins_or_del != BTRFS_DELAYED_DELETION_ITEM &&
> +	       delayed_item->ins_or_del != BTRFS_DELAYED_INSERTION_ITEM);
> +
> +	if (delayed_item->ins_or_del == BTRFS_DELAYED_INSERTION_ITEM)
> +		root = &delayed_item->delayed_node->ins_root;
> +	else
> +		root = &delayed_item->delayed_node->del_root;
> +
> +	rb_erase(&delayed_item->rb_node, root);
> +	delayed_item->delayed_node->count--;
> +	atomic_dec(&delayed_root->items);
> +	if (atomic_read(&delayed_root->items) < BTRFS_DELAYED_LOWER &&
> +	    waitqueue_active(&delayed_root->wait))
> +		wake_up(&delayed_root->wait);
> +}
> +
> +void btrfs_free_delayed_item(struct btrfs_delayed_item *item) {
> +	kfree(item);
> +}

Is this used outside of delayed-inode.c? Its only use is in the next
function, so I suggest making it at least 'static inline', or removing it
entirely.

> +
> +void btrfs_release_delayed_item(struct btrfs_delayed_item *item)
> +{
> +	if (item) {
> +		__btrfs_remove_delayed_item(item);
> +		btrfs_free_delayed_item(item);
> +	}
> +}
> +
> +struct btrfs_delayed_item *__btrfs_first_delayed_insertion_item(
> +					struct btrfs_delayed_node *delayed_node)
> +{
> +	struct rb_node *p;
> +	struct btrfs_delayed_item *item = NULL;
> +
> +	p = rb_first(&delayed_node->ins_root);
> +	if (p)
> +		item = rb_entry(p, struct btrfs_delayed_item, rb_node);
> +
> +	return item;
> +}
> +
> +struct btrfs_delayed_item *__btrfs_first_delayed_deletion_item(
> +					struct btrfs_delayed_node *delayed_node)
> +{
> +	struct rb_node *p;
> +	struct btrfs_delayed_item *item = NULL;
> +
> +	p = rb_first(&delayed_node->del_root);
> +	if (p)
> +		item = rb_entry(p, struct btrfs_delayed_item, rb_node);
> +
> +	return item;
> +}
> +
> +struct btrfs_delayed_item *__btrfs_next_delayed_item(
> +						struct btrfs_delayed_item *item)
> +{
> +	struct rb_node *p;
> +	struct btrfs_delayed_item *next = NULL;
> +
> +	p = rb_next(&item->rb_node);
> +	if (p)
> +		next = rb_entry(p, struct btrfs_delayed_item, rb_node);
> +
> +	return next;
> +}
> +
> +static inline struct btrfs_delayed_node *btrfs_get_delayed_node(
> +							struct inode *inode)
> +{
> +	struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
> +	struct btrfs_delayed_node *delayed_node;
> +
> +	delayed_node = btrfs_inode->delayed_node;
> +	if (delayed_node)
> +		atomic_inc(&delayed_node->refs);
> +
> +	return delayed_node;
> +}
> +
> +static inline struct btrfs_root *btrfs_get_fs_root(struct btrfs_root *root,
> +						   u64 root_id)
> +{
> +	struct btrfs_key root_key;
> +
> +	if (root->objectid == root_id)
> +		return root;
> +
> +	root_key.objectid = root_id;
> +	root_key.type = BTRFS_ROOT_ITEM_KEY;
> +	root_key.offset = (u64)-1;
> +	return btrfs_read_fs_root_no_name(root->fs_info, &root_key);
> +}
> +
> +int btrfs_delayed_item_reserve_metadata(struct btrfs_trans_handle *trans,
> +					struct btrfs_root *root,
> +					struct btrfs_delayed_item *item)
> +{
> +	struct btrfs_block_rsv *src_rsv;
> +	struct btrfs_block_rsv *dst_rsv;
> +	u64 num_bytes;
> +	int ret;
> +
> +	if (!trans->bytes_reserved)
> +		return 0;
> +
> +	src_rsv = trans->block_rsv;
> +	dst_rsv = &root->fs_info->global_block_rsv;
> +
> +	num_bytes = btrfs_calc_trans_metadata_size(root, 1);
> +	ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes);
> +	if (!ret) {
> +		item->bytes_reserved = num_bytes;
> +		item->block_rsv = dst_rsv;
> +	}
> +
> +	return ret;
> +}
> +
> +void btrfs_delayed_item_release_metadata(struct btrfs_root *root,
> +					 struct btrfs_delayed_item *item)

make it static

> +{
> +	if (!item->bytes_reserved)
> +		return;
> +
> +	btrfs_block_rsv_release(root, item->block_rsv,
> +				item->bytes_reserved);
> +}
> +
> +int btrfs_delayed_inode_reserve_metadata(struct btrfs_trans_handle *trans,
> +					 struct btrfs_root *root,
> +					 struct btrfs_delayed_node *node)
> +{
> +	struct btrfs_block_rsv *src_rsv;
> +	struct btrfs_block_rsv *dst_rsv;
> +	u64 num_bytes;
> +	int ret;
> +
> +	if (!trans->bytes_reserved)
> +		return 0;
> +
> +	src_rsv = trans->block_rsv;
> +	dst_rsv = &root->fs_info->global_block_rsv;
> +
> +	num_bytes = btrfs_calc_trans_metadata_size(root, 1);
> +	ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes);
> +	if (!ret) {
> +		node->bytes_reserved = num_bytes;
> +		node->block_rsv = dst_rsv;
> +	}
> +
> +	return ret;
> +}
> +
> +void btrfs_delayed_inode_release_metadata(struct btrfs_root *root,
> +					  struct btrfs_delayed_node *node)
> +{
> +	if (!node->bytes_reserved)
> +		return;
> +
> +	btrfs_block_rsv_release(root, node->block_rsv,
> +				node->bytes_reserved);
> +	node->bytes_reserved = 0;
> +}
> +
> +/*
> + * btrfs_batch_insert_dir_index_items - batch insert some continuous dir index
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
does not match the function name below, and a kernel-doc style comment does
not seem useful for a static function (i.e. not part of an API)

> + *					items into the same leaf
> + *
> + * This function will insert some dir index items into the same leaf according
> + * to the free space of the leaf.
> + */

I don't think the same information needs to be repeated three times.

> +static int btrfs_batch_insert_items(struct btrfs_trans_handle *trans,
> +				struct btrfs_root *root,
> +				struct btrfs_path *path,
> +				struct btrfs_delayed_item *item)
> +{
> +	struct btrfs_delayed_item *curr, *next;
> +	int free_space;
> +	int total_data_size = 0, total_size = 0;
> +	struct extent_buffer *leaf;
> +	char *data_ptr;
> +	struct btrfs_key *keys;
> +	u32 *data_size;
> +	struct list_head head;
> +	int slot;
> +	int nitems;
> +	int i;
> +	int ret = 0;
> +
> +	BUG_ON(!path->nodes[0]);
> +
> +	leaf = path->nodes[0];
> +	free_space = btrfs_leaf_free_space(root, leaf);
> +	INIT_LIST_HEAD(&head);
> +
> +	next = item;
> +
> +	/*
> +	 * count the number of the dir index items that we can insert them in
> +	 * batch

"count the number of dir index items that we can insert in batch"

> +	 */
> +	while (total_size + next->data_len + sizeof(struct btrfs_item) <=
> +	       free_space) {
> +		total_data_size += next->data_len;
> +		total_size += next->data_len + sizeof(struct btrfs_item);
> +		list_add_tail(&next->tree_list, &head);
> +		nitems++;
> +
> +		curr = next;
> +		next = __btrfs_next_delayed_item(curr);
> +		if (!next)
> +			break;
> +
> +		if (!btrfs_is_continuous_delayed_item(curr, next))
> +			break;
> +	}
> +
> +	if (!nitems) {
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	keys = kmalloc(sizeof(struct btrfs_key) * nitems, GFP_NOFS);
> +	if (!keys) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	data_size = kmalloc(sizeof(u32) * nitems, GFP_NOFS);
> +	if (!data_size) {
> +		ret = -ENOMEM;
> +		goto error;
> +	}
> +
> +	/* get keys of all the delayed items */
> +	i = 0;
> +	list_for_each_entry(next, &head, tree_list) {
> +		keys[i] = next->key;
> +		data_size[i] = next->data_len;
> +		i++;
> +	}
> +
> +	/* insert the keys of the items */
> +	ret = setup_items_for_insert(trans, root, path, keys, data_size,
> +				     total_data_size, total_size, nitems);
> +	if (ret)
> +		goto error;
> +
> +	/* insert the dir index items */
> +	slot = path->slots[0];
> +	list_for_each_entry_safe(curr, next, &head, tree_list) {
> +		data_ptr = btrfs_item_ptr(leaf, slot, char);
> +		write_extent_buffer(leaf, &curr->data,
> +				    (unsigned long)data_ptr,
> +				    curr->data_len);
> +		slot++;
> +
> +		btrfs_delayed_item_release_metadata(root, curr);
> +
> +		list_del(&curr->tree_list);
> +		btrfs_release_delayed_item(curr);
> +	}
> +
> +error:
> +	kfree(data_size);
> +	kfree(keys);
> +out:
> +	return ret;
> +}
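The counting loop at the top can be checked in isolation: it keeps taking
consecutive items while payload plus per-item header still fits in the
leaf's free space. One thing I noticed while writing the sketch: unless I'm
misreading, nitems above is used without being initialised ('int nitems;'
followed by 'nitems++'), so it needs to start at 0:

```c
#include <assert.h>

/* Standalone restatement of the counting loop in
 * btrfs_batch_insert_items(): count how many consecutive payloads fit
 * in free_space when each costs its length plus a fixed per-item
 * header (sizeof(struct btrfs_item) in the real code). */
static int count_batchable(const int *data_len, int n, int free_space,
			   int item_header)
{
	int nitems = 0, total = 0;	/* note: must start at zero */
	int i;

	for (i = 0; i < n; i++) {
		if (total + data_len[i] + item_header > free_space)
			break;
		total += data_len[i] + item_header;
		nitems++;
	}
	return nitems;
}
```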
> +
> +/*
> + * This helper can just do simple insertion that needn't extend item for new
> + * data, such as directory name index insertion, inode insertion.
> + */
> +int btrfs_insert_delayed_item(struct btrfs_trans_handle *trans,

make it 'static'

> +			      struct btrfs_root *root,
> +			      struct btrfs_path *path,
> +			      struct btrfs_delayed_item *delayed_item)
> +{
> +	struct extent_buffer *leaf;
> +	struct btrfs_item *item;
> +	char *ptr;
> +	int ret;
> +
> +	ret = btrfs_insert_empty_item(trans, root, path, &delayed_item->key,
> +				      delayed_item->data_len);
> +	if (ret < 0 && ret != -EEXIST)
> +		return ret;
> +
> +	leaf = path->nodes[0];
> +
> +	item = btrfs_item_nr(leaf, path->slots[0]);
> +	ptr = btrfs_item_ptr(leaf, path->slots[0], char);
> +
> +	write_extent_buffer(leaf, delayed_item->data, (unsigned long)ptr,
> +			    delayed_item->data_len);
> +	btrfs_mark_buffer_dirty(leaf);
> +
> +	btrfs_delayed_item_release_metadata(root, delayed_item);
> +	return 0;
> +}
> +
> +/*
> + * we insert a item first, then if there are some continuous items, we try
                an

> + * to insert those items into the same leaf.
> + */
> +static int btrfs_insert_delayed_items(struct btrfs_trans_handle *trans,
> +				      struct btrfs_path *path,
> +				      struct btrfs_root *root,
> +				      struct btrfs_delayed_node *node)
> +{
> +	struct btrfs_delayed_item *curr, *prev;
> +	int ret = 0;
> +
> +do_again:
> +	mutex_lock(&node->mutex);
> +	curr = __btrfs_first_delayed_insertion_item(node);
> +	if (!curr)
> +		goto insert_end;
> +
> +	ret = btrfs_insert_delayed_item(trans, root, path, curr);
> +	if (ret < 0) {
> +		btrfs_release_path(root, path);
> +		goto insert_end;
> +	}
> +
> +	prev = curr;
> +	curr = __btrfs_next_delayed_item(prev);
> +	if (curr && btrfs_is_continuous_delayed_item(prev, curr)) {
> +		/* insert the coninuous items into the same leaf */
                              ^^^^^^^^^

> +		path->slots[0]++;
> +		btrfs_batch_insert_items(trans, root, path, curr);
> +	}
> +	btrfs_release_delayed_item(prev);
> +	btrfs_mark_buffer_dirty(path->nodes[0]);
> +
> +	btrfs_release_path(root, path);
> +	mutex_unlock(&node->mutex);
> +	goto do_again;
> +
> +insert_end:
> +	mutex_unlock(&node->mutex);
> +	return ret;
> +}
> +
> +static int btrfs_batch_delete_items(struct btrfs_trans_handle *trans,
> +				    struct btrfs_root *root,
> +				    struct btrfs_path *path,
> +				    struct btrfs_delayed_item *item)
> +{
> +	struct btrfs_delayed_item *curr, *next;
> +	struct extent_buffer *leaf;
> +	struct btrfs_key key;
> +	struct list_head head;
> +	int nitems, i, last_item;
> +	int ret = 0;
> +
> +	BUG_ON(!path->nodes[0]);
> +
> +	leaf = path->nodes[0];
> +
> +	i = path->slots[0];
> +	last_item = btrfs_header_nritems(leaf) - 1;
> +	if (i > last_item)
> +		return -ENOENT;		/* Is errno suitable? */

I don't know, but I'd put a FIXME there :)

> +
> +	next = item;
> +	INIT_LIST_HEAD(&head);
> +	btrfs_item_key_to_cpu(leaf, &key, i);
> +	nitems = 0;
> +	/*
> +	 * count the number of the dir index items that we can delete them in
> +	 * batch
> +	 */

Use the same wording as above.

> +	while (btrfs_comp_cpu_keys(&next->key, &key) == 0) {
> +		list_add_tail(&next->tree_list, &head);
> +		nitems++;
> +
> +		curr = next;
> +		next = __btrfs_next_delayed_item(curr);
> +		if (!next)
> +			break;
> +
> +		if (!btrfs_is_continuous_delayed_item(curr, next))
> +			break;
> +
> +		i++;
> +		if (i > last_item)
> +			break;
> +		btrfs_item_key_to_cpu(leaf, &key, i);
> +	}
> +
> +	if (!nitems)
> +		return 0;
> +
> +	ret = btrfs_del_items(trans, root, path, path->slots[0], nitems);
> +	if (ret)
> +		goto out;
> +
> +	list_for_each_entry_safe(curr, next, &head, tree_list) {
> +		btrfs_delayed_item_release_metadata(root, curr);
> +		list_del(&curr->tree_list);
> +		btrfs_release_delayed_item(curr);
> +	}
> +
> +out:
> +	return ret;
> +}
> +
> +static int btrfs_delete_delayed_items(struct btrfs_trans_handle *trans,
> +				      struct btrfs_path *path,
> +				      struct btrfs_root *root,
> +				      struct btrfs_delayed_node *node)
> +{
> +	struct btrfs_delayed_item *curr, *prev;
> +	int ret = 0;
> +
> +do_again:
> +	mutex_lock(&node->mutex);
> +	curr = __btrfs_first_delayed_deletion_item(node);
> +	if (!curr)
> +		goto delete_fail;
> +
> +	ret = btrfs_search_slot(trans, root, &curr->key, path, -1, 1);
> +	if (ret < 0)
> +		goto delete_fail;
> +	else if (ret > 0) {
> +		/*
> +		 * can't find the item which the node points to, so this node
> +		 * is invalid, just drop it.
> +		 */
> +		prev = curr;
> +		curr = __btrfs_next_delayed_item(prev);
> +		btrfs_release_delayed_item(prev);
> +		ret = 0;
> +		btrfs_release_path(root, path);
> +		if (curr)
> +			goto do_again;
> +		else
> +			goto delete_fail;
> +	}
> +
> +	btrfs_batch_delete_items(trans, root, path, curr);
> +	btrfs_release_path(root, path);
> +	mutex_unlock(&node->mutex);
> +	goto do_again;
> +
> +delete_fail:
> +	btrfs_release_path(root, path);
> +	mutex_unlock(&node->mutex);
> +	return ret;
> +}
> +
> +static void btrfs_release_delayed_inode(struct btrfs_delayed_node *delayed_node)
> +{
> +	if (delayed_node && delayed_node->inode_dirty) {
> +		BUG_ON(!delayed_node->delayed_root);
> +		delayed_node->inode_dirty = 0;
> +		delayed_node->count--;
> +		atomic_dec(&delayed_node->delayed_root->items);
> +		if (atomic_read(&delayed_node->delayed_root->items) <
> +		    BTRFS_DELAYED_LOWER &&
> +		    waitqueue_active(&delayed_node->delayed_root->wait))
> +			wake_up(&delayed_node->delayed_root->wait);
> +	}
> +}
> +
> +static int btrfs_update_delayed_inode(struct btrfs_trans_handle *trans,
> +				      struct btrfs_root *root,
> +				      struct btrfs_path *path,
> +				      struct btrfs_delayed_node *node)
> +{
> +	struct btrfs_key key;
> +	struct btrfs_inode_item *inode_item;
> +	struct extent_buffer *leaf;
> +	int ret;
> +
> +	mutex_lock(&node->mutex);
> +	if (!node->inode_dirty) {
> +		mutex_unlock(&node->mutex);
> +		return 0;
> +	}
> +
> +	key.objectid = node->inode_id;
> +	btrfs_set_key_type(&key, BTRFS_INODE_ITEM_KEY);
> +	key.offset = 0;
> +	ret = btrfs_lookup_inode(trans, root, path, &key, 1);
> +	if (ret > 0) {
> +		btrfs_release_path(root, path);
> +		mutex_unlock(&node->mutex);
> +		return -ENOENT;
> +	} else if (ret < 0) {
> +		mutex_unlock(&node->mutex);
> +		return ret;
> +	}
> +
> +	btrfs_unlock_up_safe(path, 1);
> +	leaf = path->nodes[0];
> +	inode_item = btrfs_item_ptr(leaf, path->slots[0],
> +				    struct btrfs_inode_item);
> +	write_extent_buffer(leaf, &node->inode_item, (unsigned long)inode_item,
> +			    sizeof(struct btrfs_inode_item));
> +	btrfs_mark_buffer_dirty(leaf);
> +	btrfs_release_path(root, path);
> +
> +	btrfs_delayed_inode_release_metadata(root, node);
> +	btrfs_release_delayed_inode(node);
> +	mutex_unlock(&node->mutex);
> +
> +	return 0;
> +}
> +
> +static int btrfs_prepare_run_delayed_node(struct btrfs_delayed_node *node)
> +{
> +	spin_lock(&node->lock);
> +	if (node->delayed_doing) {
> +		spin_unlock(&node->lock);
> +		return 1;
> +	}
> +	node->delayed_doing = 1;
> +	spin_unlock(&node->lock);
> +
> +	return 0;
> +}
> +
> +static inline void btrfs_end_run_delayed_node(struct btrfs_delayed_node *node)
> +{
> +	spin_lock(&node->lock);
> +	node->delayed_doing = 0;
> +	spin_unlock(&node->lock);
> +}
> +
> +/* Called when committing the transaction. */
> +int btrfs_run_delayed_items(struct btrfs_trans_handle *trans,
> +			    struct btrfs_root *root)
> +{
> +	struct btrfs_delayed_root *delayed_root;
> +	struct btrfs_delayed_node *curr_node, *prev_node;
> +	struct btrfs_path *path;
> +	int ret = 0;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +	path->leave_spinning = 1;
> +
> +	delayed_root = btrfs_get_delayed_root(root);
> +
> +	curr_node = btrfs_first_delayed_node(delayed_root);
> +	while (curr_node) {
> +		/*
> +		 * We needn't check this delayed node is being commited now or
> +		 * not, because when committing the transaction, just the
> +		 * current task can touch the delayed nodes, the other tasks
> +		 * cannot get the transaction handle and also cannot get the
> +		 * delayed nodes.
> +		 */
> +
> +		/*
> +		 * check root, if root is not the one which the delayed item
> +		 * wants to be inserted to, we get the right root.
> +		 */
> +		if (root->objectid != curr_node->root_id) {
> +			root = btrfs_get_fs_root(root, curr_node->root_id);
> +			BUG_ON(IS_ERR_OR_NULL(root));
> +		}
> +
> +		ret = btrfs_insert_delayed_items(trans, path, root,
> +						 curr_node);
> +		if (!ret)
> +			ret = btrfs_delete_delayed_items(trans, path, root,
> +							 curr_node);
> +		if (!ret)
> +			ret = btrfs_update_delayed_inode(trans, root, path,
> +							 curr_node);
> +		if (ret) {
> +			btrfs_release_delayed_node(curr_node);
> +			break;
> +		}
> +
> +		prev_node = curr_node;
> +		curr_node = btrfs_next_delayed_node(curr_node);
> +		btrfs_release_delayed_node(prev_node);
> +	}
> +
> +	btrfs_free_path(path);
> +	return ret;
> +}
> +
> +int btrfs_commit_inode_delayed_items(struct btrfs_trans_handle *trans,
> +				     struct btrfs_root *root,
> +				     struct inode *inode)
> +{
> +	struct btrfs_delayed_node *delayed_node = btrfs_get_delayed_node(inode);
> +	struct btrfs_path *path;
> +	int ret;
> +
> +	if (!delayed_node)
> +		return 0;
> +
> +	mutex_lock(&delayed_node->mutex);
> +	if (!delayed_node->count) {
> +		mutex_unlock(&delayed_node->mutex);
> +		btrfs_release_delayed_node(delayed_node);
> +		return 0;
> +	}
> +	mutex_unlock(&delayed_node->mutex);
> +
> +	path = btrfs_alloc_path();
> +	if (!path) {
> +		btrfs_release_delayed_node(delayed_node);
> +		return -ENOMEM;
> +	}
> +	path->leave_spinning = 1;
> +
> +	ret = btrfs_insert_delayed_items(trans, path, root, delayed_node);
> +	if (!ret)
> +		ret = btrfs_delete_delayed_items(trans, path, root,
> +						 delayed_node);
> +	if (!ret)
> +		ret = btrfs_update_delayed_inode(trans, root, path,
> +						 delayed_node);
> +	btrfs_free_path(path);
> +	btrfs_release_delayed_node(delayed_node);
> +	return ret;
> +}
> +
> +int btrfs_remove_delayed_node(struct inode *inode)
> +{
> +	struct btrfs_trans_handle *trans;
> +	struct btrfs_root *root = BTRFS_I(inode)->root;
> +	struct btrfs_delayed_node *delayed_node = BTRFS_I(inode)->delayed_node;
> +	int ret;
> +
> +	if (!delayed_node)
> +		return 0;
> +
> +	trans = btrfs_join_transaction(root, 0);
> +	if (IS_ERR(trans))
> +		return PTR_ERR(trans);
> +
> +	ret = btrfs_commit_inode_delayed_items(trans, root, inode);
> +	if (ret)
> +		goto out;
> +
> +	BUG_ON(delayed_node->in_list);
> +	BUG_ON(delayed_node->count);
> +
> +	BTRFS_I(inode)->delayed_node = NULL;
> +	btrfs_release_delayed_node(delayed_node);
> +out:
> +	btrfs_end_transaction(trans, root);
> +
> +	return ret;
> +}
> +
> +struct btrfs_async_delayed_node {
> +	struct btrfs_root *root;
> +	struct btrfs_delayed_node *delayed_node;
> +	struct btrfs_work work;
> +};
> +
> +static void btrfs_async_run_delayed_node_done(struct btrfs_work *work)
> +{
> +	struct btrfs_async_delayed_node *async_node;
> +	struct btrfs_trans_handle *trans;
> +	struct btrfs_path *path;
> +	struct btrfs_delayed_node *delayed_node;
> +	struct btrfs_root *root;
> +	int need_requeue = 1;
> +	int ret;
> +
> +	async_node = container_of(work, struct btrfs_async_delayed_node, work);
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		goto out;
> +	path->leave_spinning = 1;
> +
> +	root = async_node->root;
> +	BUG_ON(!root);
> +
> +	trans = btrfs_join_transaction(root, 0);
> +	if (IS_ERR(trans)) {
> +		btrfs_free_path(path);
remove

> +		goto out;
goto out_free_path;

> +	}
> +
> +	delayed_node = async_node->delayed_node;
> +
> +	root = btrfs_get_fs_root(async_node->root, delayed_node->root_id);
> +	BUG_ON(IS_ERR_OR_NULL(root));
> +
> +	ret = btrfs_insert_delayed_items(trans, path, root, delayed_node);
> +	if (!ret)
> +		ret = btrfs_delete_delayed_items(trans, path, root,
> +						 delayed_node);
> +
> +	if (!ret)
> +		btrfs_update_delayed_inode(trans, root, path, delayed_node);
> +
> +	/* Maybe new delayed items have been inserted */
> +	mutex_lock(&delayed_node->mutex);
> +	if (!delayed_node->count)
> +		need_requeue = 0;
> +	mutex_unlock(&delayed_node->mutex);
> +
> +	btrfs_end_transaction(trans, root);

out_free_path:

> +	btrfs_free_path(path);
> +out:
> +	if (need_requeue)
> +		btrfs_requeue_work(&async_node->work);
> +	else {
> +		btrfs_end_run_delayed_node(delayed_node);
> +		btrfs_release_delayed_node(delayed_node);
> +		kfree(async_node);
> +	}
> +}
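Putting the two notes above together, the error handling of this function
would then look like this (untested sketch, only the control flow shown):

```c
	trans = btrfs_join_transaction(root, 0);
	if (IS_ERR(trans))
		goto out_free_path;
	/* ... existing body unchanged ... */
	btrfs_end_transaction(trans, root);
out_free_path:
	btrfs_free_path(path);
out:
	if (need_requeue)
		btrfs_requeue_work(&async_node->work);
	else {
		btrfs_end_run_delayed_node(delayed_node);
		btrfs_release_delayed_node(delayed_node);
		kfree(async_node);
	}
```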
> +
> +static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root,
> +				     struct btrfs_root *root, int all)
> +{
> +	struct btrfs_async_delayed_node *async_node;
> +	struct btrfs_delayed_node *curr, *next;
> +	int count = 0;
> +
> +	curr = btrfs_first_delayed_node(delayed_root);
> +again:
> +	if (!curr)
> +		return 0;
> +
> +	if (btrfs_prepare_run_delayed_node(curr)) {
> +		next = btrfs_next_delayed_node(curr);
> +		btrfs_release_delayed_node(curr);
> +		curr = next;
> +		goto skip;
> +	}
> +
> +	async_node = kmalloc(sizeof(*async_node), GFP_NOFS);
> +	if (!async_node) {
> +		btrfs_end_run_delayed_node(curr);
> +		btrfs_release_delayed_node(curr);
> +		return -ENOMEM;
> +	}
> +
> +	async_node->root = root;
> +	async_node->delayed_node = curr;
> +
> +	async_node->work.func = btrfs_async_run_delayed_node_done;
> +	async_node->work.flags = 0;
> +
> +	btrfs_queue_worker(&root->fs_info->delayed_workers, &async_node->work);
> +	count++;
> +
> +	curr = btrfs_next_delayed_node(curr);
> +skip:
> +	if (all || count < 4)
> +		goto again;
> +	else if (curr)
> +		btrfs_release_delayed_node(curr);
> +	return 0;
> +}
> +
> +static void btrfs_balance_delayed_items(struct btrfs_trans_handle *trans,
> +					struct btrfs_root *root)
> +{
> +	struct btrfs_delayed_root *delayed_root;
> +
> +	delayed_root = btrfs_get_delayed_root(root);
> +	if (atomic_read(&delayed_root->items) < BTRFS_DELAYED_BACKGROUND)
> +		return;
> +
> +	if (atomic_read(&delayed_root->items) >= BTRFS_DELAYED_WRITEBACK) {
> +		int ret;
> +		ret = btrfs_wq_run_delayed_node(delayed_root, root, 1);
> +		if (ret)
> +			return;
> +
> +		wait_event(delayed_root->wait,
> +			   (atomic_read(&delayed_root->items) <
> +			    BTRFS_DELAYED_LOWER));
> +		return;
> +	}
> +
> +	btrfs_wq_run_delayed_node(delayed_root, root, 0);
> +}
> +
> +int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
> +				   struct btrfs_root *root, const char *name,
> +				   int name_len, struct inode *dir,
> +				   struct btrfs_disk_key *disk_key, u8 type,
> +				   u64 index)
> +{
> +	struct btrfs_delayed_node *delayed_node;
> +	struct btrfs_delayed_item *delayed_item;
> +	struct btrfs_dir_item *dir_item;
> +	int ret;
> +
> +	delayed_node = btrfs_get_or_create_delayed_node(dir);
> +	if (IS_ERR(delayed_node))
> +		return PTR_ERR(delayed_node);
> +
> +	delayed_item = btrfs_alloc_delayed_item(sizeof(*dir_item) + name_len);
> +	if (!delayed_item) {
> +		ret = -ENOMEM;
> +		goto release_node;
> +	}
> +
> +	ret = btrfs_delayed_item_reserve_metadata(trans, root, delayed_item);
> +	/*
> +	 * we have reserved the enough space when we start a new
                            ^^^ remove

> +	 * transaction, so reserving metadata fails is impossible
                                              failure

> +	 */
> +	BUG_ON(ret);
> +
> +	delayed_item->key.objectid = dir->i_ino;
> +	btrfs_set_key_type(&delayed_item->key, BTRFS_DIR_INDEX_KEY);
> +	delayed_item->key.offset = index;
> +
> +	dir_item = (struct btrfs_dir_item *)delayed_item->data;
> +	dir_item->location = *disk_key;
> +	dir_item->transid = cpu_to_le64(trans->transid);
> +	dir_item->data_len = 0;
> +	dir_item->name_len = cpu_to_le16(name_len);
> +	dir_item->type = type;
> +	memcpy((char *)(dir_item + 1), name, name_len);
> +
> +	mutex_lock(&delayed_node->mutex);
> +	ret = __btrfs_add_delayed_insertion_item(delayed_node, delayed_item);

Ret can only become 0 or -EEXIST here; a printk would help to understand
what happened.
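Something along these lines (untested sketch; the message text is just a
suggestion):

```c
	ret = __btrfs_add_delayed_insertion_item(delayed_node, delayed_item);
	if (ret) {
		printk(KERN_ERR "err add delayed dir index item(name: %.*s) "
		       "into the insertion tree of the delayed node"
		       "(root id: %llu, inode id: %llu, errno: %d)\n",
		       name_len, name,
		       (unsigned long long)delayed_node->root_id,
		       (unsigned long long)delayed_node->inode_id, ret);
		BUG();
	}
```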

> +	BUG_ON(ret);
> +	mutex_unlock(&delayed_node->mutex);
> +
> +release_node:
> +	btrfs_release_delayed_node(delayed_node);
> +
> +	btrfs_balance_delayed_items(trans, root);
> +	return ret;
> +}
> +
> +static int btrfs_delete_delayed_insertion_item(struct btrfs_root *root,
> +					       struct btrfs_delayed_node *node,
> +					       struct btrfs_key *key)
> +{
> +	struct btrfs_delayed_item *item;
> +
> +	mutex_lock(&node->mutex);
> +	item = __btrfs_lookup_delayed_insertion_item(node, key);
> +	if (!item) {
> +		mutex_unlock(&node->mutex);
> +		return 1;
> +	}
> +
> +	btrfs_delayed_item_release_metadata(root, item);
> +	btrfs_release_delayed_item(item);
> +	mutex_unlock(&node->mutex);
> +	return 0;
> +}
> +
> +int btrfs_delete_delayed_dir_index(struct btrfs_trans_handle *trans,
> +				   struct btrfs_root *root, struct inode *dir,
> +				   u64 index)
> +{
> +	struct btrfs_delayed_node *node;
> +	struct btrfs_delayed_item *item;
> +	struct btrfs_key item_key;
> +	int ret;
> +
> +	node = btrfs_get_or_create_delayed_node(dir);
> +	if (IS_ERR(node))
> +		return PTR_ERR(node);
> +
> +	item_key.objectid = dir->i_ino;
> +	btrfs_set_key_type(&item_key, BTRFS_DIR_INDEX_KEY);
> +	item_key.offset = index;
> +
> +	ret = btrfs_delete_delayed_insertion_item(root, node, &item_key);
> +	if (!ret)
> +		goto end;
> +
> +	item = btrfs_alloc_delayed_item(0);
> +	if (!item) {
> +		ret = -ENOMEM;
> +		goto end;
> +	}
> +
> +	item->key = item_key;
> +
> +	ret = btrfs_delayed_item_reserve_metadata(trans, root, item);
> +	/*
> +	 * we have reserved the enough space when we start a new
> +	 * transaction, so reserving metadata fails is impossible
> +	 */
> +	BUG_ON(ret);
> +
> +	mutex_lock(&node->mutex);
> +	ret = __btrfs_add_delayed_deletion_item(node, item);
> +	BUG_ON(ret);
> +	mutex_unlock(&node->mutex);
> +end:
> +	btrfs_release_delayed_node(node);
> +	btrfs_balance_delayed_items(trans, root);
> +	return ret;
> +}
> +
> +int btrfs_inode_delayed_dir_index_count(struct inode *inode)
> +{
> +	struct btrfs_delayed_node *delayed_node = BTRFS_I(inode)->delayed_node;
> +	int ret = 0;
> +
> +	if (!delayed_node)
> +		return -ENOENT;
> +
> +	/*
> +	 * Since we have held i_mutex of this directory, it is impossible that
> +	 * a new directory index is added into the delayed node and index_cnt
> +	 * is updated now. So we needn't lock the delayed node.
> +	 */
> +	if (!delayed_node->index_cnt)
> +		return -EINVAL;
> +
> +	BTRFS_I(inode)->index_cnt = delayed_node->index_cnt;
> +	return ret;
> +}
> +
> +struct btrfs_delayed_node *btrfs_get_locked_delayed_node(struct inode *inode)
> +{
> +	struct btrfs_delayed_node *delayed_node;
> +
> +	delayed_node = btrfs_get_delayed_node(inode);
> +	if (delayed_node)
> +		mutex_lock(&delayed_node->mutex);
> +
> +	return delayed_node;
> +}
> +
> +void btrfs_put_locked_delayed_node(struct btrfs_delayed_node *node)
> +{
> +	if (!node)
> +		return;
> +
> +	mutex_unlock(&node->mutex);
> +	/*
> +	 * This delayed node is still cached in the btrfs inode, so refs
> +	 * must be > 1 now, and we needn't check it is going to be freed
> +	 * or not.
> +	 *
> +	 * Besiede that, this function is used to read dir, we do not
           Besides

> +	 * insert/delete delayed items in this period. so we also needn't
                                                       So

> +	 * requeue or dequeue this delayed node.
> +	 */
> +	atomic_dec(&node->refs);
> +}
> +
> +int btrfs_should_delete_dir_index(struct btrfs_delayed_node *delayed_node,
> +				  struct btrfs_delayed_item **item,
> +				  u64 index)
> +{
> +	if (!delayed_node)
> +		return 0;
> +
> +	if (IS_ERR(*item))
> +		return 0;
> +
> +	if (!(*item))
> +		*item = __btrfs_first_delayed_deletion_item(delayed_node);
> +
> +	while (*item) {
> +		if ((*item)->key.offset < index) {
> +			*item = __btrfs_next_delayed_item(*item);
> +			continue;
> +		} else if ((*item)->key.offset > index)
> +			break;
> +
> +		*item = __btrfs_next_delayed_item(*item);
> +		if (!(*item))
> +			*item = ERR_PTR(-ERANGE);
> +		return 1;
> +	}
> +
> +	if (!(*item))
> +		*item = ERR_PTR(-ERANGE);
> +	return 0;
> +}
> +
> +/*
> + * btrfs_readdir_delayed_dir_index - read dir info stored in the delayed tree
> + *
> + */
> +int btrfs_readdir_delayed_dir_index(struct file *filp, void *dirent,
> +				    filldir_t filldir,
> +				    struct btrfs_delayed_node *delayed_node)
> +{
> +	struct btrfs_dir_item *di;
> +	struct btrfs_delayed_item *item;
> +	struct btrfs_key location;
> +	char *name;
> +	int name_len;
> +	int over = 0;
> +	unsigned char d_type;
> +
> +	if (!delayed_node)
> +		return 0;
> +
> +	/*
> +	 * Changing the data of the delayed item is impossible. So
> +	 * we needn't lock them. And we have held i_mutex of the directory,
> +	 * nobody can delete any directory indexs now.
                                           indexes

> +	 */
> +	item = __btrfs_first_delayed_insertion_item(delayed_node);
> +	while (item) {
> +		if (item->key.offset < filp->f_pos) {
> +			item = __btrfs_next_delayed_item(item);
> +			continue;
> +		}
> +
> +		filp->f_pos = item->key.offset;
> +
> +		di = (struct btrfs_dir_item *)item->data;
> +		name = (char *)(di + 1);
> +		name_len = le16_to_cpu(di->name_len);
> +
> +		d_type = btrfs_filetype_table[di->type];
> +		btrfs_disk_key_to_cpu(&location, &di->location);
> +
> +		over = filldir(dirent, name, name_len, item->key.offset,
> +			       location.objectid, d_type);
> +
> +		if (over)
> +			return 1;
> +		item = __btrfs_next_delayed_item(item);
> +	}
> +	return 0;
> +}
> +
> +BTRFS_SETGET_STACK_FUNCS(stack_inode_generation, struct btrfs_inode_item,
> +			 generation, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_inode_sequence, struct btrfs_inode_item,
> +			 sequence, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_inode_transid, struct btrfs_inode_item,
> +			 transid, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_inode_size, struct btrfs_inode_item, size, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_inode_nbytes, struct btrfs_inode_item,
> +			 nbytes, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_inode_block_group, struct btrfs_inode_item,
> +			 block_group, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_inode_nlink, struct btrfs_inode_item, nlink, 32);
> +BTRFS_SETGET_STACK_FUNCS(stack_inode_uid, struct btrfs_inode_item, uid, 32);
> +BTRFS_SETGET_STACK_FUNCS(stack_inode_gid, struct btrfs_inode_item, gid, 32);
> +BTRFS_SETGET_STACK_FUNCS(stack_inode_mode, struct btrfs_inode_item, mode, 32);
> +BTRFS_SETGET_STACK_FUNCS(stack_inode_rdev, struct btrfs_inode_item, rdev, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_inode_flags, struct btrfs_inode_item, flags, 64);
> +
> +BTRFS_SETGET_STACK_FUNCS(stack_timespec_sec, struct btrfs_timespec, sec, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_timespec_nsec, struct btrfs_timespec, nsec, 32);
> +
> +static void fill_stack_inode_item(struct btrfs_trans_handle *trans,
> +				  struct btrfs_inode_item *inode_item,
> +				  struct inode *inode)
> +{
> +	btrfs_set_stack_inode_uid(inode_item, inode->i_uid);
> +	btrfs_set_stack_inode_gid(inode_item, inode->i_gid);
> +	btrfs_set_stack_inode_size(inode_item, BTRFS_I(inode)->disk_i_size);
> +	btrfs_set_stack_inode_mode(inode_item, inode->i_mode);
> +	btrfs_set_stack_inode_nlink(inode_item, inode->i_nlink);
> +	btrfs_set_stack_inode_nbytes(inode_item, inode_get_bytes(inode));
> +	btrfs_set_stack_inode_generation(inode_item,
> +					 BTRFS_I(inode)->generation);
> +	btrfs_set_stack_inode_sequence(inode_item, BTRFS_I(inode)->sequence);
> +	btrfs_set_stack_inode_transid(inode_item, trans->transid);
> +	btrfs_set_stack_inode_rdev(inode_item, inode->i_rdev);
> +	btrfs_set_stack_inode_flags(inode_item, BTRFS_I(inode)->flags);
> +	btrfs_set_stack_inode_block_group(inode_item,
> +					  BTRFS_I(inode)->block_group);
> +
> +	btrfs_set_stack_timespec_sec(btrfs_inode_atime(inode_item),
> +				     inode->i_atime.tv_sec);
> +	btrfs_set_stack_timespec_nsec(btrfs_inode_atime(inode_item),
> +				      inode->i_atime.tv_nsec);
> +
> +	btrfs_set_stack_timespec_sec(btrfs_inode_mtime(inode_item),
> +				     inode->i_mtime.tv_sec);
> +	btrfs_set_stack_timespec_nsec(btrfs_inode_mtime(inode_item),
> +				      inode->i_mtime.tv_nsec);
> +
> +	btrfs_set_stack_timespec_sec(btrfs_inode_ctime(inode_item),
> +				     inode->i_ctime.tv_sec);
> +	btrfs_set_stack_timespec_nsec(btrfs_inode_ctime(inode_item),
> +				      inode->i_ctime.tv_nsec);
> +}
> +
> +int btrfs_delayed_update_inode(struct btrfs_trans_handle *trans,
> +			       struct btrfs_root *root, struct inode *inode)
> +{
> +	struct btrfs_delayed_node *delayed_node;
> +	int ret;
> +
> +	delayed_node = btrfs_get_or_create_delayed_node(inode);
> +	if (IS_ERR(delayed_node))
> +		return PTR_ERR(delayed_node);
> +
> +	mutex_lock(&delayed_node->mutex);
> +	if (delayed_node->inode_dirty) {
> +		fill_stack_inode_item(trans, &delayed_node->inode_item, inode);
> +		goto release_node;
> +	}
> +
(Extra empty line)

> +
> +	ret = btrfs_delayed_inode_reserve_metadata(trans, root, delayed_node);
> +	/*
> +	 * we must reserve the enough space when we start a new
                           ^^^ remove
> +	 * transaction, so reserving metadata fails is impossible
                                              failure

> +	 */
> +	BUG_ON(ret);
> +
> +	fill_stack_inode_item(trans, &delayed_node->inode_item, inode);
> +	delayed_node->inode_dirty = 1;
> +	delayed_node->count++;
> +	atomic_inc(&delayed_node->delayed_root->items);
> +release_node:
> +	mutex_unlock(&delayed_node->mutex);
> +	btrfs_release_delayed_node(delayed_node);
> +
> +	btrfs_balance_delayed_items(trans, root);
> +	return ret;
> +}
> +
> +int btrfs_kill_delayed_inode_items(struct btrfs_trans_handle *trans,
> +				   struct inode *inode)
> +{
> +	struct btrfs_root *root = BTRFS_I(inode)->root;
> +	struct btrfs_delayed_node *delayed_node;
> +	struct btrfs_delayed_item *curr_item, *prev_item;
> +
> +	delayed_node = btrfs_get_delayed_node(inode);
> +	if (!delayed_node)
> +		return 0;
> +
> +	mutex_lock(&delayed_node->mutex);
> +	curr_item = __btrfs_first_delayed_insertion_item(delayed_node);
> +	while (curr_item) {
> +		btrfs_delayed_item_release_metadata(root, curr_item);
> +		prev_item = curr_item;
> +		curr_item = __btrfs_next_delayed_item(prev_item);
> +		btrfs_release_delayed_item(prev_item);
> +	}
> +
> +	curr_item = __btrfs_first_delayed_deletion_item(delayed_node);
> +	while (curr_item) {
> +		btrfs_delayed_item_release_metadata(root, curr_item);
> +		prev_item = curr_item;
> +		curr_item = __btrfs_next_delayed_item(prev_item);
> +		btrfs_release_delayed_item(prev_item);
> +	}
> +
> +	if (delayed_node->inode_dirty) {
> +		btrfs_delayed_inode_release_metadata(root, delayed_node);
> +		btrfs_release_delayed_inode(delayed_node);
> +	}
> +	mutex_unlock(&delayed_node->mutex);
> +
> +	btrfs_release_delayed_node(delayed_node);
> +
> +	return 0;
> +}
> diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
> new file mode 100644
> index 0000000..c1f8c5f
> --- /dev/null
> +++ b/fs/btrfs/delayed-inode.h
> @@ -0,0 +1,122 @@
> +/*
> + * Copyright (C) 2011 Fujitsu.  All rights reserved.
> + * Written by Miao Xie <miaox@xxxxxxxxxxxxxx>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; if not, write to the
> + * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
> + * Boston, MA 021110-1307, USA.
> + */
> +
> +#ifndef __DELAYED_TREE_OPERATION_H
> +#define __DELAYED_TREE_OPERATION_H
> +
> +#include <linux/rbtree.h>
> +#include <linux/spinlock.h>
> +#include <linux/mutex.h>
> +#include <linux/list.h>
> +#include <linux/wait.h>
> +#include <asm/atomic.h>
> +
> +#include "ctree.h"
> +
> +/* types of the delayed item */
> +#define BTRFS_DELAYED_INSERTION_ITEM	1
> +#define BTRFS_DELAYED_DELETION_ITEM	2
> +
> +struct btrfs_delayed_root {
> +	spinlock_t lock;
> +	struct list_head node_list;
> +	atomic_t items;		/* for delayed items */
> +	int nodes;		/* for delayed nodes */
> +	wait_queue_head_t wait;
> +};
> +
> +struct btrfs_delayed_node {
> +	struct list_head list;
> +	u64 root_id;
> +	u64 inode_id;
> +	struct btrfs_delayed_root *delayed_root;
> +	struct rb_root ins_root;
> +	struct rb_root del_root;
> +	struct mutex mutex;
> +	spinlock_t lock;	/* protect delayed_doing */
> +	struct btrfs_inode_item inode_item;
> +	u64 bytes_reserved;
> +	struct btrfs_block_rsv *block_rsv;
> +	atomic_t refs;
> +	u64 index_cnt;

> +	int count;
> +	int in_list;
> +	int inode_dirty;
> +	int delayed_doing;

These could go into a bitfield, which saves some bytes.

I haven't looked in detail, but the fields may be reordered, namely the
u64 ones, so there is no unnecessary alignment padding.

> +};
> +
> +struct btrfs_delayed_item {
> +	struct rb_node rb_node;
> +	struct btrfs_key key;
> +	struct list_head tree_list;	/* used for batch insert/delete items */
> +	u64 bytes_reserved;
> +	struct btrfs_block_rsv *block_rsv;
> +	struct btrfs_delayed_node *delayed_node;
> +	int ins_or_del;
> +	u32 data_len;
> +	char data[0];
> +};
> +
> +static inline void btrfs_init_delayed_root(
> +				struct btrfs_delayed_root *delayed_root)
> +{
> +	atomic_set(&delayed_root->items, 0);
> +	delayed_root->nodes = 0;
> +	spin_lock_init(&delayed_root->lock);
> +	init_waitqueue_head(&delayed_root->wait);
> +	INIT_LIST_HEAD(&delayed_root->node_list);
> +}
> +
> +int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
> +				   struct btrfs_root *root, const char *name,
> +				   int name_len, struct inode *dir,
> +				   struct btrfs_disk_key *disk_key, u8 type,
> +				   u64 index);
> +
> +int btrfs_delete_delayed_dir_index(struct btrfs_trans_handle *trans,
> +				   struct btrfs_root *root, struct inode *dir,
> +				   u64 index);
> +
> +int btrfs_inode_delayed_dir_index_count(struct inode *inode);
> +
> +int btrfs_run_delayed_items(struct btrfs_trans_handle *trans,
> +			    struct btrfs_root *root);
> +
> +int btrfs_commit_inode_delayed_items(struct btrfs_trans_handle *trans,
> +				     struct btrfs_root *root,
> +				     struct inode *inode);
> +/* Used for evicting the inode. */
> +int btrfs_remove_delayed_node(struct inode *inode);
> +int btrfs_kill_delayed_inode_items(struct btrfs_trans_handle *trans,
> +				   struct inode *inode);
> +
> +
> +int btrfs_delayed_update_inode(struct btrfs_trans_handle *trans,
> +			       struct btrfs_root *root, struct inode *inode);
> +
> +/* Used for readdir() */
> +struct btrfs_delayed_node *btrfs_get_locked_delayed_node(struct inode *inode);
> +void btrfs_put_locked_delayed_node(struct btrfs_delayed_node *node);
> +int btrfs_should_delete_dir_index(struct btrfs_delayed_node *delayed_node,
> +				  struct btrfs_delayed_item **item,
> +				  u64 index);
> +int btrfs_readdir_delayed_dir_index(struct file *filp, void *dirent,
> +				    filldir_t filldir,
> +				    struct btrfs_delayed_node *delayed_node);
> +#endif
> diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c
> index f0cad5a..696c793 100644
> --- a/fs/btrfs/dir-item.c
> +++ b/fs/btrfs/dir-item.c
> @@ -124,8 +124,9 @@ int btrfs_insert_xattr_item(struct btrfs_trans_handle *trans,
>   * to use for the second index (if one is created).
>   */
>  int btrfs_insert_dir_item(struct btrfs_trans_handle *trans, struct btrfs_root
> -			  *root, const char *name, int name_len, u64 dir,
> -			  struct btrfs_key *location, u8 type, u64 index)
> +			  *root, const char *name, int name_len,
> +			  struct inode *dir, struct btrfs_key *location,
> +			  u8 type, u64 index)

Can 'type' and 'index' be swapped? Alignment of a u64 after a u8 will waste
some stack space. Probably as a separate patch, hmm.

>  {
>  	int ret = 0;
>  	int ret2 = 0;
> @@ -137,13 +138,17 @@ int btrfs_insert_dir_item(struct btrfs_trans_handle *trans, struct btrfs_root
>  	struct btrfs_disk_key disk_key;
>  	u32 data_size;
>  
> -	key.objectid = dir;
> +	key.objectid = dir->i_ino;
>  	btrfs_set_key_type(&key, BTRFS_DIR_ITEM_KEY);
>  	key.offset = btrfs_name_hash(name, name_len);
>  
>  	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
>  	path->leave_spinning = 1;
>  
> +	btrfs_cpu_key_to_disk(&disk_key, location);
> +
>  	data_size = sizeof(*dir_item) + name_len;
>  	dir_item = insert_with_overflow(trans, root, path, &key, data_size,
>  					name, name_len);
> @@ -155,7 +160,6 @@ int btrfs_insert_dir_item(struct btrfs_trans_handle *trans, struct btrfs_root
>  	}
>  
>  	leaf = path->nodes[0];
> -	btrfs_cpu_key_to_disk(&disk_key, location);
>  	btrfs_set_dir_item_key(leaf, dir_item, &disk_key);
>  	btrfs_set_dir_type(leaf, dir_item, type);
>  	btrfs_set_dir_data_len(leaf, dir_item, 0);
> @@ -174,24 +178,8 @@ second_insert:
>  	}
>  	btrfs_release_path(root, path);
>  
> -	btrfs_set_key_type(&key, BTRFS_DIR_INDEX_KEY);
> -	key.offset = index;
> -	dir_item = insert_with_overflow(trans, root, path, &key, data_size,
> -					name, name_len);
> -	if (IS_ERR(dir_item)) {
> -		ret2 = PTR_ERR(dir_item);
> -		goto out;
> -	}
> -	leaf = path->nodes[0];
> -	btrfs_cpu_key_to_disk(&disk_key, location);
> -	btrfs_set_dir_item_key(leaf, dir_item, &disk_key);
> -	btrfs_set_dir_type(leaf, dir_item, type);
> -	btrfs_set_dir_data_len(leaf, dir_item, 0);
> -	btrfs_set_dir_name_len(leaf, dir_item, name_len);
> -	btrfs_set_dir_transid(leaf, dir_item, trans->transid);
> -	name_ptr = (unsigned long)(dir_item + 1);
> -	write_extent_buffer(leaf, name, name_ptr, name_len);
> -	btrfs_mark_buffer_dirty(leaf);
> +	ret2 = btrfs_insert_delayed_dir_index(trans, root, name, name_len, dir,
> +					      &disk_key, type, index);
>  out:
>  	btrfs_free_path(path);
>  	if (ret)
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 3e1ea3e..ea3c145 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1676,6 +1676,13 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>  
>  	INIT_LIST_HEAD(&fs_info->ordered_extents);
>  	spin_lock_init(&fs_info->ordered_extent_lock);
> +	fs_info->delayed_root = kmalloc(sizeof(struct btrfs_delayed_root),
> +					GFP_NOFS);
> +	if (!fs_info->delayed_root) {
> +		err = -ENOMEM;
> +		goto fail_iput;
> +	}
> +	btrfs_init_delayed_root(fs_info->delayed_root);
>  
>  	sb->s_blocksize = 4096;
>  	sb->s_blocksize_bits = blksize_bits(4096);
> @@ -1743,7 +1750,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>  	bh = btrfs_read_dev_super(fs_devices->latest_bdev);
>  	if (!bh) {
>  		err = -EINVAL;
> -		goto fail_iput;
> +		goto fail_alloc;
>  	}
>  
>  	memcpy(&fs_info->super_copy, bh->b_data, sizeof(fs_info->super_copy));
> @@ -1755,7 +1762,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>  
>  	disk_super = &fs_info->super_copy;
>  	if (!btrfs_super_root(disk_super))
> -		goto fail_iput;
> +		goto fail_alloc;
>  
>  	/* check FS state, whether FS is broken. */
>  	fs_info->fs_state |= btrfs_super_flags(disk_super);
> @@ -1765,7 +1772,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>  	ret = btrfs_parse_options(tree_root, options);
>  	if (ret) {
>  		err = ret;
> -		goto fail_iput;
> +		goto fail_alloc;
>  	}
>  
>  	features = btrfs_super_incompat_flags(disk_super) &
> @@ -1775,7 +1782,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>  		       "unsupported optional features (%Lx).\n",
>  		       (unsigned long long)features);
>  		err = -EINVAL;
> -		goto fail_iput;
> +		goto fail_alloc;
>  	}
>  
>  	features = btrfs_super_incompat_flags(disk_super);
> @@ -1791,7 +1798,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>  		       "unsupported option features (%Lx).\n",
>  		       (unsigned long long)features);
>  		err = -EINVAL;
> -		goto fail_iput;
> +		goto fail_alloc;
>  	}
>  
>  	btrfs_init_workers(&fs_info->generic_worker,
> @@ -1838,6 +1845,9 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>  			   &fs_info->generic_worker);
>  	btrfs_init_workers(&fs_info->endio_freespace_worker, "freespace-write",
>  			   1, &fs_info->generic_worker);
> +	btrfs_init_workers(&fs_info->delayed_workers, "delayed-meta",
> +			   fs_info->thread_pool_size,
> +			   &fs_info->generic_worker);
>  
>  	/*
>  	 * endios are largely parallel and should have a very
> @@ -1859,6 +1869,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>  	btrfs_start_workers(&fs_info->endio_meta_write_workers, 1);
>  	btrfs_start_workers(&fs_info->endio_write_workers, 1);
>  	btrfs_start_workers(&fs_info->endio_freespace_worker, 1);
> +	btrfs_start_workers(&fs_info->delayed_workers, 1);
>  
>  	fs_info->bdi.ra_pages *= btrfs_super_num_devices(disk_super);
>  	fs_info->bdi.ra_pages = max(fs_info->bdi.ra_pages,
> @@ -2104,6 +2115,9 @@ fail_sb_buffer:
>  	btrfs_stop_workers(&fs_info->endio_write_workers);
>  	btrfs_stop_workers(&fs_info->endio_freespace_worker);
>  	btrfs_stop_workers(&fs_info->submit_workers);
> +	btrfs_stop_workers(&fs_info->delayed_workers);
> +fail_alloc:
> +	kfree(fs_info->delayed_root);
>  fail_iput:
>  	invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
>  	iput(fs_info->btree_inode);
> @@ -2551,6 +2565,7 @@ int close_ctree(struct btrfs_root *root)
>  	del_fs_roots(fs_info);
>  
>  	iput(fs_info->btree_inode);
> +	kfree(fs_info->delayed_root);
>  
>  	btrfs_stop_workers(&fs_info->generic_worker);
>  	btrfs_stop_workers(&fs_info->fixup_workers);
> @@ -2562,6 +2577,7 @@ int close_ctree(struct btrfs_root *root)
>  	btrfs_stop_workers(&fs_info->endio_write_workers);
>  	btrfs_stop_workers(&fs_info->endio_freespace_worker);
>  	btrfs_stop_workers(&fs_info->submit_workers);
> +	btrfs_stop_workers(&fs_info->delayed_workers);
>  
>  	btrfs_close_devices(fs_info->fs_devices);
>  	btrfs_mapping_tree_free(&fs_info->mapping_tree);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a7aaa10..0512b04 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3897,12 +3897,6 @@ static void release_global_block_rsv(struct btrfs_fs_info *fs_info)
>  	WARN_ON(fs_info->chunk_block_rsv.reserved > 0);
>  }
>  
> -static u64 calc_trans_metadata_size(struct btrfs_root *root, int num_items)
> -{
> -	return (root->leafsize + root->nodesize * (BTRFS_MAX_LEVEL - 1)) *
> -		3 * num_items;
> -}
> -
>  int btrfs_trans_reserve_metadata(struct btrfs_trans_handle *trans,
>  				 struct btrfs_root *root,
>  				 int num_items)
> @@ -3913,7 +3907,7 @@ int btrfs_trans_reserve_metadata(struct btrfs_trans_handle *trans,
>  	if (num_items == 0 || root->fs_info->chunk_root == root)
>  		return 0;
>  
> -	num_bytes = calc_trans_metadata_size(root, num_items);
> +	num_bytes = btrfs_calc_trans_metadata_size(root, num_items);
>  	ret = btrfs_block_rsv_add(trans, root, &root->fs_info->trans_block_rsv,
>  				  num_bytes);
>  	if (!ret) {
> @@ -3952,14 +3946,14 @@ int btrfs_orphan_reserve_metadata(struct btrfs_trans_handle *trans,
>  	 * If all of the metadata space is used, we can commit
>  	 * transaction and use space it freed.
>  	 */
> -	u64 num_bytes = calc_trans_metadata_size(root, 4);
> +	u64 num_bytes = btrfs_calc_trans_metadata_size(root, 4);
>  	return block_rsv_migrate_bytes(src_rsv, dst_rsv, num_bytes);
>  }
>  
>  void btrfs_orphan_release_metadata(struct inode *inode)
>  {
>  	struct btrfs_root *root = BTRFS_I(inode)->root;
(missing blank line after the declarations)

> -	u64 num_bytes = calc_trans_metadata_size(root, 4);
> +	u64 num_bytes = btrfs_calc_trans_metadata_size(root, 4);
>  	btrfs_block_rsv_release(root, root->orphan_block_rsv, num_bytes);
>  }
>  
> @@ -3973,7 +3967,7 @@ int btrfs_snap_reserve_metadata(struct btrfs_trans_handle *trans,
>  	 * two for root back/forward refs, two for directory entries
>  	 * and one for root of the snapshot.
>  	 */
> -	u64 num_bytes = calc_trans_metadata_size(root, 5);
> +	u64 num_bytes = btrfs_calc_trans_metadata_size(root, 5);
(missing blank line after the declarations)

>  	dst_rsv->space_info = src_rsv->space_info;
>  	return block_rsv_migrate_bytes(src_rsv, dst_rsv, num_bytes);
>  }
> @@ -4000,7 +3994,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
>  	nr_extents = atomic_read(&BTRFS_I(inode)->outstanding_extents) + 1;
>  	if (nr_extents > BTRFS_I(inode)->reserved_extents) {
>  		nr_extents -= BTRFS_I(inode)->reserved_extents;
> -		to_reserve = calc_trans_metadata_size(root, nr_extents);
> +		to_reserve = btrfs_calc_trans_metadata_size(root, nr_extents);
>  	} else {
>  		nr_extents = 0;
>  		to_reserve = 0;
> @@ -4047,7 +4041,7 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
>  
>  	to_free = calc_csum_metadata_size(inode, num_bytes);
>  	if (nr_extents > 0)
> -		to_free += calc_trans_metadata_size(root, nr_extents);
> +		to_free += btrfs_calc_trans_metadata_size(root, nr_extents);
>  
>  	btrfs_block_rsv_release(root, &root->fs_info->delalloc_block_rsv,
>  				to_free);
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 8d392ed..09152fe 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2598,33 +2598,12 @@ static void fill_inode_item(struct btrfs_trans_handle *trans,
>  noinline int btrfs_update_inode(struct btrfs_trans_handle *trans,
>  				struct btrfs_root *root, struct inode *inode)
>  {
> -	struct btrfs_inode_item *inode_item;
> -	struct btrfs_path *path;
> -	struct extent_buffer *leaf;
>  	int ret;
>  
> -	path = btrfs_alloc_path();
> -	BUG_ON(!path);
> -	path->leave_spinning = 1;
> -	ret = btrfs_lookup_inode(trans, root, path,
> -				 &BTRFS_I(inode)->location, 1);
> -	if (ret) {
> -		if (ret > 0)
> -			ret = -ENOENT;
> -		goto failed;
> -	}
> -
> -	btrfs_unlock_up_safe(path, 1);
> -	leaf = path->nodes[0];
> -	inode_item = btrfs_item_ptr(leaf, path->slots[0],
> -				  struct btrfs_inode_item);
> +	ret = btrfs_delayed_update_inode(trans, root, inode);
> +	if (!ret)
> +		btrfs_set_inode_last_trans(trans, inode);
>  
> -	fill_inode_item(trans, leaf, inode_item, inode);
> -	btrfs_mark_buffer_dirty(leaf);
> -	btrfs_set_inode_last_trans(trans, inode);
> -	ret = 0;
> -failed:
> -	btrfs_free_path(path);
>  	return ret;
>  }
>  
> @@ -2680,18 +2659,9 @@ int btrfs_unlink_inode(struct btrfs_trans_handle *trans,
>  		goto err;
>  	}
>  
> -	di = btrfs_lookup_dir_index_item(trans, root, path, dir->i_ino,
> -					 index, name, name_len, -1);
> -	if (IS_ERR(di)) {
> -		ret = PTR_ERR(di);
> -		goto err;
> -	}
> -	if (!di) {
> -		ret = -ENOENT;
> +	ret = btrfs_delete_delayed_dir_index(trans, root, dir, index);
> +	if (ret)
>  		goto err;
> -	}
> -	ret = btrfs_delete_one_dir_name(trans, root, path, di);
> -	btrfs_release_path(root, path);
>  
>  	ret = btrfs_del_inode_ref_in_log(trans, root, name, name_len,
>  					 inode, dir->i_ino);
> @@ -2867,6 +2837,14 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir,
>  	index = btrfs_inode_ref_index(path->nodes[0], ref);
>  	btrfs_release_path(root, path);
>  
> +	/*
> +	 * This is a commit root search, if we can lookup inode item and other
> +	 * relative items in the commit root, it means the transaction of
> +	 * dir/file creation has been committed, and the dir index item that we
> +	 * delay to insert has also been inserted into the commit root. So
> +	 * we needn't worry about the delayed insertion of the dir index item
> +	 * here.
> +	 */
>  	di = btrfs_lookup_dir_index_item(trans, root, path, dir->i_ino, index,
>  				dentry->d_name.name, dentry->d_name.len, 0);
>  	if (IS_ERR(di)) {
> @@ -2972,24 +2950,16 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
>  		btrfs_release_path(root, path);
>  		index = key.offset;
>  	}
> +	btrfs_release_path(root, path);
>  
> -	di = btrfs_lookup_dir_index_item(trans, root, path, dir->i_ino,
> -					 index, name, name_len, -1);
> -	BUG_ON(!di || IS_ERR(di));
> -
> -	leaf = path->nodes[0];
> -	btrfs_dir_item_key_to_cpu(leaf, di, &key);
> -	WARN_ON(key.type != BTRFS_ROOT_ITEM_KEY || key.objectid != objectid);
> -	ret = btrfs_delete_one_dir_name(trans, root, path, di);
> +	ret = btrfs_delete_delayed_dir_index(trans, root, dir, index);
>  	BUG_ON(ret);
> -	btrfs_release_path(root, path);
>  
>  	btrfs_i_size_write(dir, dir->i_size - name_len * 2);
>  	dir->i_mtime = dir->i_ctime = CURRENT_TIME;
>  	ret = btrfs_update_inode(trans, root, dir);
>  	BUG_ON(ret);
>  
> -	btrfs_free_path(path);
>  	return 0;

(the btrfs_free_path() call is removed here, but as far as I can see the path allocated earlier in this function is only released, never freed — leak?)
>  }
>  
> @@ -3249,6 +3219,15 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>  	if (root->ref_cows || root == root->fs_info->tree_root)
>  		btrfs_drop_extent_cache(inode, new_size & (~mask), (u64)-1, 0);
>  
> +	/*
> +	 * This function is also used to drop the items in the log tree before
> +	 * we relog the inode, so if root != BTRFS_I(inode)->root, it means
> +	 * it is used to drop the loged items. So we shouldn't kill the delayed
> +	 * items.
> +	 */
> +	if (min_type == 0 && root == BTRFS_I(inode)->root)
> +		btrfs_kill_delayed_inode_items(trans, inode);
> +
>  	path = btrfs_alloc_path();
>  	BUG_ON(!path);
>  	path->reada = -1;
> @@ -4182,7 +4161,7 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
>  	return d_splice_alias(inode, dentry);
>  }
>  
> -static unsigned char btrfs_filetype_table[] = {
> +unsigned char btrfs_filetype_table[] = {
>  	DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
>  };
>  
> @@ -4196,6 +4175,8 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
>  	struct btrfs_key key;
>  	struct btrfs_key found_key;
>  	struct btrfs_path *path;
> +	struct btrfs_delayed_node *delayed_node = NULL;
> +	struct btrfs_delayed_item *delayed_item = NULL;
>  	int ret;
>  	u32 nritems;
>  	struct extent_buffer *leaf;
> @@ -4210,6 +4191,7 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
>  	char tmp_name[32];
>  	char *name_ptr;
>  	int name_len;
> +	int is_curr = 0;	/* filp->f_pos points to the current index? */
>  
>  	/* FIXME, use a real flag for deciding about the key type */
>  	if (root->fs_info->tree_root == root)
> @@ -4234,8 +4216,13 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
>  		filp->f_pos = 2;
>  	}
>  	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
>  	path->reada = 2;
>  
> +	if (key_type == BTRFS_DIR_INDEX_KEY)
> +		delayed_node = btrfs_get_locked_delayed_node(inode);
> +
>  	btrfs_set_key_type(&key, key_type);
>  	key.offset = filp->f_pos;
>  	key.objectid = inode->i_ino;
> @@ -4273,8 +4260,13 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
>  			break;
>  		if (found_key.offset < filp->f_pos)
>  			continue;
> +		if (key_type == BTRFS_DIR_INDEX_KEY &&
> +		    btrfs_should_delete_dir_index(delayed_node, &delayed_item,
> +						  found_key.offset))
> +			continue;
>  
>  		filp->f_pos = found_key.offset;
> +		is_curr = 1;
>  
>  		di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
>  		di_cur = 0;
> @@ -4324,6 +4316,15 @@ skip:
>  		}
>  	}
>  
> +	if (key_type == BTRFS_DIR_INDEX_KEY) {
> +		if (is_curr)
> +			filp->f_pos++;
> +		ret = btrfs_readdir_delayed_dir_index(filp, dirent, filldir,
> +						      delayed_node);
> +		if (ret)
> +			goto nopos;
> +	}
> +
>  	/* Reached end of directory/root. Bump pos past the last item. */
>  	if (key_type == BTRFS_DIR_INDEX_KEY)
>  		/*
> @@ -4336,6 +4337,8 @@ skip:
>  nopos:
>  	ret = 0;
>  err:
> +	if (key_type == BTRFS_DIR_INDEX_KEY)
> +		btrfs_put_locked_delayed_node(delayed_node);
>  	btrfs_free_path(path);
>  	return ret;
>  }
> @@ -4481,9 +4484,12 @@ int btrfs_set_inode_index(struct inode *dir, u64 *index)
>  	int ret = 0;
>  
>  	if (BTRFS_I(dir)->index_cnt == (u64)-1) {
> -		ret = btrfs_set_inode_index_count(dir);
> -		if (ret)
> -			return ret;
> +		ret = btrfs_inode_delayed_dir_index_count(dir);
> +		if (ret) {
> +			ret = btrfs_set_inode_index_count(dir);
> +			if (ret)
> +				return ret;
> +		}
>  	}
>  
>  	*index = BTRFS_I(dir)->index_cnt;
> @@ -4641,7 +4647,7 @@ int btrfs_add_link(struct btrfs_trans_handle *trans,
>  
>  	if (ret == 0) {
>  		ret = btrfs_insert_dir_item(trans, root, name, name_len,
> -					    parent_inode->i_ino, &key,
> +					    parent_inode, &key,
>  					    btrfs_inode_type(inode), index);
>  		BUG_ON(ret);
>  
> @@ -6519,6 +6525,8 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
>  	ei->dummy_inode = 0;
>  	ei->force_compress = BTRFS_COMPRESS_NONE;
>  
> +	ei->delayed_node = NULL;
> +
>  	inode = &ei->vfs_inode;
>  	extent_map_tree_init(&ei->extent_tree, GFP_NOFS);
>  	extent_io_tree_init(&ei->io_tree, &inode->i_data, GFP_NOFS);
> @@ -6602,6 +6610,7 @@ void btrfs_destroy_inode(struct inode *inode)
>  	inode_tree_del(inode);
>  	btrfs_drop_extent_cache(inode, 0, (u64)-1, 0);
>  free:
> +	btrfs_remove_delayed_node(inode);
>  	kmem_cache_free(btrfs_inode_cachep, BTRFS_I(inode));
>  }
>  
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index be2d4f6..c7bb1c3 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -333,7 +333,7 @@ static noinline int create_subvol(struct btrfs_root *root,
>  	BUG_ON(ret);
>  
>  	ret = btrfs_insert_dir_item(trans, root,
> -				    name, namelen, dir->i_ino, &key,
> +				    name, namelen, dir, &key,
>  				    BTRFS_FT_DIR, index);
>  	if (ret)
>  		goto fail;
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 0209b5f..6fb8199 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -40,6 +40,7 @@
>  #include <linux/magic.h>
>  #include <linux/slab.h>
>  #include "compat.h"
> +#include "delayed-inode.h"
>  #include "ctree.h"
>  #include "disk-io.h"
>  #include "transaction.h"
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 3d73c8d..f3e250f 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -958,7 +958,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  	BUG_ON(ret);
>  	ret = btrfs_insert_dir_item(trans, parent_root,
>  				dentry->d_name.name, dentry->d_name.len,
> -				parent_inode->i_ino, &key,
> +				parent_inode, &key,
>  				BTRFS_FT_DIR, index);
>  	BUG_ON(ret);
>  
> @@ -1302,9 +1302,14 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>  	} while (cur_trans->num_writers > 1 ||
>  		 (should_grow && cur_trans->num_joined != joined));
>  
> +	btrfs_run_delayed_items(trans, root);
> +
>  	ret = create_pending_snapshots(trans, root->fs_info);
>  	BUG_ON(ret);
>  
> +	btrfs_run_delayed_items(trans, root);
> +	BUG_ON(ret);

(this second BUG_ON(ret) checks the stale value left over from create_pending_snapshots() above; the return value of btrfs_run_delayed_items() itself is dropped both here and before the snapshot creation — should it be checked?)
> +
>  	ret = btrfs_run_delayed_refs(trans, root, (unsigned long)-1);
>  	BUG_ON(ret);
>  
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index a4bbb85..e8a2e95 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -2775,6 +2775,13 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
>  		max_key.type = (u8)-1;
>  	max_key.offset = (u64)-1;
>  
> +	ret = btrfs_commit_inode_delayed_items(trans, root, inode);
> +	if (ret) {
> +		btrfs_free_path(path);
> +		btrfs_free_path(dst_path);
> +		return ret;
> +	}
> +
>  	mutex_lock(&BTRFS_I(inode)->log_mutex);
>  
>  	/*

That's all for now. I have probably missed a few 'make it static'
candidates, and the code-style suggestions are not exhaustive.

The improvement this patch brings looks nice!

dave

