[PATCH] raid0: data corruption when using trim

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

There was report at Algolia's blog website regarding some problems using
software raid and trim in SSD.
https://blog.algolia.com/when-solid-state-drives-are-not-that-solid/

It turns out that there is misunderstanding between raid driver and scsi/ata
driver.
The raid driver lets split bios share bio vector of source bio.
Usually, there is no problem, because the raid layer ensures
that the source bio is not freed before the split bios.

But, in case of trim, there are some problems.
The scsi/ata needs some payloads that include start address and size of
device to trim.
So, the scsi/ata driver allocates a page and stores that pointer on
bio->bi_io_vec->bv_page.
(sd_setup_discard_cmnd)

Because split bios share the source bio's bi_io_vec,
the pointer to the allocated page in scsi/ata driver is overwritten.
It leads to memory leakage and data corruption
because the overwritten pointer has wrong address and size to trim on
device.

So, we add some codes that make sure not to share bio vector if the source
bio is discard.
The linear and raid10 also have same problem
because they're using similar scheme to split the source bio.

This problem exists at the very first time when the trim is enabled on
raid0.
Before the new bio_split() (20d0189b), we can patch more easily
by modifying the condition to split bio vector in bio_split function
from (bi->bi_vec != 0) to ((bi->bi_vec != 0) || (bi->bi_rw & REQ_DISCARD)).

Thank you.
Seunguk Shin.



[?0001-PATCH-raid0-data-corruption-when-using-trim.patch?]

>From ca7dbe01fcd2ef2f8cea1a38de5aca5c866c585d Mon Sep 17 00:00:00 2001
From: Seunguk Shin <seunguk.shin@xxxxxxxxxxx>
Date: Sat, 18 Jul 2015 20:13:44 +0900
Subject: [PATCH] [PATCH] raid0: data corruption when using trim

When we are using software raid and tirm, there is data corruption.

The raid driver lets split bios share bio vector of source bio.
The scsi/ata driver allocates a page and stores that pointer on
bio->bi_io_vec->bv_page
(sd_setup_discard_cmnd) because the scsi/ata needs some payloads
that include start address and size of device to trim.
Because split bios share the source bio's bi_io_vec,
the pointer to the allocated page in scsi/ata driver is overwritten.

This patch splits bio vector if bio is discard.
---
block/bio.c         |  6 ------
drivers/md/raid0.c  | 13 +++++++++++++
include/linux/bio.h |  6 ++++++
3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 2a00d34..df7589d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -32,12 +32,6 @@
#include <trace/events/block.h>

 /*
- * Test patch to inline a certain number of bi_io_vec's inside the bio
- * itself, to shrink a bio data allocation from two mempool calls to one
- */
-#define BIO_INLINE_VECS                     4
-
-/*
  * if you change this list, also change bvec_alloc or things will
  * break badly! cannot be bigger than what you can fit into an
  * unsigned short
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index efb654e..fd1318b 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -529,6 +529,19 @@ static void raid0_make_request(struct mddev *mddev,
struct bio *bio)

                     if (sectors < bio_sectors(bio)) {
                               split = bio_split(bio, sectors, GFP_NOIO,
fs_bio_set);
+                              if (unlikely(split->bi_rw & REQ_DISCARD)) {
+                                         struct bio_vec *bvl = NULL;
+                                         unsigned long idx = BIO_POOL_NONE;
+
+                                         bvl = bvec_alloc(GFP_NOIO,
BIO_INLINE_VECS + 1,
+                                                              &idx,
fs_bio_set->bvec_pool);
+                                         split->bi_flags |= 1 <<
BIO_OWNS_VEC;
+                                         split->bi_flags &= ~(BIO_POOL_NONE
<<
+                                                             
BIO_POOL_OFFSET);
+                                         split->bi_flags |= idx <<
BIO_POOL_OFFSET;
+                                         split->bi_max_vecs =
BIO_INLINE_VECS + 1;
+                                         split->bi_io_vec = bvl;
+                              }
                               bio_chain(split, bio);
                    } else {
                               split = bio;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 5e963a6..8344bb5 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -40,6 +40,12 @@
#define BIO_BUG_ON
#endif

+/*
+ * Test patch to inline a certain number of bi_io_vec's inside the bio
+ * itself, to shrink a bio data allocation from two mempool calls to one
+ */
+#define BIO_INLINE_VECS                    4
+
#define BIO_MAX_PAGES           256
#define BIO_MAX_SIZE             (BIO_MAX_PAGES << PAGE_CACHE_SHIFT)
#define BIO_MAX_SECTORS                   (BIO_MAX_SIZE >> 9)
-- 
1.9.1



--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux