Re: Btrfs Heatmap - v2 - block group internals!


 



On 2016-11-18 10:08, Hans van Kranenburg wrote:
On 11/18/2016 03:08 AM, Qu Wenruo wrote:
When generating a picture of a file system with multiple devices,
boundaries between the separate devices are not visible now.

If someone has a brilliant idea about how to do this without throwing
out actual usage data...

The first thought that comes to mind for me is to make each device be a
different color, and otherwise obey the same intensity mapping
correlating to how much data is there.  For example, if you've got a 3
device FS, the parts of the image that correspond to device 1 would go
from 0x000000 to 0xFF0000, the parts for device 2 could be 0x000000 to
0x00FF00, and the parts for device 3 could be 0x000000 to 0x0000FF. This
is of course not perfect (you can't tell what device each segment of
empty space corresponds to), but would probably cover most use cases.
(for example, with such a scheme, you could look at an image and tell
whether the data is relatively well distributed across all the devices
or you might need to re-balance).
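A minimal sketch of that per-device color mapping, assuming 0-based device indices (device 1 in the text is index 0 here) and a usage fraction between 0.0 and 1.0 for each block. `device_color` is a hypothetical helper, not something btrfs-heatmap provides:

```python
def device_color(device_index: int, usage: float) -> int:
    """Return a packed 0xRRGGBB color for one block.

    Device 0 uses the red channel (0x000000..0xFF0000), device 1 green
    (0x000000..0x00FF00) and device 2 blue (0x000000..0x0000FF); the
    channel intensity scales with how much of the block is used.
    """
    if not 0 <= device_index < 3:
        # One channel per device only works for up to three devices.
        raise ValueError("this sketch only handles up to 3 devices")
    usage = min(max(usage, 0.0), 1.0)  # clamp out-of-range input
    intensity = round(usage * 0xFF)
    # Shift the intensity into the channel that belongs to this device.
    return intensity << (8 * (2 - device_index))


if __name__ == "__main__":
    print(hex(device_color(0, 1.0)))  # fully used block on the first device
    print(hex(device_color(2, 0.5)))  # half used block on the third device
```

As noted above, empty space is 0x000000 whichever device it belongs to, so this scheme cannot say which device a black area sits on.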

What about linear output separated with lines (or just black)?

Linear output does not produce useful images, except for really small
filesystems.
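One way to read the "linear output separated with lines" suggestion, sketched under the assumption that each device is given as a list of per-block usage fractions in physical order. `linear_layout` is a hypothetical name, and `None` stands in for whatever separator color a renderer would pick:

```python
from typing import List, Optional, Sequence

# None marks cells that belong to no device: separator lines and row padding.
# A renderer would draw these in a color reserved for that purpose.
Cell = Optional[int]


def linear_layout(devices: Sequence[Sequence[float]], width: int) -> List[List[Cell]]:
    """Lay out per-device usage values left to right, top to bottom.

    Every device starts on a fresh row, and a full-width separator row
    is inserted between consecutive devices.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows: List[List[Cell]] = []
    for dev_no, usage in enumerate(devices):
        if dev_no > 0:
            rows.append([None] * width)  # separator line between devices
        for start in range(0, len(usage), width):
            chunk = usage[start:start + width]
            # Clamp and scale each usage fraction to a 0-255 grey level.
            row: List[Cell] = [round(min(max(u, 0.0), 1.0) * 255) for u in chunk]
            row.extend([None] * (width - len(row)))  # pad a short final row
            rows.append(row)
    return rows
```

Since empty space already renders as 0 (black), a plain black separator would be indistinguishable from unused blocks; the separator needs a color that usage values never produce.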
However, it's how the human brain is hardwired to parse data like this (two data points per item: one for the value, one for the ordering). That's part of the reason that all known writing systems use a linear arrangement of symbols to store information. (The other parts have to do with things like storage efficiency and error detection, and yes, I'm serious, those do play a part in the evolution of language and writing.)

As an example of why this is important, imagine showing someone who understands the concept of data fragmentation (most people have little to no trouble understanding it) a heatmap of a filesystem with no space fragmentation at all, without explaining that it uses a Hilbert curve 2D ordering. Pretty much 100% of people who aren't mathematicians or scientists will look at that, and their first thought is almost certainly going to be along the lines of 'holy crap, that's fragmented really badly in this specific area'.

This is the reason that pretty much nothing outside of scientific or mathematical data uses a Hilbert-curve-based 2D ordering of data (and even there, it's almost never used for the final presentation of the data).
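To make the ordering problem concrete, here is a sketch built on the widely published iterative Hilbert index-to-coordinate conversion (usually called d2xy). The function names are hypothetical, and this is not claimed to be how btrfs-heatmap implements its curve. It draws the same contiguous, unfragmented run of blocks on a 4x4 picture twice: once in Hilbert order and once in plain row-by-row order.

```python
from typing import List, Tuple


def hilbert_d2xy(n: int, d: int) -> Tuple[int, int]:
    """Map position d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two and 0 <= d < n * n.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if not 0 <= d < n * n:
        raise ValueError("d out of range")
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the current sub-square so the sub-curves join up.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def show_prefix(n: int, used: int, hilbert: bool) -> List[str]:
    """Draw the first `used` positions as '#' and the rest as '.'."""
    grid = [["."] * n for _ in range(n)]
    for d in range(used):
        x, y = hilbert_d2xy(n, d) if hilbert else (d % n, d // n)
        grid[y][x] = "#"
    return ["".join(row) for row in grid]


if __name__ == "__main__":
    # One contiguous run of 6 blocks, first in Hilbert order, then row by row.
    print("\n".join(show_prefix(4, 6, hilbert=True)))
    print()
    print("\n".join(show_prefix(4, 6, hilbert=False)))
```

In Hilbert order the single unbroken run comes out as a 2x2 square with a tail running down the left edge, while the row-by-row version reads as one stretch of filled cells; only the second one looks contiguous to someone who doesn't know the curve.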

Presenting data like this in a way that laypeople can understand is hard, but it's also important. Take a look at some of the graphical tools for filesystem defragmentation: the presentation requirements there are pretty similar, and so is the data being conveyed. They pretty much all use a grid-oriented linear presentation of allocation data; the difference is that they scale up the blocks so that they're easily discernible by eye. This lets them represent the data in a way that's trivial to explain (read it line by line), unlike the Hilbert curve (the data follows a complex, recursively folded pattern that is fractal in nature).
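A minimal sketch of that grid-oriented linear layout, assuming one value per block and a fixed number of columns; `block_rect` and `render` are hypothetical names. Block i goes to row i // columns and column i % columns, and each block is blown up into a scale x scale square of pixels:

```python
from typing import List, Sequence, Tuple


def block_rect(index: int, columns: int, scale: int) -> Tuple[int, int, int, int]:
    """Pixel rectangle (x0, y0, x1, y1), end-exclusive, for block `index`."""
    row, col = divmod(index, columns)  # row-major: read it line by line
    x0, y0 = col * scale, row * scale
    return x0, y0, x0 + scale, y0 + scale


def render(values: Sequence[int], columns: int, scale: int) -> List[List[int]]:
    """Paint each block's value into a scale x scale square of pixels."""
    rows = -(-len(values) // columns)  # ceiling division
    image = [[0] * (columns * scale) for _ in range(rows * scale)]
    for i, v in enumerate(values):
        x0, y0, x1, y1 = block_rect(i, columns, scale)
        for y in range(y0, y1):
            for x in range(x0, x1):
                image[y][x] = v
    return image


if __name__ == "__main__":
    for line in render([10, 20, 30], columns=2, scale=2):
        print(line)
```

The whole mapping is one division and one remainder, which is what makes this kind of picture easy to explain.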

Now, I personally have no issue with the Hilbert ordering, but if there were an option to use a linear ordering, I would almost certainly use that instead, simply because I could more easily explain the data to people.



