On Tue, 2012-05-22 at 22:54 +0300, Siarhei Siamashka wrote:
> This is a very simple few-liner patchset, which allows to optionally
> enable write-through caching for OMAP DSS framebuffer. The problem with
> the current writecombine cacheability attribute is that it only speeds
> up writes. Uncached reads are slow, even though the use of NEON mitigates
> this problem a bit.
> Traditionally, xf86-video-fbdev DDX is using shadow framebuffer in the
> system memory. Which contains a copy of the framebuffer data for the
> purpose of providing fast read access to it when needed. Framebuffer
> read access is required not so often, but it still gets used for
> scrolling and moving windows around in Xorg server. And the users
> perceive their linux desktop as rather sluggish when these operations
> are not fast enough.
> In the case of ARM hardware, framebuffer is typically physically
> located in the main memory. And the processors still support
> write-through cacheability attribute. According to ARM ARM, the writes
> done to write-through cached memory inside the level of cache are
> visible to all observers outside the level of cache without the need
> of explicit cache maintenance (same rule as for non-cached memory).
> So write-through cache is a perfect choice when only CPU is allowed
> to modify the data in the framebuffer and everyone else (screen
> refresh DMA) is only reading it. That is, assuming that write-through
> cached memory provides good performance and there are no quirks.
> As the framebuffer reads become fast, the need for shadow framebuffer
> disappears.

I ran my own fb perf test on an omap3 overo board ("perf" test in :

vram_cache=n:

sequential_horiz_singlepixel_read: 25198080 pix, 4955475 us, 5084897 pix/s
sequential_horiz_singlepixel_write: 434634240 pix, 4081146 us, 106498086 pix/s
sequential_vert_singlepixel_read: 20106240 pix, 4970611 us, 4045023 pix/s
sequential_vert_singlepixel_write: 98572800 pix, 4985748 us, 19770915 pix/s
sequential_line_read: 40734720 pix, 4977906 us, 8183103 pix/s
sequential_line_write: 1058580480 pix, 5024628 us, 210678378 pix/s
nonsequential_singlepixel_write: 17625600 pix, 4992828 us, 3530183 pix/s
nonsequential_singlepixel_read: 9661440 pix, 4952973 us, 1950634 pix/s

vram_cache=y:

sequential_horiz_singlepixel_read: 270389760 pix, 4994154 us, 54141253 pix/s
sequential_horiz_singlepixel_write: 473149440 pix, 3932801 us, 120308512 pix/s
sequential_vert_singlepixel_read: 18147840 pix, 4976226 us, 3646908 pix/s
sequential_vert_singlepixel_write: 100661760 pix, 4993164 us, 20159914 pix/s
sequential_line_read: 285143040 pix, 4917267 us, 57988114 pix/s
sequential_line_write: 876710400 pix, 5012146 us, 174917171 pix/s
nonsequential_singlepixel_write: 17625600 pix, 4977967 us, 3540722 pix/s
nonsequential_singlepixel_read: 9661440 pix, 4944885 us, 1953825 pix/s

These also show quite a bit of improvement in some read cases.
Interestingly some of the write cases are also faster.
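
For reference, the pix/s column is just the pixel count scaled by the
elapsed time in microseconds. A minimal sketch of that arithmetic follows;
the helper names pix_per_sec() and speedup() are made up for illustration
and are not taken from the test program itself:

```c
#include <stdint.h>

/* Throughput in pixels per second from a pixel count and an elapsed
 * time in microseconds. Integer division truncates toward zero.
 * Hypothetical helper, not part of the actual perf test. */
uint64_t pix_per_sec(uint64_t pix, uint64_t us)
{
    if (us == 0)
        return 0; /* avoid dividing by zero on a degenerate run */
    return pix * 1000000ULL / us;
}

/* Ratio of two throughputs, e.g. second run versus first run. */
double speedup(uint64_t after_pps, uint64_t before_pps)
{
    if (before_pps == 0)
        return 0.0;
    return (double)after_pps / (double)before_pps;
}
```

Feeding it the two sequential_horiz_singlepixel_read lines gives a ratio of
a bit over 10x between the runs, and sequential_line_read lands at roughly
7x.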

Reading pixels vertically is slower with vram_cache. I guess this is
because the cache causes some overhead, and we always miss the cache so
the caching is just wasted time.
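
To illustrate the guess above, here is a rough sketch that counts how many
distinct cache lines a run of accesses touches. The 64-byte line size,
4 bytes per pixel and 3200-byte stride used in the example are assumptions
for illustration only, not values measured on the overo, and
cache_lines_touched() is a made-up helper:

```c
#include <stddef.h>
#include <stdint.h>

/* Count distinct cache lines touched by n_accesses reads that start at
 * offset 0 and advance by step_bytes each time. Offsets only increase,
 * so a new line is entered whenever the line index changes.
 * Hypothetical helper for illustration. */
size_t cache_lines_touched(size_t n_accesses, size_t step_bytes,
                           size_t line_bytes)
{
    size_t lines = 0;
    uint64_t prev_line = UINT64_MAX; /* sentinel: no line seen yet */

    if (line_bytes == 0)
        return 0;

    for (size_t i = 0; i < n_accesses; i++) {
        uint64_t line = ((uint64_t)i * step_bytes) / line_bytes;
        if (line != prev_line) {
            lines++;
            prev_line = line;
        }
    }
    return lines;
}
```

With those assumed numbers, 16 horizontal reads (step of 4 bytes) stay
within one line, while 16 vertical reads (step of one 3200-byte stride)
touch 16 different lines, so every vertical read would pay for a line fill
and use only one pixel of it.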

I would've also presumed the difference in sequential_line_write would
be bigger. write-through is effectively no-cache for writes, right?

If the user of the fb just writes to the fb and vram_cache=y, it means
that the cache is filled with pixel data that is never used, thus
lowering the performance of all other programs?

I have to say I don't know much about CPU caches, but the read speed
improvements are very big, so I think this is definitely an interesting
patch. So if you get the first patch accepted, I see no problem with
adding this to omapfb as an optional feature.

However, "vram_cache" is not a very good name for the option.
"vram_writethrough", or something?

Did you test this with VRFB (omap3) or TILER (omap4)? I wonder how those
are affected.


