RE: Write is twice the speed as read?
Reads are slower than writes because the read address is first presented
on the bus and you then wait until the memory-mapped target (memory or a
device) returns the data. Writes are faster because the address and data
are presented at the same time, so the transfer is performed in 1 step.
Read (address, wait, returned data):
  Step 1 - read request --> bus --> device
  Step 2 - data <-- bus <-- device
Total of two transactions.
Write (address/data):
  Step 1 - address/data --> bus --> device
Total of one transaction.
If you look at the bus transactions on a scope or logic analyzer, you
will see that the write is quicker because address and data are present
simultaneously and are handled in, say, 1 cycle. In the read case, the
read address is presented, then a cycle is wasted while the device
retrieves the data, and only then is the data presented on the bus.
This is why writes come out faster than reads.
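The counting above can be written out as a toy model. This is only a sketch: the names bus_op and bus_transactions, and the fixed cost of one transaction per write and two per read, are illustrative assumptions taken from the description above, not measurements of any particular bus.

```c
#include <stddef.h>

/* Toy model of the description above; not a model of a real bus.
 * Assumption: a write costs one transaction (address and data together),
 * a read costs two (request out, data back). */
enum bus_op { BUS_READ, BUS_WRITE };

/* Transactions needed for n transfers of the given kind. */
size_t bus_transactions(enum bus_op op, size_t n)
{
    size_t per_transfer = (op == BUS_READ) ? 2 : 1;
    return n * per_transfer;
}
```

Under this model bus_transactions(BUS_READ, n) is always twice bus_transactions(BUS_WRITE, n), which is the 2x ratio in the subject line.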
-John
-----Original Message-----
From: linux-arm-bounces@xxxxxxxxxxxxxxxxxxxxxx
[mailto:linux-arm-bounces@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Ioi Lam
Sent: Wednesday, July 27, 2005 1:41 PM
To: linux-arm@xxxxxxxxxxxxxxxxxxxxxx
Subject: Write is twice the speed as read?
Folks,
I am trying to optimize large block memcpy speed on my Versatile A/B
board (210 MHz ARM926EJ-S with VFP, Linux 2.6.9). So I wrote a few
timing loops to determine the speed of moving memory. For my
requirement, I need to copy a block of memory that's 2x the size of the
dcache.
So I wrote two loops to time the speed of reading or writing (but not
reading *and* writing at the same time). With a 32KB dcache, I read or
write 4KB or 64KB of data using load/store multiple instructions. The
strange result is that, for 64KB blocks, writes are about 2x the speed
of reads. The memcpy figures are measured with the memcpy from the
standard GNU C library.
Test          Description                  Bytes      Time (us)  MByte/sec
rd_ldmia(4)   read 4KB of data (ldmia)     134217728     220864    607.694
rd_ldmia(64)  read 64KB of data (ldmia)    134217728    1030162    130.287
wd_stmia(4)   write 4KB of data (stmia)    134217728     544041    246.705
wd_stmia(64)  write 64KB of data (stmia)   134217728     539586    248.742
memcpy(4)     memcpy 4KB of data           134217728     575432    233.246
memcpy(64)    memcpy 64KB of data          134217728    1510152     88.876
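For reference, the MByte/sec figures appear to be bytes divided by microseconds, so 1 MByte is 10^6 bytes here. A small helper reproduces them; the name mbytes_per_sec is hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Bytes per microsecond equals 10^6 bytes per second, so the quotient
 * is directly the "MByte/sec" figure (decimal megabytes). */
double mbytes_per_sec(uint64_t bytes, uint64_t usec)
{
    return (double)bytes / (double)usec;
}
```

For example, mbytes_per_sec(134217728, 220864) gives about 607.69, matching the rd_ldmia(4) result.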
Does anyone know why the reads are just half the speed of the writes?
Maybe I am not doing the right thing to use the full bandwidth between
main_mem -> dcache? I am hoping that if I can get the reads to be as
fast as the writes, memcpy can achieve around 120MB/sec.
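One way to arrive at a figure like 120MB/sec is to assume that memcpy reads and then writes every byte with no overlap between the two, so the per-byte times add. That assumption and the helper name copy_rate are illustrative only and are not stated in the message.

```c
/* Estimated copy rate when every byte is read at read_rate and then
 * written at write_rate, with no overlap (an assumption):
 *   time per byte = 1/read_rate + 1/write_rate
 * Rates are in MByte/sec; the result is in the same unit. */
double copy_rate(double read_rate, double write_rate)
{
    return (read_rate * write_rate) / (read_rate + write_rate);
}
```

With the measured 64KB rates (130.287 read, 248.742 write) this gives roughly 85.5 MByte/sec, close to the measured memcpy(64) figure of 88.876. With reads as fast as writes it gives roughly 124 MByte/sec.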
------------------------------------------------------------------------
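@ void rd_ldmia(int *addr, int num_bytes, int num_loops);
@ r0 = addr, r1 = num_bytes (multiple of 64), r2 = num_loops
        .global rd_ldmia
rd_ldmia:
        stmdb   sp!, {r4, r5, r6, r7, r8}   @ save callee-saved registers
        mov     r8, r0                      @ remember start of block
rd_ldmia_0:
        mov     r3, r1                      @ bytes left in this pass
rd_ldmia_1:
        ldmia   r0!, {r4, r5, r6, r7}       @ 4 x 16 bytes per iteration
        ldmia   r0!, {r4, r5, r6, r7}
        ldmia   r0!, {r4, r5, r6, r7}
        ldmia   r0!, {r4, r5, r6, r7}
        subs    r3, r3, #64
        bne     rd_ldmia_1
        subs    r2, r2, #1
        movne   r0, r8                      @ rewind to start of block
        bne     rd_ldmia_0
        ldmia   sp!, {r4, r5, r6, r7, r8}   @ restore registers
        mov     pc, lr

@ void wd_stmia(int *addr, int num_bytes, int num_loops);
@ r0 = addr, r1 = num_bytes (multiple of 64), r2 = num_loops
        .global wd_stmia
wd_stmia:
        stmdb   sp!, {r4, r5, r6, r7, r8}   @ save callee-saved registers
        mov     r8, r0                      @ remember start of block
wd_stmia_0:
        mov     r3, r1                      @ bytes left in this pass
wd_stmia_1:
        stmia   r0!, {r4, r5, r6, r7}       @ 4 x 16 bytes per iteration
        stmia   r0!, {r4, r5, r6, r7}
        stmia   r0!, {r4, r5, r6, r7}
        stmia   r0!, {r4, r5, r6, r7}
        subs    r3, r3, #64
        bne     wd_stmia_1
        subs    r2, r2, #1
        movne   r0, r8                      @ rewind to start of block
        bne     wd_stmia_0
        ldmia   sp!, {r4, r5, r6, r7, r8}   @ restore registers
        mov     pc, lr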
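For anyone wanting to reproduce the numbers, a C driver along these lines could call the two routines. This is a sketch, not the harness used for the results above: mem_fn, loops_for, rd_words, and time_us are hypothetical names, rd_words is a portable stand-in for rd_ldmia so the code can be tried on any host, and timing with clock() is an assumption.

```c
#include <time.h>

/* Signature shared by rd_ldmia and wd_stmia. */
typedef void (*mem_fn)(int *addr, int num_bytes, int num_loops);

/* Number of passes over a block so that total_bytes are moved in all,
 * e.g. 134217728 bytes in 4096-byte blocks is 32768 passes. */
int loops_for(long total_bytes, int block_bytes)
{
    return (int)(total_bytes / block_bytes);
}

/* Portable stand-in for rd_ldmia (hypothetical). It reads every word of
 * the block; the volatile pointer keeps the loads from being optimized
 * away. It will not match the speed of the ldmia loop. */
void rd_words(int *addr, int num_bytes, int num_loops)
{
    for (int loop = 0; loop < num_loops; loop++) {
        const volatile int *p = addr;
        for (int i = 0; i < num_bytes / (int)sizeof(int); i++) {
            int v = p[i];
            (void)v;
        }
    }
}

/* Run fn once and return the CPU time it took, in microseconds.
 * clock() is standard C; a wall-clock timer may suit a real board better. */
double time_us(mem_fn fn, int *buf, int num_bytes, int num_loops)
{
    clock_t start = clock();
    fn(buf, num_bytes, num_loops);
    clock_t end = clock();
    return (double)(end - start) * 1e6 / CLOCKS_PER_SEC;
}
```

On the ARM board, time_us(rd_ldmia, buf, 4096, loops_for(134217728, 4096)) would time the 4KB read case, with buf pointing at a buffer of at least 4096 bytes.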
-------------------------------------------------------------------
List admin: http://lists.arm.linux.org.uk/mailman/listinfo/linux-arm
FAQ: http://www.arm.linux.org.uk/mailinglists/faq.php
Etiquette: http://www.arm.linux.org.uk/mailinglists/etiquette.php
-------------------------------------------------------------------