[Arm-dev] [PATCH v1 50/87] arm64: optimized copy_to_user and copy_from_user assembly code, part 2

Thu Aug 13 13:18:47 UTC 2015
Vadim Lomovtsev <Vadim.Lomovtsev at caviumnetworks.com>

From: Craig Magina <craig.magina at canonical.com>

Use the glibc cortex-strings work authored by Linaro as the base for
new copy to/from user kernel routines.

Iperf performance increase:
            -l (size)    1 core result
Optimized   64B          44-51Mb/s
            1500B        4.9Gb/s
            30000B       16.2Gb/s
Original    64B          34-50.7Mb/s
            1500B        4.7Gb/s
            30000B       14.5Gb/s

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1400349

Note: I made one change, moving the tst so that it sits right next to
the branch it feeds, which is better optimized on ThunderX.
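
The reordering should not change behavior: the two add instructions are
the non-flag-setting form and do not write count, so the Z flag tested
by b.ne is the same whether tst runs before or after them. A minimal
sketch in C, as a toy model only (struct cpu_state and every function
name below are hypothetical and not part of the kernel):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of the state touched by the reordered lines. */
struct cpu_state {
	uint64_t src, dst, count;
	bool z;			/* Z bit of NZCV */
};

/* tst count, #0x3f: sets Z when (count & 0x3f) == 0, writes no register */
static void tst_count(struct cpu_state *s)
{
	s->z = (s->count & 0x3f) == 0;
}

/* add reg, reg, #64: non-flag-setting form, leaves Z untouched */
static void add64(uint64_t *reg)
{
	*reg += 64;
}

/* Order before the patch: tst, add src, add dst */
static struct cpu_state original_order(struct cpu_state s)
{
	tst_count(&s);
	add64(&s.src);
	add64(&s.dst);
	return s;
}

/* Order after the patch: add src, add dst, tst */
static struct cpu_state patched_order(struct cpu_state s)
{
	add64(&s.src);
	add64(&s.dst);
	tst_count(&s);
	return s;
}

/* True when both orders end in the same state, so b.ne takes the same path */
static bool orders_agree(uint64_t src, uint64_t dst, uint64_t count)
{
	struct cpu_state in = { src, dst, count, false };
	struct cpu_state a = original_order(in);
	struct cpu_state b = patched_order(in);

	return a.src == b.src && a.dst == b.dst &&
	       a.count == b.count && a.z == b.z;
}
```

The model only covers architectural state; any performance difference
from placing tst next to the branch is a property of the core and is
not captured here.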

Signed-off-by: Craig Magina <craig.magina at canonical.com>
Signed-off-by: Robert Richter <rrichter at cavium.com>
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev at caviumnetworks.com>
---
 arch/arm64/lib/copy_template.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/lib/copy_template.S b/arch/arm64/lib/copy_template.S
index c07eea6..bdce432 100644
--- a/arch/arm64/lib/copy_template.S
+++ b/arch/arm64/lib/copy_template.S
@@ -169,9 +169,9 @@ D_h	.req x14
 	USER(12f, stp B_l, B_h, [dst, #16])
 	USER(12f, stp C_l, C_h, [dst, #32])
 	USER(12f, stp D_l, D_h, [dst, #48])
-	tst	count, #0x3f
 	add	src, src, #64
 	add	dst, dst, #64
+	tst	count, #0x3f
 	b.ne	.Ltail63
 	b	.Lsuccess
 
-- 
2.4.3