[Arm-dev] Fwd: Golang Performance

Thu Jan 5 19:16:38 UTC 2017
Jeremy Linton <jeremy.linton at arm.com>

On 01/04/2017 04:03 AM, Gordan Bobic wrote:
>
> The --cpuprofile option doesn't seem to generate a human-readable file
> and my go-fu is weak, but I've attached it anyway.


Hi,

The attached profile appears to be blank. You can try something like:

go tool pprof rclone profile.aarch64
Entering interactive mode (type "help" for commands)
(pprof) top10

Using your attached profile, it prints:
profile is empty

So something isn't quite right. Looking at the raw file, it is quite
small and consists mostly of null bytes.





>
> On Tue, Jan 3, 2017 at 9:18 PM, Jeremy Linton <jlinton at redhat.com
> <mailto:jlinton at redhat.com>> wrote:
>
>     Hi,
>
>     On 01/01/2017 11:57 AM, Gordan Bobic wrote:
>
>         I'm not much of a Go user at the best of times, but I am noticing
>         that there seems to be a huge (> 10x clock-for-clock) performance
>         discrepancy between x86-64 and aarch64 binaries.
>
>         The specific example I am looking at is rclone, uploading
>         encrypted backups to Amazon Cloud Drive.
>
>         When I run it on a Westmere-class Xeon (3.6GHz), it consumes
>         about 2% CPU to saturate a 20Mbit uplink. When I run it on an
>         X-Gene (2.0GHz), it consumes about 50% CPU. Even adjusting for
>         the difference in clock speeds, this seems like a huge gap.
>
>         Is the Go compiler known to produce very poor results on
>         aarch64, or is something else in play? I know that x86-64 has a
>         much more powerful SIMD unit, but I am not convinced that this is
>         the explanation, and rclone doesn't use AES AFAIK, so a hardware
>         implementation of that doesn't seem to explain it either.
>
>
>     AFAIK, Amazon Drive uses SSL/TLS, so it's likely you are using AES.
>     Further, Go implements its own TLS (and related) libraries rather
>     than depending on OpenSSL/GnuTLS. So, while ARMv8 cores have hardware
>     AES instructions, Go's AES implementation is currently only hardware
>     accelerated on x86 and s390. MD5 (also used, AFAIK) is also missing
>     an ARM64 native implementation, even though there is an ARM one. So
>     just those two issues could cause a large performance delta, but at
>     those data rates it seems possible that something else is going on.
>
>     So, a couple of questions: did you build rclone yourself, or use one
>     of the binaries from rclone.org <http://rclone.org>?
>
>     There is a --cpuprofile option to rclone. It might be helpful if you
>     can post the output from that.
>
>     Thanks,
>
>
>
>         At general-purpose pointer chasing such as compiling, the X-Gene
>         seems to deliver similar clock-for-clock performance to the
>         Westmere Xeon, so the large discrepancy with rclone seems odd.
>
>         Has anyone got any insights? Both machines are running CentOS 7.
>
>         Gordan
>
>
>         _______________________________________________
>         Arm-dev mailing list
>         Arm-dev at centos.org <mailto:Arm-dev at centos.org>
>         https://lists.centos.org/mailman/listinfo/arm-dev
>         <https://lists.centos.org/mailman/listinfo/arm-dev>
>
>
>