[Arm-dev] Fwd: Golang Performance
Jeremy Linton
jeremy.linton at arm.com
Thu Jan 5 19:16:38 UTC 2017
On 01/04/2017 04:03 AM, Gordan Bobic wrote:
>
> The --cpuprofile option doesn't seem to generate a human-readable file
> and my go-fu is weak, but I've attached it anyway.
Hi,
The attached profile appears to be empty. You can inspect it with
something like:

  go tool pprof rclone profile.aarch64
  Entering interactive mode (type "help" for commands)
  (pprof) top10

With your attached profile, it prints:

  profile is empty

So something isn't quite right. Looking at the raw file, it is also
pretty small and mostly nulls.
>
>
>
> On Tue, Jan 3, 2017 at 9:18 PM, Jeremy Linton <jlinton at redhat.com> wrote:
>
> Hi,
>
> On 01/01/2017 11:57 AM, Gordan Bobic wrote:
>
> I'm not much of a Go user at the best of times, but I am noticing that
> there seems to be a huge (> 10x clock-for-clock) performance
> discrepancy between x86-64 and aarch64 binaries.
>
> The specific example I am looking at is rclone, uploading encrypted
> backups to Amazon Cloud Drive.
>
> When I run it on a Westmere-class Xeon (3.6GHz), it consumes about 2%
> CPU to saturate a 20Mbit uplink. When I run it on an X-Gene (2.0GHz),
> it consumes about 50% CPU. Even adjusting for the difference in clock
> speeds, this seems to be a huge difference.
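
The clock-for-clock comparison can be made explicit with a small calculation. This is only a sketch: it assumes both CPU percentages refer to a single core and that both machines were moving the same amount of data, neither of which is stated in the thread. The function name busyGigacycles is hypothetical.

```go
package main

import "fmt"

// busyGigacycles estimates how many billions of CPU cycles per second of
// wall time a process consumes, given its CPU utilisation (0..1) and the
// core clock in GHz. Assumes the utilisation figure refers to one core.
func busyGigacycles(cpuFraction, clockGHz float64) float64 {
	return cpuFraction * clockGHz
}

func main() {
	xeon := busyGigacycles(0.02, 3.6)  // Westmere Xeon: 2% at 3.6GHz
	xgene := busyGigacycles(0.50, 2.0) // X-Gene: 50% at 2.0GHz
	fmt.Printf("Xeon:   %.3f Gcycles/s\n", xeon)
	fmt.Printf("X-Gene: %.3f Gcycles/s\n", xgene)
	fmt.Printf("ratio:  %.1fx\n", xgene/xeon)
}
```

Under those assumptions the X-Gene spends roughly 14 times as many cycles for the same upload rate, which is consistent with the "> 10x" figure quoted earlier.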
>
> Is the Go compiler known to produce very poor results on aarch64, or is
> something else in play? I know that x86-64 has a much more powerful SIMD
> unit, but I am not convinced that this is the explanation, and rclone
> doesn't use AES AFAIK, so a hardware implementation of that doesn't seem
> to explain it either.
>
>
> AFAIK, Amazon Drive uses SSL/TLS, so it's likely you are using AES.
> Further, Go implements its own TLS/etc. libraries rather than
> depending on OpenSSL/GnuTLS. So, while ARMv8 cores have hardware AES
> instructions, Go's AES implementation is currently only hardware
> accelerated on x86 and s390. MD5 (also used AFAIK) is also missing
> an ARM64 native implementation even though there is an ARM one. So,
> just those two issues could cause a large performance delta, but at
> those data rates it seems possible there is something else going on.
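
One way to check how much of the gap comes from the crypto primitives is to measure their raw throughput with the same Go toolchain on both machines. The sketch below is an assumption-laden micro-benchmark, not a reproduction of what rclone does: it times AES-128-GCM (one common TLS mode; the cipher suite actually negotiated is not known from this thread) and MD5 over a fixed buffer using only the standard library. The helper names (measure, newGCMSealer, md5Op) are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/md5"
	"fmt"
	"runtime"
	"time"
)

var sink byte // keeps results live so the hashing is not optimised away

// measure runs op over a bufSize-byte buffer iters times and returns MB/s.
func measure(bufSize, iters int, op func(buf []byte)) float64 {
	buf := make([]byte, bufSize)
	start := time.Now()
	for i := 0; i < iters; i++ {
		op(buf)
	}
	elapsed := time.Since(start).Seconds()
	return float64(bufSize) * float64(iters) / elapsed / 1e6
}

// newGCMSealer returns an op that encrypts its input with AES-128-GCM.
// The all-zero key and reused nonce are for benchmarking only and must
// never be used to protect real data.
func newGCMSealer() (func([]byte), error) {
	block, err := aes.NewCipher(make([]byte, 16))
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	var out []byte
	return func(buf []byte) {
		out = aead.Seal(out[:0], nonce, buf, nil)
	}, nil
}

// md5Op hashes its input with MD5.
func md5Op(buf []byte) {
	sum := md5.Sum(buf)
	sink ^= sum[0]
}

func main() {
	seal, err := newGCMSealer()
	if err != nil {
		panic(err)
	}
	const bufSize, iters = 1 << 20, 256 // 256 MiB processed per primitive
	fmt.Printf("GOARCH=%s\n", runtime.GOARCH)
	fmt.Printf("AES-128-GCM: %.1f MB/s\n", measure(bufSize, iters, seal))
	fmt.Printf("MD5:         %.1f MB/s\n", measure(bufSize, iters, md5Op))
}
```

A 20Mbit uplink is only about 2.5 MB/s, so if both primitives report throughput far above that on the X-Gene, the missing hardware acceleration alone would not account for 50% CPU, which would support the suspicion that something else is going on.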
>
> So, a couple of questions: did you build rclone yourself or use one of
> the binaries from rclone.org <http://rclone.org>?
>
> There is a --cpuprofile option to rclone. It might be helpful if you
> can post the output from that.
>
> Thanks,
>
>
>
> At general purpose pointer chasing such as compiling, the X-Gene seems
> to produce similar performance clock-for-clock as the Westmere Xeon, so
> the large discrepancy with rclone seems odd.
>
> Has anyone got any insights? Both machines are running CentOS 7.
>
> Gordan
>
>
> _______________________________________________
> Arm-dev mailing list
> Arm-dev at centos.org <mailto:Arm-dev at centos.org>
> https://lists.centos.org/mailman/listinfo/arm-dev
> <https://lists.centos.org/mailman/listinfo/arm-dev>
>
>