The --cpuprofile option doesn't seem to generate a human-readable file, and my go-fu is weak, but I've attached it anyway.

On Tue, Jan 3, 2017 at 9:18 PM, Jeremy Linton <jlinton at redhat.com> wrote:

> Hi,
>
> On 01/01/2017 11:57 AM, Gordan Bobic wrote:
>
>> I'm not much of a Go user at the best of times, but I am noticing that
>> there seems to be a huge (> 10x clock-for-clock) performance
>> discrepancy between x86-64 and aarch64 binaries.
>>
>> The specific example I am looking at is rclone, uploading encrypted
>> backups to Amazon Cloud Drive.
>>
>> When I run it on a Westmere-class Xeon (3.6GHz), it is consuming about
>> 2% CPU to saturate a 20Mbit uplink. When I run it on an X-Gene (2.0GHz),
>> it is consuming about 50% CPU. Even adjusting for differences in clock
>> speeds, this seems to be a huge difference.
>>
>> Is the Go compiler known to produce very poor results on aarch64, or is
>> something else in play? I know that x86-64 has a much more powerful SIMD
>> unit, but I am not convinced that this is the explanation, and rclone
>> doesn't use AES AFAIK, so a hardware implementation of that doesn't seem
>> to explain it either.
>>
>
> AFAIK, Amazon Drive uses SSL/TLS, so it's likely you are using AES. Further,
> Go implements its own TLS/etc. libraries rather than depending on
> OpenSSL/GnuTLS. So, while ARMv8 cores have hardware AES instructions, Go's
> AES implementation is currently only hardware accelerated on x86 and s390.
> MD5 (also used AFAIK) is also missing an ARM64 native implementation, even
> though there is an ARM one. So just those two issues could cause a large
> performance delta, but at those data rates it seems possible there is
> something else going on.
>
> So, a couple of questions: did you build rclone yourself or use one of the
> binaries from rclone.org?
>
> There is a --cpuprofile option to rclone. It might be helpful if you can
> post the output from that.
>
> Thanks,
>
>
>> At general-purpose pointer chasing such as compiling, the X-Gene seems
>> to produce similar performance clock-for-clock to the Westmere Xeon, so
>> the large discrepancy with rclone seems odd.
>>
>> Has anyone got any insights? Both machines are running CentOS 7.
>>
>> Gordan
>>
>>
>> _______________________________________________
>> Arm-dev mailing list
>> Arm-dev at centos.org
>> https://lists.centos.org/mailman/listinfo/arm-dev

-------------- next part --------------
A non-text attachment was scrubbed...
Name: profile.aarch64
Type: application/octet-stream
Size: 64 bytes
Desc: not available
URL: <http://lists.centos.org/pipermail/arm-dev/attachments/20170104/a467a9d0/attachment-0006.obj>