On 01/04/2017 04:03 AM, Gordan Bobic wrote:
>
> The --cpuprofile option doesn't seem to generate a human-readable file
> and my go-fu is weak, but I've attached it anyway.

Hi,

The attached profile appears to be blank. You can try something like:

go tool pprof rclone profile.aarch64
Entering interactive mode (type "help" for commands)
(pprof) top10

Using your attached profile, it prints:

profile is empty

So something isn't quite right; looking at the raw file, it appears to
be pretty small and mostly nulls.

>
> On Tue, Jan 3, 2017 at 9:18 PM, Jeremy Linton <jlinton at redhat.com> wrote:
>
> > Hi,
> >
> > On 01/01/2017 11:57 AM, Gordan Bobic wrote:
> >
> > > I'm not much of a Go user at the best of times, but I am noticing
> > > that there seems to be a huge (>10x clock-for-clock) performance
> > > discrepancy between x86-64 and aarch64 binaries.
> > >
> > > The specific example I am looking at is rclone, uploading encrypted
> > > backups to Amazon Cloud Drive.
> > >
> > > When I run it on a Westmere class Xeon (3.6GHz), it is consuming
> > > about 2% CPU to saturate a 20Mbit uplink. When I run it on an
> > > X-Gene (2.0GHz), it is consuming about 50% CPU. Even adjusting for
> > > differences in clock speeds, this seems to be a huge difference.
> > >
> > > Is the Go compiler known to produce very poor results on aarch64,
> > > or is something else in play? I know that x86-64 has a much more
> > > powerful SIMD unit, but I am not convinced that this is the
> > > explanation, and rclone doesn't use AES AFAIK, so hardware
> > > implementation of that doesn't seem to explain it either.
> >
> > AFAIK, Amazon Drive uses SSL/TLS, so it's likely you are using AES.
> > Further, Go implements its own TLS/etc. libraries rather than
> > depending on OpenSSL/GnuTLS. So, while ARMv8 cores have hardware AES
> > instructions, Go's AES implementation is currently only hardware
> > accelerated on x86 and s390.
> > MD5 (also used AFAIK) is also missing an ARM64 native implementation
> > even though there is an ARM one. So, just those two issues could
> > cause a large performance delta, but at those data rates it seems
> > possible there is something else going on.
> >
> > So, a couple of questions: did you build rclone yourself or use one
> > of the binaries from rclone.org <http://rclone.org>?
> >
> > There is a --cpuprofile option to rclone. It might be helpful if you
> > can post the output from that.
> >
> > Thanks,
> >
> > > At general purpose pointer chasing such as compiling, the X-Gene
> > > seems to produce similar performance clock-for-clock as the
> > > Westmere Xeon, so the large discrepancy with rclone seems odd.
> > >
> > > Has anyone got any insights? Both machines are running CentOS 7.
> > >
> > > Gordan
> > >
> > > _______________________________________________
> > > Arm-dev mailing list
> > > Arm-dev at centos.org
> > > https://lists.centos.org/mailman/listinfo/arm-dev