[Arm-dev] Golang Performance

Sun Jan 1 17:57:40 UTC 2017
Gordan Bobic <gordan at redsleeve.org>

I'm not much of a Go user at the best of times, but I have noticed what
seems to be a huge (>10x clock-for-clock) performance discrepancy between
x86-64 and aarch64 binaries.

The specific example I am looking at is rclone uploading encrypted backups
to Amazon Cloud Drive.

When I run it on a Westmere-class Xeon (3.6GHz), it consumes about 2% CPU
to saturate a 20Mbit uplink. When I run it on an X-Gene (2.0GHz), it
consumes about 50% CPU. Even after adjusting for the difference in clock
speeds, that is a huge gap.
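To make the clock-for-clock comparison concrete, here is a small sketch that
converts each CPU figure into cycles spent per second. It assumes both
percentages are measured on the same basis (e.g. a fraction of one core, as
top reports it) and that each CPU runs at its nominal clock; the helper name
cyclesPerSecond is hypothetical.

```go
package main

import "fmt"

// cyclesPerSecond estimates CPU cycles consumed per second of wall time.
// Assumption: utilization is a fraction of one core running at clockHz.
func cyclesPerSecond(utilization, clockHz float64) float64 {
	return utilization * clockHz
}

func main() {
	xeon := cyclesPerSecond(0.02, 3.6e9)  // Westmere Xeon: ~2% at 3.6GHz
	xgene := cyclesPerSecond(0.50, 2.0e9) // X-Gene: ~50% at 2.0GHz
	fmt.Printf("Xeon:   %.3g cycles/s\n", xeon)
	fmt.Printf("X-Gene: %.3g cycles/s\n", xgene)
	fmt.Printf("ratio:  %.1fx\n", xgene/xeon)
}
```

That works out to roughly 7.2e7 vs 1.0e9 cycles/s, a ratio of about 14x for
the same 20Mbit upload. If the two percentages were taken on different bases
(one core vs. the whole machine), the ratio would change accordingly.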

Is the Go compiler known to produce very poor code on aarch64, or is
something else at play? I know that x86-64 has a much more powerful SIMD
unit, but I am not convinced that this is the explanation, and AFAIK rclone
doesn't use AES, so hardware AES support doesn't seem to explain it either.

On general-purpose, pointer-chasing workloads such as compiling, the X-Gene
seems to deliver similar clock-for-clock performance to the Westmere Xeon,
so the large discrepancy with rclone seems odd.

Has anyone got any insights? Both machines are running CentOS 7.

Gordan