<html><head></head><body><div style="color:#000; background-color:#fff; font-family:verdana, helvetica, sans-serif;font-size:16px"><blockquote id="yui_3_16_0_ym19_1_1483354001483_7509"><br> <div dir="ltr" id="yui_3_16_0_ym19_1_1483354001483_6051"> <font id="yui_3_16_0_ym19_1_1483354001483_6050" face="Arial" size="2"> <hr size="1"> <b><span style="font-weight:bold;">From:</span></b> Gordan Bobic &lt;gordan@redsleeve.org&gt;<br> <b><span style="font-weight: bold;">To:</span></b> Conversations around CentOS on ARM hardware &lt;arm-dev@centos.org&gt; <br> <b id="yui_3_16_0_ym19_1_1483354001483_6056"><span style="font-weight: bold;" id="yui_3_16_0_ym19_1_1483354001483_6055">Sent:</span></b> Sunday, 1 January 2017, 11:57<br> <b><span style="font-weight: bold;">Subject:</span></b> [Arm-dev] Golang Performance<br> </font> </div> <br><div id="yiv5234355194"><div dir="ltr" id="yui_3_16_0_ym19_1_1483354001483_6071"><div id="yui_3_16_0_ym19_1_1483354001483_6070"><div id="yui_3_16_0_ym19_1_1483354001483_6069"><div id="yui_3_16_0_ym19_1_1483354001483_6073"><div id="yui_3_16_0_ym19_1_1483354001483_6074"><div id="yui_3_16_0_ym19_1_1483354001483_6075">I'm not much of a Go user at the best of times, but I am noticing that there seems to be a huge (&gt;10x clock-for-clock) performance discrepancy between x86-64 and aarch64 binaries.<br><br></div>The specific example I am looking at is rclone, uploading encrypted backups to Amazon Cloud Drive.<br><br></div>When I run it on a Westmere class Xeon (3.6GHz), it is consuming about 2% CPU to saturate a 20Mbit uplink. When I run it on an X-Gene (2.0GHz), it is consuming about 50% CPU. Even adjusting for differences in clock speeds, this seems to be a huge difference.<br><br></div>Is the Go compiler known to produce very poor results on aarch64, or is something else in play? 
I know that x86-64 has a much more powerful SIMD unit, but I am not convinced that this is the explanation, and rclone doesn't use AES AFAIK, so hardware implementation of that doesn't seem to explain it either.<br><br></div><div id="yui_3_16_0_ym19_1_1483354001483_6141">At general purpose pointer chasing such as compiling, the X-Gene seems to produce similar performance clock-for-clock as the Westmere Xeon, so the large discrepancy with rclone seems odd.<br></div><div id="yui_3_16_0_ym19_1_1483354001483_6140"><br></div>Has anyone got any insights? Both machines are running CentOS 7.<br><br></div>Gordan<br></div></div></blockquote><div class="qtdSeparateBR"><br><br></div><div class="yahoo_quoted" id="yui_3_16_0_ym19_1_1483354001483_6054" style="display: block;"><div style="font-family: verdana, helvetica, sans-serif; font-size: 16px;" id="yui_3_16_0_ym19_1_1483354001483_6053"><div style="font-family: HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, Sans-Serif; font-size: 16px;" id="yui_3_16_0_ym19_1_1483354001483_6052"><div class="y_msg_container" id="yui_3_16_0_ym19_1_1483354001483_6072"><div id="yui_3_16_0_ym19_1_1483354001483_7331"><br></div><div id="yui_3_16_0_ym19_1_1483354001483_7392"><span id="yui_3_16_0_ym19_1_1483354001483_7393">Hi Gordan;</span></div><div id="yui_3_16_0_ym19_1_1483354001483_7394"><span id="yui_3_16_0_ym19_1_1483354001483_7395"><br id="yui_3_16_0_ym19_1_1483354001483_7396"></span></div><div id="yui_3_16_0_ym19_1_1483354001483_7397"><span id="yui_3_16_0_ym19_1_1483354001483_7398">I can't offer a detailed diagnosis, just some observations from a project level:</span></div><div dir="ltr" id="yui_3_16_0_ym19_1_1483354001483_7399"><span id="yui_3_16_0_ym19_1_1483354001483_7400"></span><br id="yui_3_16_0_ym19_1_1483354001483_7401"><span id="yui_3_16_0_ym19_1_1483354001483_7402">Performance
of Go on AArch64 is considered important, judging from the discussions I've had
with colleagues (I work for ARM, but am posting today with my own views)
and other AArch64 users. With the release of Go 1.7, the compiler back
end switched to a 'static single assignment' (SSA) back end. This is a
significant change in the internals of the Go compiler, and is judged to be
a valuable enhancement.</span></div><div dir="ltr" id="yui_3_16_0_ym19_1_1483354001483_7403"><span id="yui_3_16_0_ym19_1_1483354001483_7404"><br id="yui_3_16_0_ym19_1_1483354001483_7405"></span></div><div dir="ltr" id="yui_3_16_0_ym19_1_1483354001483_7406"><span id="yui_3_16_0_ym19_1_1483354001483_7407">So,
the Go compiler differs significantly before and after 1.7,
so optimizations for AArch64 (or any other architecture)
target either the pre-1.7 or the post-1.7 code base. Given this choice, what I observe is that the master branch (post-1.7) is the first priority for optimizations and bug fixes for AArch64. For example, see Cherry Zhang's changes: </span><br id="yui_3_16_0_ym19_1_1483354001483_7408">
</div><div id="yui_3_16_0_ym19_1_1483354001483_7409"><span style="color:#1F497D" id="yui_3_16_0_ym19_1_1483354001483_7410"><a href="https://go-review.googlesource.com/#/q/owner:%22Cherry+Zhang%22" id="yui_3_16_0_ym19_1_1483354001483_7411"><span style="color:black;mso-style-textfill-fill-color:black;mso-style-textfill-fill-alpha:100.0%" id="yui_3_16_0_ym19_1_1483354001483_7412"><span style="color:black;mso-style-textfill-fill-color:black;mso-style-textfill-fill-alpha:100.0%" id="yui_3_16_0_ym19_1_1483354001483_7413">h<span style="color:black;mso-style-textfill-fill-color:black;mso-style-textfill-fill-alpha:100.0%" id="yui_3_16_0_ym19_1_1483354001483_7414">ttps://go-review.googlesource.com/#/q/owner:%22Cherry+Zhang%22</span></span></span></a></span></div><div id="yui_3_16_0_ym19_1_1483354001483_7415"><br id="yui_3_16_0_ym19_1_1483354001483_7416">
</div><div id="yui_3_16_0_ym19_1_1483354001483_7417">I would expect work from other AArch64 developers (including hardware optimizations for AES, CRC, etc.) to also target the post-1.7 master branch first.<br id="yui_3_16_0_ym19_1_1483354001483_7418"></div><div id="yui_3_16_0_ym19_1_1483354001483_7419"><span id="yui_3_16_0_ym19_1_1483354001483_7420"><br id="yui_3_16_0_ym19_1_1483354001483_7421"></span></div><div dir="ltr" id="yui_3_16_0_ym19_1_1483354001483_7422"><span id="yui_3_16_0_ym19_1_1483354001483_7423">It looks to me like the golang version picked from the CentOS extra repository would give you version 1.6.3.
It might be difficult to predict when this version will receive the attention of the
development community for optimization. Have you considered
moving to the golang Master branch?</span></div><div dir="ltr" id="yui_3_16_0_ym19_1_1483354001483_7424"><span id="yui_3_16_0_ym19_1_1483354001483_7425"><br id="yui_3_16_0_ym19_1_1483354001483_7426"></span></div><div dir="ltr" id="yui_3_16_0_ym19_1_1483354001483_7427"><span id="yui_3_16_0_ym19_1_1483354001483_7428">best regards,</span></div><div dir="ltr" id="yui_3_16_0_ym19_1_1483354001483_7429"><span id="yui_3_16_0_ym19_1_1483354001483_7430">Richard</span></div><div dir="ltr" id="yui_3_16_0_ym19_1_1483354001483_7431"><span id="yui_3_16_0_ym19_1_1483354001483_7432"><br id="yui_3_16_0_ym19_1_1483354001483_7433"></span></div>_______________________________________________<br>Arm-dev mailing list<br><a ymailto="mailto:Arm-dev@centos.org" href="mailto:Arm-dev@centos.org" id="yui_3_16_0_ym19_1_1483354001483_7658">Arm-dev@centos.org</a><br><a href="https://lists.centos.org/mailman/listinfo/arm-dev" target="_blank" id="yui_3_16_0_ym19_1_1483354001483_6142">https://lists.centos.org/mailman/listinfo/arm-dev</a><br><br><br></div> </div> </div> </div></div></body></html>