On Mon, 30 Oct 2017 17:07:31 +0000 (UTC) Chris Olson chris_e_olson@yahoo.com wrote:
We have been fortunate to hang onto one of our summer interns for part-time work on weekends during the current school year. One of the intern's jobs is to load documents and data which are then processed. The documents are .txt, .docx, and .pdf files. The data files are raw sensor outputs, usually captured with ADCs, mostly at eight-bit precision. All files are loaded or moved from one machine to another with sftp.
The intern noticed right away that the documents transfer perfectly from our PPC and SPARC machines to our Intel/CentOS platforms. The raw data files, not so much. There is always an endian (thanks, Gulliver) issue, which we assume is due to the bytes of data being formatted into 32-bit words somewhere in the big-endian systems. It is not totally clear why the document files do not have this issue. If there is a known principle behind these observations, we would appreciate very much any information that can be shared.
Transferring a file with sftp will not change anything. The copy will be bit-for-bit identical.
However, the data inside the file may be stored in little- or big-endian byte order, and a file format may or may not carry metadata indicating which. That is, some files will read differently on different arches and some will be immune. Plain text is immune because each character is a single byte, so there is no byte order to get wrong; formats like .docx and .pdf are immune because they define a fixed byte order (or a more sophisticated abstraction) independent of the machine that wrote them. Raw sensor dumps written in the machine's native word order have no such protection.
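To see the effect concretely, here is a small sketch (the sample bytes are hypothetical) showing how the same four bytes on disk decode to two different 32-bit values depending on which byte order the reader assumes:

```python
import struct

# Four bytes exactly as they would sit in a file; they are identical
# on both machines after the transfer.
raw = bytes([0x12, 0x34, 0x56, 0x78])

# A big-endian reader (PPC, SPARC) and a little-endian reader
# (Intel) interpret the same bytes differently:
big = struct.unpack(">I", raw)[0]     # read as big-endian
little = struct.unpack("<I", raw)[0]  # read as little-endian

print(hex(big))     # 0x12345678
print(hex(little))  # 0x78563412
```

Picking one explicit format character (`>I` or `<I`) on both sides, instead of the native `I`, is the usual way to make such files portable.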
So it's not surprising that your raw files will have problems.
If you want to prove this to yourself, simply run md5sum/sha1sum/etc. on the files on both sides: identical digests mean the transfer changed nothing, and the endian problem lies in how the bytes are interpreted.
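If a checksum utility isn't handy on one of the platforms, the same check can be done in a few lines of Python (the file path is hypothetical):

```python
import hashlib

def sha1_of_file(path, chunk_size=65536):
    """Compute the SHA-1 digest of a file, reading in chunks
    so large raw sensor dumps don't need to fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Run this on both machines and compare the output:
# print(sha1_of_file("/data/sensor_run_01.raw"))
```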
/Peter K