[CentOS] CentOS 4.2 x86_64 address errors.

Sat Jan 14 01:04:53 UTC 2006
Michael Ubell <ubell at sleepycat.com>

I am running a multithreaded test program that calls the
Berkeley DB library. This test runs on many platforms.
Lately I have been trying it under CentOS 4.2 on a hyperthreaded
Intel x86_64 machine.
I get a variety of failures, most of which appear to be caused
by addressing errors.  For instance:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1178638688 (LWP 7957)]
0x0000002a956225af in __db_cursor_int (dbp=0x2f2f388, txn=0x0,
     dbtype=DB_BTREE, root=0, is_opd=0, lockerid=0, dbcp=0x46408c20)
     at ../dist/../db/db_am.c:68
68                      if (dbtype == dbc->dbtype) {
(gdb) p dbc
$6 = (DBC *) 0x2f3ae5800000000

Note the trailing zeros: the low 32 bits of the pointer are all
zero.  If we shift the address right by 32 bits:

(gdb) p *(DBC*)  0x2f3ae58
$7 = {dbp = 0x2f2f388, txn = 0x0, links = {tqe_next = 0x2f47ef8,
     tqe_prev = 0x2f3d4c8}, rskey = 0x2f3ae90, rkey = 0x2f3aeb0,
   rdata = 0x2f3aed0, my_rskey = {data = 0x0, size = 0, ulen = 0, dlen = 0,
     doff = 0, flags = 0}, my_rkey = {data = 0x0, size = 0, ulen = 0, dlen = 0,
     doff = 0, flags = 0}, my_rdata = {data = 0x0, size = 0, ulen = 0,
     dlen = 0, doff = 0, flags = 0}, lref = 0x2f2a290, locker = 19, lock_dbt = {
     data = 0x2f3af20, size = 28, ulen = 0, dlen = 0, doff = 0, flags = 0},
   lock = {pgno = 925,
     fileid = "X@\203\000\000�\000\000�\224E\205�\036\000\000\000 \000\000",
     type = 3}, mylock = {off = 0, ndx = 0, gen = 0, mode = DB_LOCK_NG},
   cl_id = 0, dbtype = DB_BTREE, internal = 0x2f39228,
   c_close = 0x2a956388a3 <__db_c_close_pp>,
   c_count = 0x2a95638a63 <__db_c_count_pp>,
   c_del = 0x2a95638ba7 <__db_c_del_pp>, c_dup = 0x2a95638ea3 <__db_c_dup_pp>,
   c_get = 0x2a95638fce <__db_c_get_pp>,
   c_pget = 0x2a95639746 <__db_c_pget_pp>,
   c_put = 0x2a956399f6 <__db_c_put_pp>, c_am_bulk = 0x2a9557fdd2 <__bam_bulk>,
   c_am_close = 0x2a9557dcc4 <__bam_c_close>,
   c_am_del = 0x2a9557ebd3 <__bam_c_del>,
   c_am_destroy = 0x2a9557e50e <__bam_c_destroy>,
   c_am_get = 0x2a9557f10a <__bam_c_get>,
   c_am_put = 0x2a95582040 <__bam_c_put>,
   c_am_writelock = 0x2a9558314f <__bam_c_writelock>, flags = 288}

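This is the right address: the structure is a valid cursor, and its
dbp member (0x2f2f388) matches the dbp argument in the faulting frame.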
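
The same check can be done outside gdb. What follows is only a minimal
sketch: the helper names (looks_shifted_by_32, unshift32,
report_if_shifted, CHECK_PTR) are hypothetical and are not part of
Berkeley DB. The test is a heuristic, since a valid pointer could in
principle also have its low 32 bits zero.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Heuristic: a nonzero value whose low 32 bits are all zero,
 * like the bad dbc value 0x2f3ae5800000000 seen in gdb. */
int looks_shifted_by_32(uint64_t v)
{
    return v != 0 && (v & 0xffffffffULL) == 0;
}

/* Undo the suspected shift by moving the high 32 bits back down. */
uint64_t unshift32(uint64_t v)
{
    return v >> 32;
}

/* Hypothetical debugging aid: if the pointer looks shifted, print it
 * together with the candidate original address. Returns 1 when the
 * pointer is suspicious, 0 otherwise. */
int report_if_shifted(const char *name, const void *p)
{
    uint64_t v = (uint64_t)(uintptr_t)p;

    if (!looks_shifted_by_32(v))
        return 0;
    fprintf(stderr, "%s = %#llx looks shifted by 32 bits; candidate = %#llx\n",
            name, (unsigned long long)v, (unsigned long long)unshift32(v));
    return 1;
}

/* Hypothetical macro: abort as soon as a suspicious pointer is seen. */
#define CHECK_PTR(p) assert(!report_if_shifted(#p, (p)))
```

For the value above, unshift32(0x2f3ae5800000000) gives 0x2f3ae58, the
address that gdb dereferences successfully. A CHECK_PTR(dbc) placed
just before the failing comparison might catch the bad value closer to
where it is produced, assuming the corruption happens before that point.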

Sometimes this part of the code executes correctly and the program
fails elsewhere, often in a way that again looks like an address
shifted left by 32 bits.

Any ideas?