I am running a multithreaded test program that calls the Berkeley DB library. The same test runs on many platforms. Recently I have been running it under CentOS on a hyperthreaded Intel x86_64 machine, and I get a variety of failures, most of which appear to be addressing errors. For instance:
```
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1178638688 (LWP 7957)]
0x0000002a956225af in __db_cursor_int (dbp=0x2f2f388, txn=0x0, dbtype=DB_BTREE,
    root=0, is_opd=0, lockerid=0, dbcp=0x46408c20) at ../dist/../db/db_am.c:68
68          if (dbtype == dbc->dbtype) {
(gdb) p dbc
$6 = (DBC *) 0x2f3ae5800000000
```
Note the eight trailing zeros: the value is 0x2f3ae58 shifted left by 32 bits. If I shift it back down and print the structure at that address:
```
(gdb) p *(DBC*) 0x2f3ae58
$7 = {dbp = 0x2f2f388, txn = 0x0, links = {tqe_next = 0x2f47ef8, tqe_prev = 0x2f3d4c8},
  rskey = 0x2f3ae90, rkey = 0x2f3aeb0, rdata = 0x2f3aed0,
  my_rskey = {data = 0x0, size = 0, ulen = 0, dlen = 0, doff = 0, flags = 0},
  my_rkey = {data = 0x0, size = 0, ulen = 0, dlen = 0, doff = 0, flags = 0},
  my_rdata = {data = 0x0, size = 0, ulen = 0, dlen = 0, doff = 0, flags = 0},
  lref = 0x2f2a290, locker = 19, lock_dbt = {
    data = 0x2f3af20, size = 28, ulen = 0, dlen = 0, doff = 0, flags = 0},
  lock = {pgno = 925, fileid = "X@\203\000\000�\000\000�\224E\205�\036\000\000\000 \000\000", type = 3},
  mylock = {off = 0, ndx = 0, gen = 0, mode = DB_LOCK_NG}, cl_id = 0, dbtype = DB_BTREE,
  internal = 0x2f39228, c_close = 0x2a956388a3 <__db_c_close_pp>,
  c_count = 0x2a95638a63 <__db_c_count_pp>, c_del = 0x2a95638ba7 <__db_c_del_pp>,
  c_dup = 0x2a95638ea3 <__db_c_dup_pp>, c_get = 0x2a95638fce <__db_c_get_pp>,
  c_pget = 0x2a95639746 <__db_c_pget_pp>, c_put = 0x2a956399f6 <__db_c_put_pp>,
  c_am_bulk = 0x2a9557fdd2 <__bam_bulk>, c_am_close = 0x2a9557dcc4 <__bam_c_close>,
  c_am_del = 0x2a9557ebd3 <__bam_c_del>, c_am_destroy = 0x2a9557e50e <__bam_c_destroy>,
  c_am_get = 0x2a9557f10a <__bam_c_get>, c_am_put = 0x2a95582040 <__bam_c_put>,
  c_am_writelock = 0x2a9558314f <__bam_c_writelock>, flags = 288}
```
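This is the correct address: its dbp field matches the dbp argument (0x2f2f388) of the crashing call, and the method pointers resolve to the expected Berkeley DB cursor functions.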
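As a sanity check on the arithmetic, here is a small C sketch that tests whether a suspect pointer is a known-good address shifted left by 32 bits, using the two values from the gdb output. The helper names `is_shifted_left_32` and `shift_back_32` are made up for illustration; they are not part of Berkeley DB.

```c
#include <stdint.h>

/* Hypothetical helper: returns nonzero if `suspect` is exactly `good`
 * shifted left by 32 bits, i.e. the high half holds the real address
 * and the low 32 bits are all zero. */
static int is_shifted_left_32(uint64_t suspect, uint64_t good)
{
    return (suspect & UINT64_C(0xffffffff)) == 0 && (suspect >> 32) == good;
}

/* Hypothetical helper: moves the high 32 bits back down to recover the
 * candidate address. */
static uint64_t shift_back_32(uint64_t suspect)
{
    return suspect >> 32;
}
```

With the values above, `is_shifted_left_32(0x2f3ae5800000000, 0x2f3ae58)` returns 1 and `shift_back_32(0x2f3ae5800000000)` returns 0x2f3ae58.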
Sometimes this part of the code executes correctly and the program fails somewhere else instead, but the failures often look like an address that has been shifted by 32 bits.
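For reference, one way a value with this shape can arise on a little-endian machine such as x86_64 is an 8-byte load that starts 4 bytes before the pointer it was meant to read, when those 4 preceding bytes happen to be zero. The minimal sketch below only simulates that byte layout in a buffer; `read_at_offset` is a hypothetical helper, and this illustrates the bit pattern rather than diagnosing what Berkeley DB or the test program is actually doing. The results assume a little-endian host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, illustration only. Simulates memory holding a zero
 * 32-bit word at bytes 0..3 followed by a 64-bit pointer value at bytes 4..11,
 * then performs an 8-byte load starting at `offset` (must be 0..8).
 * Uses the host byte order, so the results assume little-endian. */
static uint64_t read_at_offset(uint64_t ptr_value, size_t offset)
{
    unsigned char buf[16] = {0};                    /* bytes 0..3 stay zero */
    memcpy(buf + 4, &ptr_value, sizeof ptr_value);  /* store pointer at byte 4 */

    uint64_t loaded;
    memcpy(&loaded, buf + offset, sizeof loaded);   /* 8-byte load at offset */
    return loaded;
}
```

On x86_64, `read_at_offset(0x2f3ae58, 4)` returns 0x2f3ae58 (the load that lines up with the stored pointer), while `read_at_offset(0x2f3ae58, 0)` returns 0x2f3ae5800000000, the same shape as the `dbc` value gdb printed.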
Any ideas?