[CentOS] SATA tape drive on 4.7

Mon Mar 23 03:04:46 UTC 2009
C Linus Hicks <linush at verizon.net>

I have gotten a hard system lockup several times using star on my system
with a Quantum DLT-V4 SATA tape drive. I am including what got recorded
in /var/log/messages from the most recent event showing stack trace.

Output from uname -a:

Linux lh10 2.6.9-78.0.13.ELsmp #1 SMP Wed Jan 14 15:55:36 EST 2009
x86_64 x86_64 x86_64 GNU/Linux

Mar 22 22:02:59 lh10 kernel: Bad page state at free_hot_cold_page (in process 'star', page 00000102382b1ec8)
Mar 22 22:02:59 lh10 kernel: flags:0x0100000c mapping:00000102146a4240 mapcount:2 count:0
Mar 22 22:02:59 lh10 kernel: Backtrace:
Mar 22 22:02:59 lh10 kernel: 
Mar 22 22:02:59 lh10 kernel: Call Trace:<ffffffff8015f0cf>{bad_page+112} <ffffffff8015f7b1>{free_hot_cold_page+130} 
Mar 22 22:02:59 lh10 kernel:        <ffffffffa00d17b8>{:st:sgl_unmap_user_pages+67} <ffffffffa00d17df>{:st:release_buffering+27} 
Mar 22 22:02:59 lh10 kernel:        <ffffffffa00d6196>{:st:st_write+2692} <ffffffff80194b39>{inode_update_time+159} 
Mar 22 22:02:59 lh10 kernel:        <ffffffff8017beaa>{vfs_write+207} <ffffffff8017bf92>{sys_write+69} 
Mar 22 22:02:59 lh10 kernel:        <ffffffff801102f6>{system_call+126} 
Mar 22 22:02:59 lh10 kernel: Trying to fix it up, but a reboot is needed
Mar 22 22:02:59 lh10 kernel: Bad page state at free_hot_cold_page (in process 'star', page 000001023eb70ed8)
Mar 22 22:02:59 lh10 kernel: flags:0x0100000c mapping:00000102146a4240 mapcount:2 count:0
Mar 22 22:02:59 lh10 kernel: Backtrace:
Mar 22 22:02:59 lh10 kernel: 
Mar 22 22:02:59 lh10 kernel: Call Trace:<ffffffff8015f0cf>{bad_page+112} <ffffffff8015f7b1>{free_hot_cold_page+130} 
Mar 22 22:02:59 lh10 kernel:        <ffffffffa00d17b8>{:st:sgl_unmap_user_pages+67} <ffffffffa00d17df>{:st:release_buffering+27} 
Mar 22 22:02:59 lh10 kernel:        <ffffffffa00d6196>{:st:st_write+2692} <ffffffff80194b39>{inode_update_time+159} 
Mar 22 22:02:59 lh10 kernel:        <ffffffff8017beaa>{vfs_write+207} <ffffffff8017bf92>{sys_write+69} 
Mar 22 22:02:59 lh10 kernel:        <ffffffff801102f6>{system_call+126} 
Mar 22 22:02:59 lh10 kernel: Trying to fix it up, but a reboot is needed
Mar 22 22:02:59 lh10 kernel: Bad page state at free_hot_cold_page (in process 'star', page 000001023d3bc220)
Mar 22 22:02:59 lh10 kernel: flags:0x0100000c mapping:00000102146a4240 mapcount:2 count:0
Mar 22 22:02:59 lh10 kernel: Backtrace:
Mar 22 22:02:59 lh10 kernel: 
Mar 22 22:02:59 lh10 kernel: Call Trace:<ffffffff8015f0cf>{bad_page+112} <ffffffff8015f7b1>{free_hot_cold_page+130} 
Mar 22 22:02:59 lh10 kernel:        <ffffffffa00d17b8>{:st:sgl_unmap_user_pages+67} <ffffffffa00d17df>{:st:release_buffering+27} 
Mar 22 22:02:59 lh10 kernel:        <ffffffffa00d6196>{:st:st_write+2692} <ffffffff80194b39>{inode_update_time+159} 
Mar 22 22:02:59 lh10 kernel:        <ffffffff8017beaa>{vfs_write+207} <ffffffff8017bf92>{sys_write+69} 
Mar 22 22:02:59 lh10 kernel:        <ffffffff801102f6>{system_call+126} 
Mar 22 22:02:59 lh10 kernel: Trying to fix it up, but a reboot is needed
Mar 22 22:02:59 lh10 kernel: Bad page state at free_hot_cold_page (in process 'star', page 0000010238c68e80)
Mar 22 22:02:59 lh10 kernel: flags:0x0100000c mapping:00000102146a4240 mapcount:2 count:0
Mar 22 22:03:00 lh10 kernel: Backtrace:
Mar 22 22:03:00 lh10 kernel: 
Mar 22 22:03:00 lh10 kernel: Call Trace:<ffffffff8015f0cf>{bad_page+112} <ffffffff8015f7b1>{free_hot_cold_page+130} 
Mar 22 22:03:00 lh10 kernel:        <ffffffffa00d17b8>{:st:sgl_unmap_user_pages+67} <ffffffffa00d17df>{:st:release_buffering+27} 
Mar 22 22:03:00 lh10 kernel:        <ffffffffa00d6196>{:st:st_write+2692} <ffffffff80194b39>{inode_update_time+159} 
Mar 22 22:03:00 lh10 kernel:        <ffffffff8017beaa>{vfs_write+207} <ffffffff8017bf92>{sys_write+69} 
Mar 22 22:03:00 lh10 kernel:        <ffffffff801102f6>{system_call+126} 
Mar 22 22:03:00 lh10 kernel: Trying to fix it up, but a reboot is needed
Mar 22 22:03:00 lh10 kernel: Bad page state at free_hot_cold_page (in process 'star', page 000001023c14d570)
Mar 22 22:03:00 lh10 kernel: flags:0x0100000c mapping:00000102146a4240 mapcount:2 count:0
Mar 22 22:03:00 lh10 kernel: Backtrace:
Mar 22 22:03:00 lh10 kernel: 
Mar 22 22:03:00 lh10 kernel: Call Trace:<ffffffff8015f0cf>{bad_page+112} <ffffffff8015f7b1>{free_hot_cold_page+130} 
Mar 22 22:03:00 lh10 kernel:        <ffffffffa00d17b8>{:st:sgl_unmap_user_pages+67} <ffffffffa00d17df>{:st:release_buffering+27} 
Mar 22 22:03:00 lh10 kernel:        <ffffffffa00d6196>{:st:st_write+2692} <ffffffff80194b39>{inode_update_time+159} 
Mar 22 22:03:00 lh10 kernel:        <ffffffff8017beaa>{vfs_write+207} <ffffffff8017bf92>{sys_write+69} 
Mar 22 22:03:00 lh10 kernel:        <ffffffff801102f6>{system_call+126} 
Mar 22 22:03:00 lh10 kernel: Trying to fix it up, but a reboot is needed
Mar 22 22:03:00 lh10 kernel: Bad page state at free_hot_cold_page (in process 'X', page 000001023c14d570)
Mar 22 22:03:00 lh10 kernel: flags:0x01020008 mapping:00000101ec811360 mapcount:0 count:0
Mar 22 22:03:00 lh10 kernel: Backtrace:
Mar 22 22:03:00 lh10 kernel: 
Mar 22 22:03:00 lh10 kernel: Call Trace:<ffffffff8015f0cf>{bad_page+112} <ffffffff8015f7b1>{free_hot_cold_page+130} 
Mar 22 22:03:00 lh10 kernel:        <ffffffff8015f86a>{__pagevec_free+39} <ffffffff80165a5d>{release_pages+366} 
Mar 22 22:03:01 lh10 kernel:        <ffffffff8016d921>{__vma_link+66} <ffffffff8016dd53>{vma_adjust+815} 
Mar 22 22:03:01 lh10 kernel:        <ffffffff80165e4b>{__pagevec_mark_accessed+394} <ffffffff80166091>{lru_add_drain+91} 
Mar 22 22:03:01 lh10 kernel:        <ffffffff8016e971>{do_munmap+355} <ffffffff8016f55e>{sys_brk+136} 
Mar 22 22:03:01 lh10 kernel:        <ffffffff801102f6>{system_call+126} 
Mar 22 22:03:01 lh10 kernel: Trying to fix it up, but a reboot is needed

star had been running about 20 to 30 minutes at the time of the crash. I
have seen two or three crashes previously all denoting "Bad page state
at free_hot_cold_page."

This appears to have been reported as a bug in 2.6.12:

https://lists.linux-foundation.org/pipermail/bugme-new/2005-August/012862.html

It was updated March 10, 2008 as rejected unreproducible. I have seen
the problem three times in three days now without trying to reproduce
it. Can I get any help on getting a fix?

Linus