On Fri, Feb 18, 2011 at 10:53 PM, Larry Vaden <vaden at texoma.net> wrote:
> On Thu, Feb 17, 2011 at 6:49 PM, Nico Kadel-Garcia <nkadel at gmail.com> wrote:
>>
>> Also, people like me might say something different from a work email
>> account than from a personal account.
>>
>> I worry about using the email address as the key, because it makes it
>> very difficult to *preserve* a user's history across address changes.
>> I like my GMail account, but I've had old Comcast or university
>> accounts from which I submitted bugs to RHEL years ago. If I'm not
>> mistaken, index management by numerical keys can often be
>> significantly faster than by text keys.
>
> echo "Nico Kadel-Garcia G MM/DD/YYYY Born_in_City, Nation
> Mother's_Maiden_Name" | md5sum -t
> 352d6060b85ef831453bd71cfe22e9a9 -
>
> works very well for ISP subscriber account numbers.

First, md5sum merely makes collisions unlikely, not impossible. You'd be
wise to check for collisions if you're being careful, even though
they're unlikely, and once you've done that, why not simply use
incrementing numbers?

And it wouldn't have worked before I got married. My "maiden name" was
"Garcia-Otero". I have cousins with that name, whose mothers' maiden
name was "Otero-[whatever]", and I can think of one whom this would
have matched. He died some time back, but such schemes often don't
scale well and break down under stress.

Worse, my name changed when I got married. That breaks the indexing
algorithm.
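
To make the name-change problem concrete, here is a minimal sketch in
Python. Everything in it is an assumption for illustration: the
attribute strings are invented placeholders, and `md5_key`,
`add_hashed`, and `Registry` are hypothetical helpers, not anyone's
actual account system. It shows that a key derived from personal
attributes moves when an attribute changes, while an incrementing
number stays put.

```python
import hashlib
import itertools


def md5_key(identity: str) -> str:
    """Derive a key from personal attributes, like the quoted md5sum idea."""
    return hashlib.md5(identity.encode("utf-8")).hexdigest()


def add_hashed(table: dict, identity: str) -> str:
    """Insert under the derived key, refusing to overwrite a different record."""
    key = md5_key(identity)
    if key in table and table[key] != identity:
        raise ValueError("hash collision for key " + key)
    table[key] = identity
    return key


class Registry:
    """Hypothetical account store keyed by an incrementing integer."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._records = {}

    def add(self, identity: str) -> int:
        account_id = next(self._next_id)
        self._records[account_id] = identity
        return account_id

    def update(self, account_id: int, identity: str) -> None:
        # The key is independent of the data, so it survives any change.
        self._records[account_id] = identity

    def get(self, account_id: int) -> str:
        return self._records[account_id]


# Invented placeholder attributes: same person before and after a name change.
before = "Pat Example-Smith X MM/DD/YYYY Some_City, Nation Smith"
after = "Pat Other-Example X MM/DD/YYYY Some_City, Nation Smith"

hashed = {}
old_key = add_hashed(hashed, before)
new_key = add_hashed(hashed, after)
hash_key_changed = old_key != new_key  # history is now split across two keys

registry = Registry()
account_id = registry.add(before)
registry.update(account_id, after)  # same account number, updated attributes
```

The collision check in `add_hashed` is the extra work the derived key
demands anyway, which is the argument for simply handing out the next
number instead.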