On 9/16/11, m.roth@5-cent.us wrote:
> I've done a lot of what we used to call embedded SQL, and when I did do a join, it was *not* an explicit join. I've also used a right or left join once? twice? ever? But then, I carefully design and code my queries.
So it's more like a series of "select a,b,c from x where d=y", then "select a2,b2,c2 from x2 where z=a"? I'm just curious, not that I think it's wrong; I'm actually leaning towards this myself. There have been many occasions where I found that breaking up complex queries and doing the filtering in my code was faster than trusting the dbms to optimize the query.
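For reference, the single-join version of that pair, reusing your table and column names with a placeholder value for y, would be something like:

    select x2.a2, x2.b2, x2.c2
      from x
      join x2 on x2.z = x.a
     where x.d = 'y';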
> One place I worked, someone else would run a query, and it would bring a server to its knees. A co-worker and I looked at it at one point, and it was a nightmare of joins, multiple references, etc.
I've had that kind of experience before: a nightmare to figure out exactly what the original coder was trying to do, and performance hell on top of it.
> But then, third normal form is, in general, idiotic except in the design phase. Once you've decided on the individual data items, you collect them into records (oh, sorry, I'll have to do penance for not using the correct theological term, tuples).
lol, despite my "formal" education, I never got used to calling them "tuples". It just sounds too much like a nonsense word to me, and it confuses the hell out of most people compared to "records" and "rows".
> One table for one major set of info, and a key or two shared across several. The classic example is an entire year's monthly payments for one customer on *one* record, not 12 records, as it would be in third normal form.
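Just to make sure I'm picturing it right, the two layouts would be something like this (table and column names made up for illustration):

    -- third normal form: one row per customer per month
    create table payment (
        customer_id integer not null,
        pay_month   integer not null,  -- 1..12
        amount      decimal(10,2),
        primary key (customer_id, pay_month)
    );

    -- flattened: the whole year on one row
    create table payment_year (
        customer_id integer primary key,
        m01 decimal(10,2), m02 decimal(10,2), m03 decimal(10,2),
        m04 decimal(10,2), m05 decimal(10,2), m06 decimal(10,2),
        m07 decimal(10,2), m08 decimal(10,2), m09 decimal(10,2),
        m10 decimal(10,2), m11 decimal(10,2), m12 decimal(10,2)
    );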
I'm actually leaning towards a highly normalized schema, but instead of doing joins in queries, I'd do them in my application code. I haven't formally tested or benchmarked anything, but it seems that having the dbms return the 10 matching rows from a 1-million-row table of, say, 100-byte rows, and then fetching the 10 corresponding records out of another 1-million-row table of 1 KB rows, ought to be a lot faster than letting the dbms attempt anything like a 1 million x 1 million row product just to pull those same 10 rows. Unless the dbms' internal optimization logic works every time, that is. Maybe somebody with a better understanding of mysql/postgresql innards can shed some light on this.
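Concretely, I mean something like this, with narrow_t standing in for the 100-byte-row table, wide_t for the 1 KB one, and made-up column names and key values:

    -- step 1: pull the ~10 matching keys from the narrow table
    select id from narrow_t where flag = 'y';

    -- step 2: issued from application code, plugging in the ids
    -- returned by step 1
    select * from wide_t where narrow_id in (101, 205, 317);

versus the single explicit join:

    select w.*
      from narrow_t n
      join wide_t w on w.narrow_id = n.id
     where n.flag = 'y';

My understanding is that neither mysql nor postgresql would literally materialize the full product for an equi-join like that; with an index on wide_t.narrow_id the planner should end up doing roughly what step 2 does by hand. Whether it picks the right plan every time is exactly the part I'm unsure about.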