On Thu, 2008-11-06 at 16:32 -0500, David Ross wrote:
<snip>
There are many factors that impact database I/O performance. You are comparing two probably dissimilar I/O paths. Configuration differences (disk, controller, connection, kernel parameters) may be part of the problem.
Physical vs. logical sequence, i.e. disk head movement, is particularly significant for sequential access. Throughput increases of 200% to 500% are not uncommon after packing (dump/restore) of a large table.
The database translates and, in theory, optimizes SQL statements, generating query plans. These should be examined to determine whether they are doing what is expected.
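As a small illustration of what examining a plan can tell you (this sketch uses SQLite purely so it is self-contained; on the actual servers you would use Oracle's EXPLAIN PLAN or MySQL's EXPLAIN, and the table/index names here are made up for the demo):

```python
import sqlite3

# Self-contained demo database; 'orders' and 'idx_customer' are
# hypothetical names, not from the systems being discussed.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
con.executemany("INSERT INTO orders (customer, total) VALUES (?, ?)",
                [("cust%d" % (i % 100), float(i)) for i in range(1000)])

# Without an index on 'customer', the plan shows a full table scan.
plan_before = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = 'cust7'").fetchall()
print(plan_before[0][3])   # detail column mentions SCAN

# After adding an index, the same query switches to an index search.
con.execute("CREATE INDEX idx_customer ON orders (customer)")
plan_after = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = 'cust7'").fetchall()
print(plan_after[0][3])    # detail column mentions USING INDEX
```

The point is not the SQLite specifics but the habit: if a query you expect to be indexed shows a full scan in its plan, you have found a concrete thing to fix.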
That's happening with two servers. One is running Oracle, the other MySQL. During the speed tests the database load is very light (less than 1MB/s of reads and writes), and during the tests the NetApp was serving only these two servers.
What could be causing such a slowdown and the overall slow read performance? Writes are fast (>100MB/s).
Database performance data is another good source of diagnostic info. It can help pinpoint query plan problems.
This article may help you put all the pieces together. It is one of the best I have found:
http://www.miracleas.com/BAARF/oow2000_same.pdf
The author is Juan Loaiza of Oracle.
Hope some of this helps.
I don't know if what I'm about to add is too dated. But here goes. I'm sure much still applies because the basic issues and concepts in play remain unchanged.
What I'm unsure of is if any of this might be related to the problem seen.
Back in the early days of SQL databases, a lot of folks had no understanding of the logical/physical interactions. I helped many folks improve performance (during my contracting/consulting days) just by examining these two components in conjunction with access patterns.
The critical components were the most frequent access/update patterns versus the physical organization of the primary table(s) and their _primary_ indexes, or hashes. Do DBMs still use those things? Regardless, even with hash tables, these relationships are still important. Also, selection of a proper hashing algorithm was important.
Further, keeping the underlying DBs' physical file organization in an optimal physical sequence for the "critical" access profiles was often overlooked. As updates are done, especially adds/deletes of entries, the physical organization quickly deteriorates with regard to the "critical" access profile(s). It used to be that "holes" would be left in the primary data space, resulting in constantly growing files. IIRC, most major DBMs have addressed this by tracking and re-using these areas. However, that tends to further "corrode" the physical sequence away from the desired one. Over time, the effect may be very noticeable.
On a very active, heavily loaded DB, this can occur quite quickly. The immediate effects could be ameliorated a bit by having sufficient "overflow" areas - "empty spaces" at the end that received new records when using indexes or hash tables. During reorganization the overflow contents would be merged into the appropriate areas and the "overflow" was again pristine.
Periodic "reorganization", a.k.a. "reload", can be used to keep performance within acceptable limits.
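Here is a tiny sketch of the holes-then-reorganize effect. It uses SQLite, with VACUUM standing in for the dump/restore or reload of a bigger DBMS - the mechanism differs, but the "file keeps its holes until repacked" behavior is the same idea:

```python
import os
import sqlite3
import tempfile

# Throwaway on-disk database; all names here are invented for the demo.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
con.executemany("INSERT INTO t (payload) VALUES (?)",
                [("x" * 500,) for _ in range(5000)])
con.commit()
size_full = os.path.getsize(path)

# Delete most rows: the pages they occupied become "holes" on the
# free list, so the file does NOT shrink.
con.execute("DELETE FROM t WHERE id % 10 != 0")
con.commit()
size_after_delete = os.path.getsize(path)

# VACUUM rewrites the database packed into physical order,
# reclaiming the holes - the file shrinks back down.
con.execute("VACUUM")
size_after_vacuum = os.path.getsize(path)

print(size_full, size_after_delete, size_after_vacuum)
```

After the deletes the file is the same size as before (the space is only marked reusable), and only the reorganize step actually repacks it - which is also when physical sequence is restored for sequential access.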
If the problem system has not been properly organized for the activity profiles encountered, or if it has had lots of updates without being "reorganized", that may be the cause of the problem being seen.
In later implementations, these ill effects are somewhat reduced by making transaction queues that attempt to optimize the sequence of activities. However, if not running on a "raw" partition, where the DBM knows the physical location of items, this may not be as effective as possible.
Further, with the advent of these added "in memory" operations, there is a risk of "thrashing" the VM. With heavy use of buffering, caching, etc. even the fantastic speed of today's equipment may suffer enough slowdown to be noticeable, apart from the physical disk activities. This is exacerbated by problems with the physical organization issues mentioned above.
I can't say if any of this is your problem or not. But barring discovery of any system/physical problems, I suspect this might be useful.
On a last note, I'll give an example of potential gains. Keep in mind this was long ago and far away (in my mind) on much slower equipment. The application was written in C (a fair amount by myself) and so I know the code was fast. Speed has always been my thing. A fellow with relatively little experience was in charge of the database(s).
My inquiry application, fairly complex but one I knew to be code-efficient, was taking over two minutes (about 2:30 IIRC) to fill a screen (a text screen at that). The other _much_ simpler applications ran much faster. But those applications were "peripheral" to the main task, for which I was responsible.
I entered a discussion with the database person, educated him a bit on what I thought was happening, and suggested some changes based on the items I discussed above. He made the changes and we ran some tests.
All the "peripheral" applications ran just as fast or faster.
Over the next several days of hitting it every way we could think of, as hard as we could, the _maximum_ wait was 9 seconds. The median came in at 6-7 seconds.
I don't know what capabilities are provided to administrators these days, but if access to tools that give sufficient information and control is available, maybe there is a (partial?) solution hidden in there?
david
<snip sig stuff>
HTH