[CentOS] Painfully slow NetApp with database

Thu Nov 6 23:15:58 UTC 2008
William L. Maltby <CentOS4Bill at triad.rr.com>

On Thu, 2008-11-06 at 16:32 -0500, David Ross wrote:
> <snip>

> There are many factors which impact database i/o performance. You are comparing two probably dissimilar i/o paths. Configuration differences (disk, controller, connection, kernel parameters) may be part of the problem.
> 
> Physical vs. logical sequence, i.e. disk head movement, is particularly significant for sequential access. Throughput increases of 200% to 500% are not uncommon after packing (dump/restore) of a large table.
> 
> The database translates and theoretically optimizes sql statements, generating query plans. These should be examined to determine if they are doing what is expected.
> 
> > That's happening with two servers. One is running Oracle another -
> > MySQL. During speed tests database load is very light (less than 1MB/s
> > of reads and writes). During the tests NetApp was serving only these
> > two servers.
> > 
> >   What could be causing such slowdown and overall slow read
> > performance? Writes are fast >100MB/s.
> > 
> 
> Database performance data is another good source of diagnostic info. It can help pinpoint query plan problems.
> 
> This article may help you put all the pieces together. It is one of the best I have found:
> 
> http://www.miracleas.com/BAARF/oow2000_same.pdf
> 
> The author is Juan Loaitza from Oracle.
> 
> 
> Hope some of this helps.
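
As a quick, hands-on illustration of the query-plan advice above (not
from the original thread -- the table and index names here are invented,
and this uses Python's bundled sqlite3 rather than Oracle or MySQL):

```python
import sqlite3

# Hypothetical table, just to show what a query plan looks like.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, val TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(i, "x") for i in range(100)])

# Without an index, the optimizer can only do a full table scan.
for row in con.execute("EXPLAIN QUERY PLAN SELECT val FROM t WHERE id = 42"):
    print(row[-1])        # reports a SCAN of t

# With an index, the plan changes to an indexed SEARCH.
con.execute("CREATE INDEX t_id ON t (id)")
for row in con.execute("EXPLAIN QUERY PLAN SELECT val FROM t WHERE id = 42"):
    print(row[-1])        # reports a SEARCH using index t_id
```

Oracle and MySQL have their own equivalents (EXPLAIN PLAN / EXPLAIN),
but the idea is the same: verify the plan is doing what you expect.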

I don't know if what I'm about to add is too dated, but here goes. I'm
sure much of it still applies, because the basic issues and concepts in
play remain unchanged.

What I'm unsure of is whether any of this is related to the problem
being seen.

Back in the early days of SQL databases, a lot of folks had no
understanding of the logical/physical interactions. I helped many folks
improve performance (during my contracting/consulting days) just by
examining these two components in conjunction with access patterns.

The critical components were the most frequent access/update patterns
versus the physical organization of the primary table(s) and their
_primary_ indexes, or hashes. Do DBMs still use those things?
Regardless, even with hash tables, these relationships are still
important, as was the selection of a proper hashing algorithm.
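
To make the hashing point concrete, here is a small Python sketch (all
numbers invented for illustration): keys issued in even steps collapse
into a handful of long chains when the bucket count shares a factor with
the step, while a prime bucket count spreads them evenly.

```python
def bucket_lengths(keys, nbuckets, hash_fn):
    """Count how many records land in each hash bucket."""
    buckets = [0] * nbuckets
    for k in keys:
        buckets[hash_fn(k) % nbuckets] += 1
    return buckets

keys = range(0, 10000, 10)      # e.g. account numbers issued in steps of 10

naive = bucket_lengths(keys, 100, lambda k: k)    # bucket count shares factor 10
spread = bucket_lengths(keys, 101, lambda k: k)   # prime bucket count

print(max(naive))     # 100 -- all 1000 records crowd into 10 of the 100 buckets
print(max(spread))    # 10  -- the prime bucket count keeps every chain short
```

Long chains mean extra reads per lookup, which is exactly the kind of
quiet slowdown a poor hashing choice produced.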

Further, keeping the physical sequence of the underlying DB files
optimal for the "critical" access profiles was often overlooked. As
updates are done, especially adds/deletes of entries, the physical
organization quickly deteriorates with regard to the "critical"
access profile(s). It used to be that "holes" would be left in the
primary data space, resulting in constantly growing files. IIRC, most
major DBMs have addressed this by tracking and re-using these areas.
However, that tends to further "corrode" the physical sequence away
from the one desired. Over time, it may be very noticeable.
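
A toy Python simulation of that "corrosion" (slots stand in for
fixed-size record positions in the data file; the numbers are invented,
not from any real DBM):

```python
import random

def in_sequence_fraction(slots):
    """Fraction of physically adjacent records that are still in key order."""
    ok = sum(1 for a, b in zip(slots, slots[1:]) if a < b)
    return ok / (len(slots) - 1)

random.seed(1)
slots = list(range(1000))              # freshly loaded: physical == logical
print(in_sequence_fraction(slots))     # 1.0 -- perfectly packed

for new_key in range(1000, 3000):      # heavy add/delete traffic
    hole = random.randrange(len(slots))    # a delete leaves a hole somewhere...
    slots[hole] = new_key                  # ...which is re-used, out of key order

print(in_sequence_fraction(slots))     # well below 1.0 -- sequence corroded
```

Once the fraction drops, a "sequential" read of the critical access path
turns into scattered head movement across the file.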

On a very active heavily loaded DB, this can occur quite quickly. The
immediate effects could be ameliorated a bit by having sufficient
"overflow" areas - "empty spaces" at the end that received new records
when using indexes or hash tables. During reorganization the overflow
contents would be merged into the appropriate areas and the "overflow"
was again pristine.
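
Here is a minimal Python sketch of that overflow scheme (lists standing
in for the packed main area and the empty space at the end; not any
particular DBM's implementation):

```python
main = [10, 20, 30, 40, 50]      # packed main area, in key sequence
overflow = []                    # "empty space" at the end of the file

def insert(key):
    overflow.append(key)         # cheap: no shuffling of the main area

def lookup(key):
    return key in main or key in overflow   # overflow must be searched too

def reorganize():
    """The periodic dump/reload: merge overflow back into key sequence."""
    global main, overflow
    main = sorted(main + overflow)
    overflow = []

insert(35)
insert(5)
print(lookup(35))       # True -- found, but via the overflow area
reorganize()
print(main)             # [5, 10, 20, 30, 35, 40, 50] -- packed again
print(overflow)         # [] -- the overflow is again pristine
```

Lookups stay correct between reorganizations; they just get slower as
the overflow grows, which is why the reload was done periodically.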

Periodic "reorganization", a.k.a. "reload", can be used to keep
performance within acceptable limits.

If the problem system has not been properly organized for the activity
profiles encountered, or if it has had lots of updates without being
"reorganized", that may be the cause of the problem being seen.

In later implementations, these ill effects are somewhat reduced by
making transaction queues that attempt to optimize the sequence of
activities. However, if not running on a "raw" partition, where the DBM
knows the physical location of items, this may not be as effective as
possible.

Further, with the advent of these added "in memory" operations, there is
a risk of "thrashing" the VM. With heavy use of buffering, caching,
etc., even the fantastic speed of today's equipment may suffer enough
slowdown to be noticeable, apart from the physical disk activities. This is
exacerbated by problems with the physical organization issues mentioned
above.

I can't say if any of this is your problem or not. But barring discovery
of any system/physical problems, I suspect this might be useful.

On a last note, I'll give an example of potential gains. Keep in mind
this was long ago and far away (in my mind) on much slower equipment.
The application was written in C (a fair amount by myself) and so I know
the code was fast. Speed has always been my thing. A fellow with
relatively little experience was in charge of the database(s).

My inquiry application, fairly complex but I knew to be code-efficient,
was taking over two minutes (about 2:30 IIRC) to fill a screen
(a text screen at that). The other _much_ simpler applications ran much
faster. But those applications were "peripheral" to the main task, for
which I was responsible.

I entered discussion with the database person, educated him a bit on
what I thought was happening, and suggested some changes based on those
items I discussed above. He made the changes and we ran some tests.

All the "peripheral" applications ran just as fast or faster.

Over the next several days of hitting it every way we could think of, as
hard as we could, the _maximum_ wait was 9 seconds. The median came in
at 6-7 seconds.

I don't know what capabilities are provided to administrators these
days, but if tools that give sufficient information and control are
available, maybe there is a (partial?) solution hidden in there?

> 
> david
> <snip sig stuff>

HTH
-- 
Bill