[CentOS] time foo

Fri Dec 1 19:52:42 UTC 2017
Mark Haney <mark.haney at neonova.net>

On 12/01/2017 02:32 PM, hw wrote:

> 
> Hm.  Foo is a program that imports data into a database from two CVS files,
> using a connection for each file and forking to import both files at once.
> 
> So this would mean that the database (running on a different server) takes
> almost two times as much as foo --- which I would consider kinda
> excruciatingly long because it's merely inserting rows into two different
> tables after they were prepared by foo and then processes some queries to
> convert the data.
> 
> The queries after importing may take like 3 or 5 minutes.  About 4.5
> million rows are being imported.
> 
> Would you consider about 20 minutes for importing as long?

There are far too many variables you've not mentioned to determine if 
that's good or bad (or very bad).  Is the connection a local connection 
(i.e., the import is done on the DB server) or a network connection?

What size are the CSV (CVS is a typo, correct?) files?  4.5M rows tells 
us nothing about how much data each row has.  It could be 4.5M rows of 
one INT field or 4.5M rows of a hundred fields.
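
As a rough way to get those numbers, something like the following Python
sketch would report file size, row count, column count and average bytes
per row.  The helper name profile_csv and the assumption of a plain
comma-delimited file are mine; I don't know what foo's input actually
looks like.

```python
import csv
import os


def profile_csv(path, delimiter=","):
    """Return basic size figures for a CSV file (hypothetical helper)."""
    size_bytes = os.path.getsize(path)
    rows = 0
    max_cols = 0
    # newline="" lets the csv module handle embedded line breaks itself.
    with open(path, newline="", encoding="utf-8", errors="replace") as fh:
        for record in csv.reader(fh, delimiter=delimiter):
            rows += 1
            max_cols = max(max_cols, len(record))
    return {
        "size_bytes": size_bytes,
        "rows": rows,
        "max_columns": max_cols,
        # Average over the whole file, including any header line.
        "avg_bytes_per_row": size_bytes / rows if rows else 0.0,
    }
```

Posting the resulting figures for both files would already answer most
of the "how much data" question.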

I'm a bit confused by the last two sentences.  Based on how I read this:

1. Foo is prepping (creating?) the tables
2. Processes queries to convert the data (to CSV?)
3. Runs more queries on those tables.

Or it could be:

1. Foo preps the tables
2. Foo imports the CSV files
3. Foo does post-processing of the tables.

The actual process isn't really clear, but I'll go on the assumption 
that Foo creates the tables with the correct fields, data types, keys 
and hopefully indices, then dumps the CSV files into the tables, then 
does post-processing.  (I've written similar scripts, so this is the 
most logical process to me.)
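
If the process does look like that, timing each phase separately would
show where the 20 minutes actually go.  A minimal sketch, assuming the
importer is (or can be wrapped in) Python; the phase names and the
sleep() calls are placeholders, not anything taken from foo:

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(phase):
    """Record the wall-clock duration of the enclosed block under `phase`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] = time.perf_counter() - start


def report(results):
    """Format each phase as seconds plus its share of the total."""
    total = sum(results.values())
    lines = []
    for phase, secs in results.items():
        share = 100.0 * secs / total if total else 0.0
        lines.append(f"{phase}: {secs:.1f}s ({share:.0f}%)")
    return lines


if __name__ == "__main__":
    with timed("prep tables"):
        time.sleep(0.01)  # stand-in for creating the tables
    with timed("import CSV"):
        time.sleep(0.02)  # stand-in for the bulk insert
    with timed("post-process"):
        time.sleep(0.01)  # stand-in for the conversion queries
    print("\n".join(report(timings)))
```

A per-phase breakdown like this is far more useful to the list than a
single total from time(1).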

If we assume network bandwidth is fine, that still leaves far too many 
server variables to know if 20m is about right or not: the amount of 
data to import, the TYPE of data, database AND server configuration, 
CPU, RAM, etc., and DB config for tunable parameters like buffer pool, 
read/write I/O threads, etc.
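
For scale only, taking the quoted "about 4.5 million rows" and "about 20
minutes" at face value, the average rate works out as below.  Whether
that rate is good or bad still depends on everything listed above, and
the 20 minutes may or may not include the post-import queries.

```python
def rows_per_second(rows, minutes):
    """Average insert rate implied by a row count and a wall-clock time."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return rows / (minutes * 60.0)


# Figures from the quoted message: about 4.5 million rows, about 20 minutes.
rate = rows_per_second(4_500_000, 20)
print(f"{rate:.0f} rows/s")  # 3750 rows/s
```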

IIRC, you posted some questions about tuning a DB server a while back.  
Would this be data going into that server, perhaps?

I'd like to offer a helpful suggestion when asking for list help.  It's 
better to provide TOO MUCH information than too little.  There's a big 
difference between 'my printer won't print' and 'my printer won't print 
because it's not feeding paper properly'.


-- 
Mark Haney
Network Engineer at NeoNova
919-460-3330 option 1
mark.haney at neonova.net
www.neonova.net