John R Pierce wrote:
On 12/1/2017 11:32 AM, hw wrote:
So this would mean that the database (running on a different server) takes almost twice as long as foo, which I would consider rather excruciatingly long, because it's merely inserting rows into two different tables after they were prepared by foo and then running some queries to convert the data.
The queries after importing may take something like 3 to 5 minutes. About 4.5 million rows are being imported.
So you're missing about 25 minutes, and maybe 5 minutes is spent on post-processing, so that's 20 minutes spent on the data insertion?
Yes. The 15 minutes actually spent in foo go to converting the fields and sending them to the server, which I think is pretty good.
Are you inserting one row at a time, or in batches? Remember that a database server commits after each transaction, which forces the data to be flushed to disk. 4.5 million separate single-row transactions, yeah, I could see that taking some time, plus that many network round trips, etc. If the DB server just has a single SATA disk, you're doing 9 million committed writes combined across the two tables? 20 minutes for 9 million inserts, that's 7500 per second.
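The throughput figure can be checked directly. This assumes, as the estimate above does, that each of the 4.5 million imported rows produces one insert into each of the two tables, and that roughly 20 of the minutes go to insertion:

```latex
N = 2 \times 4.5\times10^{6} = 9\times10^{6}\ \text{inserts}, \qquad
t = 20\ \text{min} = 1200\ \text{s}

\frac{N}{t} = \frac{9\times10^{6}}{1200\ \text{s}} = 7500\ \text{inserts/s}, \qquad
\frac{t}{N} = \frac{1}{7500}\ \text{s} \approx 133\ \mu\text{s per insert}
```

So each single-row insert has roughly 133 µs to cover a network round trip plus the server-side work. That is why cutting the number of round trips (batching) is likely to matter more than the commit rate once everything already runs inside one transaction per file.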
They are inserted one row at a time, within one transaction per CSV file. I'd have to figure out how to insert them in batches, which might be faster still. I could easily stack up 1000 rows or so and then insert them all at once, if that's possible.
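A minimal sketch of the "stack up 1000 rows and insert them at once" idea, assuming a Python client using psycopg2-style `%s` placeholders. It groups rows into batches and builds one multi-row `INSERT ... VALUES (...), (...)` statement per batch, so 4.5 million rows become about 4,500 statements instead of 4.5 million. The helper names `chunked` and `build_batch_insert`, the table `items`, and its columns are hypothetical, chosen for illustration only.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple


def chunked(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield successive lists of at most `size` rows (never an empty list)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_batch_insert(table: str, columns: Sequence[str],
                       batch: Sequence[Sequence]) -> Tuple[str, list]:
    """Build one multi-row INSERT statement plus its flat parameter list.

    `table` and `columns` are interpolated into the SQL text directly,
    so they must be trusted identifiers; only the row values are passed
    as parameters.
    """
    if not batch:
        raise ValueError("batch must not be empty")
    width = len(columns)
    row_placeholder = "(" + ", ".join(["%s"] * width) + ")"
    values_sql = ", ".join([row_placeholder] * len(batch))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values_sql}"
    params: list = []
    for row in batch:
        if len(row) != width:
            raise ValueError("row width does not match column count")
        params.extend(row)  # flatten rows in the same order as the placeholders
    return sql, params


# Hypothetical usage with psycopg2 (not executed here; `conn` and `rows`
# are assumed to exist):
# with conn:                      # one transaction per CSV file, committed on exit
#     with conn.cursor() as cur:
#         for batch in chunked(rows, 1000):
#             sql, params = build_batch_insert("items", ["id", "name"], batch)
#             cur.execute(sql, params)

if __name__ == "__main__":
    demo_rows = [(1, "a"), (2, "b"), (3, "c")]
    for b in chunked(demo_rows, 2):
        print(build_batch_insert("items", ["id", "name"], b))
```

If psycopg2 is the client, `psycopg2.extras.execute_values` already implements this multi-row VALUES pattern (with a `page_size` argument controlling the batch size), so it may be preferable to a hand-rolled helper.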