Hello listmates,
I got a rather strange situation that I can't quite make sense of. OK, I've got a very large data file to sort (hundreds of millions of lines) and I decided to use MySQL for the purpose. I inserted the stuff into a table easily enough. Then I decided to sort it and got stuck as it turned out that MySQL, unless specifically configured to do otherwise, puts temporary files in /tmp which simply was not sufficiently large. Then I changed that directory to a partition that had more space (let's call the new temp driectory /home/big-temp) and now as I am running the query aimed at sorting the data it seems like space, according to the df, is no longer being used up under / (which was there /tmp was) but is now being used up in the right partition ( /home, the large one). Yet /home/big-temp is still empty!
So how is that possible? I know there are all kinds of file locking possible depending on the circumstances, file system type, etc - but how can it be that the files, if they exist, are not even visible to ls, even "ls -a", even run by root?
Thanks in advance for any clarification.
Cheers,
Boris.
On Mon, 2012-02-13 at 16:45 -0500, Boris Epstein wrote:
Hello listmates, I got a rather strange situation that I can't quite make sense of. OK, I've got a very large data file to sort (hundreds of millions of lines) and I decided to use MySQL for the purpose. I inserted the stuff into a table easily enough. Then I decided to sort it and got stuck as it turned out that MySQL, unless specifically configured to do otherwise, puts temporary files in /tmp which simply was not sufficiently large. Then I changed that directory to a partition that had more space (let's call the new temp driectory /home/big-temp) and now as I am running the query aimed at sorting the data it seems like space, according to the df, is no longer being used up under / (which was there /tmp was) but is now being used up in the right partition ( /home, the large one). Yet /home/big-temp is still empty! So how is that possible?
Easy. It is using temporary files the *correct* way.
1. Open file 2. Unlink file 3. Use file 4. Close file
This means (a) even if the process abends the resources allocated to the file are released and (b) an external process can't see [or modify] the temporary file.
When a file is unlinked it remains 'active' until all references to the file are released - but the daemon is still holding a reference [because it is using the file].
There is a file there, but nobody, not even root, can see it.
Actually you can; if you look in /proc/{pid#}/fd ...
On Mon, Feb 13, 2012 at 4:58 PM, Adam Tauno Williams <awilliam@whitemice.org
wrote:
On Mon, 2012-02-13 at 16:45 -0500, Boris Epstein wrote:
Hello listmates, I got a rather strange situation that I can't quite make sense of. OK,
I've
got a very large data file to sort (hundreds of millions of lines) and I decided to use MySQL for the purpose. I inserted the stuff into a table easily enough. Then I decided to sort it and got stuck as it turned out that MySQL, unless specifically configured to do otherwise, puts
temporary
files in /tmp which simply was not sufficiently large. Then I changed
that
directory to a partition that had more space (let's call the new temp driectory /home/big-temp) and now as I am running the query aimed at sorting the data it seems like space, according to the df, is no longer being used up under / (which was there /tmp was) but is now being used up in the right partition ( /home, the large one). Yet /home/big-temp is
still
empty! So how is that possible?
Easy. It is using temporary files the *correct* way.
- Open file
- Unlink file
- Use file
- Close file
This means (a) even if the process abends the resources allocated to the file are released and (b) an external process can't see [or modify] the temporary file.
When a file is unlinked it remains 'active' until all references to the file are released - but the daemon is still holding a reference [because it is using the file].
There is a file there, but nobody, not even root, can see it.
Actually you can; if you look in /proc/{pid#}/fd ...
-- System & Network Administrator [ LPI & NCLA ] http://www.whitemiceconsulting.com OpenGroupware Developer http://www.opengroupware.us Adam Tauno Williams
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Adam, thanks!
I haven't thought of it this way. In fact it matches with this description:
http://dev.mysql.com/doc/refman/5.1/en/temporary-files.html
I can kind of see the advantages; the disadvantages, of course, are that a major transaction can not be resumed in case the mysqld process is stopped.
Boris.
On Mon, 2012-02-13 at 17:06 -0500, Boris Epstein wrote:
On Mon, Feb 13, 2012 at 4:58 PM, Adam Tauno Williams <awilliam@whitemice.org
wrote: On Mon, 2012-02-13 at 16:45 -0500, Boris Epstein wrote:
Hello listmates, I got a rather strange situation that I can't quite make sense of. OK,
I've
got a very large data file to sort (hundreds of millions of lines) and I decided to use MySQL for the purpose. I inserted the stuff into a table easily enough. Then I decided to sort it and got stuck as it turned out that MySQL, unless specifically configured to do otherwise, puts
temporary
files in /tmp which simply was not sufficiently large. Then I changed
that
directory to a partition that had more space (let's call the new temp driectory /home/big-temp) and now as I am running the query aimed at sorting the data it seems like space, according to the df, is no longer being used up under / (which was there /tmp was) but is now being used up in the right partition ( /home, the large one). Yet /home/big-temp is
still
empty! So how is that possible?
Easy. It is using temporary files the *correct* way.
- Open file
- Unlink file
- Use file
- Close file
This means (a) even if the process abends the resources allocated to the file are released and (b) an external process can't see [or modify] the temporary file. When a file is unlinked it remains 'active' until all references to the file are released - but the daemon is still holding a reference [because it is using the file]. There is a file there, but nobody, not even root, can see it. Actually you can; if you look in /proc/{pid#}/fd ...
I haven't thought of it this way. In fact it matches with this description: http://dev.mysql.com/doc/refman/5.1/en/temporary-files.html I can kind of see the advantages; the disadvantages, of course, are that a major transaction can not be resumed in case the mysqld process is stopped.
If the process is stopped the transaction cannot be resumed for a myriad reasons; loss of the temporary file is a trivial concern.
If the process is stopped the transaction cannot be resumed for a myriad reasons; loss of the temporary file is a trivial concern.
-- System & Network Administrator [ LPI & NCLA ] http://www.whitemiceconsulting.com OpenGroupware Developer http://www.opengroupware.us Adam Tauno Williams
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
True - although if you somehow track your progress within the transaction making it resumable could be possible.
Boris.
On Mon, Feb 13, 2012 at 4:06 PM, Boris Epstein borepstein@gmail.com wrote:
I haven't thought of it this way. In fact it matches with this description:
http://dev.mysql.com/doc/refman/5.1/en/temporary-files.html
I can kind of see the advantages; the disadvantages, of course, are that a major transaction can not be resumed in case the mysqld process is stopped.
I don't think it would make much sense to try to finish a query after a restart since at that point the temp table might not reflect the real data anyway. Have you tried adding an index on the fields in your query so it doesn't have to sort everything when you make the query? Mysql isn't too bright about optimizing 3-table joins, but with one or 2 and pre-indexed fields it shouldn't need much time or extra space.
Easy. It is using temporary files the *correct* way.
- Open file
- Unlink file
- Use file
- Close file
This means (a) even if the process abends the resources allocated to the file are released and (b) an external process can't see [or modify] the temporary file.
When a file is unlinked it remains 'active' until all references to the file are released - but the daemon is still holding a reference [because it is using the file].
There is a file there, but nobody, not even root, can see it.
Actually you can; if you look in /proc/{pid#}/fd ...
-- System & Network Administrator [ LPI & NCLA ] http://www.whitemiceconsulting.com OpenGroupware Developer http://www.opengroupware.us Adam Tauno Williams
In fact, I did look in /proc/{pid#}/fd and found the file names, thanks! Coud quite figure out the size of those invisible files - but no matter, hopefully I've got enough room.
Thanks.
Boris.
On 02/13/2012 11:31 PM, Boris Epstein wrote:
In fact, I did look in /proc/{pid#}/fd and found the file names, thanks! Coud quite figure out the size of those invisible files - but no matter, hopefully I've got enough room.
ls -l /proc/{pid#}/fd
On Mon, Feb 13, 2012 at 6:53 PM, Ljubomir Ljubojevic office@plnet.rswrote:
On 02/13/2012 11:31 PM, Boris Epstein wrote:
In fact, I did look in /proc/{pid#}/fd and found the file names, thanks! Coud quite figure out the size of those invisible files - but no matter, hopefully I've got enough room.
ls -l /proc/{pid#}/fd
--
Ljubomir Ljubojevic (Love is in the Air) PL Computers Serbia, Europe
Google is the Mother, Google is the Father, and traceroute is your trusty Spiderman... StarOS, Mikrotik and CentOS/RHEL/Linux consultant _______________________________________________ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Ljubomir,
Here we go:
[root@gala ~]# ls -l /proc/18702/fd total 0 lr-x------ 1 root root 64 Feb 13 18:58 0 -> /dev/null lrwx------ 1 root root 64 Feb 13 18:58 1 -> /var/log/mysqld.log lrwx------ 1 root root 64 Feb 13 18:58 10 -> socket:[1245288] lrwx------ 1 root root 64 Feb 13 18:58 11 -> /home/mysql/tmp/ibEvxyKc (deleted) lrwx------ 1 root root 64 Feb 13 18:58 12 -> socket:[1245289] lrwx------ 1 root root 64 Feb 13 18:58 13 -> /home/mysql/mysql_data/mysql/host.MYI lrwx------ 1 root root 64 Feb 13 18:58 14 -> /home/mysql/mysql_data/mysql/host.MYD lrwx------ 1 root root 64 Feb 13 18:58 15 -> /home/mysql/mysql_data/mysql/user.MYI lrwx------ 1 root root 64 Feb 13 18:58 16 -> /home/mysql/mysql_data/mysql/user.MYD lrwx------ 1 root root 64 Feb 13 18:58 17 -> /home/mysql/mysql_data/mysql/db.MYI lrwx------ 1 root root 64 Feb 13 18:58 18 -> /home/mysql/mysql_data/mysql/db.MYD lrwx------ 1 root root 64 Feb 13 18:58 19 -> /home/mysql/mysql_data/mysql/tables_priv.MYI lrwx------ 1 root root 64 Feb 13 18:58 2 -> /var/log/mysqld.log lrwx------ 1 root root 64 Feb 13 18:58 20 -> /home/mysql/mysql_data/mysql/tables_priv.MYD lrwx------ 1 root root 64 Feb 13 18:58 21 -> /home/mysql/mysql_data/mysql/columns_priv.MYI lrwx------ 1 root root 64 Feb 13 18:58 22 -> /home/mysql/mysql_data/mysql/columns_priv.MYD lrwx------ 1 root root 64 Feb 13 18:58 23 -> /home/mysql/mysql_data/mysql/procs_priv.MYI lrwx------ 1 root root 64 Feb 13 18:58 24 -> /home/mysql/mysql_data/mysql/procs_priv.MYD lrwx------ 1 root root 64 Feb 13 18:58 25 -> /home/mysql/mysql_data/mysql/servers.MYI lrwx------ 1 root root 64 Feb 13 18:58 26 -> /home/mysql/mysql_data/mysql/servers.MYD lrwx------ 1 root root 64 Feb 13 18:58 27 -> /home/mysql/mysql_data/mysql/event.MYI lrwx------ 1 root root 64 Feb 13 18:58 28 -> /home/mysql/mysql_data/mysql/event.MYD lrwx------ 1 root root 64 Feb 13 18:58 29 -> socket:[1245325] lrwx------ 1 root root 64 Feb 13 18:58 3 -> /home/mysql/mysql_data/ibdata1 lrwx------ 1 root root 64 Feb 13 18:58 30 -> /home/mysql/mysql_data/text_sort/local_list.MYI lrwx------ 1 root root 64 Feb 13 18:58 31 -> /home/mysql/mysql_data/text_sort/local_list.MYD lrwx------ 1 root root 64 Feb 13 18:58 32 -> /home/mysql/mysql_data/text_sort/local_list_sorted.MYI lrwx------ 1 root root 64 Feb 13 18:58 33 -> /home/mysql/mysql_data/text_sort/local_list_sorted.MYD lrwx------ 1 root root 64 Feb 13 18:58 34 -> /home/mysql/mysql_data/text_sort/tar_list.MYI lrwx------ 1 root root 64 Feb 13 18:58 35 -> /home/mysql/mysql_data/text_sort/tar_list.MYD lrwx------ 1 root root 64 Feb 13 18:58 36 -> /home/mysql/tmp/MYbUgIFT (deleted) lrwx------ 1 root root 64 Feb 13 18:58 37 -> /home/mysql/tmp/MYizt5rL (deleted) lrwx------ 1 root root 64 Feb 13 18:58 38 -> socket:[1249477] lrwx------ 1 root root 64 Feb 13 18:58 39 -> socket:[1340698] lrwx------ 1 root root 64 Feb 13 18:58 4 -> /home/mysql/tmp/ibvjZa8E (deleted) lrwx------ 1 root root 64 Feb 13 18:58 40 -> /home/mysql/mysql_data/text_sort/local_list.MYD lrwx------ 1 root root 64 Feb 13 18:58 41 -> /home/mysql/mysql_data/text_sort/local_list_sorted.MYD lrwx------ 1 root root 64 Feb 13 18:58 5 -> /home/mysql/tmp/ib08cXW2 (deleted) lrwx------ 1 root root 64 Feb 13 18:58 6 -> /home/mysql/tmp/ibtcHJLq (deleted) lrwx------ 1 root root 64 Feb 13 18:58 7 -> /home/mysql/tmp/ibE0DGCO (deleted) lrwx------ 1 root root 64 Feb 13 18:58 8 -> /home/mysql/mysql_data/ib_logfile0 lrwx------ 1 root root 64 Feb 13 18:58 9 -> /home/mysql/mysql_data/ib_logfile1 [root@gala ~]#
Now the files of interest are the ones marked deleted in /home/mysql/tmp . Now how do I tell what their size is?
Thanks.
Boris.
On Mon, 2012-02-13 at 18:59 -0500, Boris Epstein wrote:
Here we go: [root@gala ~]# ls -l /proc/18702/fd total 0
...
lrwx------ 1 root root 64 Feb 13 18:58 7 -> /home/mysql/tmp/ibE0DGCO (deleted) lrwx------ 1 root root 64 Feb 13 18:58 8 -> /home/mysql/mysql_data/ib_logfile0 lrwx------ 1 root root 64 Feb 13 18:58 9 -> /home/mysql/mysql_data/ib_logfile1 [root@gala ~]# Now the files of interest are the ones marked deleted in /home/mysql/tmp . Now how do I tell what their size is?
I wouldn't bother pursuing this angle further; I am not an MySQL admin, but it must provide performance reporting, statistics, and explain information [PostgreSQL does, as do most other database engines]. Check the MySQL docs for how you monitor the utilization of temp spaces. If it can't provide this information choose another solution.
It is best not to dig around under engines. Use the tools and let them tell you, otherwise you will build around implementation details.