I'm getting tired of converting spreadsheets that someone else updates to csv so my perl scripts can push the data into a mysql database. Is there a better way? I haven't had much luck with perl-Spreadsheet-ParseExcel (and find it odd that yum prefers the .32 version from epel over .57 from rpmforge anyway). Is the current CPAN version better? Or the equivalent java tools? Or maybe a scripted OpenOffice conversion would be possible.
Needs to deal with both xls and xlsx formats, the odd characters that are confused with quotes even after csv conversion, numbers with $'s and commas embedded, excel's date formatting nonsense, etc.
On Mon, 2010-10-18 at 17:13 -0500, Les Mikesell wrote:
I'm getting tired of converting spreadsheets that someone else updates to csv so my perl scripts can push the data into a mysql database. Is there a better way? I haven't had much luck with perl-Spreadsheet-ParseExcel (and find it odd that yum prefers the .32 version from epel over .57 from rpmforge anyway). Is the current CPAN version better? Or the equivalent java tools? Or maybe a scripted OpenOffice conversion would be possible.
Needs to deal with both xls and xlsx formats, the odd characters that are confused with quotes even after csv conversion, numbers with $'s and commas embedded, excel's date formatting nonsense, etc.
--- I think you are out of luck on that. .Net has a whole world of Office Goodies what a shame... Extract the CSV Data then do a insert into MySQL. Is that how you do it now?
John
On 10/18/2010 5:31 PM, JohnS wrote:
On Mon, 2010-10-18 at 17:13 -0500, Les Mikesell wrote:
I'm getting tired of converting spreadsheets that someone else updates to csv so my perl scripts can push the data into a mysql database. Is there a better way? I haven't had much luck with perl-Spreadsheet-ParseExcel (and find it odd that yum prefers the .32 version from epel over .57 from rpmforge anyway). Is the current CPAN version better? Or the equivalent java tools? Or maybe a scripted OpenOffice conversion would be possible.
Needs to deal with both xls and xlsx formats, the odd characters that are confused with quotes even after csv conversion, numbers with $'s and commas embedded, excel's date formatting nonsense, etc.
I think you are out of luck on that. .Net has a whole world of Office Goodies what a shame... Extract the CSV Data then do a insert into MySQL. Is that how you do it now?
Yes, someone emails an xls or xlsx, I do a 'save as' csv, but it's not a straight insert after that. I read it into perl and do some checking and conversions, depending on the data involved, then an insert or update. I expected the db to be the authoritative copy but I keep getting batches of wholesale modifications to merge in so I'd like to automate it a little more completely.
On Mon, 2010-10-18 at 17:54 -0500, Les Mikesell wrote:
On 10/18/2010 5:31 PM, JohnS wrote:
On Mon, 2010-10-18 at 17:13 -0500, Les Mikesell wrote:
I'm getting tired of converting spreadsheets that someone else updates to csv so my perl scripts can push the data into a mysql database. Is there a better way? I haven't had much luck with perl-Spreadsheet-ParseExcel (and find it odd that yum prefers the .32 version from epel over .57 from rpmforge anyway). Is the current CPAN version better? Or the equivalent java tools? Or maybe a scripted OpenOffice conversion would be possible.
Needs to deal with both xls and xlsx formats, the odd characters that are confused with quotes even after csv conversion, numbers with $'s and commas embedded, excel's date formatting nonsense, etc.
I think you are out of luck on that. .Net has a whole world of Office Goodies what a shame... Extract the CSV Data then do a insert into MySQL. Is that how you do it now?
Yes, someone emails an xls or xlsx, I do a 'save as' csv, but it's not a straight insert after that. I read it into perl and do some checking and conversions, depending on the data involved, then an insert or update. I expected the db to be the authoritative copy but I keep getting batches of wholesale modifications to merge in so I'd like to automate it a little more completely.
--- This is a really old way here: It can be scripted though..."LOAD DATA INFILE" is the key here check it out.
LOAD DATA INFILE '/my.csv' REPLACE INTO TABLE `test` FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\' LINES TERMINATED BY '\r\n'
On 10/18/2010 6:16 PM, JohnS wrote:
On Mon, 2010-10-18 at 17:54 -0500, Les Mikesell wrote:
On 10/18/2010 5:31 PM, JohnS wrote:
On Mon, 2010-10-18 at 17:13 -0500, Les Mikesell wrote:
I'm getting tired of converting spreadsheets that someone else updates to csv so my perl scripts can push the data into a mysql database. Is there a better way? I haven't had much luck with perl-Spreadsheet-ParseExcel (and find it odd that yum prefers the .32 version from epel over .57 from rpmforge anyway). Is the current CPAN version better? Or the equivalent java tools? Or maybe a scripted OpenOffice conversion would be possible.
Needs to deal with both xls and xlsx formats, the odd characters that are confused with quotes even after csv conversion, numbers with $'s and commas embedded, excel's date formatting nonsense, etc.
I think you are out of luck on that. .Net has a whole world of Office Goodies what a shame... Extract the CSV Data then do a insert into MySQL. Is that how you do it now?
Yes, someone emails an xls or xlsx, I do a 'save as' csv, but it's not a straight insert after that. I read it into perl and do some checking and conversions, depending on the data involved, then an insert or update. I expected the db to be the authoritative copy but I keep getting batches of wholesale modifications to merge in so I'd like to automate it a little more completely.
This is a really old way here: It can be scripted though..."LOAD DATA INFILE" is the key here check it out.
LOAD DATA INFILE '/my.csv' REPLACE INTO TABLE `test` FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\' LINES TERMINATED BY '\r\n'
But the piece I want to script is the saving as csv in the first place. Plus ways to work around the gunk that excel can put in a csv file and the date format that is nothing like what sql wants.
On Mon, Oct 18, 2010, Les Mikesell wrote:
I'm getting tired of converting spreadsheets that someone else updates to csv so my perl scripts can push the data into a mysql database. Is there a better way? I haven't had much luck with perl-Spreadsheet-ParseExcel (and find it odd that yum prefers the .32 version from epel over .57 from rpmforge anyway). Is the current CPAN version better? Or the equivalent java tools? Or maybe a scripted OpenOffice conversion would be possible.
Needs to deal with both xls and xlsx formats, the odd characters that are confused with quotes even after csv conversion, numbers with $'s and commas embedded, excel's date formatting nonsense, etc.
I don't do much perl these days, having switched to python for most of my stuff.
There is at least one python package for this:
http://pypi.python.org/pypi/xlrd
A google search on ``python excel reader'' came up with quite a few hits.
Of course there are easy python dbi interfaces to mysql, postgresql, and other SQL databases as well.
Bill
On Mon, Oct 18, 2010 at 3:13 PM, Les Mikesell lesmikesell@gmail.com wrote:
I'm getting tired of converting spreadsheets that someone else updates to csv so my perl scripts can push the data into a mysql database. Is there a better way? I haven't had much luck with perl-Spreadsheet-ParseExcel (and find it odd that yum prefers the .32 version from epel over .57 from rpmforge anyway). Is the current CPAN version better? Or the equivalent java tools? Or maybe a scripted OpenOffice conversion would be possible.
Needs to deal with both xls and xlsx formats, the odd characters that are confused with quotes even after csv conversion, numbers with $'s and commas embedded, excel's date formatting nonsense, etc.
Hi, Les.
xlhtml has a switch, -csv, to output in Comma Separated Values
http://chicago.sourceforge.net/xlhtml/
I am not sure if it'll do everything you want, it's a few years old, but may be worth a look.
Best, -at
Les Mikesell wrote, On 10/18/2010 06:13 PM:
I'm getting tired of converting spreadsheets that someone else updates to csv so my perl scripts can push the data into a mysql database. Is there a better way? I haven't had much luck with perl-Spreadsheet-ParseExcel (and find it odd that yum prefers the .32 version from epel over .57 from rpmforge anyway). Is the current CPAN version better? Or the equivalent java tools? Or maybe a scripted OpenOffice conversion would be possible.
Needs to deal with both xls and xlsx formats, the odd characters that are confused with quotes even after csv conversion, numbers with $'s and commas embedded, excel's date formatting nonsense, etc.
Would it cause more headaches than it would solve, for you to hook the excel folks directly to the mysql db and have their changes take place immediately? Assuming a LAN environment here instead of 'the only connection is email'. Could you do the sanity checking you currently do by using some db functions?
"MySQL Forums :: Microsoft Access :: Connecting MS Office, MS Excel, MS Access to MySQL using ODBC" http://forums.mysql.com/read.php?65,148441,148441
* OK, I often come at problems from a different direction. *
On 10/19/2010 9:34 AM, Todd Denniston wrote:
I'm getting tired of converting spreadsheets that someone else updates to csv so my perl scripts can push the data into a mysql database. Is there a better way? I haven't had much luck with perl-Spreadsheet-ParseExcel (and find it odd that yum prefers the .32 version from epel over .57 from rpmforge anyway). Is the current CPAN version better? Or the equivalent java tools? Or maybe a scripted OpenOffice conversion would be possible.
Needs to deal with both xls and xlsx formats, the odd characters that are confused with quotes even after csv conversion, numbers with $'s and commas embedded, excel's date formatting nonsense, etc.
Would it cause more headaches than it would solve, for you to hook the excel folks directly to the mysql db and have their changes take place immediately? Assuming a LAN environment here instead of 'the only connection is email'.
That's pretty much impossible in the near term anyway. The bulk of this involves reconciling inventory data maintained by one set of people for financial purposes in a database I don't control with some others used for operational monitoring and management. For example, we need to be able to report the current value of the set of equipment being used for a particular purpose - where servers are being shuffled around for different purposes all the time. I'm using ocsinventory-ng for operational tracking because the agents keep it updated automatically but it only handles computers and by itself doesn't deal with cost or deprecation. New requirements keep popping up as we go and I don't find out about them until someone sends me a spreadsheet with some new fields and a request to add them to the db and populate them so they'll be available in future reports.
Could you do the sanity checking you currently do by using some db functions?
Maybe, but doing string operations in sql instead of using perl regexps seems a little insane by itself.
"MySQL Forums :: Microsoft Access :: Connecting MS Office, MS Excel, MS Access to MySQL using ODBC" http://forums.mysql.com/read.php?65,148441,148441
- OK, I often come at problems from a different direction. *
Yeah, the inconvenient parts could probably be done in vbscript or something on a windows box, pushing the results into the db through odbc, but I thought this would be a common enough problem that cross-platform tools would be available. I am using some java stuff on the reporting side - maybe I should look there for conversion tools too.
On 10/19/2010 11:17 AM, Les Mikesell wrote:
On 10/19/2010 9:34 AM, Todd Denniston wrote:
I'm getting tired of converting spreadsheets that someone else updates to csv so my perl scripts can push the data into a mysql database. Is there a better way? I haven't had much luck with perl-Spreadsheet-ParseExcel (and find it odd that yum prefers the .32 version from epel over .57 from rpmforge anyway). Is the current CPAN version better? Or the equivalent java tools? Or maybe a scripted OpenOffice conversion would be possible.
Needs to deal with both xls and xlsx formats, the odd characters that are confused with quotes even after csv conversion, numbers with $'s and commas embedded, excel's date formatting nonsense, etc.
Would it cause more headaches than it would solve, for you to hook the excel folks directly to the mysql db and have their changes take place immediately? Assuming a LAN environment here instead of 'the only connection is email'.
That's pretty much impossible in the near term anyway. The bulk of this involves reconciling inventory data maintained by one set of people for financial purposes in a database I don't control with some others used for operational monitoring and management. For example, we need to be able to report the current value of the set of equipment being used for a particular purpose - where servers are being shuffled around for different purposes all the time. I'm using ocsinventory-ng for operational tracking because the agents keep it updated automatically but it only handles computers and by itself doesn't deal with cost or deprecation. New requirements keep popping up as we go and I don't find out about them until someone sends me a spreadsheet with some new fields and a request to add them to the db and populate them so they'll be available in future reports.
Could you do the sanity checking you currently do by using some db functions?
Maybe, but doing string operations in sql instead of using perl regexps seems a little insane by itself.
"MySQL Forums :: Microsoft Access :: Connecting MS Office, MS Excel, MS Access to MySQL using ODBC" http://forums.mysql.com/read.php?65,148441,148441
- OK, I often come at problems from a different direction. *
Yeah, the inconvenient parts could probably be done in vbscript or something on a windows box, pushing the results into the db through odbc, but I thought this would be a common enough problem that cross-platform tools would be available. I am using some java stuff on the reporting side - maybe I should look there for conversion tools too.
I experimented a little with this sometime last year - seemed to work okay:
On 19/10/2010 17:48, Toby Bluhm wrote:
On 10/19/2010 11:17 AM, Les Mikesell wrote:
On 10/19/2010 9:34 AM, Todd Denniston wrote:
I'm getting tired of converting spreadsheets that someone else updates to csv so my perl scripts can push the data into a mysql database. Is there a better way? I haven't had much luck with perl-Spreadsheet-ParseExcel (and find it odd that yum prefers the .32 version from epel over .57 from rpmforge anyway). Is the current CPAN version better? Or the equivalent java tools? Or maybe a scripted OpenOffice conversion would be possible.
Needs to deal with both xls and xlsx formats, the odd characters that are confused with quotes even after csv conversion, numbers with $'s and commas embedded, excel's date formatting nonsense, etc.
This might be coming out of left field.... but I have an Excel add-in which, among other things, packs up Excel spreadsheets and posts the content in blocks to a web server as csv. The other clever bit is that it works on column headings so that it doesn't break if users add/remove/switch columns. It also provides an acknowledgement mechanism which colours and comments a cell for each line so you can notify if the line looks borked to the server.
It does the opposite as well. Using an embedded web browser to find and select info then the server presents an html table which the add-in unpacks into the spreadsheet.
If anyone wants it then please ask off-list. I'll need to strip some of the functionality out as it's confidential business process info.
Yeah, the inconvenient parts could probably be done in vbscript or something on a windows box, pushing the results into the db through odbc, but I thought this would be a common enough problem that cross-platform tools would be available. I am using some java stuff on the reporting side - maybe I should look there for conversion tools too.
Yep, there a few java excel libraries like:
http://jexcelapi.sourceforge.net/ - we are using this one for dumping into excel spreadsheets ...
or
On Wed, 2010-10-20 at 09:39 +0300, Alexander Georgiev wrote:
Yep, there a few java excel libraries like:
Interesting. Are you aware of any similar libraries for C?
2010/10/20 Frank Cox theatre@sasktel.net:
Yep, there a few java excel libraries like:
Interesting. Are you aware of any similar libraries for C?
Nope, but you can always use the COM interface.
On 10/20/10 1:11 AM, mehdi wrote:
how
On Windows, Excel is all one big set of ActiveX objects with 100s of methods, you can programmatically dink around with most every aspect of it.
but, this whole thread has drifted far far away from CentOS and probably belongs on a completely different mail list.
On 10/20/10 3:11 AM, Alexander Georgiev wrote:
2010/10/20 Frank Coxtheatre@sasktel.net:
Yep, there a few java excel libraries like:
Interesting. Are you aware of any similar libraries for C?
Nope, but you can always use the COM interface.
But that only works under windows...
--- Les Mikesell lesmikesell@gmail.com
On 10/18/10 3:13 PM, Les Mikesell wrote:
I'm getting tired of converting spreadsheets that someone else updates to csv so my perl scripts can push the data into a mysql database. Is there a better way? I haven't had much luck with perl-Spreadsheet-ParseExcel (and find it odd that yum prefers the .32 version from epel over .57 from rpmforge anyway). Is the current CPAN version better? Or the equivalent java tools? Or maybe a scripted OpenOffice conversion would be possible.
Needs to deal with both xls and xlsx formats, the odd characters that are confused with quotes even after csv conversion, numbers with $'s and commas embedded, excel's date formatting nonsense, etc.
I suspect it would be easiest to implement this as a set of Excel macros that connect to your mysql with ODBC and write the data directly to your database.
On Tue, Oct 19, 2010 at 12:13 AM, Les Mikesell lesmikesell@gmail.com wrote:
I'm getting tired of converting spreadsheets that someone else updates to csv so my perl scripts can push the data into a mysql database. Is there a better way? I haven't had much luck with perl-Spreadsheet-ParseExcel (and find it odd that yum prefers the .32 version from epel over .57 from rpmforge anyway). Is the current CPAN version better? Or the equivalent java tools? Or maybe a scripted OpenOffice conversion would be possible.
No idea about the csv from excel, but I have had good experiences with Text::CSV_XS.
On 10/19/2010 09:13 AM, Les Mikesell wrote: ...
I haven't had much luck with perl-Spreadsheet-ParseExcel (and find it odd that yum prefers the .32 version from epel over .57 from rpmforge anyway). Is the current CPAN version better?
...
Needs to deal with both xls and xlsx formats, the odd characters that are confused with quotes even after csv conversion, numbers with $'s and commas embedded, excel's date formatting nonsense, etc.
Looking at the changelog, version .57 of perl-Spreadsheet-ParseExcel fixes some of the above issues. Not clear from the above whether you have tried it or not. The developers would probably be very interested in any examples that break the parser.
Kal
On 10/20/2010 3:49 PM, Kahlil Hodgson wrote:
On 10/19/2010 09:13 AM, Les Mikesell wrote: ...
I haven't had much luck with perl-Spreadsheet-ParseExcel (and find it odd that yum prefers the .32 version from epel over .57 from rpmforge anyway). Is the current CPAN version better?
...
Needs to deal with both xls and xlsx formats, the odd characters that are confused with quotes even after csv conversion, numbers with $'s and commas embedded, excel's date formatting nonsense, etc.
Looking at the changelog, version .57 of perl-Spreadsheet-ParseExcel fixes some of the above issues. Not clear from the above whether you have tried it or not. The developers would probably be very interested in any examples that break the parser.
I just went as far as seeing it wouldn't take .xlsx (2007+ default format). It does look like the rpmforge .57 version will accept the .xls format file, although I think it seems slower than loading excel and doing a 'save as' to get the csv. For numbers, cell->unformatted() would give a real number instead of having to yank the commas out of the csv or $cell->value() versions, but dates don't look like what sql wants either way.
On Wed, 2010-10-20 at 16:46 -0500, Les Mikesell wrote:
I just went as far as seeing it wouldn't take .xlsx (2007+ default format). It does look like the rpmforge .57 version will accept the .xls format file, although I think it seems slower than loading excel and doing a 'save as' to get the csv. For numbers, cell->unformatted() would give a real number instead of having to yank the commas out of the csv or $cell->value() versions, but dates don't look like what sql wants either way.
--- Looks very promising for at least reading. Is interesting to me also because I have many excel files. http://pypi.python.org/pypi/xlrd
John
On 20/10/10 2:46 PM, "Les Mikesell" lesmikesell@gmail.com wrote:
On 10/20/2010 3:49 PM, Kahlil Hodgson wrote:
On 10/19/2010 09:13 AM, Les Mikesell wrote: ...
I haven't had much luck with perl-Spreadsheet-ParseExcel (and find it odd that yum prefers the .32 version from epel over .57 from rpmforge anyway). Is the current CPAN version better?
...
Needs to deal with both xls and xlsx formats, the odd characters that are confused with quotes even after csv conversion, numbers with $'s and commas embedded, excel's date formatting nonsense, etc.
Looking at the changelog, version .57 of perl-Spreadsheet-ParseExcel fixes some of the above issues. Not clear from the above whether you have tried it or not. The developers would probably be very interested in any examples that break the parser.
I just went as far as seeing it wouldn't take .xlsx (2007+ default format). It does look like the rpmforge .57 version will accept the .xls format file, although I think it seems slower than loading excel and doing a 'save as' to get the csv. For numbers, cell->unformatted() would give a real number instead of having to yank the commas out of the csv or $cell->value() versions, but dates don't look like what sql wants either way.
Les,
You might want to look at Spreadsheet::XLSX:
http://search.cpan.org/~dmow/Spreadsheet-XLSX-0.13-withoutworldwriteables/li b/Spreadsheet/XLSX.pm
It can read XLSX files from a quick read of the CPAN page.