I'd like to start scanning our boxed-up documents, about 30,000 files in total, mostly to eliminate the boxes of paper we have. I'd like to scan them, store them, have some sort of index, and be able to retrieve them on multiple machines. I think PDF would be the desired format. I'd like to be able to set some permissions as well (not a deal breaker...). I've searched SourceForge and have seen knowledgetree, myDMS, contineo, etc., but I'd really like to hear from someone who is using something similar. Of course, I use CentOS, so..... Thanks, Dennis
Depending on how complex a management system you want, you could write a small custom one using only a few PHP files and a database backend (I would suggest Postgres).
Geoff
Sent from my BlackBerry wireless handheld.
-----Original Message----- From: "Dennis McLeod" dmcleod@foranyauto.com
Date: Wed, 12 Sep 2007 16:08:50 To:centos@centos.org Subject: [CentOS] Document Scanning and Storage
_______________________________________________ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Thanks. I've considered that. It'd be a steep learning curve for me. I have yet to get the two (PHP and a database) working together, although I did manage to cobble together a PHP search page for my LDAP server... (I'm an old MCSE who only recently converted...) I'll start reading.... Dennis
-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of gjgowey@tmo.blackberry.net Sent: Wednesday, September 12, 2007 4:19 PM To: CentOS mailing list Subject: Re: [CentOS] Document Scanning and Storage
This is not a trivial operation.
I was a principal in a company that developed a Linux-based system to do this about 8 years ago, with a product good enough that it made national news when Bill Gates' home town of Medina, Washington bought a system from us rather than a Windows-based one.
The scanning can be done pretty nicely using a scanner with an ADF (Automatic Document Feeder), and xsane has the ability to number pages while skipping numbers, so one can scan both sides of two-sided documents in two passes. The biggest issue is probably doing the OCR conversion to get text for indexing. We used proprietary software from Vividata for this, which worked pretty well. I haven't looked seriously at gocr or other open source OCR software for Linux, so I don't know how well it would work.
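The two-pass numbering idea can be sketched in a few lines of Python. This does not call xsane or reproduce its settings; it only illustrates the arithmetic, under the assumption that the whole stack is flipped over between passes so the backs come through the feeder in reverse sheet order. The function names and file names are made up for illustration.

```python
def duplex_page_numbers(sheet_count):
    """Return (front_numbers, back_numbers) for a two-pass duplex scan.

    Assumption: the stack is flipped as a whole between passes, so the
    back sides are fed in reverse sheet order.
    """
    fronts = [2 * i + 1 for i in range(sheet_count)]              # 1, 3, 5, ...
    backs = [2 * (sheet_count - i) for i in range(sheet_count)]   # 2n, ..., 4, 2
    return fronts, backs


def merge_passes(front_files, back_files):
    """Interleave the files from both passes into reading order."""
    if len(front_files) != len(back_files):
        raise ValueError("both passes must cover the same number of sheets")
    fronts, backs = duplex_page_numbers(len(front_files))
    numbered = dict(zip(fronts, front_files))
    numbered.update(zip(backs, back_files))
    return [numbered[n] for n in sorted(numbered)]


if __name__ == "__main__":
    # Hypothetical file names: three sheets, backs scanned last-sheet-first.
    print(merge_passes(["f1.pnm", "f2.pnm", "f3.pnm"],
                       ["b3.pnm", "b2.pnm", "b1.pnm"]))
```

If the feeder instead delivers the backs in the same order as the fronts, the back numbers would simply be 2, 4, 6, and so on.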
I've been using the ReadIris OCR software on Macs recently, which has some very nice features such as handling multi-page PDF files well.
If I were to tackle this today, I would probably do it using Plone since it handles things like indexing and organization well.
Bill
--
INTERNET: bill@celestial.com   Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/   PO Box 820; 6641 E. Mercer Way
FAX: (206) 232-9186   Mercer Island, WA 98040-0820; (206) 236-1676
Breathe fire, slay dragons, and take chances. Failure is temporary, regret is eternal.
I guess it boils down to what someone needs out of such a system. If you're looking to store text data from OCR conversion (for searching), then you could use a clob column for the text as well as a blob for the image-based version. As for storing these in the db, PHP makes that easy, since all file uploads are automatically stored in temporary files and it's just a matter of reading the contents and storing them in the db. User rights can be regulated using a separate column on each entry. Basically 6 columns would be all that's needed: record id (incrementing integer), a blob, a clob, a group access field (I'd suggest an integer), date stored, and the user who created the record.
Of course, you'd also need a separate table that maps user logins to the groups they can access.
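As a rough illustration of the six-column layout and the login-to-group table, here is a minimal sketch. It uses Python with the standard-library sqlite3 module instead of PHP and Postgres, purely so it runs without a server; the table and column names (`documents`, `user_groups`, `scan`, `ocr_text`, and so on) are invented for this example, and SQLite's TEXT type stands in for a clob.

```python
import sqlite3

# Assumed schema: names are illustrative, not from any existing product.
SCHEMA = """
CREATE TABLE documents (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,         -- record id
    scan       BLOB NOT NULL,                             -- the scanned PDF
    ocr_text   TEXT,                                      -- OCR output (clob)
    group_id   INTEGER NOT NULL,                          -- group access field
    stored_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,   -- date stored
    created_by TEXT NOT NULL                              -- creating user
);
CREATE TABLE user_groups (
    login    TEXT NOT NULL,
    group_id INTEGER NOT NULL,
    PRIMARY KEY (login, group_id)
);
"""


def open_store(path=":memory:"):
    """Create the two tables in a fresh database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_document(conn, pdf_bytes, ocr_text, group_id, login):
    """Insert one scanned document and return its record id."""
    cur = conn.execute(
        "INSERT INTO documents (scan, ocr_text, group_id, created_by) "
        "VALUES (?, ?, ?, ?)",
        (pdf_bytes, ocr_text, group_id, login),
    )
    conn.commit()
    return cur.lastrowid


def fetch_document(conn, doc_id, login):
    """Return the PDF bytes only if the login belongs to the document's group."""
    row = conn.execute(
        "SELECT d.scan FROM documents d "
        "JOIN user_groups g ON g.group_id = d.group_id "
        "WHERE d.id = ? AND g.login = ?",
        (doc_id, login),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = open_store()
    conn.execute("INSERT INTO user_groups VALUES ('dennis', 1)")
    doc_id = store_document(conn, b"%PDF-1.4 placeholder", "invoice 1234", 1, "dennis")
    print(fetch_document(conn, doc_id, "dennis") is not None)
    print(fetch_document(conn, doc_id, "guest"))
```

The permission check lives in the join, so a login with no matching group row simply gets nothing back.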
Now, if you want to get more elaborate, you could add a version column to the db so that "updates" to a record create another row with the version number increased, leaving the original unmodified (à la Subversion, CVS, etc.). Another column holding a Boolean value could record whether the record is "deleted" (in reality, hidden) or not.
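A minimal sketch of the version column and the "deleted" flag, again in Python with sqlite3 as a stand-in for the suggested database. The table name `doc_versions` and the helper functions are hypothetical; the point is only that an update inserts a new row and a delete flips a flag.

```python
import sqlite3


def open_versioned_store():
    """Create an in-memory table keyed by (doc_id, version)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE doc_versions (
            doc_id  INTEGER NOT NULL,
            version INTEGER NOT NULL,
            scan    BLOB NOT NULL,
            deleted INTEGER NOT NULL DEFAULT 0,   -- 0 = visible, 1 = hidden
            PRIMARY KEY (doc_id, version)
        )
        """
    )
    return conn


def save_version(conn, doc_id, pdf_bytes):
    """Store an 'update' as a new row; earlier rows are never modified."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM doc_versions WHERE doc_id = ?",
        (doc_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO doc_versions (doc_id, version, scan) VALUES (?, ?, ?)",
        (doc_id, latest + 1, pdf_bytes),
    )
    conn.commit()
    return latest + 1


def hide_document(conn, doc_id):
    """'Delete' a document by flagging its rows; nothing is removed."""
    conn.execute("UPDATE doc_versions SET deleted = 1 WHERE doc_id = ?", (doc_id,))
    conn.commit()


def current_version(conn, doc_id):
    """Return (version, scan) for the newest visible row, or None."""
    return conn.execute(
        "SELECT version, scan FROM doc_versions "
        "WHERE doc_id = ? AND deleted = 0 "
        "ORDER BY version DESC LIMIT 1",
        (doc_id,),
    ).fetchone()
```

Because hidden rows stay in the table, an administrator could restore a document by resetting the flag.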
Logins can be stored in the db, and an Apache module can be used to authenticate against that information.
Creating the storage part is really not elaborate. It's always the UI that is the worst part of any project (both the admin and end-user UI), but Eclipse with Struts and Tiles, or XML and XSL instead of Tiles (personally, I prefer the XML/XSL approach, but to each their own), can make it less painful. I've had rather good experiences with Eclipse and the Red Hat plugin (formerly Exadel) for working with Struts and Tiles.
Geoff
-----Original Message----- From: Bill Campbell centos@celestial.com
Date: Wed, 12 Sep 2007 16:26:00 To:centos@centos.org Subject: Re: [CentOS] Document Scanning and Storage
1) Lease a good copier/scanner that handles large scanning capacity and can scan to PDF.
2) Get a good batch OCR program that will take an image-based PDF and create a PDF text overlay (so the PDF can be full-text indexed).
3) Buy a good document management program, preferably one that can do drag-n-drop document sorting/indexing.
You can get all 3 from Ricoh. They have a very good copier/scanner with a proprietary but easy-to-use document management system (based on FreeBSD) that is completely web-based and integrates fully with the copier/scanner. A good product for an SMB.
-Ross