[CentOS] pdftotext latest version for CentOS 7

Mon Dec 16 04:10:39 UTC 2019
Orion Poplawski <orion at nwra.com>

On 12/14/19 7:28 PM, H wrote:
> I have pdftotext 0.26.5, the current version for CentOS 7 and the Mate desktop as far as I can ascertain. The page https://www.xpdfreader.com/pdftotext-man.html seems to suggest that the latest version is 4.02 which seems a gigantic leap ahead.
> 
> I have a Chinese-text PDF from which I am unable to extract any text using pdftotext; instead I end up with a collection of garbage Latin characters. So I am curious how to get a later version. Copying and pasting from Atril 1.16.1 (which seems to be part of the Mate desktop I am running) also gives me garbage. That is not surprising, since it also seems to use pdftotext 0.26.5.
> 
> Any suggestions? Later version of pdftotext? If so, wherefrom? Another PDF-viewer?

pdftotext is distributed as part of poppler (the poppler-utils package
on EL7), which as you suggest is at 0.26.5.  However, the latest version
of poppler is 0.83.0.  And the man page for pdftotext on EL7 suggests it
is at version 3.03, which is not quite so dramatic a difference from the
4.02 shown on the xpdfreader.com page.
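
For anyone wanting to check which version they actually have, here is a
minimal Python sketch that compares the installed pdftotext against the
0.83.0 mentioned above.  It assumes that "pdftotext -v" prints a banner
along the lines of "pdftotext version 0.26.5" (poppler builds normally
write this to stderr); the helper names parse_version and
installed_pdftotext_version are made up for this illustration.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Return the first dotted version number in text as a tuple of ints, or None."""
    match = re.search(r"\b(\d+(?:\.\d+)+)\b", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_pdftotext_version():
    """Run `pdftotext -v` and parse its banner; return None if the tool is missing."""
    if shutil.which("pdftotext") is None:
        return None
    # Assumption: the banner goes to stderr, so read both streams to be safe.
    # universal_newlines is used instead of text= so this also runs on Python 3.6.
    result = subprocess.run(
        ["pdftotext", "-v"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_version(result.stderr + result.stdout)


if __name__ == "__main__":
    latest = (0, 83, 0)  # latest poppler release quoted in this thread
    found = installed_pdftotext_version()
    if found is None:
        print("pdftotext not found or version banner not recognised")
    elif found < latest:
        print("installed pdftotext", found, "is older than", latest)
    else:
        print("installed pdftotext", found, "is at least", latest)
```

Tuple comparison is used so that 0.26.5 sorts below 0.83.0 numerically
rather than as a string.  Note that the xpdf-style numbers (3.03, 4.02)
come from a different numbering scheme and should not be compared against
poppler's.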

In any case, welcome to the joys of running an enterprise distribution.
You'll find newer versions in EL8 or Fedora.  poppler is an integral core
component of the system, so it is generally not updated lightly.

-- 
Orion Poplawski
Manager of NWRA Technical Systems          720-772-5637
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       orion at nwra.com
Boulder, CO 80301                 https://www.nwra.com/