How To: read PDF files from the command line

in #linux7 years ago

Originally posted on my Blog:
manerosss.wordpress.com

Hi!

On a previous post I showed a fantastic Android application called Termux

The app is basically a mini-Linux command-line distro, full of software and things to do.
I use it mostly to write, I connect my usb keyboard and I magically have all the almighty terminal power on my SmartPhone.

With the same text editors you can also read text, actually for that specific matter there are plenty of pagers too.

But these software can’t read all file extensions out of the box, one of these extension are PDF files. So how to read PDF files from the command line?

There are 2 ways to achieve this task, both have the original pdf file converted in another format and both these tools are part of the poppler package:

  • pdftotext converts a PDF file to a simple text file
  • pdftohtml convert PDF to html

pdftotext can be used with many pagers, editors and browsers as does pdftohtml, however the latter works best with browsers.

As both software can get a pdf file from a URL I’ll use a PDF from the internet (George Orwell 1984, under public domain in Australia) so that you can copy and paste all these command to get the same result as I do.

The software used:

  • Converter: pdftotext, pdftohtml (both from the poppler package as said)
  • Pagers: more, less, most, vimpager
  • Browsers: lynx, w3m, elinks
  • Editors: vim, micro, ne, vile, neovim

All these software works out of the box, anything else not listed here may behave in a strange way.


pdftotext

use it this way:

pdftotext file.pdf -

This way the output is redirected to stdout and printed on your terminal. Try it out yourself:

pdftotext http://www.planetebook.com/ebooks/1984.pdf -

Press Ctrl+C to interrupt the command.

As you can see it works! At this point it can be piped in another application.


Pagers:

  • more

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |more

  • less

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |less

You can also color it up by using tput (also works with more)

tput setaf 3; tput setab 4; pdftotext http://www.planetebook.com/ebooks/1984.pdf - |less


  • most

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |most

  • vimpager

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |vimpager


Browsers:

  • Lynx

pdftotext http://www.planetebook.com/ebooks/1984.pdf – |lynx -stdin


  • w3m

pdftotext http://www.planetebook.com/ebooks/1984.pdf – |w3m


  • Elinks

pdftotext http://www.planetebook.com/ebooks/1984.pdf – |elinks



Press \ to toggle source or rendered document, it may render without new lines (press v on w3m).


Editors:

  • micro

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |micro

  • ne

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |ne

  • vile

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |vile

  • vim

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |vim -

  • neovim

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |nvim -


pdftohtml

Use it this way:

pdftohtml -stdout -i file.pdf

Always remember the -i option, if you don’t all the pdf images will be saved on your storage to later be accessible from the html (this means tens or even hundreds of images saved on the current directory).

As said this works and looks great with Browsers because they render html (remember to press either \ or v to switch to rendered mode), therefore I’m listing only the browsers here. Anyway the process is the same as above.


Browsers:

  • lynx

pdftohtml -stdout -i http://www.planetebook.com/ebooks/1984.pdf |lynx -stdin


  • w3m

pdftohtml -stdout -i http://www.planetebook.com/ebooks/1984.pdf |w3m


  • elinks

pdftohtml -stdout -i http://www.planetebook.com/ebooks/1984.pdf |elinks



That’s basically how to use these 2 poppler components.
On a future post I’m going to show you how to do the same with epub files.


TA SALÜDE

Sort:  

If you have any problems with editing PDF documents, then I understand you perfectly. It was difficult for me to find a truly convenient tool that does not require downloading. Now I prefer to use this pdf guru tool. Well, this is the simplest online tool that I use quite often in the process of work.

Coin Marketplace

STEEM 0.17
TRX 0.15
JST 0.029
BTC 56790.06
ETH 2345.62
USDT 1.00
SBD 2.42