Be sure to check out our related post, "Convert PDF-image Files to Text." Click the "convert" label on this post.
Undoubtedly, pay-to-play programs are often the quickest, easiest and most efficient way to convert pdf files to editable formats such as MS Word or LibreOffice Writer. But we're not playing in a Windows or Mac environment, and we don't like to pay for stuff. So here are a few ways to convert pdf files without paying for a program and without resorting to Wine, dual-booting or any of that other Windows-centric stuff we detest.
Converting an OCR or "Readable" pdf File to Text
"Readable" means that the pdf file contains actual text and is not just a set of page images. How can you tell if a pdf file is a pdf-text or a pdf-image file? Simple. If you can use your cursor to highlight words or sentences in the pdf file (especially if you can then copy and paste them), it is most likely a pdf-text file. If not, it is probably a pdf-image file.
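If you have a whole folder of pdf files to sort through, the highlight test can be roughly automated. What follows is only a sketch, and not part of my usual routine: it assumes the pdftotext command from poppler-utils is installed, and the function names and the 50-character threshold are my own inventions for illustration.

```python
import subprocess


def extract_text(pdf_path: str) -> str:
    """Return whatever text layer pdftotext finds in the pdf (may be empty)."""
    # The trailing "-" tells pdftotext to write to standard output
    # instead of creating a .txt file next to the pdf.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def looks_like_pdf_text(extracted: str, min_chars: int = 50) -> bool:
    """Guess pdf-text vs pdf-image from how much real text came out.

    min_chars is an arbitrary threshold chosen for this sketch.
    """
    # Count only letters and digits, so the form feeds and blank lines
    # that image-only pages produce do not count as text.
    readable = sum(1 for ch in extracted if ch.isalnum())
    return readable >= min_chars
```

Calling looks_like_pdf_text(extract_text("mybook.pdf")) should come back True for a pdf-text file and False for a scan with no text layer. Treat it as a guess, the same way the highlight test is: a scanned book with a poor OCR layer will still pass.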
PDF-Text Files
Once I have determined that I have a pdf-text file, here's how I go about converting it to a text file.
Using Okular. (Preferred and Recommended Method)
Okular is a program for opening and reading pdf files, but it does much more than that. For the purposes of this post, it can export pdf content to plain-text files, graphics excepted. One of my hobbies is converting public-domain books from pdf and other formats, editing them in LibreOffice Writer, then re-publishing them for fun and profit. Whenever I come across a pdf publication that is not available in text format, Okular is invaluable for exporting it to text (provided it is a readable, or OCR'd, pdf file). Once exported, I open the file in LibreOffice Writer, edit it and save it in .odt format. As a final step, normally, while still in Writer, I export it as both a new pdf file and an epub file.
Hint: As Okular does not export images, you may need to copy and paste images from Okular into your Writer file. That's usually not a problem, as most of your pdf files are not going to be more than 10-20 pages. Even with a file of several hundred pages, unless you have hundreds of images, it won't take all that long to copy and paste them from Okular into your new text file.
Warning: Okular WILL NOT correctly convert text that is laid out in multiple columns.
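A good part of the editing after a text export is mechanical: exported text often has a hard line break at the end of every printed line, plus words split by a hyphen at the line end. The sketch below is one hypothetical way to tidy that up before opening the file in Writer. It is my own illustration, not something Okular provides, and it assumes paragraphs in the exported file are separated by blank lines.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumes blank lines separate paragraphs in the exported text.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for para in paragraphs:
        # Rejoin words split across a line break: "ex-\nport" -> "export".
        # Caveat: a genuinely hyphenated word that happens to break at the
        # hyphen (such as "pdf-\ntext") will lose its hyphen too.
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Collapse the remaining line breaks and runs of spaces.
        para = re.sub(r"\s+", " ", para).strip()
        if para:
            cleaned.append(para)
    return "\n\n".join(cleaned)
```

This does nothing for the multi-column problem in the warning above. If the columns came out interleaved, the text still has to be untangled by hand.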
Using Calibre. (Next-best solution) (Trivia: It's pronounced "Cali-ber")
Calibre is an all-purpose program for reading and converting just about any kind of publication. Again, though, Calibre, like Okular, is best used for pdf-text (readable, or OCR'd) documents, as it will not convert pdf-image files to text. You simply drag and drop or open the pdf file in Calibre, then use the Convert function. When I use Calibre to convert pdf files, I convert them to .docx files (Calibre does not convert to .odt), which I then open and save as a .odt file (I post elsewhere on why I consider LibreOffice superior to MS Word). Calibre can be configured to import images, such as book covers, when converting to a .docx file.
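Hint: When Calibre converts a file, it does not save the new file next to the original. It places it in that book's folder inside the library directory which Calibre creates (by default, a "Calibre Library" folder in your home directory).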
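If you would rather decide for yourself where the converted file lands, Calibre also installs a command-line tool, ebook-convert, that takes an input file and an output file. Here is a minimal sketch of wrapping it, assuming ebook-convert is on your PATH; the function names and the output folder are hypothetical choices of mine.

```python
import subprocess
from pathlib import Path


def build_convert_command(pdf_path: str, out_dir: str) -> list[str]:
    """Build an ebook-convert command that writes <name>.docx into out_dir."""
    source = Path(pdf_path)
    target = Path(out_dir) / (source.stem + ".docx")
    # ebook-convert chooses the output format from the target's extension.
    return ["ebook-convert", str(source), str(target)]


def convert_to_docx(pdf_path: str, out_dir: str) -> None:
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(pdf_path, out_dir), check=True)
```

For example, convert_to_docx("walden.pdf", "converted") should leave converted/walden.docx, ready to open in Writer and save as .odt. The same limit applies as in the Calibre window: this is for pdf-text files, not pdf-image files.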
Using Google Drive and Workspace.
Admittedly, Google Workspace is a paid solution, and so is Google Drive, if you pay for extra space like I do. But they do offer a way to convert pdf-text AND pdf-image files to text. Once you upload your pdf file (text or image) to Drive, you can "Open with Google Docs." Recent upgrades to the pdf conversion feature provide quite acceptable results, including placing images in the right place, a better job of converting columns, and a decent job of keeping frames in their place. Once you convert the pdf into a Google Doc, you can download it as a .docx or .odt file if you need more heavy-duty text editing than is available in Google Docs.
Hint: Workspace has size limits for pdf conversions, and it can take a long time to convert a pdf-image file. Some pdf-image file conversions, if they involve large pdf files (say, 50 or 100 pages or more), might result in only the first five or ten pages being converted. We don't blame Google for this, as document conversion is not Workspace's primary function.
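One workaround to try for big files is to split the pdf into small pieces and upload each piece separately. I have not verified that this gets around the limit, so treat it as an untested idea. The sketch assumes the qpdf command-line tool is installed; the ten-page chunk size is only a guess based on the behavior described above, and the function names and output file names are made up for illustration.

```python
from pathlib import Path


def page_ranges(total_pages: int, chunk_size: int = 10) -> list[tuple[int, int]]:
    """Split pages 1..total_pages into (first, last) ranges of chunk_size."""
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    return [
        (first, min(first + chunk_size - 1, total_pages))
        for first in range(1, total_pages + 1, chunk_size)
    ]


def split_commands(pdf_path: str, total_pages: int, chunk_size: int = 10) -> list[list[str]]:
    """Build one qpdf command per chunk; nothing is executed here."""
    stem = Path(pdf_path).stem
    commands = []
    for first, last in page_ranges(total_pages, chunk_size):
        # Hypothetical naming scheme: book_0001-0010.pdf, book_0011-0020.pdf, ...
        out_name = f"{stem}_{first:04d}-{last:04d}.pdf"
        commands.append(
            ["qpdf", "--empty", "--pages", pdf_path, f"{first}-{last}", "--", out_name]
        )
    return commands
```

Each command in the list can be passed to subprocess.run. For a 25-page file, page_ranges(25) gives three pieces: pages 1-10, 11-20 and 21-25. After converting each piece in Drive, you would still need to paste the resulting documents back together in Writer.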