A-PDF Text Extractor copies text from copy-protected PDF files

Ever come across a PDF file that stops you from copying chunks of text from it?


There is a nice way to bypass the copy-protection. A-PDF Text Extractor is a program that does this by extracting text from all or selected pages of any PDF file, whether copy-protected or not. You don’t even need a PDF reader installed, because the extracted text is saved as a TXT file. Amusingly, the publisher/developers themselves are not aware of the copy-protection breaking capability their software possesses.

Here is part of the description from the publisher’s website (highlighted for emphasis):

To extract text from a PDF file, the PDF file must meet the following conditions:
- The file is formatted to contain text and not just images.
- The file contains no security restrictions which disable text selecting.

To extract text from a PDF file, simply load it into the program and click on “Options”. Here you can specify the pages from which you want to copy text: all pages, even-numbered pages, odd-numbered pages, or the pages between two page numbers. You can also put custom text or variables into the header and footer of the extracted pages. Once you are satisfied with the settings, click on the “Extract” button.
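The four page-selection choices described above can be pictured with a small sketch. This is only an illustration of the idea, not A-PDF Text Extractor's own code: the function name `select_pages` and its parameters are made up for this example, and it assumes pages are numbered from 1.

```python
def select_pages(total_pages, mode="all", first=None, last=None):
    """Return the 1-based page numbers to extract.

    Hypothetical helper: the mode names mirror the options described
    in the article (all, even, odd, range), nothing more.
    """
    pages = range(1, total_pages + 1)
    if mode == "all":
        return list(pages)
    if mode == "even":
        return [p for p in pages if p % 2 == 0]
    if mode == "odd":
        return [p for p in pages if p % 2 == 1]
    if mode == "range":
        if first is None or last is None:
            raise ValueError("range mode needs both first and last")
        # Clamp the requested range to pages that actually exist.
        start = max(1, first)
        end = min(total_pages, last)
        return list(range(start, end + 1))
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(select_pages(10, "even"))
    print(select_pages(10, "range", first=3, last=5))
```

Clamping the range instead of raising an error is a design choice made for this sketch; the real program may handle out-of-range page numbers differently.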


There are also different ways to output the extracted text:

- In PDF Order: follows the internal order of the PDF file.
- Smart Rearrange: rearranges text based on its position.
- With Position: outputs text at specified positions.
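To make the difference between these output modes concrete, here is a rough sketch. It assumes each piece of text comes with an (x, y) position measured from the top-left of the page, which is an assumption made for illustration. The `Fragment` type, the function names and the `line_tolerance` value are all hypothetical, and the grouping rule is a simple guess at what "rearrange by position" could mean, not the program's actual algorithm.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    text: str
    x: float  # distance from the left edge of the page
    y: float  # distance from the top edge (assumed to grow downward)


def in_pdf_order(fragments):
    """Keep fragments exactly in the order they are stored in the file."""
    return [f.text for f in fragments]


def smart_rearrange(fragments, line_tolerance=2.0):
    """Group fragments into lines by y, then read each line left to right."""
    lines = []
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        # A fragment joins the current line if its y is close to the
        # y of the fragment that started that line.
        if lines and abs(frag.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    return [" ".join(f.text for f in sorted(line, key=lambda f: f.x))
            for line in lines]


def with_position(fragments):
    """Prefix each fragment with its coordinates."""
    return [f"({f.x:g}, {f.y:g}) {f.text}" for f in fragments]


if __name__ == "__main__":
    page = [
        Fragment("world", 60, 10),
        Fragment("Second", 0, 30.5),
        Fragment("Hello", 0, 10.5),
        Fragment("line", 50, 30),
    ]
    print(in_pdf_order(page))
    print(smart_rearrange(page))
    print(with_position(page))
```

The sample page stores its fragments out of reading order on purpose, which is the situation where rearranging by position would give more readable text than the file's internal order.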

However, the extracted text loses all the formatting of the original file, since the output is plain text. A-PDF Text Extractor is still a very useful tool for copying text from restricted PDF files or for PDF-to-text conversion.

You can also check out PDF Unlocker, another PDF restriction removal program.

Comments

Anonymous, August 13, 2009 at 8:17 PM
From Wikipedia: "PDF files may also contain embedded DRM restrictions that provide further controls that limit copying, editing or printing. The restrictions on copying, editing, or printing depend on the reader software to obey them, so the security they provide is limited. Printable documents especially might be saved instead as bitmaps and subject to OCR."
Kaushik Patowary, August 13, 2009 at 10:09 PM
I didn't know PDF protection was so flawed. Thanks for the info, hmmm... Anonymous (Why can't you people use a name?)
Melvin Deng, August 14, 2009 at 9:14 AM
I'm gonna share another solution.
When I meet a protected PDF, I convert it to Word format using AnyBizSoft. It supports protected PDF conversion and the output quality is pretty good.
http://www.anypdftools.com/pdf-to-word.html#163
Dave, July 27, 2012 at 1:19 PM
GT Text is also pretty amazing at that.
I love the speed
http://gttext.googlecode.com