
Reading text from PDF
Jack
Posted: Monday, November 4, 2013 3:18:55 PM
Rank: Newbie
Groups: Member

Joined: 2/7/2012
Posts: 2
Hi, we have a commercial license and have been using EO for HTML to PDF conversion. Now we're trying to figure out how to read the text from a PDF. We can get the PdfDocument and its pages, but all we see is PdfRawContent, and there's no way to extract the text from it. This PDF is not an image, and iTextSharp and other libraries can extract its text with no problem. We don't want to use two different libraries for this. Can you give us some advice on how this can be achieved?

Thank you
eo_support
Posted: Monday, November 4, 2013 3:26:29 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,196
Hi,

Unfortunately this is not supported in the current version. For some PDF files it is possible to extract text, while for others it is not. This is because the PDF format is much more about "drawing the output" than about information exchange. The format goes to great lengths to ensure output quality, but a file may only contain information about how to "draw" each letter while lacking information about which character it is drawing. Furthermore, a file that does contain character code information may still not have enough information to piece different text blocks together. For example, if you have the words "This" "is" "a" "PDF" "file" artistically arranged on a page with different fonts at different locations, a human being can instantly piece them together as a sentence, but a machine would not know which word goes first and which goes next. For these reasons, we do not have such a feature, so you may still want to rely on iTextSharp for this purpose.
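
To make the ordering problem concrete, here is a minimal Python sketch. It does not use our API or iTextSharp; the `Fragment` type, the `guess_reading_order` function and the coordinates are all hypothetical, made up for illustration. It assumes the character codes are already known and that each text block comes with a page position, then guesses reading order with a simple "top to bottom, left to right" rule. That rule is only a heuristic, and the second example shows how it fails on an artistic layout.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """A hypothetical positioned text block (PDF y axis grows upward)."""
    x: float
    y: float
    text: str


def guess_reading_order(fragments: List[Fragment], line_tolerance: float = 2.0) -> str:
    """Heuristic only: group fragments into lines by y, then sort each line by x."""
    # Larger y is higher on the page, so sort descending to read top to bottom.
    top_to_bottom = sorted(fragments, key=lambda f: -f.y)

    lines: List[List[Fragment]] = []
    for frag in top_to_bottom:
        # Same line if the baseline is close to the first fragment of the current line.
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])

    # Within each line, read left to right.
    words = [f.text for line in lines for f in sorted(line, key=lambda f: f.x)]
    return " ".join(words)


# Made-up coordinates: a conventional two-line layout.
tidy = [
    Fragment(10, 700, "This"), Fragment(40, 700, "is"), Fragment(55, 700, "a"),
    Fragment(10, 685, "PDF"), Fragment(40, 685, "file"),
]

# Made-up coordinates: the same words scattered across the page.
artistic = [
    Fragment(200, 500, "This"), Fragment(50, 650, "is"), Fragment(300, 300, "a"),
    Fragment(120, 420, "PDF"), Fragment(20, 100, "file"),
]

print(guess_reading_order(tidy))      # This is a PDF file
print(guess_reading_order(artistic))  # is This PDF a file
```

The positions alone do not say which word comes first in the second layout, so any rule chosen here is a guess.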

Thanks!

