
How to read text and images from a PDF file with EO.Pdf
Peter249
Posted: Thursday, March 27, 2014 10:19:52 AM
Rank: Advanced Member
Groups: Member

Joined: 5/15/2013
Posts: 34
I am evaluating EO.Pdf for use in my projects, both web and desktop.

Much of the functionality for creating/writing PDF files is what we need, and I am quite sure it will meet our needs.

I would like to use it for reading as well. We have a PDF library written in native C++ and are in the process of porting things to .NET. We use our library mainly for reading PDF files. Since you already have all the creating/writing features developed, we want to know whether we can use EO.Pdf to meet our reading requirements as well.

As a test, I have written a small C# console application that uses your library to read an existing PDF file of the kind we process.

I open the PdfDocument and read in my file OK.

I get one of the pages: PdfPage page = doc.Pages[95];

and then the content: PdfContent content = page.Contents[0];

This is where I get lost, and the examples and documentation don't help me.

The page content that I am reading contains a bit of text and an image.

How do I read the text objects and the image object? I am quite knowledgeable about the internals of PDF files. If the image comes as PDF PostScript-style text (a marked-content sequence), which is what is in the raw PDF page, that's fine, as I can convert that into a bitmap.
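For anyone taking the lower-level route, here is a minimal sketch in C++ (the language of the native library mentioned above) of scanning an already-decoded page content stream for the operators that paint text (`Tj`, `TJ`, `'`, `"`) and images. In a content stream, an image is typically painted either with `Do` on an image XObject named in the page resources, or inline between `BI`, `ID` and `EI`. The names `Operation`, `parseContent`, `textOperands` and `xobjectNames` are hypothetical and are not part of EO.Pdf. The sketch returns string operands raw, without applying any font encoding, and it finds the end of inline image data with a simple whitespace + `EI` heuristic that can misfire on some binary data.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// One operation: an operator keyword plus the raw operand tokens before it.
struct Operation {
    std::string op;
    std::vector<std::string> operands;
};

static bool isWhite(char c) {
    return c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f' || c == '\0';
}

static bool isDelim(char c) {
    return c == '(' || c == ')' || c == '<' || c == '>' || c == '[' ||
           c == ']' || c == '{' || c == '}' || c == '/' || c == '%';
}

// Splits an already-decoded (decompressed) content stream into operations.
std::vector<Operation> parseContent(const std::string& s) {
    std::vector<Operation> ops;
    std::vector<std::string> operands;
    const size_t n = s.size();
    size_t i = 0;
    while (i < n) {
        const char c = s[i];
        const size_t start = i;
        if (isWhite(c)) { ++i; continue; }
        if (c == '%') {                                   // comment: skip to end of line
            while (i < n && s[i] != '\n' && s[i] != '\r') ++i;
            continue;
        }
        if (c == '(') {                                   // literal string, may nest
            int depth = 0;
            for (; i < n; ++i) {
                if (s[i] == '\\') { ++i; continue; }      // skip the escaped byte
                if (s[i] == '(') ++depth;
                else if (s[i] == ')' && --depth == 0) { ++i; break; }
            }
            i = std::min(i, n);
        } else if (s.compare(i, 2, "<<") == 0 || s.compare(i, 2, ">>") == 0) {
            i += 2;                                       // dictionary delimiters
        } else if (c == '<') {                            // hex string
            while (i < n && s[i] != '>') ++i;
            i = std::min(i + 1, n);
        } else if (c == '[' || c == ']') {
            ++i;                                          // array delimiters
        } else if (c == '/') {                            // name, e.g. /Im1
            ++i;
            while (i < n && !isWhite(s[i]) && !isDelim(s[i])) ++i;
        } else {                                          // number or operator keyword
            while (i < n && !isWhite(s[i]) && !isDelim(s[i])) ++i;
            if (i == start) { ++i; continue; }            // stray ')', '>', '{' or '}'
            const std::string tok = s.substr(start, i - start);
            const char f = tok[0];
            const bool operand = std::isdigit(static_cast<unsigned char>(f)) || f == '+' ||
                                 f == '-' || f == '.' || tok == "true" || tok == "false" ||
                                 tok == "null";
            if (operand) { operands.push_back(tok); continue; }
            ops.push_back({tok, operands});
            operands.clear();
            if (tok == "ID") {
                // Heuristic: inline image data ends at whitespace + "EI" + whitespace/end.
                size_t j = i + 1;
                while (j + 1 < n && !(isWhite(s[j - 1]) && s[j] == 'E' && s[j + 1] == 'I' &&
                                      (j + 2 == n || isWhite(s[j + 2]))))
                    ++j;
                ops.push_back({"EI", {}});
                i = std::min(j + 2, n);
            }
            continue;
        }
        operands.push_back(s.substr(start, i - start));
    }
    return ops;
}

// Raw string operands of the text-showing operators Tj, ', " and TJ.
std::vector<std::string> textOperands(const std::vector<Operation>& ops) {
    std::vector<std::string> out;
    for (const Operation& o : ops) {
        if (o.op != "Tj" && o.op != "'" && o.op != "\"" && o.op != "TJ") continue;
        for (const std::string& t : o.operands)
            if (t[0] == '(' || (t[0] == '<' && t != "<<")) out.push_back(t);
    }
    return out;
}

// Names passed to Do; each must be looked up in /Resources /XObject.
std::vector<std::string> xobjectNames(const std::vector<Operation>& ops) {
    std::vector<std::string> out;
    for (const Operation& o : ops)
        if (o.op == "Do" && !o.operands.empty()) out.push_back(o.operands.back());
    return out;
}
```

For example, `xobjectNames(parseContent(stream))` yields names such as `/Im1`. Because `Do` also paints form XObjects, each name still has to be resolved in the page's `/XObject` resources to confirm that it is an image.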

Another question: I notice that generated PDF files are version 1.4, but many files that we read are version 1.6. What is the highest PDF version you support?

eo_support
Posted: Thursday, March 27, 2014 10:29:47 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,229
Hi,

Unfortunately, no. Our library does not support that yet. When we read a file, we read the entire page contents as "raw" contents and keep them intact. We parse them to detect problems, but we do not parse them for text or images. The main purpose of the read feature is to let you add contents to an existing file using other features in the EO.Pdf package. For example, you can use EO.Pdf to fill in forms, add a header/footer, or add a cover page.

As to the PDF version, we support the latest PDF specification (ISO 32000-1), which is PDF 1.7. When we save a PDF file, we always try to use the lowest possible version number. If you do not use any special features, we save the file as 1.4. However, if you use a feature that is only supported from a specific version onward, we use that version. For example, PDF portfolios require 1.7, so if you use a PDF portfolio, we save the file as 1.7.
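The version rule described above can be sketched as taking the highest minimum version among the features a document actually uses, with 1.4 as the floor. The sketch also reads the `%PDF-x.y` header of an existing file, which is one way to check which version incoming files (such as the 1.6 files mentioned earlier) declare. Note that from PDF 1.4 on, the document catalog's `/Version` entry can declare a later version than the header, so the header alone is not the final word. `PdfVersion`, `requiredVersion` and `parseHeaderVersion` are hypothetical names and not EO.Pdf's implementation; the 1024-byte search window is an assumption based on common reader tolerance of leading junk.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical model of a PDF version number such as 1.7.
struct PdfVersion {
    int major;
    int minor;
};

bool operator<(PdfVersion a, PdfVersion b) {
    return a.major != b.major ? a.major < b.major : a.minor < b.minor;
}

// "Lowest possible version": 1.4 unless a used feature needs something newer.
// featureMinimums holds the minimum version of each feature the document uses.
PdfVersion requiredVersion(const std::vector<PdfVersion>& featureMinimums) {
    PdfVersion v{1, 4};
    for (PdfVersion f : featureMinimums) v = std::max(v, f);
    return v;
}

// Reads the "%PDF-x.y" header, searching the first 1024 bytes because some
// files carry leading garbage. Returns false when no header is found.
bool parseHeaderVersion(const std::string& bytes, PdfVersion& out) {
    const size_t pos = bytes.substr(0, 1024).find("%PDF-");
    if (pos == std::string::npos || pos + 8 > bytes.size()) return false;
    const char ma = bytes[pos + 5], dot = bytes[pos + 6], mi = bytes[pos + 7];
    if (!std::isdigit(static_cast<unsigned char>(ma)) || dot != '.' ||
        !std::isdigit(static_cast<unsigned char>(mi)))
        return false;
    out = PdfVersion{ma - '0', mi - '0'};
    return true;
}
```

With the portfolio example from the reply, `requiredVersion({{1, 7}})` returns 1.7, while a document using no special features stays at 1.4.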

Thanks!
Peter249
Posted: Thursday, March 27, 2014 10:36:31 AM
Rank: Advanced Member
Groups: Member

Joined: 5/15/2013
Posts: 34
OK, thanks for the quick reply.

Is it possible to get that raw page data? If we can get the raw page data (which is text), we can (and already do) parse it. We would also need the resource objects; can those be retrieved?
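On the raw-data side, note that a page's `/Contents` entry may be either a single stream or an array of streams. The PDF specification says an array behaves as if its streams were concatenated in order, and that the divisions between streams fall only on token boundaries, so a reader that looks at only the first stream of such an array can miss part of the page. A minimal sketch, assuming each stream has already been decoded (content streams are commonly `/FlateDecode`-compressed, so a real reader would inflate them first, for example with zlib); `concatenateContents` is a hypothetical helper, not an EO.Pdf call.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Joins the decoded streams of a /Contents array into one logical stream.
// Streams split only on token boundaries, so the newline separator does not
// change the token sequence; it only keeps the last token of one stream (or a
// trailing comment) from running into the first token of the next.
std::string concatenateContents(const std::vector<std::string>& decodedStreams) {
    std::string page;
    for (const std::string& stream : decodedStreams) {
        page += stream;
        page += '\n';
    }
    return page;
}
```

A page whose `/Contents` is `[4 0 R 5 0 R]` would be processed as `concatenateContents({decoded4, decoded5})` and then tokenized as a single stream.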


eo_support
Posted: Thursday, March 27, 2014 11:21:46 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,229
Hi,

It will probably be easier to do that with your C++ library, since you already know quite a lot about the internal PDF structures. You are right that you have to get the resource objects as well. You will also need the fonts (for their encodings) and the image masks (for images with an alpha channel). One of the design goals of our API is to hide all those details from users, so we do not have any public interface for users to access or modify them directly. As such, a lower-level API like yours might be more suitable for this particular purpose.
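To illustrate why the resources matter: names that appear in a content stream (for example `/F1` in `/F1 12 Tf` or `/Im1` in `/Im1 Do`) are keys into the page's `/Resources` dictionary, which is an inheritable page attribute and may therefore come from an ancestor `/Pages` node. The sketch below uses a hypothetical, already-resolved in-memory model (`Resources`, `XObject` and `Font` are illustrative names, not EO.Pdf types) to tell image XObjects from form XObjects, find the font whose encoding is needed to decode text strings, and flag images that carry an `/SMask` soft mask for alpha. Soft masks are only one masking mechanism; `/Mask` (stencil or color-key masking) and `/ImageMask` are not modeled here.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, already-resolved view of a page's /Resources dictionary.
// Real entries are PDF dictionaries with many more keys.
struct XObject {
    std::string subtype;    // "/Image" or "/Form"
    bool hasSMask = false;  // image dictionary has an /SMask (alpha channel)
};

struct Font {
    std::string baseFont;   // e.g. "/Helvetica"
    std::string encoding;   // e.g. "/WinAnsiEncoding"; empty if no /Encoding entry
};

struct Resources {
    std::map<std::string, XObject> xobjects;  // keyed by name, e.g. "/Im1"
    std::map<std::string, Font> fonts;        // keyed by name, e.g. "/F1"
};

// Resolves the operand of "Do"; nullptr means the name is not defined.
const XObject* lookupXObject(const Resources& r, const std::string& name) {
    const auto it = r.xobjects.find(name);
    return it == r.xobjects.end() ? nullptr : &it->second;
}

// Resolves the font operand of "Tf"; its encoding is needed to decode strings.
const Font* lookupFont(const Resources& r, const std::string& name) {
    const auto it = r.fonts.find(name);
    return it == r.fonts.end() ? nullptr : &it->second;
}

// True when the named XObject is an image whose alpha comes from a soft mask.
bool imageNeedsSoftMask(const Resources& r, const std::string& name) {
    const XObject* x = lookupXObject(r, name);
    return x != nullptr && x->subtype == "/Image" && x->hasSMask;
}
```

In a real reader, a form XObject found this way would be processed recursively with its own `/Resources`, since forms can paint further text and images.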

Thanks!
Peter249
Posted: Thursday, March 27, 2014 1:00:10 PM
Rank: Advanced Member
Groups: Member

Joined: 5/15/2013
Posts: 34
Thanks for the reply. We will most likely get EO.Pdf for the creation side of things.
eo_support
Posted: Thursday, March 27, 2014 1:48:25 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,229
Glad to hear that! Please feel free to let us know if you have any more questions.

Thanks!

