
Parsing HTML document
Alex
Posted: Wednesday, May 1, 2013 11:23:39 AM
Hello,
I have the following problem:
I have multiple HTML files on my local hard drive.
Each HTML file has different content (tables, lists, text, ...) in different styles.

I want to create a PDF that includes the content of all the HTML files in a consistent layout.
For example, all tables in the PDF must have the same borders, the same table size, and so on.



I use the EO.Pdf for .NET library because I am restricted to the .NET Framework 4 Client Profile.

I used this method to convert the HTML file into a PdfPage:
HtmlToPdf.ConvertHtml(File.ReadAllText(filepath), Pdfpage);

After the conversion I can see that Pdfpage contains content such as PdfTextLayout and PdfTextContent.
But when I try to get the text of a PdfTextContent, it is always null.

Is there a way to parse the HTML file and get a collection of all the elements in it?
Is it possible to change these elements (e.g. set the border for all tables in all HTML files) and insert them directly into the PDF?
eo_support
Posted: Wednesday, May 1, 2013 12:20:20 PM
Hi,

You can use HtmlToPdfResult.HtmlDocument to check the elements in the resulting PDF file:

http://www.essentialobjects.com/doc/4/eo.pdf.htmltopdfresult.htmldocument.aspx

However, you can only get certain information about a node (such as its page location), and you can NOT modify anything. If you want to change something, you have to change the HTML before you pass it to the converter.
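
One possible way to "modify it before you pass it to the converter" is to inject the same stylesheet into every HTML string before conversion. The sketch below is only an illustration: the class name HtmlStyleInjector, the method InjectCss and the CSS rules are hypothetical, and it assumes the source files style their tables with ordinary CSS or HTML attributes that an !important rule can override. It uses only the .NET base class library and does not call into EO.Pdf itself.

```csharp
using System;

static class HtmlStyleInjector
{
    // Hypothetical shared stylesheet; adjust the rules to the layout you need.
    // !important is used so these rules win over the files' own styles.
    const string CommonCss =
        "<style type=\"text/css\">" +
        "table { border-collapse: collapse; width: 100% !important; }" +
        "table, th, td { border: 1px solid #000 !important; }" +
        "</style>";

    // Inserts the shared <style> block just before </head>, so it comes
    // after the file's own styles. If the document has no </head>, the
    // block is prepended to the HTML instead.
    public static string InjectCss(string html)
    {
        int headEnd = html.IndexOf("</head>", StringComparison.OrdinalIgnoreCase);
        return headEnd >= 0
            ? html.Insert(headEnd, CommonCss)
            : CommonCss + html;
    }
}

// Usage, with the same converter call as in the question:
//   string html = HtmlStyleInjector.InjectCss(File.ReadAllText(filepath));
//   HtmlToPdf.ConvertHtml(html, Pdfpage);
```

Plain string insertion keeps the sketch free of extra dependencies; for files with very irregular markup, a real HTML parser would be a more robust way to make the same change.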

Thanks!

