ConvertHtml cannot be substituted for ConvertUrl
BrentS
Posted: Friday, August 5, 2011 2:06:24 PM
I tried an experiment: I saved the page source of www.cnn.com to a file and passed it as a string to the EO.Pdf.HtmlToPdf.ConvertHtml method. Since ConvertUrl can produce a PDF from cnn.com, and a web browser can render the saved copy of the site, I expected ConvertHtml to work as well, but it always produces a blank PDF. What settings would I need to change to get it to work?
eo_support
Posted: Friday, August 5, 2011 2:29:57 PM
Hi,

No, you cannot do that. An HTML page is usually much more than the raw HTML it contains.

Most HTML pages include "external references": files that are rendered or used inside the host page but physically live outside of it. For example, the following HTML code will render our logo when placed inside http://www.essentialobjects.com/Default.aspx:

Code: HTML/ASPX
<img src="/images/logo.gif" />


This works because the absolute image Url becomes http://www.essentialobjects.com/ + "images/logo.gif" = http://www.essentialobjects.com/images/logo.gif, which is the correct Url for our logo file.
However, if you save the same contents to c:\eo_homepage.html, the page will try to load the image from "c:/" + "images/logo.gif" = "c:/images/logo.gif". That does not work, because the BaseUrl is now "c:/" instead of http://www.essentialobjects.com.
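
The arithmetic above is ordinary Url resolution, so it can be tried outside the converter. The snippet below is only an illustration using Python's standard urllib.parse.urljoin, not EO.Pdf code. The file:///c:/eo_homepage.html form of the saved file's location is an assumption about how the local path would be written as a Url.

```python
from urllib.parse import urljoin

REFERENCE = "/images/logo.gif"  # the src value from the <img> tag

# Location of the page on its original server.
ONLINE_PAGE = "http://www.essentialobjects.com/Default.aspx"
# Assumed Url form of the saved copy (c:/eo_homepage.html on disk).
SAVED_PAGE = "file:///c:/eo_homepage.html"


def resolve(page_url, reference):
    """Resolve a reference found in a page against that page's own Url."""
    return urljoin(page_url, reference)


if __name__ == "__main__":
    print(resolve(ONLINE_PAGE, REFERENCE))
    print(resolve(SAVED_PAGE, REFERENCE))
```

The first call produces the working logo Url on the server. The second produces a local file location, where no logo exists.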

Image files are not the only ones that behave this way. JavaScript files, CSS files, and relative links all work the same way. So most of the time you can't simply save a web page's HTML somewhere else and expect it to work. Both the HTML and the location of the HTML are significant for a web page.
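
To get a feel for how many such references a saved page carries, a rough sketch like the following can list them. It uses Python's standard html.parser and is not part of EO.Pdf. The sample markup is hypothetical: apart from /images/logo.gif, the paths (/css/site.css, /js/menu.js, Products.aspx) are made up for illustration, and the tag/attribute table is not exhaustive.

```python
from html.parser import HTMLParser

# Tag -> attribute that usually points at an external file (illustrative only).
REFERENCE_ATTRS = {"img": "src", "script": "src", "link": "href", "a": "href"}

# Hypothetical page fragment; only the logo path comes from the example above.
SAMPLE_HTML = """
<html><head>
<link rel="stylesheet" href="/css/site.css" />
<script src="/js/menu.js"></script>
</head><body>
<img src="/images/logo.gif" />
<a href="Products.aspx">Products</a>
</body></html>
"""


class ReferenceCollector(HTMLParser):
    """Collects (tag, reference) pairs for the tags listed in REFERENCE_ATTRS."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        wanted = REFERENCE_ATTRS.get(tag)
        for name, value in attrs:
            if wanted is not None and name == wanted and value:
                self.references.append((tag, value))


def collect_references(html):
    """Return every external reference found in the given HTML string."""
    collector = ReferenceCollector()
    collector.feed(html)
    collector.close()
    return collector.references


if __name__ == "__main__":
    for tag, reference in collect_references(SAMPLE_HTML):
        print(tag, reference)
```

Every entry it prints is something that has to be located relative to wherever the HTML lives.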

You can manually set HtmlToPdf.Options.BaseUrl to solve some of these problems. For example, if you set BaseUrl to http://www.essentialobjects.com, the converter will correctly work out that the full image path is http://www.essentialobjects.com/images/logo.gif, and the image will be rendered correctly. Note that this means the converter still needs to go to the original server to fetch files: not the main HTML file, but everything else the main HTML file references.

There are situations that even setting HtmlToPdf.Options.BaseUrl won't fix. For example, if the original page contains JavaScript that calls back to other pages on its original server, that code will almost certainly fail once the page is no longer on that server. A typical example of such a scenario is a financial site with stock tickers. The stock values are not even in the HTML you saved. They are dynamically pulled from the server. Once you move the page away from the original server, it can no longer reach that server to get the stock information.
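
The point about still needing the original server can be made concrete with a small sketch. This is plain Python Url handling, not the converter's internals, and the reference list is hypothetical (cdn.example.com is a placeholder host). It only shows which hosts would have to be reachable once relative references are resolved against a base Url.

```python
from urllib.parse import urljoin, urlparse

BASE_URL = "http://www.essentialobjects.com"

# Hypothetical references as they might appear in a saved page.
REFERENCES = [
    "/images/logo.gif",                  # relative: resolved against BASE_URL
    "http://cdn.example.com/script.js",  # absolute: the base Url is ignored
]


def hosts_needed(base_url, references):
    """Resolve each reference against base_url and return the hosts to contact."""
    return {urlparse(urljoin(base_url, ref)).netloc for ref in references}


if __name__ == "__main__":
    for host in sorted(hosts_needed(BASE_URL, REFERENCES)):
        print(host)
```

Relative references end up on the base Url's host, so that server must be online at conversion time. References that were already absolute are unaffected by the base.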

Hope this helps. Please let us know if this makes sense to you.

Thanks

