
HTML to PDF using ConvertHtml Options
Derek Wickern
Posted: Friday, August 17, 2012 1:09:19 PM
Rank: Newbie
Groups: Member

Joined: 12/29/2011
Posts: 2
Hi,
I am evaluating the product and having some trouble converting HTML with relative resources.

I have the following files on a web server:

http://localhost:8080/test/index.html
http://localhost:8080/test/logo.gif

index.html:
Code: HTML/ASPX
<img src="logo.gif" />


The following all works:
Code: C#
HtmlToPdf.ConvertUrl("http://localhost:8080/test/");
HtmlToPdf.ConvertUrl("http://localhost:8080/test/index.html");

HtmlToPdf.Options.BaseUrl = "http://localhost:8080/test/";
HtmlToPdf.ConvertHtml("<img src=\"logo.gif\"/>", "out.pdf");


But this doesn't:
Code: C#
HtmlToPdf.Options.BaseUrl = "http://localhost:8080/test/index.html";
HtmlToPdf.ConvertHtml("<img src=\"logo.gif\"/>", "out.pdf");


Unfortunately I need the third case to work. It would be easy enough to change the base URL in the static case but I need to handle the general case. It looks like there is some smart URL processing in ConvertUrl which is not present in ConvertHtml.

One thing I noticed is a trailing slash appended to the URL when using LoadHtml. Maybe that is related?
Code: C#
var options = new HtmlToPdfOptions
{
    BaseUrl = "http://localhost:8080/test/index.html"
};
var session = HtmlToPdfSession.Create(options);
session.LoadUrl("http://localhost:8080/test/index.html");
var url = session.GetCurrentUrl(); // http://localhost:8080/test/index.html

session.LoadHtml("<img src=\"logo.gif\"/>");
var url2 = session.GetCurrentUrl(); // http://localhost:8080/test/index.html/ (note trailing slash)
eo_support
Posted: Friday, August 17, 2012 1:23:31 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,195
Hi,

The third case SHOULD NOT work. All portions of the Url you pass to BaseUrl are significant. We cannot just drop a part of it. Consider the following Urls:

http://localhost/test/page1
http://localhost/test/page1.html
http://localhost/test/page1.html/logo.gif

There is no syntactic difference between the first and the second. Both are valid, and the Url does not tell you whether "page1" or "page1.html" is a "folder" or a final file. The third Url demonstrates this: it is a perfectly legal Url, but in it "page1.html" is a "folder". As such, we cannot assume "page1" is a folder and "page1.html" is the final file, so you must always pass the correct BaseUrl.
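
To make the folder/file ambiguity concrete, here is a minimal sketch of how a relative reference such as logo.gif resolves against each of the three forms. It uses only .NET's System.Uri, not the EO.Pdf API, so it says nothing definite about how HtmlToPdf handles BaseUrl internally. The class name UrlResolutionSketch and the helper Resolve are hypothetical names made up for this illustration.

```csharp
using System;

// Illustration only: standard relative resolution via System.Uri.
// This does not call EO.Pdf; the class and method names are hypothetical.
public static class UrlResolutionSketch
{
    // Resolve a relative reference against a base URL.
    public static string Resolve(string baseUrl, string relative)
    {
        return new Uri(new Uri(baseUrl), relative).AbsoluteUri;
    }

    public static void Main()
    {
        string[] bases =
        {
            "http://localhost/test/page1",       // last segment treated as a document
            "http://localhost/test/page1.html",  // last segment treated as a document
            "http://localhost/test/page1.html/"  // trailing slash: treated as a folder
        };

        foreach (string b in bases)
        {
            Console.WriteLine(b + " + logo.gif -> " + Resolve(b, "logo.gif"));
        }
    }
}
```

Under System.Uri, the first two bases resolve to http://localhost/test/logo.gif, while the third resolves to http://localhost/test/page1.html/logo.gif. If ConvertHtml treats BaseUrl as a folder, as the trailing slash reported earlier in the thread suggests, the third form is the one that would apply.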

Thanks!
Derek Wickern
Posted: Friday, August 17, 2012 1:30:05 PM
Rank: Newbie
Groups: Member

Joined: 12/29/2011
Posts: 2
I understand that the URL does not inherently differentiate between the "folder" and "file". In that case, should I query the server for that information in order to get the correct BaseUrl?
eo_support
Posted: Friday, August 17, 2012 1:40:40 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,195
I don't think you can "query" the server to get the correct BaseUrl. Just as you must know the full Url when you visit a page in a browser, you MUST know the BaseUrl. In short, BaseUrl is something you should know and get right in the first place. There is no additional logic or anything fancy or smart about this: if you have the right value it works, and if you don't it doesn't. That's the end of it. The only thing the Web server is obligated to do is serve you a valid page once you provide a valid Url. Usually it won't do anything beyond that, because doing so would pose a security risk for the server.
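
For callers who already know that a given Url names a document rather than a folder, one possible way to derive a folder-style BaseUrl is sketched below. It relies on System.Uri only. The class name BaseUrlSketch and the helper FolderOf are hypothetical, and the sketch does not detect whether the Url is a document. That knowledge is the stated assumption and still has to come from the caller.

```csharp
using System;

// Illustration only; the class and method names are hypothetical.
public static class BaseUrlSketch
{
    // Assumption: the caller already knows pageUrl names a document, not a folder.
    // Resolving "." against it yields the containing folder, with a trailing slash.
    public static string FolderOf(string pageUrl)
    {
        return new Uri(new Uri(pageUrl), ".").AbsoluteUri;
    }

    public static void Main()
    {
        string baseUrl = FolderOf("http://localhost:8080/test/index.html");
        Console.WriteLine(baseUrl); // http://localhost:8080/test/

        // The result could then be assigned as in the working case from the first post:
        // HtmlToPdf.Options.BaseUrl = baseUrl;
    }
}
```

A Url that already ends in a slash is returned unchanged, so the helper is safe to apply to folder Urls as well.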

Thanks

