
HtmlToPDF, baseurl and image paths
Hougaard
Posted: Thursday, October 13, 2011 6:32:17 AM
Rank: Newbie
Groups: Member

Joined: 10/13/2011
Posts: 4
Hi,

I'm evaluating your product to be able to replace ABCPDF.NET in one of our products.

I'm down to one small problem regarding image paths.

This application runs in IIS, triggered from an ASP.NET page.

In my HTML (as a string), I have code like this:
Code: HTML/ASPX
<td width="20" valign="top"><img src="/images/pdf/box.gif"></td>


This is my conversion code (HTML is a string variable holding the entire HTML page):
Code: C#
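// Requires: using System.IO; using System.Web; using EO.Pdf;
MemoryStream PDFStream = new MemoryStream();
HtmlToPdf.Options.PageSize = PdfPageSizes.A4;
// Physical root directory of the web application
string BaseDir = HttpRuntime.AppDomainAppPath;
HtmlToPdf.Options.BaseUrl = BaseDir;
HtmlToPdf.Options.AllowLocalAccess = true;
// HTML is a string holding the entire page; the PDF is written to PDFStream
HtmlToPdf.ConvertHtml(HTML, PDFStream);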


This produces a blue question mark where "box.gif" was supposed to go. If I change the HTML to the following (removing the first slash from the path):
Code: HTML/ASPX
<td width="20" valign="top"><img src="images/pdf/box.gif"></td>


Everything works !

But my impression is that BaseUrl sets a new root, and that the image path is then resolved relative to that new root?

We need the generated HTML to look the same regardless of whether we do the PDF conversion or not.

I also tried removing the trailing backslash from AppDomainAppPath, but that did not change anything.

Am I doing anything wrong, or is this simply beyond the scope of your library?

/Erik
eo_support
Posted: Thursday, October 13, 2011 10:15:42 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,240
Hi,

That is the correct behavior. Two rules apply here:

Rule A: The leading slash "/" indicates the path starts from the root

For example, for the following page:

http://www.yourdomain.com/dir1/dir2/xyz.html

Path "/abc.gif" would be resolved as:

http://www.yourdomain.com/abc.gif

It starts immediately after the root because that's exactly what "/" means.

On the other hand, path "abc.gif" would be resolved as:

http://www.yourdomain.com/dir1/dir2/abc.gif

If a base element (the equivalent of BaseUrl for ConvertUrl) exists in xyz.html's head section with the following value:

Code: HTML/ASPX
<base href="http://www.yourdomain.com/dir1/" />


Then "/abc.gif" would still be resolved as:

http://www.yourdomain.com/abc.gif

While "abc.gif" would be resolved as:

http://www.yourdomain.com/dir1/abc.gif

If the base element has the following value:

Code: HTML/ASPX
<base href="http://www.anotherdomain.com/dir1/" />


Then "/abc.gif" will be resolved as "http://www.anotherdomain.com/abc.gif". Note that the domain name is changed here, but the path is still relative to the root.
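
Rule A can be checked outside the converter. The following is a minimal sketch that uses the standard .NET System.Uri class to resolve the same references against the same base URLs as in the examples above. The class name UrlResolutionDemo and the Resolve helper are made up for illustration and are not part of the EO.Pdf API.

```csharp
using System;

public static class UrlResolutionDemo
{
    // Hypothetical helper: resolves a reference (such as an img src value)
    // against a base URL using the rules implemented by System.Uri.
    public static string Resolve(string baseUrl, string reference)
    {
        return new Uri(new Uri(baseUrl), reference).AbsoluteUri;
    }

    public static void Main()
    {
        string page = "http://www.yourdomain.com/dir1/dir2/xyz.html";
        string baseHref = "http://www.yourdomain.com/dir1/";
        string otherBase = "http://www.anotherdomain.com/dir1/";

        // A leading "/" always restarts from the root of the host.
        Console.WriteLine(Resolve(page, "/abc.gif"));      // http://www.yourdomain.com/abc.gif
        Console.WriteLine(Resolve(baseHref, "/abc.gif"));  // http://www.yourdomain.com/abc.gif
        Console.WriteLine(Resolve(otherBase, "/abc.gif")); // http://www.anotherdomain.com/abc.gif

        // Without the leading "/", the directory part of the base is kept.
        Console.WriteLine(Resolve(page, "abc.gif"));       // http://www.yourdomain.com/dir1/dir2/abc.gif
        Console.WriteLine(Resolve(baseHref, "abc.gif"));   // http://www.yourdomain.com/dir1/abc.gif
    }
}
```

Only the host of the base URL survives when the reference starts with "/"; the directory part of the base matters only for references without the leading slash.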

Rule B: When BaseUrl is a local path, host name is implied as "." (localhost)

In URL terms, the path

"c:\dir1\" = "file://./c:/dir1/"

Combine this with Rule A and you will see that "/abc.gif" is resolved as "file://./abc.gif". This effectively points to the root directory of the drive of your current directory.

The best way to solve this problem is to avoid absolute paths (paths that start with "/") in your HTML. We cannot change the meaning of the leading "/" because the way we interpret it is correct. If you use ASP.NET ResolveUrl to generate the absolute URL, change it to ResolveClientUrl. ResolveClientUrl creates a relative path instead of an absolute path. Alternatively, you can parse your HTML to find all absolute URLs and remove the leading "/" before passing it to ConvertHtml.
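
The last option, removing the leading "/" before conversion, could look roughly like the sketch below. It is an illustration under stated assumptions, not part of the EO.Pdf API: the class RootRelativeUrlFixer and the method StripLeadingSlash are hypothetical names, and the regular expression only handles quoted src and href attributes. URLs inside CSS or in unquoted attributes are not touched, and protocol-relative URLs that start with "//" are deliberately left alone.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RootRelativeUrlFixer
{
    // Matches src="/... or href='/... (any letter case), but not "//host/..." URLs.
    private static readonly Regex LeadingSlash = new Regex(
        @"(?<attr>\b(?:src|href)\s*=\s*[""'])/(?!/)",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: drops the leading "/" so the URL becomes
    // relative to the base URL instead of the root of the host.
    public static string StripLeadingSlash(string html)
    {
        return LeadingSlash.Replace(html, "${attr}");
    }

    public static void Main()
    {
        string html = "<td width=\"20\" valign=\"top\"><img src=\"/images/pdf/box.gif\"></td>";
        // Prints: <td width="20" valign="top"><img src="images/pdf/box.gif"></td>
        Console.WriteLine(StripLeadingSlash(html));
    }
}
```

The result would be passed to ConvertHtml in place of the original string. This only gives the intended paths when BaseUrl points at the directory that the leading "/" referred to on the web server, and the HTML served to browsers stays unchanged.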

Hope this helps. Please feel free to let us know if you still have any questions.

Thanks!



Hougaard
Posted: Thursday, October 13, 2011 10:33:53 AM
Rank: Newbie
Groups: Member

Joined: 10/13/2011
Posts: 4
This is what happens in my case... (I do not want HTTP calls to the local IIS, since the same process can be executed in a non-iis environment also)

BaseUrl is set to:

c:\inetpub\site\wwwroot\

and the actual image is located in:

"c:\inetpub\site\wwwroot\Images\pdf\box.gif"

So if "/" == "c:\inetpub\site\wwwroot\", why is "/Images" not equal to "c:\inetpub\site\wwwroot\Images\"?
eo_support
Posted: Thursday, October 13, 2011 10:36:12 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,240
Hi,

Why do you still think "/" == "c:\inetpub\site\wwwroot\"? My previous reply has already clearly explained to you this is not the case.

Thanks
Hougaard
Posted: Thursday, October 13, 2011 10:52:31 AM
Rank: Newbie
Groups: Member

Joined: 10/13/2011
Posts: 4
eo_support wrote:
Why do you still think "/" == "c:\inetpub\site\wwwroot\"? My previous reply has already clearly explained to you this is not the case.


Probably because I expected "baseurl" to have the same meaning as setting a root directory for a web server, so that all paths would resolve against it. The documentation is not very clear on this.

And your documentation does not even mention access to local files, so please bear with me while I try to figure out if this will work for us :)
eo_support
Posted: Thursday, October 13, 2011 11:03:43 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,240
Hi,

What we explained in our original reply are NOT our rules. They are rules defined by various web standards, on the same level as the rule that an "img" tag should render an image. Everybody follows them, so they are not something that will be included in our documentation. We only point out what is causing the problem when you run into one. It’s not up to us to change those rules.

Thanks
Hougaard
Posted: Thursday, October 13, 2011 11:44:20 AM
Rank: Newbie
Groups: Member

Joined: 10/13/2011
Posts: 4
Please don't talk down to me, I'm just trying to figure this out.

But anyway, the HTML standard RFC 1630 does not really define relative URIs, and since a web page "lives" within a virtual filesystem (where /image is mapped to a folder in /var/www/ or c:\inetpub\wwwroot\), my expectation was to be able to simulate that mapping when processing HTML in memory.
eo_support
Posted: Thursday, October 13, 2011 12:06:52 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,240
Hi,

No, we are not talking down to you. We are just saying that standards are standards and we have no choice but to follow them. We simply cannot change our product because you expect it to work in a certain way. Other customers would expect it to work differently, and there is no way for us to make it work in 100 different ways for 100 different customers. If we changed the product to work your way, other people would ask us to change it back. In this case, your expectation is at odds with the standards, so our only option is to ask you to change your expectation. We do not have any other options at all. Hope you understand.

If you are still confused about the rule, try to understand it this way: “/” means the root of your HOST. In other words, the path portion of the BaseUrl doesn't matter at all once you use "/". That means whatever you are trying to do with BaseUrl while using "/" in your image path will be completely ignored. It's a dead end in that direction. Removing the leading "/" is the correct solution for your problem.

Hope this clears it up.

Thanks
eo_support
Posted: Thursday, October 13, 2011 12:36:54 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,240
Hi,

If you are still doubtful, try creating an HTML file with the following content:

Code: HTML/ASPX
<html>
<head>
    <base href="file://C:/Inetpub/wwwroot/" />
</head>
<body>
    <img src="/pagerror.gif" />
</body>
</html>


This assumes the file "pagerror.gif" exists in your "C:/Inetpub/wwwroot/" (it should be there by default). Open the file with IE and you will see that IE does not display the image. Right click the red X, select "Properties" and you will see that IE resolves the full path as "file://C:/pagerror.gif". Remove the leading "/" and you will see the base element kick in and the image display correctly. You will get the same result in all major browsers.

Thanks


