
WebBrowser.GetHtml() encodes URLs
Peavy
Posted: Wednesday, October 19, 2016 5:39:16 PM
Rank: Advanced Member
Groups: Member

Joined: 6/26/2015
Posts: 98
Here's how to duplicate this error:

1. First, using Google Chrome, go to LinkedIn.com and log in with your profile (hopefully that's possible for you)
2. Search for and view the profile "Jeff Weiner", CEO of LinkedIn
3. Click the "Contact Info" link in the bottom-right corner of the profile header section, which will reveal the "Company Website" link
4. Inspect the "Company Website" link, and you'll see the HREF is as follows:

Quote:

<a href="/redir/redirect?url=http%3A%2F%2Fwww%2Elinkedin%2Ecom%2F&urlhash=XkSC">


5. Now, using EO.WebBrowser, go to the same page
6. Run this code to get the DOM/HTML of the underlying page:

Code: C#
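// Save the page's HTML (as returned by GetHtml()) to dom.txt on the desktop
System.IO.File.WriteAllText(System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "dom.txt"), myBrowserControl.WebBrowser.GetHtml());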


If you look at the same HREF for the "Company Website" link, it will look like this:

Quote:

<a href="/redir/redirect?url=http%3A%2F%2Fwww%2Elinkedin%2Ecom%2F&amp;urlhash=XkSC">


Notice that the "&" in the original query string is now encoded as the HTML entity "&amp;".
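If what you need is the href exactly as it appears in step 4, the serialized attribute value can be HTML-decoded. This is a minimal sketch, assuming the HTML is already in a string (for example, the result of `GetHtml()`). `HrefDecoder`, `ExtractHrefs` and `GetQueryParam` are hypothetical names; the regex only handles double-quoted `href` attributes, so a real HTML parser would be more robust on arbitrary pages. `WebUtility.HtmlDecode` and `Uri.UnescapeDataString` are standard .NET methods.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

static class HrefDecoder
{
    // Hypothetical helper: finds double-quoted href attributes in serialized HTML
    // and HTML-decodes each value, so "&amp;" becomes "&" again.
    public static List<string> ExtractHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match m in Regex.Matches(html, "href=\"([^\"]*)\"", RegexOptions.IgnoreCase))
        {
            hrefs.Add(WebUtility.HtmlDecode(m.Groups[1].Value));
        }
        return hrefs;
    }

    // Hypothetical helper: returns one query-string parameter, percent-decoded
    // with Uri.UnescapeDataString (which does not treat '+' as a space).
    // Returns null if the parameter is not present.
    public static string? GetQueryParam(string href, string name)
    {
        int q = href.IndexOf('?');
        if (q < 0) return null;
        foreach (string pair in href.Substring(q + 1).Split('&'))
        {
            int eq = pair.IndexOf('=');
            string key = eq < 0 ? pair : pair.Substring(0, eq);
            if (key == name)
                return eq < 0 ? "" : Uri.UnescapeDataString(pair.Substring(eq + 1));
        }
        return null;
    }
}
```

Applied to the link from the `GetHtml()` dump, `ExtractHrefs` returns `/redir/redirect?url=http%3A%2F%2Fwww%2Elinkedin%2Ecom%2F&urlhash=XkSC`, and `GetQueryParam(href, "url")` on that value yields `http://www.linkedin.com/`. Splitting the undecoded value instead would produce a parameter named `amp;urlhash`, which is the practical symptom of treating serialized HTML as the raw attribute value.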
eo_support
Posted: Thursday, October 20, 2016 1:40:51 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,229
Hi,

This is normal. When the page is serialized back to HTML, the URL in an href attribute is supposed to be encoded, so the "&" becomes "&amp;". If you go to Chrome's developer console and type document.body.outerHTML, you will see that the returned value has the href URLs encoded as well.

Thanks!
Peavy
Posted: Thursday, October 20, 2016 1:59:18 PM
Rank: Advanced Member
Groups: Member

Joined: 6/26/2015
Posts: 98
I see. Seems like that's a bug in Chrome, then. I'd argue it should _not_ add the extra encoding when .outerHTML is called, since document.getElementById('someElement').href does _not_ add it. Maybe I'm missing something, but it seems inconsistent for them to return different results. Anyway, thanks for your help.
eo_support
Posted: Thursday, October 20, 2016 3:01:11 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,229
This is NOT a bug. This is how it should be. You can test this in all major browsers, and you will see that they all encode the URLs in href attributes.
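The difference between the two results comes from what each one returns. `getAttribute('href')` returns the stored attribute value, and the `href` property returns that value resolved to an absolute URL; neither is escaped. outerHTML, by contrast, runs the HTML serialization algorithm, which writes "&" in an attribute value as "&amp;" and a double quote as "&quot;". The sketch below is a simplified, hypothetical model of that escaping step (`AttributeSerializer`, `EscapeAttributeValue` and `SerializeAttribute` are illustrative names). It handles only those two characters, while the full algorithm escapes a few more.

```csharp
using System.Text;

static class AttributeSerializer
{
    // Hypothetical, simplified model of attribute-value escaping during HTML
    // serialization: only '&' and '"' are handled here.
    public static string EscapeAttributeValue(string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            switch (c)
            {
                case '&': sb.Append("&amp;"); break;
                case '"': sb.Append("&quot;"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }

    // Builds a double-quoted name="value" pair the way a serializer would emit it.
    public static string SerializeAttribute(string name, string value) =>
        name + "=\"" + EscapeAttributeValue(value) + "\"";
}
```

Passing the raw LinkedIn href through `SerializeAttribute("href", ...)` produces the `&amp;urlhash=XkSC` form seen in the `GetHtml()` dump, and `WebUtility.HtmlDecode` on the escaped value gives back the original string, so the two representations carry the same URL.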

Thanks!
