|
Rank: Advanced Member Groups: Member
Joined: 6/26/2015 Posts: 98
|
In the current version, 15.1.94.2, calling WebView.GetDOMWindow() to get the HTML source of the page no longer returns a result. Please advise.
EDIT: I originally posted about WebView.GetText() as it also hung, but I discovered I can use WebView.GetHtml() instead, which is what I would have expected to use originally. I suspect GetText() requires the DOM under the hood, so these issues may be related.
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,218
|
Hi,
The problem with GetDOMWindow is a known issue. We have already fixed it internally and will post an updated build with the fix this week. The root cause is that a call that involves JavaScript can't return a large value (for example, a long string). GetText and GetHtml do not involve JavaScript, so they still work properly.
Thanks!
|
|
Rank: Advanced Member Groups: Member
Joined: 6/26/2015 Posts: 98
|
Is it possible to get notified when new builds are available for download?
|
|
Rank: Advanced Member Groups: Member
Joined: 6/26/2015 Posts: 98
|
Any update on when this fix is going to be available for download?
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,218
|
Hi,
Sorry about the delay. The build is delayed due to another issue so we will try to post it early this coming week.
Thanks
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,218
|
Hi,
This is just to let you know that we have posted the new build. Please download it from our download page and let us know how it goes.
Thanks!
|
|
Rank: Advanced Member Groups: Member
Joined: 6/26/2015 Posts: 98
|
I'm reviewing this issue because it appears the following two statements are nearly the same:
Code:
var source = webControl1.WebView.GetHtml();
var dom = webControl1.WebView.GetDOMWindow().document.getElementsByTagName("html")[0].outerHTML;
Is it possible to get the HTML source that was originally returned from the web server by the HTTP request? Thanks.
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,218
|
Hi,
It is not possible to get the HTML source code that was originally returned from the web server. That data is parsed to build the DOM tree and then discarded.
WebView.GetHtml() is more efficient than WebView.GetDOMWindow().document.getElementsByTagName("html")[0].outerHTML because the first method makes only one round trip to the browser engine.
Thanks!
|
|
Rank: Advanced Member Groups: Member
Joined: 6/26/2015 Posts: 98
|
eo_support wrote: It is not possible to get the HTML source code that was originally returned from the web server. That data is parsed to build the DOM tree and then discarded.
Is it possible you could make the source available, or expose an event that would allow us to store it before it is discarded?
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,218
|
Hi,
It is not possible to do that directly through the browser engine. To save memory, the browser engine doesn't even hold the full source: it just keeps reading from the network and parses it as it goes.
You can, however, write a custom resource handler to load the resource directly yourself (for example, using the WebRequest class) and then feed the contents you get to the custom resource handler. See here for more information on custom resource handlers: http://www.essentialobjects.com/doc/webbrowser/advanced/resource_handler.aspx
Thanks!
|
|
Rank: Advanced Member Groups: Member
Joined: 1/12/2015 Posts: 81
|
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,218
|
BenjaminSimpson1989 wrote:
Hi,
We tested this with the latest build and it seems to work fine. Can you see if it works for you?
Thanks!
|
|
Rank: Advanced Member Groups: Member
Joined: 1/12/2015 Posts: 81
|
I tried the latest version and it's still not working for me. I submitted a code sample that uses the latest version.
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,218
|
Hi,
Please replace the code with EvalScript("document.body.innerHTML") instead. GetDOMWindow() is in fact a very unreliable approach (we should probably remove this interface) because it involves multiple round trips to the script engine, and objects returned from a previous round trip could have already been destroyed by the time of the second round trip. The following steps demonstrate this problem:
1. wb.GetDOMWindow() is called. This call returns a reference to the current "window" object in the script engine;
2. While this reference is being returned to the C# side, the web page continues to load and execute JavaScript code, which may cause the actual Window object returned in step 1 to become invalid;
3. The C# code uses the window object returned in step 1 to get its "document" property. However, because the window object is no longer valid at this point, getting window.document fails and returns null;
Here step 1 and step 3 are two different round trips into the script engine. As long as they are two different round trips, things can happen in between in the script engine and cause problems. The more "chained" round trips there are, the more likely the problem is to hit somewhere in between. The only reliable way to avoid such problems is to reduce the call to a single round trip with EvalScript. This way the script engine keeps track of all evaluated objects from beginning to end and makes sure all intermediate objects are still valid before it returns.
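The three steps above can be mimicked with a small toy model. This is only a sketch for illustration and not how EO.WebBrowser is implemented: the ToyEngine class, its "generation" counter and all method names are made up here. It shows a handle obtained in one round trip going stale before the next one, while a single evaluation stays consistent and returns only a string.

```javascript
// Toy model only: this is NOT EO.WebBrowser's implementation.
// A fake "engine" hands out handles to page objects; a navigation
// invalidates every handle issued before it.
class ToyEngine {
  constructor() {
    this.generation = 0;
    this.page = { document: { title: "Page A" } };
  }

  // Simulates the page continuing to load between round trips.
  navigate(title) {
    this.generation += 1;
    this.page = { document: { title } };
  }

  // Round trip 1: return a handle to the current window object.
  getWindowHandle() {
    return { generation: this.generation };
  }

  // Round trip 2: resolve a property through a previously issued handle.
  // A stale handle yields null, mirroring the failure in step 3.
  getDocumentTitle(handle) {
    if (handle.generation !== this.generation) return null;
    return this.page.document.title;
  }

  // Single round trip: everything is evaluated inside the engine at once,
  // and only a primitive string crosses the boundary.
  evalTitle() {
    return this.page.document.title;
  }
}

const engine = new ToyEngine();
const handle = engine.getWindowHandle();          // round trip 1
engine.navigate("Page B");                        // page changes in between
const chained = engine.getDocumentTitle(handle);  // round trip 2, stale handle
const single = engine.evalTitle();                // one trip, consistent
console.log(chained, single);
```

In this model the chained call comes back as null while the single evaluation returns the current title, which is the same pattern as GetDOMWindow().document versus a single EvalScript call.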
Please let us know if you still see problems after you made the changes.
Thanks!
|
|
Rank: Advanced Member Groups: Member
Joined: 1/12/2015 Posts: 81
|
It worked. Thanks. Could I also replace this code:
EO.WebBrowser.DOM.Element productData = wb.GetDOMWindow().document.getElementById("priceBox");
with this code:
EO.WebBrowser.DOM.Element productData = (EO.WebBrowser.DOM.Element)wb.EvalScript("document.getElementById('priceBox')");
Would it be possible to cast the object returned by EvalScript to the EO.WebBrowser.DOM.Element type?
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,218
|
No. That is not recommended. The idea is to avoid returning any reference to a native JavaScript object, because by the time you get around to using it, it may no longer be valid.
GetDOMWindow is not recommended because it returns a DOM.Window object, which is a reference to a native JavaScript object that lives in the browser engine, so this is bad.
Your new code returns a DOM.Element object, which is still a reference to yet another object that lives in the browser engine. This is still bad even though you get this object through EvalScript.
The key is not whether you use EvalScript or not. The key is whether you use a reference to a JavaScript object or not. When you use EvalScript("document.body.innerHTML"), it returns a simple string object that no longer references anything on the JavaScript side. That is why it's good, not because it uses EvalScript.
|
|
Rank: Advanced Member Groups: Member
Joined: 1/12/2015 Posts: 81
|
That makes sense now. So you're saying that it's better to use the EvalScript("document") instead of GetDOMWindow().document because the latter makes 2 trips? And that EvalScript("window") is the same as GetDOMWindow()? Now currently I'm doing this:
Code: C#
EO.WebBrowser.DOM.Element productData = wb.GetDOMWindow().document.getElementById("priceBox");
if (productData["offsetParent"] == null)
throw new Exception(@"Product Data Not Found!");
How could I replace that to reduce the number of trips needed?
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,218
|
No. You are still missing the key point. Both EvalScript("document") and GetDOMWindow().document are bad because they both return a "document" object. The key is that in both versions you are going to use a JavaScript object (the document object) in .NET code, and you won't see the problem until you actually use the returned object.
To better demonstrate the problem, let's say you wish to use the document object to get the document's title. There are two ways to do it (note that the code below won't actually compile; it is for conceptual demonstration only):
Method 1:
Code: C#
//Get the document object through EvalScript first,
//then use document.title to get the document title
var document = EvalScript("document");
string title = document.title;
Method 2:
Code: C#
//Get the document object through GetDOMWindow().document
//then use document.title to get the document title
var document = GetDOMWindow().document;
string title = document.title;
Both are bad because with both methods, the second line ("document.title") uses the document object returned from a previous round trip. So you have violated the rule against using a JavaScript object in .NET code. Method 2 is worse because it is in fact three lines of code:
Code: C#
//This is the "expanded" version of method 2
var window = GetDOMWindow();
var document = window.document;
string title = document.title;
In this expanded version, both line 2 and line 3 use objects returned from previous round trips (the window object and the document object), so it violates the rule twice and has twice the chance to fail.
In your case, you can replace the whole thing with a single JavaScript code block similar to a JavaScript function:
Code: C#
EvalScript(@"
{
var result = .....; //put whatever JavaScript code here to return your data
result;
};");
Note that the JavaScript code block you pass to EvalScript is very similar to the body of a JavaScript function. The difference is that on the last line you simply write "result;" instead of "return result;".
Now your goal is to figure out the right JavaScript code that gives you the correct "result" value. Note that "result" must be a primitive value such as a number or a string (or an array of such primitive values). It cannot be a reference to another object such as "window" or "document". Otherwise you will be back exactly where you started.
If you need to return multiple values, you can return an array or use JSON. See here for more details: https://www.essentialobjects.com/doc/webbrowser/advanced/json.aspx
Once again, the key rule is: do not use JavaScript objects in your .NET code. Only pass primitive values between JavaScript and .NET.
Hope this makes sense to you.
Thanks
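As a rough sketch of the "return JSON" option, the script side could gather everything it needs inside the page and hand back one string. The describeElement helper, its field names and the fakeDocument stand-in below are all hypothetical and exist only so the sketch can run outside a browser; in a real page the script would use the page's own document.

```javascript
// Hypothetical helper; the name and the returned fields are illustrative.
// It reads everything it needs inside the page and returns one JSON string,
// so no DOM object reference ever leaves the script engine.
function describeElement(doc, id) {
  const el = doc.getElementById(id);
  return JSON.stringify({
    found: el !== null,
    visible: el !== null && el.offsetParent !== null,
    text: el !== null ? el.textContent : null,
  });
}

// Stand-in for the browser's document so the sketch runs outside a page.
const fakeDocument = {
  getElementById(id) {
    return id === "priceBox"
      ? { offsetParent: {}, textContent: "$19.99" }
      : null;
  },
};

const json = describeElement(fakeDocument, "priceBox");
console.log(json);
```

Following the pattern described above, the script passed to EvalScript would end with the call expression (for example describeElement(document, 'priceBox');) rather than a return statement, and the .NET side would parse the returned string.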
|
|
Rank: Advanced Member Groups: Member
Joined: 1/12/2015 Posts: 81
|
I think I understood all that before. I was just asking about the efficiency of the code and the equivalence of the two statements, and your extended explanation seems to confirm my understanding. I apologize if this sounds sarcastic; that is not my intent. I really do appreciate your time in looking into my issues and giving detailed explanations. It definitely helped me and I'm certain it will help others as well.
As to the second part of my previous question, can I rewrite the following code?
Original:
Code: C#
EO.WebBrowser.DOM.Element productData = wb.GetDOMWindow().document.getElementById("priceBox");
if (productData["offsetParent"] == null)
throw new Exception(@"Product Data Not Found!");
New:
Code: C#
if ((bool)wb.EvalScript(@"var priceBox = document.getElementById('priceBox'); priceBox == null || priceBox.offsetParent == null"))
throw new Exception(@"Product Data Not Found!");
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,218
|
Glad to hear that. Yes. Your new code is good.
|
|