|
Rank: Advanced Member Groups: Member
Joined: 11/8/2017 Posts: 66
|
Hello, I am trying to use EO.WebBrowser as a headless browser to 'fully resolve' a page, meaning wait until all of the JavaScript in the page has run and then capture the resulting HTML.
Code: C#
//string html = null;
object objHtml = null;
string data = ....contents-of-html-file-with-js-not-yet-executed....
using (ThreadRunner threadRunner = new ThreadRunner())
{
    //Create a WebView through the ThreadRunner
    WebView webView = threadRunner.CreateWebView();
    EO.WebEngine.BrowserOptions options = new EO.WebEngine.BrowserOptions();
    options.EnableWebSecurity = false;
    webView.SetOptions(options);
    threadRunner.Send(() =>
    {
        webView.ConsoleMessage += new ConsoleMessageHandler(webView_ConsoleMessage);
        //webView.LoadUrlAndWait(htmlfilename);
        webView.LoadHtmlAndWait(data); //, workingdir);
        //fully resolve the webpage
        //html = webView.GetHtml();
        objHtml = webView.EvalScript("document.documentElement.outerHTML", true);
    });
    webView.Destroy();
}
....do-something-with-objHtml.ToString()....
...
private List<string> m_Messages = new List<string>();
void webView_ConsoleMessage(object sender, ConsoleMessageEventArgs e)
{
    string message = string.Format("{0} line# {1}:{2}", e.Source, e.LineNumber, e.Message);
    m_Messages.Add(message);
}
...
The webpage to be resolved contains quite a bit of JavaScript, including several calls to requirejs. However, when I inspect the variable (objHtml) assigned from LoadHtmlAndWait and EvalScript, the page is only partially resolved, i.e. only some of the JavaScript has run. I can check this because I have "console.log" statements whose output goes into the m_Messages list (shown in the C# code above), and it shows that only some of the JavaScript has executed.
I know the JavaScript code is OK (i.e. not falling over) because if I set a breakpoint in the console message handler (webView_ConsoleMessage above) and wait a few seconds, everything resolves, i.e. all of the JavaScript is run.
I've tried LoadUrlAndWait and/or GetHtml (see the commented-out code above) and I get the same result.
How do I wait long enough for all of the JavaScript to run?
Kind regards
Phil
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,221
|
Hi,
There is no such thing as "all of the JavaScript in the page has run" in a web page. Consequently, LoadHtmlAndWait will not wait for that.
While the majority of JavaScript is executed synchronously, there can be plenty of JavaScript code that runs asynchronously in a page. A simple example would be a page that constantly calls setTimeout to set up a timer to display a clock. In such a page the JavaScript code never ends. More complicated scenarios are AJAX requests, JavaScript promises, etc. Those are asynchronous by design and there is no generic way of waiting for them to finish. So you will have to devise some other mechanism to achieve your goal.
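To illustrate the point about asynchronous code, here is a minimal sketch (plain JavaScript, runnable under Node, not tied to EO.WebBrowser). It records the order in which work runs and shows that a snapshot taken when the synchronous script ends misses the timer and promise callbacks. The names `order`, `snapshotAtScriptEnd` and `finalOrder` are made up for the example, and the 50 ms delay is an arbitrary assumption.

```javascript
// Records the order in which each piece of work actually runs.
const order = [];

// Asynchronous work: queued now, executed later by the event loop.
setTimeout(() => order.push("timer callback"), 0);
Promise.resolve().then(() => order.push("promise callback"));

// Synchronous work: runs to completion before any queued callback.
order.push("synchronous script end");

// Snapshot taken as soon as the synchronous script has finished:
// the queued callbacks have not run yet, so they are missing here.
const snapshotAtScriptEnd = order.slice();
console.log(snapshotAtScriptEnd);

// A later check sees the callbacks that ran after the snapshot.
// The 50 ms delay is an arbitrary value chosen for this demo.
const finalOrder = new Promise((resolve) =>
  setTimeout(() => resolve(order.slice()), 50)
);
finalOrder.then((o) => console.log(o));
```

A fixed delay works here only because the demo knows how long its own callbacks take; a real page gives no such guarantee, which is why a generic "wait until all JavaScript has run" cannot exist.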
Thanks!
|
|
Rank: Advanced Member Groups: Member
Joined: 11/8/2017 Posts: 66
|
Thanks for the reply.
What I was after was a means to delay the resolution process, or at least for EO-WebBrowser to reach/execute the webpage's document.ready event by default. We have exactly the same codebase (i.e. reports) that is passed into EO-PDF, and it accommodates this.
Further (as you know), EO-PDF has functionality that not only allows setting a conversion delay but also, much more usefully, allows a manual trigger (coupled with eopdf.convert).
I think this would be all that would be required here. So, to confirm: although the above functionality is available in EO-PDF, it isn't available within EO-WebBrowser?
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,221
|
Hi,
LoadHtmlAndWait does wait for the window.load event, which is a standard DOM event. However, document.ready is a jQuery event, and jQuery is just one of the many third-party libraries out there. It is not practical for a general-purpose browser engine to specifically support a particular third-party JavaScript library. jQuery's document.ready event is triggered by window.load; however, depending on the jQuery version, it may delay the event. In that case the LoadHtmlAndWait method will return before document.ready is triggered.
All the EO.Pdf features you mentioned are built on top of the browser engine, not inside the browser engine. For example, a delayed conversion is basically an extra wait after LoadHtmlAndWait returns, and the manual trigger is a JavaScript extension callback. You can implement both in your code the same way. For example, if you are particularly interested in jQuery's document.ready event, you can inject JavaScript code into the page to handle that event, and then call your own "trigger" code when that event is fired. Inside your trigger code, you can call back into the .NET code if needed:
https://www.essentialobjects.com/doc/webbrowser/advanced/jsext.aspx
Hope this helps.
Thanks!
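The page-side half of a "manual trigger with a fallback delay" could look roughly like the sketch below. It is plain JavaScript that runs under Node and makes no EO.WebBrowser calls. Everything named here is hypothetical: `page` stands in for `window`, `notifyHost` stands in for whatever function the .NET side exposes through the JavaScript extension mechanism linked above, and `createTrigger`, `renderComplete` and the 2000 ms fallback are assumptions made for illustration.

```javascript
// Stand-in for the page's global object so the sketch runs under Node;
// in a real page this would be `window`.
const page = {};

// Hypothetical host callback. In a real setup this would be a function
// exposed to the page by the .NET side; here it only records its calls.
const hostCalls = [];
page.notifyHost = (reason) => hostCalls.push(reason);

// Creates a trigger that fires at most once: either when the page calls
// it, or when the fallback timeout elapses, whichever comes first.
function createTrigger(onDone, timeoutMs) {
  let fired = false;
  const fire = (reason) => {
    if (fired) return; // ignore repeated triggers
    fired = true;
    clearTimeout(timer); // cancel the fallback once we have fired
    onDone(reason);
  };
  const timer = setTimeout(() => fire("timeout"), timeoutMs);
  return () => fire("manual");
}

// Page-side wiring. The 2000 ms fallback is an arbitrary assumption.
page.renderComplete = createTrigger(page.notifyHost, 2000);

// Simulated asynchronous page work (for example, modules loading).
const pageWork = new Promise((resolve) => setTimeout(resolve, 20)).then(() => {
  page.renderComplete(); // manual trigger once the work is done
  page.renderComplete(); // a second call is ignored
  return hostCalls.slice();
});
pageWork.then((calls) => console.log(calls));
```

The fallback timeout plays the role of the delay, and the explicit `renderComplete()` call plays the role of the manual trigger; on the host side, the HTML would be captured only after the callback has been received.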
|
|