EO.WebBrowser as headless browser - WebView.LoadHtmlAndWait/EvalScript doesn't wait long enough
Phil
Posted: Monday, December 3, 2018 7:11:05 AM
Rank: Advanced Member
Groups: Member

Joined: 11/8/2017
Posts: 66
Hello, I am trying to use EO.WebBrowser as a headless browser to 'fully resolve' a page, where 'fully resolve' means waiting until all of the JavaScript in the page has run and then capturing the resulting HTML.

Code: C#
//string html = null;
object objHtml = null;
string data = ....contents-of-html-file-with-js-not-yet-executed....
using (ThreadRunner threadRunner = new ThreadRunner())
{
    //Create a WebView through the ThreadRunner
    WebView webView = threadRunner.CreateWebView();

    EO.WebEngine.BrowserOptions options = new EO.WebEngine.BrowserOptions();
    options.EnableWebSecurity = false;
    webView.SetOptions(options);
    threadRunner.Send(() =>
    {
        webView.ConsoleMessage += new ConsoleMessageHandler(webView_ConsoleMessage);

        //webView.LoadUrlAndWait(htmlfilename);
        webView.LoadHtmlAndWait(data); //, workingdir);

        //fully resolve the webpage
        //html = webView.GetHtml();
        objHtml = webView.EvalScript("document.documentElement.outerHTML", true);
    });

    webView.Destroy();
}
....do-something-with-objHtml.ToString()....
...
private List<string> m_Messages = new List<string>();
void webView_ConsoleMessage(object sender, ConsoleMessageEventArgs e)
{
    string message = string.Format("{0} line# {1}:{2}", e.Source, e.LineNumber, e.Message);
    m_Messages.Add(message);
}
...


The page to be resolved contains quite a bit of JavaScript, including several calls to requirejs. However, when I inspect the variable (objHtml) assigned from LoadHtmlAndWait and EvalScript, the page is only partially resolved, i.e. only some of the JavaScript has run. I can check this because the page has "console.log" statements whose output goes into the m_Messages list (shown in the C# code above), and that list shows that only some of the JavaScript has executed.

I know the JavaScript code is OK (i.e. not falling over) because if I set a breakpoint in the console message handler (webView_ConsoleMessage above) and wait a few seconds, everything resolves, i.e. all of the JavaScript runs.

I've tried using LoadUrlAndWait and/or GetHtml (see the commented-out code above) and I get the same result.

How do I wait long enough for all of the JavaScript to run?

Kind regards
Phil
eo_support
Posted: Monday, December 3, 2018 3:17:49 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,221
Hi,

There is no such thing as "all of the JavaScript in the page has run" in a web page. Consequently, LoadHtmlAndWait will not wait for that.

While the majority of JavaScript is executed synchronously, plenty of JavaScript code in a page can run asynchronously. A simple example would be a page that repeatedly calls setTimeout to set up a timer that displays a clock. In such a page the JavaScript code never ends. More complicated scenarios are AJAX requests, JavaScript promises, etc. Those are asynchronous by design and there is no generic way of waiting for them to finish. So you will have to devise some other mechanism to achieve your goal.
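
To illustrate the point about asynchronous code, here is a minimal sketch (plain JavaScript, not tied to EO.WebBrowser; the function name simulatePageScript is made up for this example). It records what has run by the time the script's synchronous part returns, which is roughly the view a caller gets if it reads the page as soon as loading is reported complete:

```javascript
// Hypothetical stand-in for a page script that mixes sync and async work.
function simulatePageScript(log) {
  log.push("sync start");
  setTimeout(() => log.push("timer"), 0);            // runs later, as a timer task
  Promise.resolve().then(() => log.push("promise")); // runs later, as a microtask
  log.push("sync end");
}

const log = [];
simulatePageScript(log);

// Snapshot taken immediately after the script returns:
// only the synchronous entries are present.
const snapshotAtReturn = log.slice();
console.log(snapshotAtReturn); // [ 'sync start', 'sync end' ]

// A little later the asynchronous entries have been appended as well.
setTimeout(() => console.log(log), 20);
```

The snapshot misses the promise and timer entries even though nothing in the script failed; they simply had not been scheduled to run yet.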

Thanks!
Phil
Posted: Tuesday, December 4, 2018 12:14:51 AM
Rank: Advanced Member
Groups: Member

Joined: 11/8/2017
Posts: 66
Thanks for the reply.

What I was after was a means to delay the resolution process, or at least for EO.WebBrowser (by default) to reach/execute the webpage's document.ready event. We pass exactly the same codebase (i.e. reports) into EO.Pdf and it accommodates this.

And further (as you know), EO.Pdf has functionality that not only allows setting a conversion delay but also, much more usefully, allows a manual trigger (coupled with eopdf.convert).

I think this would be all that is required here. So, to confirm: although the above functionality is available in EO.Pdf, it isn't available in EO.WebBrowser?
eo_support
Posted: Tuesday, December 4, 2018 5:42:15 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,221
Hi,

LoadHtmlAndWait does wait for the window.load event, which is a standard DOM event. However, document.ready is a jQuery event, and jQuery is just one of the many third-party libraries out there. It is not practical for a general-purpose browser engine to specifically support a particular third-party JavaScript library. jQuery's document.ready event is triggered by window.load; however, depending on the jQuery version, it may delay the event. In that case the LoadHtmlAndWait method will return before document.ready is triggered.

All the EO.Pdf features you mentioned are built on top of the browser engine, not inside it. For example, a conversion delay is basically an extra wait after LoadHtmlAndWait returns, and the manual trigger is a JavaScript extension callback. You can implement both in your code the same way. For example, if you are particularly interested in jQuery's document.ready event, you can inject JavaScript code into the page to handle that event, and then call your own "trigger" code when that event fires. Inside your trigger code, you can call back into the .NET code if needed:

https://www.essentialobjects.com/doc/webbrowser/advanced/jsext.aspx
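
As a rough sketch of the injected "trigger" idea, assuming nothing beyond standard browser APIs and jQuery's jQuery(fn) ready shorthand: installTrigger, notifyHost and maxWaitMs are hypothetical names invented for this example, not EO.WebBrowser APIs. In a real page notifyHost would be whatever function calls back into .NET through a JavaScript extension (that wiring is described at the link above and is not shown here). The timeout plays the role of the "extra wait" so the host is never left waiting forever:

```javascript
// Hypothetical helper (not an EO.WebBrowser API). Calls notifyHost exactly once:
// when the page signals readiness, or after maxWaitMs, whichever comes first.
function installTrigger(win, notifyHost, maxWaitMs) {
  let fired = false;
  const timer = win.setTimeout(() => fire("timeout"), maxWaitMs);

  function fire(reason) {
    if (fired) return;        // ignore any later signals
    fired = true;
    win.clearTimeout(timer);
    notifyHost(reason);
  }

  if (typeof win.jQuery === "function") {
    // jQuery(fn) registers fn as a document-ready handler.
    win.jQuery(() => fire("jquery-ready"));
  } else if (win.document.readyState === "complete") {
    fire("load");             // load already happened before injection
  } else {
    win.addEventListener("load", () => fire("load"));
  }
}

// Stand-in for a browser window so the sketch runs outside a page.
const readyHandlers = [];
const fakeWindow = {
  jQuery: (fn) => readyHandlers.push(fn),
  document: { readyState: "loading" },
  setTimeout: (fn, ms) => setTimeout(fn, ms),
  clearTimeout: (id) => clearTimeout(id),
  addEventListener: () => {},
};

const reasons = [];
installTrigger(fakeWindow, (reason) => reasons.push(reason), 5000);
readyHandlers.forEach((fn) => fn()); // simulate jQuery firing document.ready
console.log(reasons);                // [ 'jquery-ready' ]
```

Inside a real page the call would be installTrigger(window, ...). If the page's own code knows better than jQuery when it has finished (for example at the end of its last requirejs callback), it could signal the host from there instead, using the same once-only pattern.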

Hope this helps.

Thanks!

