
Crash while headless scraping
Enginess
Posted: Thursday, May 7, 2020 8:04:51 AM
Rank: Newbie
Groups: Member

Joined: 4/29/2020
Posts: 1
Hi guys, we are doing headless scraping for a customer to help them sync data. We have an ASP.NET application that kicks off a task to run asynchronously. About 95% of the time it crashes.

Below I have included the crash details (EVENT LOG) and a simplified code snippet (C# CLASS CODE).

In the code I have tried to be very verbose and clean up after each instantiation. Even though it was slower, I was hoping to keep it from crashing. I even call GC.Collect() to garbage collect after each scrape.

Any help would be appreciated!

** Updated **
I am running your latest version.

I have now also tried it as a console application. It crashes after a few scrapes with the following:
Problem signature:
Problem Event Name: CLR20r3
Problem Signature 01: DealerScraper.exe
Problem Signature 02: 1.0.0.0
Problem Signature 03: e6885de1
Problem Signature 04: EO.WebBrowser
Problem Signature 05: 20.1.31.0
Problem Signature 06: 5eaaf582
Problem Signature 07: 3fd
Problem Signature 08: d0
Problem Signature 09: System.NullReferenceException
OS Version: 6.3.9600.2.0.0.400.8
Locale ID: 1033
Additional Information 1: b92e
Additional Information 2: b92e7cff81d09fca3b6155da70e151bb
Additional Information 3: b92e
Additional Information 4: b92e7cff81d09fca3b6155da70e151bb

Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=280262

If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\en-US\erofflps.txt


************************* EVENT LOGS **********************************************************

An unhandled exception occurred and the process was terminated.
Application ID: /LM/W3SVC/35/ROOT
Process ID: 19304
Exception: System.NullReferenceException
Message: Object reference not set to an instance of an object.
StackTrace:
   at EO.WebBrowser.RequestEventArgs.a(at7 A_0)
   at EO.WebBrowser.RequestEventArgs..ctor(at7 A_0, co A_1, ap9 A_2)
   at EO.WebBrowser.BeforeRequestLoadEventArgs..ctor(at7 A_0, co A_1, ap9 A_2)
   at EO.WebBrowser.WebView.am(co A_0, ap9 A_1)
   at EO.Internal.co.a.d(Object A_0)
   at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(Object state)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

Application: w3wp.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.NullReferenceException
   at EO.WebBrowser.RequestEventArgs.a(EO.Internal.at7)
   at EO.WebBrowser.RequestEventArgs..ctor(EO.Internal.at7, EO.Internal.co, EO.Internal.ap9)
   at EO.WebBrowser.BeforeRequestLoadEventArgs..ctor(EO.Internal.at7, EO.Internal.co, EO.Internal.ap9)
   at EO.WebBrowser.WebView.am(EO.Internal.co, EO.Internal.ap9)
   at EO.Internal.co+a.d(System.Object)
   at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

Faulting application name: w3wp.exe, version: 8.5.9600.16384, time stamp: 0x52157ba0
Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
Exception code: 0xc0000005
Fault offset: 0x0b95c8d2
Faulting process id: 0x4b68
Faulting application start time: 0x01d6240546da997c
Faulting application path: C:\Windows\SysWOW64\inetsrv\w3wp.exe
Faulting module path: unknown
Report Id: 1028bf5d-9056-11ea-80e5-000d3af41499
Faulting package full name:
Faulting package-relative application ID:

********************************* CLASS CODE ***************************************
Code: C#
public class Sample
    {
        private void WebView_CertificateError(object sender, CertificateErrorEventArgs e)
        { e.Continue(); }

        private void ScrapeByList(string serviceAccount, SQL currentSql, List<string> urls)
        {

            EO.WebEngine.Engine engine = EO.WebEngine.Engine.Create("Scraper");
            engine.AllowRestart = true;

            foreach (string url in urls)
            {
                try
                {
                    ScrapeThePage(engine, currentSql, url);
                }
                catch (Exception ex)
                {
                    ExceptionUtility.LogException("Scraper Engine -> scrapeBySiteMaps ", ex, currentSql, true);
                }
            }
            return;
        }

        private bool ScrapeThePage(EO.WebEngine.Engine engine, SQL currentSql, string url)
        {
            bool result = false;
            try
            {
                engine.Start();
                result = RunBrowser(engine, currentSql, url);

            }
            catch (Exception ex)
            {
                ExceptionUtility.LogException(ex, currentSql, true);
            }
            finally
            {
                engine.Stop(true);
                GC.Collect();
            }
            return result;
        }


        private bool RunBrowser(EO.WebEngine.Engine engine, SQL currentSql, string url)
        {

            bool result = false;
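            var host = new Uri(url).Host;
            // Strip the last label (e.g. ".com"). A host without a dot, such as
            // "localhost", is used as-is; Substring(0, -1) would throw otherwise.
            int lastDot = host.LastIndexOf('.');
            if (lastDot > 0)
            {
                host = host.Substring(0, lastDot);
            }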
            var runner = new ThreadRunner(host, engine);
            try
            {
                result = RunWebView(runner, currentSql, url);
            }
            catch (Exception ex)
            {
                ExceptionUtility.LogException("Scraper Engine -> scrapeBySiteMaps ", ex, currentSql, true);
            }
            finally
            {
                runner.Stop();
                runner.Dispose();
            }

            return result;
        }

        private bool RunWebView(ThreadRunner runner, SQL currentSql, string url)
        {
            var webView = runner.CreateWebView();
            webView.CertificateError += WebView_CertificateError;
            bool result = false;
            try
            {
                webView.LoadUrlAndWait(url);
                //your custom code here....
                result = true;

            }
            catch (Exception ex)
            {
                result = false;
                ExceptionUtility.LogException("Scraper Engine -> scrapeBySiteMaps ", ex, currentSql, true);
            }
            finally
            {
                webView.Close(true);
                webView.Dispose();
            }

            return result;
        }

    }
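
Neither stack trace above contains a frame from the Sample class. The exception is raised inside EO.WebBrowser on a thread-pool thread (note QueueUserWorkItemCallback near the bottom), which would explain why the try/catch blocks in this code never see it. The sketch below illustrates that general .NET behavior without EO.WebBrowser. The class name and the simulated exception are hypothetical stand-ins. The program is expected to end with an unhandled exception, since that is the behavior being shown.

```csharp
using System;
using System.Threading;

// Hypothetical demo class, not part of EO.WebBrowser or the code above.
public static class ThreadPoolCrashSketch
{
    public static void Main()
    {
        // Last-chance hook: it can log the exception, but it cannot
        // stop the runtime from terminating the process afterwards.
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
        {
            Console.Error.WriteLine("Unhandled on a pool thread: " + e.ExceptionObject);
        };

        try
        {
            // Stand-in for a library callback that runs on the thread pool.
            ThreadPool.QueueUserWorkItem(state =>
            {
                throw new NullReferenceException("simulated library failure");
            });
        }
        catch (Exception ex)
        {
            // Not reached: the exception is thrown later, on another thread.
            Console.Error.WriteLine("Caught: " + ex.Message);
        }

        // Give the work item time to run before Main returns.
        Thread.Sleep(1000);
    }
}
```

If this reading of the traces is right, adding more try/catch blocks around LoadUrlAndWait would not help. The handler is still useful for recording the exception before the process exits.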
eo_support
Posted: Thursday, May 7, 2020 1:25:45 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,217
Hi,

It is hard for us to pinpoint exactly what went wrong just from the stack trace. Can you isolate the problem into a small test app and send it to us? See here for more details:

https://www.essentialobjects.com/forum/test_project.aspx

As soon as we have that, we will run it here under a debugger and see what we can find.

Thanks!
Omri Suissa
Posted: Sunday, August 16, 2020 6:29:59 AM
Rank: Member
Groups: Member

Joined: 2/18/2020
Posts: 24
We have the same problem.
It is hard to replicate since it happens "from time to time".

Error:
NullReferenceException: Object reference not set to an instance of an object.
StackTrace: at EO.WebBrowser.RequestEventArgs.a(at7 A_0)
at EO.WebBrowser.BeforeRequestLoadEventArgs..ctor(at7 A_0, co A_1, ap9 A_2)
at EO.WebBrowser.WebView.am(co A_0, ap9 A_1)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()

Code:
var threadRunner = new EO_WebBrowser.ThreadRunner();
var webView = threadRunner.CreateWebView();
webView.Resize(1920, 1000);

threadRunner.Send(() =>
{
    try
    {
        var navigation = webView.LoadUrl(_url);
        navigation.WaitOne();

        if (navigation.HttpStatusCode == 200)
        {
            Thread.Sleep(3000); // allow JS to load

            var html = webView.EvalScript("document.documentElement.outerHTML", false).ToString();

            if (IsNotEmptyHTML(html))
            {
                // write to file...
            }
        }

        try
        {
            webView.Dispose();
        }
        catch
        {
        }
    }
    catch (Exception ex)
    {
    }
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();
    GC.WaitForPendingFinalizers();
    Environment.Exit(0);
});
Omri Suissa
Posted: Sunday, August 16, 2020 8:23:40 AM
Rank: Member
Groups: Member

Joined: 2/18/2020
Posts: 24
We are not sure, but we think it is related to the fact that the process is running under the NETWORK SERVICE account (and we need it to be this way).
Omri Suissa
Posted: Monday, August 17, 2020 6:30:35 AM
Rank: Member
Groups: Member

Joined: 2/18/2020
Posts: 24
We also suspect this is related to a redirect.
eo_support
Posted: Monday, August 17, 2020 11:53:42 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,217
Hi Omri,

Please update to the latest build. We have fixed a scenario that can cause this issue in build 2020.1.45.

Thanks!
Omri Suissa
Posted: Monday, August 17, 2020 1:55:35 PM
Rank: Member
Groups: Member

Joined: 2/18/2020
Posts: 24
Thank you.

After the upgrade (2020.2.34.0) we now see "Child process exited unexpectedly" on "threadRunner.CreateWebView();" (the second line in the code) from time to time. Is it related? Do you have a solution for it?

mpnx: Child process exited unexpectedly.
at EO.Base.ThreadRunnerBase.ogth.uwsd(Int32 kns, Boolean& knt)
at EO.Base.ThreadRunnerBase.Send(ActionWithResult action, Int32 timeoutInMS, Boolean& done)
at EO.Base.ThreadRunnerBase.Send(ActionWithResult action, Int32 timeoutInMS)
at EO.Base.ThreadRunnerBase.Send(ActionWithResult action)
at EO.WebBrowser.ThreadRunner.Send(WebViewCallback callback, WebView webView, Object args)
at EO.WebBrowser.ThreadRunner.lvux(Int32 zu, Int32 zv, Boolean zw, BrowserOptions zx)
at EO.WebBrowser.ThreadRunner.CreateWebView(BrowserOptions options)
at EO.WebBrowser.ThreadRunner.CreateWebView()
eo_support
Posted: Monday, August 17, 2020 2:19:05 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,217
Hi,

This should be a different issue. It is almost always caused by a crash inside the browser engine. We updated the Chromium engine from V77 to V81 in build 20.1.88. So when you updated from your previous version to 20.2.34, you started running a different version of the Chromium engine, and the crash most likely has something to do with this engine change.

There are several options:

1. The most effective way is to isolate the problem into a test project and send it to us. This way we will be able to debug into the source code here, and most of the time we can get to the bottom of the crash. See here for more information on how to send a test project to us:

https://www.essentialobjects.com/forum/test_project.aspx

2. If producing a test project is not possible, you can try to collect the crash log and we will look into it to see what we can find. The crash log shows us the location of the crash. That sometimes helps in very obvious cases, but very often, even if we know where it crashes, we still won't be able to find out how it got there. So it may or may not work for you. See here for more information on how to collect the crash log:

https://www.essentialobjects.com/doc/common/crash_report.aspx

3. Since the issue is most likely related to the new Chromium engine, you can try build 20.1.45 (which still uses the older engine) and see if it works better for you. You can get this version from nuget, or if you prefer the full installer, we can provide the download link to you.

Thanks!
Omri Suissa
Posted: Wednesday, August 19, 2020 10:33:16 AM
Rank: Member
Groups: Member

Joined: 2/18/2020
Posts: 24
We can collect the crash report. Can you please share the encoding of the report (UTF-8, Unicode, etc.)?
eo_support
Posted: Wednesday, August 19, 2020 11:04:15 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,217
The crash report is encrypted, so you will not be able to decode it. You can send it to us in zip format and we will look into it.
Omri Suissa
Posted: Wednesday, August 19, 2020 11:05:49 AM
Rank: Member
Groups: Member

Joined: 2/18/2020
Posts: 24
We need to save it to a log file. However, the log takes a string and not a byte array, so we need to convert the byte array to a string.
Can you suggest the encoding for the conversion?
eo_support
Posted: Wednesday, August 19, 2020 11:09:07 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,217
You can't modify/convert our crash report data in any way. The only thing you can do is to send it to us as is. Any changes will break the encryption signature and make it useless.
Omri Suissa
Posted: Wednesday, August 19, 2020 11:20:32 AM
Rank: Member
Groups: Member

Joined: 2/18/2020
Posts: 24
We can't save it as is due to some technical limitations. If we convert it to UTF-8 and back to a byte array it should work (if your encryption is using UTF-8).
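
A note on the round trip proposed here: decoding arbitrary bytes as UTF-8 and encoding them again is lossy in general, because byte sequences that are not valid UTF-8 are replaced during decoding. Base64 is a text encoding that does round-trip every byte. The sketch below compares the two on a made-up byte array standing in for a crash report. CrashReportText and its methods are hypothetical names, and nothing here uses EO's actual crash report API. Whether a report restored this way is acceptable is for the vendor to say; the thread only states that the bytes must arrive unmodified.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of EO.WebBrowser.
public static class CrashReportText
{
    // Encode arbitrary bytes as text that a string-only log can store.
    public static string ToLogString(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    // Recover the exact original bytes from the logged text.
    public static byte[] FromLogString(string text)
    {
        return Convert.FromBase64String(text);
    }

    public static void Main()
    {
        // Stand-in for an encrypted report: these bytes are not valid UTF-8.
        byte[] report = { 0x00, 0xFF, 0xC3, 0x28, 0x80, 0x7F };

        byte[] viaBase64 = FromLogString(ToLogString(report));
        byte[] viaUtf8 = Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(report));

        Console.WriteLine(report.SequenceEqual(viaBase64)); // True
        Console.WriteLine(report.SequenceEqual(viaUtf8));   // False
    }
}
```

The file sent to support would then be rebuilt from the decoded bytes, for example with File.WriteAllBytes, rather than from the logged string itself.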
Omri Suissa
Posted: Wednesday, August 19, 2020 12:13:16 PM
Rank: Member
Groups: Member

Joined: 2/18/2020
Posts: 24
Can we produce a crash log to test it?
eo_support
Posted: Wednesday, August 19, 2020 1:10:18 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,217
If you cannot save the crash log to a file, you can enable the automatic crash log so that it will be sent to our server automatically. We do not accept crash log files that have been modified/converted in any way.
svma
Posted: Monday, October 19, 2020 8:11:33 AM
Rank: Member
Groups: Member

Joined: 10/19/2020
Posts: 12
Hello @eo_support,
Can you suggest a work-around for avoiding this issue in previous versions (namely, 20.0.33.0) ?
We cannot update to newer version on the fly.
eo_support
Posted: Monday, October 19, 2020 1:10:35 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,217
svma wrote:
Hello @eo_support,
Can you suggest a work-around for avoiding this issue in previous versions (namely, 20.0.33.0) ?
We cannot update to newer version on the fly.


Hi,

No. We had to change some code on our end in build .45, so the only way to apply this change is to switch to a newer version of the DLL.

Thanks!

