Hi guys, we are doing headless scraping for a customer to help them sync data. We have an ASP.NET application that kicks off a task to run asynchronously. About 95% of the time it crashes.
Below I include the crash details (EVENT LOGS) and a simplified code snippet (CLASS CODE).
In the code I have tried to be very verbose and to clean up after each instantiation. Even though that is slower, I was hoping it would keep the process from crashing. I even call GC.Collect() after each scrape.
Any help would be appreciated!
** Updated **
I am running your latest version (the crash signature below reports EO.WebBrowser 20.1.31.0).
I have now also tried it as a console application. It crashes after a few scrapes with the following:
Problem signature:
Problem Event Name: CLR20r3
Problem Signature 01: DealerScraper.exe
Problem Signature 02: 1.0.0.0
Problem Signature 03: e6885de1
Problem Signature 04: EO.WebBrowser
Problem Signature 05: 20.1.31.0
Problem Signature 06: 5eaaf582
Problem Signature 07: 3fd
Problem Signature 08: d0
Problem Signature 09: System.NullReferenceException
OS Version: 6.3.9600.2.0.0.400.8
Locale ID: 1033
Additional Information 1: b92e
Additional Information 2: b92e7cff81d09fca3b6155da70e151bb
Additional Information 3: b92e
Additional Information 4: b92e7cff81d09fca3b6155da70e151bb
Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=280262
If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\en-US\erofflps.txt
************************* EVENT LOGS **********************************************************
An unhandled exception occurred and the process was terminated.
Application ID: /LM/W3SVC/35/ROOT
Process ID: 19304
Exception: System.NullReferenceException
Message: Object reference not set to an instance of an object.
StackTrace:
   at EO.WebBrowser.RequestEventArgs.a(at7 A_0)
   at EO.WebBrowser.RequestEventArgs..ctor(at7 A_0, co A_1, ap9 A_2)
   at EO.WebBrowser.BeforeRequestLoadEventArgs..ctor(at7 A_0, co A_1, ap9 A_2)
   at EO.WebBrowser.WebView.am(co A_0, ap9 A_1)
   at EO.Internal.co.a.d(Object A_0)
   at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(Object state)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

Application: w3wp.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.NullReferenceException
   at EO.WebBrowser.RequestEventArgs.a(EO.Internal.at7)
   at EO.WebBrowser.RequestEventArgs..ctor(EO.Internal.at7, EO.Internal.co, EO.Internal.ap9)
   at EO.WebBrowser.BeforeRequestLoadEventArgs..ctor(EO.Internal.at7, EO.Internal.co, EO.Internal.ap9)
   at EO.WebBrowser.WebView.am(EO.Internal.co, EO.Internal.ap9)
   at EO.Internal.co+a.d(System.Object)
   at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()
Faulting application name: w3wp.exe, version: 8.5.9600.16384, time stamp: 0x52157ba0
Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
Exception code: 0xc0000005
Fault offset: 0x0b95c8d2
Faulting process id: 0x4b68
Faulting application start time: 0x01d6240546da997c
Faulting application path: C:\Windows\SysWOW64\inetsrv\w3wp.exe
Faulting module path: unknown
Report Id: 1028bf5d-9056-11ea-80e5-000d3af41499
Faulting package full name:
Faulting package-relative application ID:
********************************* CLASS CODE ***************************************
Code: C#
using System;
using System.Collections.Generic;
using EO.WebBrowser;

// SQL and ExceptionUtility are our own helper types (not shown here).
public class Sample
{
    // Ignore certificate errors so pages with bad certificates still load.
    private void WebView_CertificateError(object sender, CertificateErrorEventArgs e)
    {
        e.Continue();
    }

    // Scrapes every URL in the list, reusing one named engine.
    private void ScrapeByList(string serviceAccount, SQL currentSql, List<string> urls)
    {
        EO.WebEngine.Engine engine = EO.WebEngine.Engine.Create("Scraper");
        engine.AllowRestart = true;
        foreach (string url in urls)
        {
            try
            {
                ScrapeThePage(engine, currentSql, url);
            }
            catch (Exception ex)
            {
                ExceptionUtility.LogException("Scraper Engine -> scrapeBySiteMaps ", ex, currentSql, true);
            }
        }
    }

    // Starts the engine, scrapes one page, then stops the engine again.
    private bool ScrapeThePage(EO.WebEngine.Engine engine, SQL currentSql, string url)
    {
        bool result = false;
        try
        {
            engine.Start();
            result = RunBrowser(engine, currentSql, url);
        }
        catch (Exception ex)
        {
            ExceptionUtility.LogException(ex, currentSql, true);
        }
        finally
        {
            engine.Stop(true);
            GC.Collect();
        }
        return result;
    }

    // Creates a ThreadRunner for the page and disposes it afterwards.
    private bool RunBrowser(EO.WebEngine.Engine engine, SQL currentSql, string url)
    {
        bool result = false;

        // Name the runner after the host without its last label (e.g. ".com").
        // Hosts without a dot (e.g. "localhost") are used unchanged, because
        // Substring(0, -1) would throw an ArgumentOutOfRangeException.
        var host = new Uri(url).Host;
        int lastDot = host.LastIndexOf('.');
        if (lastDot > 0)
        {
            host = host.Substring(0, lastDot);
        }

        var runner = new ThreadRunner(host, engine);
        try
        {
            result = RunWebView(runner, currentSql, url);
        }
        catch (Exception ex)
        {
            ExceptionUtility.LogException("Scraper Engine -> scrapeBySiteMaps ", ex, currentSql, true);
        }
        finally
        {
            runner.Stop();
            runner.Dispose();
        }
        return result;
    }

    // Loads the URL in a new WebView, then closes and disposes the WebView.
    private bool RunWebView(ThreadRunner runner, SQL currentSql, string url)
    {
        var webView = runner.CreateWebView();
        webView.CertificateError += WebView_CertificateError;
        bool result = false;
        try
        {
            webView.LoadUrlAndWait(url);
            // your custom code here....
            result = true;
        }
        catch (Exception ex)
        {
            result = false;
            ExceptionUtility.LogException("Scraper Engine -> scrapeBySiteMaps ", ex, currentSql, true);
        }
        finally
        {
            webView.Close(true);
            webView.Dispose();
        }
        return result;
    }
}
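
One observation from the stack traces above: the NullReferenceException is thrown on a thread pool thread inside EO.WebBrowser (QueueUserWorkItemCallback -> EO.Internal -> WebView.am), not on the thread that runs my class, so none of the try/catch blocks in the code can catch it, and the unhandled exception takes the whole process down. Below is a minimal sketch of how the console version could at least log the full exception before the process exits. CrashLogger and Describe are hypothetical names of my own and not part of EO.WebBrowser; only the standard AppDomain.UnhandledException event is used. This only records the crash, it does not prevent it.

```csharp
using System;

// Hypothetical helper (not part of EO.WebBrowser): logs exceptions that escape
// on threads we do not control, such as thread pool threads.
public static class CrashLogger
{
    // Builds the log text for an unhandled exception object.
    public static string Describe(object exceptionObject, bool isTerminating)
    {
        var ex = exceptionObject as Exception;
        string detail = ex != null ? ex.ToString() : Convert.ToString(exceptionObject);
        return "Unhandled exception (terminating=" + isTerminating + "): " + detail;
    }

    // Call once at startup, before any scraping begins.
    // The handler runs just before the runtime terminates the process;
    // it cannot stop the termination.
    public static void Register()
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
            Console.Error.WriteLine(Describe(e.ExceptionObject, e.IsTerminating));
    }
}
```

In the console application, CrashLogger.Register() would be the first call in Main, with Console.Error swapped for whatever logging is already in place.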