AccessViolationError when generating PDFs with multiple threads
Simon Scheurer
Posted: Tuesday, May 14, 2013 11:03:20 AM
Rank: Advanced Member
Groups: Member

Joined: 5/14/2013
Posts: 45
Ran into the following exception after a while (with 8 worker threads, may be unrelated):
Code: C#
EO.Pdf.HtmlToPdfException Convertion failed. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG& msg)
   at System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32 dwComponentID, Int32 reason, Int32 pvLoopData)
   at System.Windows.Forms.Application.ThreadContext.RunMessageLoopInner(Int32 reason, ApplicationContext context)
   at System.Windows.Forms.Application.ThreadContext.RunMessageLoop(Int32 reason, ApplicationContext context)
   at System.Windows.Forms.Application.DoEvents()
   at EO.Pdf.Internal.bv.a(WaitHandle A_0, Int32 A_1, h3 A_2)
   at EO.Pdf.Internal.de.a(Boolean A_0, t A_1, Int32 A_2, h3 A_3)
   at EO.Pdf.Internal.de.a(String A_0, Boolean A_1, Boolean A_2, Int32 A_3, t A_4, String[] A_5, Byte[] A_6, h3 A_7)
   at EO.Pdf.Internal.de.a(HtmlToPdfOptions A_0, String A_1, Boolean A_2)
   at EO.Pdf.Internal.de.b(HtmlToPdfOptions A_0, String A_1, Boolean A_2)
   at EO.Pdf.Internal.de.a(bs A_0)
   at EO.Pdf.Internal.lr.c.a(Byte[] A_0)
Void b(System.Exception)


Any ideas?
eo_support
Posted: Tuesday, May 14, 2013 11:14:40 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,195
Hi,

Is there any way you can reproduce the error? We will have to reproduce it here first.

Thanks!
Simon Scheurer
Posted: Monday, July 8, 2013 9:12:50 AM
Rank: Advanced Member
Groups: Member

Joined: 5/14/2013
Posts: 45
Unfortunately not. If it reappears we will post additional information.
Simon Scheurer
Posted: Friday, July 12, 2013 1:37:49 AM
Rank: Advanced Member
Groups: Member

Joined: 5/14/2013
Posts: 45
Ok, the access violation errors are back. They have appeared systematically since we upgraded to the newest version.
Using 2.4.60.2 we do not have the issues anymore (at least not systematically). As soon as we use 5.0.24.2, the errors appear on every run we do.

We get the following error message:
Code: C#
2013-07-12 07:15:17.0708 [ERROR] service: Scheduler encountered a fatal error: Convertion failed. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at EO.Pdf.Internal.df.ad.Invoke(IntPtr A_0, String A_1, String A_2)
   at EO.Pdf.Internal.df.a(IntPtr A_0, Boolean A_1, String A_2)
   at EO.Pdf.Internal.df.b(IntPtr A_0)
   at EO.Pdf.Internal.df.a(g A_0, Int32 A_1, Int32 A_2, Int32 A_3)
   at EO.Pdf.Internal.df.aa.Invoke(String A_0, String A_1, Boolean A_2)
   at EO.Pdf.Internal.df.a(HtmlToPdfOptions A_0, String A_1, Boolean A_2)
   at EO.Pdf.Internal.df.b(HtmlToPdfOptions A_0, String A_1, Boolean A_2)
   at EO.Pdf.Internal.df.a(hk A_0)
   at EO.Pdf.Internal.ls.c.a(hk A_0)
   at EO.Pdf.Internal.ls.c.a(Byte[] A_0)


The conversion method used is quite simple:
Code: C#
private HtmlToPdfResult Render(RenditionData renditionData, SessionData sessionData, bool includeImages, out PdfDocument document) {
    document = new PdfDocument();
    var result = HtmlToPdf.ConvertHtml(renditionData.HtmlContent, document, GetOptions(renditionData, sessionData, includeImages));

    // Set pdf options
    document.EmbedFont = true;

    // Set pdf information
    document.Info.Author = Defaults.Config.Renditions.Creator;
    document.Info.CreationDate = DateTime.Now;
    document.Info.ModifiedDate = renditionData.CaptureTime;
    document.Info.Title = renditionData.Title;
    document.Info.Creator = Defaults.Config.Renditions.Creator;
    document.Info.Subject = renditionData.Subject;

    return result;
}


RenditionData and SessionData are defined as follows:
Code: C#
public class RenditionData {
    public Uri Uri { get; set; }
    public string HtmlContent { get; set; }
    public DateTime CaptureTime { get; set; }
    public string Title { get; set; }
    public string FooterHtml { get; set; }
    public string Subject { get; set; }
    public int LoadDelay { get; set; }
}

public class SessionData {
    public string CookieOrHeaderName { get; set; }
    public string CookieOrHeaderValue { get; set; }
}


And the GetOptions method is as follows:
Code: C#
private HtmlToPdfOptions GetOptions(RenditionData renditionData, SessionData sessionData, bool includeImages) {
    string baseUrl = renditionData.Uri.AbsoluteUri.Substring(0, renditionData.Uri.AbsoluteUri.LastIndexOf('/') + 1);
    return new HtmlToPdfOptions {
        BaseUrl = baseUrl,
        AutoFitX = HtmlToPdfAutoFitMode.ScaleToFit,
        ProxyInfo = new ProxyInfo(ProxyType.HTTP, connector.proxy.Host, connector.proxy.Port),
        AutoBookmark = true,
        SSLVerificationMode = SSLVerificationMode.None,
        PageSize = PdfPageSizes.A4,
        AdditionalHeaders = new[] {
            string.Format("{0}: {1}", sessionData.CookieOrHeaderName, sessionData.CookieOrHeaderValue)
        },
        GeneratePageImages = includeImages,
        FooterHtmlFormat = string.Format(Defaults.Config.Renditions.FooterHtmlPatternWrapper, renditionData.FooterHtml),
        MinLoadWaitTime = renditionData.LoadDelay
    };
}
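
A side note on the baseUrl line in GetOptions: AbsoluteUri includes the query string, so LastIndexOf('/') can land on a slash inside the query rather than on the last path separator. This is probably unrelated to the access violation, but it can produce a wrong BaseUrl for some pages. Below is a minimal sketch of an alternative, assuming the intent is "the directory of the page". BaseUrlHelper and GetBaseUrl are hypothetical names used for illustration only; they are not part of our project or of EO.Pdf.

```csharp
using System;

public static class BaseUrlHelper
{
    // Hypothetical helper: returns the "directory" part of a page URI,
    // dropping the file name, the query string and the fragment.
    public static string GetBaseUrl(Uri pageUri)
    {
        // Resolving the relative reference "." against an absolute URI
        // yields the URI of the directory that contains the page.
        return new Uri(pageUri, ".").AbsoluteUri;
    }
}
```

With this helper, http://www.qumram.ch/images/logo.png gives http://www.qumram.ch/images/, and a URI such as http://www.qumram.ch:13131/index4e0c.js?f=wp-content/themes/qumram/_assets/js/main.js gives http://www.qumram.ch:13131/, whereas the Substring approach would cut inside the query string.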


We are running HtmlToPDF in parallel in three threads for each tenant, but I don't think it's a multithreading issue. For test purposes I limited it to one tenant with a single thread, and the issue also appears in this non-parallel scenario.

What else?
- includeImages is false
- SessionData is just a name value pair. Nothing special there
- The HTML we use can be anything. It happens with all HTML we tested, so it should not depend on the content (of course all the HTML we tested contained some CSS, JS, images, etc.)
- renditionData.LoadDelay is 0
- The proxy is localhost:6001, which we use to serve all the embedded objects (CSS, etc.)

What additional information would be helpful? In the meantime we downgraded back to 2.4.60.2.
Simon Scheurer
Posted: Friday, July 12, 2013 1:38:54 AM
Rank: Advanced Member
Groups: Member

Joined: 5/14/2013
Posts: 45
Sorry, I meant 4.0.60.2 (works), 5.0.24.2 (does not work).
eo_support
Posted: Friday, July 12, 2013 8:56:16 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,195
Hi,

Is it possible for you to isolate the problem into a test project and send the test project to us? We have to reproduce it in our environment first.

Thanks!
Simon Scheurer
Posted: Thursday, July 18, 2013 4:26:42 AM
Rank: Advanced Member
Groups: Member

Joined: 5/14/2013
Posts: 45
Hi,

We will try to extract a test project that still shows the erroneous behavior and can run standalone without the rest of our components. This could take some time, as we are all quite busy.
Where should we send the package? If that's ok, I'll just put a zip file on a web server and post the download link in this forum thread.

Best regards,
Simon
eo_support
Posted: Thursday, July 18, 2013 12:02:26 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,195
Hi,

We will PM you as to where to send the project. If you prefer, you can also put it on your site and provide a download link. However please keep in mind that everything you post in the forum is public, so make sure your test app doesn't contain sensitive information such as your db password or license key.

Thanks!
Simon Scheurer
Posted: Thursday, July 25, 2013 5:58:50 AM
Rank: Advanced Member
Groups: Member

Joined: 5/14/2013
Posts: 45
Hi,

Ok, I created a test project that reproduces the behavior. Actually, with the test project the behavior even appears on the older version (4.0.60.2). I'll send a download link to the e-mail address mentioned in the private message you sent.

Some information about the project:
- It is self-contained, i.e. just load the solution and start the project qumram-service.
- There are several command-line flags that can be used. Only two of them are important for you:
/version: Defines the EO.PDF library to use (see comments below). Possible values are /version:4.0.60.2 (default) and /version:5.0.24.2.
/parallelity: How many PDFs should be created in parallel (default is 1). The errors also arise in non-parallel mode.

Both EO.PDF libraries are contained in the project. Depending on the /version parameter, a custom class loader decides which version of the library to load. Of course we do this only in this test project; the actual software just references the library normally.

The project's purpose is quite simple: there is a static snapshot of our qumram webpage, whose HTML pages are rendered to PDF files. An endless loop picks an arbitrary HTML page and renders it. The resulting PDFs are stored in /work/pdf. To stop the application, just press any key at the command prompt and it will shut itself down properly.

The files that are needed to render the webpage (images, css, etc.) are served by an internal small webserver. This webserver is very simple and just reads the corresponding files from the file-system and sends them back.

If you wonder why the overall architecture is quite complicated: it is a stripped-down version of our actual software. Some of the layers involved may not seem necessary. They are there because many components are configured using dependency injection containers and can be replaced by other components (this also holds for the Renditioner, which can be replaced by any other Renditioner that implements the IRenditioner interface).

Two issues can be reproduced with this test project:
1. The access violation errors
2. Additional Headers are not always sent to the server although they should be

There is a readme file in the root explaining the issues and telling how to reproduce using the test-project.
eo_support
Posted: Thursday, July 25, 2013 9:27:43 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,195
Hi,

This is just to let you know that we have received the test project and have been able to reproduce the problem. We are working on it and will post again if we find anything.

Thanks!
eo_support
Posted: Friday, July 26, 2013 6:37:53 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,195
Hi,

We have posted a new build and emailed you the download location of the new build. Please let us know if the new build fixes the problem for you.

Thanks!
Simon Scheurer
Posted: Monday, July 29, 2013 2:11:35 AM
Rank: Advanced Member
Groups: Member

Joined: 5/14/2013
Posts: 45
Wow, that was quick - thx for the fast reaction!
I'll check it ASAP and get back to you.
Simon
Simon Scheurer
Posted: Monday, July 29, 2013 8:35:41 AM
Rank: Advanced Member
Groups: Member

Joined: 5/14/2013
Posts: 45
Hi,

Ok. I tested the release that you sent me. It works mostly, but still produces both errors in some cases (I still have missing headers sometimes and still get AccessViolations - although much less frequently).

As a remark (you have probably already noticed): to run the webserver properly, either VS 2012 or the command prompt needs to run with Administrator credentials. Otherwise the HTTP listener is not able to use http.sys (Access Denied error message).

To reproduce I used the following settings:
/version:5.0.33.2 /parallelity:10 and let it run for a while. The error appeared for the first time after a few hundred renditions (on rendition 470, to be more precise). Here is a section of the log file:

Quote:

Successfully created pdf file 2013-07-29_141159_n468.pdf.
Successfully created pdf file 2013-07-29_141159_n469.pdf.
Successfully created pdf file 2013-07-29_141159_n470.pdf.
Error: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
at EO.Pdf.Internal.db.ac.Invoke(IntPtr A_0, Int32& A_1, Int32& A_2, Boolean A_3)
at EO.Pdf.Internal.db.a(IntPtr A_0, Boolean A_1)
at EO.Pdf.Internal.db.a(HtmlToPdfOptions A_0, String A_1, Boolean A_2)
at EO.Pdf.Internal.db.b(HtmlToPdfOptions A_0, String A_1, Boolean A_2)
at EO.Pdf.Internal.db.a(bq A_0)
at EO.Pdf.Internal.lr.c.a(Byte[] A_0)
at EO.Pdf.Internal.a.b(BinaryReader A_0)
at EO.Pdf.Internal.lr.a(a A_0)
at EO.Pdf.HtmlToPdfSession.a(a A_0)
at EO.Pdf.HtmlToPdf.ConvertHtml(String html, PdfDocument doc, HtmlToPdfOptions options)
at Qumram.Business.Renditions.Renditioner.Render(RenditionData renditionData, SessionData sessionData, Boolean includeImages, PdfDocument& document) in e:\qumram\Source-Renditions\qumram-business\library\src\Renditions\Renditioner.cs:line 46
at Qumram.Business.Renditions.Renditioner.Render(RenditionType[] types, RenditionData renditionData, SessionData sessionData) in e:\qumram\Source-Renditions\qumram-business\library\src\Renditions\Renditioner.cs:line 31
at Qumram.Business.Renditions.RenditionerWrapper.CreateRendition(RenditionTestData data, RenditionType[] types) in e:\qumram\Source-Renditions\qumram-business\library\src\Renditions\RenditionerWrapper.cs:line 31
at Qumram.Worker.Jobs.RenditionJob.ProcessOne() in e:\qumram\Source-Renditions\qumram-service\service\src\Jobs\RenditionJob.cs:line 27
Successfully created pdf file 2013-07-29_141159_n471.pdf.
Successfully created pdf file 2013-07-29_141159_n472.pdf.
Successfully created pdf file 2013-07-29_141200_n473.pdf.
Successfully created pdf file 2013-07-29_141200_n474.pdf.
Missing rendition header error in http://www.qumram.ch:13131/images/startupticker.ch-logo.jpg
Missing rendition header error in http://www.qumram.ch:13131/index4e0c.js?f=wp-content/themes/qumram/_assets/js/modernizr.custom.js,wp-content/themes/qumram/_assets/js/main.js,wp-content/plugins/contact-form-7/includes/js/jquery.form.js,wp-content/plugins/contact-form-7/includes/js/scripts.js
Missing rendition header error in http://www.google-analytics.com:13131/ga.js
Missing rendition header error in http://www.qumram.ch:13131/images/logo.png
Missing rendition header error in http://www.qumram.ch:13131/images/nav-main.png
Successfully created pdf file 2013-07-29_141200_n475.pdf.
Successfully created pdf file 2013-07-29_141200_n476.pdf.
Successfully created pdf file 2013-07-29_141200_n477.pdf.
Successfully created pdf file 2013-07-29_141200_n478.pdf.
Missing rendition header error in http://www.qumram.ch:13131/images/icons.png
Successfully created pdf file 2013-07-29_141201_n479.pdf.
Successfully created pdf file 2013-07-29_141201_n480.pdf.


As you can see, the missing rendition header error appears only after the access violation (never before it). It seems that once this memory error has occurred, EO.Pdf is no longer able to set the rendition headers properly.

When I check the additional debug log from the webserver there is one more interesting aspect:
Quote:

Processed request http://www.qumram.ch:13131/images/logo.png
Processed request http://www.qumram.ch:13131/images/nav-main.png
Processed request http://www.qumram.ch:13131/images/sitepress.js
Processed request http://www.qumram.ch:13131/images/logo.png
Processed request http://www.qumram.ch:13131/images/our-method.png
Object not foud http://www.google-analytics.com:13131/ga.js
Processed request http://www.qumram.ch:13131/images/icons.png
Processed request http://www.qumram.ch:13131/images/icons.png


The debug log stopped growing, i.e. EO did not send any more requests after the error occurred. The PDFs look proper *before* the error occurs but not *after* it. The application continues to run, but does not seem to retrieve (or try to retrieve) any images, CSS files, etc. anymore.

I added the log file (main.log) and, as samples, a PDF from before the error (447.pdf) and one from after it (580.pdf) to a zip file: https://dl.dropboxusercontent.com/u/1776128/log-and-pdfs.zip

I also improved a few things in the test project and added a new version to the dropbox. I'll send you the link again by e-mail.

Best regards,

Simon
Simon Scheurer
Posted: Monday, July 29, 2013 8:56:24 AM
Rank: Advanced Member
Groups: Member

Joined: 5/14/2013
Posts: 45
As a side note: as you can see in the PDF package, the EO.PDF license key is not accepted anymore (the PDFs show the trial-version footer). Not sure if this is because of the custom build or whether something else is wrong.
Check 447.pdf to see it. As you can see in the demo project, the key is included.
We only have an EO.PDF license. Maybe it's linked to that, as the package I got was an EO.Total installer package.
Best regards,
Simon
Simon Scheurer
Posted: Monday, July 29, 2013 8:59:42 AM
Rank: Advanced Member
Groups: Member

Joined: 5/14/2013
Posts: 45
As a second side note (or should I rather open this as a new forum post?):
EO.Pdf does not seem to render content that is contained within an IFrame.
I added a new HTML file (frame.html) to the project I sent you. If you remove all HTML files (but not the CSS and JS files) in /work/site except the first three items (frame, index and the next one), you can see the difference.
If frame.html is opened in a browser, it displays nicely. But if a rendition is created, there is just empty space and no content.
Better move this to a new topic?
Best regards,
Simon
eo_support
Posted: Monday, July 29, 2013 7:20:22 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,195
Hi,

We have been running your test app for about an hour and have not been able to reproduce the problem with the new version.

As to frame.html, you need to set HtmlToPdf.Options.BaseUrl correctly. Otherwise the converter does not know where to load the frame files from.

Thanks!
Simon Scheurer
Posted: Wednesday, August 14, 2013 8:44:33 AM
Rank: Advanced Member
Groups: Member

Joined: 5/14/2013
Posts: 45
Hi,

The IFrame issue was actually a stupid mistake: the file was simply missing. That's working fine now. Regarding the rendition headers, they are still missing for some objects (.flv files, some images that are referenced from CSS files). Very strange, with no recognizable pattern. We added custom cookies as a fallback to the custom headers, and they work reliably.

Regarding the AccessViolationError, I still run into it from time to time. With the demo app it's hard to reproduce, but after running it for 10 to 20 hours it usually occurs even there. On the production system with the real renditions it happens frequently (i.e. every hour on average, or every 2500 to 3000 renditions).

It is difficult to reproduce reliably. The last error message was:
Code: C#
Convertion failed. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG& msg)
   at System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32 dwComponentID, Int32 reason, Int32 pvLoopData)
   at System.Windows.Forms.Application.ThreadContext.RunMessageLoopInner(Int32 reason, ApplicationContext context)
   at System.Windows.Forms.Application.ThreadContext.RunMessageLoop(Int32 reason, ApplicationContext context)
   at System.Windows.Forms.Application.DoEvents()
   at EO.Pdf.Internal.bt.a(WaitHandle A_0, Int32 A_1, h2 A_2)
   at EO.Pdf.Internal.db.a(Boolean A_0, t A_1, Int32 A_2, h2 A_3)
   at EO.Pdf.Internal.db.a(String A_0, Boolean A_1, Boolean A_2, Int32 A_3, t A_4, String[] A_5, Byte[] A_6, h2 A_7)
   at EO.Pdf.Internal.db.a(HtmlToPdfOptions A_0, String A_1, Boolean A_2)
   at EO.Pdf.Internal.db.b(HtmlToPdfOptions A_0, String A_1, Boolean A_2)
   at EO.Pdf.Internal.db.a(bq A_0)
   at EO.Pdf.Internal.lr.c.a(Byte[] A_0)


As this is hard to reproduce, it will also be hard to fix. I understand that embedding a native-code browser is difficult and can lead to issues. We can live with some instability if it occurs only every 1000 to 3000 renditions.

What is tricky about the whole issue is the following: The renditions that are created *after* the error occurs are just blank pages. After some time it goes back to normal. Unfortunately I do not see any method to reinitialize the EO.PDF components.

- As all methods are static, I cannot dispose any objects and recreate them.
- I'd need to completely remove the DLL from memory (by running EO in its own AppContext and marshalling objects), but that is truly ugly.
- Or restart the whole service (which needs another service that watches and restarts it, removing everything from it that's not rendition-related, etc.).

Now: can you provide some method to actually re-initialize everything (dispose internal resources, etc.)? That way, if an error occurs, we could catch it, dispose everything, call the GC and restart the rendition process. I do not (currently) see any other way to reach the required stable and reliable behavior. Any other ideas?

Best regards,

Simon
eo_support
Posted: Wednesday, August 14, 2013 9:24:01 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,195
Hi,

You can unload the AppDomain and then restart a new AppDomain. Since everything is maintained within the AppDomain boundary, restarting the AppDomain should unload everything.

The most reliable way is to restart your process. ASP.NET works this way: it runs everything inside a worker process, and if anything goes wrong, it just quits the process. The next request will bring up a new process.

Thanks!
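
The AppDomain approach described above could look roughly like the following. This is only a sketch under stated assumptions, not EO.Pdf code: RenditionWorker and IsolatedRenditioner are hypothetical names, the worker body is a placeholder where the actual HtmlToPdf.ConvertHtml call would go, and it requires the .NET Framework (creating additional AppDomains is not supported on .NET Core and later). Unloading an AppDomain may not release native libraries already loaded into the process, which is presumably why a full process restart remains the most reliable option.

```csharp
using System;
using System.Text;

// Hypothetical worker that lives inside the child AppDomain. Deriving from
// MarshalByRefObject lets the parent domain call it through a proxy.
public class RenditionWorker : MarshalByRefObject
{
    // Placeholder: a real worker would call HtmlToPdf.ConvertHtml here and
    // return the resulting document as bytes. This stub just echoes the input.
    public byte[] Render(string html)
    {
        return Encoding.UTF8.GetBytes(html);
    }

    public string DomainName
    {
        get { return AppDomain.CurrentDomain.FriendlyName; }
    }

    // Returning null keeps the remoting proxy alive for as long as the domain
    // exists, instead of letting the default lease expire.
    public override object InitializeLifetimeService()
    {
        return null;
    }
}

// Hypothetical wrapper owned by the main AppDomain.
public sealed class IsolatedRenditioner : IDisposable
{
    private AppDomain domain;
    private RenditionWorker worker;
    private int generation;

    public IsolatedRenditioner()
    {
        Start();
    }

    private void Start()
    {
        generation++;
        // Reuse the current setup so the child domain resolves the same assemblies.
        domain = AppDomain.CreateDomain("rendition-" + generation, null,
            AppDomain.CurrentDomain.SetupInformation);
        worker = (RenditionWorker)domain.CreateInstanceAndUnwrap(
            typeof(RenditionWorker).Assembly.FullName,
            typeof(RenditionWorker).FullName);
    }

    // Throws away the whole child domain and starts a fresh one.
    public void Recycle()
    {
        AppDomain.Unload(domain);
        Start();
    }

    public byte[] Render(string html)
    {
        try
        {
            return worker.Render(html);
        }
        catch (Exception)
        {
            // Any failure: recycle the domain and retry exactly once.
            Recycle();
            return worker.Render(html);
        }
    }

    public string CurrentDomainName
    {
        get { return worker.DomainName; }
    }

    public void Dispose()
    {
        AppDomain.Unload(domain);
    }
}
```

A caller would create one IsolatedRenditioner, call Render for each page, and call Recycle (or rely on the built-in single retry) once a conversion fails or starts producing blank pages. Only strings and byte arrays cross the domain boundary here, which avoids marshalling library types between domains.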

