Parallel Generation of PDF files
BrentonW
Posted: Wednesday, August 15, 2012 9:18:14 PM
Rank: Member
Groups: Member

Joined: 8/14/2012
Posts: 10
I'm trying to use the EO PDF component from a multi-threaded environment, specifically generating several PDF files in parallel. The simplified version of my code looks like this:

Parallel.ForEach(collection, GeneratePdf);


private void GeneratePdf(string html)
{
    HtmlToPdf.ConvertHtml(html, outputPath);
}


The documentation for the HtmlToPdf class states "Public static (Shared in Visual Basic) members of this type are safe for multithreaded operations. Instance members are not guaranteed to be thread-safe.".

Given that, I would assume that it is OK to call HtmlToPdf.ConvertHtml() from multiple threads. However, I'm getting an exception:

Source Error:

Line 81:
Line 82: HtmlToPdf.Options.BaseUrl = String.Format("file:///{0}", _httpContext.Server.MapPath("/"));
Line 83: HtmlToPdf.ConvertHtml(html, relativePath);
Line 84: }
Line 85:

Source File: C:\Local\Dev\Projects\MainRepository\trunk\Src\MyApp\MyApp.WebSite\Infrastructure\Helpers\PdfGenerator.cs Line: 83

Stack Trace:

[IndexOutOfRangeException: Index was outside the bounds of the array.]
System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add) +608
EO.Pdf.Internal.gf.a(Int32 A_0) +355
EO.Pdf.Internal.a.a(i2 A_0) +141
EO.Pdf.Internal.ev.a(gf A_0, i2 A_1) +270
EO.Pdf.Contents.PdfTextContent.a(br A_0) +839
EO.Pdf.Contents.PdfContent.b(br A_0) +149
EO.Pdf.Contents.PdfContent.a(br A_0) +186
EO.Pdf.Contents.PdfTextLayer.a(br A_0) +121
EO.Pdf.Contents.PdfContent.b(br A_0) +149
EO.Pdf.Contents.PdfContent.a(br A_0) +186
EO.Pdf.Contents.PdfContent.b(br A_0) +149
EO.Pdf.Contents.PdfContentContainer.i() +1359
EO.Pdf.PdfDocument.a() +379
EO.Pdf.HtmlToPdf.ConvertHtml(String html, String pdfFileName) +74
MyApp.WebSite.Infrastructure.Helpers.PdfGenerator.GeneratePdf(String html, PdfTicketViewModel viewModel, Boolean forceRegeneration) in C:\Local\Dev\Projects\MainRepository\trunk\Src\MyApp\MyApp.WebSite\Infrastructure\Helpers\PdfGenerator.cs:83
MyApp.WebSite.Controllers.PurchaseNewController.GeneratePdfTicket(KeyValuePair`2 htmlAndViewModel) in C:\Local\Dev\Projects\MainRepository\trunk\Src\MyApp\MyApp.WebSite\Controllers\PurchaseNewController.cs:434
System.Threading.Tasks.<>c__DisplayClass2d`2.<ForEachWorker>b__23(Int32 i) +134
System.Threading.Tasks.<>c__DisplayClassf`1.<ForWorker>b__c() +4399679
System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask) +24
System.Threading.Tasks.<>c__DisplayClass7.<ExecuteSelfReplicating>b__6(Object ) +406


I'm not sure if this is related to the following recent post: http://www.essentialobjects.com/forum/postst6904_Access-Violation-Exception.aspx. It repros every time.

I'm on version 4.0.29.2.
eo_support
Posted: Wednesday, August 15, 2012 9:30:59 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,196
Hi,

This is a known bug, and we already have an internal build that fixes the problem. We will PM you the download location shortly.

Thanks!
BrentonW
Posted: Thursday, August 16, 2012 3:14:02 AM
I tried the internal build that you supplied, and while the multithreading bug is fixed, there are some pretty significant performance issues.

Running 10 HTML to PDF conversions in parallel the first time takes an average time of 23090ms per conversion. Running 10 conversions a second time takes an average of 1186ms per conversion. Looks like some locking and perf issues in initialization.

For reference, the previous build (which I could not get to work in a parallel environment, so these figures are for a sequential run) took 5487ms for the first conversion (a little concerning) and an average of 889ms over the subsequent 9 conversions.

The new build seems to have regressed significantly in performance.

Also, for comparison's sake, I'm looking at a competitor's component as well. Its first run in a parallel environment with 10 conversions takes an average of 2157ms per conversion (over 10 times quicker than the first run with EO).
eo_support
Posted: Thursday, August 16, 2012 8:10:36 AM
Hi,

The first run is always much slower because it needs to initialize the “engine”. Subsequent runs should not take as long. When you have parallel threads trying to perform conversions, multiple engines will be created. As both of your tests show, the subsequent runs are significantly faster than the first run. This is normal for our converter. The way you start the conversions aggravates this a lot, because you are trying to start 10 concurrent conversions at exactly the same time. When we "start up" an engine, we check the status of the existing engines. This is very quick if an engine is already started and running; however, it will lock for a while when an engine is still in the process of being initialized.

The time it takes to "fire up" should not be much of a concern in a real-life scenario. In a real-life scenario you would probably never have 10 worker threads all created and started at exactly the same time, all beginning a conversion at once. For a server application such as a Web application, threads are started in response to user requests, which come in randomly. Besides, once the “engines” are initialized, the converter will run much more smoothly and faster. If the slow initialization time still concerns you, you can create an idle thread and call ConvertHtml with a blank string to initialize a “spare engine” first. That should be very quick, and the “spare engine” will take over whenever all other engines are busy.
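For readers who want to try the “spare engine” warm-up, here is a minimal sketch of one way to do it at application start-up. The PdfWarmUp class, the convert delegate and the scratch path are hypothetical names used for illustration; the delegate stands in for a call such as HtmlToPdf.ConvertHtml(html, path). Whether a blank string is enough to initialize an engine is taken from the description above and has not been verified here.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: runs one throwaway conversion on a background task
// so that the slow first-time initialization happens before real requests arrive.
public static class PdfWarmUp
{
    // 'convert' is an assumption: it stands in for a call such as
    // HtmlToPdf.ConvertHtml(html, pdfFileName) from the original post.
    public static Task Start(Action<string, string> convert, string scratchPath)
    {
        // LongRunning hints that this work should not occupy a thread-pool worker.
        return Task.Factory.StartNew(
            () => convert(string.Empty, scratchPath),
            TaskCreationOptions.LongRunning);
    }
}
```

Typical usage would be something like PdfWarmUp.Start((html, path) => HtmlToPdf.ConvertHtml(html, path), "warmup.pdf") during start-up, keeping the returned Task so that any exception from the warm-up can be observed.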

Thanks
BrentonW
Posted: Tuesday, August 28, 2012 11:35:00 PM
I've tried this in a variety of ways, and no matter what, keep running into significant performance issues.

The best-case scenario when generating just 2 PDF files in parallel is usually around 1000ms per file on average (after initialization). When generating 10 files, the best case is around 1700ms per file on average; when generating 20 files, it is about 12000ms per file on average.

All of these numbers are after the engine initializes, as suggested by the previous thread.

Overall, the performance of this component, or at least the build I'm running, is horrible. Does anyone use this in a production server environment? According to my testing, it simply doesn't scale.

Are these performance problems going to be addressed?
eo_support
Posted: Wednesday, August 29, 2012 7:55:52 AM
Hi,

We have numerous customers using our product in production server environments and it scales just fine. What you are doing is pushing it past its limits and stressing it unnecessarily. There are many reasons why it may slow down significantly when you push it past a certain limit. For example, we swap a lot of data from memory to disk when we detect that memory is running low. This slows things down significantly because basically everything then runs from disk. This is not a disadvantage: the alternative is to throw an out-of-memory exception and crash your application. This is just one of the extra things we do to make sure our product runs smoothly in a server environment, and it helps to handle spikes. Your server does not have unlimited power, so I would recommend that you find the best performance point and try to keep the load below that point.
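One way to keep the load below a chosen point when using Parallel.ForEach is to cap the degree of parallelism. The sketch below is only an illustration: ThrottledConversion and the convert delegate are hypothetical names, and the right value for maxParallel has to be found by measurement on your own hardware.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: runs conversions in parallel, but never more than
// 'maxParallel' at the same time.
public static class ThrottledConversion
{
    // 'convert' is an assumption: it stands in for whatever performs one
    // HTML-to-PDF conversion (for example a wrapper around HtmlToPdf.ConvertHtml).
    public static void Run(IEnumerable<string> htmlDocuments, Action<string> convert, int maxParallel)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(htmlDocuments, options, convert);
    }
}
```

With maxParallel set to 2, for example, a batch of 20 documents is still processed in one call, but at most 2 conversions are in flight at any moment.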

Thanks!
BrentonW
Posted: Saturday, September 1, 2012 3:11:52 AM
I disagree - what I'm doing is experimenting with your component and one of your competitors, EVO.PDF, and figuring out which one performs better and which one scales.

I also disagree with the assertion that your component "scales just fine". I started this post on 8/14/2012 because EO.PDF was unable to handle a trivial multi-threaded scenario - it was completely broken when trying to execute a few conversions in parallel.

Then I showed you some performance data that showed issues trying to scale up and run multiple conversions in parallel which you dismissed because you felt that it was "unnecessarily stressing" the system.

I disagree again, and here is some more specific performance data comparing EO.PDF and EVO.PDF to prove this. I have a spreadsheet with the detailed results that I'm willing to send you as well. The summary is below.

I conducted 2 different tests with each component:
1. Generating 20 PDFs sequentially
2. Generating 20 PDFs using Parallel.ForEach()

For each test I ran it 4 times to account for "warm-up" time of each component. The results below are the averages of the summary data for each run. All times below are in milliseconds.

As a side note, Parallel.ForEach() doesn't "unnecessarily stress" the system. Doing 20 things in parallel doesn’t mean 20 threads are spawned. The Task Parallel Library is intelligent about how many tasks it allows to run at once, based on the available cores in the system, the duration each work item is taking, etc.


Sequential PDF generation (all times in ms)

Metric               EO.PDF   EVO.PDF   EVO is quicker by…
Total elapsed time    19273     16551   116%
First                  2046      2158    95%
Last                    894       557   160%
Max                    2242      5155    43%
Min                     800       508   158%
Average                 963       827   116%
Median                  886       541   164%


Parallel PDF generation (all times in ms)

Metric               EO.PDF   EVO.PDF   EVO is quicker by…
Total elapsed time    28380      7524   377%
First                 20865      3224   647%
Last                  18506      5473   338%
Max                   25881      7432   348%
Min                   13313      1460   912%
Average               18127      4450   407%
Median                17895      4910   364%
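A note on reading the "EVO is quicker by…" column: the figures appear to be the EO.PDF time divided by the EVO.PDF time, expressed as a percentage, so values below 100% mean EVO.PDF was slower on that metric. That reading is an inference from the numbers, and the Speedup class below is a hypothetical helper written only to check it. A couple of rows differ from the recomputed value by about one percentage point, presumably because the posted times are themselves rounded averages.

```csharp
using System;

// Hypothetical helper for checking the "quicker by" column of the tables above.
public static class Speedup
{
    // Returns the EO.PDF time as a percentage of the EVO.PDF time.
    // Above 100 means EVO.PDF was faster; below 100 means it was slower.
    public static double Percent(double eoMs, double evoMs)
    {
        return 100.0 * eoMs / evoMs;
    }
}
```

For the total elapsed times, Speedup.Percent(19273, 16551) rounds to 116 for the sequential run and Speedup.Percent(28380, 7524) rounds to 377 for the parallel run, matching the posted column.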


In every test, EVO.PDF out-performs and out-scales EO.PDF, but the more alarming issue is the significant problems that EO.PDF has when trying to scale. I'm sure this is caused at least in part by the singleton design and the locking that appears to be happening behind the scenes. Is there a reason for this singleton design? Newing up a class adds a line of code, but the isolation you get per converter is worth it. As a side note, how would I run PDF conversions in parallel with different settings? Your settings are located on your global singleton object, and therefore apply to all conversions that run in parallel. This isn't an issue in my scenario, but by building your component this way you're deliberately blocking certain use cases.

I need to make a decision on which component to purchase. EO.PDF claims to have the best component and customer service (http://www.essentialobjects.com/Products/EOPdf/WhyUs.aspx) so I'd like to see if that's the case and if these issues will be addressed.

Cheers,
Brenton
eo_support
Posted: Saturday, September 1, 2012 9:34:28 AM
Hi,

"Scale" does not mean that it is always the fastest. Scalability is more about stability under heavy load and ability to scale up along with the network (for example, when you add more servers).

There are many important factors when it comes to a product like EO.Pdf. Performance is just one of them. The quality of the output PDF file and the stability of the product are even more important, and we do a lot of extra work on these two. For example, we try to render every drawing in vector format, while many other vendors simply render it as a bitmap. So a CSS3 rounded corner in our PDF will be in vector format and you can zoom in indefinitely, which means much better quality when printing. We also run our core conversion code in a separate worker process instead of inside your process. We have additional code that monitors the health of the worker process, and if it sees any problems, it will try to terminate and restart it. All of this happens transparently to you, and the result is a more stable and reliable product. However, there is a trade-off with everything: on fixed hardware, doing more on one side means we have less room on the other side. For our product, the quality of the output file and the stability of the product definitely have a higher priority than pure performance. So we definitely do not think it’s worth pursuing performance alone as the only indicator.

In addition to those, there are also a lot of additional features that our users find very useful. The "singleton" design you mentioned is one of them. While the programming interface appears to be a singleton design, internally it actually is not. To answer your question about "how to run conversions in parallel with different settings": you just use it the same way as you do in a single thread, without worrying about anything at all. A separate set of settings is automatically maintained and isolated for each thread. So if you have two conversions running in parallel on two different threads, you just set the settings in your thread and they will not have anything to do with each other. We are not the only ones who do things this way; ASP.NET does things this way too (ever wondered whether you need to lock on session variables?). This is just one example of how we try to make things as easy as possible for you. We also have additional features such as returning the HTML DOM tree, so that you can use it to find the location of any DHTML element and add anything else with our ACM interface. All of those are nice features, but again everything is a trade-off. This is similar to a fully loaded SUV versus a bare-bones car. If the only thing you are concerned about is gas mileage, then you should definitely go for a small car, because a fully loaded SUV burns more gas and cannot match the gas mileage of a small car. Most of our customers think what we offer is worth the trade-off, but ultimately every customer is different and there is no single size that fits all. So in this case, it really depends on your priorities. We cannot just cut all the extra features we offer to match somebody else's performance.
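The "looks like a singleton, but settings are isolated per thread" behaviour described above resembles ordinary thread-local storage in .NET. The sketch below shows that general pattern only. It is not EO.Pdf's implementation, which is not shown anywhere in this thread; FakeConverter and ConverterOptions are hypothetical names.

```csharp
using System;
using System.Threading;

// Hypothetical options object, standing in for something like HtmlToPdf.Options.
public sealed class ConverterOptions
{
    public string BaseUrl { get; set; }
}

// Hypothetical converter with a static-looking API whose options are
// actually stored once per thread.
public static class FakeConverter
{
    private static readonly ThreadLocal<ConverterOptions> options =
        new ThreadLocal<ConverterOptions>(() => new ConverterOptions());

    // Each thread that reads this property gets its own ConverterOptions instance.
    public static ConverterOptions Options
    {
        get { return options.Value; }
    }
}
```

One practical consequence, assuming a component really does work this way: thread-pool threads are reused by Parallel.ForEach, so values set during one work item may still be present when the next item runs on the same thread. Setting the options at the start of every work item, as the code at Line 82 in the first post does, avoids depending on leftover state.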

Also, we believe some of the performance tests you ran are far from typical of a real-life scenario. For example, the test in which you start all 10 conversions at the same time almost never happens in real life, so it definitely would not be a priority for us to "solve it". Our ultimate goal is to provide our customers with something very useful in their real-life deployments, not to produce something that just has the highest score in every imaginable test. The overall value we can offer our customers is definitely much more important than one or two tests alone.

For all those reasons, we do not believe the "performance issue" you reported is something that needs to be addressed. Don’t get us wrong: it’s not that we do not believe performance is important. We have in fact done extensive performance testing and optimization, so we do not believe there is much room left. The real reason behind our position is that performance is not the only concern, and there is a whole load of things we have to balance very carefully. How well we balance them has a very direct impact on product quality, and we fully understand that not every customer agrees with our balance point. But most of them do agree with us, and that’s precisely why we are the most popular HTML to PDF converter out there.

Hope this gives you a good idea of the rationale behind our position. Please feel free to let us know if you have more questions.

Thanks
BrentonW
Posted: Saturday, September 1, 2012 2:53:18 PM
I appreciate the explanation, but I'm surprised at what appears to be the dismissive position. If this were my component I would absolutely be investigating.

I never said scale meant it was the fastest. The performance metrics are just another indication of how this component compares to EVO.PDF, and EO.PDF is significantly slower, no matter which way you slice it. You claim it's because you generate better PDFs, and looking at the resulting PDF files, I'm inclined to believe you on that. So let's assume that I'm willing to trade slower per-conversion performance for better PDFs.

"Scale" in relation to this component means that you can run multiple in parallel and the time required per conversion doesn't degrade significantly. It has absolutely nothing to do with adding more servers - that's how I scale my application, but not this individual component.

I can tell you that these tests are very relevant for my scenario, and it is very real-world. In my scenario, I kick off anywhere from 2-20 conversions in parallel. Normally it's closer to the lower end; however, it does go to the higher end. I need to know that my system won't grind to a halt and that the time per conversion will not degrade significantly (I understand it will degrade a certain amount, and I'm fine with that). The above tests run 20 conversions in parallel using the Task Parallel Library - but that doesn't mean 20 threads are spawned simultaneously. The TPL is intelligent about how it kicks off its workload.

Try it for yourself - test with running 2, 5, 10, 15 and 20 conversions in parallel. As you scale up, you'll notice there is a point where the performance per conversion degrades significantly, and it shouldn't. My guess is it has to do with locking inside your component, but I don’t know that for sure - only you guys can figure that out.

I'm surprised that you don’t want to even investigate the delta in performance between EO and EVO.PDF, but at least take a look at the scale issues.

It will take you a few minutes to set up some basic tests and start incrementing the number of conversions in parallel and you'll see the problem.
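For anyone who wants to reproduce this kind of scale test, a minimal timing harness could look like the sketch below. It is only an illustration of the approach, not the code used to produce the figures posted earlier in this thread; ScaleTest and the convert delegate are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical harness: times each conversion individually while a batch
// of 'count' conversions is scheduled through Parallel.For.
public static class ScaleTest
{
    // 'convert' is an assumption: it stands in for one HTML-to-PDF conversion.
    // Returns the average elapsed time per conversion in milliseconds.
    public static double AverageMs(int count, Action<int> convert)
    {
        var timings = new ConcurrentBag<long>();
        Parallel.For(0, count, i =>
        {
            var stopwatch = Stopwatch.StartNew();
            convert(i);
            stopwatch.Stop();
            timings.Add(stopwatch.ElapsedMilliseconds);
        });
        return timings.Average();
    }
}
```

Calling ScaleTest.AverageMs(n, i => HtmlToPdf.ConvertHtml(html, "out" + i + ".pdf")) for n = 2, 5, 10, 15 and 20, after one warm-up run, gives the per-conversion averages to compare. Parallel.For decides how many items actually run at once, so n is the batch size rather than a guaranteed thread count.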
eo_support
Posted: Saturday, September 1, 2012 4:17:47 PM
Hi,

You have to trust us on this: we have already done extensive performance testing and optimization on this product. We feel that what you are expecting is a significant performance boost, which is not a realistic expectation at this stage. It's like the car currently has a mileage of 30 and you want 15. That's simply not going to happen without changing this car into a totally different type of car. Also, a performance problem is not like a bug that you can trace to a single line of code and fix right away. Any significant performance boost has to be the result of a system-wide optimization, because every little bit counts. To be very frank with you, we do not believe that's something we should do for a stable product, since it would risk breaking a lot of things with very little to gain. As for the things you mentioned, such as the significant performance drop after a certain load, we know exactly why it happens and we have already explained it to you: you are hitting the upper limit and you should try to stay below that limit. We could risk everything to raise the limit by another 5% while you are probably expecting at least 50%, and after that 5% boost we might have broken something else elsewhere, so in the end you would have a newer version that is 5% faster but 20% less stable. Obviously neither you nor we want that.

I understand you have a valid concern, but at the same time I want you to believe that we know what we are doing, and we are quite sure you are heading for a dead end here. So I think you should give up on this. If the performance does not meet your needs, then I would recommend you go with another product. I am not trying to push you away, but while I believe we have the best value, we have our advantages and other vendors have theirs, and in the end what matters most to you is what's important. When we sell a product to you we want it to be truly useful to you, and if that does not hold true, the product may not be the best fit for you. We always work very hard to give you the best we have, and in this case you are already getting the best here.

In your case, the only suggestion I can make is to try to randomize your conversion tasks, for example by adding a random time delay before starting each thread and by converting different, randomly chosen documents in each thread. This way the threads won’t all be doing exactly the same thing at the same time, and that might allow you to handle a little bit more load.
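The random-delay idea could be sketched roughly as follows. StaggeredStart, the convert delegate and the maxDelayMs parameter are hypothetical names used for illustration, and how much (if at all) this helps would have to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: sleeps for a random interval before each conversion so
// that parallel work items do not all begin at exactly the same moment.
public static class StaggeredStart
{
    private static int seed = Environment.TickCount;

    // 'convert' is an assumption: it stands in for one HTML-to-PDF conversion.
    public static void Run(IEnumerable<string> htmlDocuments, Action<string> convert, int maxDelayMs)
    {
        // System.Random is not thread-safe, so each thread gets its own
        // instance with a distinct seed.
        using (var rng = new ThreadLocal<Random>(() => new Random(Interlocked.Increment(ref seed))))
        {
            Parallel.ForEach(htmlDocuments, html =>
            {
                // Upper bound of Next is exclusive, hence the + 1.
                Thread.Sleep(rng.Value.Next(0, maxDelayMs + 1));
                convert(html);
            });
        }
    }
}
```

The trade-off is that every conversion pays up to maxDelayMs of extra latency, so this only makes sense if the stagger reduces contention by more than the delay it adds.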

Thanks!
BrentonW
Posted: Saturday, September 1, 2012 5:38:53 PM
Ok, let's put the performance issues aside for a minute and look at *scale*. See my previous comment:

"Try it for yourself - test with running 2, 5, 10, 15 and 20 conversions in parallel. As you scale up, you'll notice there is a point where the performance per conversion degrades significantly, and it shouldn't. My guess is it has to do with locking inside your component, but I don’t know that for sure - only you guys can figure that out.

I'm surprised that you don’t want to even investigate the delta in performance between EO and EVO.PDF, but at least take a look at the scale issues."

Do the scale test for yourself and then post your results here - you'll see there is an issue.

In the time you've spent responding trying to convince me that there is no problem you could have actually tested for yourself and realized that there is.

I've spent a ton of my time on this trying to help *you* create a better product. The least you can do is investigate this.
BrentonW
Posted: Saturday, September 1, 2012 5:43:34 PM
By the way - 15 days ago EO.PDF (a "stable product") didn't even work in a multi-threaded environment. So you can't claim this is an area that is well tested for your component.

Please investigate this problem.
eo_support
Posted: Saturday, September 1, 2012 7:39:13 PM
Hi,

We do not need to investigate this issue: we know exactly what it is and why it is happening. And we have explained to you over and over again that you are hitting the upper limit and you need to stay within limits. You didn’t want an investigation; you wanted us to change our product design just for you. It’s like we make a car with a top speed of 100 mph and you insist that it’s not good enough for you because you want it to reach 150 mph. We can tell you very firmly that this will not happen.

I understand that there was an issue a few weeks ago, and that's precisely what we are trying to avoid this time: that issue was introduced when we were trying to "fix" something else. You have to understand that we walk a fine line between satisfying individual customers and keeping the product stable. We want every customer to be happy and want a bulletproof, stable product at the same time. Unfortunately neither is 100% possible; that’s why we have to be very careful.

Just as you have spent a ton of time trying to get what you want, we believe we have spent enough time explaining our position to you as well. We have said to you over and over that what you want is not possible, and you simply would not believe us. That’s really not how it works here. If we can fix something for you, we will fix it right away, just as we did; if we can’t fix something for you, we won’t do it no matter what you do. So there is no point in either of us spending any more time on this issue. As such, this issue is now closed. This will be our last reply on this issue.

Thanks!
DMichael
Posted: Wednesday, February 5, 2014 2:25:43 PM
Rank: Newbie
Groups: Member

Joined: 2/5/2014
Posts: 1
In our evaluation of EO.Pdf and EVO.Pdf, the average single-threaded render time for 100 PDFs was 1.12 seconds for EO.Pdf and 4.34 seconds for EVO.Pdf.

