|
Rank: Member Groups: Member
Joined: 10/6/2014 Posts: 21
|
We are evaluating your product for high-volume HTML-to-PDF conversion on an important eGovernment site. Our tests so far indicate that conversion does not scale beyond 6 conversions in parallel (in different threads). Is this a known limitation? Is there anything we can configure to get better scalability? We have set HtmlToPdf.MaxConcurrentTaskCount = 100.
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,196
|
Hi,
The maximum number of parallel conversions depends on many factors. The most important are the complexity of your page and your system configuration.
The HTML to PDF converter is both network intensive and CPU intensive, so either can slow it down. The amount of memory in your system can also have a huge impact, because every conversion consumes memory. When the system runs out of physical memory, it starts swapping memory contents to disk, so in the end the converter is effectively running from your hard drive instead of from memory (assuming you don't hit an out of memory exception first). The size and complexity of the HTML file obviously have a big impact on all of these aspects (network, CPU and memory).
Because the converter is CPU intensive, performance generally degrades very quickly when the number of parallel conversions significantly exceeds the number of CPU cores available in your system. When there are more "tasks" than CPU cores, the OS has to switch between these tasks frequently. This gives every task a chance to run, but since each one gets a smaller slice of the CPU, each one takes longer to finish. If a task takes too long and reaches the timeout value, it is terminated and you receive an exception.
As a general guideline, you should not have more than twice as many parallel conversions as CPU cores available. For example, if your system has 4 cores, you should not have more than 8 parallel tasks.
Note that this number of parallel tasks is not the number of parallel users you can have. For example, if you run the converter on a web server, you may have 100 concurrent users browsing your site, but that does not mean the web server is serving pages to all 100 users at the exact same moment. That would only happen if all 100 users kept clicking refresh non-stop, which obviously is not the case: most users load a page, read it for a while, and then load another page, and many of these pages are loaded from the cache. At any given time, the number of pages being served at the same moment is probably in the single digits. The number of concurrent users is therefore very different from the number of concurrent tasks. Even if your web server needs to handle 100 concurrent users, it may be sufficient for the converter to run only 8 parallel conversions. Having 100 parallel conversions is an extremely high value. It is not realistic to achieve this number.
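The sizing argument above can be sketched in a few lines. This is an illustrative Python sketch, not EO.Pdf code: `max_parallel_conversions` and `expected_in_flight` are hypothetical helper names, the factor of 2 is only the rule of thumb stated above, and the request rate and conversion time in the note below are made-up numbers.

```python
import os


def max_parallel_conversions(cores=None, factor=2):
    """Rule-of-thumb cap: at most `factor` parallel conversions per CPU core."""
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return factor * cores


def expected_in_flight(requests_per_second, seconds_per_conversion):
    """Little's law: average work in flight = arrival rate * time per item."""
    return requests_per_second * seconds_per_conversion
```

For instance, `max_parallel_conversions(4)` gives 8, matching the 4-core example. If 100 users together trigger about 2 conversions per second and each conversion takes 1.5 seconds (both assumed values), `expected_in_flight(2, 1.5)` gives 3 conversions in flight on average, well under that cap.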
You can check all of these factors to see if you can identify a bottleneck. For example, if all your CPU cores are running at a high level all the time, your CPU is overloaded, and to run more parallel conversions you need a system with a faster CPU or more CPU cores. If you get out of memory exceptions all the time, you need more memory. If your web server can't serve pages to the converter fast enough, your web server is overloaded. Once you identify the bottleneck, you will be able to address it and gain higher performance.
Hope this helps. Please feel free to let us know if you have any more questions.
Thanks
|
|
Rank: Member Groups: Member
Joined: 10/6/2014 Posts: 21
|
I have 32 cores and plenty of memory. There's no network IO in my evaluation test case - it's a simple local html without external references. That's why I expect scalability to almost 32 - at least a lot more than 6. (And I have many years of experience tuning .net applications.)
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,196
|
Hi,
In that case it should handle more than 6. What error are you getting?
Thanks!
|
|
Rank: Member Groups: Member
Joined: 10/6/2014 Posts: 21
|
No error, but the throughput doesn't increase with more than 6 threads, and the total CPU usage doesn't exceed ca. 18%.
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,196
|
Hi,
That doesn't make sense. We don't do anything special on threading except make sure the APIs are multi-thread safe, and our typical multi-thread test cases use 20 to 30 threads. Usually the CPU load maxes out before that.
One thing you can try is to use your own code to read the HTML file into memory, then call HtmlToPdf.ConvertHtml instead of HtmlToPdf.ConvertUrl, rerun the test and see if you get a different result. This might tell you whether the bottleneck is in the file IO portion.
If that does not reveal anything, please try to isolate the problem into a small test app and send the test app to us. Once we have that, we will try to run it here to see if we can find anything. You can find more information on how to submit a test app here: http://www.essentialobjects.com/forum/test_project.aspx
Thanks!
|
|
Rank: Member Groups: Member
Joined: 10/6/2014 Posts: 21
|
Hi,
The test program is submitted to your support email address.
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,196
|
Hi,
We did not receive your test project, so you might want to check on your end whether it went through. Make sure you do not include our DLLs.
Thanks!
|
|
Rank: Member Groups: Member
Joined: 10/6/2014 Posts: 21
|
It was sent 06.10.2014 23:38 (Central European Time) without any error messages. An attached zip file of 20kb without your dlls. Do you have a workspace I could upload to?
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,196
|
Hi,
Can you resend it through a Web based mail system? We did not receive anything on our end at all. It is very common for email systems in big companies and government agencies to filter emails for security reasons, particularly emails that contain "script", which is almost always the case for a test project since it contains source code files.
Thanks!
|
|
Rank: Member Groups: Member
Joined: 10/6/2014 Posts: 21
|
Hi,
It's resent now from another account.
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,196
|
Hi,
We have received the test project. We are looking into it and we will reply again when we have an update or need any additional information from you.
Thanks!
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,196
|
Hi,
This is just to let you know that we are still working on this issue. We have identified and removed a bottleneck, and our tests have shown significant improvements. We are still running some more tests here, and hopefully we will have a new build for you early next week.
Thanks!
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,196
|
Hi,
This is just to let you know that we have posted a new build on our download page (EO.Total 2014.0.23). The EO.Pdf.dll version in this build is 6.0.19.2. Our tests indicate that this build scales much better. Please take a look on your end and let us know how it goes.
Thanks!
|
|
Rank: Member Groups: Member
Joined: 10/6/2014 Posts: 21
|
Hi,
Sorry, but I can’t measure any improvements. The test program gives about the same output as the output in the package I sent you. The throughput is best with 9 threads on my 32 core machine (running Win2012R2). Have you tested on a machine with > 16 cores?
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,196
|
Hi,
No. Currently we do not have a system with > 16 cores available. We tested on a system with 6 cores, and the throughput steadily climbs to about 15 when using 10 threads and to 20 when using 20 threads. With 20 threads the CPU usage stays at almost 100%, so when we increase the thread count even more the number does not change much. Compare this with the old build, where the CPU usage was constantly below 50% even with 20 threads and the output was therefore much lower.
You might want to try two things:
1. Add a small random delay in your thread function. Because your parallel threads all process exactly the same file, it is highly likely that all threads do exactly the same thing at the same time. When one thread is at a CPU intensive stage, all the other threads are at a CPU intensive stage too, and when one thread is waiting for something, all the other threads are waiting too. Adding a small random delay in each thread breaks this pattern and offsets the threads from one another;
2. Try a different system with a different CPU configuration or OS. That might be able to give us some pointers here.
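The random delay in suggestion 1 could look roughly like the sketch below. It is written in Python only to show the idea and is not EO.Pdf code: `run_with_jitter`, `start_workers` and the 50 ms default are hypothetical, and `task` stands in for whatever function performs one conversion in your test app.

```python
import random
import threading
import time


def run_with_jitter(task, max_delay_s=0.05):
    """Sleep for a random 0..max_delay_s seconds, then run `task`."""
    time.sleep(random.uniform(0.0, max_delay_s))
    return task()


def start_workers(task, thread_count, max_delay_s=0.05):
    """Run `task` once on each of `thread_count` threads, each with its own offset."""
    results = []
    lock = threading.Lock()

    def worker():
        value = run_with_jitter(task, max_delay_s)
        with lock:  # protect the shared results list
            results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each thread draws its own delay, so threads that would otherwise hit the same stage at the same moment start slightly out of step.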
Thanks!
|
|
Rank: Member Groups: Member
Joined: 10/6/2014 Posts: 21
|
On a 40 core HP DL580, throughput peaks with 9 threads at 10 conversions/s. On a 160 core HP DL960 it peaks with 32 threads, but the throughput is only 13/s.
If all the threads are pushing on the same resource at the same time, it should stabilize after some time, because they will have different wait times. Anyway, unless you are spawning off several threads in parallel for each conversion we would expect linear scalability until the number of threads gets close to the number of CPUs?
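To make the gap from linear scaling concrete, the figures quoted in this thread can be compared per thread. This is a small Python sketch with a hypothetical helper, `per_thread_throughput`. It assumes the "15" and "20" reported for the 6 core machine are conversions per second, which the post does not state explicitly.

```python
def per_thread_throughput(conversions_per_second, threads):
    """Average conversions per second contributed by each thread."""
    return conversions_per_second / threads


# (machine, threads, conversions/s) as reported in this thread
reported = [
    ("6 core test machine", 10, 15),
    ("6 core test machine", 20, 20),
    ("40 core HP DL580", 9, 10),
    ("160 core HP DL960", 32, 13),
]

for machine, threads, rate in reported:
    share = per_thread_throughput(rate, threads)
    print(f"{machine}: {threads} threads, {rate}/s, {share:.2f}/s per thread")
```

With linear scaling the per-thread figure would stay roughly constant as threads are added, up to the number of cores. Here it is about 1.11/s per thread at 9 threads on the 40 core machine and about 0.41/s at 32 threads on the 160 core machine.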
I could send you a JetBrains DotTrace session showing the bottlenecks on our machines. It doesn’t tell me much because your code is obfuscated. A thread dump from WinDbg indicates that a lot of the threads have a call stack like this:
[InlinedCallFrame: 000000bb9a2ce2a0] System.Windows.Forms.UnsafeNativeMethods.WaitMessage()
[InlinedCallFrame: 000000bb9a2ce2a0] System.Windows.Forms.UnsafeNativeMethods.WaitMessage()
System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(IntPtr, Int32, Int32)
System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
EO.Pdf.Internal.mb.g()
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,196
|
Hi,
Your result is very puzzling. Our "average" test machine, which only has 6 cores, can easily reach 20 to 25 conversions/s, which is almost double your output, and I would assume that our machine is not nearly as powerful as yours. The result will not be linear as you increase the number of threads, since the more threads you have, the more overhead they introduce. However, throughput should usually continue to increase until the thread count is at least close to the number of CPUs.
The WaitMessage time is not unusual though. Internally the converter runs a message loop (one for each of your calling threads). Basically it waits for a message and then dispatches it to do all kinds of different things, so WaitMessage should be the function most called by the converter. Nevertheless, a dotTrace session might shed some light on what else is happening, and we can read the obfuscated code. So please email the trace session to us. You can find our email address in the "submit test project" link at the top of the forum.
If that does not reveal anything, we would like to have remote desktop access to your test machine if possible. We will need to add some debug information, run it on your machine and then see if we can catch something. We might need to repeat this process many times, so if we do that, please be prepared for it to take some time.
Thanks!
|
|
Rank: Member Groups: Member
Joined: 10/6/2014 Posts: 21
|
Hi,
A dotTrace session is sent by email.
|
|