
Large HTML to PDF conversions speed improvement
rsdev
Posted: Wednesday, August 6, 2014 5:06:22 PM
Hi,

Using EO.Pdf 2013.0.112/5.0.87.2 with Corporate License.

Test Code:
Code: C#
using System.Drawing;   // RectangleF
using System.IO;        // File, MemoryStream
using EO.Pdf;           // HtmlToPdf, HtmlToPdfOptions, PdfPageSizes

var options = new HtmlToPdfOptions();
options.TriggerMode = HtmlToPdfTriggerMode.Auto;
options.PageSize = PdfPageSizes.Letter;
// 0.25 inch margin on every side (EO.Pdf page units are inches)
options.OutputArea = new RectangleF(0.25f, 0.25f, PdfPageSizes.Letter.Width - (2f * 0.25f), PdfPageSizes.Letter.Height - (2f * 0.25f));
options.NoScript = false;
options.NoCache = false;
options.NoLink = false;
options.PreserveHighResImages = false;
options.MinLoadWaitTime = 250;       // milliseconds
options.MaxLoadWaitTime = 300000;    // milliseconds (5 minutes)

var html = File.ReadAllText(@"C:\testfile.html");
using (var ms = new MemoryStream())
{
    HtmlToPdf.ConvertHtml(html, ms, options);
    File.WriteAllBytes(@"C:\testfile.pdf", ms.ToArray());
}


1st Run, ConvertHtml time: 8.4 seconds
2nd Run, ConvertHtml time: 7.4 seconds

Test File: Test File - Test File (mirror)

I am investigating slow conversion times when generating very large PDFs from HTML. Could you suggest any changes I could make to reduce the conversion time, and possibly profile the library to see whether there are any real bottlenecks or issues? A welcome enhancement to the library would be a way to output metrics (elapsed time) for each step of the conversion process.
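Pending such a feature, a caller can at least time each step it controls. The sketch below uses `System.Diagnostics.Stopwatch`; `StepTimer` and its `Measure` and `Report` methods are hypothetical helpers, not part of EO.Pdf, and they can only time whole calls such as `ConvertHtml`, not the library's internal stages (resource loading, layout, page breaking).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

// Hypothetical helper (not part of EO.Pdf): records the elapsed wall-clock
// time of each named step the caller wraps, in the order the steps ran.
public sealed class StepTimer
{
    private readonly List<KeyValuePair<string, TimeSpan>> steps =
        new List<KeyValuePair<string, TimeSpan>>();

    // Read-only view of the recorded (name, elapsed) pairs.
    public IList<KeyValuePair<string, TimeSpan>> Steps
    {
        get { return steps.AsReadOnly(); }
    }

    // Runs the action and records how long it took, even if it throws.
    public void Measure(string name, Action action)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            action();
        }
        finally
        {
            sw.Stop();
            steps.Add(new KeyValuePair<string, TimeSpan>(name, sw.Elapsed));
        }
    }

    // Writes one "name: seconds" line per recorded step.
    public void Report(TextWriter writer)
    {
        foreach (var step in steps)
        {
            writer.WriteLine("{0}: {1:F2} s", step.Key, step.Value.TotalSeconds);
        }
    }
}

// Example wiring around the test code (EO.Pdf calls as in the post):
//   var timer = new StepTimer();
//   string html = null;
//   timer.Measure("read HTML", () => html = File.ReadAllText(@"C:\testfile.html"));
//   using (var ms = new MemoryStream())
//   {
//       timer.Measure("ConvertHtml", () => HtmlToPdf.ConvertHtml(html, ms, options));
//       timer.Measure("write PDF", () => File.WriteAllBytes(@"C:\testfile.pdf", ms.ToArray()));
//   }
//   timer.Report(Console.Out);
```

Recording in a `finally` block means a conversion that fails or times out still shows up in the report with how long it ran.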

This test file is very basic: it has no links, images, or styles. The HTML we use in production is more complex, with styles and images, so it takes longer to convert, and we want to produce even larger documents. Our production conversions take around 18 seconds in total, but those pages also have to fetch JavaScript, CSS, and images and are generally more complex. It would be great to reduce the overall time.

Thanks.
eo_support
Posted: Wednesday, August 6, 2014 8:30:21 PM
Hi,

Two things in the converter account for most of the total time taken to convert an HTML file. One is loading all the resources, such as images and JavaScript. The other is post-processing, such as calculating the coordinates of all the elements, deciding where to insert page breaks, and converting image data. Performance degrades considerably as the file gets significantly bigger, because the conversion takes more memory and all the corresponding internal data structures grow.

One thing you can try is to split the file into chunks and convert each chunk in a separate thread. Because almost all modern systems have multiple CPU cores, this allows the OS to schedule these threads on different cores, which can sometimes give a significant performance gain. After you have converted all the chunks, you can call PdfDocument.Merge to merge them into a single document.
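A minimal C# sketch of the split / convert in parallel / merge approach, under stated assumptions: the HTML body is a flat sequence of top-level blocks that can be cut after a closing tag such as `</p>` without breaking the markup, and each chunk is re-wrapped with the original `<head>` so styles still apply. `ChunkedConversion`, `SplitAtBoundary` and `ConvertInParallel` are hypothetical helpers; conversion and merging are passed in as delegates so the chunking and ordering logic does not depend on EO.Pdf. With this approach each chunk would start on a new page, and anything computed per conversion (for example page numbers in headers or footers) would be computed per chunk.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helpers (not part of EO.Pdf) for split / convert / merge.
public static class ChunkedConversion
{
    // Cuts bodyHtml right after each occurrence of `boundary`, then packs the
    // pieces into chunks of at most maxChunkChars characters. A single piece
    // longer than maxChunkChars becomes a chunk of its own.
    public static List<string> SplitAtBoundary(string bodyHtml, string boundary, int maxChunkChars)
    {
        if (string.IsNullOrEmpty(boundary))
            throw new ArgumentException("boundary must be non-empty", "boundary");

        var pieces = new List<string>();
        int pos = 0;
        while (pos < bodyHtml.Length)
        {
            int idx = bodyHtml.IndexOf(boundary, pos, StringComparison.Ordinal);
            int end = idx < 0 ? bodyHtml.Length : idx + boundary.Length;
            pieces.Add(bodyHtml.Substring(pos, end - pos));
            pos = end;
        }

        var chunks = new List<string>();
        var current = new StringBuilder();
        foreach (var piece in pieces)
        {
            if (current.Length > 0 && current.Length + piece.Length > maxChunkChars)
            {
                chunks.Add(current.ToString());
                current.Clear();
            }
            current.Append(piece);
        }
        if (current.Length > 0) chunks.Add(current.ToString());
        return chunks;
    }

    // Converts every chunk on the thread pool, stores each result at its
    // chunk's index so the original order is kept, then merges once.
    public static TResult ConvertInParallel<TPart, TResult>(
        IList<string> chunkDocuments,
        Func<string, TPart> convertChunk,
        Func<TPart[], TResult> merge)
    {
        var parts = new TPart[chunkDocuments.Count];
        Parallel.For(0, chunkDocuments.Count, i => parts[i] = convertChunk(chunkDocuments[i]));
        return merge(parts);
    }
}

// Possible wiring with EO.Pdf. Assumption: the ConvertHtml overload taking a
// PdfDocument is available in your version; CreateOptions is a hypothetical
// method that builds a fresh HtmlToPdfOptions per chunk, as in the test code.
//   var docs = ChunkedConversion.SplitAtBoundary(body, "</p>", 200000)
//       .ConvertAll(c => "<html><head>" + head + "</head><body>" + c + "</body></html>");
//   PdfDocument merged = ChunkedConversion.ConvertInParallel(docs, h =>
//   {
//       var doc = new PdfDocument();
//       HtmlToPdf.ConvertHtml(h, doc, CreateOptions());
//       return doc;
//   }, parts => PdfDocument.Merge(parts));
//   merged.Save(@"C:\testfile.pdf");
```

Storing results by index rather than collecting them as threads finish is what keeps the merged pages in document order; the chunk size is a tuning knob worth measuring, since too many small chunks add per-conversion overhead.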

Hope this helps.

Thanks!