|
Rank: Newbie Groups: Member
Joined: 3/31/2014 Posts: 2
|
Hi guys,
We are planning to use EO.PDF (we have a licence already) in a relatively high-load enterprise service.
However, we are concerned about its memory footprint.
I have created a simple test console application (code will be given in the next post). Based on results of measurements it seems that there might be a memory leak.
My methodology:
1) Use an HTML page with a mix of different content, e.g. angularjs.org.
2) Convert that HTML page to a local PDF file.
3) Repeat for N iterations and log the memory used.
4) Use the latest EO.PDF via NuGet: <package id="EO.Pdf" version="5.0.72" targetFramework="net45" />
Configuration and results (re-formatted console dump):
C:\...onversionPerformanceTester\PdfConversionPerformanceTester\bin\Debug>PdfConversionPerformanceTester.exe
Please enter the URL (default of 'http://angularjs.org'):
Please enter the output path (default of 'C:\Temp\PdfPerformanceTest'):
Please enter the number of iterations (default of '10'):
20
Please enter the PDF Converter (evo / eo) (default of 'evo'):
eo
Press any key to start...
1 (7 MB -> 96 MB, 15747 ms);
2 (96 MB -> 97 MB, 6091 ms);
3 (97 MB -> 98 MB, 11364 ms);
4 (98 MB -> 98 MB, 5802 ms);
5 (98 MB -> 99 MB, 5826 ms);
6 (99 MB -> 98 MB, 5769 ms);
7 (98 MB -> 98 MB, 5803 ms);
8 (98 MB -> 99 MB, 5774 ms);
9 (99 MB -> 99 MB, 5803 ms);
10 (99 MB -> 100 MB, 5946 ms);
11 (100 MB -> 99 MB, 5984 ms);
12 (99 MB -> 100 MB, 5834 ms);
13 (100 MB -> 100 MB, 5466 ms);
14 (100 MB -> 100 MB, 5920 ms);
15 (100 MB -> 100 MB, 5917 ms);
16 (100 MB -> 100 MB, 5789 ms);
17 (100 MB -> 101 MB, 5810 ms);
18 (101 MB -> 101 MB, 5936 ms);
19 (101 MB -> 101 MB, 5831 ms);
20 (101 MB -> 101 MB, 5888 ms);
Press any key to exit...
Please let me know your thoughts. Is my measurement methodology flawed? Is there a memory leak? What can be done to fix it?
Thank you.
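One way to read the numbers above is to separate the first-iteration warm-up from the growth that follows. The sketch below hard-codes the "after" values from the dump and computes the average growth per iteration, starting from the end of iteration 1. The class and method names are made up for illustration and are not part of the test application.

```csharp
using System;

public static class GrowthEstimate
{
    // "After" values in MB for iterations 1..20, copied from the console dump above.
    public static readonly int[] AfterMb =
    {
        96, 97, 98, 98, 99, 98, 98, 99, 99, 100,
        99, 100, 100, 100, 100, 100, 101, 101, 101, 101
    };

    // Average growth per iteration, ignoring the first `warmUp` samples.
    public static double AverageGrowthMb(int[] samples, int warmUp)
    {
        if (samples.Length - warmUp < 2)
            throw new ArgumentException("Need at least two samples after warm-up.");

        int first = samples[warmUp];
        int last = samples[samples.Length - 1];
        return (double)(last - first) / (samples.Length - 1 - warmUp);
    }

    public static void Main()
    {
        // warmUp = 0 uses the end of iteration 1 (96 MB) as the baseline,
        // which already excludes the initial 7 MB -> 96 MB jump.
        Console.WriteLine("{0:F2} MB per iteration", AverageGrowthMb(AfterMb, 0));
    }
}
```

With these figures the growth after warm-up is 5 MB over 19 iterations, or roughly 0.26 MB per iteration. That is small enough that 20 iterations alone probably cannot distinguish a slow leak from caches filling up.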
|
|
Rank: Newbie Groups: Member
Joined: 3/31/2014 Posts: 2
|
This is the code of the test application mentioned above. Thanks.
Code: C#
//Program.cs
using System;
using System.Diagnostics;
using System.IO;

public class Program
{
    public static void Main(string[] args)
    {
        var url = GetArgumentFromConsole("URL", "http://angularjs.org");
        var outputPath = GetArgumentFromConsole("output path", @"C:\Temp\PdfPerformanceTest");
        var iterations = int.Parse(GetArgumentFromConsole("number of iterations", "10"));
        var pdfConverterName = GetArgumentFromConsole("PDF Converter (evo / eo)", "evo");
        Prompt("Press any key to start...");

        Directory.CreateDirectory(outputPath);
        var pdfConverter = CreatePdfConverter(pdfConverterName);
        Console.WriteLine();

        var sw = new Stopwatch();
        for (int i = 0; i < iterations; i++)
        {
            // Log private memory before the conversion.
            Console.Write("{0} ({1} MB -> ", i + 1, GetCurrentProcessPrivateMemorySizeMb());
            var fileName = string.Format("{0}_{1}.pdf", pdfConverterName, Guid.NewGuid());

            // Time only the conversion itself.
            sw.Restart();
            pdfConverter.Convert(url, Path.Combine(outputPath, fileName));
            sw.Stop();

            // Force collections before sampling memory again.
            // Note: GC.Collect(2) alone already collects generations 0 through 2.
            GC.Collect(0);
            GC.Collect(1);
            GC.Collect(2);

            // Log private memory after the conversion, plus the elapsed time.
            Console.Write("{0} MB, {1} ms); ", GetCurrentProcessPrivateMemorySizeMb(), sw.ElapsedMilliseconds);
        }

        Console.WriteLine();
        Prompt("Press any key to exit...");
    }

    // Private bytes of the whole process (managed and native), in MB.
    private static long GetCurrentProcessPrivateMemorySizeMb()
    {
        return Process.GetCurrentProcess().PrivateMemorySize64 / (1024 * 1024);
    }

    private static void Prompt(string message)
    {
        Console.WriteLine(message);
        Console.ReadKey(intercept: true);
    }

    private static IPdfConverter CreatePdfConverter(string pdfConverterName)
    {
        switch (pdfConverterName)
        {
            case "evo":
                return new EvoPdfConverter();
            case "eo":
                return new EoPdfConverter();
            default:
                throw new ArgumentOutOfRangeException("pdfConverterName");
        }
    }

    // Reads one line from the console; an empty answer selects the default.
    private static string GetArgumentFromConsole(string argumentName, string defaultValue)
    {
        Console.WriteLine("Please enter the {0} (default of '{1}'):", argumentName, defaultValue);
        var input = (Console.ReadLine() ?? string.Empty).Trim();
        if (string.IsNullOrEmpty(input))
            return defaultValue;
        return input;
    }
}

//IPdfConverter.cs
public interface IPdfConverter
{
    void Convert(string url, string localPath);
}

//EoPdfConverter.cs
using EO.Pdf;

public class EoPdfConverter : IPdfConverter
{
    public void Convert(string url, string localPath)
    {
        HtmlToPdf.ConvertUrl(url, localPath);
    }
}

//EvoPdfConverter.cs
public class EvoPdfConverter : IPdfConverter
{
    public void Convert(string url, string localPath)
    {
        //Code omitted as probably not relevant, we were testing some other frameworks for comparison.
    }
}
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,196
|
Hi,
We are not aware of any memory leaks. We do maintain some internal caches (for example, a custom font cache), so it's possible that you see memory rise at the beginning. However, if you run your test for an extended period of time (for example, an hour), you should see the memory usage going up and down instead of just going up steadily. Additionally, because .NET uses a garbage collector, memory is sometimes not freed right away even when it is no longer in use. GC.Collect triggers a garbage collection, but some of the cleanup (such as finalizers) runs on a separate thread, which makes it harder to measure memory usage precisely.
Thanks!
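For what it's worth, a measurement loop can reduce the noise described above by forcing a full collection, waiting for pending finalizers, and collecting again before sampling. The sketch below uses only standard .NET calls. `MemorySampler`, `MemorySample` and `Sample` are illustrative names, and the byte array merely stands in for a conversion.

```csharp
using System;
using System.Diagnostics;

public static class MemorySampler
{
    public struct MemorySample
    {
        public long ManagedBytes;  // managed heap only
        public long PrivateBytes;  // whole process, including native allocations
    }

    public static MemorySample Sample()
    {
        // Collect, let the finalizer thread drain its queue, then collect
        // whatever the finalizers released.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        using (var process = Process.GetCurrentProcess())
        {
            return new MemorySample
            {
                ManagedBytes = GC.GetTotalMemory(forceFullCollection: true),
                PrivateBytes = process.PrivateMemorySize64
            };
        }
    }

    public static void Main()
    {
        var before = Sample();

        // Placeholder for the work being measured (e.g. one conversion).
        var placeholder = new byte[10 * 1024 * 1024];
        placeholder = null;

        var after = Sample();
        Console.WriteLine("managed: {0} -> {1} bytes", before.ManagedBytes, after.ManagedBytes);
        Console.WriteLine("private: {0} -> {1} bytes", before.PrivateBytes, after.PrivateBytes);
    }
}
```

Logging both numbers may help narrow things down: if the managed figure stays flat while private bytes keep climbing, that would suggest the growth is in native memory or in heap segments the runtime has not yet returned to the OS, rather than in live managed objects.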
|
|
Rank: Member Groups: Member
Joined: 8/17/2012 Posts: 22
|
Hi
I am experiencing this memory leak too. I have noticed it creep in only in the last few versions.
I explicitly call GC.Collect after I use EO.PDF, and it still leaks by about 10 MB per PDF generation.
I have to manually reset the application that uses EO.PDF every few days because the memory load will get as high as 1 GB and starts to interfere with other processes.
It's hard to track down (and even to attribute) memory leaks, so I don't expect an immediate solution to this. It may even be due to a coincidental .NET update that happened this year.
I just wanted others to be mindful of this so that we have more leads to find the cause.
Many thanks
Todd
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,196
|
Hi Todd,
Thank you very much for the information. We haven't been able to produce a concrete reproducing sample for this yet. There are various things that might cause a memory leak. Some leaks are caused by the JavaScript code, while others can be caused by the converter. Even for the converter there might be different cases. For example, we recently fixed a leak related to custom fonts, which only occurs when the page uses a custom font. As such, it is very important for us to have a reproducing case first. So if you happen to be able to isolate it, please feel free to send it to us and we will be very happy to investigate further.
Thanks!
|
|
Rank: Member Groups: Member
Joined: 8/17/2012 Posts: 22
|
Hi
I have tracked this down to
Quote: HtmlToPdf.Options.GeneratePageImages = true;
When set to false, the memory leak goes away. I have tested this by generating hundreds of PDFs in tight recursion: with false, the residual memory is about 50 MB; with true, the residual memory easily climbs past 300 MB and does not come down. This is even after garbage collection and thread killing.
I hope that this helps.
Thanks
Todd
|
|
Rank: Member Groups: Member
Joined: 8/17/2012 Posts: 22
|
Hi
Another clue is that when I set this in the document generation options (as opposed to the static class), the leak is (oddly) much reduced.
Quote:
htmlToPdfOptions = new HtmlToPdfOptions();
htmlToPdfOptions.GeneratePageImages = true;
HtmlToPdf.ConvertHtml(html, pdfDocument, htmlToPdfOptions);
There may be something about the way that this data is stored in the class that may be at fault, but at least I have found a solution in my own case.
I hope that this helps.
Regards
Todd
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,196
|
Hi,
Thanks for the update. We have investigated this issue. What you observed makes sense and should not be an issue.
The "memory leak" you see when you use the static HtmlToPdfOptions comes from the result images that are made accessible to you through HtmlToPdf.Result.PageImages. Because HtmlToPdf.Result is a static property, the images stay accessible to your code and are not garbage collected while it still references them. The images will be de-referenced and garbage collected when you perform another conversion in the same thread, at which point HtmlToPdf.Result.PageImages is updated and the old images are released.
The workaround you found is a good one. It bypasses the static global HtmlToPdfOptions object and uses your own local options, so in this case the images are referenced by your own local HtmlToPdfOptions object. The images are then released when your own copy of HtmlToPdfOptions is released.
Thanks
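The general mechanism described above (an object reachable from a static member survives collection until the reference is replaced) can be shown without EO.PDF at all. The following is a minimal sketch using plain .NET types; `StaticRootDemo`, `lastResult` and the method names are hypothetical stand-ins, not EO.PDF's actual internals.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class StaticRootDemo
{
    // Hypothetical stand-in for a static "last result" holder.
    private static byte[] lastResult;

    // Simulates a conversion whose result is kept in a static field.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference ConvertAndStoreStatically()
    {
        lastResult = new byte[10 * 1024 * 1024];
        return new WeakReference(lastResult);
    }

    // Simulates a conversion whose result is only held by a local variable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference ConvertWithLocalResult()
    {
        var localResult = new byte[10 * 1024 * 1024];
        return new WeakReference(localResult);
    }

    public static void FullCollect()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }

    public static void Main()
    {
        var viaStatic = ConvertAndStoreStatically();
        var viaLocal = ConvertWithLocalResult();
        FullCollect();

        // The static field still roots the first buffer; the local one has no root left.
        Console.WriteLine("static-held result alive: {0}", viaStatic.IsAlive);
        Console.WriteLine("local-held result alive:  {0}", viaLocal.IsAlive);

        // A second "conversion" overwrites the static field, releasing the first buffer.
        ConvertAndStoreStatically();
        FullCollect();
        Console.WriteLine("first result alive after second conversion: {0}", viaStatic.IsAlive);
    }
}
```

The static-held buffer is guaranteed to survive the first collection. The other two buffers would typically be reported as collected, although that depends on the runtime and build settings.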
|
|
Rank: Member Groups: Member
Joined: 8/17/2012 Posts: 22
|
Hi
I knew you'd solve it!
So people reading this should understand that this is more of a design feature to be careful with, especially in multi-threaded applications (like mine).
This explanation makes sense in my case. Given that our application would run up to 100 threads, if these images (e.g. 10 MB each) were stored statically, then the total storage would eventually grow to 100 x 10 MB = about 1 GB ... and so the static class's memory overhead seemed like a leak, but it was just dutifully storing images as required.
Thanks for your hard work. I'll make changes to my code to accommodate this behaviour.
Todd
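The per-thread arithmetic above can be sketched as follows. This assumes, based only on the "same thread" wording earlier in the thread, that the last result is held once per thread; the `[ThreadStatic]` field here is a hypothetical model of that behaviour, not a description of how EO.PDF stores it.

```csharp
using System;
using System.Threading;

public static class PerThreadRetentionDemo
{
    // Hypothetical model: each thread keeps its own "last result".
    [ThreadStatic]
    private static byte[] lastResultForThisThread;

    // Upper bound if every thread holds one result of the given size.
    public static long WorstCaseRetainedMb(int threads, int mbPerResult)
    {
        return (long)threads * mbPerResult;
    }

    // Starts threadCount threads that each store one buffer and stay alive,
    // then reports how much the managed heap grew while they were alive.
    public static long MeasureRetainedBytes(int threadCount, int bytesPerResult)
    {
        long baseline = GC.GetTotalMemory(true);
        using (var allStored = new CountdownEvent(threadCount))
        using (var release = new ManualResetEventSlim(false))
        {
            var threads = new Thread[threadCount];
            for (int i = 0; i < threadCount; i++)
            {
                threads[i] = new Thread(() =>
                {
                    lastResultForThisThread = new byte[bytesPerResult];
                    allStored.Signal();
                    release.Wait(); // while the thread lives, its buffer stays referenced
                });
                threads[i].Start();
            }

            allStored.Wait();
            long retained = GC.GetTotalMemory(true) - baseline;

            release.Set();
            foreach (var thread in threads)
                thread.Join();
            return retained;
        }
    }

    public static void Main()
    {
        // 100 threads x 10 MB, the figures used in the post above.
        Console.WriteLine("Worst case: {0} MB", WorstCaseRetainedMb(100, 10));

        long retained = MeasureRetainedBytes(8, 1024 * 1024);
        Console.WriteLine("Retained with 8 live threads: about {0} MB", retained / (1024 * 1024));
    }
}
```

Under this model the retained memory scales with the number of live threads that have performed a conversion, so capping the number of worker threads would also cap the overhead.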
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,196
|
You are very welcome. Please feel free to let us know if there is anything else.
Thanks!
|
|