
Performance while merging PDF files
CLXfeis
Posted: Wednesday, February 14, 2018 10:42:46 AM
Rank: Member
Groups: Member

Joined: 9/11/2017
Posts: 13
We have performance issues when merging more than 100 PDF files. Apparently the fastest way is an incremental merge, which also keeps memory usage low, but it requires all of the files to be saved on disk.
The method used is
Code: C#
Merge(string fileName1, string fileName2);

Is it correct that the other merge methods, the ones that don't take file names, do a deep merge?
Why isn't there an overload that takes MemoryStream objects? It would make things easier and more flexible. Right now I need to save a generated PDF file to disk first in order to do incremental merges with other files.
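Roughly, that workaround would look like this (just a sketch; generatedPdf, otherFileNames and the temp file name are placeholders, and the EO.Pdf namespace is assumed):
Code: C#
// Sketch only: persist the in-memory PDF to disk first,
// then merge the remaining files into it incrementally by file name.
using System.IO;
using EO.Pdf;

// 1. Save the generated PDF (a MemoryStream) to a temporary file.
generatedPdf.Position = 0;
using (var fs = File.Create("generated.pdf"))
    generatedPdf.CopyTo(fs);

// 2. Incrementally merge each remaining file into the file on disk.
foreach (string fileName in otherFileNames)
    PdfDocument.Merge("generated.pdf", fileName);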
eo_support
Posted: Thursday, February 15, 2018 1:31:52 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,221
Hi,

This is a valid point. We will look into it and see what we can do. However, please keep in mind that one of the reasons the incremental merge saves memory is that it does not load the file contents into memory at all --- the vast majority of the file content is left on disk. So having an incremental merge on a MemoryStream object would somewhat defeat that purpose.

In the meantime you can try a binary approach to merge your files: merge two files at a time --- in each pass you would merge file #1 with #2, file #3 with #4, and so on, then repeat this process until the total number of files is reduced to one. This should work faster than merging all 100 files at once.
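Such a pairwise scheme could look roughly like this (just a sketch; the EO.Pdf namespace is assumed, and we assume the two-file-name Merge overload appends the second file to the first one on disk):
Code: C#
using System.Collections.Generic;
using EO.Pdf;

static string MergePairwise(List<string> files)
{
    // Keep merging in passes until only a single file remains.
    while (files.Count > 1)
    {
        var survivors = new List<string>();
        for (int i = 0; i + 1 < files.Count; i += 2)
        {
            // Merge file #i+1 into file #i; file #i moves on to the next pass.
            PdfDocument.Merge(files[i], files[i + 1]);
            survivors.Add(files[i]);
        }
        if (files.Count % 2 == 1)
            survivors.Add(files[files.Count - 1]); // odd file carries over unchanged
        files = survivors;
    }
    return files[0];
}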

Thanks!
CLXfeis
Posted: Thursday, February 15, 2018 2:23:20 AM
Rank: Member
Groups: Member

Joined: 9/11/2017
Posts: 13
Thank you for your hints. It's obvious that an approach with MemoryStream objects could lead to higher memory consumption, but even if one or two PDF files plus the resulting file are kept in memory, we could live with that.
Maybe you could also provide a merge method like Merge(string fileName1, MemoryStream file2)?

Why should the binary approach be faster? I found that the fastest way is to load all files to be merged into an array of PdfDocument objects and use the method Merge(params PdfDocument[] docs), but in that case the memory consumption is also the highest.
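For reference, that array-based call looks roughly like this (just a sketch; fileNames, outputFileName and the Save call are assumptions on my part):
Code: C#
using System.Collections.Generic;
using EO.Pdf;

// Sketch only: every document is loaded into memory at once,
// which is why this variant is the fastest but also the most memory-hungry.
var docs = new List<PdfDocument>();
foreach (string fileName in fileNames)
    docs.Add(new PdfDocument(fileName));
PdfDocument merged = PdfDocument.Merge(docs.ToArray());
merged.Save(outputFileName);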
So far we have used the same method, but each time with only two PdfDocuments.
Simplified like this:
Code: C#
var pdfFile = new PdfDocument(stream);                              // load the generated PDF from the stream
pdfFile = PdfDocument.Merge(pdfFile, new PdfDocument(fileName1));   // deep-merge the next file into the result
pdfFile = PdfDocument.Merge(pdfFile, new PdfDocument(fileName2));
...

As I wrote this old code can (luckily) be rewritten to use the incremental merge. I just need to save the first file (here as a stream) to disk.
eo_support
Posted: Thursday, February 15, 2018 12:51:10 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,221
Hi,

The reason the binary approach might be faster is the scope of the search --- whenever a document is merged into another document, both documents must be fully searched in order to find things that can be merged (especially fonts). However, depending on the documents, a single-pass merge can be faster.

We will look into an incremental merge of MemoryStream objects to see whether it brings a noticeable benefit. Because it would increase memory usage and does not perform a full merge (which can result in a bigger output file), it may not be justifiable.

Thanks!

