
Advice for merging thousands of PDFs
Mont
Posted: Wednesday, December 18, 2013 6:54:13 PM

Rank: Newbie
Groups: Member

Joined: 5/4/2011
Posts: 6
We are using EO.Pdf to build a report engine. If a user selects many "objects", a separate PDF is generated for each object, and then they are all merged at the end.

I'm looking for any advice on how to approach this other than a simple loop. We are already using PdfDocument.Merge(filename1, filename2);

We have noticed that as the target/final PDF grows the merges take longer.

Would it help to merge in pairs (1 & 2, 3 & 4, 5 & 6, ...) and then repeat until there is a single file?

Is there a maximum file size beyond which we simply shouldn't allow the PDF to grow?

Thanks,
-Mont
eo_support
Posted: Wednesday, December 18, 2013 8:17:28 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,196
Hi,

PdfDocument.Merge(fileName1, fileName2) is the fastest method. I believe merging in pairs would speed up the total process. There is no restriction on the number of files, because once two files are merged into one, the result is the same as one bigger file. So the number of files shouldn't matter. Of course, as the file grows, each merge gets slower. In that regard, you might want to generate the final PDF file directly while you are generating PDFs from your "objects".
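
To illustrate why the merge order can matter, here is a rough cost model in Python. It is only a sketch under one loud assumption: that the cost of a merge is proportional to the size of the merged output. That assumption is not something EO.Pdf documents; it is simply one way to explain "as the file grows, it gets slower". The function names are made up for this example.

```python
def sequential_cost(sizes):
    """Total cost when every file is merged into one growing target."""
    total = 0
    target = sizes[0]
    for size in sizes[1:]:
        target += size   # the target grows with every merge
        total += target  # ASSUMPTION: cost is proportional to the merged output
    return total


def pairwise_cost(sizes):
    """Total cost when neighbours are merged in rounds until one file remains."""
    total = 0
    current = list(sizes)
    while len(current) > 1:
        merged = []
        for i in range(0, len(current) - 1, 2):
            pair = current[i] + current[i + 1]
            total += pair  # same cost assumption as above
            merged.append(pair)
        if len(current) % 2 == 1:
            merged.append(current[-1])  # odd file out waits for the next round
        current = merged
    return total


if __name__ == "__main__":
    files = [1] * 8  # eight files of equal (unit) size
    print(sequential_cost(files), pairwise_cost(files))
```

Under this model the sequential loop rewrites the ever-growing target on every step, so its total grows roughly with the square of the file count, while the pairwise rounds grow roughly with n log n. Real timings depend on the library and the disk, so treat this as intuition only.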

Thanks!
Mont
Posted: Thursday, December 19, 2013 12:05:16 PM

Rank: Newbie
Groups: Member

Joined: 5/4/2011
Posts: 6
Could you explain what you mean by "directly generate the final PDF file while you are generating PDF from your 'objects'"? I don't understand.

It might help if I provide a little more detail. We generate a PDF for each object because these PDFs can include images, so each one can be quite large on its own. The test that prompted me to post was for 5000+ objects. The final PDF was over 1 GB before we had a failure. Unfortunately, I don't know what caused the failure yet. The report took 5.5 hours to run. It took two hours to generate the individual PDFs, and the next 2.5 hours were spent merging them. I know it either completed the merge or got very close. Unfortunately, the failure caused the PDFs to be deleted.
eo_support
Posted: Thursday, December 19, 2013 1:08:41 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,196
Hi,

I am not exactly sure what you mean by "object". What I meant is this: if, for example, your "objects" are pieces of HTML code and you use our HTML to PDF converter to convert them to individual PDF files, then theoretically you could merge all the HTML pieces into a single HTML string and call the converter to convert them all together. However, now that I know the size of your file, I am sure that won't work.

I don't believe it's a good idea to create a PDF file of that size. When you create files measured in GBs, you are probably pushing everything over its limits. In fact, I am surprised that you managed to get as far as 1 GB. So I would suggest that you devise an alternative, such as keeping the separate files in a folder and creating a master PDF file that links to those files.

Thanks!
Mont
Posted: Thursday, December 19, 2013 3:30:22 PM

Rank: Newbie
Groups: Member

Joined: 5/4/2011
Posts: 6
The master/child idea is excellent; I hadn't thought of that.

Based on your experience, do you have a recommended size limit? We may have to use the master/child approach selectively, only when the output exceeds some size threshold.
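
One way the "selective" master/child idea could be sketched in Python: group the per-object PDFs, in order, into child files that each stay under a chosen size limit, and only fall back to a master file when more than one group results. The function name, the sizes, and the limit below are all hypothetical; the limit in particular is a placeholder to be found by testing, not a recommended value.

```python
def split_by_size(sizes, limit):
    """Group consecutive files so that each group's total stays within `limit`.

    `sizes` is a list of file sizes (any consistent unit); the result is a
    list of groups, each a list of indexes into `sizes`. A single file that
    is larger than `limit` ends up in a group of its own.
    """
    groups = []
    current, current_total = [], 0
    for index, size in enumerate(sizes):
        # Start a new group when adding this file would exceed the limit.
        if current and current_total + size > limit:
            groups.append(current)
            current, current_total = [], 0
        current.append(index)
        current_total += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    sizes_mb = [400, 300, 500, 200, 900]  # made-up sizes for illustration
    groups = split_by_size(sizes_mb, limit=1000)
    # One group means a single merged PDF is enough; several groups mean
    # one child PDF per group plus a master file that links to them.
    print(groups)
```

Each group would then be merged into one child PDF, and the master file would link to the children in order.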
eo_support
Posted: Thursday, December 19, 2013 3:53:40 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,196
Hi,

We can't really give you a recommendation on that. Everyone's system is different, and besides, everyone has different criteria for what is acceptable from a business point of view. In fact, I was quite surprised that a 2.5-hour merge is acceptable to you. So you probably want to test different scenarios to find a level that you are comfortable with.

Thanks!
Mont
Posted: Thursday, December 19, 2013 4:56:54 PM

Rank: Newbie
Groups: Member

Joined: 5/4/2011
Posts: 6
OK thanks, I appreciate the help and advice.
Mont
Posted: Friday, January 3, 2014 12:09:10 PM

Rank: Newbie
Groups: Member

Joined: 5/4/2011
Posts: 6
Merging in pairs is much faster for a large number of merges. In a test where we merge 1500 files, the time drops from over 16 min to under 2.5 min. We are not yet multi-threading or doing anything else fancy. We simply loop through the set of files multiple times, merging every two files until only one remains.
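
For anyone who finds this thread later, the loop described above could be sketched roughly like this in Python. This is not our production code and it does not call EO.Pdf: `merge_two` is a hypothetical callable standing in for whatever two-file merge you use (for us, PdfDocument.Merge), and the example uses plain strings in place of files.

```python
def merge_in_pairs(items, merge_two):
    """Merge neighbours in rounds until a single item remains.

    `merge_two(a, b)` is a hypothetical stand-in for a real two-file merge;
    it must return the merged result with `a` before `b`.
    """
    if not items:
        raise ValueError("nothing to merge")
    current = list(items)
    while len(current) > 1:
        next_round = []
        # Merge items 0 & 1, 2 & 3, 4 & 5, ...
        for i in range(0, len(current) - 1, 2):
            next_round.append(merge_two(current[i], current[i + 1]))
        # With an odd count, the last item is carried into the next round.
        if len(current) % 2 == 1:
            next_round.append(current[-1])
        current = next_round
    return current[0]


if __name__ == "__main__":
    # Strings stand in for files; concatenation stands in for the merge.
    print(merge_in_pairs(["a", "b", "c", "d", "e"], lambda a, b: a + b))
```

The number of merge calls is the same as in a simple loop (one fewer than the number of files), and the original order is preserved because only neighbours are ever merged. The difference is that most merges involve small files.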
eo_support
Posted: Friday, January 3, 2014 2:54:29 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,196
Glad to hear that. Thank you very much for the update!