
Optimal way to merge PDF files via EO-PDF Options
Phil
Posted: Monday, August 28, 2023 4:44:18 AM
Rank: Advanced Member
Groups: Member

Joined: 11/8/2017
Posts: 66
Hi, we are merging multiple (chunked) PDFs into one file via EO-PDF (v23.1.77). The merge succeeds, but it takes a long time compared to something like iTextSharp. We are using the following code:

Code: C#
IOrderedEnumerable<KeyValuePair<long, BlobandBytes>> orderedBandB = await DownloadBlobChunks(items).ConfigureAwait(false); //DownloadBlobChunks simply downloads each file from blob storage

byte[] allPDFBytes;
using (MemoryStream ms = new MemoryStream())
{
    var docs = new List<PdfDocument>();
    foreach (var bandb in orderedBandB)
    {
        // Skip missing or empty chunks
        if (bandb.Value.Bytes != null && bandb.Value.Bytes.Length > 0)
        {
            docs.Add(new PdfDocument(new MemoryStream(bandb.Value.Bytes)));
        }
    }
    // Merge all chunks, in order, into a single document
    PdfDocument pdf = PdfDocument.Merge(docs.ToArray());
    pdf.Save(ms);
    allPDFBytes = await CopyToPageBlobBuffer(ms.ToArray()).ConfigureAwait(false); //allPDFBytes contains the merged PDF byte array file
}
async Task<byte[]> CopyToPageBlobBuffer(byte[] array)
{
    int size = (array.Length + 511) / 512 * 512;     //page blobs must be in multiples of 512 bytes or the upload will fail; rounds up without adding an extra page when the length is already aligned
    byte[] buffer = new byte[size];

    await Task.Run(() => array.CopyTo(buffer, 0));

    return buffer;
}

The above was around 5 to 10 times slower than iTextSharp. With the volume of merges that we require, this accumulates into a big difference.

My question is: is the above the most efficient way to merge PDF files via EO-PDF?

Kind regards
Phil
eo_support
Posted: Monday, August 28, 2023 10:19:07 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,217
Hi,

The main reason our merge is slow is that we parse all the embedded font data. This ensures that no font data is lost (which would cause text to render differently after the merge) or duplicated (which would result in bigger PDF files). We also compare image data. For example, if both files contain the same logo image, only one copy of the image is saved in the result file.
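
To illustrate why comparing embedded data costs time, here is a minimal sketch of content-based deduplication. It is not EO.Pdf code: the `ResourceDeduper` class and its `Deduplicate` method are hypothetical names, and hashing with SHA-256 is only one assumed way of detecting identical payloads. The point is that every resource has to be read in full before a duplicate can be recognised.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical illustration only; this is not how EO.Pdf is known to work internally.
public static class ResourceDeduper
{
    // Returns the distinct payloads in first-seen order.
    // Two payloads count as identical when their SHA-256 hashes match.
    public static List<byte[]> Deduplicate(IEnumerable<byte[]> resources)
    {
        var seen = new HashSet<string>();
        var unique = new List<byte[]>();
        using (var sha = SHA256.Create())
        {
            foreach (var data in resources)
            {
                // Hashing reads every byte, which is where the time goes.
                string key = Convert.ToBase64String(sha.ComputeHash(data));
                if (seen.Add(key))
                {
                    unique.Add(data);
                }
            }
        }
        return unique;
    }
}
```

If every chunk embeds the same fonts and logo, each copy still has to be hashed before it can be dropped, so the work grows with the total size of the embedded data across all chunks.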

There is another way to merge PDF files that MAY result in better performance, but it requires you to write to physical files:

https://www.essentialobjects.com/doc/eo.pdf.pdfdocument.merge_overload_5.html

This version performs a less "deep" merge that can result in a bigger file but may be faster than merging an array of PdfDocument objects. However, because it can only merge two files at a time, Merge would have to be called many more times (at least log2), so it may still end up taking longer.
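
As a rough sketch of what merging two at a time involves: combining n inputs pairwise always takes n - 1 merge calls in total, and when the pairs are arranged as a balanced tree those calls fall into about log2(n) rounds. The `PairwiseMerger` class and the `mergeTwo` delegate below are hypothetical stand-ins for any two-input merge; they are not the signature of the EO.Pdf overload linked above.

```csharp
using System;
using System.Collections.Generic;

public static class PairwiseMerger
{
    // Combines items two at a time in balanced rounds, preserving order.
    // mergeTwo is a hypothetical stand-in for a two-input merge operation.
    public static T MergeAll<T>(IReadOnlyList<T> items, Func<T, T, T> mergeTwo,
                                out int calls, out int rounds)
    {
        if (items == null || items.Count == 0)
            throw new ArgumentException("At least one item is required.", nameof(items));

        calls = 0;
        rounds = 0;
        var current = new List<T>(items);
        while (current.Count > 1)
        {
            var next = new List<T>((current.Count + 1) / 2);
            for (int i = 0; i + 1 < current.Count; i += 2)
            {
                next.Add(mergeTwo(current[i], current[i + 1]));
                calls++;
            }
            if (current.Count % 2 == 1)
            {
                // An odd item out is carried into the next round unmerged.
                next.Add(current[current.Count - 1]);
            }
            current = next;
            rounds++;
        }
        return current[0];
    }
}
```

For example, 8 chunks would need 7 calls spread over 3 rounds. With a file-based merge, each of those calls would also read and write physical files, which is part of why the total time may not improve.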

Thanks
Phil
Posted: Monday, August 28, 2023 11:20:06 PM
OK, many thanks for the reply and explanation.

We "chunk" a lot of our reports (break them into smaller requests) and then aggregate them into one file once all chunks have completed. We do this because generating one big report results in exceptions (timeouts, memory, or other exceptions within EO-PDF). We do this en masse, especially at the end of the year. I can understand the need to consolidate the fonts for each chunk (our fonts/styles are common to all chunks). However, as long as the chunks end up merged into one file, the size of that file is not really a priority. Speed is by far the priority, as the files need to be delivered as soon as possible.

Is there any scope for a flag that skips the consolidation phase of the PdfDocument.Merge overload that accepts an array/list of chunked PDFs?
eo_support
Posted: Tuesday, August 29, 2023 9:18:06 AM
Unfortunately, there isn't such a flag in the current version. Is it possible for you to save your chunks as HTML instead, merge the HTML together, and run the HTML to PDF converter in one run? This would be the most efficient way.
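
A minimal sketch of the HTML-side merge, assuming each chunk can be produced as a body fragment and that all chunks share one stylesheet. The `HtmlChunkCombiner` class, the `chunk-break` CSS class, and the `sharedCss` parameter are hypothetical names for illustration, and it is an assumption that the converter honours `page-break-before`. The conversion call itself is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class HtmlChunkCombiner
{
    // Wraps each chunk's body markup in a div; every div after the first
    // requests a page break so that each chunk starts on a new page.
    // Assumes the fragments are body markup only (no <html> or <head> tags).
    public static string Combine(IEnumerable<string> bodyFragments, string sharedCss)
    {
        var sb = new StringBuilder();
        sb.Append("<!DOCTYPE html><html><head><meta charset=\"utf-8\"><style>");
        sb.Append(sharedCss);
        sb.Append(".chunk-break{page-break-before:always;}");
        sb.Append("</style></head><body>");

        bool first = true;
        foreach (var fragment in bodyFragments)
        {
            sb.Append(first ? "<div>" : "<div class=\"chunk-break\">");
            sb.Append(fragment);
            sb.Append("</div>");
            first = false;
        }

        sb.Append("</body></html>");
        return sb.ToString();
    }
}
```

The combined string would then go through the HTML to PDF converter once, so fonts and images are embedded a single time and there is no separate PDF merge step.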