Rank: Advanced Member Groups: Member
Joined: 11/8/2017 Posts: 66
Hi, we are merging multiple (chunked) PDFs into one file via EO-PDF (v23.1.77). Although the merge itself succeeds, the process takes a long time compared to something like iTextSharp. We are using the following code:
Code: C#
IOrderedEnumerable<KeyValuePair<long, BlobandBytes>> orderedBandB =
    await DownloadBlobChunks(items).ConfigureAwait(false); // DownloadBlobChunks simply downloads each file from blob storage

byte[] allPDFBytes;
using (MemoryStream ms = new MemoryStream())
{
    // Load each non-empty chunk into its own PdfDocument
    var docs = new List<PdfDocument>();
    foreach (var bandb in orderedBandB)
    {
        if (bandb.Value.Bytes != null && bandb.Value.Bytes.Length > 0)
        {
            docs.Add(new PdfDocument(new MemoryStream(bandb.Value.Bytes)));
        }
    }

    // Merge all chunks into a single document and save it to the stream
    PdfDocument pdf = PdfDocument.Merge(docs.ToArray());
    pdf.Save(ms);

    allPDFBytes = await CopyToPageBlobBuffer(ms.ToArray()).ConfigureAwait(false); // allPDFBytes contains the merged PDF byte array
}

async Task<byte[]> CopyToPageBlobBuffer(byte[] array)
{
    // Page blobs must be a multiple of 512 bytes or the upload will fail;
    // only pad when the length is not already aligned
    int remainder = array.Length % 512;
    int size = remainder == 0 ? array.Length : array.Length + (512 - remainder);
    byte[] buffer = new byte[size];
    await Task.Run(() => array.CopyTo(buffer, 0));
    return buffer;
}
Compared to iTextSharp, the above was around 5 to 10 times slower. With the volume of merges we require, this accumulates into a big difference. My question is: is the above the most efficient way to merge PDF files via EO-PDF?
Kind regards, Phil
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,217
|
Hi, the main reason our merge is slow is that we parse all the embedded font data. This ensures that no font data is lost (which would cause text to render differently after the merge) and that none is duplicated (which would result in bigger PDF files). We also compare image data: for example, if both files contain the same logo image, only one copy of the image is saved in the result file.

There is another way to merge PDF files that MAY result in better performance, but it requires you to write to physical files:

https://www.essentialobjects.com/doc/eo.pdf.pdfdocument.merge_overload_5.html

This version performs a less "deep" merge that can result in a bigger file, but may be faster than merging an array of PdfDocument objects. However, because it can only merge two files at a time, Merge would have to be called many more times (n - 1 calls for n files, or at least log2(n) passes if arranged as a tree). So it may still end up taking longer.

Thanks
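For reference, a minimal sketch of that file-based approach, assuming the overload at the link above takes two file paths and saves the merged result back into the first file (check the linked page for the exact signature; tempDir and the chunk naming here are illustrative):
Code: C#
// Sketch only: fold the chunk files into the first file, two at a time,
// using the file-based PdfDocument.Merge overload linked above.
// Assumes each chunk has already been written to disk.
string[] chunkFiles = Directory.GetFiles(tempDir, "chunk_*.pdf");
Array.Sort(chunkFiles); // keep the chunks in page order

string resultFile = chunkFiles[0];
for (int i = 1; i < chunkFiles.Length; i++)
{
    // Appends chunkFiles[i] into resultFile; the merged output is
    // saved back into the first file per the linked documentation
    PdfDocument.Merge(resultFile, chunkFiles[i]);
}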
Rank: Advanced Member Groups: Member
Joined: 11/8/2017 Posts: 66
|
OK, many thanks for the reply and explanation
We 'chunk' (break into smaller requests) a lot of our reports, then aggregate them into one file when all chunks have completed. We do this because generating one big report results in exceptions (timeouts, memory, or other exceptions within EO-PDF), and we do it en masse (especially at end of year). I can understand the need to consolidate the fonts for each chunk (our fonts/styles are common to all chunks). However, as long as the eventual PDF is merged into one file, the size of that file is not really a priority; speed is by far the priority, as the files need to be delivered as soon as possible.
Is there any scope for a flag that allows skipping the consolidation phase of the PdfDocument.Merge overload that accepts an array/list of chunked PDFs?
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,217
|
Unfortunately there isn't such a flag in the current version. Is it possible for you to save your "chunks" as HTML instead, then merge the HTML together and run the HTML to PDF converter in one run? This would be the most efficient way.
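If that route is workable, a rough sketch of the idea, assuming each chunk can be saved as an HTML body fragment (the fragment naming, tempDir, and output path are illustrative):
Code: C#
// Sketch only: stitch the HTML chunks together and convert once.
// Assumes each chunk was saved as a body fragment (no <html>/<body> wrapper).
string[] fragmentFiles = Directory.GetFiles(tempDir, "chunk_*.html");
Array.Sort(fragmentFiles); // keep the chunks in report order

var sb = new StringBuilder("<html><body>");
foreach (string fragmentFile in fragmentFiles)
{
    sb.Append(File.ReadAllText(fragmentFile));
}
sb.Append("</body></html>");

// A single conversion run produces one PDF, so fonts/styles are
// handled once instead of being consolidated at merge time
HtmlToPdf.ConvertHtml(sb.ToString(), "report.pdf");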