Occasionally you may need to add a table of contents (TOC) to an existing PDF file. This post describes a method that uses the HTML to PDF converter to add one or more TOC pages at the beginning of the file. The exact structure and layout of the TOC depend on your specific business scenario and are not covered here. For example, a book's TOC may cover only chapters, or chapters and sections, while a report's TOC may be organized by topic. For demonstration purposes, this post creates a simple TOC with one entry per page. For example, if your PDF file has 20 pages, the TOC contains 20 links: the first link jumps to page 1, the second to page 2, and so on. The steps are outlined below:
1. Generate the TOC HTML without filling in the link targets. For example:
Code: HTML/ASPX
<p><a href="#">Page 1</a></p>
<p><a href="#">Page 2</a></p>
<p><a href="#">Page 3</a></p>
.....
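For demonstration purposes, each A element is wrapped in a P element. You may adjust the formatting as needed, but use exactly the same text and formatting as the final TOC in step #4 so that both versions take the same number of pages. All hrefs point to "#" because the link targets do not matter in this step.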
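Because the TOC has one entry per page, the step #1 HTML can be generated rather than written by hand. Below is a minimal sketch, assuming the page count of the existing file is already known (for example from its Pages.Count) and that the resulting string is what you pass as tocHTML in step #2. The class and method names are hypothetical and not part of the converter's API:

```csharp
using System;
using System.Text;

// Hypothetical helper: builds the placeholder TOC HTML from step #1.
// Every link points to "#" because only the resulting page count matters here.
public static class PlaceholderToc
{
    public static string Build(int pageCount)
    {
        if (pageCount < 1)
            throw new ArgumentOutOfRangeException(nameof(pageCount));

        var html = new StringBuilder();
        for (int page = 1; page <= pageCount; page++)
        {
            // One P element per entry, matching the sample markup in step #1
            html.AppendLine($"<p><a href=\"#\">Page {page}</a></p>");
        }
        return html.ToString();
    }
}
```

For a 20-page file, PlaceholderToc.Build(20) produces 20 entries, "Page 1" through "Page 20".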
2. Run the step #1 HTML through the HTML to PDF converter to find out how many pages the TOC takes:
Code: C#
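using EO.Pdf;

//Render the placeholder TOC HTML from step #1 into a scratch document
PdfDocument doc = new PdfDocument();
HtmlToPdf.ConvertHtml(tocHTML, doc);
//The number of pages the TOC will occupy
int tocPageCount = doc.Pages.Count;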
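Getting the total number of pages used by the TOC is the sole purpose of steps #1 and #2. Keep the converter options (such as page size and margins) the same here and in step #5; otherwise the measured count may not match the final TOC.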
3. Prepend tocPageCount blank pages to the existing PDF file. The code would be something like this:
Code: C#
//Create a PDF with tocPageCount blank pages
PdfDocument tocBlanks = new PdfDocument();
for (int i = 0; i < tocPageCount; i++)
{
    tocBlanks.Pages.Add();
}
//Load the existing PDF file
PdfDocument existingPDF = new PdfDocument(existing_pdf_file_name);
//Merge the two: the blank TOC pages come first, followed by the original pages
PdfDocument result = PdfDocument.Merge(tocBlanks, existingPDF);
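After the merge, every original page is shifted back by tocPageCount pages, which is why the target anchors in step #4 must start on the first page after the TOC. The arithmetic is shown below as a tiny sketch; the helper is hypothetical and not part of the converter's API:

```csharp
using System;

// Hypothetical helper: where an original page ends up after step #3.
// The merged document starts with tocPageCount blank pages, so every
// original page moves back by exactly that many pages.
public static class TocPageMapping
{
    // originalPageNumber is 1-based ("Page 1" in the TOC);
    // the return value is the 0-based page index in the merged document.
    public static int MergedPageIndex(int tocPageCount, int originalPageNumber)
    {
        if (tocPageCount < 0)
            throw new ArgumentOutOfRangeException(nameof(tocPageCount));
        if (originalPageNumber < 1)
            throw new ArgumentOutOfRangeException(nameof(originalPageNumber));

        return tocPageCount + originalPageNumber - 1;
    }
}
```

For example, with a 2-page TOC, "Page 1" is at index 2 of the merged document and "Page 20" is at index 21.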
4. Reformat the TOC HTML with target anchor elements. The HTML can be like this:
Code: HTML/ASPX
<!--Section 1: Import CSS --- do not remove -->
<style>
div
{
height: 20px;
page-break-before: always;
}
</style>
<!--Section 2: The same as step #1 but with href filled in -->
<p><a href="#page1">Page 1</a></p>
<p><a href="#page2">Page 2</a></p>
<p><a href="#page3">Page 3</a></p>
.......
<!--Section 3: target anchors -->
<div><a name="page1"></a></div>
<div><a name="page2"></a></div>
<div><a name="page3"></a></div>
......
Note section 3 with the DIVs. Because of the page-break-before:always CSS style in section 1, the first DIV starts on the first page after the TOC instead of on the last TOC page, and every following DIV starts on a new page. Each DIV is therefore rendered onto one page of the existing PDF file, and the anchor inside it serves as the target of the matching link in the TOC section. Section 2 keeps the same P wrappers as step #1 so that the TOC still occupies exactly tocPageCount pages.
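The step #4 HTML can also be generated for any page count. Below is a minimal sketch that places each anchor inside a DIV styled with page-break-before: always, so the first anchor begins on the first page after the TOC rather than on the last TOC page. The class and method names are hypothetical, and the string it returns is assumed to be the tocHTML you pass to the converter in step #5:

```csharp
using System;
using System.Text;

// Hypothetical helper: builds the final TOC HTML for step #4.
// Section 2 keeps the same markup as step #1 so the TOC still
// occupies exactly tocPageCount pages.
public static class LinkedToc
{
    public static string Build(int pageCount)
    {
        if (pageCount < 1)
            throw new ArgumentOutOfRangeException(nameof(pageCount));

        var html = new StringBuilder();

        // Section 1: every DIV starts on a new page
        html.AppendLine("<style>");
        html.AppendLine("div { height: 20px; page-break-before: always; }");
        html.AppendLine("</style>");

        // Section 2: the visible TOC, one link per original page
        for (int page = 1; page <= pageCount; page++)
            html.AppendLine($"<p><a href=\"#page{page}\">Page {page}</a></p>");

        // Section 3: one target anchor per original page; the first DIV
        // breaks away from the TOC, so anchor k lands on original page k
        for (int page = 1; page <= pageCount; page++)
            html.AppendLine($"<div><a name=\"page{page}\"></a></div>");

        return html.ToString();
    }
}
```

Call it with the page count of the original file, not of the merged document, since the TOC pages themselves need no entries.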
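5. Run the step #4 HTML through the converter again, this time rendering into the merged document from step #3:
Code: C#
//Start rendering on the first page of the merged document
HtmlToPdf.Options.StartPageIndex = 0;
HtmlToPdf.ConvertHtml(tocHTML, result);
//Save the final document (the file name is only an example)
result.Save("result_with_toc.pdf");
This renders the TOC HTML produced in step #4 onto the merged document produced in step #3, starting at its first page. The TOC links land on the blank TOC pages, and each target anchor lands on the matching page of the original file. Once the TOC HTML is "overprinted" on the existing PDF file, you should have a functional TOC that jumps to each page.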
Hope this helps.