Output Paging
This topic includes the following sections: Auto Paging, Manual Paging with CSS, Custom Paging, and Troubleshooting Paging Problems.
Auto Paging
By default, the HTML to PDF converter automatically pages the contents. This is sufficient for most cases.
Manual Paging with CSS
You can explicitly insert a page break or avoid automatic page breaks in your HTML page with CSS. The following code places a page break before the DIV block:
<div style="page-break-before: always"> A page break will be inserted BEFORE this div because "page-break-before" is set to "always". </div>
The following code places a page break after the DIV block:
<div style="page-break-after: always"> A page break will be inserted AFTER this div because "page-break-after" is set to "always". </div>
The following code prevents the converter from breaking the DIV into multiple pages:
<div style="page-break-inside: avoid"> No matter how much content you place here, the converter will NOT break this DIV into multiple pages because "page-break-inside" is set to "avoid". </div>
A common requirement is to prevent pictures from being split across pages. You can add the following CSS rule to your page to achieve this:
img { page-break-inside: avoid; }
If the contents of an element with "page-break-inside: avoid" set exceed the current page, the exceeding portion is clipped off.
Note: Custom paging does not support repeating table headers and footers.
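When the HTML is generated in code, the page-break styles above can be added while the markup is built. The following is a minimal sketch, not part of EO.Pdf: the class name PageBreakHtml and the method OnePerPage are hypothetical helpers that use only plain string building. It wraps each HTML fragment in a DIV and sets "page-break-before: always" on every DIV except the first, so that each fragment starts on a new page.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not an EO.Pdf type.
public static class PageBreakHtml
{
    // Wraps each HTML fragment in its own DIV. Every DIV except the first
    // gets "page-break-before: always", so each fragment starts a new page.
    public static string OnePerPage(IEnumerable<string> fragments)
    {
        var html = new StringBuilder("<html><body>");
        bool first = true;
        foreach (string fragment in fragments)
        {
            html.Append(first
                ? "<div>"
                : "<div style=\"page-break-before: always\">");
            html.Append(fragment);
            html.Append("</div>");
            first = false;
        }
        html.Append("</body></html>");
        return html.ToString();
    }
}
```

The first DIV is left unstyled because a break before the very first block could leave an empty leading page. The fragments are inserted as-is, so they are expected to be valid HTML already.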
Custom Paging
EO.Pdf also allows you to implement your own custom paging logic. To perform custom paging, you must first use HtmlToPdfSession.CreatePaginator to create a Paginator object:
using (HtmlToPdfSession session = HtmlToPdfSession.Create(options))
{
    //Load the page to be converted
    session.LoadUrl(url);

    //Create a Paginator object
    Paginator paginator = session.CreatePaginator();

    //Perform custom paging
    .....

    //Use the custom paging result to render the PDF
    HtmlToPdfResult result = session.RenderAsPDF(paginator);

    //Save the result
    result.PdfDocument.Save(pdf_file_name);
}
A Paginator object always contains both the current paging settings and the current paging results. The current paging settings consist of paging input information on each HTML node, with the HTML body element as the root node. The following code retrieves the root node:
//Retrieve the root HTML node
HtmlElement body = paginator.Document.Body;
The HtmlElement class exposes a ChildNodes collection through which you can traverse all child nodes recursively. The following properties are used by the built-in paging algorithm:
- If the node is an HtmlElement, then
  - PageBreakMode contains the page break mode value derived from CSS settings;
  - PageBreakRange contains the top and bottom positions of the element;
- If the node is an HtmlTextNode, then
  - PageBreakLineRanges contains the top and bottom positions of each text line.
The current paging result is available through the Paginator's Pages collection.
Follow these steps to implement your own custom paging logic:
- Examine the current paging settings (PageBreakMode, PageBreakRange and PageBreakLineRanges) and the paging result (the Paginator's Pages collection);
- Modify the above paging settings if needed;
- Call PageInfo.PageAgain to run the built-in paging algorithm again. This method re-pages the current page and all the pages after it. Once this method returns, the Paginator's Pages collection contains the new result;
- Repeat the above steps as needed.
The following code disables all CSS page-break instructions on all DIVs on the second page and limits the page to a maximum of 500 pixels:
//Get all DIVs in the document
HtmlElement[] divs = paginator.Document.GetElementsByTagName("DIV");
foreach (HtmlElement div in divs)
{
    //Disable CSS page-break instructions if the DIV is on the second page
    if (div.Location.PageIndex == 1)
        div.PageBreakMode = PageBreakMode.None;
}

//Run built-in paging algorithm again with the new page break mode settings
paginator.Pages[1].PageAgain(500);
Since the above code calls PageAgain on the second page, it only changes the page break position of the second page and all pages after the second page. It does not affect the first page's page break position.
Troubleshooting Paging Problems
This section explains the paging process and common problems that you may encounter during paging. The converter performs paging as follows:
- It scans the whole HTML to determine "unbreakable ranges". For example, if there are two lines of text with Y positions of 0 to 20 and 20 to 40 respectively, then 0 to 20 is one unbreakable range and 20 to 40 is another. This means a page break cannot occur inside either range. For example, a page break at 30 would split the second line of text across two pages;
- Text lines are automatically recognized as unbreakable ranges. You can use "page-break-inside:avoid" to add additional unbreakable ranges. For example, the following HTML adds an unbreakable range from 0 to 100:
<div style="page-break-inside: avoid;position:absolute;top:0px;height:100px;width:200px;"> some contents </div>
- Multiple overlapping unbreakable ranges can form a single larger unbreakable range. For example, the two DIVs in the following HTML cover 0 to 100 and 50 to 150, which overlap and combine into one unbreakable range from 0 to 150:
<div style="page-break-inside: avoid;position:absolute;top:0px;height:100px;width:200px;"> some contents </div>
<div style="page-break-inside: avoid;position:absolute;top:50px;height:100px;width:200px;"> some contents </div>
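The rules in the list above can be sketched as a small interval merge. This is an illustrative model under stated assumptions, not the converter's actual implementation: the class UnbreakableRanges and its methods are hypothetical, ranges are plain (Top, Bottom) pairs, and ranges that only touch at a boundary (such as 0 to 20 and 20 to 40) are assumed to stay separate so that a break at the shared boundary remains possible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model, not an EO.Pdf type.
public static class UnbreakableRanges
{
    // Merges overlapping ranges into larger ranges. Ranges that only touch
    // (one ends exactly where the next begins) are kept separate.
    public static List<(double Top, double Bottom)> Merge(
        IEnumerable<(double Top, double Bottom)> ranges)
    {
        var merged = new List<(double Top, double Bottom)>();
        foreach (var range in ranges.OrderBy(r => r.Top))
        {
            int last = merged.Count - 1;
            if (last >= 0 && range.Top < merged[last].Bottom)
            {
                // Overlaps the previous range: extend it downwards if needed.
                merged[last] = (merged[last].Top,
                                Math.Max(merged[last].Bottom, range.Bottom));
            }
            else
            {
                merged.Add(range);
            }
        }
        return merged;
    }

    // A break at y is allowed unless y lies strictly inside a merged range.
    public static bool CanBreakAt(
        IEnumerable<(double Top, double Bottom)> ranges, double y)
    {
        return !Merge(ranges).Any(r => r.Top < y && y < r.Bottom);
    }
}
```

With the two text lines from the first example (0 to 20 and 20 to 40), CanBreakAt rejects a break at 30 but accepts one at 20. With the two overlapping DIVs (0 to 100 and 50 to 150), Merge produces a single range from 0 to 150.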
One common paging problem is that certain styles in the HTML unintentionally create large unbreakable ranges. Consider the following paragraph of text:
<p style="font-size:20px;line-height:15px;"> A long paragraph that contains many lines.... </p>
Normally each line of a paragraph forms its own unbreakable range, so the converter fits as many lines as possible on the current page and, when no room is left, advances to the next page and continues placing lines there. However, because the line-height of the above text is smaller than the font size, the area occupied by each text line overlaps with its adjacent lines. Their unbreakable ranges therefore combine into a single range that covers the whole paragraph. When this happens, the paragraph will not be divided across multiple pages.
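The effect of a line-height smaller than the font size can be illustrated with a simplified model. The sketch below is an assumption made for illustration, not the converter's layout code: it treats line i as starting at i * lineHeight and occupying fontSize units of height, then counts how many separate unbreakable ranges result. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical model, not an EO.Pdf type.
public static class LineOverlap
{
    // Simplified model (an assumption): line i starts at i * lineHeight and
    // its text box is fontSize tall. Returns the number of separate
    // unbreakable ranges formed by the lines after overlapping ones combine.
    public static int CountUnbreakableRanges(
        int lineCount, double fontSize, double lineHeight)
    {
        if (lineCount <= 0) return 0;
        int ranges = 1;
        double bottom = fontSize;            // bottom of the first line
        for (int i = 1; i < lineCount; i++)
        {
            double top = i * lineHeight;
            if (top >= bottom) ranges++;     // no overlap: a new range starts
            bottom = Math.Max(bottom, top + fontSize);
        }
        return ranges;
    }
}
```

Under this model, ten lines with font-size 20 and line-height 15 collapse into one unbreakable range, while the same ten lines with line-height 20 or more stay as ten separate ranges that can be split across pages.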
When the converter encounters a large unbreakable range, it always tries to fit as much of it as possible. This means that if the current page has little space left, the converter starts a new page in the hope that the new page will have more space available. This can cause undesired results. For example, it is common for a document to reserve some extra white space before the first paragraph on the first page. If there is a large unbreakable range at the beginning of the document, the converter may move the whole block to the second page and leave the first page completely blank, because the second page does not have the extra white space and therefore has more space available.
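The blank first page scenario can be reasoned about with a one-line check. This is a simplified model of the behavior described above, not the converter's real decision logic: the class PageAdvance, the method MovesToNextPage and its parameters are hypothetical, and all values are in the same layout units.

```csharp
// Hypothetical model, not an EO.Pdf type.
public static class PageAdvance
{
    // Simplified model (an assumption): an unbreakable block moves to a new
    // page when it does not fit in the space left on the current page and a
    // fresh page offers more room than the current one.
    public static bool MovesToNextPage(
        double blockHeight, double spaceLeft, double fullPageSpace)
    {
        return blockHeight > spaceLeft && fullPageSpace > spaceLeft;
    }
}
```

For example, if a page offers 1000 units and 200 units of white space are reserved at the top of the first page, only 800 units are left. A 900-unit unbreakable block at the start of the document then moves to the second page, which leaves the first page blank.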
In all cases, when the converter encounters an unbreakable range that is larger than the page height, the contents overflow into the footer area of the page until they are cut off at the page boundary. In this case, contents that overflow into the footer area may overlap with the footer, and contents beyond the page boundary will not be visible. To avoid such issues, examine both explicit page-break-inside:avoid CSS attributes and implicit situations such as overlapping text lines.
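When hunting for such problems, it can help to list the unbreakable ranges that cannot fit on any page. The sketch below assumes the ranges have already been collected and combined into plain (Top, Bottom) pairs; the class OversizeCheck and the method FindOversize are hypothetical and do not call into EO.Pdf.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical diagnostic helper, not an EO.Pdf type.
public static class OversizeCheck
{
    // Returns the ranges that are taller than the usable page height and
    // would therefore overflow into the footer area or be cut off.
    public static List<(double Top, double Bottom)> FindOversize(
        IEnumerable<(double Top, double Bottom)> ranges, double pageHeight)
    {
        return ranges.Where(r => r.Bottom - r.Top > pageHeight).ToList();
    }
}
```

Each range returned points at a block worth inspecting for an explicit page-break-inside:avoid style or for overlapping text lines.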