|
Rank: Newbie Groups: Member
Joined: 2/23/2022 Posts: 9
|
Good day. I've been tasked with adding accessibility support to our PDF generation. After researching existing threads, I downloaded and installed the latest version, which provides the GenerateTags property and functionality. However, nothing seemed to change after enabling it. I then created a tiny HTML page whose elements I expected to be tagged appropriately:
Code: HTML
<html>
  <body>
    <h1>This is the header</h1>
    <h2>This is the subheader</h2>
    <p>This is a paragraph of text that would be considered the body of the message.</p>
  </body>
</html>
Using Adobe Acrobat's accessibility checker, the "Tagged PDF" check failed, and when I looked at the document's elements, everything had been rendered as a plain P element. Even so, those P elements were not tagged. Are there any guidelines for structuring the HTML to make use of the automatic tagging in EO.PDF? Thank you for any help you can provide.
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,217
|
Hi, please download build 2023.1.77 from our download page and use the following code:
Code: C#
HtmlToPdf.Options.GenerateTags = true;
HtmlToPdf.ConvertUrl(url, pdf_file_name);
Alternatively, you can replace pdf_file_name with a Stream object. The resulting PDF file should pass the PDF/UA check. Note that the following code will NOT pass the PDF/UA check:
Code: C#
HtmlToPdf.Options.GenerateTags = true;
PdfDocument doc = new PdfDocument();
HtmlToPdf.ConvertUrl(url, doc);
doc.Save(pdf_file_name);
The difference between the two methods is that the second one actually merges the HTML to PDF output into another PdfDocument (doc). Currently, tag information is not carried over during the merge, so the resulting PDF file loses its tags. Please let us know if this works for you. Thanks!
|
|
Rank: Newbie Groups: Member
Joined: 2/23/2022 Posts: 9
|
Thank you for the reply.
Your instruction does work. I modified the code to render to a stream instead of using a document and Adobe did report the PDF as tagged.
Code: VB.NET
ms = New MemoryStream
EO.Pdf.HtmlToPdf.ConvertHtml(html, ms, options)
ms.Seek(0, SeekOrigin.Begin)
doc = New EO.Pdf.PdfDocument(ms)
While that is a success, losing the tags whenever a merge is performed is a significant setback. We merge documents in many, many places: we append legal notices to the end of documents, we merge PDF documents into one combined document, and we apply headers and footers after a multipage document is rendered. I don't see a feasible way to produce everything from a single HTML document in one rendering pass.
Is there a roadmap for future enhancements regarding accessibility or should I submit a feature request?
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,217
|
Yes. Retaining tags during the merge process is the next thing we will be working on.
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,217
|
Hi, we have sent you a new build through private message that retains tag information during a merge, which enables code like this:
Code: C#
HtmlToPdf.Options.GenerateTags = true;
PdfDocument doc = new PdfDocument();
HtmlToPdf.ConvertUrl(url, doc);
doc.Save(pdf_file_name);
Please take a look and let us know how it goes. Thanks!
|
|
Rank: Newbie Groups: Member
Joined: 2/23/2022 Posts: 9
|
Sorry for the delay in getting back to this topic. I have downloaded the build you provided and tested it with our existing code. The result is much, much better. The remaining failures are two that can only be handled by manual review (Logical Reading Order and Color Contrast) and one that I think you could assist with: Tab Order.
In Acrobat, you set this by editing the page properties of each page in the PDF; the tab order in that dialog currently shows as "Unspecified". The available choices include Use row order, Use column order, and Use document structure. That last option, Use document structure, would be sufficient as a default value.
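For reference, this tab-order setting corresponds to a standard PDF page-dictionary entry rather than an Acrobat-only property: ISO 32000-1 (PDF 1.5 and later) defines an optional /Tabs key whose values R, C and S mean row, column and structure order, and PDF/UA-1 requires /Tabs /S on every page that contains annotations. The C# sketch below simply records that mapping. The class and method names are hypothetical, and treating Acrobat's "Unspecified" as "no /Tabs key on the page" is an assumption.

```csharp
#nullable enable
using System;

// Hypothetical helper: maps Acrobat's page-level "Tab Order" choices to the
// /Tabs name stored in a PDF page dictionary (ISO 32000-1, PDF 1.5+).
public static class TabOrder
{
    // Returns "R", "C" or "S". Returns null for "Unspecified", which is
    // assumed here to mean the page dictionary has no /Tabs entry at all.
    public static string? ToTabsName(string acrobatOption) =>
        acrobatOption.Trim().ToLowerInvariant() switch
        {
            "use row order" => "R",
            "use column order" => "C",
            "use document structure" => "S",
            "unspecified" => null,
            _ => throw new ArgumentException($"Unknown tab order option: {acrobatOption}")
        };

    // PDF/UA-1 requires /Tabs /S on every page that carries annotations;
    // pages without annotations are not constrained by this rule.
    public static bool MeetsPdfUaTabRule(bool pageHasAnnotations, string? tabsName) =>
        !pageHasAnnotations || tabsName == "S";
}
```

For example, ToTabsName("Use document structure") returns "S", the only value that satisfies the PDF/UA rule on a page with annotations.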
Thank you for the advances in this feature, it's much appreciated.
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,217
|
Hi,
These might be Acrobat-specific properties. Can you send us a test file with this property "properly set"? We will look into it and see what we can do. We may not be able to do anything if the properties are not standard, or if they are not sufficiently supported by the underlying Chromium engine (since much of the data is generated by or gathered from the browser engine).
Thanks
|
|
Rank: Newbie Groups: Member
Joined: 2/23/2022 Posts: 9
|
The latest build, provided via the email link, sets the page tab order to Document Order in Acrobat, and the only items left in the Accessibility Checker are those that require manual checks. Thank you for your quick attention to this. I'm going to test the changes on some more complicated documents and hopefully submit them to our accessibility team for review.
|
|
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,217
|
Great, thank you very much for the update!
|
|