
PDF Accessibility and Tagged PDFs Options
David A
Posted: Monday, April 17, 2023 12:08:35 PM
Rank: Newbie
Groups: Member

Joined: 2/23/2022
Posts: 9
Good day,

I've been tasked with adding accessibility into our PDF generation. After researching existing threads, I've downloaded and installed the latest version that provides the GenerateTags property and functionality. However, nothing seemed to change after enabling this.

I then created a tiny HTML page that I expected would have its elements tagged appropriately:

<html>
<body>
<h1>This is the header</h1>
<h2>This is the subheader</h2>
<p>This is a paragraph of text that would be considered the body of the message.</p>
</body>
</html>

Adobe Acrobat's accessibility checker reported that the "Tagged PDF" check failed. Looking at the document elements, everything was rendered as a simple P element, and even those P elements were not tagged.



Are there any guidelines to structuring the HTML to make use of the automatic tagging in EO.PDF? Thank you for any help you can provide.
eo_support
Posted: Tuesday, April 18, 2023 10:51:35 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,217
Hi,

Please download build 2023.1.77 from our download page and use the following code:

Code: C#
HtmlToPdf.Options.GenerateTags = true;
HtmlToPdf.ConvertUrl(url, pdf_file_name);

Alternatively, you can also replace pdf_file_name with a Stream object.
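
A minimal sketch of the Stream variant, assuming the `ConvertUrl(url, stream)` call described above. The URL, the output file name, and the class name are hypothetical placeholders, and writing the stream to disk is just one way to use the result:

```csharp
using System.IO;
using EO.Pdf;

class TaggedPdfToStream
{
    static void Main()
    {
        // Hypothetical inputs, for illustration only.
        string url = "https://www.example.com/";
        string pdfFileName = "tagged.pdf";

        // Turn on tag generation before converting.
        HtmlToPdf.Options.GenerateTags = true;

        using (var ms = new MemoryStream())
        {
            // Convert straight into the stream rather than into a PdfDocument.
            HtmlToPdf.ConvertUrl(url, ms);

            // Write the converted bytes to disk unchanged.
            File.WriteAllBytes(pdfFileName, ms.ToArray());
        }
    }
}
```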

The resulting PDF file should pass the PDF/UA check. Note that the output of the following code will NOT pass the PDF/UA check:
Code: C#
HtmlToPdf.Options.GenerateTags = true;
PdfDocument doc = new PdfDocument();
HtmlToPdf.ConvertUrl(url, doc);
doc.Save(pdf_file_name);

The difference between the two methods is that the second one actually merges the HTML to PDF output into another PdfDocument (doc). Currently, tag information is not carried over during the merge, so the resulting PDF file loses its tag information.

Please let us know if this works for you.

Thanks!
David A
Posted: Tuesday, April 18, 2023 1:28:37 PM
Rank: Newbie
Groups: Member

Joined: 2/23/2022
Posts: 9
Thank you for the reply.

Your instruction does work. I modified the code to render to a stream instead of using a document and Adobe did report the PDF as tagged.

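' Render directly to a stream instead of into a PdfDocument
ms = New MemoryStream
EO.Pdf.HtmlToPdf.ConvertHtml(html, ms, options)
' Rewind the stream, then load the rendered output as a PdfDocument
ms.Seek(0, SeekOrigin.Begin)
doc = New EO.Pdf.PdfDocument(ms)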

While that is a success, knowing that tags are lost any time a merge is performed is very much a setback. We merge documents in many, many places. We append legal notices to the end of documents. We merge PDF documents into one combined document. We apply headers and footers after a multipage document is rendered. I don't see a feasible way to build a single HTML document for a first-instance rendering.

Is there a roadmap for future enhancements regarding accessibility or should I submit a feature request?
eo_support
Posted: Tuesday, April 18, 2023 1:37:45 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,217
Yes. This is the next thing we will be working on for the merge process.
eo_support
Posted: Monday, April 24, 2023 2:08:37 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,217
Hi,

We have sent a new build to you through private message that can retain tag information during merge, which would enable code like this:

Code: C#
HtmlToPdf.Options.GenerateTags = true;
PdfDocument doc = new PdfDocument();
HtmlToPdf.ConvertUrl(url, doc);
doc.Save(pdf_file_name);


Please take a look and let us know how it goes.

Thanks!
David A
Posted: Wednesday, May 10, 2023 2:19:50 PM
Rank: Newbie
Groups: Member

Joined: 2/23/2022
Posts: 9
Sorry for my delay in getting back to this topic. I have downloaded the build provided and tested it with our existing code. The result is much, much better. There are some failures that can only be handled with manual review: Logical Reading Order and Color Contrast, and only one that I think you could assist with: Tab Order.

In Acrobat, you set this by editing the page properties of each page in the PDF; in that dialog, the tab order currently shows as "unspecified". The choices include Use row order, Use column order, and Use document structure. That last option, Use document structure, would be sufficient as a default value.
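
For reference, this setting appears to correspond to the optional `Tabs` entry of the page dictionary in the PDF specification, which takes the values `R` (row order), `C` (column order), or `S` (structure order). A rough sketch of a page object with structure order set is shown below; the object numbers and the other entries are made up for illustration, and this is not taken from EO.PDF output:

```
3 0 obj
<< /Type /Page
   /Parent 2 0 R
   /MediaBox [0 0 612 792]
   /Contents 4 0 R
   /Tabs /S        % tab order follows the structure tree
>>
endobj
```

When the entry is absent, the tab order is unspecified, which would match what the Acrobat dialog shows.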

Thank you for the advances in this feature; it's much appreciated.
eo_support
Posted: Wednesday, May 10, 2023 2:58:02 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,217
Hi,

These might be Acrobat-specific properties. Can you send us a test file with this property "properly set"? We will look into it and see what we can do. We may not be able to do anything if the properties are not standard, or if they are not sufficiently supported by the underlying Chromium engine (since much of the data is generated by or gathered from the browser engine).

Thanks
David A
Posted: Monday, May 22, 2023 2:25:54 PM
Rank: Newbie
Groups: Member

Joined: 2/23/2022
Posts: 9
The latest build provided via email link sets the page properties to Document Order in Acrobat, and the only items left in the Accessibility checker are those that require manual checks. Thank you for your quick attention to this. I'm going to test the changes on some more complicated documents and hopefully can submit them to our accessibility team for review.
eo_support
Posted: Monday, May 22, 2023 4:39:20 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,217
Great, thank you very much for the update!

