
Accessibility tags not carried over to PDF with GenerateTags=true
Phil
Posted: Monday, February 24, 2025 4:36:15 AM
Rank: Advanced Member
Groups: Member

Joined: 11/8/2017
Posts: 68
Hello, I'm using the latest EO-PDF version (25.0.86) with NVDA as my screen reader.

I'm trying to generate a PDF with tags from the original HTML as follows (note that the GenerateTags option is set to true):

Code: C#
using System.IO;
using EO.Pdf;

// Ask the converter to emit structure tags (tagged PDF) derived from the HTML
HtmlToPdf.Options.GenerateTags = true;

byte[] bytes;
HtmlToPdfResult htmlToPdfResult;
using (MemoryStream stream = new MemoryStream())
{
    htmlToPdfResult = HtmlToPdf.ConvertHtml(inputHtml, stream);  // inputHtml holds the HTML below
    bytes = stream.ToArray();
}

// Reload the generated bytes and save them to disk
using (MemoryStream pdfStream = new MemoryStream(bytes))
{
    PdfDocument reader = new PdfDocument(pdfStream);
    string pdfPathFileName = Path.Combine(pdfPathname, pdfFilename + ".pdf"); // pdfPathname, pdfFilename assigned as appropriate
    reader.Save(pdfPathFileName);
}


The input HTML is as follows:

Code: HTML/ASPX
<!doctype html>
<html lang="en">
<head></head>
<body>
	<br />Start of heading markup via div, role and aria-level
	<div role="heading" aria-level="1">
		This is marked heading level 1 via role and aria-level 1
	</div>
	Testing 1-2-3...
	<div role="heading" aria-level="2">
		This is marked heading level 2 via role and aria-level 2
	</div>
	Testing 1-2-3...
	<div role="heading" aria-level="3">
		This is marked heading level 3 via role and aria-level 3
	</div>
	<br />End of heading markup via div, role and aria-level

	<br />========

	<br />Start of heading markup via h1, h2, h3
	<h1>
		This is marked heading level 1 via h1
	</h1>
	Testing 1-2-3...
	<h2>
		This is marked heading level 2 via h2
	</h2>
	Testing 1-2-3...
	<h3>
		This is marked heading level 3 via h3
	</h3>
	<br />End of heading markup via h1, h2, h3
</body>
</html>


However, when the resulting PDF is read out by the screen reader, it doesn't really line up: only some of the headings are announced, and their levels don't necessarily match the levels specified in the markup.

Note that if I use the screen reader on the original HTML (above) in a web browser, everything is read out as expected.

Is there something further I need to do? Can you help?
eo_support
Posted: Monday, February 24, 2025 10:18:21 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,298
Hi,

We rely on the Chromium browser engine to render the tags properly, so we do not have much control over them. As a test, you can open the same HTML file in Google Chrome, choose Print and save it as a PDF file, then check whether the screen reader can read that file correctly. Chrome's print-to-PDF feature generates the same tags as setting HtmlToPdf.Options.GenerateTags to true.

We are not familiar with NVDA. However, if we use Adobe Reader's Read Out Loud feature (Menu -> View -> Read Out Loud -> Activate Read Out Loud), whether tags are present makes a difference. Without tags, Adobe Reader may not know exactly where a block of text starts or ends; with tag information, it can read out every line accurately. If you can share more details, such as the exact steps you use to read PDF files with NVDA and exactly what you meant by "they don't align with the levels specified in the markup", we can try it here and see what we can find.

Thanks!
Phil
Posted: Tuesday, February 25, 2025 7:22:59 AM
Rank: Advanced Member
Groups: Member

Joined: 11/8/2017
Posts: 68
Thanks for the reply. When the PDF was opened in the browser (Chrome), the screen reader (NVDA) didn't interpret it correctly; however, when I opened the PDF in Adobe Acrobat Reader, the screen reader interpreted it correctly (including reading the text and picking up landmarks such as headings and tables, with their shortcuts).

My question is: do you know whether this is how it is normally done? That is, when the PDF is generated on the fly (and so cannot be tagged manually, because the content is dynamic and viewed immediately in real time), are PDFs normally viewed in a dedicated reader such as Adobe Acrobat Reader, rather than in the Chrome browser, when accessibility functionality is required?

In answer to your request for the exact steps and what I meant by "they don't align with the level specified in the markup": (a) as mentioned, we have been opening the EO-PDF (GenerateTags=true) PDFs in the Chrome browser, where NVDA did not interpret them correctly; when the same PDFs were opened in Adobe Acrobat Reader, NVDA did a much better job. (b) By "the level specified" I was referring (going back to the original HTML posted) to the ARIA heading role and levels, for example:

Code: HTML/ASPX
<div role="heading" aria-level="2">
	This is marked heading level 2 via role and aria-level 2
</div>

...the screen reader wasn't picking up the correct aria-level when viewing the PDF in Chrome.
eo_support
Posted: Wednesday, February 26, 2025 10:49:22 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,298
Hi,

PDF is an open standard, so it is supposed to be viewed correctly by any standard-conforming reader, with browsers' built-in PDF readers and Adobe Reader being the most commonly used ones. In reality, however, not all readers are created equal, just as not all browsers display the same HTML page the same way. Adobe Reader is still recognized as the "standard" PDF reader, especially considering that Adobe created the PDF format.

We do correctly generate the corresponding tags based on aria-level. This maps to the S entry of the StructElem dictionary inside the PDF. For example, for the above DIV with role set to "heading" and aria-level set to 2, the S value would be "H2". We are not sure how a PDF reader (for example, Chromium's built-in reader or Adobe Reader) exposes this information to the screen reader (in your case NVDA), though. We would guess that Adobe Reader does a better job than Chromium's built-in reader at this step.
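To make that mapping concrete, here is a minimal C# sketch of how heading markup could translate into the standard PDF heading structure types that end up in the S entry. It is only an illustration under stated assumptions, not EO.Pdf's or Chromium's actual code: the class HeadingStructureType and its Map method are hypothetical, non-heading content is assumed to map to "P", and levels deeper than 6 are clamped because PDF 1.7 (ISO 32000-1) lists only H1 to H6 as standard heading types.

```csharp
using System;

// Hypothetical helper: illustrates the heading -> /S mapping described above.
// This is NOT EO.Pdf's internal implementation.
public static class HeadingStructureType
{
    // Returns "H1".."H6" for heading markup, otherwise "P"
    // (assumption: non-heading text is treated as a paragraph in this sketch).
    public static string Map(string tagName, string role, string ariaLevel)
    {
        // Native heading elements <h1>..<h6> map directly to H1..H6.
        if (tagName != null && tagName.Length == 2 &&
            (tagName[0] == 'h' || tagName[0] == 'H') &&
            tagName[1] >= '1' && tagName[1] <= '6')
        {
            return "H" + tagName[1];
        }

        // ARIA headings: role="heading" combined with aria-level.
        if (string.Equals(role, "heading", StringComparison.OrdinalIgnoreCase))
        {
            // WAI-ARIA gives role="heading" an implicit aria-level of 2 when it is absent or invalid.
            int level = 2;
            if (int.TryParse(ariaLevel, out int parsed) && parsed >= 1)
            {
                level = parsed;
            }

            // Clamp to H6, the deepest standard heading type in PDF 1.7.
            return "H" + Math.Min(level, 6);
        }

        return "P";
    }
}
```

With the markup from the original post, Map("div", "heading", "2") and Map("h2", null, null) both yield "H2", so the DIV variant and the h2 variant should produce the same S value; any difference the screen reader reports would then come from how the PDF viewer exposes the tags.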

Thanks!


