Rank: Member Groups: Member
Joined: 2/21/2020 Posts: 10
Hi, I use EO.PDF at my company to convert web pages to PDF files. The resulting files are quite large: for example, a one-page file containing only text (no images) is 88 KB. I tried setting the PdfDocument.EmbedFont option to false, but I don't see any difference in file size. Maybe I'm using it wrong; here's the code I tested:
Code: C#
var stream = new MemoryStream();
var pdfDocument = new PdfDocument();
pdfDocument.EmbedFont = false;
var result = HtmlToPdf.ConvertHtml(html, pdfDocument, baseOptions);
pdfDocument.EmbedFont = false;
pdfDocument.Save(stream);
return stream.ToArray();
Another example: calling HtmlToPdf.ConvertHtml(...) with only the HTML text "<b>test</b>" generates a 32 KB PDF file! How can I optimize this? What is the best practice? Best regards
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,217
Hi,
This is indeed font data. Whenever a new font is introduced, you will see a noticeable increase in file size. However, as you add more text in the same font, the size barely grows. For example, you won't see much of a difference between rendering 'test' and rendering a long paragraph in the same font.
EmbedFont only has a very limited impact, and only on a very small set of fonts. One major reason PDF is so popular is that font data is embedded in the file, so it renders correctly on a target system even if the font does not exist there. Adobe, however, defined a very small set of fonts, the so-called "standard 14", that can be omitted from the PDF file. A font can be omitted only if it is one of those 14.
In practice, a font rarely falls into the standard 14, for these reasons:
1. The "standard 14" set was defined decades ago, and modern systems have introduced many fonts that it does not cover;
2. Although it is called "standard 14", it really contains only a few font families, because each style of a font is counted as a different font. For example, "Arial", "Arial Italic", "Arial Bold" and "Arial Bold Italic" count as 4 fonts;
3. Even for fonts that normally should fall into these 14, the font file can report a different name. For example, for the "Arial" font on Windows 10, the font file reports the "PostScript Font Name" as "ArialMT". This causes a mismatch, so the font is excluded from the standard 14 as well.
All these situations cause the font not to be omitted even when EmbedFont is set to false.
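To illustrate why an exact name comparison is so easy to miss, here is a minimal sketch of such a check. It is not EO.PDF's actual code: the class and method names are hypothetical, and the assumption that matching is done on the exact PostScript names listed in the PDF specification is for illustration only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of EO.PDF.
public static class Standard14Fonts
{
    // PostScript names of the 14 standard fonts as listed in the PDF specification.
    // Note that each style (bold, italic/oblique, ...) is a separate entry.
    private static readonly HashSet<string> Names =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
            "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
            "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
            "Symbol", "ZapfDingbats"
        };

    // Returns true only when the reported name matches one of the 14 names exactly.
    // Assumption: the caller passes the PostScript name reported by the font file.
    public static bool IsStandard14(string postScriptName)
    {
        return postScriptName != null && Names.Contains(postScriptName);
    }
}
```

With a check like this, Standard14Fonts.IsStandard14("Helvetica") is true, while "ArialMT" and "TimesNewRomanPSMT" both fail the comparison, so such fonts would still be embedded.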
Hope this helps.
Thanks!
Rank: Member Groups: Member
Joined: 2/21/2020 Posts: 10
Okay, I understand the problem. So aren't there other fonts in the "standard 14" that could be used to reduce the size of my PDF files? Times New Roman or Helvetica, for example? Say I write something like this:
Code: C#
var pdfDocument = new PdfDocument();
pdfDocument.EmbedFont = false;
var result = HtmlToPdf.ConvertHtml("<span style='font-family:Times New Roman,Helvetica'>test</span>", pdfDocument, baseOptions);
pdfDocument.Save(stream);
Best regards
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,217
Hi,
Those are about it. Unfortunately, in the current version of Windows 10 they all report different font names. For example, "Times New Roman" is reported as "TimesNewRomanPSMT" on our test system. This prevents it from being omitted.
Thanks
Rank: Member Groups: Member
Joined: 2/21/2020 Posts: 10
Is it the same problem if we use EO.PDF on a Windows Server machine (2016/2019)?
Best regards
Rank: Administration Groups: Administration
Joined: 5/27/2007 Posts: 24,217
Hi,
It may or may not. Windows used to report "Arial" as "Arial" and "Times New Roman" as "Times New Roman". Somewhere along the way this changed. Due to the large number of Windows versions and distributions, along with the numerous updates and patches for each of them, it is not possible for us to track down exactly when it changed on each platform. We could hard-code our code to treat "Arial" and "ArialMT" the same, but we do not know when MS is going to change it again, and then we would be in this situation all over again. As such, we decided not to do anything about it.
In my opinion you are probably chasing a ghost. The "standard 14" font set was introduced decades ago, when the PDF standard was first introduced. The concept itself is probably even older (probably because PostScript printers had these fonts built in to reduce memory usage). As time goes by, it carries less and less significance. A 30 KB increase in PDF file size may look significant to you, but to many other users it is quite negligible, especially when the PDF file contains modern fonts and images or charts.
Thanks