Text in PDF files sometimes uses embedded fonts. If you have difficulty filtering text that uses embedded fonts, you can skip that text by using options in the API, the formats.ini file, or the filter sample program.
If you skip embedded fonts, none of the content that uses embedded fonts is included in the output.
When you use formats.ini to skip embedded fonts, you can also specify an embedded font threshold. This is a percentage, chosen by you, that sets the minimum probability that a glyph in the embedded text maps to the correct character in the output character set (ASCII, UTF-8, and so on).
For example, if you specify a threshold of 75, embedded text glyphs that have a 75% or greater probability of correctly matching the character in the output character set are included in the output. Glyphs that have less than a 75% probability of matching are omitted from the output.
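The inclusion rule in the example above can be modelled in a few lines. This is a minimal illustrative sketch, not Filter's implementation: the function name `keep_glyph` and the glyph probabilities are hypothetical, and the sketch assumes probabilities are expressed as percentages from 0 to 100.

```python
def keep_glyph(match_probability: float, threshold: float) -> bool:
    """Return True if a glyph's probability (0-100) of mapping to the
    correct output character meets or exceeds the threshold."""
    return match_probability >= threshold


# Hypothetical glyphs paired with made-up match probabilities (percent).
glyphs = [("A", 98.0), ("B", 75.0), ("?", 40.0)]

# With a threshold of 75, glyphs at 75% or above are kept; the rest are omitted.
kept = [ch for ch, prob in glyphs if keep_glyph(prob, 75)]
print(kept)  # ['A', 'B']
```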
To skip embedded fonts by using the formats.ini file
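In formats.ini, set the option that skips embedded fonts to TRUE and, optionally, set embedded_font_threshold to a value between 0 and 100. A threshold of 100 skips all embedded font text; a threshold of 0 retains all embedded font text.
The embedded_font_threshold parameter takes effect only when skipping embedded fonts is set to TRUE. The default value of embedded_font_threshold is 100, so if you set skipping embedded fonts to TRUE and do not specify the embedded_font_threshold parameter, Filter skips all embedded text.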
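The way these settings combine can be sketched as a small resolution function. This is a hypothetical model of the documented behavior, not part of Filter: the function `effective_threshold` is invented for illustration, and the argument name `skip_embedded_fonts` is borrowed from the API method name and is only an assumption about how the formats.ini option is spelled.

```python
from typing import Optional

# Documented default for embedded_font_threshold.
DEFAULT_EMBEDDED_FONT_THRESHOLD = 100


def effective_threshold(skip_embedded_fonts: bool,
                        embedded_font_threshold: Optional[int] = None) -> Optional[int]:
    """Hypothetical helper: return the threshold that would apply,
    or None when embedded font text is not filtered at all."""
    if not skip_embedded_fonts:
        # The threshold takes effect only when skipping is enabled.
        return None
    if embedded_font_threshold is None:
        # No threshold given: fall back to the default of 100 (skip all).
        return DEFAULT_EMBEDDED_FONT_THRESHOLD
    if not 0 <= embedded_font_threshold <= 100:
        raise ValueError("embedded_font_threshold must be between 0 and 100")
    return embedded_font_threshold


print(effective_threshold(True))       # 100
print(effective_threshold(True, 75))   # 75
print(effective_threshold(False, 75))  # None
```

Treating a missing threshold as 100 mirrors the statement that enabling the skip option without a threshold skips all embedded text.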
To skip embedded fonts by using the API
Invoke skip_embedded_fonts(true) on a Configuration object. For more information, see The Configuration Class and skip_embedded_fonts.