Antenna House XSL Formatter - so close, and yet...

[originally posted on 4/7/09]

I almost, but not quite, found a solution to yesterday's problem.

Starting with version 3.x, XSL Formatter can inline EPS files in a PDF that it generates directly. That eliminates the need to go through Distiller, and with it the problem of figure tagging missing from the PostScript output. Well, almost. There's always a catch, isn't there?

The EPS inclusion in 3.x and later relies on an external tool, either Ghostscript or Distiller, to convert the EPS into a PDF, which XSL Formatter then inlines into the PDF it's producing. Unfortunately, XSL Formatter can't do this with tagging turned on. The render fails with:

PDF output error. (3106)
TaggedPDF fails because of importing pdf
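To see what that external step amounts to, here is a minimal Python sketch of an equivalent standalone EPS-to-PDF conversion with Ghostscript. It assumes Ghostscript is installed as `gs` (or `gswin64c` on Windows). The switches are standard Ghostscript options, not necessarily the ones XSL Formatter itself passes, and the function names are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess


def eps_to_pdf_command(eps_path: str, pdf_path: str, gs_exe: str = "gs") -> list[str]:
    """Build a Ghostscript command line that converts one EPS file to PDF."""
    return [
        gs_exe,
        "-dSAFER",             # restrict file operations from the PostScript program
        "-dBATCH",             # exit when done instead of dropping to the gs prompt
        "-dNOPAUSE",           # don't wait for a keypress between pages
        "-dEPSCrop",           # crop the page to the EPS bounding box
        "-sDEVICE=pdfwrite",   # write PDF output
        f"-sOutputFile={pdf_path}",
        eps_path,
    ]


def convert_eps(eps_path: str, pdf_path: str) -> None:
    """Run the conversion, assuming a Ghostscript executable is on the PATH."""
    gs = shutil.which("gs") or shutil.which("gswin64c")
    if gs is None:
        raise FileNotFoundError("Ghostscript executable not found on PATH")
    subprocess.run(eps_to_pdf_command(eps_path, pdf_path, gs), check=True)
```

Pre-converting the figures this way presumably doesn't help on its own: the result is still an imported PDF, which is exactly what the 3106 error complains about.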

So, despite being tantalizingly close to a solution, this one isn’t quite going to work.

The next step is to see whether the EPS can be inlined manually. A little hokey, I know, but less hokey than a pre- or post-processing step that would try to match figures in the FO to objects in the PS or PDF output and tag them there.
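Whichever route wins, it helps to know up front which figures are affected. Below is a minimal Python pre-flight sketch that lists every EPS `fo:external-graphic` in an FO document, in document order, and flags them when tagging is enabled. It assumes EPS files can be recognized by a `.eps`/`.epsf` extension and that any EPS import with tagging on triggers error 3106, as described above; the function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"


def _src_to_path(src: str) -> str:
    """Strip the url(...) wrapper (and optional quotes) XSL-FO allows around src."""
    src = src.strip()
    if src.startswith("url(") and src.endswith(")"):
        src = src[4:-1].strip().strip("'\"")
    return src


def find_eps_graphics(root: ET.Element) -> list[str]:
    """Return the src of every EPS fo:external-graphic, in document order."""
    hits = []
    for el in root.iter(f"{{{FO_NS}}}external-graphic"):
        src = _src_to_path(el.get("src", ""))
        if src.lower().endswith((".eps", ".epsf")):
            hits.append(src)
    return hits


def tagging_conflicts(root: ET.Element, tagging_enabled: bool) -> list[str]:
    """EPS figures that (by assumption) would fail with error 3106 when tagging is on."""
    return find_eps_graphics(root) if tagging_enabled else []
```

Usage would be something like `tagging_conflicts(ET.parse("book.fo").getroot(), True)`, where `book.fo` is a placeholder file name.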

Some notes on Antenna House’s XSL Formatter, PostScript, and related things

XSL Formatter from Antenna House is, as far as I can tell, the best XSL-FO renderer on the market today for multilingual content. Right-to-left text, Asian character sets, line breaking in non-Latin languages: XSL Formatter handles them better than any other renderer out there.

That’s not to say it’s perfect, however. My current challenge is to figure out a way to output Section 508-compliant PDF from XSL-FO using XSL Formatter, and it’s definitely a challenge.

Way back in the murky depths of time, when the tool I'm working on was first designed, XSL Formatter was somewhere around version 2.4 and couldn't inline EPS images at full resolution: it used the embedded preview raster image, and that was the end of it. To get around this, you could export to PostScript instead of going directly to PDF, then use Acrobat Distiller to convert the PS to PDF, which would incorporate the EPS at whatever resolution you specified.

Fast forward to today, several XSL Formatter updates later, and the client now has a pressing need to produce Section 508-compliant PDFs for online publication. Per Adobe, this is done with tagged PDF, which, done right, produces a logically ordered set of objects in the PDF with alt text attached to the figures. This, dear readers, is where the trouble starts.

XSL-FO has no concept of alt text, but that's OK: XSL Formatter provides the axf:alttext attribute to compensate. You then enable tagging in your rendering configuration, either via the pdf-settings element in the XfoSettings.xml config file or by calling xfo.setPdfTag(true) if you're going through the API. And when you output directly to PDF, all is well: you get lovely tagged figures that make your documents (more) usable for the visually impaired.
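Since tagging only helps if every figure actually carries alt text, a pre-render check is useful. Here is a minimal Python sketch that reports `fo:external-graphic` elements lacking `axf:alttext` and fills them from a lookup table. It assumes the Antenna House extension namespace URI is `http://www.antennahouse.com/names/XSL/Extensions` (check it against your version's documentation); the `alt_by_src` mapping and the function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
AXF_NS = "http://www.antennahouse.com/names/XSL/Extensions"  # assumed URI
ALT = f"{{{AXF_NS}}}alttext"

# Keep the familiar prefixes when the tree is written back out.
ET.register_namespace("fo", FO_NS)
ET.register_namespace("axf", AXF_NS)


def missing_alttext(root: ET.Element) -> list[str]:
    """List the src of every fo:external-graphic with no (or blank) axf:alttext."""
    return [
        el.get("src", "")
        for el in root.iter(f"{{{FO_NS}}}external-graphic")
        if not el.get(ALT, "").strip()
    ]


def apply_alttext(root: ET.Element, alt_by_src: dict[str, str]) -> int:
    """Set axf:alttext from a src -> text mapping; existing alt text is left alone.

    Returns the number of graphics that were updated.
    """
    count = 0
    for el in root.iter(f"{{{FO_NS}}}external-graphic"):
        src = el.get("src", "")
        if not el.get(ALT, "").strip() and src in alt_by_src:
            el.set(ALT, alt_by_src[src])
            count += 1
    return count
```

In practice the alt text would more likely come from the source XML via the XSLT that produces the FO; this post-hoc version is just the easiest thing to run against an existing FO file.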

But… we're not rendering directly to PDF. Because of the EPS issue, we're rendering to PostScript and then using Distiller to produce the PDF. That's a problem, because even with tagging enabled, XSL Formatter does not write the tagging information into the PostScript. With pdfmarks, adding tagged figures to PostScript shouldn't be a problem. You can see an example of how to do it the hard way here. It doesn't seem to matter what options you set or how your XSL-FO is structured: XSL Formatter just doesn't produce the expected information in the PS.
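A quick way to check whether any structure information made it into the PS is to count structure pdfmarks. Here is a minimal Python sketch. It assumes the structure pdfmark names listed (StPNE, StBMC and so on, from Adobe's pdfmark reference) are what a tagging-aware PostScript writer would emit, and it only matches the literal `/Name pdfmark` token sequence, so marks assembled procedurally in the PostScript would be missed. The function name is hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Structure-related pdfmark names (assumed list; extend it to match your reference).
STRUCT_MARKS = ("StPNE", "StPush", "StPop", "StPopAll", "StBMC", "StBDC", "StOBJ", "StAttr")

# Longest names first so alternatives such as StPop/StPopAll resolve cleanly.
_NAMES = sorted(STRUCT_MARKS, key=len, reverse=True)
_PATTERN = re.compile(rb"/(" + b"|".join(n.encode("ascii") for n in _NAMES) + rb")\s+pdfmark")


def count_structure_pdfmarks(ps_bytes: bytes) -> Counter:
    """Count occurrences of each structure pdfmark in raw PostScript bytes."""
    counts: Counter = Counter()
    for m in _PATTERN.finditer(ps_bytes):
        counts[m.group(1).decode("ascii")] += 1
    return counts
```

Running it on the PS output (for example `count_structure_pdfmarks(Path("out.ps").read_bytes())` with `pathlib.Path`, where `out.ps` is a placeholder name) and getting an empty counter would confirm that no tagging reached the PostScript.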

That’s the challenge; hopefully I’ll be able to post a solution here soon, one that isn’t “wait for Antenna House to fix broken PostScript renderer”.