XSLT gotchas: generate-id() collisions

Mixing ID generation schemes always calls for care, but XSLT has a particular gotcha that you might run into if you’ve got both externally accessible cross-references that you need to propagate and internally generated cross-references that you use to make things like indexes and TOCs work.

More likely than not, you’re using the generate-id() function to get unique IDs for those index and TOC links. You’re also probably using author-maintained (or at least authoring-environment-maintained) ID attributes for the internal cross-references and external link destinations. However, an ID is an ID. Even if you handle the two kinds with separate strategies, if they live in the same XML document you should check whether your code guards against collisions between the author-managed IDs and the IDs that generate-id() produces.
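One way to check for this is a small diagnostic stylesheet run against the source document. The following is a minimal sketch rather than a definitive implementation: it assumes XSLT 1.0, assumes the author-managed IDs live in an attribute named id, and uses a hypothetical key name (authored-ids). It reports every element whose generated ID matches an authored ID on a different element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Index every author-managed ID.
       Assumption: authored IDs are stored in an attribute named "id". -->
  <xsl:key name="authored-ids" match="*[@id]" use="@id"/>

  <xsl:template match="/">
    <!-- Visit every element and compare its generated ID with the authored ones. -->
    <xsl:for-each select="//*">
      <xsl:variable name="gen" select="generate-id()"/>
      <!-- The predicate drops the case where the authored ID sits on the very
           same element, so only IDs shared by two different elements are reported. -->
      <xsl:if test="key('authored-ids', $gen)[generate-id() != $gen]">
        <xsl:text>Collision: generated ID '</xsl:text>
        <xsl:value-of select="$gen"/>
        <xsl:text>' for &lt;</xsl:text>
        <xsl:value-of select="name()"/>
        <xsl:text>&gt; is already used as an authored ID&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because generate-id() output is processor-specific, a check like this is only meaningful when it is run with the same processor as the real transform.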

The XSLT specification does call this out: “There is no guarantee that a generated unique identifier will be distinct from any unique IDs specified in the source document.” In my experience it’s rare that you have to deal with this, but it does happen. In my case, it came up with a source document that had itself been generated by the same XSLT processor I was using for the transform. That processor (Saxon 6.5.5) appears to derive its IDs exclusively from the structure of the document being transformed; it doesn’t appear to salt them in any way. The result was collisions between IDs already present in the source document and the IDs that the generate-id() function produced for different elements.

The solution for me was to add a pseudorandom salt to every ID generated with generate-id(). This was pretty easy, since I was already in the process of replacing all of the generate-id() calls in the system with a custom function anyway (which was necessitated by requirements that aren’t relevant here).
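As an illustration of the salting idea, here is a hedged sketch in XSLT 1.0. It is not the custom function mentioned above. The parameter name (id-salt), the template name (salted-id), the "gen-" prefix, and the section/title element names are all hypothetical. Since XSLT 1.0 has no built-in random function, the sketch assumes the caller supplies a fresh pseudorandom salt for each run as a stylesheet parameter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: a per-run pseudorandom string supplied by the
       caller. The default is only a placeholder; it must contain characters
       that are legal in an XML name. -->
  <xsl:param name="id-salt" select="'x7f3a'"/>

  <!-- Hypothetical helper: wraps generate-id() and prepends a fixed prefix
       plus the salt. The prefix starts with a letter so the result is still
       a valid XML ID. -->
  <xsl:template name="salted-id">
    <xsl:param name="node" select="."/>
    <xsl:value-of select="concat('gen-', $id-salt, '-', generate-id($node))"/>
  </xsl:template>

  <!-- Example use: a TOC entry that links to a section.
       The "section" and "title" element names are assumptions. -->
  <xsl:template match="section" mode="toc">
    <xsl:variable name="target">
      <xsl:call-template name="salted-id"/>
    </xsl:variable>
    <a href="#{$target}"><xsl:value-of select="title"/></a>
  </xsl:template>

</xsl:stylesheet>
```

The same helper has to be called both where the link is written and where the link target is written, so that the two ends agree. A salt makes a clash with author-managed IDs much less likely, but it does not rule one out, so a collision check is still worth keeping.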


Coyote Logic Blog
