Excessive whitespace in XmlWriter

I’ve had this post kicking around in my drafts folder for a while now, and hey, it’s only been a year or two since my last post. My, how time flies.

This is just a short bit of utility code I felt like sharing in case anyone else runs into the same problem I did.

The issue is with XmlWriter in the .NET Framework 2.0-3.5. I think the problem is still present in 4.0 and later, but I don’t recall whether I’ve tested that explicitly. One of the useful features of XmlWriter is that you can set the Indent property to true (on the XmlWriterSettings used to create the writer) to make the resulting XML more human-readable instead of all smooshed together on one line.
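For anyone who hasn’t used the indenting option, here is a minimal sketch of turning it on. The class name IndentDemo, the helper ToIndentedXml, and the sample document are all made up for illustration; only XmlWriterSettings.Indent, XmlWriter.Create and XmlDocument.Save come from the framework.

```csharp
using System;
using System.Text;
using System.Xml;

static class IndentDemo
{
    // Hypothetical helper: serializes a document to a string with indentation on.
    public static string ToIndentedXml(XmlDocument document)
    {
        var settings = new XmlWriterSettings
        {
            Indent = true,              // ask the writer to add line breaks and indentation
            OmitXmlDeclaration = true   // keeps the sample output short
        };

        var output = new StringBuilder();
        using (var writer = XmlWriter.Create(output, settings))
        {
            document.Save(writer);
        }
        return output.ToString();
    }

    static void Main()
    {
        var document = new XmlDocument();
        document.LoadXml("<root><item>one</item><item>two</item></root>");
        Console.WriteLine(ToIndentedXml(document));
    }
}
```

If you are on the older XmlTextWriter instead, the equivalent switch is its Formatting property.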

Most of the time, this doesn’t cause any issues because, as in HTML, whitespace is generally considered either insignificant or collapsible, so the extra spaces can be ignored. There is one exception to this rule, however: an element can have the xml:space="preserve" attribute applied to it, which means that all whitespace within that element is significant and must be preserved entirely intact (and thus must not be touched in any way by reformatting for humans).

Unfortunately, XmlWriter doesn’t always honor this. When it is told to indent the output for humans, for some reason it ignores the xml:space="preserve" attribute unless the element already contains some whitespace that the parser has previously identified as significant. This means that it will often insert whitespace into elements that were supposed to be left alone, which can cause problems for whatever consumes the document later.

Fortunately, there’s a fairly simple workaround: you can explicitly give it some “significant” whitespace, without actually changing the content of the element when saved to a file. The magic is like so:

// Requires: using System.Linq; using System.Xml;

/// <summary>Works around a bug in <c>XmlWriter</c>; when using Indent=true it will still insert
/// whitespace into elements marked with xml:space="preserve", unless they already contain
/// whitespace.  This method inserts some zero-length whitespace, which avoids the bug without 
/// changing the final document content.</summary>
/// <param name="document">The document to update for whitespace preservation.</param>
public static void PreserveWhitespace(XmlDocument document)
{
    PreserveWhitespace(document.DocumentElement);
}
 
/// <summary>Works around a bug in <c>XmlWriter</c>; when using Indent=true it will still insert
/// whitespace into elements marked with xml:space="preserve", unless they already contain
/// whitespace.  This method inserts some zero-length whitespace, which avoids the bug without
/// changing the final document content.</summary>
/// <param name="root">The element at which to start searching for elements that should
/// be preserved (typically the DocumentElement).</param>
public static void PreserveWhitespace(XmlElement root)
{
    if (root == null) { return; }
    var nsmgr = new XmlNamespaceManager(root.OwnerDocument.NameTable);
    // descendant-or-self:: keeps the search inside root; a leading "//" would scan the whole document.
    foreach (var element in root.SelectNodes("descendant-or-self::*[@xml:space='preserve']", nsmgr).OfType<XmlElement>())
    {
        if (element.HasChildNodes && !(element.FirstChild is XmlSignificantWhitespace))
        {
            var whitespace = element.OwnerDocument.CreateSignificantWhitespace("");
            element.InsertBefore(whitespace, element.FirstChild);
        }
    }
}

Simply call this method on your XmlDocument (or some sub-element, if you know that you only need space preservation in a small section) after you have made any updates to the document that you wish, but before you actually write it out. That’s all there is to it.
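Here is a rough sketch of how that call order might look in practice. The class name PreserveWhitespaceDemo, the SaveIndented helper, and the sample XML are invented for the example, and the helper is repeated in condensed form (document overload only) just so the snippet compiles on its own. I haven’t checked the output on every framework version, so treat it as a starting point rather than a guarantee.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Xml;

static class PreserveWhitespaceDemo
{
    // Condensed copy of the workaround, document overload only.
    public static void PreserveWhitespace(XmlDocument document)
    {
        if (document.DocumentElement == null) { return; }
        var nsmgr = new XmlNamespaceManager(document.NameTable);
        foreach (var element in document.SelectNodes("//*[@xml:space='preserve']", nsmgr).OfType<XmlElement>())
        {
            if (element.HasChildNodes && !(element.FirstChild is XmlSignificantWhitespace))
            {
                // Zero-length "significant" whitespace as the first child.
                element.InsertBefore(document.CreateSignificantWhitespace(""), element.FirstChild);
            }
        }
    }

    // Hypothetical helper: writes the document to a string with Indent = true.
    public static string SaveIndented(XmlDocument document)
    {
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        var output = new StringBuilder();
        using (var writer = XmlWriter.Create(output, settings))
        {
            document.Save(writer);
        }
        return output.ToString();
    }

    static void Main()
    {
        var document = new XmlDocument();
        document.LoadXml("<root><code xml:space=\"preserve\"><b>x</b><i>y</i></code></root>");

        // 1. Make whatever changes you need to the document first.
        document.DocumentElement.SetAttribute("edited", "true");

        // 2. Apply the workaround last, just before writing.
        PreserveWhitespace(document);

        // 3. Write the document out with indentation enabled.
        Console.WriteLine(SaveIndented(document));
    }
}
```

The order matters because any preserved element you add or rebuild after the call won’t have the placeholder node in it.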
