Thoughts on XML Best Practices

XML has been around for a long time. It was built for, and is mostly used for, storing and transporting data so that it can be used somewhere else. I was recently working on an XML document to store a fairly small amount of information for a small project of mine. I ran into an issue, which led me to start thinking about how we store data inside an XML document and about the best way to organize that data logically.

My Way

In my XML document, I was using a few attributes for some of my tags. To me, attributes describe a tag and provide data about it; in other words, metadata. Just as an id gives a specific div tag in HTML some metadata, attributes in XML do the same thing. Here is an example of something similar to what I have:

<?xml version="1.0" encoding="utf-8" ?>
<library>
 <book title="Designing for the Web" author="Mark Boulton">
  <chapter>Chapter 1</chapter>
  <chapter>Chapter 2</chapter>
  <chapter>Chapter 3</chapter>
  <chapter>Chapter 4</chapter>
  <chapter>Chapter 5</chapter>
 </book>
 <book title="Sexy Web Design" author="Elliot Jay Stocks">
  <chapter>Chapter 1</chapter>
  <chapter>Chapter 2</chapter>
  <chapter>Chapter 3</chapter>
  <chapter>Chapter 4</chapter>
  <chapter>Chapter 5</chapter>
 </book>
</library>

To me, this makes sense. I feel that the attributes title and author describe the book tag, and the data for book is the list of chapters. It wouldn’t make sense to add a chapter attribute, because a chapter isn’t metadata about the book itself; it is the data.

The Most Common Way

The most common way, and the way suggested by the W3C, is to make a tag for each piece of data. The argument is supported by three disadvantages of using attributes: attributes cannot contain multiple values, attributes cannot contain tree structures, and attributes are not easily expandable (from W3C). All of these reasons are valid. Here is an example using just tags:

<?xml version="1.0" encoding="utf-8" ?>
<library>
 <book>
  <title>Designing for the Web</title>
  <author>Mark Boulton</author>
  <chapters>
   <chapter>Chapter 1</chapter>
   <chapter>Chapter 2</chapter>
   <chapter>Chapter 3</chapter>
   <chapter>Chapter 4</chapter>
   <chapter>Chapter 5</chapter>
  </chapters>
 </book>
 <book>
  <title>Sexy Web Design</title>
  <author>Elliot Jay Stocks</author>
  <chapters>
   <chapter>Chapter 1</chapter>
   <chapter>Chapter 2</chapter>
   <chapter>Chapter 3</chapter>
   <chapter>Chapter 4</chapter>
   <chapter>Chapter 5</chapter>
  </chapters>
 </book>
</library>
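
The "attributes cannot contain multiple values" point can be checked directly. Below is a minimal sketch in Python (not the jQuery I used for my project), relying only on the standard library's xml.etree.ElementTree. The book and both author names are made up for illustration. Repeated child elements parse fine, while repeating an attribute name is not well-formed XML and is rejected by the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical book with two authors, stored as repeated child elements.
ELEMENT_STYLE = """
<book>
  <title>Example Book</title>
  <author>First Author</author>
  <author>Second Author</author>
</book>
"""

# The same idea forced into attributes. Repeating an attribute name on one
# tag is not well-formed XML, so the parser should refuse the document.
ATTRIBUTE_STYLE = (
    '<book title="Example Book" author="First Author" author="Second Author" />'
)

book = ET.fromstring(ELEMENT_STYLE)
authors = [author.text for author in book.findall("author")]
print(authors)

try:
    ET.fromstring(ATTRIBUTE_STYLE)
    duplicate_attribute_rejected = False
except ET.ParseError as err:
    duplicate_attribute_rejected = True
    print("parse error:", err)
```

A single attribute could still hold something like a comma-separated list, but then every consumer of the document has to know how to split it.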

So which one?

Although I’m leaning more toward using attributes where it makes sense, I do not think it matters much. Since XML is primarily used for the transportation of data, the organization makes sense both ways, and parsing the XML would be very similar with either layout.
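
To show how similar the parsing is, here is a rough sketch in Python using the standard library's xml.etree.ElementTree (again, not the jQuery from my project). The function names read_attribute_layout and read_element_layout are my own inventions for this example, and the library is trimmed down to one book with two chapters. Both readers produce the same list of dictionaries.

```python
import xml.etree.ElementTree as ET

ATTRIBUTE_LAYOUT = """
<library>
  <book title="Designing for the Web" author="Mark Boulton">
    <chapter>Chapter 1</chapter>
    <chapter>Chapter 2</chapter>
  </book>
</library>
"""

ELEMENT_LAYOUT = """
<library>
  <book>
    <title>Designing for the Web</title>
    <author>Mark Boulton</author>
    <chapters>
      <chapter>Chapter 1</chapter>
      <chapter>Chapter 2</chapter>
    </chapters>
  </book>
</library>
"""


def read_attribute_layout(xml_text):
    """Read books whose title and author are attributes of <book>."""
    books = []
    for book in ET.fromstring(xml_text).findall("book"):
        books.append({
            "title": book.get("title"),
            "author": book.get("author"),
            "chapters": [c.text for c in book.findall("chapter")],
        })
    return books


def read_element_layout(xml_text):
    """Read books whose title and author are child elements of <book>."""
    books = []
    for book in ET.fromstring(xml_text).findall("book"):
        books.append({
            "title": book.findtext("title"),
            "author": book.findtext("author"),
            "chapters": [c.text for c in book.findall("chapters/chapter")],
        })
    return books


from_attributes = read_attribute_layout(ATTRIBUTE_LAYOUT)
from_elements = read_element_layout(ELEMENT_LAYOUT)
print(from_attributes == from_elements)
```

The only real differences are get() versus findtext() and the extra chapters level in the path.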

I did notice a couple of differences between the two approaches. I used jQuery to parse the document and ran into issues with primes (′) inside an attribute value. The prime was rendered as a single quote when I searched for the value, which left an open string in the jQuery selector that couldn’t be closed. Also, once I changed my XML to follow the W3C suggestions, I couldn’t find a way to check the value of a tag and continue on with my selector like I could with attributes. For example:

Using attributes:

$(xml).find("book[title='Sexy Web Design'] chapter").each(function() { /* do work */ });

Using W3C’s suggestion:

$(xml).find("book title").each(function() {
 // Only continue for the book whose <title> text matches.
 if ($(this).text() === "Sexy Web Design") {
  var theParent = $(this).parent();
  $("chapters chapter", theParent).each(function() { /* do work */ });
 }
});
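
For comparison, here is how the same two problems look outside of jQuery. This is only a sketch in Python with the standard library's xml.etree.ElementTree, not a fix for the jQuery code above. Its limited XPath support allows a predicate on a child element's text, so the element layout can be queried in one expression. The helper chapters_for and the second book title (which contains a single quote) are hypothetical, added to show that comparing values in code keeps quote characters out of the path string entirely.

```python
import xml.etree.ElementTree as ET

LIBRARY = """
<library>
  <book>
    <title>Sexy Web Design</title>
    <chapters>
      <chapter>Chapter 1</chapter>
      <chapter>Chapter 2</chapter>
    </chapters>
  </book>
  <book>
    <title>The 5' Shelf</title>
    <chapters>
      <chapter>Chapter A</chapter>
    </chapters>
  </book>
</library>
"""

root = ET.fromstring(LIBRARY)

# One expression: pick books whose <title> child has exactly this text,
# then continue down to their chapters.
one_step = [
    chapter.text
    for chapter in root.findall("book[title='Sexy Web Design']/chapters/chapter")
]


def chapters_for(library, wanted_title):
    """Compare titles in code so quotes in the title never reach the path string."""
    found = []
    for book in library.findall("book"):
        if book.findtext("title") == wanted_title:
            found.extend(c.text for c in book.findall("chapters/chapter"))
    return found


quoted = chapters_for(root, "The 5' Shelf")
print(one_step)
print(quoted)
```

The second approach is essentially what my jQuery loop does: find the candidates first, then compare the text as an ordinary string.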

I’d like to know other developers’ thoughts on this. Also, if you have any links on XML standards to share, feel free to do so. I will post them in the article for reference.

There is 1 comment on “Thoughts on XML Best Practices”
  1. #1 R

    posted on Sep 18th 1:10 am

    Attributes for metadata sounds good…but who determines whether data is meta or not, and how? To me, title and author(s)/editor(s) of a book are very clearly data, not metadata. In fact, I can’t think of much data about a book that isn’t data (ISBN, publication year, number of pages, etc., are all reasonably viewed as data attributes). About the only thing that falls into the category of metadata in my mind is links or relations to other things (where to find it online or for sale, references to other editions, similar books, etc.) and since most of those are by nature potentially multiple things (which attributes can’t handle), it makes more sense to just make all of the data first-rate data.

    I use attributes in XML as a matter of pragmatic convenience, but they really just don’t work as some ideological division of data vs metadata, at least not for me.
