When an XML file is correctly formatted with proper syntax, it is said to be well formed.

Consider the following well-formed XML document.


<?xml version="1.0" encoding="ISO-8859-1"?>
<books>
<book>
	<title>A Christmas Carol</title>
	<author>Charles Dickens</author>
	<genre>Fiction</genre>
	<year era="AD">1843</year>
	<pages>86</pages>
	<link><![CDATA[http://linktobook?link=1&data=2 ]]></link>
</book>
<book>
	<title>A Treatise on Government</title>
	<author>Aristotle</author>
	<genre>Non-Fiction</genre>
	<year era="BC">322</year>
	<pages>195</pages>
	<link><![CDATA[http://linktobook?link=2&data=3 ]]></link>
</book>	
<book>
	<title>A Thief in the Night</title>
	<author>E. W. Hornung</author>
	<genre>Short Stories</genre>
	<year era="AD">1905</year>
	<pages>182</pages>
	<link><![CDATA[http://linktobook?link=3&data=4 ]]></link>
</book>	
</books>

This document is considered well formed because it follows the XML syntax rules. It has a single root element, all elements are properly nested, and every start tag has a matching end tag. In addition, all attribute values are quoted, and each CDATA section is correctly delimited with <![CDATA[ and ]]>.
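
The well-formedness rules can be checked mechanically by handing a document to any XML parser and seeing whether it reports a syntax error. Below is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The helper name is_well_formed and the small sample strings are illustrative assumptions, not part of the original example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses without a syntax error (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Shortened variants of the books document, each breaking one rule.
good = '<books><book><year era="AD">1843</year></book></books>'
unclosed = '<books><book><year era="AD">1843</year></books>'        # <book> never closed
unquoted = '<books><book><year era=AD>1843</year></book></books>'   # attribute value not quoted
two_roots = '<book></book><book></book>'                            # more than one root element

for name, doc in [("good", good), ("unclosed", unclosed),
                  ("unquoted", unquoted), ("two_roots", two_roots)]:
    print(name, is_well_formed(doc))
```

Only the first string passes. Note that this checks syntax only; it says nothing about whether the document conforms to a DTD.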

When an XML file contains or references a DTD (document type definition) and conforms to that DTD, it is said to be valid. A valid XML file must also be well formed.

Consider the following valid XML document.


<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE books
[
<!ELEMENT books (book+)>
<!ELEMENT book (title,author,genre,year,pages,link)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT genre (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT pages (#PCDATA)>
<!ELEMENT link (#PCDATA)>
<!ATTLIST year era CDATA "AD">
<!ENTITY unknown "Unknown">
]>
<books>
<book>
	<title>A Christmas Carol</title>
	<author>Charles Dickens</author>
	<genre>Fiction</genre>
	<year era="AD">1843</year>
	<pages>86</pages>
	<link><![CDATA[http://linktobook?link=1&data=2 ]]></link>
</book>	
<book>
	<title>A Treatise on Government</title>
	<author>Aristotle</author>
	<genre>Non-Fiction</genre>
	<year era="BC">322</year>
	<pages>195</pages>
	<link><![CDATA[http://linktobook?link=2&data=3 ]]></link>
</book>	
<book>
	<title>A Thief in the Night</title>
	<author>E. W. Hornung</author>
	<genre>Short Stories</genre>
	<year era="AD">1905</year>
	<pages>182</pages>
	<link><![CDATA[http://linktobook?link=3&data=4 ]]></link>
</book>	
<book>
	<title>Some Book</title>
	<author>&unknown;</author>
	<genre>Non-Fiction</genre>
	<year era="BC">500</year>
	<pages>1000</pages>
	<link><![CDATA[http://linktobook?link=4&data=5 ]]></link>
</book>	
</books>

In this example, the DTD is embedded in the XML document itself as an internal subset. It could just as easily be kept in an external file and referenced from the DOCTYPE declaration:

<!DOCTYPE books SYSTEM "Books.dtd">

The rules of this DTD are as follows:
Each book element must contain every child listed in its content model, beginning with title, author, genre, year, and pages, in exactly that order.
An element declared as (#PCDATA) may contain only character data, not child elements. A CDATA section is still character data, so it may appear inside such an element.
The 'era' attribute of 'year' is optional. When it is omitted, the parser supplies the default value "AD".

We also define a custom entity named 'unknown', whose replacement text is "Unknown". It can be referenced anywhere in the element content of our XML with:

&unknown;
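
To see the entity and the default attribute value in action, the sketch below parses a cut-down document with Python's standard xml.etree.ElementTree. The shortened document and the variable names are assumptions made for illustration. This parser reads the internal DTD subset but does not validate, so it shows only entity expansion and attribute defaulting, not enforcement of the content model.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the books document: only the ATTLIST and ENTITY
# declarations are kept, and <year> deliberately omits the era attribute.
doc = """<!DOCTYPE books [
<!ATTLIST year era CDATA "AD">
<!ENTITY unknown "Unknown">
]>
<books>
<book>
	<author>&unknown;</author>
	<year>1843</year>
</book>
</books>"""

root = ET.fromstring(doc)

# &unknown; is replaced by the entity's replacement text during parsing.
author = root.find("book/author").text

# era was not written in the document; the parser fills in the declared default.
era = root.find("book/year").get("era")

print(author, era)
```

Checking that each book actually has its children in the declared order requires a validating parser. The standard library does not provide one; third-party packages such as lxml do.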