Basic information about the XML format

This article does not claim to be complete or rigorous. It gives only the most general and basic information about how the XML format is structured, for those who are not quite sure what it is. XML (eXtensible Markup Language) is a text format intended for storing structured data. XML files (usually files with the *.xml extension) must satisfy two criteria so that different software can work with them correctly:

  • All data must be described correctly, which means the file's markup must contain no syntax errors. Many programs and web services check XML syntax, and there are many recommendations on designing XML.
  • The structure of the file, the names of elements and attributes, and the types of attribute values must match the specification of the software that works with these files.

For the first point, it is recommended to use an XML editor with syntax checking (for example, "oXygen XML Editor": http://www.oxygenxml.com/) and, at the very least, to read this article. For the second point, all that is required is to follow the relevant specifications on the site.
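Besides editors, the syntax check itself is easy to script. The following is a minimal sketch in Python (an assumption: the article does not prescribe any language) using only the standard-library module xml.etree.ElementTree; the helper name check_well_formed is hypothetical. It covers only the first criterion (well-formedness), not conformance to a particular specification.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text: str) -> tuple[bool, str]:
    """Return (True, "ok") if `text` is well-formed XML, else (False, parser message).

    Only syntax is checked. Whether element and attribute names match the
    specification of the consuming software (the second criterion) is not.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The parser's message already includes the line and column of the error.
        return False, str(err)
    return True, "ok"


print(check_well_formed("<book>Book</book>"))  # (True, 'ok')
print(check_well_formed("<book>Book</bok>"))   # False plus a "mismatched tag" message
```

A dedicated editor gives richer diagnostics; a script like this is mainly useful for batch checks (assumes Python 3.9+ for the `tuple[...]` annotation).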

XML syntax

XML describes a hierarchical structure in text form and is intended for storing any kind of data. The structure can be visualized as a tree of elements, and XML elements are described by tags. Let's consider an example of a simple cooking recipe marked up with XML:

<?xml version="1.0" encoding="UTF-8"?>
<recipe name="bread" preptime="5" cooktime="180">
  <title>Simple bread</title>
  <ingredient amount="3" unit="glass">Flour</ingredient>
  <ingredient amount="0.25" unit="gram">Yeast</ingredient>
  <ingredient amount="1.5" unit="glass">Warm water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <instructions>
    <step>Mix all the ingredients and knead thoroughly.</step>
    <step>Cover with a cloth and leave for one hour in a warm place.</step>
    <!-- <step>Read yesterday's newspaper.</step> is a doubtful step... -->
    <step>Knead again, place on a baking sheet and put in the oven.</step>
  </instructions>
</recipe>
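To see the tree structure in practice, here is a sketch that parses the bread recipe with Python's standard xml.etree.ElementTree module and collects its parts. The function name summarize and the layout of the returned dictionary are illustrative assumptions, not part of any recipe specification.

```python
import xml.etree.ElementTree as ET

# The bread recipe from the example above, stored as a string for illustration.
RECIPE = """<?xml version="1.0" encoding="UTF-8"?>
<recipe name="bread" preptime="5" cooktime="180">
  <title>Simple bread</title>
  <ingredient amount="3" unit="glass">Flour</ingredient>
  <ingredient amount="0.25" unit="gram">Yeast</ingredient>
  <ingredient amount="1.5" unit="glass">Warm water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <instructions>
    <step>Mix all the ingredients and knead thoroughly.</step>
    <step>Cover with a cloth and leave for one hour in a warm place.</step>
    <!-- <step>Read yesterday's newspaper.</step> is a doubtful step... -->
    <step>Knead again, place on a baking sheet and put in the oven.</step>
  </instructions>
</recipe>"""


def summarize(xml_bytes: bytes) -> dict:
    """Collect the recipe name, title, ingredients and steps from the element tree."""
    root = ET.fromstring(xml_bytes)  # the single root element: <recipe>
    return {
        "name": root.get("name"),
        "title": root.findtext("title"),
        # Each <ingredient> is a direct child of <recipe>; attribute values are strings.
        "ingredients": [
            (ing.text, ing.get("amount"), ing.get("unit"))
            for ing in root.findall("ingredient")
        ],
        # The default parser drops comments, so the commented-out step is absent.
        "steps": [step.text for step in root.iterfind("instructions/step")],
    }


summary = summarize(RECIPE.encode("utf-8"))
print(summary["title"], len(summary["steps"]))  # Simple bread 3
```

The document is passed as bytes so that the encoding named in the XML declaration is the one the parser uses.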

XML declaration

The first line of an XML document is called the XML declaration. It specifies the XML version. In XML 1.0 the declaration may be omitted; in XML 1.1 it is mandatory. The declaration can also specify the character encoding and whether the document depends on external markup declarations (the standalone declaration).

<?xml version="1.0" encoding="UTF-8"?>

The specification requires XML processors to support the Unicode encodings UTF-8 and UTF-16 (UTF-32 is not mandatory). Other encodings based on the ISO/IEC 8859 standard are permitted, supported and widely used, although not mandatory, and further encodings such as the Russian Windows-1251 and KOI8-R are also allowed. Tags often use only Latin letters, and in that case UTF-8 is a very convenient encoding: the file is, as a rule, smaller than in UTF-16; decoding can be done either for the whole document or only for specific attributes and text; and even if the document is read with the wrong encoding, its markup contains no illegal characters.
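Both the size argument and the handling of a declared encoding can be checked with a short Python sketch (standard library only). It assumes the expat parser bundled with CPython behind xml.etree.ElementTree, which accepts single-byte encodings such as Windows-1251 when the declaration names them; the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Markup whose tags and text use only Latin (ASCII) letters.
latin_doc = '<recipe name="bread"><title>Simple bread</title></recipe>'

# ASCII characters take 1 byte each in UTF-8 and 2 bytes in UTF-16;
# Python's "utf-16" codec also prepends a 2-byte byte order mark.
size_utf8 = len(latin_doc.encode("utf-8"))
size_utf16 = len(latin_doc.encode("utf-16"))
print(size_utf8, size_utf16)

# Cyrillic text in a document that declares, and is actually stored in, Windows-1251.
cyrillic_doc = '<?xml version="1.0" encoding="windows-1251"?><word>Хлеб</word>'
root = ET.fromstring(cyrillic_doc.encode("cp1251"))
print(root.text)  # Хлеб
```

The declaration only helps if it tells the truth: the bytes must really be in the encoding it names.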

Root element

The most important mandatory syntax requirement is that a document has exactly one root element (sometimes also called the document element). This means that all text and other data of the document, apart from the XML declaration and comments, must be located between a single opening root tag and its matching closing tag.
The following minimal example is a well-formed XML document:

<?xml version="1.0"?>
<book>This is a book: "Book"</book>

The following example is not a well-formed XML document, because it has two root elements:

<?xml version="1.0"?>
<!-- ATTENTION! Incorrect XML! -->
<thing>Entity No. 1</thing>
<thing>Entity No. 2</thing>
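The one-root rule can be observed directly with a parser. This Python sketch (standard library only; the helper parse_error and the wrapper element <things> are hypothetical names) parses the two documents above plus a variant in which the two elements are wrapped in a single root.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# One root element; the declaration and a comment may precede it.
ONE_ROOT = """<?xml version="1.0"?>
<!-- A comment outside the root element is allowed. -->
<book>This is a book: "Book"</book>"""

# Two top-level elements: not well-formed.
TWO_ROOTS = """<?xml version="1.0"?>
<thing>Entity No. 1</thing>
<thing>Entity No. 2</thing>"""

# The usual fix: wrap the siblings in a single (hypothetical) root element.
WRAPPED = """<?xml version="1.0"?>
<things>
  <thing>Entity No. 1</thing>
  <thing>Entity No. 2</thing>
</things>"""


def parse_error(text: str) -> Optional[str]:
    """Return the parser's error message, or None if the document is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


for name, doc in [("ONE_ROOT", ONE_ROOT), ("TWO_ROOTS", TWO_ROOTS), ("WRAPPED", WRAPPED)]:
    print(name, parse_error(doc))
```

For TWO_ROOTS the parser reports "junk after document element": everything after the first root element's closing tag is treated as illegal.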

Comment

A comment can be placed anywhere in the tree (but not inside a tag). XML comments are enclosed in a special tag that begins with the characters <!-- and ends with the characters -->. Two consecutive hyphens (--) may not appear inside a comment.

<!-- This is a comment. -->

Tags inside a comment are not processed.
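The following Python sketch (standard library, Python 3.8+ for `insert_comments`; the helper parses is hypothetical) shows both rules: markup inside a comment never becomes an element, and a double hyphen inside a comment makes the document ill-formed.

```python
import xml.etree.ElementTree as ET

DOC = "<a><!-- <b>not an element</b> --><c/></a>"

# By default the parser drops comments, so the commented-out <b> is never built.
root = ET.fromstring(DOC)
print([child.tag for child in root])  # ['c']

# A TreeBuilder with insert_comments=True keeps comments as Comment nodes.
keeper = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root_with_comment = ET.fromstring(DOC, parser=keeper)
comment = root_with_comment[0]
print(comment.tag is ET.Comment, comment.text.strip())  # True <b>not an element</b>


def parses(text: str) -> bool:
    """True if the text is well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(parses("<a><!-- one - two --></a>"))   # True: a single hyphen is fine
print(parses("<a><!-- one -- two --></a>"))  # False: "--" inside the comment
print(parses("<a><!-- one ---></a>"))        # False: a comment may not end with "--->"
```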

Tags and elements

The rest of this XML document consists of nested elements, some of which have attributes and content. An element usually consists of an opening tag and a closing tag that enclose text and other elements. The opening tag consists of the element name in angle brackets, for example <step>, and the closing tag consists of the same name in angle brackets with a slash added before the name, for example </step>. Element names, like attribute names, cannot contain spaces, but may be written in any language supported by the encoding of the XML document. A name can begin with a letter, an underscore or a colon; the remaining characters of a name may be the same characters plus digits, hyphens and periods.
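One way to check a candidate name against these rules without writing them out by hand is to let a real parser decide. The Python sketch below (standard library; the function is_valid_element_name is hypothetical) builds a tiny document "<name/>" and checks that it parses back to an element with exactly that name. Note an assumption: ElementTree's parser is namespace-aware, so names containing a colon are rejected unless the prefix is declared; the sketch therefore does not cover colons.

```python
import xml.etree.ElementTree as ET


def is_valid_element_name(name: str) -> bool:
    """Ask the XML parser whether `name` can be used as an element name (no colons)."""
    try:
        root = ET.fromstring(f"<{name}/>")
    except ET.ParseError:
        return False
    # Guards against inputs such as 'a b="1"' that parse but yield a different name.
    return root.tag == name


for candidate in ["step", "_step", "step-2.v1", "шаг", "2step", "my step"]:
    print(candidate, is_valid_element_name(candidate))
```

"шаг" is accepted because names may use letters of any language; "2step" (starts with a digit) and "my step" (contains a space) are rejected.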
Everything located between the opening and closing tags, including text and other (nested) elements, is called the element's content. Below is an example of an XML element consisting of an opening tag, a closing tag and element content:

<step>Element content</step>

Note that the content of an element is often one or more other elements, for example:

<group>
  <node>Chapter 1</node>
  <node>Chapter 2</node>
  <group>
    <node>Chapter 3, part 1</node>
    <node>Chapter 3, part 2</node>
  </group>
</group>
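Nested content forms a tree that can be walked recursively. In this Python sketch (standard library; the function outline and its indentation format are illustrative choices), iterating over an element yields its direct child elements, and recursion descends into the inner group.

```python
import xml.etree.ElementTree as ET

GROUPS = """<group>
  <node>Chapter 1</node>
  <node>Chapter 2</node>
  <group>
    <node>Chapter 3, part 1</node>
    <node>Chapter 3, part 2</node>
  </group>
</group>"""


def outline(element: ET.Element, depth: int = 0) -> list[str]:
    """Return one indented line per element, walking the tree depth-first."""
    text = (element.text or "").strip()
    label = f"{element.tag}: {text}" if text else element.tag
    lines = ["  " * depth + label]
    for child in element:  # iterating an element yields its direct child elements
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(GROUPS))))
```

The outer group has three direct children (two node elements and one group), while `root.iter("node")` would find all four node elements at any depth.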

Elements often have no content. In that case the closing tag can be combined with the opening one by putting a slash at the end of the tag, for example <step />. Empty elements are typically used when the meaning is carried not by the content of the element but by its attributes (more on them below). As another example, here are three ways to write an element without child elements; strictly speaking, only the last two are equivalent, because the first one contains a line break, which the parser keeps as whitespace text:

<node>
</node>

<node></node>

<node />
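The difference between these forms is visible in a parsed tree. This Python sketch (standard library only) prints each form's text and number of child elements, then serializes an empty element.

```python
import xml.etree.ElementTree as ET

FORMS = ["<node>\n</node>", "<node></node>", "<node />"]

for form in FORMS:
    element = ET.fromstring(form)
    # .text is None when there is no character data; len() counts child elements.
    print(repr(form), repr(element.text), len(element))

# When serializing, ElementTree writes an element without content in the short form.
print(ET.tostring(ET.fromstring("<node></node>")))  # b'<node />'
```

The first form yields the text "\n", the other two yield None, which is why only the last two are strictly interchangeable.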

Attributes of an element

Besides content, an element can have attributes: name-value pairs added to the opening tag after the element name. Attribute values are always quoted (with single or double quotes), and the same attribute name cannot appear twice in one element. Using different types of quotes for the attribute values of a single tag is not recommended.

<ingredient amount="3" unit="glass">Flour</ingredient>

In this example the element "ingredient" has two attributes: "amount" with the value "3" and "unit" with the value "glass". From the point of view of XML markup, these attributes carry no meaning; they are simply strings of characters.
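The "just strings" point shows up directly when reading attributes in code. In this Python sketch (standard library; the conversion to float and the helper duplicate_attribute_error are illustrative assumptions), the value of amount comes back as the string "3", and turning it into a number is left to the application. The second part shows that a repeated attribute name is a syntax error.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ingredient = ET.fromstring('<ingredient amount="3" unit="glass">Flour</ingredient>')

print(ingredient.attrib)  # {'amount': '3', 'unit': 'glass'}

# Attribute values are plain strings; XML itself gives "3" no numeric meaning.
amount = ingredient.get("amount")
# Interpreting it as a number is up to the application (assumed here: a float).
quantity = float(amount)


def duplicate_attribute_error() -> Optional[str]:
    """Try to parse an element that repeats an attribute name."""
    try:
        ET.fromstring('<ingredient amount="3" amount="4">Flour</ingredient>')
    except ET.ParseError as err:
        return str(err)
    return None


print(duplicate_attribute_error())  # a "duplicate attribute" error message
```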
Besides text, an element can contain other elements:

<food name="Bread">
  <ingredient amount="3" unit="glass">Flour</ingredient>
  <ingredient amount="1" unit="glass">Water</ingredient>
  <ingredient amount="0.5" unit="teaspoon">Salt</ingredient>
</food>

Here the element "food" contains three "ingredient" elements.
XML does not allow elements to overlap. For example, the fragment below is incorrect, because the elements "em" and "strong" overlap.

<!-- ATTENTION! Incorrect XML! -->
<p>Normal <em>emphasized <strong>bold and emphasized</em> bold</strong></p>
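A parser rejects overlapping elements with a "mismatched tag" error. This Python sketch (standard library; the helper parse_error is hypothetical) compares the overlapping fragment with one properly nested way of writing the same text: close "strong" before "em", then reopen "strong" afterwards.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# "em" is closed while "strong" is still open: the elements overlap.
OVERLAPPING = "<p>Normal <em>emphasized <strong>bold and emphasized</em> bold</strong></p>"

# A properly nested equivalent: close "strong" first, then reopen it after "em".
NESTED = ("<p>Normal <em>emphasized <strong>bold and emphasized</strong></em>"
          "<strong> bold</strong></p>")


def parse_error(text: str) -> Optional[str]:
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


print(parse_error(OVERLAPPING))  # a "mismatched tag" error message
print(parse_error(NESTED))       # None
print("".join(ET.fromstring(NESTED).itertext()))  # Normal emphasized bold and emphasized bold
```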

Conclusion

From here on, for simplicity, the word "tag" means an element with the given name. For example, "the Group tag with the @Name attribute equal to Example and @Id equal to 13" means the element <Group Id="13" Name="Example" />.
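The @Name notation resembles XPath, and a tag described this way can be located with an XPath-style query. This Python sketch uses the limited XPath subset supported by xml.etree.ElementTree, including [@attr='value'] predicates; the surrounding <Groups> document and the second Group element are hypothetical, added only so that there is something to filter out.

```python
import xml.etree.ElementTree as ET

# A hypothetical document containing the Group tag described above.
DOC = """<Groups>
  <Group Id="12" Name="Other" />
  <Group Id="13" Name="Example" />
</Groups>"""

root = ET.fromstring(DOC)
# Each [@attr='value'] predicate narrows the match; attribute names are case-sensitive.
matches = root.findall(".//Group[@Name='Example'][@Id='13']")
print(len(matches), matches[0].attrib)  # 1 {'Id': '13', 'Name': 'Example'}
```

Because XML names are case-sensitive, a query for @name would not match an attribute written as Name.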