HTML / CSS / JavaScript Tutorial

Markup Languages


HTML is the main ‘markup’ language used for web pages and web applications. By a (digital) markup language we mean a way of creating and interpreting a digital document that contains tags (and their attributes) which the software rendering the document interprets in a specific way. The tags and their attributes do not typically appear directly in the output presented to the user. In what follows we describe how this concept works for documents that concentrate on textual output, although the same concepts also apply to documents containing other types of material (such as pictures or sounds).


There are many different mark-up languages used in different contexts. For example, LaTeX (and TeX, the underlying mark-up language on which LaTeX is based) is a tool for preparing mathematically orientated documents. It uses the backslash character (“\”) and braces (“{” and “}”) to tell the software rendering the document that relevant text needs to be interpreted in a specific manner. Text of the form “E &= \frac{mc^2}{\sqrt{1-\frac{v^2}{c^2}}}” is rendered by a TeX viewer roughly as an equation with E on the left-hand side and, on the right-hand side, a fraction whose numerator is mc² and whose denominator is the square root of 1 − v²/c².



Here the “\frac{numerator}{denominator}” tells the software to render the text formed by the numerator and the denominator as a fraction, and “\sqrt{argument}” tells the software to render the text formed by the argument as a square root. Markup can be nested: in the example, one \frac sits inside a \sqrt, which in turn sits inside the denominator of another \frac.
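To try the example, the fragment can be placed in a minimal LaTeX document. The sketch below assumes the amsmath package and its align* environment, neither of which is specified in the text above. In that environment the “&” marks the point at which successive lines are aligned and is not itself printed.

```latex
\documentclass{article}
\usepackage{amsmath} % assumed: provides the align* environment
\begin{document}
\begin{align*}
E &= \frac{mc^2}{\sqrt{1-\frac{v^2}{c^2}}}
\end{align*}
\end{document}
```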


Certain features are shared by virtually all digital mark-up languages, including HTML. These are:


(a)    The mark-up language needs to be specified and interpreted in a consistent fashion. This is harder to arrange than it looks for languages that develop through time, since the equivalent of different dialects can then be created.


HTML 4.01 was for a long time the latest formally adopted version of HTML, but the World Wide Web Consortium (W3C) issued HTML 5 as a formal Recommendation in October 2014 and has also developed a parallel XML-based language, XHTML 5.1. XML stands for “eXtensible Mark-up Language”. Most leading browsers will interpret an HTML document using HTML 5 conventions, but some older browsers may not. The document type declaration (<!DOCTYPE …>) at the top of a document indicates which version of the language it was written for, and influences how modern browsers render it. HTML 4.01 itself comes in three variants: Strict, Transitional and Frameset. Loosely speaking, these correspond to how closely the document adheres to the specific requirements of HTML 4.


(b)   The language generally needs to be able to nest tags within other tags. This requires the language to have the concept of opening a tag and then closing it, with the text in between the opening and closing tags being interpreted in a specific manner. With TeX, the nesting process makes use of open and close braces (“{” and “}” respectively). With HTML, markup generally takes a form akin to <xxx> … </xxx>, where <xxx> is the opening tag, </xxx> is the closing tag and xxx is the type of element involved. Strictly, the ‘element’ is the opening tag, the closing tag and everything in between, although ‘tag’ and ‘element’ are often used interchangeably. More sophisticated tags take the form:


<xxx yyy> … </xxx>


where the yyy defines the tag’s attributes, i.e. provides added information about the element / tag.


For example, any text in a webpage between an opening <script> tag and the corresponding closing </script> is generally interpreted as JavaScript code. Any text between an opening <a> and a closing </a> is the text used when rendering a hyperlink. The address of the document to which the hyperlink points is included as an element attribute, e.g. the full tag might involve:


<a href="http://www.nematrian.com/Introduction.aspx"> Introduction to Nematrian website </a>
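The structure of such an element (name, attributes, enclosed text) can be illustrated with a short JavaScript sketch. The function name parseSimpleElement is a hypothetical helper introduced here for illustration only. It handles just one simple, un-nested element with double-quoted attribute values and is not a substitute for a real HTML parser.

```javascript
// Hypothetical helper, for illustration only: splits one simple element of
// the form <name attr="value" ...> content </name> into its parts.
// It ignores nesting, comments and unquoted attribute values.
function parseSimpleElement(source) {
  const pattern =
    /^<([a-zA-Z][a-zA-Z0-9]*)((?:\s+[a-zA-Z-]+="[^"]*")*)\s*>([\s\S]*)<\/\1>$/;
  const match = pattern.exec(source.trim());
  if (match === null) {
    return null; // not of the simple form handled here
  }
  const [, name, attributeText, content] = match;

  // Collect each attr="value" pair into a plain object.
  const attributes = {};
  for (const [, key, value] of attributeText.matchAll(/([a-zA-Z-]+)="([^"]*)"/g)) {
    attributes[key] = value;
  }
  return { name, attributes, content: content.trim() };
}

const link = parseSimpleElement(
  '<a href="http://www.nematrian.com/Introduction.aspx"> Introduction to Nematrian website </a>'
);
console.log(link.name);            // a
console.log(link.attributes.href); // http://www.nematrian.com/Introduction.aspx
console.log(link.content);         // Introduction to Nematrian website
```

Note that the attribute value is delimited by plain double quotes ("); typographic quotes (“ ”) are not treated as attribute delimiters.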


Some mark-up languages, such as XML, require all opened tags to be explicitly closed, e.g. with any <x> ultimately closed by a </x> (in XML it is also possible to open and close a tag at the same time, using a format such as <x />). Others, like HTML, do not require this when the element can never contain anything. For example, in HTML the tag <br> means insert a line break, i.e. start a new line, and is written without a closing </br>.
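The nesting rule described in (b), together with the exception for tags that never contain anything, can be sketched in JavaScript using a stack of currently open tags. The function name isProperlyNested and the short list of content-free element names are assumptions made for this illustration. The sketch ignores comments, script content, attribute values containing “>” and HTML's rules for optional closing tags, so it is not a real validator.

```javascript
// A small, illustrative subset of HTML elements that never have content
// and therefore have no closing tag.
const VOID_ELEMENTS = new Set(["br", "hr", "img", "input", "meta", "link"]);

// Hypothetical helper: returns true if every opened tag is closed in the
// reverse order in which it was opened.
function isProperlyNested(source) {
  // Captures: optional leading "/", the tag name, optional trailing "/".
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(\/?)>/g;
  const openTags = []; // stack of names still waiting for a closing tag

  for (const [, closingSlash, rawName, selfClosingSlash] of source.matchAll(tagPattern)) {
    const name = rawName.toLowerCase();
    if (closingSlash === "/") {
      // A closing tag must match the most recently opened tag.
      if (openTags.pop() !== name) {
        return false;
      }
    } else if (selfClosingSlash !== "/" && !VOID_ELEMENTS.has(name)) {
      // An ordinary opening tag: remember it until it is closed.
      openTags.push(name);
    }
    // Self-closed tags (<x />) and void elements (<br>) need no closing tag.
  }
  return openTags.length === 0;
}

console.log(isProperlyNested("<p>Some <b>bold</b> text<br>next line</p>")); // true
console.log(isProperlyNested("<p><b>text</p></b>"));                        // false
console.log(isProperlyNested("<x><y /></x>"));                              // true
```

The second call returns false because </p> arrives while <b> is still open, which is exactly the overlap that proper nesting rules out.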


