Welcome to HTML.co.uk, the number one resource for all news, information, and happenings regarding HTML.

May
5th

Rules for Text Input and Editing HTML

Author: Editor | Filed under HTML Basics, HTML Tutorials

If you use a text editor or a source-oriented HTML editor to produce HTML files, you should know and pay attention to the following rules:

  • Follow the escaping rules for characters that have a special meaning in HTML, for other special characters, and for the character set as a whole.
  • Arrange line breaks and blank lines so that the source text is easy to read. Keep in mind that the web browser does not display line breaks and new paragraphs the way they are entered in the source text. For line breaks and paragraph breaks to appear in the browser, you have to use the corresponding HTML elements for line breaks and paragraphs. If for some special reason you want everything to be displayed exactly as it was entered in the source code, you can use the HTML element for pre-formatted text.
  • Remember that there are no tab stops in HTML. Line break characters, tab characters and space characters are together known as white space characters. The browser typically treats a tab or line break in the HTML source text as a space, and collapses multiple consecutive white space characters into one single space. To produce several spaces in a row, you can enter the character combination &nbsp; (a non-breaking space) into the source code as many times in a row as desired.
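
The rules above can be tried out with a small fragment like the following. It is only a sketch with made-up sample text; the elements and entities are what matter.

```html
<!-- Hypothetical sample text for illustration only -->
<p>First paragraph. This line break
in the source is shown as a single space.</p>

<p>Second paragraph with a forced<br>
line break.</p>

<!-- &nbsp; produces spaces that are not collapsed;
     &lt; &gt; and &amp; escape characters with a special meaning -->
<p>Three&nbsp;&nbsp;&nbsp;spaces, and escaped characters: &lt;p&gt; and &amp;.</p>

<pre>
Pre-formatted   text keeps
    its spacing and line breaks.
</pre>
```

In a typical browser, only the br element, the paragraph boundaries and the pre element should produce visible line breaks.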


May
5th

The Fundamental Structure of an HTML File

Author: Editor | Filed under HTML Basics, HTML Tutorials

A typical HTML file consists of the following parts:

  • The document type declaration (information on which form of HTML is being used)
  • Header (Containing the title, for example)
  • Body (displayed content, text with headings and references)

The formula:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Site description</title>
</head>
<body>
</body>
</html>

Explanation:

The first line looks especially confusing for beginners. This somewhat complicated piece of information is a document type declaration. We will describe it in more detail later on.

The entire remaining content of the HTML file is contained within the <html> and </html> tags. The html element is therefore also described as the root element of an HTML file. The opening tag for the header, <head>, follows the opening <html> tag. The header information is placed between this tag and the closing </head> tag. The most important piece of header information is the title, which is marked with <title> and </title> tags. The body, delimited by the <body> and </body> tags, follows below. The file’s actual content, meaning what the web browser displays, is written in the body.
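
As a sketch of how the framework looks once it is filled in, here is a minimal page with a title, a heading and one paragraph. The texts are made up for illustration.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<!-- Header information: typically shown in the browser's title bar, not in the page -->
<title>My first page</title>
</head>
<body>
<!-- Body: the content the browser actually displays -->
<h1>My first page</h1>
<p>This text is displayed in the browser window.</p>
</body>
</html>
```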

Take Note

If you want to use frames, the basic structure of files in which a frameset is defined looks different. We will discuss this separately, and advise you to work with framesets only after becoming more familiar with the basics of HTML.

The fundamental framework of an XHTML file

If you wish to correctly write XHTML, then the fundamental framework looks quite similar. Only the beginning is slightly different.

The formula:

<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Site description</title>
</head>
<body>
</body>
</html>

Explanation:

In XHTML files, the relationship to XML should be declared even before the document type. The first line, with the question marks inside the angle brackets, serves this purpose. Write this line exactly as in the example. It is a so-called XML declaration.

The document type declaration must name a valid document type for XHTML files.

In the opening <html> tag, the XML namespace must be given with an attribute named xmlns. Use the value shown in the example above.

These statements declare the file as XHTML. The source text that follows is technically just normal HTML, although you still have to pay attention to the differences between standard HTML and XHTML. However, you only need to concern yourself with these differences after becoming familiar with standard HTML.

Document Type Declaration

HTML is only one of many markup languages, although it is the most prominent. HTML itself has a long history and has been developed through many different versions. The document type declaration states which markup language and which version you are using. Reading software, such as a web browser, can then orient itself based on this information.

The rules for HTML were formulated with the help of SGML, while the rules for XHTML were formulated with the help of XML. According to the rules of an SGML- or XML-based markup language, an HTML or XHTML file is only valid if it declares a specific document type and if its source code adheres completely to that document type’s rules. Every document type declaration refers to a document type definition (DTD). This defines which elements a document of that type may contain, which elements may be nested inside which other elements, which attributes belong to each element, whether a given attribute is required or optional, and so on.

As an HTML novice, you might fail to see the point of all the attention given to declaring the document type. But it is exactly these document types that precisely define the rules of the various languages, and they have proved to be a major advance. The concept of rule-based files that are independent of any particular software only becomes viable through document types. Without official rules to fall back on, languages like HTML would quickly fragment into various dialects. The same is true of natural languages: without certain grammatical rules, the same written language would eventually diverge in various directions and become unintelligible from one group to the next. Because software is much less intelligent than humans and requires much more exact information in order to understand what it is receiving, adhering to rules and standards is all the more important.

An example of document type declaration:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

Explanation:

Place the document type declaration at the very beginning of the HTML file, before the <html> tag, and write it in capital letters as shown above. An exclamation mark follows the opening angle bracket, followed by DOCTYPE HTML PUBLIC. This means that the file refers to the publicly available HTML DTD. The information in quotation marks can be read as follows:

W3C is the publisher of the DTD. DTD HTML 4.01 Transitional means that you are using the “HTML” document type, version 4.01, in its transitional variant. EN indicates the language, in this case English. It refers to the language in which the elements and attributes are defined, not to the language of the file’s contents.

The document type declaration can then include the web address of the document type definition. This information is not mandatory. Simply entering:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
is also acceptable. Reading software can use the rules in the document type definition to check the HTML file. However, most of today’s browsers already know the main document types, so this isn’t necessary. Because browsers also have to cope with the worst markup imaginable, they are able to display even mistake-ridden HTML pages halfway decently. With XML documents, by contrast, a parser commonly stops processing as soon as a rule is broken and shows only an error message. This is already the case with XHTML pages that are served with the application/xhtml+xml MIME type.

The strict variant for HTML:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

Use this entry if you want to do without certain elements and attributes that stem from earlier HTML standards and have since become replaceable. The nesting rules for HTML elements in the strict variant are naturally stricter and more cleanly structured. For example, in this variant you cannot simply write text directly between the <body> and </body> tags. All content must be placed inside so-called block elements, such as headings, paragraphs, lists or tables.
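
A short sketch of what this means in practice, with made-up sample text: under the strict document type, the body contains only block elements, and all text sits inside them.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Strict example</title>
</head>
<body>
<!-- Writing the sentence below directly inside body, without the p element,
     would not be valid in the strict variant -->
<h1>Welcome</h1>
<p>Welcome to my site.</p>
</body>
</html>
```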

The transitional variant for HTML:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

Use this entry if you need some of the elements or attributes that are not allowed in the strict variant. The rules for nesting elements are somewhat more lenient in the transitional variant. You are allowed to place bare text outside of any other element between the <body> and </body> tags. Moreover, this variant is necessary if you want to write links with target attributes, for example to address the frames of a frameset correctly.
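
The following sketch shows three things that the transitional variant permits: bare text in the body, the align attribute, and a link with a target attribute. The file name content.html and the frame name main are hypothetical.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Transitional example</title>
</head>
<body>
Bare text directly inside body is permitted here.
<!-- align is one of the attributes not available in the strict variant -->
<h1 align="center">A centred heading</h1>
<!-- target names the frame or window in which the link should open -->
<p><a href="content.html" target="main">Open in the frame named main</a></p>
</body>
</html>
```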

The frameset variant for HTML:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
"http://www.w3.org/TR/html4/frameset.dtd">

This entry is envisioned for special HTML files, in which framesets are defined.
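
For orientation, a frameset file might look roughly like this. It is only a sketch: the file names navigation.html and content.html and the frame names are assumptions made for the example.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<html>
<head>
<title>Frameset example</title>
</head>
<!-- The frameset element takes the place of the body element -->
<frameset cols="25%,75%">
  <frame src="navigation.html" name="navigation">
  <frame src="content.html" name="main">
  <!-- Fallback for browsers that cannot display frames -->
  <noframes>
    <body>
    <p>This site uses frames. <a href="content.html">Go to the content</a>.</p>
    </body>
  </noframes>
</frameset>
</html>
```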

Older document type declarations:

In certain cases it can be reasonable to refer to older HTML versions. However, this should only be done if the technical circumstances demand it. The following older entries are available:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">

Use this document type if you wish to refer to HTML 2.0.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

Use this document type to refer to HTML 3.2.

Some tips for using different document types

If the whole tangle of HTML and XHTML, HTML language variants, XML and document type declarations has left you somewhat bewildered, don’t worry. The clutter is the result of the many stages of development the language has gone through.

For your first foray into HTML, simply use the first basic framework exactly as shown above. Then learn how to work with additional elements and attributes, as well as style sheets. Once you are a little more familiar with the language, it will become much clearer which document type to choose.


May
5th

Getting to know the Basics of HTML: Tags, Elements and Attributes

Author: Editor | Filed under HTML Basics, HTML Tutorials

Anyone with extensive HTML experience can easily skip this tutorial. However, novices, and even those who wish to brush up on their basic HTML knowledge, could stand to gain from this section. We will go over and define the various pieces that make up HTML code, including elements, tags, attributes and their uses.

Elements and Tags in HTML

HTML files consist only of text. Certain characters, all taken from the regular character set, are used to mark up parts of the text.

The content of HTML files is placed inside HTML elements.

Here is an Example:

<h1>HTML – the language of the web</h1>

This is how it actually looks:

HTML – the language of the web

Explanation:

The example shows a first-order heading. The opening <h1> tag signals that a first-order heading follows. The closing </h1> tag signals the end of the heading. A closing tag begins with a left angle bracket and a forward slash: “</”.

Pay Attention to the Following:

In standard HTML, it doesn’t matter whether tags are written in lowercase or uppercase, so <h1> and <H1> mean the same thing. In the newer variant, XHTML, however, element names must be written in lowercase. It is therefore advisable to always write element names in lowercase, regardless of whether you are writing HTML or XHTML.

Another Example:

One line and a manual line break<br>
The next line

Here is what it looks like:

One line and a manual line break

The next line

Explanation:

The <br> at the end of the first line signifies that a manual line break will be entered at that point. (br=break).

Take Note:

If you want to write XHTML correctly, elements with standalone tags must be written differently: instead of <br>, write <br />, with a closing forward slash after the element name. Alternatively, you can write the element as <br></br>, with an opening and a closing tag but no content. We will go over this in more depth in other chapters.
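
As a brief sketch, this is how a few common standalone elements look in XHTML notation. The image file name logo.png is made up for the example.

```html
<p>One line and a manual line break<br />
The next line</p>
<!-- Other standalone elements are closed the same way -->
<hr />
<img src="logo.png" alt="Logo" />
```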

Interlocking Elements

Elements can be nested (interlocked) within one another. This results in a hierarchical structure. Complex HTML files contain many levels of nesting. This is why experienced HTML authors speak of structured markup.

An Example:

<h1><i>HTML</i> – the language of the internet</h1>

Explanation:

The i element stands for italics. The text between <i> and </i> is displayed in italics, in the font and size of the first-order heading.
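
The rule to keep in mind when nesting is that an inner element must be closed before the element that contains it. The following sketch, with made-up sample text, contrasts a correct and an incorrect version.

```html
<!-- Correct: i is opened inside b and closed before b is closed -->
<p>This is <b>bold and <i>bold italic</i></b> text.</p>

<!-- Incorrect: b and i overlap instead of nesting -->
<p>This is <b>bold and <i>bold italic</b></i> text.</p>

<!-- A deeper hierarchy: list, list item, then emphasised text -->
<ul>
  <li>First item with <em>emphasis</em></li>
  <li>Second item</li>
</ul>
```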

Attributes in Tags

Opening and standalone tags can carry additional information in the form of attributes.

An Example:

<h1 align="center">HTML – the language of the web</h1>

Explanation:

The align="center" attribute centres the text.

There are the following types of attributes present in HTML elements:

  • Attributes with a fixed set of values, where only certain values are allowed, as with <h1 align="center">. Here only the values center, right, left and justify are allowed.
  • Attributes with a free value, where a certain type of value or convention is expected, such as <style type="text/css"> (this defines a section for style sheets; MIME types are always written in the form type/subtype), or <table border="1"> (this defines the width of the table border in pixels, so a numerical value is expected).
  • Attributes with a free value without any further conventions, like <p title="Introduction to HTML">. Any text can be entered here.
  • Standalone attributes, such as <hr noshade> (a separator line without shading). Standalone attributes only exist in standard HTML, however. If you wish to write XHTML correctly, you must write the attribute as <hr noshade="noshade" />. We will of course discuss more of the differences between XHTML and HTML later on.

Even though standard HTML allows certain attribute values to be written without quotation marks, you should not use this option. You reduce the risk of mistakes if you always place attribute values in quotation marks as a matter of principle. Either ' or " can be used; what matters is not to mix the two kinds around the same value.

What holds true for element names also holds true for attribute names: in HTML it makes no difference whether you write them in uppercase or lowercase, but in XHTML all attribute names must be written in lowercase. For attribute values, the difference between lowercase and uppercase can sometimes matter, depending on the type of value.
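
The attribute types and quoting rules described above can be summarised in a fragment like this one. It is a sketch with made-up values, not a complete page.

```html
<!-- Only certain values are allowed: left, center, right or justify -->
<h1 align="right">Heading</h1>

<!-- Free values that follow a convention: a MIME type and a number
     (the style element belongs in the head of a complete page) -->
<style type="text/css">h1 { color: navy; }</style>
<table border="2"><tr><td>Cell</td></tr></table>

<!-- Free value without further conventions -->
<p title="Any descriptive text">Paragraph</p>

<!-- Standalone attribute in HTML notation -->
<hr noshade>

<!-- Single quotes are fine too, as long as both ends match -->
<p title='Single quotes work as well'>Paragraph</p>
```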

Besides attributes that are only allowed in certain HTML elements, there are also so-called universal attributes that are allowed in nearly every HTML element.

An Example:

<p id="Introduction">Text</p>

Explanation:

The example defines a paragraph of text with <p> and </p> tags. A universal attribute is written in the opening tag, namely the id attribute. This allows you to give an individual HTML element a name that is unique throughout the document. We will go over universal attributes in more detail in a separate section.
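
One common use of such a name, sketched here with made-up text, is as a jump target for a link within the same page.

```html
<!-- The part after # must match the id value exactly, including case -->
<p><a href="#Introduction">Jump to the introduction</a></p>

<p id="Introduction">This paragraph is the link target.</p>
```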

HTML Parser

An HTML parser is software that recognises HTML and transforms it into a structured representation. Every web browser has an HTML parser in order to understand HTML in the first place. Unfortunately, such parsers are confronted with HTML coding mistakes on most websites. Usually the mistakes are small and not too serious, but the source code of some websites breaks the rules of HTML so badly that “insufficient” is not a strong enough word. Strict parsers, which check exactly how closely a website adheres to the rules, would refuse to display such websites and show an error message instead. However, such a parser would hardly stand a chance on the open market, because it would display barely any of the popular websites. As a result, today’s parsers are all fairly tolerant and simply digest almost everything. Internet Explorer has taken this tolerant approach the furthest. This allows its makers to claim that it “commands” HTML best, while many experts feel that this practice only encourages sloppy, mistake-ridden HTML coding.
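
To give an idea of the kind of mistakes meant here, the following sketch shows two typical errors that tolerant browsers usually display anyway, followed by a corrected version. The sample text is made up.

```html
<!-- Faulty: the closing </h1> tag is missing, and the title value
     contains spaces but is not enclosed in quotation marks -->
<h1>Welcome
<p title=Introduction to HTML>Some text</p>

<!-- Corrected -->
<h1>Welcome</h1>
<p title="Introduction to HTML">Some text</p>
```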

As complexity increases, with HTML being combined with other languages such as embedded CSS, JavaScript and PHP, it has become more and more important to adhere to the rules of HTML. We will describe these important rules in more detail in a separate section.

If you want to see for yourself whether your HTML pages conform to all the rules, a so-called validator is a very useful tool. A validator reads an entire page, determines which document type is declared for it, and then strictly checks how well the page adheres to the corresponding rules.

The oldest validator on offer comes from the W3C: validator.w3.org. There you can either upload a file from your computer or give the validator a web address to check.