Tolerating broken HTML writers
The constructs in this section are illegal according to SGML, but
they are so prevalent that the sample implementation supports them.
Please stop generating HTML in this style!
The BODY element must start with some element. See an example
document where this rule is broken.
Paragraph breaks are not allowed in headings, lists, etc.
Implementations may ignore them or treat them intelligently.
Multi-paragraph
heading
Unknown Tags
Some implementations, for example MidasWWW 1.0, treat tags that the
parser does not know as data. Such tags should be ignored instead.
There should be no tags around the word foo: foo.
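The recommended behaviour (drop the unknown tag, keep its content) can be sketched as follows. This is only an illustration, not the sample implementation: it uses Python's html.parser module, which is a lenient HTML tokenizer rather than an SGML parser, and the KNOWN_TAGS set and the strip_unknown_tags name are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately tiny set of tags this sketch "knows".
KNOWN_TAGS = {"html", "head", "title", "body", "h1", "p", "a"}


class IgnoreUnknownTags(HTMLParser):
    """Re-emit known tags and all text; silently drop unknown tags."""

    def __init__(self):
        # Keep entity and character references as written in the input.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN_TAGS:
            # Original source text of the start tag, attributes included.
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in KNOWN_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always kept, whether or not the enclosing tag is known.
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_unknown_tags(text):
    parser = IgnoreUnknownTags()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

Under these assumptions, an input such as `<p>around <foo>foo</foo>.</p>` keeps the word foo but loses the foo tags, which is the result the test line above asks for.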
Body Elements
Note that conforming SGML parsers will treat "&", "<", "</",
and "<!" as normal text characters when they are not followed by a
letter. HTML producers are discouraged from taking advantage of this
feature.
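A small sketch may make the rule concrete. It follows the simplified statement above (a delimiter counts as markup only when a letter follows it) and does not attempt the full SGML delimiter-in-context rules; the function names are hypothetical.

```python
import html
import re

# Simplified rule from the note above: "&", "<", "</" and "<!" are only
# recognized as markup when the next character is a letter.
# Longer delimiters are listed first so that "</" is not matched as "<".
DELIM_IN_CONTEXT = re.compile(r"(</|<!|<|&)(?=[A-Za-z])")


def recognized_delimiters(text):
    """Return (offset, delimiter) pairs that would be treated as markup."""
    return [(m.start(), m.group(1)) for m in DELIM_IN_CONTEXT.finditer(text)]


def escape_text(text):
    """Producer-side alternative: escape every "&", "<" and ">"."""
    return html.escape(text, quote=False)
```

For the text `a < b & c <em>x</em> AT&T`, recognized_delimiters reports the "<" of the start tag, the "</" of the end tag, and the "&" in AT&T. The first two stray characters pass as text, but the last one does not, which is one reason producers are better off escaping unconditionally, as escape_text does.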
This anchor's name starts with a digit, which is not a
name start character.
Unquoted attribute literals: NeXT and html-mode.el
This anchor's href contains a '#', which is not a name character. It
should lead to the NeXT implementation reference below anyway.
This anchor's href contains ':' and '/', which are not name
characters. It should lead to the SLAC MidasWWW doc anyway.
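The distinction between name start characters and name characters can be sketched as below. The character classes assume the SGML reference concrete syntax (letters start a name; letters, digits, "." and "-" continue it) and ignore name length limits; the function names and the sample values are hypothetical.

```python
import re

# Assumed reference concrete syntax: a name starts with a letter and
# continues with letters, digits, "." or "-".
NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]*\Z")
NAME_CHARS = re.compile(r"[A-Za-z0-9.-]+\Z")


def is_sgml_name(value):
    """True if value is a name: name start character, then name characters."""
    return NAME.match(value) is not None


def may_be_unquoted(value):
    """True if value consists only of name characters."""
    return NAME_CHARS.match(value) is not None


def format_attribute(name, value):
    """Write name=value, quoting the value unless it is all name characters."""
    if may_be_unquoted(value):
        return f"{name}={value}"
    escaped = value.replace("&", "&amp;").replace('"', "&quot;")
    return f'{name}="{escaped}"'
```

With these definitions, a value such as `1z` is made of name characters but is not a name, and `#z1` must be quoted. A producer that simply quotes every attribute value avoids the question altogether.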
Literal Text Elements
Historical Note
The original semantics of the XMP and LISTING elements are not
representable in SGML. From "Tags used in HTML":
- The text may contain any ISO Latin printable characters, including
the tag opener, so long as it does not contain the closing tag in
full.
But in section 7.6 of the SGML standard:
- The content of an element declared to be character data or
replaceable character data is terminated only by an etago
delimiter-in-context (which need not open a valid end-tag) ... .
The XMP and LISTING elements are deprecated in favor of the TYPEWRITER
element.
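The difference between the two termination rules quoted above can be sketched as follows. This is an illustration under simplifying assumptions, not code from any implementation listed here: the etago delimiter-in-context is approximated as "</" followed by a letter, and both function names are hypothetical.

```python
import re

# Approximation of the etago delimiter-in-context: "</" followed by a letter.
ETAGO_IN_CONTEXT = re.compile(r"</(?=[A-Za-z])")


def cdata_end_sgml(text, start=0):
    """SGML rule: content ends at the first "</" followed by a letter,
    whether or not it opens a valid end tag."""
    m = ETAGO_IN_CONTEXT.search(text, start)
    return m.start() if m else len(text)


def cdata_end_historical(text, tag, start=0):
    """Original XMP/LISTING rule: content ends only at the full closing tag."""
    closing = re.compile(r"</" + re.escape(tag) + r"\s*>", re.IGNORECASE)
    m = closing.search(text, start)
    return m.start() if m else len(text)
```

Given the content `a <b> and </foo> here</xmp>tail`, the historical rule keeps everything up to `</xmp>`, while the SGML rule stops at `</foo>`. The start tag `<b>` is harmless under both rules because only an end-tag opener terminates the content.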
Non-standard CDATA parsing: LineMode, MidasWWW, etc.
This example section ends here:
Just in case the foo close tag above wasn't recognized:
Known Implementations
The following systems are known to read and/or write HTML. They all
have bugs.
- Linemode Browser 1.3c
- MidasWWW 1.0
  MidasWWW parses HTML into its internal data structures, and then
  offers the option to extract the data and write it to a file. It
  doesn't always get it right.
- NeXT editor
  From timbl@info.cern.ch
- html-mode.el
  From marca@@@
- Viola
  From Pei Wei @ O'Reilly (@@email address). Any known problems? I
  hear it's going to use SGMLs.
- www_and_frame
  @@Go get The latest version -- it should be current with this spec.
- perl client
  Just heard about it. Haven't tried it. I don't think it supports
  entities.