HTML directions
The HTML language has been in use
in the field since 1990, and several
suggestions have been made for improvements.
See working notes. A new DTD will
be the result.
Bad HTML
Much of the HTML actually around
has been generated by the NeXTStep
editor, which has in fact generated
bad HTML. This should not confuse
the specification. Some bugs in
that output include non-matching
open and close tags, and a NEXTID
tag which is not SGML. Also, attribute
values are not quoted even when they
contain characters which require
them to be quoted in SGML.
A perl script was written by Dan
Connolly to clean up bad HTML.
Also, see Dan's HTML spec (draft)
which contains a sort of test suite.
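Purely as an illustration, and not a reconstruction of Dan's script, one of the repairs listed above (quoting bare attribute values) might be sketched as follows. The function name and the regular expressions are assumptions made for this sketch. It does not attempt to repair non-matching tags, and it would mangle a quoted value that itself contains text of the form name=value.

```python
import re

# Matches one tag, from "<" up to the next ">".
TAG = re.compile(r"<[^>]*>")
# Matches name=value inside a tag where the value does not start with a quote.
BARE_ATTR = re.compile(r"(\s[\w-]+)=([^\s\"'>]+)")


def quote_attributes(html):
    """Wrap unquoted attribute values in double quotes (hypothetical helper)."""
    def fix_tag(match):
        # Only the text inside a tag is rewritten; document text is untouched.
        return BARE_ATTR.sub(r'\1="\2"', match.group(0))
    return TAG.sub(fix_tag, html)
```

For example, quote_attributes('<A HREF=foo/bar.html NAME=1>') gives '<A HREF="foo/bar.html" NAME="1">', while values that are already quoted are left alone.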
New features
Please mail me mentioning this list
if you think of features I have missed
out.
Header
A wrapper element for all the document-wide
information such as title, document-wide
links, etc. Advantage: You know when
you have got to the end of it, and
can open a window with the required
attributes. This is easier than checking
for a printable character.
Disadvantage: If mandatory, the size
of the minimum document is increased.
A "Body" tag might be useful in the
same light, for the rest.
Link
A document-wide link, as distinct
from a localized anchor. Mainly useful
in conjunction with interesting link
types such as related-index, related-glossary,
parent, author, print-with, copy-with,
etc.
An empty element.
Attributes are as for the anchor element.
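To make the header and link proposals concrete, here is a minimal sketch of how a browser might collect the document-wide links and notice the end of the header. The element name HEAD, the attribute names REL and HREF, and the class name are all assumptions for this sketch; none of them is fixed by these notes. It uses Python's standard html.parser module, which reports tag and attribute names in lower case.

```python
from html.parser import HTMLParser


class HeaderScanner(HTMLParser):
    """Collect LINK elements until the (assumed) HEAD wrapper closes."""

    def __init__(self):
        super().__init__()
        self.links = []           # one dict of attributes per LINK element
        self.header_done = False  # set once the end of the header is seen

    def handle_starttag(self, tag, attrs):
        # LINK is an empty element, so only a start tag is expected.
        if tag == "link" and not self.header_done:
            self.links.append(dict(attrs))

    def handle_endtag(self, tag):
        if tag == "head":
            # All document-wide information is now known, so a window
            # could be opened with the required attributes.
            self.header_done = True
```

Feeding it '<HEAD><TITLE>Notes</TITLE><LINK REL="parent" HREF="Overview.html"></HEAD>' leaves one entry in links and header_done set, without having to look for the first printable character.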
Dates
A tag giving the dates a document
was created, modified and expired
is going to be essential for caching
systems.
The expiry date-time will allow long
cache times for documents such as
RFCs, and short or zero caching times
for varying data.
<DATE CREATED="920630123067" EXPIRES="920706000000">
(Is there an SGML standard for datetimes?
Which standard to use? HyTime?)
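As a sketch of how a cache might use the expiry value, assume the twelve digits in the DATE example are laid out as YYMMDDhhmmss. That layout is a guess from the example, not something these notes define, and the function names are hypothetical. A two-digit year is ambiguous; Python's %y reads 92 as 1992.

```python
from datetime import datetime

STAMP_FORMAT = "%y%m%d%H%M%S"  # assumed layout: YYMMDDhhmmss


def parse_stamp(stamp):
    """Turn a twelve-digit stamp into a datetime (raises ValueError if invalid)."""
    return datetime.strptime(stamp, STAMP_FORMAT)


def is_fresh(expires, now):
    """True while a cached copy may still be served without refetching."""
    return now < parse_stamp(expires)
```

With EXPIRES="920706000000", is_fresh returns True for any time before midnight on 6 July 1992 and False from then on. A document with no sensible expiry, such as varying data, would simply carry a stamp that is already in the past.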
Highlighting
The HPx elements are not implemented.
Some bold/italic/fixed width highlighting
is useful, with equivalent representations
on single font devices. Three possibilities
are
- Numbered HPn tags
  These are rather
  meaningless. In practice, everyone
  has to remember which is bold and
  which is italic.
- Logical tags
  Dan: "I'd prefer <em>,
  <tt>, <cite>, ala TeX. Or we could
  go with the O'Reilly/Hal DocBook
  tags: <Emphasis>, <OopsChar>, <wordasword>, <CiteBook>, <Subscript>,
  <Superscript>". A problem is there
  are never enough of them, so people
  reuse them on the understanding that
  they will be bold, etc.
- Physical tags
  <Bold>, <italic> etc.
  as in MIME. There would have to be
  an understanding that equivalent
  representations could be substituted
  where bold and italic are not available.
Base address
- savedas
  Could be a name for the tag
  to give the address with which the
  document was saved, so that relative
  links could be resolved even when
  a document is found out of context
  (like mailed).
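A short sketch of what the saved address buys: given the address a document was saved with, each relative link can be turned back into a full address wherever the copy ends up. The function name and the example.org addresses are hypothetical; urljoin is the standard-library routine for resolving a relative reference against a base.

```python
from urllib.parse import urljoin


def resolve_links(saved_as, hrefs):
    """Resolve each relative link against the address the document was saved with."""
    return [urljoin(saved_as, href) for href in hrefs]
```

For a document saved as http://example.org/docs/markup/directions.html, the links Tags.html and ../Overview.html resolve to http://example.org/docs/markup/Tags.html and http://example.org/docs/Overview.html, even if the copy being read arrived by mail.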
Fixed width text with anchors etc
The XMP and LISTING elements have
proved essential for putting on line
text already formatted assuming a
fixed-width character set. Many people
have asked for a version which, instead
of being oblivious to any embedded
elements, allowed elements and
anchors within the text. Line ends
would have to be marked as such
(with P) so that a marked-up line
could be represented on many lines:
the markup could make it too long
to send as it was, and very inconvenient.
Note that an editor could always
save in this element something which
was originally loaded as a raw text
section: indeed, the raw text is
really only a (very useful!) way
of importing text which could also
go through a filter to make it valid
marked up SGML.
Fixed width indented
Very often one wants to quote a command
in fixed width font, but indented
as a quotation, say 40 characters
wide rather than 80. Perhaps the
width required should be a parameter
to the fixed width with anchors element.
(Smacks of low-level format!)
Ordered list
Perhaps the OL tag ought to go back
in, to distinguish the ordered list
from the unordered one. Dan Connolly
implements it.
Link types
There is a list of link types. We
should formalize these, and then
people actually could implement them.
This corresponds to giving values
to the TYPE attribute. This attribute
could be called REL for RELATIONSHIP
to avoid confusion between the type
of link and the type of object to
which it points.
Entities
A full set of entities for special
characters should be defined, picked
out of a suitable standard table.
This should allow for accented characters
and bullets as a minimum. Representation
using regular USASCII stand-ins (such
as oe for o umlaut) should be allowed
where the full character sets are
not available. Editors must preserve
entities even when the display has
defaulted to a stand-in character
combination.
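The stand-in rule can be sketched as a lookup made only at display time, so that the stored document keeps its entities. The table below is a tiny assumed sample, not the proposed full set: the entity names follow the ISO 8879 sets, the stand-in "oe" comes from the text above, and the other stand-ins and the function name are guesses for illustration.

```python
import re

# entity name -> (real character, USASCII stand-in)
ENTITIES = {
    "ouml": ("\u00f6", "oe"),
    "uuml": ("\u00fc", "ue"),
    "eacute": ("\u00e9", "e"),
    "bull": ("\u2022", "*"),
}

ENTITY_REF = re.compile(r"&(\w+);")


def display(source, full_charset):
    """Return the text to show; the source string itself is never modified."""
    def replace(match):
        entry = ENTITIES.get(match.group(1))
        if entry is None:
            return match.group(0)  # unknown entity: leave the reference as is
        char, stand_in = entry
        return char if full_charset else stand_in
    return ENTITY_REF.sub(replace, source)
```

On a USASCII-only display, K&ouml;ln is shown as Koeln, but an editor that saves the source rather than the displayed text still writes out K&ouml;ln.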
Comments
The ability to hide information in
an SGML document is useful. The COMMENT
entity was introduced for this purpose
in the line mode browser as an experiment.
It should go in as standard in future.
If it can contain anything then it
can be used for commenting things
out.
Tim BL