# $XConsortium: sgmls.txt /main/2 1996/11/11 11:24:56 drk $

sgmls - a validating SGML parser

An SGML System Conforming to
International Standard ISO 8879 --
Standard Generalized Markup Language

sgmls [ -deglprsuv ] [ -cfile ] [ -iname ] [ filename... ]

Sgmls parses and validates the SGML document entity in
filename... and prints on the standard output a simple ASCII
representation of its Element Structure Information Set. (This
is the information set which a structure-controlled conforming
SGML application should act upon.) Note that the document entity
may be spread amongst several files; for example, the SGML
declaration, document type declaration and document instance set
could each be in a separate file. If no filenames are specified,
then sgmls will read the document entity from the standard
input. A filename of - can also be used to refer to the standard
input.

The following options are available:

-cfile Write a report of capacity usage to file. The report is
       in the format of a RACT result. RACT is the Reference
       Application for Capacity Testing defined in the Proposed
       American National Standard Conformance Testing for
       Standard Generalized Markup Language (SGML) Systems
       (X3.190-199X), Draft July

-d     Warn about duplicate entity declarations.

-e     Describe open entities in error messages. Error messages
       always include the position of the most recently opened
       external entity.

-g     Show the GIs of open elements in error messages.

-iname Pretend that

              <!ENTITY % name "INCLUDE">

       occurs at the start of the document type declaration
       subset in the SGML document entity. Since repeated
       definitions of an entity are ignored, this definition
       will take precedence over any other definitions of this
       entity in the document type declaration. Multiple -i
       options are allowed. If the SGML declaration replaces the
       reserved name INCLUDE then the new reserved name will be
       the replacement text of the entity. Typically the
       document type declaration will contain

              <!ENTITY % name "IGNORE">

       and will use %name; in the status keyword specification
       of a marked section declaration. In this case the effect
       of the option will be to cause the marked section not to
       be ignored.

-l     Output L commands giving the current line number and
       filename.

-p     Parse only the prolog. Sgmls will exit after parsing the
       document type declaration. Implies -s.

-r     Warn about defaulted references.

-s     Suppress output. Error messages will still be printed.

-u     Warn about undefined elements: elements used in the DTD
       but not defined. Also warn about undefined

-v     Print the version number.

An external entity resides in one or more files. The entity
manager component of sgmls maps a sequence of files into an
entity in three sequential stages:

1.     each carriage return character is turned into a

2.     each newline character is turned into a record end
       character, and at the same time a record start character
       is inserted at the beginning of each

3.     the files are concatenated.
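
The three stages can be sketched in Python as follows. This is only an illustration under stated assumptions, not the sgmls implementation. It works on in-memory strings rather than files, and the names RS, RE, file_to_records and build_entity are made up for the example. The character that replaces a carriage return is supplied by the caller as cr_replacement. Record start is taken to be character 10 and record end is assumed to be character 13, as in the reference concrete syntax. The sketch also assumes that a record start is inserted at the beginning of every line, including a final line that has no newline.

```python
RS = "\n"  # record start: character 10 (octal 012)
RE = "\r"  # record end: character 13 (assumed, reference concrete syntax)


def file_to_records(text, cr_replacement):
    """Apply stages 1 and 2 to the contents of one file."""
    # Stage 1: replace every carriage return. This has to happen before
    # stage 2, because RE is itself character 13.
    text = text.replace("\r", cr_replacement)

    # Stage 2: every newline becomes RE, and RS is put at the start of
    # each line.
    pieces = text.split("\n")
    last = pieces.pop()  # text after the final newline (may be empty)
    out = [RS + piece + RE for piece in pieces]
    if last:
        out.append(RS + last)
    return "".join(out)


def build_entity(file_texts, cr_replacement):
    """Stage 3: concatenate the converted files into one entity."""
    return "".join(file_to_records(t, cr_replacement) for t in file_texts)
```

The order of the stages matters: if the newline conversion ran first, the record end characters it produces would be caught by the carriage return replacement.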

A system identifier is interpreted as a list of filenames
separated by colons. A filename of - can be used to refer to the
standard input. If no system identifier is supplied, then the
entity manager will attempt to generate a filename using the
public identifier (if there is one) and other information
available to it. Notation identifiers are not subject to this
treatment. This process is controlled by the environment
variable SGML_PATH; this contains a colon-separated list of
filename templates. A filename template is a filename that may
contain substitution fields; a substitution field is a %
character followed by a single letter that indicates the value
of the substitution. If SGML_PATH uses the %S field (the value
of which is the system identifier), then the entity manager will
also use SGML_PATH to generate a filename when a system
identifier that does not contain any colons is supplied.

The value of a substitution can either be a string or it can be
null. The entity manager transforms the list of filename
templates into a list of filenames by substituting for each
substitution field and discarding any template that contained a
substitution field whose value was null. It then uses the first
resulting filename that exists and is readable. Substitution
values are transformed before being used for substitution:
firstly, any names that were subject to upper case substitution
are folded to lower case; secondly, space characters are mapped
to underscores and slashes are mapped to percents. The value of
the %S field is not transformed. The values of substitution
fields are as follows:

%D     The entity's data content notation. This substitution
       will succeed only for external data entities.

%N     The entity, notation or document type name.

%P     The public identifier if there was a public identifier,
       otherwise null.

%S     The system identifier if there was a system identifier,
       otherwise null.

%X     (This is provided mainly for compatibility with ARCSGML.)
       A three-letter string chosen as follows:

                                  |            | With public identifier
                                  |            +-------------+-----------
                                  | No public  | Device      | Device
                                  | identifier | independent | dependent
       ---------------------------+------------+-------------+-----------
       Data or subdocument entity | nsd        | pns         | vns
       General SGML text entity   | gml        | pge         | vge
       Parameter entity           | spe        | ppe         | vpe
       Document type definition   | dtd        | pdt         | vdt
       Link process definition    | lpd        | plp         | vlp

       The device dependent version is selected if the public
       text class allows a public text display version but no
       public text display version was specified.

%Y     The type of thing for which the filename is being
       generated:

       SGML subdocument entity    sgml
       General text entity        text
       Parameter entity           parm
       Document type definition   dtd
       Link process definition    lpd
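
The template expansion described above can be sketched in Python. This is a minimal illustration, not the entity manager's actual code. The function names, the values dictionary (field letter to string, or None for null) and the folded argument (the letters whose values are names that were subject to upper case substitution) are all assumptions made for the example. A field letter that is missing from values is treated as null, and a % at the very end of a template is copied literally; the text does not say how sgmls handles either case.

```python
import os


def transform(letter, value, folded):
    """Transform a substitution value before it is inserted."""
    if letter == "S":
        return value  # the %S value is used untransformed
    if letter in folded:
        value = value.lower()  # names subject to upper case substitution
    return value.replace(" ", "_").replace("/", "%")


def expand_templates(sgml_path, values, folded=()):
    """Turn a colon-separated template list into candidate filenames."""
    filenames = []
    for template in sgml_path.split(":"):
        parts = []
        i = 0
        usable = True
        while i < len(template):
            if template[i] == "%" and i + 1 < len(template):
                letter = template[i + 1]
                value = values.get(letter)
                if value is None:
                    usable = False  # null field: discard this template
                    break
                parts.append(transform(letter, value, folded))
                i += 2
            else:
                parts.append(template[i])
                i += 1
        if usable:
            filenames.append("".join(parts))
    return filenames


def first_readable(filenames):
    """Return the first filename that exists and is readable, else None."""
    for name in filenames:
        if os.path.isfile(name) and os.access(name, os.R_OK):
            return name
    return None
```

For example, with the hypothetical template list /tmp/sgml/%P:%N.%X:%S and the values N = DOC (folded), X = dtd, S = "my file.dtd" and a null P, expand_templates discards the first template and returns doc.dtd followed by "my file.dtd".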

The value of the following substitution fields will be null
unless a valid formal public identifier was supplied.

%A     Null if the text identifier in the formal public
       identifier contains an unavailable text indicator,
       otherwise the empty string.

%C     The public text class, mapped to lower case.

%E     The public text designating sequence (escape sequence) if
       the public text class is CHARSET, otherwise null.

%I     The empty string if the owner identifier in the formal
       public identifier is an ISO owner identifier, otherwise
       null.

%L     The public text language, mapped to lower case, unless
       the public text class is CHARSET, in which case null.

%O     The owner identifier (with the +// or -// prefix

%R     The empty string if the owner identifier in the formal
       public identifier is a registered owner identifier,
       otherwise null.

%T     The public text description.

%U     The empty string if the owner identifier in the formal
       public identifier is an unregistered owner identifier,
       otherwise null.

%V     The public text display version. This substitution will
       be null if the public text class does not allow a display
       version or if no version was specified. If an empty
       version was specified, a value of default will be used.
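
As an illustration of how these fields relate to the parts of a formal public identifier, the following Python sketch splits an identifier with a regular expression and fills in the field values, using None for null. It is a simplification and not the parser used by sgmls: it does not fully validate the identifier, it treats any owner identifier without a +// or -// prefix as an ISO owner identifier, it returns the owner identifier without its prefix, and the set of public text classes that do not allow a display version is an assumption taken from ISO 8879. The names FPI, NO_DISPLAY_VERSION and fpi_fields are made up for the example.

```python
import re

# Owner identifier, "//", public text class, optional unavailable text
# indicator, description, "//", language (or designating sequence), and
# an optional "//" followed by a display version.
FPI = re.compile(
    r"^(?P<prefix>[+-]//)?(?P<owner>.+?)//"
    r"(?P<cls>[A-Z]+) (?P<unavailable>-//)?(?P<description>.*?)//"
    r"(?P<language>.+?)(?://(?P<version>.*))?$"
)

# Classes assumed not to allow a display version.
NO_DISPLAY_VERSION = {"CAPACITY", "CHARSET", "NOTATION", "SYNTAX"}


def fpi_fields(public_id):
    """Return the %A to %V values for a public identifier (None = null)."""
    fields = dict.fromkeys("ACEILORTUV")
    match = FPI.match(public_id)
    if match is None:
        return fields  # not a formal public identifier: all fields null

    prefix = match.group("prefix")
    cls = match.group("cls")
    version = match.group("version")
    is_charset = cls == "CHARSET"

    fields["A"] = None if match.group("unavailable") else ""
    fields["C"] = cls.lower()
    fields["E"] = match.group("language") if is_charset else None
    fields["I"] = "" if prefix is None else None
    fields["L"] = None if is_charset else match.group("language").lower()
    fields["O"] = match.group("owner")  # prefix left out (assumption)
    fields["R"] = "" if prefix == "+//" else None
    fields["T"] = match.group("description")
    fields["U"] = "" if prefix == "-//" else None
    if cls in NO_DISPLAY_VERSION or version is None:
        fields["V"] = None
    else:
        fields["V"] = version or "default"  # empty version -> "default"
    return fields
```

Applied to "ISO 8879:1986//CAPACITY Reference//EN", this gives capacity for %C, Reference for %T, en for %L, the empty string for %I, and null for %R, %U, %E and %V.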

The system declaration for sgmls is as follows:

    SYSTEM "ISO 8879:1986"
    BASESET "ISO 646-1983//CHARSET
             International Reference Version (IRV)//ESC 2/5 4/0"
    CAPACITY PUBLIC "ISO 8879:1986//CAPACITY Reference//EN"
    MINIMIZE DATATAG NO OMITTAG YES RANK NO SHORTTAG YES
    LINK SIMPLE NO IMPLICIT NO EXPLICIT NO
    OTHER CONCUR NO SUBDOC YES 1 FORMAL YES
    SYNTAX PUBLIC "ISO 8879:1986//SYNTAX Reference//EN"
    SYNTAX PUBLIC "ISO 8879:1986//SYNTAX Core//EN"
    GENERAL YES MODEL YES EXCLUDE YES CAPACITY YES
    NONSGML YES SGML YES FORMAL YES

The memory usage of sgmls is not a function of the capacity
points used by a document; however, sgmls can handle capacities
significantly greater than the reference capacity set.

In some environments, higher values may be supported for the
SUBDOC parameter.

Documents that do not use optional features are also supported.
For example, if FORMAL NO is specified in the SGML declaration,
public identifiers will not be required to be valid formal
public identifiers.

Certain parts of the concrete syntax may be changed:

       The shunned character numbers can be changed.

       Eight bit characters can be assigned to LCNMSTRT,
       UCNMSTRT, LCNMCHAR and UCNMCHAR. Declaring this requires
       that the syntax reference character set be

           BASESET "ISO Registration Number 100//CHARSET
                    ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"

       Uppercase substitution can be performed or not performed
       both for entity names and for other names.

       Either short reference delimiters assigned by the
       reference delimiter set or no short reference delimiters
       are supported.

       The reserved names can be changed.

       The quantity set can be increased within certain limits
       subject to there being sufficient memory available. The
       upper limit on NAMELEN is 239. The upper limits on
       ATTCNT, ATTSPLEN, BSEQLEN, ENTLVL, LITLEN, PILEN, TAGLEN,
       and TAGLVL are more than thirty times greater than the
       reference limits. The upper limit on GRPCNT, GRPGTCNT,
       and GRPLVL is 253. NORMSEP cannot be changed. DTAGLEN and
       DTEMPLEN are irrelevant since sgmls does not support
       DATATAG.
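
The explicit limits in the last item can be turned into a small check. The Python sketch below is hypothetical and covers only the limits that are stated as numbers in the text (NAMELEN, GRPCNT, GRPGTCNT, GRPLVL) plus the rule that NORMSEP cannot be changed. The reference value of 2 for NORMSEP is an assumption taken from the reference quantity set, and the names UPPER_LIMITS, NORMSEP_REFERENCE and check_quantities are made up for the example.

```python
# Upper limits stated as numbers in the text.
UPPER_LIMITS = {"NAMELEN": 239, "GRPCNT": 253, "GRPGTCNT": 253, "GRPLVL": 253}
NORMSEP_REFERENCE = 2  # reference quantity set value (assumed)


def check_quantities(quantities):
    """Return a list of problems for a {name: value} quantity set."""
    problems = []
    for name, value in sorted(quantities.items()):
        limit = UPPER_LIMITS.get(name)
        if limit is not None and value > limit:
            problems.append(f"{name} {value} exceeds the limit of {limit}")
        elif name == "NORMSEP" and value != NORMSEP_REFERENCE:
            problems.append("NORMSEP cannot be changed")
    return problems
```

Quantities with no numeric limit in the text, such as LITLEN, are accepted by this check without comment.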

The SGML declaration may be omitted; the following declaration
will be implied:

    <!SGML "ISO 8879:1986"
    BASESET "ISO 646-1983//CHARSET
             International Reference Version (IRV)//ESC 2/5 4/0"
    CAPACITY PUBLIC "ISO 8879:1986//CAPACITY Reference//EN"
    SYNTAX PUBLIC "ISO 8879:1986//SYNTAX Reference//EN"
    MINIMIZE DATATAG NO OMITTAG YES RANK NO SHORTTAG YES
    LINK SIMPLE NO IMPLICIT NO EXPLICIT NO
    OTHER CONCUR NO SUBDOC YES 99999999 FORMAL YES

with the exception that characters 128 through 254 will be
assigned to DATACHAR. When exporting documents that use
characters in this range, an accurate description of the upper
half of the document character set should be added to this
declaration. For ISO Latin-1, an appropriate description would
be:

    BASESET "ISO Registration Number 100//CHARSET
             ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
    DESCSET 128 32 UNUSED

The output is a series of lines. Lines can be arbitrarily long.
Each line consists of an initial command character and one or
more arguments. Arguments are separated by a single space, but
when a command takes a fixed number of arguments the last
argument can contain spaces. There is no space between the
command character and the first argument. Arguments can contain
the following escape sequences:

\n     A record end character.

\|     Internal SDATA entities are bracketed by these.

\nnn   The character whose code is nnn octal.

A record start character will be represented by \012. Most
applications will need to ignore \012 and translate \n into
newline.
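
A consumer of this output might decode the escape sequences along the following lines. This Python sketch is an illustration, not part of sgmls. The names ESCAPE and decode_argument and the sdata_marker parameter are hypothetical. The sketch drops record start characters, turns \n into a newline, and replaces each \| with sdata_marker (empty by default). Treating a backslash followed by any other character as that character itself is an assumption.

```python
import re

# A backslash followed by three octal digits or by any single character.
ESCAPE = re.compile(r"\\([0-7]{3}|.)")


def decode_argument(arg, sdata_marker=""):
    """Decode the escape sequences in one argument of an output line."""

    def replace(match):
        body = match.group(1)
        if len(body) == 3:  # \nnn: character with octal code nnn
            code = int(body, 8)
            return "" if code == 0o12 else chr(code)  # drop record start
        if body == "n":  # record end, translated to a newline
            return "\n"
        if body == "|":  # bracket around an internal SDATA entity
            return sdata_marker
        return body  # any other escaped character stands for itself (assumed)

    return ESCAPE.sub(replace, arg)
```

An application that needs to treat SDATA entities specially would pass a marker it can recognize afterwards instead of the empty string.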

The possible command characters and arguments are as follows:

(gi    The start of an element whose generic identifier is gi.
       Any attributes for this element will have been specified
       with A commands.

)gi    The end of an element whose generic identifier is gi.

&name  A reference to an external data entity name; name will
       have been defined using an E command.

?pi    A processing instruction with data pi.

Aname val
       The next element to start has an attribute name with
       value val which takes one of the following forms:

              The value of the attribute is implied.

              The attribute is character data. This is used for
              attributes whose declared value is CDATA.

              The attribute is a notation name; nname will have
              been defined using an N command. This is used for
              attributes whose declared value is NOTATION.

              The attribute is a list of general entity names.
              Each entity name will have been defined using an
              I, E or S command. This is used for attributes
              whose declared value is ENTITY or ENTITIES.

              The attribute is a list of tokens. This is used
              for attributes whose declared value is

Dename name val
       This is the same as the A command, except that it
       specifies a data attribute for an external entity named
       ename. Any D commands will come after the E command that
       defines the entity to which they apply, but before any &
       or A commands that reference that entity.

Nnname Define a notation nname. This command will be preceded by
       a p command if the notation was declared with a public
       identifier, and by an s command if the notation was
       declared with a system identifier. A notation will only
       be defined if it is to be referenced in an E command or
       in an A command for an attribute with a declared value of
       NOTATION.

Eename typ not
       Define an external data entity named ename with type typ
       (CDATA, NDATA or SDATA) and notation not. This command
       will be preceded by one or more f commands giving the
       filenames generated by the entity manager from the system
       and public identifiers, by a p command if a public
       identifier was declared for the entity, and by an s
       command if a system identifier was declared for the
       entity. not will have been defined using an N command.
       Data attributes may be specified for the entity using D
       commands. An external data entity will only be defined if
       it is to be referenced in a & command or in an A command
       for an attribute whose declared value is ENTITY or
       ENTITIES.

Iename typ text
       Define an internal data entity named ename with type typ
       (CDATA or SDATA) and entity text text. An internal data
       entity will only be defined if it is referenced in an A
       command for an attribute whose declared value is ENTITY
       or ENTITIES.

Sename Define a subdocument entity named ename. This command
       will be preceded by one or more f commands giving the
       filenames generated by the entity manager from the system
       and public identifiers, by a p command if a public
       identifier was declared for the entity, and by an s
       command if a system identifier was declared for the
       entity. A subdocument entity will only be defined if it
       is referenced in a { command or in an A command for an
       attribute whose declared value is ENTITY or ENTITIES.

ssysid This command applies to the next E, S or N command and
       specifies the associated system identifier.

ppubid This command applies to the next E, S or N command and
       specifies the associated public identifier.

ffilename
       This command applies to the next E or S command and
       specifies an associated filename. There will be more than
       one f command for a single E or S command if the system
       identifier used a colon.

{ename The start of the SGML subdocument entity ename; ename
       will have been defined using an S command.

}ename The end of the SGML subdocument entity ename.

Llineno file
       Set the current line number and filename. The filename
       argument will be omitted if only the line number has
       changed. This will be output only if the -l option has
       been given.

#text  An APPINFO parameter of text was specified in the SGML
       declaration. This is not strictly part of the ESIS, but a
       structure-controlled application is permitted to act on
       it. No # command will be output if APPINFO NONE was
       specified. A # command will occur at most once, and may
       be preceded only by a single L command.

C      This command indicates that the document was a conforming
       SGML document. If this command is output, it will be the
       last command. An SGML document is not conforming if it
       references a subdocument entity that is not conforming.
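
To show how the commands fit together, here is a minimal Python sketch of a reader that builds an element tree from the output. It is an illustration only and handles just the (, ), A, ?, & and C commands; every other command is skipped. Attribute values are stored as the raw text that follows the attribute name, without interpreting it. The function name parse_esis and the dictionary layout are assumptions made for the example.

```python
def parse_esis(lines):
    """Build a simple tree from sgmls output lines.

    Returns (children_of_root, conforming).
    """
    root = {"gi": None, "attrs": {}, "children": []}
    stack = [root]
    pending = {}  # attributes seen since the last element start
    conforming = False

    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        command, rest = line[0], line[1:]
        if command == "A":
            # A commands apply to the next element to start.
            name, _, value = rest.partition(" ")
            pending[name] = value
        elif command == "(":
            node = {"gi": rest, "attrs": pending, "children": []}
            pending = {}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif command == ")":
            if len(stack) == 1 or stack[-1]["gi"] != rest:
                raise ValueError("unexpected end of element " + rest)
            stack.pop()
        elif command == "?":
            stack[-1]["children"].append(("pi", rest))
        elif command == "&":
            stack[-1]["children"].append(("entity", rest))
        elif command == "C":
            conforming = True  # only output for a conforming document
        # All other commands are skipped in this sketch.

    return root["children"], conforming
```

Because the last argument of a command may contain spaces, the sketch splits an A command only at the first space and keeps the remainder intact.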

Some non-SGML characters in literals are counted as two
characters for the purposes of quantity and capacity
calculations.

The SGML Handbook, Charles F. Goldfarb
ISO 8879 (Standard Generalized Markup Language), International
Organization for Standardization

ARCSGML was written by Charles F. Goldfarb.

Sgmls was derived from ARCSGML by James Clark (jjc@jclark.com),
to whom bugs should be reported.