From ec0c920a78813d1c047924e024017189dedeec93 Mon Sep 17 00:00:00 2001 From: Denis Vlasenko Date: Sun, 26 Nov 2006 17:07:38 +0000 Subject: [PATCH] added small doc about tar 'pax header' format --- docs/tar_pax.txt | 239 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 239 insertions(+) create mode 100644 docs/tar_pax.txt diff --git a/docs/tar_pax.txt b/docs/tar_pax.txt new file mode 100644 index 000000000..8a3f1e755 --- /dev/null +++ b/docs/tar_pax.txt @@ -0,0 +1,239 @@ +'pax headers' is POSIX 2003 (iirc) addition designed to fix +tar format limitations - older tar format has fixed fields +for everything (filename, uid, filesize etc) which can overflow. + +pax Header Block + +The pax header block shall be identical to the ustar header block +described in ustar Interchange Format, except that two additional +typeflag values are defined: + +x + Represents extended header records for the following file in +the archive (which shall have its own ustar header block). + +g + Represents global extended header records for the following +files in the archive. Each value shall affect all subsequent files +that do not override that value in their own extended header +record and until another global extended header record is reached +that provides another value for the same field. The typeflag g +global headers should not be used with interchange media that +could suffer partial data loss in transporting the archive. + +For both of these types, the size field shall be the size of the +extended header records in octets. The other fields in the header +block are not meaningful to this version of the pax utility. +However, if this archive is read by a pax utility conforming to +the ISO POSIX-2:1993 standard, the header block fields are used to +create a regular file that contains the extended header records as +data. Therefore, header block field values should be selected to +provide reasonable file access to this regular file. + +A further difference from the ustar header block is that data +blocks for files of typeflag 1 (the digit one) (hard link) may be +included, which means that the size field may be greater than +zero. + +pax Extended Header + +An extended header shall consist of one or more records, each +constructed as follows: + +"%d %s=%s\n", , , + +The field shall be the decimal length of the extended +header record in octets, including length string itself and the +trailing . + +[skip] + +atime + The file access time for the following file(s), equivalent to +the value of the st_atime member of the stat structure for a file, +as described by the stat() function. The access time shall be +restored if the process has the appropriate privilege required to +do so. The format of the shall be as described in pax +Extended Header File Times. + +charset + The name of the character set used to encode the data in the +following file(s). + + The encoding is included in an extended header for information +only; when pax is used as described in IEEE Std 1003.1-2001, it +shall not translate the file data into any other encoding. The +BINARY entry indicates unencoded binary data. + + When used in write or copy mode, it is implementation-defined +whether pax includes a charset extended header record for a file. + +comment + A series of characters used as a comment. All characters in +the field shall be ignored by pax. + +gid + The group ID of the group that owns the file, expressed as a +decimal number using digits from the ISO/IEC 646:1991 standard. +This record shall override the gid field in the following header +block(s). When used in write or copy mode, pax shall include a gid +extended header record for each file whose group ID is greater +than 2097151 (octal 7777777). + +gname + The group of the file(s), formatted as a group name in the +group database. This record shall override the gid and gname +fields in the following header block(s), and any gid extended +header record. When used in read, copy, or list mode, pax shall +translate the name from the UTF-8 encoding in the header record to +the character set appropriate for the group database on the +receiving system. If any of the UTF-8 characters cannot be +translated, and if the -o invalid= UTF-8 option is not specified, +the results are implementation-defined. When used in write or copy +mode, pax shall include a gname extended header record for each +file whose group name cannot be represented entirely with the +letters and digits of the portable character set. + +linkpath + The pathname of a link being created to another file, of any +type, previously archived. This record shall override the linkname +field in the following ustar header block(s). The following ustar +header block shall determine the type of link created. If typeflag +of the following header block is 1, it shall be a hard link. If +typeflag is 2, it shall be a symbolic link and the linkpath value +shall be the contents of the symbolic link. The pax utility shall +translate the name of the link (contents of the symbolic link) +from the UTF-8 encoding to the character set appropriate for the +local file system. When used in write or copy mode, pax shall +include a linkpath extended header record for each link whose +pathname cannot be represented entirely with the members of the +portable character set other than NUL. + +mtime + The file modification time of the following file(s), +equivalent to the value of the st_mtime member of the stat +structure for a file, as described in the stat() function. This +record shall override the mtime field in the following header +block(s). The modification time shall be restored if the process +has the appropriate privilege required to do so. The format of the + shall be as described in pax Extended Header File Times. + +path + The pathname of the following file(s). This record shall +override the name and prefix fields in the following header +block(s). The pax utility shall translate the pathname of the file +from the UTF-8 encoding to the character set appropriate for the +local file system. + + When used in write or copy mode, pax shall include a path +extended header record for each file whose pathname cannot be +represented entirely with the members of the portable character +set other than NUL. + +realtime.any + The keywords prefixed by "realtime." are reserved for future +standardization. + +security.any + The keywords prefixed by "security." are reserved for future +standardization. + +size + The size of the file in octets, expressed as a decimal number +using digits from the ISO/IEC 646:1991 standard. This record shall +override the size field in the following header block(s). When +used in write or copy mode, pax shall include a size extended +header record for each file with a size value greater than +8589934591 (octal 77777777777). + +uid + The user ID of the file owner, expressed as a decimal number +using digits from the ISO/IEC 646:1991 standard. This record shall +override the uid field in the following header block(s). When used +in write or copy mode, pax shall include a uid extended header +record for each file whose owner ID is greater than 2097151 (octal +7777777). + +uname + The owner of the following file(s), formatted as a user name +in the user database. This record shall override the uid and uname +fields in the following header block(s), and any uid extended +header record. When used in read, copy, or list mode, pax shall +translate the name from the UTF-8 encoding in the header record to +the character set appropriate for the user database on the +receiving system. If any of the UTF-8 characters cannot be +translated, and if the -o invalid= UTF-8 option is not specified, +the results are implementation-defined. When used in write or copy +mode, pax shall include a uname extended header record for each +file whose user name cannot be represented entirely with the +letters and digits of the portable character set. + +If the field is zero length, it shall delete any header +block field, previously entered extended header value, or global +extended header value of the same name. + +If a keyword in an extended header record (or in a -o +option-argument) overrides or deletes a corresponding field in the +ustar header block, pax shall ignore the contents of that header +block field. + +Unlike the ustar header block fields, NULs shall not delimit +s; all characters within the field shall be +considered data for the field. None of the length limitations of +the ustar header block fields in ustar Header Block shall apply to +the extended header records. + +pax Extended Header File Times + +Time records shall be formatted as a decimal representation of the +time in seconds since the Epoch. If a period ( '.' ) decimal point +character is present, the digits to the right of the point shall +represent the units of a subsecond timing granularity. In read or +copy mode, the pax utility shall truncate the time of a file to +the greatest value that is not greater than the input header +file time. In write or copy mode, the pax utility shall output a +time exactly if it can be represented exactly as a decimal number, +and otherwise shall generate only enough digits so that the same +time shall be recovered if the file is extracted on a system whose +underlying implementation supports the same time granularity. + +Example from Linux kernel archive tarball: + +00000000 70 61 78 5f 67 6c 6f 62 61 6c 5f 68 65 61 64 65 |pax_global_heade| +00000010 72 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |r...............| +00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +00000060 00 00 00 00 30 30 30 30 36 36 36 00 30 30 30 30 |....0000666.0000| +00000070 30 30 30 00 30 30 30 30 30 30 30 00 30 30 30 30 |000.0000000.0000| +00000080 30 30 30 30 30 36 34 00 30 30 30 30 30 30 30 30 |0000064.00000000| +00000090 30 30 30 00 30 30 31 34 30 35 33 00 67 00 00 00 |000.0014053.g...| +000000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +000000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +00000100 00 75 73 74 61 72 00 30 30 67 69 74 00 00 00 00 |.ustar.00git....| +00000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +00000120 00 00 00 00 00 00 00 00 00 67 69 74 00 00 00 00 |.........git....| +00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +00000140 00 00 00 00 00 00 00 00 00 30 30 30 30 30 30 30 |.........0000000| +00000150 00 30 30 30 30 30 30 30 00 00 00 00 00 00 00 00 |.0000000........| +00000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +00000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +00000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +00000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +000001a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +000001b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +000001c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +000001e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +00000200 35 32 20 63 6f 6d 6d 65 6e 74 3d 62 31 30 35 30 |52 comment=b1050| +00000210 32 62 32 32 61 31 32 30 39 64 36 62 34 37 36 33 |2b22a1209d6b4763| +00000220 39 64 38 38 62 38 31 32 62 32 31 66 62 35 39 34 |9d88b812b21fb594| +00000230 39 65 34 0a 00 00 00 00 00 00 00 00 00 00 00 00 |9e4.............| +00000240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +... -- 2.25.1