@@ -10,6 +10,18 @@ Markup Shorthands: css off
 Translate IDs: dictdef-textdecoderoptions textdecoderoptions,dictdef-textdecodeoptions textdecodeoptions,index section-index
 </pre>
 
+<pre class=biblio>
+{
+  "ISO8859-1": {
+    "href": "https://www.iso.org/standard/28245.html",
+    "title": "Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1",
+    "publisher": "International Organization for Standardization (ISO)",
+    "status": "Published",
+    "date": "April 1998"
+  }
+}
+</pre>
+
 <link rel=stylesheet href=visualization-colors.css>
 
 
@@ -592,7 +604,10 @@ prescribes, as that is necessary to be compatible with deployed content.
 <tr><td>"<code>windows-1251</code>"
 <tr><td>"<code>x-cp1251</code>"
 <tr>
-<td rowspan=17><a>windows-1252</a>
+<td rowspan=17>
+<a>windows-1252</a>
+<p class=note>See <a href="#note-latin1-ascii">below</a> for the relationship to historical
+"Latin1" and "ASCII" concepts.
 <td>"<code>ansi_x3.4-1968</code>"
 <tr><td>"<code>ascii</code>"
 <tr><td>"<code>cp1252</code>"
@@ -756,6 +771,29 @@ part of the ISO 8859 series. In particular, the necessity of the inclusion of <a
 and <a>ISO-8859-16</a> is doubtful for the purpose of supporting existing content, but there are no
 plans to remove these.</p>
 
+<div class=note id=note-latin1-ascii>
+<p>The <a>windows-1252</a> <a for=/>encoding</a> has various <a for=encoding>labels</a>, such as
+"<code>latin1</code>", "<code>iso-8859-1</code>", and "<code>ascii</code>", which have historically
+been confusing for developers. On the web, and in any software that seeks to be web-compatible by
+implementing this standard, these are synonyms: "<code>latin1</code>" and "<code>ascii</code>" are
+just labels for <a>windows-1252</a>, and any software following this standard will, for example,
+decode 0x80 as U+20AC (€) when asked for the "Latin1" or "ASCII" decoding of that byte.
+
+<p>Software that does not follow this standard does not always give the same answers. The root of
+this is that the original document that specified Latin1 (ISO/IEC 8859-1) did not provide any
+mappings for bytes in the inclusive ranges 0x00 to 0x1F or 0x7F to 0x9F. Similarly, the original
+documents that specified ASCII (ISO/IEC 646, among others) did not provide any mappings for bytes
+in the inclusive range 0x80 to 0xFF. This means different software has chosen different code point
+mappings for those bytes when asked to use Latin1 or ASCII encodings. Web browsers and
+browser-compatible software have chosen to map those bytes according to <a>windows-1252</a>, which
+is a superset of both, and this choice was codified in this standard. Other software throws errors,
+or uses <a>isomorphic decoding</a>, or other mappings. [[ISO8859-1]] [[ISO646]]
+
+<p>As such, implementers and developers need to be careful whenever they are using libraries which
+expose APIs in terms of "Latin1" or "ASCII". It's very possible such libraries will not give
+answers in line with this standard, if they have chosen other behaviors for the bytes which were
+left undefined in the original specifications.
+</div>
 
 <h3 id=output-encodings>Output encodings</h3>
 
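
The note added in this change says that software outside this standard can give different answers for the same byte when asked for "Latin1" or "ASCII". As a minimal sketch of that point, the snippet below decodes the single byte 0x80 with three of Python's built-in codecs. Python is used only as an example of software that does not implement this standard; `decode_byte` is a hypothetical helper introduced here for illustration and is not part of the change.

```python
def decode_byte(byte_value, codec):
    """Decode one byte with a Python codec.

    Returns the resulting code point as 'U+XXXX', or 'error' if the
    codec rejects the byte.
    """
    try:
        text = bytes([byte_value]).decode(codec)
    except UnicodeDecodeError:
        return "error"
    return f"U+{ord(text):04X}"


# Decode 0x80 under three codec names that the web treats as one encoding.
results = {
    codec: decode_byte(0x80, codec)
    for codec in ("cp1252", "latin-1", "ascii")
}

for codec, outcome in results.items():
    print(f"{codec:8} -> {outcome}")
# cp1252   -> U+20AC  (matches this standard's windows-1252 for this byte)
# latin-1  -> U+0080  (isomorphic decoding)
# ascii    -> error   (bytes above 0x7F are rejected)
```

Under this standard all three labels would resolve to windows-1252 and yield U+20AC, so a library with the Python-style behavior would need an explicit `cp1252`-like codec to match web content for this byte.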