Commit 9d99cf0

Revert "Define encoding of PHP scripts"
This reverts commit f35bd70.
1 parent f35bd70 commit 9d99cf0

5 files changed: 42 additions, 45 deletions

spec/05-types.md

Lines changed: 8 additions & 8 deletions
@@ -128,10 +128,10 @@ str-whitespace::

 str-whitespace-char::
 new-line
-"Space character (0x20)"
-"Horizontal-tab character (0x09)"
-"Vertical-tab character (0x0B)"
-"Form-feed character (0x0C)"
+"Space character (U+0020)"
+"Horizontal-tab character (U+0009)"
+"Vertical-tab character (U+000B)"
+"Form-feed character (U+000C)"

 str-number::
 digit-sequence
@@ -147,10 +147,10 @@ str-number::

 <i id="grammar-str-whitespace-char">str-whitespace-char::</i>
 <i><a href="09-lexical-structure.md#grammar-new-line">new-line</a></i>
-Space character (0x20)
-Horizontal-tab character (0x09)
-Vertical-tab character (0x0B)
-Form-feed character (0x0C)
+Space character (U+0020)
+Horizontal-tab character (U+0009)
+Vertical-tab character (U+000B)
+Form-feed character (U+000C)

 <i id="grammar-str-number">str-number::</i>
 <i><a href="09-lexical-structure.md#grammar-digit-sequence">digit-sequence</a></i>

spec/09-lexical-structure.md

Lines changed: 24 additions & 24 deletions
@@ -145,9 +145,9 @@ input-character::
 "Any source character except" new-line

 new-line::
-"Carriage-return character (0x0D)"
-"Line-feed character (0x0A)"
-"Carriage-return character (0x0D) followed by line-feed character (0x0A)"
+"Carriage-return character (U+000D)"
+"Line-feed character (U+000A)"
+"Carriage-return character (U+000D) followed by line-feed character (U+000A)"

 delimited-comment::
 '/*' "No characters or any source character sequence except */" '*/'
@@ -170,9 +170,9 @@ delimited-comment::
 Any source character except <i><a href="#grammar-new-line">new-line</a></i>

 <i id="grammar-new-line">new-line::</i>
-Carriage-return character (0x0D)
-Line-feed character (0x0A)
-Carriage-return character (0x0D) followed by line-feed character (0x0A)
+Carriage-return character (U+000D)
+Line-feed character (U+000A)
+Carriage-return character (U+000D) followed by line-feed character (U+000A)

 <i id="grammar-delimited-comment">delimited-comment::</i>
 /* No characters or any source character sequence except */ */
@@ -212,8 +212,8 @@ white-space::

 white-space-character::
 new-line
-"Space character (0x20)"
-"Horizontal-tab character (0x09)"
+"Space character (U+0020)"
+"Horizontal-tab character (U+0009)"
 -->

 <pre>
@@ -223,8 +223,8 @@ white-space-character::

 <i id="grammar-white-space-character">white-space-character::</i>
 <i><a href="#grammar-new-line">new-line</a></i>
-Space character (0x20)
-Horizontal-tab character (0x09)
+Space character (U+0020)
+Horizontal-tab character (U+0009)
 </pre>

 **Semantics**
@@ -290,7 +290,7 @@ name::

 name-nondigit::
 nondigit
-"one of the characters 0x80–0xff"
+"one of the characters U+0080–U+00ff"

 nondigit:: one of
 '_'
@@ -324,7 +324,7 @@ nondigit:: one of

 <i id="grammar-name-nondigit">name-nondigit::</i>
 <i><a href="#grammar-nondigit">nondigit</a></i>
-one of the characters 0x80–0xff
+one of the characters U+0080–U+00ff

 <i id="grammar-nondigit">nondigit:: one of</i>
 _
@@ -344,7 +344,7 @@ Names are used to identify the following: [constants](06-constants.md#general),
 and names in [heredoc](#heredoc-string-literals) and [nowdoc comments](#nowdoc-string-literals).

 A *name* begins with an underscore (_), *name-nondigit*, or extended
-name character in the range 0x80–-0xff. Subsequent characters can
+name character in the range U+0080–-U+00ff. Subsequent characters can
 also include *digits*. A *variable name* is a name with a leading
 dollar ($).

@@ -704,7 +704,7 @@ b-prefix:: one of
 **Semantics**

 A single-quoted string literal is a string literal delimited by
-single-quotes (`'`, 0x27). The literal can contain any source character except
+single-quotes (`'`, U+0027). The literal can contain any source character except
 single-quote (`'`) and backslash (`\\`), which can only be represented by
 their corresponding escape sequence.

@@ -807,7 +807,7 @@ codepoint-digits::
 **Semantics**

 A double-quoted string literal is a string literal delimited by
-double-quotes (`"`, 0x22). The literal can contain any source character except
+double-quotes (`"`, U+0022). The literal can contain any source character except
 double-quote (`"`) and backslash (`\\`), which can only be represented by
 their corresponding escape sequence. Certain other (and sometimes
 non-printable) characters can also be expressed as escape sequences.
@@ -821,15 +821,15 @@ in the table below:

 Escape sequence | Character name | Unicode character
 --------------- | --------------| ------
-\$ | Dollar sign | 0x24
-\" | Double quote | 0x22
-\\ | Backslash | 0x5C
-\e | Escape | 0x1B
-\f | Form feed | 0x0C
-\n | New line | 0x0A
-\r | Carriage Return | 0x0D
-\t | Horizontal Tab | 0x09
-\v | Vertical Tab | 0x0B
+\$ | Dollar sign | U+0024
+\" | Double quote | U+0022
+\\ | Backslash | U+005C
+\e | Escape | U+001B
+\f | Form feed | U+000C
+\n | New line | U+000A
+\r | Carriage Return | U+000D
+\t | Horizontal Tab | U+0009
+\v | Vertical Tab | U+000B
 \ooo | 1–3-digit octal digit value ooo
 \xhh or \Xhh | 1–2-digit hexadecimal digit value hh
 \u{xxxxxx} | UTF-8 encoding of Unicode codepoint U+xxxxxx | U+xxxxxx

spec/10-expressions.md

Lines changed: 3 additions & 3 deletions
@@ -1428,7 +1428,7 @@ shell-command-expression:
 ` <i><a href="09-lexical-structure.md#grammar-dq-char-sequence">dq-char-sequence</a></i><sub>opt</sub> `
 </pre>

-where \` is the GRAVE ACCENT character 0x60, commonly referred to as a
+where \` is the GRAVE ACCENT character U+0060, commonly referred to as a
 *backtick*.

 **Semantics**
@@ -2804,9 +2804,9 @@ character from the right-hand operand is stored at the designated
 location; all other characters in the right-hand operand string are
 ignored. If the designated location is beyond the end of the
 destination string, that string is extended to the new length with
-spaces (0x20) added as padding beyond the old end and before the newly
+spaces (U+0020) added as padding beyond the old end and before the newly
 added character. If the right-hand operand is an empty string, the null
-character \\0 (0x00) is stored.
+character \\0 (U+0000) is stored.

 **Examples**
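
The string-offset assignment rule quoted in the hunk above can be modelled in a few lines. This is only a sketch: it is written in Python rather than PHP, treats strings as byte strings, assumes a non-negative offset, and follows the wording of the spec text rather than the behaviour of any particular PHP release. `assign_offset` is a hypothetical helper name, not a PHP or Python API.

```python
def assign_offset(dest: bytes, offset: int, value: bytes) -> bytes:
    """Model `$dest[$offset] = $value` as the spec text describes it (hypothetical helper)."""
    if offset < 0:
        raise ValueError("this sketch only models non-negative offsets")
    # Only the first character of the right-hand operand is used;
    # an empty operand stores the null character instead.
    char = value[:1] or b"\x00"
    if offset >= len(dest):
        # Pad with spaces from the old end up to the designated location.
        return dest + b" " * (offset - len(dest)) + char
    return dest[:offset] + char + dest[offset + 1:]


print(assign_offset(b"abc", 1, b"XYZ"))  # b'aXc'
print(assign_offset(b"abc", 5, b"Z"))    # b'abc  Z'
print(assign_offset(b"abc", 0, b""))     # b'\x00bc'
```

Whether the padding is written as 0x20 or U+0020 makes no difference to the stored byte here; the two notations only diverge for values above 0x7F.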

spec/19-grammar.md

Lines changed: 6 additions & 9 deletions
@@ -3,9 +3,6 @@
 ## General

 The grammar notation is described in [Grammars section](09-lexical-structure.md#grammars).
-PHP scripts are encoded as ASCII, but bytes 0x80 to 0xFF are allowed in some places, as defined in the grammar.
-PHP scripts are parsed as a series of 8-bit bytes, rather than code points from Unicode or any other character repertoire.
-Within this specification, bytes are represented by their ASCII interpretations where these are printable characters.

 ## Lexical Grammar

@@ -35,9 +32,9 @@ Within this specification, bytes are represented by their ASCII interpretations
 Any source character except <i><a href="#grammar-new-line">new-line</a></i>

 <i id="grammar-new-line">new-line::</i>
-Carriage-return character (0x0D)
-Line-feed character (0x0A)
-Carriage-return character (0x0D) followed by line-feed character (0x0A)
+Carriage-return character (U+000D)
+Line-feed character (U+000A)
+Carriage-return character (U+000D) followed by line-feed character (U+000A)

 <i id="grammar-delimited-comment">delimited-comment::</i>
 /* No characters or any source character sequence except */ */
@@ -48,8 +45,8 @@ Within this specification, bytes are represented by their ASCII interpretations

 <i id="grammar-white-space-character">white-space-character::</i>
 <i><a href="#grammar-new-line">new-line</a></i>
-Space character (0x20)
-Horizontal-tab character (0x09)
+Space character (U+0020)
+Horizontal-tab character (U+0009)

 <i id="grammar-token">token::</i>
 <i><a href="#grammar-variable-name">variable-name</a></i>
@@ -83,7 +80,7 @@ Within this specification, bytes are represented by their ASCII interpretations

 <i id="grammar-name-nondigit">name-nondigit::</i>
 <i><a href="#grammar-nondigit">nondigit</a></i>
-one of the characters 0x80–0xff
+one of the characters U+0080–U+00ff

 <i id="grammar-nondigit">nondigit:: one of</i>
 _
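
The two notations swapped by this revert are not interchangeable for `name-nondigit`. The removed wording describes a range of byte values (0x80 to 0xff), while the restored wording names a range of code points (U+0080 to U+00ff). The Python sketch below shows where the two readings disagree, assuming a UTF-8 encoded source file; both helper names are hypothetical and exist only for illustration.

```python
def all_bytes_in_range(ch: str) -> bool:
    """Byte reading: every UTF-8 byte of ch lies in 0x80-0xff."""
    return all(0x80 <= b <= 0xFF for b in ch.encode("utf-8"))


def codepoint_in_range(ch: str) -> bool:
    """Code point reading: ch itself lies in U+0080-U+00FF."""
    return 0x80 <= ord(ch) <= 0xFF


for ch in ("\u00e9", "\u2603"):
    print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(),
          all_bytes_in_range(ch), codepoint_in_range(ch))
# U+00E9 c3a9 True True
# U+2603 e29883 True False
```

Under the byte reading, any non-ASCII character in a UTF-8 file consists solely of bytes in 0x80 to 0xff, so U+2603 qualifies; under the code point reading it falls outside the stated range.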

tests/lexical_structure/unicode_string_escape_sequence/unicode_escape.phpt

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ PHP Spec test generated from ./lexical_structure/unicode_string_escape_sequence/
 --FILE--
 <?php

-var_dump("\u{61}"); // ASCII "a" - characters below 0x7F just encode as ASCII, as it's UTF-8
+var_dump("\u{61}"); // ASCII "a" - characters below U+007F just encode as ASCII, as it's UTF-8
 var_dump("\u{FF}"); // y with diaeresis
 var_dump("\u{ff}"); // case-insensitive
 var_dump("\u{2603}"); // Unicode snowman
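
For reference, the byte sequences that the `\u{...}` escapes in this test stand for can be checked with a short Python sketch. It relies only on the standard UTF-8 encoding rules and does not run PHP; `utf8_hex` is a hypothetical helper name.

```python
def utf8_hex(codepoint: int) -> str:
    """Return the UTF-8 encoding of a code point as a hex string."""
    return chr(codepoint).encode("utf-8").hex()


# Code points used in the test file, plus the one-byte/two-byte boundary.
for cp in (0x61, 0x7F, 0x80, 0xFF, 0x2603):
    print(f"U+{cp:04X} -> {utf8_hex(cp)}")
# U+0061 -> 61
# U+007F -> 7f
# U+0080 -> c280
# U+00FF -> c3bf
# U+2603 -> e29883
```

Code points up to and including U+007F encode as a single byte identical to ASCII; U+0080 is the first to need two bytes.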
