Commit fe55bee

ramonacatnikic authored and committed
Define encoding of PHP scripts
1 parent fd81c6b commit fe55bee

4 files changed (+45 -46 lines changed)

spec/05-types.md

Lines changed: 8 additions & 8 deletions
@@ -128,10 +128,10 @@ str-whitespace::
 
 str-whitespace-char::
 new-line
-"Space character (U+0020)"
-"Horizontal-tab character (U+0009)"
-"Vertical-tab character (U+000B)"
-"Form-feed character (U+000C)"
+"Space character (0x20)"
+"Horizontal-tab character (0x09)"
+"Vertical-tab character (0x0B)"
+"Form-feed character (0x0C)"
 
 str-number::
 digit-sequence
@@ -147,10 +147,10 @@ str-number::
 
 <i id="grammar-str-whitespace-char">str-whitespace-char::</i>
 <i><a href="09-lexical-structure.md#grammar-new-line">new-line</a></i>
-Space character (U+0020)
-Horizontal-tab character (U+0009)
-Vertical-tab character (U+000B)
-Form-feed character (U+000C)
+Space character (0x20)
+Horizontal-tab character (0x09)
+Vertical-tab character (0x0B)
+Form-feed character (0x0C)
 
 <i id="grammar-str-number">str-number::</i>
 <i><a href="09-lexical-structure.md#grammar-digit-sequence">digit-sequence</a></i>
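With this commit, *str-whitespace-char* is given as byte values rather than Unicode code points. A minimal Python sketch of the resulting byte class, assuming a numeric string is handled as raw `bytes` and that its leading *str-whitespace* is skipped before *str-number* is matched; `STR_WHITESPACE_BYTES` and `skip_str_whitespace` are hypothetical names, not part of the spec or of any PHP implementation:

```python
# Hypothetical helper for illustration only.
# Byte values accepted by the revised str-whitespace-char production:
# new-line (CR 0x0D, LF 0x0A; a CR LF pair is simply two such bytes),
# space 0x20, horizontal tab 0x09, vertical tab 0x0B, form feed 0x0C.
STR_WHITESPACE_BYTES = frozenset({0x0D, 0x0A, 0x20, 0x09, 0x0B, 0x0C})


def skip_str_whitespace(data: bytes, pos: int = 0) -> int:
    """Return the index of the first byte at or after pos that is not str-whitespace."""
    while pos < len(data) and data[pos] in STR_WHITESPACE_BYTES:
        pos += 1
    return pos


sample = b" \t\x0b\x0c\r\n42"
start = skip_str_whitespace(sample)
print(start, sample[start:])  # 6 b'42'
```

Because the class is a set of byte values, a UTF-8 no-break space (0xC2 0xA0) is not skipped: `skip_str_whitespace(b"\xc2\xa042")` returns 0.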

spec/09-lexical-structure.md

Lines changed: 28 additions & 29 deletions
@@ -5,16 +5,15 @@
 A [script](04-basic-concepts.md#program-structure) is an ordered sequence of characters. Typically, a
 script has a one-to-one correspondence with a file in a file system, but
 this correspondence is not required.
+PHP scripts are parsed as a series of 8-bit bytes, rather than code points from Unicode or any other character repertoire.
+Within this specification, bytes are represented by their ASCII interpretations where these are printable characters.
 
 Conceptually speaking, a script is translated using the following steps:
 
-1. Transformation, which converts a script from a particular character
-repertoire and encoding scheme into a sequence of 8-bit characters.
-
-2. Lexical analysis, which translates a stream of input characters into
+1. Lexical analysis, which translates a stream of input characters into
 a stream of tokens.
 
-3. Syntactic analysis, which translates the stream of tokens into
+2. Syntactic analysis, which translates the stream of tokens into
 executable code.
 
 Conforming implementations must accept scripts encoded with the UTF-8
@@ -145,9 +144,9 @@ input-character::
 "Any source character except" new-line
 
 new-line::
-"Carriage-return character (U+000D)"
-"Line-feed character (U+000A)"
-"Carriage-return character (U+000D) followed by line-feed character (U+000A)"
+"Carriage-return character (0x0D)"
+"Line-feed character (0x0A)"
+"Carriage-return character (0x0D) followed by line-feed character (0x0A)"
 
 delimited-comment::
 '/*' "No characters or any source character sequence except */" '*/'
@@ -170,9 +169,9 @@ delimited-comment::
 Any source character except <i><a href="#grammar-new-line">new-line</a></i>
 
 <i id="grammar-new-line">new-line::</i>
-Carriage-return character (U+000D)
-Line-feed character (U+000A)
-Carriage-return character (U+000D) followed by line-feed character (U+000A)
+Carriage-return character (0x0D)
+Line-feed character (0x0A)
+Carriage-return character (0x0D) followed by line-feed character (0x0A)
 
 <i id="grammar-delimited-comment">delimited-comment::</i>
 /* No characters or any source character sequence except */ */
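The revised *new-line* production matches CR (0x0D), LF (0x0A), or CR followed by LF. The sketch below assumes the usual longest-match rule, so a CR LF pair is consumed as one *new-line* rather than two; `match_new_line` and `split_lines` are hypothetical helpers written for illustration only:

```python
CR, LF = 0x0D, 0x0A


def match_new_line(data: bytes, pos: int) -> int:
    """Length of the new-line at pos: 2 for CR LF, 1 for a lone CR or LF, 0 if none."""
    if pos < len(data) and data[pos] == CR:
        # Assumed longest match: prefer the two-byte CR LF alternative.
        if pos + 1 < len(data) and data[pos + 1] == LF:
            return 2
        return 1
    if pos < len(data) and data[pos] == LF:
        return 1
    return 0


def split_lines(data: bytes) -> list[bytes]:
    """Split a script at every new-line, dropping the terminators themselves."""
    lines, start, pos = [], 0, 0
    while pos < len(data):
        n = match_new_line(data, pos)
        if n:
            lines.append(data[start:pos])
            pos += n
            start = pos
        else:
            pos += 1
    lines.append(data[start:])
    return lines


print(split_lines(b"a\r\nb\rc\nd"))  # [b'a', b'b', b'c', b'd']
```

All three line-ending styles yield the same split, and no decoding step is needed because every comparison is against a byte value.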
@@ -212,8 +211,8 @@ white-space::
 
 white-space-character::
 new-line
-"Space character (U+0020)"
-"Horizontal-tab character (U+0009)"
+"Space character (0x20)"
+"Horizontal-tab character (0x09)"
 -->
 
 <pre>
@@ -223,8 +222,8 @@ white-space-character::
 
 <i id="grammar-white-space-character">white-space-character::</i>
 <i><a href="#grammar-new-line">new-line</a></i>
-Space character (U+0020)
-Horizontal-tab character (U+0009)
+Space character (0x20)
+Horizontal-tab character (0x09)
 </pre>
 
 **Semantics**
@@ -290,7 +289,7 @@ name::
 
 name-nondigit::
 nondigit
-"one of the characters U+0080–U+00ff"
+"one of the characters 0x80–0xff"
 
 nondigit:: one of
 '_'
@@ -324,7 +323,7 @@ nondigit:: one of
 
 <i id="grammar-name-nondigit">name-nondigit::</i>
 <i><a href="#grammar-nondigit">nondigit</a></i>
-one of the characters U+0080–U+00ff
+one of the characters 0x80–0xff
 
 <i id="grammar-nondigit">nondigit:: one of</i>
 _
@@ -344,7 +343,7 @@ Names are used to identify the following: [constants](06-constants.md#general),
 and names in [heredoc](#heredoc-string-literals) and [nowdoc comments](#nowdoc-string-literals).
 
 A *name* begins with an underscore (_), *name-nondigit*, or extended
-name character in the range U+0080–-U+00ff. Subsequent characters can
+name character in the range 0x80–-0xff. Subsequent characters can
 also include *digits*. A *variable name* is a name with a leading
 dollar ($).
 
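Under the revised *name-nondigit* rule, every byte from 0x80 to 0xff may appear in a name, so a UTF-8 encoded identifier is accepted byte by byte without ever being decoded. A minimal sketch, assuming *nondigit* is `_` plus the ASCII letters (only `_` is visible in this diff) and *digit* is the ASCII digits; `scan_name` and `is_name_nondigit` are hypothetical names:

```python
import string

# Assumption: nondigit is '_' plus the ASCII letters; the full production
# is only partly visible in this diff.
NONDIGIT = frozenset(b"_" + string.ascii_letters.encode("ascii"))
DIGITS = frozenset(string.digits.encode("ascii"))


def is_name_nondigit(b: int) -> bool:
    """name-nondigit: a nondigit, or any byte in the range 0x80..0xff."""
    return b in NONDIGIT or 0x80 <= b <= 0xFF


def scan_name(data: bytes, pos: int = 0) -> bytes:
    """Return the longest name starting at pos, or b'' if no name starts there."""
    if pos >= len(data) or not is_name_nondigit(data[pos]):
        return b""
    end = pos + 1
    while end < len(data) and (is_name_nondigit(data[end]) or data[end] in DIGITS):
        end += 1
    return data[pos:end]


src = "$café1 = 1;".encode("utf-8")
name = scan_name(src, 1)  # start after the leading '$'
print(name, len(name))  # b'caf\xc3\xa91' 6
```

Here `café1` is six bytes long rather than five characters, because the lexer never interprets 0xC3 0xA9 as a single code point.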
@@ -704,7 +703,7 @@ b-prefix:: one of
 **Semantics**
 
 A single-quoted string literal is a string literal delimited by
-single-quotes (`'`, U+0027). The literal can contain any source character except
+single-quotes (`'`, 0x27). The literal can contain any source character except
 single-quote (`'`) and backslash (`\\`), which can only be represented by
 their corresponding escape sequence.
 
@@ -807,7 +806,7 @@ codepoint-digits::
 **Semantics**
 
 A double-quoted string literal is a string literal delimited by
-double-quotes (`"`, U+0022). The literal can contain any source character except
+double-quotes (`"`, 0x22). The literal can contain any source character except
 double-quote (`"`) and backslash (`\\`), which can only be represented by
 their corresponding escape sequence. Certain other (and sometimes
 non-printable) characters can also be expressed as escape sequences.
@@ -821,15 +820,15 @@ in the table below:
 
 Escape sequence | Character name | Unicode character
 --------------- | --------------| ------
-\$ | Dollar sign | U+0024
-\" | Double quote | U+0022
-\\ | Backslash | U+005C
-\e | Escape | U+001B
-\f | Form feed | U+000C
-\n | New line | U+000A
-\r | Carriage Return | U+000D
-\t | Horizontal Tab | U+0009
-\v | Vertical Tab | U+000B
+\$ | Dollar sign | 0x24
+\" | Double quote | 0x22
+\\ | Backslash | 0x5C
+\e | Escape | 0x1B
+\f | Form feed | 0x0C
+\n | New line | 0x0A
+\r | Carriage Return | 0x0D
+\t | Horizontal Tab | 0x09
+\v | Vertical Tab | 0x0B
 \ooo | 1–3-digit octal digit value ooo
 \xhh or \Xhh | 1–2-digit hexadecimal digit value hh
 \u{xxxxxx} | UTF-8 encoding of Unicode codepoint U+xxxxxx | U+xxxxxx
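The escape table now lists byte values. A Python sketch that decodes only the single-character escapes from that table, assuming the literal body (without its surrounding quotes) is available as `bytes`; octal, hexadecimal, and `\u{...}` escapes are deliberately out of scope, and `decode_simple_escapes` is a hypothetical name:

```python
# Single-character escapes from the revised table, mapped to their byte values.
SIMPLE_ESCAPES = {
    ord("$"): 0x24,
    ord('"'): 0x22,
    ord("\\"): 0x5C,
    ord("e"): 0x1B,
    ord("f"): 0x0C,
    ord("n"): 0x0A,
    ord("r"): 0x0D,
    ord("t"): 0x09,
    ord("v"): 0x0B,
}


def decode_simple_escapes(body: bytes) -> bytes:
    """Replace the simple escapes in a double-quoted literal body with their bytes.

    Octal, hexadecimal and \\u{...} escapes are not handled in this sketch;
    any backslash sequence not in SIMPLE_ESCAPES is copied through unchanged.
    """
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == 0x5C and i + 1 < len(body) and body[i + 1] in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return bytes(out)


print(decode_simple_escapes(rb"a\tb\n\$x"))  # b'a\tb\n$x'
```

Working on `bytes` keeps the output consistent with the byte-oriented model introduced by this commit: each escape produces exactly one byte.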

spec/10-expressions.md

Lines changed: 3 additions & 3 deletions
@@ -1428,7 +1428,7 @@ shell-command-expression:
 ` <i><a href="09-lexical-structure.md#grammar-dq-char-sequence">dq-char-sequence</a></i><sub>opt</sub> `
 </pre>
 
-where \` is the GRAVE ACCENT character U+0060, commonly referred to as a
+where \` is the GRAVE ACCENT character 0x60, commonly referred to as a
 *backtick*.
 
 **Semantics**
@@ -2804,9 +2804,9 @@ character from the right-hand operand is stored at the designated
 location; all other characters in the right-hand operand string are
 ignored. If the designated location is beyond the end of the
 destination string, that string is extended to the new length with
-spaces (U+0020) added as padding beyond the old end and before the newly
+spaces (0x20) added as padding beyond the old end and before the newly
 added character. If the right-hand operand is an empty string, the null
-character \\0 (U+0000) is stored.
+character \\0 (0x00) is stored.
 
 **Examples**
 
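The string-offset assignment rule can be modelled on a byte string. The sketch below follows the paragraph in this hunk as written (padding with 0x20, and 0x00 for an empty right-hand operand) and is not a claim about how any particular PHP release behaves; negative offsets are rejected because the hunk does not cover them, and `assign_string_offset` is a hypothetical name:

```python
def assign_string_offset(dest: bytes, offset: int, rhs: bytes) -> bytes:
    """Model $dest[offset] = rhs for non-negative offsets, per the spec text.

    Only the first byte of rhs is stored; an empty rhs stores 0x00.
    If offset lies past the end, the gap is padded with spaces (0x20).
    """
    if offset < 0:
        raise ValueError("negative offsets are outside this sketch")
    buf = bytearray(dest)
    if offset >= len(buf):
        # Extend to the new length; the padding bytes are spaces.
        buf.extend(b"\x20" * (offset - len(buf) + 1))
    buf[offset] = rhs[0] if rhs else 0x00
    return bytes(buf)


print(assign_string_offset(b"ab", 4, b"xyz"))  # b'ab  x'
print(assign_string_offset(b"ab", 0, b""))     # b'\x00b'
```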
spec/19-grammar.md

Lines changed: 6 additions & 6 deletions
@@ -32,9 +32,9 @@ The grammar notation is described in [Grammars section](09-lexical-structure.md#
 Any source character except <i><a href="#grammar-new-line">new-line</a></i>
 
 <i id="grammar-new-line">new-line::</i>
-Carriage-return character (U+000D)
-Line-feed character (U+000A)
-Carriage-return character (U+000D) followed by line-feed character (U+000A)
+Carriage-return character (0x0D)
+Line-feed character (0x0A)
+Carriage-return character (0x0D) followed by line-feed character (0x0A)
 
 <i id="grammar-delimited-comment">delimited-comment::</i>
 /* No characters or any source character sequence except */ */
@@ -45,8 +45,8 @@ The grammar notation is described in [Grammars section](09-lexical-structure.md#
 
 <i id="grammar-white-space-character">white-space-character::</i>
 <i><a href="#grammar-new-line">new-line</a></i>
-Space character (U+0020)
-Horizontal-tab character (U+0009)
+Space character (0x20)
+Horizontal-tab character (0x09)
 
 <i id="grammar-token">token::</i>
 <i><a href="#grammar-variable-name">variable-name</a></i>
@@ -80,7 +80,7 @@ The grammar notation is described in [Grammars section](09-lexical-structure.md#
 
 <i id="grammar-name-nondigit">name-nondigit::</i>
 <i><a href="#grammar-nondigit">nondigit</a></i>
-one of the characters U+0080–U+00ff
+one of the characters 0x80–0xff
 
 <i id="grammar-nondigit">nondigit:: one of</i>
 _
