 A [script](04-basic-concepts.md#program-structure) is an ordered sequence of characters. Typically, a
 script has a one-to-one correspondence with a file in a file system, but
 this correspondence is not required.
+PHP scripts are parsed as a series of 8-bit bytes, rather than code points from Unicode or any other character repertoire.
+Within this specification, bytes are represented by their ASCII interpretations where these are printable characters.
 
 Conceptually speaking, a script is translated using the following steps:
 
-1. Transformation, which converts a script from a particular character
-   repertoire and encoding scheme into a sequence of 8-bit characters.
-
-2. Lexical analysis, which translates a stream of input characters into
+1. Lexical analysis, which translates a stream of input characters into
    a stream of tokens.
 
-3. Syntactic analysis, which translates the stream of tokens into
+2. Syntactic analysis, which translates the stream of tokens into
    executable code.
 
 Conforming implementations must accept scripts encoded with the UTF-8
@@ -145,9 +144,9 @@ input-character::
   "Any source character except" new-line
 
 new-line::
-  "Carriage-return character (U+000D)"
-  "Line-feed character (U+000A)"
-  "Carriage-return character (U+000D) followed by line-feed character (U+000A)"
+  "Carriage-return character (0x0D)"
+  "Line-feed character (0x0A)"
+  "Carriage-return character (0x0D) followed by line-feed character (0x0A)"
 
 delimited-comment::
   '/*' "No characters or any source character sequence except */" '*/'
@@ -170,9 +169,9 @@ delimited-comment::
   Any source character except <i><a href="#grammar-new-line">new-line</a></i>
 
 <i id="grammar-new-line">new-line::</i>
-  Carriage-return character (U+000D)
-  Line-feed character (U+000A)
-  Carriage-return character (U+000D) followed by line-feed character (U+000A)
+  Carriage-return character (0x0D)
+  Line-feed character (0x0A)
+  Carriage-return character (0x0D) followed by line-feed character (0x0A)
 
 <i id="grammar-delimited-comment">delimited-comment::</i>
   /* No characters or any source character sequence except */ */
@@ -212,8 +211,8 @@ white-space::
 
 white-space-character::
   new-line
-  "Space character (U+0020)"
-  "Horizontal-tab character (U+0009)"
+  "Space character (0x20)"
+  "Horizontal-tab character (0x09)"
 -->
 
 <pre>
@@ -223,8 +222,8 @@ white-space-character::
 
 <i id="grammar-white-space-character">white-space-character::</i>
   <i><a href="#grammar-new-line">new-line</a></i>
-  Space character (U+0020)
-  Horizontal-tab character (U+0009)
+  Space character (0x20)
+  Horizontal-tab character (0x09)
 </pre>
 
 **Semantics**
@@ -290,7 +289,7 @@ name::
 
 name-nondigit::
   nondigit
-  "one of the characters U+0080–U+00ff"
+  "one of the characters 0x80–0xff"
 
 nondigit:: one of
   '_'
@@ -324,7 +323,7 @@ nondigit:: one of
 
 <i id="grammar-name-nondigit">name-nondigit::</i>
   <i><a href="#grammar-nondigit">nondigit</a></i>
-  one of the characters U+0080–U+00ff
+  one of the characters 0x80–0xff
 
 <i id="grammar-nondigit">nondigit:: one of</i>
   _
@@ -344,7 +343,7 @@ Names are used to identify the following: [constants](06-constants.md#general),
 and names in [heredoc](#heredoc-string-literals) and [nowdoc comments](#nowdoc-string-literals).
 
 A *name* begins with an underscore (_), *name-nondigit*, or extended
-name character in the range U+0080–-U+00ff. Subsequent characters can
+name character in the range 0x80–-0xff. Subsequent characters can
 also include *digits*. A *variable name* is a name with a leading
 dollar ($).
 
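The byte-based name rule in the hunk above can be sketched in PHP. The helper `isValidName` is hypothetical (it is not defined by the specification or by PHP itself); it assumes its input is a raw byte string and uses a byte-oriented regular expression similar to the one the PHP manual gives for variable names.

```php
<?php
// Hypothetical helper, for illustration only: checks whether a byte string
// matches the name grammar (a nondigit or a byte in 0x80-0xff, followed by
// any number of nondigits, digits, or bytes in 0x80-0xff).
// There is no /u modifier, so PCRE matches byte by byte.
function isValidName(string $candidate): bool
{
    return preg_match('/^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$/D', $candidate) === 1;
}

var_dump(isValidName("caf\xC3\xA9")); // the UTF-8 bytes of "é" both lie in 0x80-0xff
var_dump(isValidName("_tmp1"));
var_dump(isValidName("1abc"));        // a name cannot begin with a digit
```

Because the check works on bytes, any multi-byte UTF-8 sequence is accepted without being decoded, which matches the change from code points (U+0080–U+00ff) to byte values (0x80–0xff).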
@@ -704,7 +703,7 @@ b-prefix:: one of
 **Semantics**
 
 A single-quoted string literal is a string literal delimited by
-single-quotes (`'`, U+0027). The literal can contain any source character except
+single-quotes (`'`, 0x27). The literal can contain any source character except
 single-quote (`'`) and backslash (`\\`), which can only be represented by
 their corresponding escape sequence.
 
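As a minimal sketch of the single-quoted rule above, the following PHP snippet inspects the bytes produced by a few literals. The variable names are illustrative only.

```php
<?php
// In single-quoted literals only \' and \\ are escape sequences;
// any other backslash is kept as a literal byte.
$quote     = 'It\'s'; // contains one single-quote byte (0x27)
$backslash = '\\';    // a single backslash byte (0x5C)
$notEscape = '\n';    // backslash followed by "n", not a line feed

var_dump(bin2hex($quote));
var_dump(strlen($backslash));
var_dump(bin2hex($notEscape));
```

`bin2hex` is used here so that the result is shown as byte values, in line with the specification's 0x notation.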
@@ -807,7 +806,7 @@ codepoint-digits::
 **Semantics**
 
 A double-quoted string literal is a string literal delimited by
-double-quotes (`"`, U+0022). The literal can contain any source character except
+double-quotes (`"`, 0x22). The literal can contain any source character except
 double-quote (`"`) and backslash (`\\`), which can only be represented by
 their corresponding escape sequence. Certain other (and sometimes
 non-printable) characters can also be expressed as escape sequences.
@@ -821,15 +820,15 @@ in the table below:
 
 Escape sequence | Character name | Unicode character
 --------------- | --------------| ------
-\$ | Dollar sign | U+0024
-\" | Double quote | U+0022
-\\ | Backslash | U+005C
-\e | Escape | U+001B
-\f | Form feed | U+000C
-\n | New line | U+000A
-\r | Carriage Return | U+000D
-\t | Horizontal Tab | U+0009
-\v | Vertical Tab | U+000B
+\$ | Dollar sign | 0x24
+\" | Double quote | 0x22
+\\ | Backslash | 0x5C
+\e | Escape | 0x1B
+\f | Form feed | 0x0C
+\n | New line | 0x0A
+\r | Carriage Return | 0x0D
+\t | Horizontal Tab | 0x09
+\v | Vertical Tab | 0x0B
 \ooo | 1–3-digit octal digit value ooo
 \xhh or \Xhh | 1–2-digit hexadecimal digit value hh
 \u{xxxxxx} | UTF-8 encoding of Unicode codepoint U+xxxxxx | U+xxxxxx
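The escape-sequence table above can be checked with a short PHP sketch. It assumes PHP 7.0 or later, since it uses the `\u{...}` syntax; the variable names are illustrative only.

```php
<?php
// Each simple escape sequence from the table yields exactly one byte.
$simple = "\$\"\\\e\f\n\r\t\v";
var_dump(bin2hex($simple));

// Octal and hexadecimal escapes name a byte value directly.
var_dump("\101" === "\x41"); // both are the byte 0x41 ("A")

// \u{...} instead yields the UTF-8 encoding of the code point,
// which can be more than one byte long.
$eAcute = "\u{e9}"; // code point U+00E9
var_dump(bin2hex($eAcute));
var_dump(strlen($eAcute));
```

The last two calls show why the table keeps `U+xxxxxx` for `\u{xxxxxx}` while the other rows moved to byte values: that escape is the only one defined in terms of a Unicode code point.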