diff --git a/00-about-this-tutorial.md b/00-about-this-tutorial.md index 4a351d1..a4a5d61 100644 --- a/00-about-this-tutorial.md +++ b/00-about-this-tutorial.md @@ -2,52 +2,55 @@ ## Why (Modern) C++? -C++ is a living programming language. What began as Bjarne Stroustrup's research project at Bell Labs in the late 1970s has grown into an ISO-Standardized language and library, with regular new releases co-ordinated by an ISO Committee. During that time countless over-hyped programming languages have appeared (and disappeared), whilst C++ has continually "borrowed" new ideas from other contemporary or newer languages. +C++ is a living programming language. What began as Bjarne Stroustrup's research project at Bell Labs in the late 1970s has grown into an ISO-Standardized language and library, with regular new releases co-ordinated by an ISO Committee. During that time countless over-hyped programming languages have appeared (and disappeared), whilst C++ has improved by continually "borrowing" new ideas from other contemporary or newer languages. -The main reasons for C++'s success story are speed and stability; well-written programs compile to efficient code and are reliable after deployment. C++ is an AOT (ahead-of-time) compiled language, and usually compiles straight to the machine language of the target processor; this means that full (re-)compilation must take place for each deployment to a new platform architecture. The source-code must be available, under either a closed- or open-source license, in order to compile to a specific platform; also it is not possible to recreate C++ source code by reverse-engineering an executable (or library) binary object. C++ is scalable and runs on everything from 32K RAM Arduino up to supercomputers, supporting small, compact programs with little link-time overhead as well as projects containing millions of lines of source code. 
+The main reasons for C++'s success story are speed and stability; well-written programs compile to efficient code and are reliable after deployment. C++ is an AOT (ahead-of-time) compiled language, and usually compiles straight to the machine language of the target processor (although [WebAssembly is another possibility](https://developer.mozilla.org/en-US/docs/WebAssembly/C_to_Wasm)); this means that full (re-)compilation must take place for each deployment to a new platform architecture. In order for this to happen, the source-code must be available (under either a closed- or open-source license); it is not usually possible to recreate C++ source code by reverse-engineering an executable (or library) binary object. C++ is scalable and runs on everything from 32K RAM Arduino up to supercomputers, supporting small, compact programs with low link-time overhead as well as projects containing millions of lines of source code and many Megabytes of object code.
-Modern C++ is a massive improvement over earlier versions. It is in many ways unrecognizable to the C++ of the late 1980s and early 1990s when the reference compiler, called "Cfront", was written entirely by Stroustrup and compiled C++ to C, not assembly language. Since C++ ceased to be a largely one-man effort in the mid-1990s there have been numerous ISO Standards published; these are commonly known as C++98, C++03, C++11, C++14, C++17 and C++20, with the next release expected to be called C++23. Of these, C++11 and C++20 have seen the biggest changes to the language, and have caused the most work for compiler writers.
+Modern C++ (C++11 and newer) is a massive improvement over earlier versions. It is in many ways unrecognizable to the C++ of the late 1980s and early 1990s when the reference compiler, called "Cfront", was written entirely by Stroustrup and compiled C++ to C.
Since C++ ceased to be a largely one-man effort in the mid-1990s there have been numerous ISO Standards published; these are commonly known as C++98, C++03, C++11, C++14, C++17, C++20 and C++23, with the next release expected to be called C++26. Of these Standards, C++11 and C++20 have seen the biggest changes to the language, and have caused the most work for compiler writers. -Of course, it is far easier to add features to a language than remove them (this phenomenon is known in the industry as "feature creep"), and the later versions of C++ are almost entirely a feature superset compared to previous versions (with some redundant features deprecated or removed). "Remember the Vasa!" is something Stroustrup is quoted as saying comparatively recently (referring to the wooden Swedish battleship that capsized and sank on its maiden voyage) and it is clear that he would be wary of continuing to promote a top-heavy, Christmas-wishlist-featured language. The good news is that almost all C++ programs written in the last twenty-or-so years will compile with few or no modifications using the latest compilers. Also, Modern C++ adds features that are useful even in comparatively simple programs, making them easier to comprehend; cleaning up overly verbose and non-intuitive syntax has been a major focus for newer versions of C++. +Of course, it is far easier to add features to a language than remove them (this phenomenon is known in the industry as *feature creep*), and later versions of C++ are almost entirely a feature superset compared to previous versions (with some obsolete or superseded features deprecated or removed). "Remember the Vasa!" is something Stroustrup is quoted as saying comparatively recently (referring to the wooden Swedish battleship that capsized and sank on its maiden voyage) and it is clear that he would be wary of advocating a top-heavy, "Christmas Wishlist"-featured language. 
The good news is that almost all C++ programs written in the last twenty-or-so years will compile with few modifications (or none) using the latest compilers. Also, Modern C++ adds features that are useful even in comparatively simple programs, making them easier to comprehend; cleaning up overly verbose and non-intuitive syntax has been a major focus for newer versions of C++, and much legacy code can be easily refactored into a more modern style. ## Why this Tutorial? -I have been learning and using C++ since around 2002, and Modern C++ is the language I wish had been available to me back then; I believe it is an easier language to learn to use productively than older versions. Of course, there have been many books written about C++ over the years, but unavoidably most of these will be (badly) out-of-date. Even [Stroustrup's](https://stroustrup.com) "The C++ Programming Language" (Fourth Edition)[^1] only covers up to and including C++11. +I have been learning and using C++ since around 2002, and Modern C++ is the language I wish had been available to me back then. I believe it is an easier language to learn to use productively than older versions. Of course, there have been many books written about C++ over the years, but unavoidably most of these will be (badly) out-of-date. Even [Stroustrup's](https://stroustrup.com) "The C++ Programming Language" (Fourth Edition)[^1] only covers up to and including C++11. -This online self-study course has been put together as a personal project designed to be accessible and comprehensible to anyone without any previous experience of programming in compiled languages; hopefully, no material is presented without suitable context and explanation. The format is a conversational-tone introduction to a topic followed by an example program (in many cases intended to be compiled and run, with all source code [available on GitHub](https://github.com/cpp-tutor/learnmoderncpp-tutorial/)[^2]). 
Then a list of points-of-interest follows the code, often followed by a number of possible modifications that you (the reader) are invited to make in order to cement your understanding. Each Chapter introduces several related topics. - -Most people "learn by doing" (kinesthetic learning) and for decades it has been recognized that the only way to become proficient in a programming language is to write working programs in it. There are dozens (hundreds?) of C++ video tutorials available elsewhere on the Web, and if these are more suited to your learning style, then great! However, I believe that written information is more likely to be taken in than spoken, both at first reading and in review. The course materials (this Tutorial) have been made freely available and are intended to be improved over time. Spelling, technical and other errors will be present, and I welcome feedback and suggestions for improvement; please leave a comment or drop me an email, thanks! +This online self-study course has been put together as a personal project designed to be accessible and comprehensible to anyone with no previous experience of programming in compiled languages; it is intended that all material is presented with suitable context and explanation. The format is a conversational-tone introduction to a topic followed by an example program, many of which can be compiled and run; all source code is [available on GitHub](https://github.com/cpp-tutor/learnmoderncpp-tutorial/)[^2] and the whole tutorial is available as a set of [Jupyter notebooks](https://github.com/cpp-tutor/learnmoderncpp-tutorial/blob/main/jupyter-notebooks/)[^3]. Then a list of points-of-interest follows the code, often followed by a number of possible modifications that you (the reader) are invited to make in order to cement your understanding. 
Each Chapter introduces several related topics, and the suggested [coding assignments](http://learnmoderncpp.com/coding-assignments/) have been created to increase your experience with the material presented. + +Most people "learn by doing" (kinesthetic learning), and for decades it has been recognized that the only way to become proficient in a programming language is to write small- to medium-sized (working) programs in it. There are dozens (hundreds?) of C++ video tutorials available elsewhere on the Web, and if these are more suited to your learning style, then great! In my experience, written information is more likely to be taken in than spoken, both at first reading and in review. The course materials (this Tutorial) have been made freely available and are intended to be improved over time. Spelling, technical and other errors may still be present, and I welcome feedback and suggestions for improvement; please leave a comment or drop me an email, thanks! ## Which compiler should I choose when learning C++? -The best-known C++ compilers are Microsoft Visual Studio C++[^3] (MSVC, cl), the FSF GNU Compiler Collection[^4] (GCC, g++) and Apple's Clang/LLVM[^5] (clang++). All of these aggressively follow the progress of the ISO Standard and are under (ongoing) heavy development. If you are programming under Windows, you may want to seriously consider Microsoft's Visual Studio, which comes with an IDE and C++ compiler (the latter as an optional component, this needs to be selected in the installer) and has cost-free variants for students and individuals; it is also possible to use Clang (via the installer) as a drop-in replacement within Visual Studio (clang-cl). If you're a Mac user, you'll probably want to install Xcode, which includes Clang/LLVM. Linux users probably have both GCC and Clang packaged for their distribution, and Clang's libc++ may be available too. Also, Stephan T. 
Lavavej (primary architect of MSVC) makes MinGW (g++ for Windows) available from his [personal website](https://nuwen.net/mingw.html)[^6] (note: this does not require Visual Studio to be installed, but can coexist on the same machine). +The best-known C++ compilers are Microsoft Visual Studio C++[^4] (MSVC, cl), the FSF GNU Compiler Collection[^5] (GCC, g++) and Apple's Clang/LLVM[^6] (clang++). All of these aggressively follow the progress of the ISO Standard and are under (ongoing) heavy development. If you are programming under Windows, you may want to seriously consider Microsoft's Visual Studio, which comes with an IDE and C++ compiler (the latter as an optional component, this needs to be selected in the installer) and has cost-free variants for students and individuals. It is also possible to use Clang (via the installer) as a drop-in replacement within Visual Studio (clang-cl). If you're a Mac user, you'll probably want to install Xcode, which includes Clang/LLVM. Linux users probably have both GCC and Clang packaged for their distribution, and Clang's libc++ should be available too. Also, Stephan T. Lavavej (primary architect of MSVC) makes MinGW (g++ for Windows) available from his [personal website](https://nuwen.net/mingw.html)[^7]. (Note: this does not require Visual Studio to be installed, and both can coexist on the same machine.) -Any of these compiler/OS combinations are suitable for new C++ coders, with Clang giving possibly the most user-friendly error messages, which could be an important factor. In fact to start learning C++, you don't even need to install a compiler on your system if you can't, or don't want to; head to the [Compiler Explorer](https://godbolt.org)[^7] which provides just about every available C++ compiler (and versions including trunk development), and can even execute your program code (although interactive programs are not supported). 
+Any of these compiler/OS combinations are suitable for new C++ coders, with Clang giving possibly the most user-friendly error messages, which can be an important consideration. In fact, to start learning C++, you don't even need to install a compiler on your system; if you can't, or don't want to, instead head over to the [Compiler Explorer](https://godbolt.org)[^8]. This site provides just about every available C++ compiler (including trunk and experimental development branches), and can even execute your program code (while interactive input sessions are not supported, the running program's input, or stdin, is able to be set in advance). Learning any (programming) language is hard, and of course you will make many errors when starting out. Don't forget, the error messages your compiler gives you are designed to help you write correct code. In a coding/debugging session it can often feel as if you are "fighting the compiler", which can at times be frustrating. Try to think of it this way: if coding is musicianship, the compiler is your instrument. Treat it carefully, learn its nuances, and listen to it with the experience that comes from practice. ## Compiling your first program -Plain text C++ *source code* files must be translated into a binary form understood by the hardware in order to be *executed*, or run. This translation process is called *compilation*, and has to be repeated every time the source code is modified. It is assumed you have a C++ compiler installed and working; for the purposes of testing the code examples in this Tutorial I used Microsoft Visual C++ 16.7.6 (later versions should work too). Under Windows it is possible to use the command line and the installer should have created Start Menu entries called "x64 Native Tools Command Prompt for VS 2022", or similar, so simply select this and a Command Window (shell) should open showing a message such as `[vcvarsall.bat] Environment initialized for: 'x64'`. 
Under Linux and MacOS simply open a new shell and navigate with `cd` to wherever the source file is located.
+Plain text C++ *source code* files must first be translated into a binary form understood by the hardware in order to be *executed*, or run. This translation process is called *compilation*, and has to be repeated every time the source code is modified. It is assumed you have a C++ compiler installed and working; for the purposes of testing the code examples in this Tutorial I have used Microsoft Visual C++ 17.8.4 (later versions should work too). The interactive Jupyter notebooks require a C++ kernel (code execution manager) to be available in order for Ctrl-Enter in a "code cell" to run the code and output the result immediately underneath. Kernel `cpp23` from Python's "pip" package `jupyter-cpp-kernel` was used to test all of the code samples which were complete programs (starting with `#include <...>`).
+
+Under Windows it is possible to use the command line, and the installer should have created Start Menu entries called "x64 Native Tools Command Prompt for VS 2022", or similar, so simply select this and a Command Window (shell) should open showing a message such as `[vcvarsall.bat] Environment initialized for: 'x64'`. Under Linux and MacOS simply open a new shell and navigate with `cd` to wherever the source file is located; system compilers will be run from `/usr/bin`, while other variants may need the `PATH` environment variable to be set.
 The simple programs we will introduce and create will consist of just one source file each, as in the first example program `01-hellow.cpp` available from GitHub[^2] and also listed in full in the first Chapter of this Tutorial.
To compile this program use one of the following: * **MSVC**: `cl /EHsc /std:c++latest 01-hellow.cpp` -* **GCC**: `g++ -o 01-hellow -std=c++20 01-hellow.cpp` -* **Clang**: `clang++ -o 01-hellow -std=c++20 -stdlib=libc++ 01-hellow.cpp` +* **GCC**: `g++ -o 01-hellow -std=c++23 01-hellow.cpp` +* **Clang**: `clang++ -o 01-hellow -std=c++23 -stdlib=libc++ 01-hellow.cpp` -Note: Different options are used with the different compilers and when using `import` instead of `#include`. +Note: Different options are used with the different compilers, and when using `import` (modules) instead of `#include` (headers). With Clang, `-stdlib=libc++` should be both available, and optional. The graphic below outlines the steps for compiling the modules version of this program from scratch with MSVC, including [creating the Standard C++ Modules file](https://learn.microsoft.com/en-us/cpp/cpp/tutorial-import-stl-named-module?view=msvc-170). Only MSVC (under Windows) and Clang (MacOS or Linux) fully support `import std;` at the present time. -Successful compilation produces an *executable binary* called `01-hellow.exe` (under Windows) or `01-hellow` (MacOS or Linux). This can be run by typing `01-hellow.exe` (or just `01-hellow`) into a Windows console (see graphic below), or by typing `./01-hellow` into a MacOS or Linux Terminal (both assuming that the executable is located in the current directory). I don't recommend running the program by double-clicking it in an Explorer or File Manager window as any output may be lost as the program exits, so your program may not actually appear to do anything! +![Output from compiling and running the program under Windows 10](https://learnmoderncpp.files.wordpress.com/2024/01/compile-module.png) -![Output from running the above program under Windows 10](https://learnmoderncpp.files.wordpress.com/2023/02/compile-console.png) +Successful compilation produces an *executable binary* called `01-hellow.exe` (under Windows) or `01-hellow` (MacOS or Linux). 
This can be run by typing `01-hellow.exe` (or just `01-hellow`) into a Windows console (see graphic above), or by typing `./01-hellow` into a MacOS or Linux Terminal (both assuming that the executable is located in the current directory). Running the program by double-clicking it in an Explorer or File Manager window is not recommended as any output may be lost as the window closes at program exit, so your program may not actually appear to do anything! [^1]: https://stroustrup.com [^2]: https://github.com/cpp-tutor/learnmoderncpp-tutorial/ -[^3]: https://visualstudio.microsoft.com -[^4]: https://gcc.gnu.org -[^5]: https://clang.llvm.org -[^6]: https://nuwen.net/mingw.html -[^7]: https://godbolt.org - -*All text and program code ©2019-2022 Richard Spencer, all rights reserved.* +[^3]: https://github.com/cpp-tutor/learnmoderncpp-tutorial/blob/main/jupyter-notebooks/ +[^4]: https://visualstudio.microsoft.com +[^5]: https://gcc.gnu.org +[^6]: https://clang.llvm.org +[^7]: https://nuwen.net/mingw.html +[^8]: https://godbolt.org + +*All text and program code ©2019-2025 Richard Spencer, all rights reserved.* \ No newline at end of file diff --git a/01-string-and-character-literals.md b/01-string-and-character-literals.md index 04282df..e7cbf21 100644 --- a/01-string-and-character-literals.md +++ b/01-string-and-character-literals.md @@ -1,58 +1,57 @@ -# String and Character Literals +# String and Character Literals ## Introducing a Modern C++ program -Convention dictates that a first program should output the programmer's cry of "Hello, World!" to the screen, and do no more (or less). This is useful in order to test that the compilation environment is fully functional in terms of executable paths, header files, link libraries etc. The C++ version of this program is shown below: +Convention dictates that a first program should output the programmer's timeless cry of "Hello, World!" to the screen, and do no more (or less). 
This is often useful in order to test that the compilation environment is fully functional in terms of executable paths, header files, link libraries etc. The most up-to-date C++ version of this program is shown below:
 ```cpp
 // 01-hellow.cpp : prints a line of text to the console
-#include <iostream>
+#include <print>
 using namespace std;
 int main() {
-    cout << "Hello, World!" << '\n';
+    println("Hello, World!");
 }
 ```
+If you prefer not to cut-and-paste, this source file is included in the zip archive linked from this site.[^1] If you are reading this as a Jupyter notebook, clicking within the code cell and then either pressing the "play" button on the menu bar, or typing Ctrl-Enter, should compile and run the program, showing any resulting output immediately below. (Notebooks previewed on GitHub are not functional in this way, however all of the content is displayed.)
-If you prefer not to cut-and-paste, this source file is included in the zip archive linked from this site.[^1] Having seen what this program does (admittedly not that much), let's explore how it is put togther:
+Having seen what this program does (admittedly not that much), let's explore how it is put together:
-* The first line is a comment; syntax of comments are discussed in more detail later in this Chapter. I've chosen to repeat the filename of the source code file in the comment, and also to summarize the purpose of the program. This summary is intended to be useful to anybody who later reads the code, including possibly the original author!
+* The first line is a comment; the syntax of comments is discussed in more detail later in this Chapter. I've chosen to repeat the filename of the source code file in the comment, and also to summarize the purpose of the program. This summary is intended to be useful to anybody who later reads the code, possibly including the original author!
-* Then comes some *boilerplate* code, that is common code that we'll see again in future programs we write.
The *include directive* is a command interpreted by the *pre-processor* which pastes the entire contents of the relevant *header file* (and any other files it `#include`s) at that point. These directives are being phased out of Modern C++ in favor of the `import` keyword (which has the potential to speed up compilation times significantly), but it is likely the transition will take years to complete. Both Clang/LLVM and MSVC implement `import` although currently extra command-line switches are needed. +* Then comes some *boilerplate* code, which is common code that we'll see again in future programs we write. The *include directive* is a command interpreted by the *pre-processor* which pastes the entire contents of the relevant *header file* (and any other files it `#include`s) at that point into the *compilation unit*. These directives are being phased out of Modern C++ in favor of the `import` keyword (which has the potential to speed up compilation times significantly), but it is likely the transition will take years to complete. Both Clang/LLVM and MSVC implement `import` although extra command-line switches are needed currently. -* The next line `using namespace std;` is another directive which makes available all of the elements of the Standard Namespace (abbreviated as `std`) available to the global scope (that is, the scope in which the directive appears). Many experienced programmers would consider this *namespace pollution* bad form, preferring instead to use the *fully qualified names* of the individual components, however I have chosen to use it in all of the the example programs we will see in this Tutorial. The name "Standard" comes from the definition of the C++ Library's classes, functions and other facilities as defined by the ISO Standardization Committee. Programs can use any part of the Standard Library and be expected to compile on any compiler/platform combination without modification. 
+* The next line `using namespace std;` is another directive which makes all of the elements of the Standard Namespace (abbreviated as `std`) available to the scope in which the directive appears (in this case global scope). Many experienced programmers would consider this *namespace pollution* bad form, preferring instead to use the *fully qualified names* of the individual components. I have chosen to use it in all of the example programs we will see in this Tutorial as it facilitates better readability of (and familiarity with) the component names. The name "Standard" comes from the definition of the C++ Library's classes, functions and other facilities as defined by the ISO Standardization Committee. Programs can use any part of the Standard Library and be expected to compile on any compiler/platform combination without modification.
-* Next we have a function definition, which is for the `main()` function; here the *parentheses*, or brackets, indicate an unused (or empty) *parameter list*. Every executable C++ program has to have exactly one `main()` in order for it to be able to be linked into an executable binary, and this is where execution begins when the program is run. (This is almost true since global *objects*, if any, will have their constructor functions called before `main()` is entered.)
+* Next we have a function definition, which is for the `main()` function; here the *parentheses*, or brackets, indicate an unused (or empty) *parameter list*. Every executable C++ program has to have exactly one `main()` in order for it to be able to be linked into an executable binary, and this is where execution begins when the program is run. (This is almost true: you should be aware that global *objects*, if any, will have their constructors called before `main()` is entered.)
-* The `int` is the *return type* of `main()`, although specifying `return` within the function body is optional; failing to specify it causes a value of zero (indicating successful completion) to be returned to the calling environment or process as the *system return code*. +* The `int` specifies the correct *return type* of `main()`, although unique to `main()` specifying `return` within the function body is in fact optional; failing to specify it causes a value of zero (indicating successful completion) to be returned to the calling environment or process as the *system return code*. -* Curly braces `{` and `}` are used to delimit a *block* of code, in this case the *body* of the `main()` function. The convention of putting the opening brace `{` at the end of the line instead of on a line by itself follows the "One True Brace Style" (or *1TBS* for short) popularized for the C programming language. I use it both here and in future example programs because it saves on vertical space, and works better with code-folding modes found in many text editors. Some people feel very strongly about whitespace and formatting conventions in their code; your organization will almost certainly have its own coding standards (which you will have to follow even if you don't agree with them!) I highly recommend the [Clang-Tidy](https://clang.llvm.org/extra/clang-tidy/)[^2] utility, which exists as a plugin to many IDEs and can be used to automatically reformat source code to a pre-defined set of rules. +* Curly braces `{` and `}` are used to delimit a *block* of code, in this case the *body* of the `main()` function. The convention of putting the opening brace `{` at the end of the line instead of on a line by itself follows the "One True Brace Style" (or *1TBS* for short) popularized for the C programming language. I use it both here and in future example programs because it saves on vertical space, and works better with code-folding modes found in many text editors. 
Some people feel very strongly about whitespace and formatting conventions in their code; your organization will almost certainly have its own coding standards (which you will have to follow even if you don't agree with them!) I highly recommend the [Clang-Format](https://clang.llvm.org/docs/ClangFormat.html)[^2] utility, which exists as a plugin to many IDEs and can be used to automatically reformat source code to a pre-defined set of rules. -* The next line, within the body of `main()`, uses the `cout` object from the Standard Library (the name is an abbreviation of "Character Output", and is analogous to C's `stdout`). This is followed by the *stream insertion operator* `<<` and a *string literal* followed by a *character literal* (with an additional stream insertion operator between them). A string literal is delimited by a matching pair of double quotes (`"`) on the same line, and can contain any number of printable and escaped non-printable characters. A character literal is delimited by a matching pair of single quotes (`'`) and contains exactly one printable or escaped non-printable character. To be used **within** any literal, both of these types of quotes need to be *escaped* by preceding them with a backslash (`\`). Certain other codes have to be *escape sequences* as well, with `\n` representing new-line; for a complete list see the table later in this Chapter. +* The only part of this program which appears to perform an action is within the body of `main()`. It is a call to the C++ library function `println()` (new to C++23, previously *stream objects* would have been used) which outputs its *format string* plus parameters (if any) followed by a new-line sequence. (The `print()` function works identically but omits the trailing new-line.) Output is sent directly to `stdout` (the C-Library's Standard Output) which implies that C++ streams are not used at all. -So, you wanted to learn about C++'s *Object-Oriented Programming* (OOP) capabilities? 
Well, we've already used `cout`, which is a C++ *object*. Maybe you've heard about *Operator Overloading* (OO) too? We've used that too; the `<<` operator is *overloaded* to perform stream output. As we have seen even in this simple program, entities can be *chained* so `cout` can handle more than one entity to be output at once, which is implemented just as in composed arithmetic operations. Maybe you've heard about C++ supporting *generics* through the `template` keyword? You'll be pleased to learn that the Library functions selected to correctly handle output of, firstly, a pointer to a string literal, and secondly, a character literal value, are selected at compile-time (as opposed to run-time) for maximum performance and reliability. Hopefully, it soon becomes apparent that complex capabilities can lead to easily comprehensible client code compared with leaner ("simpler") programming languages, such as C. +* A *string literal*, used to write the format string within code, is delimited by a matching pair of double quotes (`"`) on the same line, and can contain any number of printable and escaped, non-printable characters. It would usually be stored verbatim in a read-only data segment of the final executable. A *character literal* however is delimited by a matching pair of single quotes (`'`) and contains exactly one printable or escaped, non-printable character. To be used **within** any literal, both of these types of quotes need to be *escaped* by preceding them with a backslash (`\`). Certain other codes have to be *escape sequences* as well, with `\n` representing new-line; for a complete list see the table later in this Chapter. -**Experiment:** - -* Adapt the above program (perhaps calling the modified version `01-hellow2.cpp`) to print the new-line character from within the string literal, removing the character literal and the now unneeded second stream insertion operator from the code. Is the output identical? 
+Maybe you've heard about C++ supporting *generics* through the `template` keyword? Even this simple program only works due to the use of *template instantiation* (`std::print()` is in fact a *generic function*), which is in simple terms creating code to be compiled from the provided parameter(s). Hopefully, it soon becomes apparent that support for such capabilities can lead to easily comprehensible client code compared with leaner ("simpler") programming languages, such as C. -* Now go back and change `\n` to `endl`. Does the program produce the same output? Is there still a need for a semi-colon at the end of the line? +**Experiment:** -* Now modify the program to print `Hello` using only *character* literals. Can you get this to compile first time, without making a typo? (Hint: syntax highlighting will help you a lot.) +* Adapt the above program (perhaps calling the modified version `01-hellow2.cpp`) to print the new-line character from within the string literal, using the `print()` function instead. Is the output identical? * Move the using-directive in the original program to within `main()`, and make sure the program still compiles. Does its position within `main()` matter? -* Now use a using-statement `using std::cout;` *instead* of `using namespace std;`. Are there any other changes you need to make to the code? - -* Change all occurrences of `cout` to `cerr` and rerun the program. Can you find any difference in how the program performs? (Hint: `cerr` is the error logging stream and is unbuffered, unlike `cout`; try using shell redirection commands eg. `01-hellow > output.txt` under Windows, `./01-hellow > output.txt` under Linux/MacOS.) +* Now use a using-statement `using std::println;` *instead* of `using namespace std;`. Are there any other changes you need to make to the code? -* Finally, go back to the version using `cout` and try omitting any `using` statement at all, and prefix `cout` with `std::`. 
Check this code compiles, and then consider whether you prefer this use of *fully qualified* Standard Library entities. Personally I feel that for new C++ programmers, fully qualified names in code look too similar to each other, making it harder to learn to recognize the individual names as different. +* Finally, go back to the version using `println()` and try omitting any `using` statement at all, and prefix the function call with `std::`. Check this code compiles, and then consider whether you prefer this use of *fully qualified* Standard Library entities. Personally, I feel that for new C++ programmers, fully qualified names in code look too similar to each other, making it harder to learn to recognize the individual names. However, you should be aware that having `using namespace std;` in your code does make you look like a beginner to more experienced C++ coders. ## Special characters -Some characters cannot be easily entered into string or character literals within code, this may be because they are ASCII *control* characters (also known as non-printable, in the range 0-31) or *top-bit-set* characters (in the range 128-255) not available on your keyboard, or because they have special meaning. Some of the more common control and other special characters have single-letter short forms; we've already encountered `\n` for new-line; the others are listed in the table below. Note: a backslash followed by an *octal* (base 8) number up to three digits (between `\0` or `\377`) can be used for any character in the range 0-255 (decimal), as can a backslash followed by `x` and one or two *hexadecimal* (base 16) digits (such as `\xa` or `\xA3`) up to `\xFF`. 
+Some characters cannot be easily entered into string or character literals within code, this may be because they are ASCII *control* characters (also known as non-printable, being in the range 0-31) or *top-bit-set* characters (in the range 128-255) not available on your keyboard, or because they have special meaning (such as Delete). Some of the more common control and other "special" characters have single-letter short forms; we've already encountered `\n` for new-line; the others are listed in the table below. + +Note: a backslash followed by an *octal* (base 8) number up to three digits (between `\0` or `\377`) can be used for any character in the range 0-255 (decimal), as can a backslash followed by `x` and one or two *hexadecimal* (base 16) digits (such as `\xa` or `\xA3`) from `\x00` up to `\xff`. | Escape sequence | Description | |:---------------:|:------------------:| @@ -72,61 +71,61 @@ Some characters cannot be easily entered into string or character literals withi | \\uhhhh | Unicode sequence (0-ffff) * | | \\Uhhhhhhhh | Unicode sequence (0-10ffff) * | -* Note: not all Unicode sequences are allowed in 8-bit string or character literals, however these escape sequences are more useful with *Unicode string literals*, explained later in this Chapter. +* Note: not all Unicode sequences are allowed in 8-bit string or character literals, however these escape sequences become more useful with *Unicode string literals*, explained later in this Chapter. -Any escape sequence can be used within single quotes to represent exactly one character literal (including zero `'\0'`). Zero has a special meaning, as it is the string literal termination character. C++ inherits its string literals from C, and C-strings (as they are sometimes known, and as referred to in this Tutorial) were a bit of an afterthought to the C language of the 1970s. 
String literals can be thought of as a **read-only** array of characters with an automatically added zero terminator; the space needed to store the string literal `"Hello"` is six bytes, and not five as might be assumed. +Any escape sequence can be used within single quotes to represent exactly one character literal (including zero `'\0'`). Zero has a special meaning, as it is the string literal termination character. C++ inherits its string literals from C, and C-strings (as they are sometimes known, and as referred to in this Tutorial) were added as a bit of an afterthought to the C language of the early 1970s. String literals can be thought of as a **read-only** array of characters with an automatically added zero terminator; the space needed to store the string literal `"Hello"` is therefore six bytes, and not five as might be assumed. -When outputting a string literal via `cout`, the zero byte, or *null terminator*, is not outputted, but must be present to stop further raw memory being seen to be part of the string. Other terms you may encounter for string literal are *NTMBS* (null-terminated multi-byte string), `zstring` (a common *typedef* to implement the type of zero-terminated string), and `czstring` (`const`ant zero-terminated string, which is immutable as it is stored read-only). +When outputting a string literal via `print()` or `println()`, the zero byte, or *null terminator*, is not outputted, but must be present to stop further raw memory being seen to be part of the string. Other terms you may encounter for string literal are *NTMBS* (null-terminated multi-byte string), `zstring` (a common *typedef* to implement the type of zero-terminated string), and `czstring` (`const`ant zero-terminated string). 
-It is an oversimplification to say that any valid character can fit into a character literal; a character literal is simply a 8-bit type (possibly signed or unsigned) called `char` (pronounced "car", at least in the US) and this can thus hold one of 256 possible values. Historically, the first 128 characters of *ASCII* (American Standard Code for Information Interchange) were the same on any platform (this was also known as 7-bit ASCII, or more recently UTF-7), while the second 128 ("top-bit-set") characters could change according to the specification of the chosen *code page* (also known as *extended ASCII*). With the advent and near-universal adoption of UTF-8 (Unicode encoded in an eight-bit octet stream), all top-bit-set characters begin a two-, three- or four-character sequence, which also all have their top bits set. +It is an oversimplification to say that any valid character can fit into a character literal; a character literal is simply an 8-bit type (possibly signed or unsigned) called `char` (pronounced "car", at least in the US) which can hold one of 256 possible values. Historically, the first 128 characters of *ASCII* (American Standard Code for Information Interchange) were the same on any platform (this was also known as 7-bit ASCII, and more recently as UTF-7), while the second 128 ("top-bit-set") characters could change according to the specification of the chosen *code page* (also known as *extended ASCII*). With the advent and near-universal adoption of UTF-8 (Unicode encoded into an eight-bit octet stream), all *top-bit-set* characters begin a two-, three- or four-character sequence, all having their top bits set. -The good news is that despite the complexity of implementation of UTF-8, if your editor is set to edit text in UTF-8 (optionally with an identifying *magic number* BOM at the start) and your shell uses a UTF-8 *locale*, then your programs should output code to the console exactly as you type it into your editor.
To repeat: string literals containing raw UTF-8 sequences entered into string literals within the code should display correctly in the console when the program is run. (On Windows it may be necessary to enter `chcp 65001` at the shell prompt once for every shell session before running your program; this will change the code page to UTF-8, instead of the most likely default Windows-1252, which is an eight-bit encoding. Alternatively, you may wish to use UTF-16 in your editor and *wide character* literals and streams, see later in this Chapter.) +The good news is that despite the complexity of implementation of UTF-8, if your editor is set to edit text in UTF-8 (optionally with an identifying *magic number* BOM at the start) and your shell uses a UTF-8 *locale*, then your programs should output code to the console exactly as you type it into your editor. To repeat: string literals containing raw UTF-8 sequences entered into string literals within the code should display correctly in the console when the program is run. (On Windows it may be necessary to enter `chcp 65001` at the shell prompt once for every shell session before running your program. This changes the active code page to UTF-8 instead of the most likely default Windows-1252, which is a simple eight-bit encoding. Alternatively, you may wish to use UTF-16 in your editor and *wide character* literals and streams, see later in this Chapter.) **Experiment:** * Modify `01-hellow.cpp` to output each word on a new line indented by one tab-stop, using only one string literal. -* Modify the part reading `Hello,` to `Hello\0`, and run the program. Are you surprised by this change? +* Modify the sub-string reading `Hello,` to `Hello\0`, and run the program. Are you surprised by this change? -* Now go back to the original `01-hellow.cpp` and try outputting the character literal `\0` at the end instead of `\n`. What do you discover? 
+* Now go back to the `print()`-using version and try outputting the character literal `\0` at the end instead of `\n`. What do you discover? -* Now try to create a program that can output: `$(USD) £(GBP) €(EUR)` Hint: The Dollar symbol should be on your keyboard, and the Pound and Euro symbols may well be too, but if not use a character picker such as Character Map and a UTF-8 encoding in your editor (and on the console, remember `chcp 65001` for Windows). +* Now try to create a program that can output: `$(USD) £(GBP) €(EUR)` Hint: The Dollar symbol should be on your keyboard, and the Pound and Euro symbols may well be too, but if not use a character picker such as Character Map and a UTF-8 encoding in your editor (and in the console when running your program, remember `chcp 65001` for Windows). -* Use Character Map (or similar) to enter a pi symbol into your text editor, and make the program output: `π has the value 3.14159...` +* Use Character Map (or similar) to enter a *pi* symbol into your text editor, and make this program output: `π has the value 3.14159...` ## Raw string literals -String literals are interpreted at compile time and any escape sequences are translated then. The resultant *raw string* is then stored in read-only memory, and the running program uses a pointer to the first character. This pointer is in fact a variable (as opposed being a constant) however the string data itself is constant and attempting to change it (for example through subscript assignment) is a compile time error. With these facts in mind, try to predict the output from changing the `cout` operation in `01-hellow.cpp` to: `cout << "Hello, World!\n" + 7;` Surprised? (Some compilers warn about shifting a pointer with "pointer arithmetic" in this way.) +String literals are interpreted at compile time and any escape sequences are translated at this point. 
The resultant *raw string* is then stored in read-only memory, and the running program uses a pointer to the first character. This pointer is in fact a variable (as opposed being a constant) however the string data itself is constant and attempting to change it (for example through subscript assignment) is a compile time error. With these facts in mind, try to predict the output from changing the string literal parameter in `01-hellow.cpp` to: `"Hello, World!\n" + 7;` Surprised? (Some compilers warn about shifting a pointer with "pointer arithmetic" in this way.) Now consider the usefulness of being able to insert whitespace (particularly tabs and newlines) and unescaped backslashes (particularly in regular expressions, or "regexes") into string literals *without* the need for escape characters. Such an entity is called a *raw string literal*, and takes the format: `R"(a \raw\ string literal)"` -The start of a raw string literal is a capital letter "R" followed by a double quote and opening regular parenthesis, none of which form part of the stored or output string. A raw string literal is ended with a closing regular parenthesis and double quote. In the (unlikely) event that a raw string literal is required to *contain* a closing parenthesis followed by a double quote, this can be achieved by putting a unique sequence (often a word, or one or more asterisks) betweeen the double quote and parenthesis *at both ends*, for example: `R"*(can contain )" here)*"` +The start of a raw string literal is a capital letter "R" followed by a double quote and opening regular parenthesis, none of which form part of the stored or output string. A raw string literal is ended with a closing regular parenthesis and double quote. 
In the (unlikely) event that a raw string literal is required to *contain* a closing parenthesis followed by a double quote, this can be achieved by putting a unique sequence (often a word, or one or more asterisks) between the double quote and parenthesis *at both ends*, for example: `R"*(can contain )" here)*"` *Pointer arithmetic* combined with raw string literals can serve a useful purpose, as shown in this next example program `01-title.cpp`: ```cpp // 01-title.cpp : output the title page of a well-known book -#include <iostream> +#include <print> using namespace std; int main() { - cout << 1+R"( + print(1+R"( Alice's Adventures In Wonderland by LEWIS CARROLL -)"; +)"); } ``` -Compile and run this program following the same process as before. Notice that the `1+R"(` *idiom* omits a blank line before the output, thus the first line output is the correct number of spaces followed by `Alice's`. Using a raw string literal means we don't have to litter the output string with escape characters for new lines, and can begin the output **unindented** as the `1+R"(` skips the first character, which is (intentionally) a new line in the source file. The raw string literal is in this case (again intentionally) terminated at the start of a blank line, separate from the indentation of `cout` within `main()`; this is preferable to including "invisible" trailing whitespace in the output string. +Compile and run this program following the same process as before. Notice that the `1+R"(` *idiom* omits a blank line before the output, thus the first line output is the correct number of spaces followed by `Alice's`. Using a raw string literal means we don't have to litter the output string with escape characters for new lines, and can begin the output **unindented** as the `1+R"(` skips the first character, which is (intentionally) a new line in the source file.
The raw string literal is in this case (again intentionally) terminated at the start of a blank line, separate from the indentation of `print(` within `main()`; this is preferable to including "invisible" trailing whitespace in the output string, as would be the case if the `)"` were itself indented. **Experiment:** -* Change the program above to output the first stanza from the rhyme at the beginning of the same book (shown below), indenting **even-numbered** lines by eight spaces. +* Change the program above to output the first stanza from the rhyme at the beginning of the same book (shown below), indenting all **even-numbered** lines by eight spaces. Is there more than one way of achieving this? ``` All in the golden afternoon @@ -137,27 +136,29 @@ While little hands make vain pretence Our wanderings to guide. ``` -* Now use a (non-raw) string literal for each line and a single call to `cout` with suitable escape characters. What happens if you remove all of the stream insertion operators *except* for the first? (Explanation: adjacent string literal concatenation is automatically performed by the pre-processor.) +* Now use a (non-raw) string literal for each line and a single call to `print()` with suitable escape characters. Note: it is possible to *concatenate* the string literals without any operator: concatenation of adjacent string literals is automatically performed by the pre-processor. * Modify `01-title.cpp` to output the title of your favorite book or film centered on the console window (assume an 80 character fixed width, and change the size of the console window if different). ## Wide characters -Although very popular, and supported in most modern programming languages, UTF-8 encoded string literals are not the only way to manipulate and display characters outside the range of seven or eight-bit ASCII. 
We've discussed `char` as being the *underlying type* of string and character literals in C++, and there is also the `wchar_t` (possibly pronounced "dub-car-tee") type associated with *wide character* stream objects and strings (the names of which also start with "w") and these predate Unicode support in the C++ Standard Library. Wide character support is platform-specific, in particular the size of `wchar_t` in bits is not standardized; on many systems it is 32 bits but on Microsoft Windows it is 16 bits (and encodes Unicode UTF-16). If you think you need to use wide character support, and want to find out if it is suitable for your needs, consult your platform's documentation. +Although very popular, and supported in most modern programming languages, UTF-8 encoded string literals are not the only way to manipulate and display characters outside the range of seven or eight-bit ASCII. We've discussed `char` as being the *underlying type* of string and character literals in C++, and there is also the `wchar_t` (possibly pronounced "dub-car-tee") type associated with *wide character* stream objects and strings (the names of which also start with "w"), and these predate Unicode support in the C++ Standard Library. + +Wide character support is platform-specific, and in particular the size of `wchar_t` in bits is not standardized; on many systems it is 32 bits but on Microsoft Windows it is 16 bits (and encodes Unicode UTF-16). If you think you need to use wide character support, and want to find out if it is suitable for your needs, consult your platform's documentation. It is important to note that while your editor/IDE may support wide-character/UTF-16/32 encodings, `print()` only works with eight-bit data (either ASCII 8-bit or UTF-8). For stream output of wide-character data, `wcout` can be used, but conversion between encodings using the Standard Library `codecvt` is deprecated.
This may lead to differing I/O schemes being necessary if the software targets Windows. -As well as the eight-bit type `char` there is now also `char8_t` which is useful for explicitly specifying that a string is UTF-8, and usefully can encode all UTF-8 code points when using `\u` and `\U`, unlike plain `char`. It also removes the uncertainty of whether `char` is signed or unsigned, which can cause programs to work differently on different platforms in some cases. Also available are `char16_t` and `char32_t` designed to be the correct size for holding a single UTF-16 or UTF-32 Unicode code point, respectively. Whilst these types are built into the language, converting strings between these types is a complex task and requires use of either the Standard Library, or third-party libraries (such as ICU[^3]), further discussion of which is beyond the scope of this Tutorial. +As well as the eight-bit type `char` there is now also `char8_t` which is useful for explicitly specifying that a string is UTF-8, and can encode all UTF-8 code points when using `\u` and `\U`. (Note: even plain `char` can include UTF-8 code points under most modern compilers, including those above U+00FF.) Specifying `char8_t` removes the uncertainty of whether `char` is signed or unsigned, which can cause programs to work differently on different platforms in some cases. Also available are `char16_t` and `char32_t` designed to be the correct size for holding a single UTF-16 or UTF-32 Unicode code point, respectively. Whilst these types are built into the language, converting strings between these types is a complex task and requires use of either the Standard Library, or third-party libraries (such as ICU[^3]), further discussion of which is beyond the scope of this Tutorial.
The following table lists C++ types, sizes, target encodings, literals and objects used with normal and wide character sets: -| Type | Bits | Encoding | String Literal | Character Literal | Raw String Literal | String Type | Stream Output | -|:--------:|:-----:|:--------:|:--------------:|:-----------------:|:------------------:|:-----------:|:-------------:| -| char | 8 | ASCII | "abcd" | 'a' | R"(abcd)" | string | cout | -| char8_t | 8 | UTF-8 | u8"abcd" | u8'a' | u8R"(abcd)" | u8string | cout * | -| char16_t | 16 | UTF-16 | u"abcd" | u'a' | uR"(abcd)" | u16string | n/a | -| char32_t | 32 | UTF-32 | U"abcd" | U'a' | UR"(abcd)" | u32string | n/a | -| wchar_t | 16/32 | n/a + | L"abcd" | L'a' | LR"(abcd)" | wstring | wcout | +| Type | Bits | Encoding | String Literal | Character Literal | Raw String Literal | String Type | Stream Output | print() | +|:--------:|:-----:|:--------:|:--------------:|:-----------------:|:------------------:|:-----------:|:-------------:|:-------:| +| char | 8 | ASCII | "abcd" | 'a' | R"(abcd)" | string | cout/cerr | yes | +| char8_t | 8 | UTF-8 | u8"abcd" | u8'a' | u8R"(abcd)" | u8string | cout/cerr * | yes | +| char16_t | 16 | UTF-16 | u"abcd" | u'a' | uR"(abcd)" | u16string | n/a | no | +| char32_t | 32 | UTF-32 | U"abcd" | U'a' | UR"(abcd)" | u32string | n/a | no | +| wchar_t | 16/32 | n/a + | L"abcd" | L'a' | LR"(abcd)" | wstring | wcout/wcerr | no | -* An explicit cast to type `char` in `operator<<` may be required. +* An explicit cast to type `char` in `operator<<` may be required when using `cout`/`cerr`, for example: `cout << reinterpret_cast<const char *>(u8"Hello \u20AC!\n");`. + The `wchar_t` encoding and streams under Windows are 16-bit and support UTF-16. @@ -173,26 +174,24 @@ C++ has two types of comments: single line comments which begin anywhere on a li is also ignored.
*/ -int main( /* this is parsed as empty parentheses */ ) {} +int main( /* this appears as empty parentheses to the compiler */ ) {} ``` -Modern C++ code favors the `//` style, with multiple lines of comments possible by started each one with `//`. Temporarily *commenting out* a block of code, thus preventing it from being compiled, can be achieved by putting `/*` at the beginning and `*/` at the end. Nesting multi-line comments is not possible as the comment always ends at the first `*/` reached; single line comments within a multi-line block are fine, however. +Modern C++ code favors the `//` style, with multiple lines of comments possible by starting each one with `//`. Temporarily *commenting-out* a whole block of code, thus preventing it from being compiled, can be achieved by putting `/*` before the beginning and `*/` after the end of the block. Nesting multi-line comments is not possible as the comment always ends at the first `*/` reached; single line comments within a multi-line block are possible, however. -Comments are like strings in that they do not contain program code, instead they are written in natural language (usually English) using the same character encoding of the source program file. The content of comments is not formalized, unless you wish to employ a tool such as [Doxygen](https://www.doxygen.nl/manual/docblocks.html)[^4], which generates HTML documentation from source code by reading custom mark-up within comments. Comments within code that comprise paragraphs of text are often formatted to a fixed width, for example 77 characters (the standard for plain text email). +Comments are like strings in that they do not contain program code, instead they are written in natural language (usually English) using the same character encoding as the source program file. (They often contain references to variables, functions etc., and it is important that these are kept in sync with the code.)
The content of comments is not formalized, unless you wish to employ a tool such as [Doxygen](https://www.doxygen.nl/manual/docblocks.html)[^4], which generates HTML documentation from source code by reading custom mark-up within comments. Comments within code that comprise paragraphs of text are often formatted to a fixed width, for example 77 characters (the standard for plain text email). -Learning when and how to comment code comes with experience; typically you shouldn't duplicate information that can be inferred from the program code itself. Comments such as "This is correct" aren't particularly helpful either, instead you should try to be relevant and concise, aiming at the ability level of a fellow programmer (or even yourself in the future) who reads your code. When reading other people's code remember the time-honored adage: if code and comments disagree, then both are wrong. +Learning when and how to comment code comes with experience; typically you shouldn't duplicate information that can be easily inferred from the program code itself. Comments such as "This is correct" aren't particularly helpful either, instead you should try to be relevant and concise, aiming at the ability level of a fellow programmer (or even yourself in the future) who reads your code. When reading other people's code remember the time-honored saying: if code and comments disagree, then both are wrong. **Experiment:** -* Going back to `01-hellow.cpp` add a single-line comment sequence to the line beginning `cout`. Does this program compile and run? - -* Uncomment this line and use a pair of multi-line delimeters to comment out the whole of `main()`. Does this program compile and run? +* Going back to `01-hellow.cpp` add a single-line comment sequence to the line beginning `println()`. Does this program compile and run? -* Now change the position these delimiters so that no new-line is printed by the line beginning `cout`. (They will necessarily be on the same line.) 
How does this affect the readability of the code, especially with syntax highlighting enabled in your editor? +* Uncomment this line and use a pair of multi-line delimiters to comment-out the whole of the body of `main()`. Does this program compile and run? [^1]: https://learnmoderncpp.com/2019/08/03/welcome/ -[^2]: https://clang.llvm.org/extra/clang-tidy/ +[^2]: https://clang.llvm.org/docs/ClangFormat.html [^3]: http://site.icu-project.org/home [^4]: https://www.doxygen.nl/manual/docblocks.html -*All text and program code ©2019-2022 Richard Spencer, all rights reserved.* +*All text and program code ©2019-2025 Richard Spencer, all rights reserved.* diff --git a/02-variables-scopes-and-namespaces.md b/02-variables-scopes-and-namespaces.md index 8af260d..7f88a89 100644 --- a/02-variables-scopes-and-namespaces.md +++ b/02-variables-scopes-and-namespaces.md @@ -1,14 +1,23 @@ -# Variables, Scopes and Namespaces +# Variables, Scopes and Namespaces ## Declarations, definitions and assignment -A variable is a named entity which can hold a value; thus it has *state*. As the name "variable" suggests, this value can, and often does, change over the entity's lifetime. A *declaration* can be thought of as introducing a variable to your program, as if it is saying: "I exist with this name and have such-and-such type, use me." On the other hand, a *definition* is **everything** a declaration is, plus asking: "Please reserve some memory for me here." Additionally, an *assignment* can be combined with a definition, thus stating "I have this initial value from now until (optional) later reassignment." Defining a variable without giving it an initial value is usually best avoided, as the variable will likely contain random garbage (dereferencing an uninitialized variable causes undefined behavior in C++; your compiler can and often will warn of this). 
Declarations that are not also definitions are rare for variables of the built-in types, so we will omit further discussion of them here. +A variable is a named entity which can hold a value; thus it has *state*. As the name "variable" suggests, this value can, and often does, change during the entity's lifetime. A *declaration* can be thought of as introducing a variable to your program, as if it is saying: "I exist with this name and have such-and-such type, use me." On the other hand, a *definition* is **everything** a declaration is, plus asking: "Please reserve some memory for me here." Additionally, an *assignment* can be combined with a definition, thus stating "I have this initial value from now until (optional) later reassignment (unless I am a constant)." Defining a variable without giving it an initial value is usually best avoided, as the variable will likely contain random garbage (dereferencing an uninitialized variable causes undefined behavior in C++; your compiler can and often will warn of this). Declarations that are not also definitions are rare for variables of the built-in types, so we will omit further discussion of them here. -C++ is a statically typed language, meaning that the type of each variable is known at compile time (importantly, this is also true of variables defined with the keyword `auto`, see later). Due to the fact the types are known and fixed, the amount of memory needed for each varaible is known at compile time too; this specific amount of memory is called the variable's *storage class*. (Storage class applies to **all** user-defined types, too.) This fact gives rise to the *One Definition Rule* (ODR) which states that a variable can be declared or assigned to multiple times, but must be defined **exactly** once. If you remember one thing about variables in C++, remember the ODR. 
By default, C++ reserves space for new local (function or sub-scope) variables on the *stack*, which means that two variables of the same name can exist in different scopes (one scope enclosing the other); however the *address* of the variable which is always unique. The other place variables can be stored is on the *heap*, which is sometimes preferable for large objects or arrays. Again these variables always have a unique address, but continue to use memory until it is explicitly deallocated. +C++ is a statically typed language, meaning that the type of each variable is known at compile time (importantly, this is also true of variables defined with the keyword `auto`, see later). Due to the fact the types are known and fixed, the amount of memory needed for each variable is known at compile time too; this specific amount of memory is called the variable's *storage class*. (Storage class applies to **all** user-defined types, too.) This fact gives rise to the *One Definition Rule* (ODR) which states that a variable can be declared or assigned to multiple times, but must be defined **exactly** once. This is the key concept concerning memory usage of variables in C++, so remember the ODR. By default, C++ reserves space for new local (function or sub-scope) variables on the *stack*, which means that two variables of the same name can exist in different scopes (one scope enclosing the other); however the *address* of each variable is always unique. The other place variables can be stored is on the *heap*, which is often preferable for large objects or arrays. Again these variables always have a unique address, but continue to use memory until it is explicitly deallocated, with the responsibility being the programmer's, not that of the C++ runtime.
-The shortest possible name or *identifier* for a variable is a single letter, and these are often the name of choice for variables whose purpose is obvious (such as a loop counter); this convention also provides a symmetry with variable names in Mathematics. Variable names must start with a lower- or uppercase letter or an underscore, followed by an arbitrary number of lower- or uppercase letters, underscores or decimal digits in any order. Reserved names that should not be used as identifiers are any of the C++ keywords (of which there are just under a hundred at the time of writing), names beginning with an underscore followed by a capital letter, names containing adjacent double underscores, and at global scope any name beginning with an underscore. Use of top-bit-set characters (including UTF-8 sequences) **is** permitted in variable names with more recent compilers, including as the initial character. +The shortest possible name or *identifier* for a variable is a single letter, and these are often the name of choice for variables whose purpose is obvious (such as a loop counter); this convention also provides a symmetry with variable names in Mathematics. Variable names must start with a lower- or uppercase letter or an underscore, followed by an arbitrary number of lower- or uppercase letters, underscores or decimal digits in any order. -C++ does not mandate different uses of capital letters and so on for different types of entity, but your organization may well follow conventions such as constants in upper case, user-defined types in sentence case and member functions in camel case. The rules for identifiers are the same for `class`, `struct`, `enum` and `union` names, function names, namespace names and macro names. 
Different variable naming styles, the use of which may fall into coding standards at your employer, are listed in the following table: +Reserved names that should not be used as identifiers are: + +* Any of the C++ keywords (of which there are just under a hundred at the time of writing). +* Names beginning with an underscore followed by a capital letter (these are reserved for the Standard Library). +* Names containing adjacent double underscores (reserved for purposes such as name mangling). +* At **global scope** any name beginning with an underscore. + +Use of top-bit-set characters (including UTF-8 sequences) **is** permitted in variable names with more recent compilers, including as the initial character; such sequences are also recognized by the preprocessor. + +Unlike some programming languages, C++ does not mandate different uses of capital letters and so on for different types of entity, but your organization may well follow conventions such as constants in upper case, user-defined types in sentence case and member functions in camel case. The rules for identifiers are the same for `class`, `struct`, `enum` and `union` names, function names, namespace names and macro names. Different variable naming styles, the use of which may fall under coding standards requirements at your employer, are listed in the following table: | Naming Style | Example | |:----------------:|:---------------:| @@ -19,22 +28,26 @@ C++ does not mandate different uses of capital letters and so on for different t | Upper Snake Case | A_VARIABLE_NAME | | Camel Case | aVariableName | -New variables are introduced (defined) by providing a type, an identifier and, optionally (but highly recommended) either an initial value or a pair of empty braces `{}`. The initial value follows an equals sign when using C-style syntax, as shown in this program; this is optional when using *uniform initialization* with braces (covered later). 
The following program defines three variables but only assigns to two of them initially: +New variables are introduced (defined) by providing a type, an identifier and, optionally (but highly recommended) an initial value given after an equals sign (`=`) and/or within a pair of braces `{` and `}` (which can be empty to assign the default value for the type). Use of equals is historical syntax, while use of braces (where the equals sign becomes optional) is called *uniform initialization* and is discussed in more detail later in this Chapter. + +Braces are also used with strings passed to `print()` and `println()` indicating a point in the string where a variable's current value should be substituted. The number of brace pairs must equal the number of additional parameters passed to the functions. (To output a literal `{` or `}` use one of the escape sequences `{{` or `}}` respectively.) + +The following program defines three variables but only assigns to two of them initially, despite the fact that it prints them all out twice: ```cpp // 02-assign.cpp : assign to local variables -#include <iostream> +#include <print> using namespace std; int main() { int i = 1, j = 2; unsigned k; - cout << "(1) i = " << i << ", j = " << j << ", k = " << k << '\n'; + println("(1) i = {}, j = {}, k = {}", i, j, k); i = j; j = 3; k = -1; - cout << "(2) i = " << i << ", j = " << j << ", k = " << k << '\n'; + println("(2) i = {}, j = {}, k = {}", i, j, k); } ``` @@ -47,7 +60,7 @@ Running this program produced the output: There are probably no surprises for the values of `i` and `j` as output the first and second time. Note that the statement `i = j` merely assigns the **current** value of `j` to `i` and does not imply that they point to the same object; the values of `i` and `j` can subsequently change **independently**. -The first time `k` is output its value is essentially random; nothing can be guaranteed about its value other than it is within the valid range for the `unsigned` type.
Assigning a negative number to an `unsigned` type is (perhaps surprisingly) legal C++, and if you are unsure of why the second output of `k` is what it is, you may want to do some research into "two's-complement" binary representation of integers (it's actually the number (2^32)-1 represented as a positive integer). +The first time `k` is output its value is essentially random, an example of *undefined behavior* (UB); nothing can be guaranteed about its value other than it is within the valid range for the `unsigned` type. Assigning a negative number to an `unsigned` type is (perhaps surprisingly) legal C++, and if you are unsure of why the second output of `k` is what it is, you may want to do some research into "two's-complement" binary representation of integers (it's actually the number 2<sup>32</sup>-1 represented as a positive integer). **Experiment** @@ -59,21 +72,21 @@ The first time `k` is output its value is essentially random; nothing can be gua ## Casts and uniform initialization -The following program assigns an integer to a variable `a` of type `int`, and a real number to a variable `b` of type `double`. In case you're wondering, the name for the type of `b` comes from *double precision* as defined in the IEEE Standard for Floating-Point Arithmetic (IEEE 754), which defines how an (accurate) approximation of a real number is stored in 64 bits of memory. Single precision `float` uses 32 bits, and extended precision `long double` uses 96 bits. Then the initial values of `a` and `b` are then reassigned to each other, meaning the second output line is different: +The following program assigns an integer to a variable `a` of type `int`, and a real number to a variable `b` of type `double`. In case you're wondering, the name for the type of `b` comes from *double precision* as defined in the IEEE Standard for Floating-Point Arithmetic (IEEE 754), which defines how an (accurate) approximation of a real number is stored in 64 bits of memory.
Single precision `float` uses 32 bits, and extended precision `long double` typically uses up to 96 bits (the storage class may be different from the number of precision bits used). The initial values of `a` and `b` are then reassigned to each other, meaning the second output line is different: ```cpp // 02-swap.cpp : attempt to swap the values of an int and a double -#include <iostream> +#include <print> using namespace std; int main() { int a = 1; double b = 2.5; - cout << "(1) a = " << a << ", b = " << b << '\n'; + println("(1) a = {}, b = {}", a, b); a = 2.5; b = 1; - cout << "(2) a = " << a << ", b = " << b << '\n'; + println("(2) a = {}, b = {}", a, b); } ``` @@ -84,7 +97,7 @@ Running this program produces the output: (2) a = 2, b = 1 ``` -The variable assignment statement `a = 2.5` is called a *narrowing cast* because of the reduction in precision and likelihood of information being lost. In this case the value is **automatically** rounded down from `2.5` to `2`, as the decimal part cannot be represented in an `int`. Even though the term being assigned is floating-point (actually it's a double-precision literal) the type of `a` **remains** as `int` (and this is why the fractional part is lost). In contrast, the statement `b = 1` is a *widening cast* with the assumption that there is no chance of information being lost; `b` remains of type `double` holding an integer value (which could be represented explictly as `1.0`). Both of these casts are *implicit casts* becuase the compiler makes them happen automatically; the instruction to carry out the type casting is implicit. (We could have used `static_cast<int>(2.5)` and `static_cast<double>(1)` to make the casts explicit, we'll see this later.) +The variable assignment statement `a = 2.5` is called a *narrowing cast* because of the reduction in precision and likelihood of information being lost. In this case the value is **automatically** *truncated* from `2.5` to `2`, as the decimal part cannot be represented in an `int`.
Even though the term being assigned is floating-point (actually it's a double-precision literal, see later in this Chapter) the type of `a` **remains** as `int` (and this is why the fractional part is lost). In contrast, the statement `b = 1` is a *widening cast* with the assumption that there is no chance of information being lost; `b` remains of type `double` holding an integer value (which could be represented explicitly as a literal `1.0`). Both of these casts are *implicit casts* because the compiler makes them happen automatically; the instruction to carry out the type casting is implicit. (We could have used a more verbose `static_cast<int>(2.5)` and `static_cast<double>(1)` to make the casts explicit; we'll see this later.) **Experiment** @@ -94,23 +107,23 @@ The variable assignment statement `a = 2.5` is called a *narrowing cast* because * Again, modify the original program to use `static_cast`. (Hint: don't worry if you don't fully understand the syntax yet.) -Implicit casts can happen at variable initialization and assignment too, however this is not always the behavior we want. To force the compiler to disallow (possibly unintentional) narrowing casts we can use *uniform initialization* which involves enclosing the assigned value in curly braces: +Implicit casts can happen with variable initialization-and-assignment too, however this is not always the behavior we want.
To force the compiler to disallow (possibly unintentional) narrowing casts we can use *uniform initialization* which involves enclosing the assigned value in curly braces: ```cpp // 02-uniform.cpp : avoid compiler error with uniform initialization and explicit narrowing cast -#include <iostream> +#include <print> using namespace std; int main() { // int c = { 2.5 }; // Error: this does NOT compile int c = { static_cast<int>(2.5) }; // while this does double d = { 1 }; // and so does this - cout << "c = " << c << ", d = " << d << '\n'; + println("c = {}, d = {}", c, d); } ``` -It is important not to confuse a single value in curly braces with an array initializer containing one element when reading code like this; in practice here there is no ambiguity because if we had wanted to initialize a single element array we would have written `int c[] = {2.5,}` with a trailing comma inside the braces. Interestingly, the equals sign in uniform initialization is in fact **optional**, so we could have written `int c{2.5}` and `double d{1}`. Uniform initialization appears elsewhere in C++ so it is a good idea to become familiar with the syntax early on, and know the nuances of its behavior compared to using a time-honored C-style equals sign instead. +It is important not to confuse a single value in curly braces with an initializer list containing one element when reading code like this; in practice here there is no ambiguity because if we had wanted to initialize an array of `int` with a single-element list we would have written `int c[] = {2.5,};` using a trailing comma inside the braces. Interestingly, the equals sign in uniform initialization is in fact **optional**, so we could have written `int c{2.5}` and `double d{1}`. Uniform initialization appears elsewhere in C++ so it is a good idea to become familiar with the syntax early on, and know the nuances of its behavior compared to using a time-honored C-style equals sign instead.
In Modern C++, uniform initialization is probably considered better style, where you have the choice of the two. **Experiment** @@ -132,31 +145,34 @@ C++ has quite a lot of built-in types, most of them inherited from the C languag | unsigned short | 16 | 0 | 65535 | n/a (as for unsigned) | | int | 32 | -2147483648 | 2147483647 | -1000, 0x7fff | | unsigned | 32 | 0 | 4294967295 | 1000U, 0xffffU | -| long | 32 | -2147483648 | 2147483647 | 1L, 0x7fffffffL | -| unsigned long | 32 | 0 | 4294967295 | 10000000UL, 0xbbbfUL | -| long long | 64 | -9223372036854775808 | 9223372036854775807 | -10000LL, 0x80000000000LL | -| unsigned long long | 64 | 0 | 18446744073709551615 | 10000ULL, 0x7fffffffffULL | -| size_t | 64 (32) | 0 | 18446744073709551615 | 0ULL (assuming 64 bit), 0z * | +| long | 64 (32) + | -2147483648 | 2147483647 | 1L, 0x7fffffffL | +| unsigned long | 64 (32) + | 0 | 4294967295 | 10000000UL, 0xbbbfUL | +| long long | 64 + | -9223372036854775808 | 9223372036854775807 | -10000LL, 0x80000000000LL | +| unsigned long long | 64 + | 0 | 18446744073709551615 | 10000ULL, 0x7fffffffffULL | +| ssize_t | 64 (32) | -9223372036854775808 | 9223372036854775807 | 0Z * | +| size_t | 64 (32) | 0 | 18446744073709551615 | 0UZ * | | float | 32 | 1.17549e-38 | 3.40282e+38 | 0.f, 3.2e-10f | | double | 64 | 2.22507e-308 | 1.79769e+308 | 2.3, 1.2345e200 | -| long double | 96 | 3.3621e-4932 | 1.18973e+4932 | 100000000.5L, 0.0000345L | +| long double | 128 | 3.3621e-4932 | 1.18973e+4932 | 100000000.5L, 0.0000345L | -* The use of suffix `z` for `size_t` literals is currently only a proposal, so not all compilers support it. On 32-bit machines `long` and `size_t` are usually 32 bits, while `long long` is guaranteed to be (at least) 64 bits. +* The "size types" `std::size_t` (unsigned) and `std::ssize_t` (signed) are from the Standard Library, and so require a header which defines them, such as ``. (Negative values for `std::ssize_t` are typically used to represent error values.) 
+ ++ On 32-bit machines `long`, `unsigned long`, `ssize_t` and `size_t` are usually 32 bits, and are usually 64 bits on 64-bit machines, while `long long` and `unsigned long long` are guaranteed to be (at least) 64 bits on all platforms. The variable definition `double n{2.3};` should by now appear familiar and correct; it assigns a floating-point number (actually as shown in the table, a numeric literal) to a double precision variable. In other words it's an exact match between the declared type and the literal type. (If it were a narrowing cast, such as `double n{2.3L}` we would expect compilation to fail.) -The `auto` type specifier has a meaning in Modern C++: deduce the type of the variable being assigned **to** from the value, variable or expression being assigned **from**. This means, however, that the variable definition must also always be an assignment as uninitialized `auto` variables are not allowed. The reason for this is simple: C++ variables must have their type known at compile time, and this is no different for `auto` variables. I'll repeat this as it is so important; *C++ is a statically typed language, and every possible use of `auto` does not change this*. +The `auto` type specifier has a specific meaning in Modern C++: deduce the type of the variable being assigned **to** from the value, variable or expression being assigned **from**. This means, however, that the variable definition must also *always be an assignment* as uninitialized `auto` variables are not allowed. The reason for this is simple: C++ variables must have their type known at compile time, and this is no different for `auto` variables. I'll repeat this as it is so important; *C++ is a statically typed language, and every available use of* `auto` *does not change this*. 
Some example usage of `auto` is shown here: ```cpp int i = 1; // both i and 1 are of type int auto j = i; // j is also of type int -auto k{ 1.0 }; // k has type double using uniform initialization syntax +auto k{ 1.0 }; // k has type double (using uniform initialization syntax) auto q; // Error: will not compile ``` -Program can be (re-)written without any use of `auto`, however you will often encounter it in modern code so you need to be able to recognize and understand its meaning. It is especially useful where the type in question is overly verbose, such as when using types related to generic classes. Notice from the example shown here the use of uniform initialization syntax with `auto`-assignment for the variable `k`; this usage can be expected to get more common. +Programs can be (re-)written without any use of `auto`, however preferring `auto` in Modern C++ is motivated primarily by correctness, performance, maintainability, and robustness, rather than just typing convenience. It is especially useful where the type in question is overly verbose, such as when using types related to generic classes, and also helps avoid accidental narrowing conversions or commitment to implementation-specific types. Notice from the example shown here the use of uniform initialization syntax with `auto`-assignment for the variable `k`; this usage can be expected to become more common. ## Bool and byte @@ -167,9 +183,9 @@ bool success{ true }; bool are_equal = (a == b); ``` -The `byte` type, often referred to as `std::byte` as it is a type made available from within the Standard Library namespace (in order to avoid name clashes with existing code) designed to replace `unsigned char` where the variable (or array) contains (8-bit) binary data. 
+The `byte` type, often referred to as `std::byte` as it is a type made available from within the Standard Library namespace (in order to avoid name clashes with existing code), is designed to replace `unsigned char` where the variable (or array) contains (8-bit) binary data. -This type is actually implemented as an `enum class` (see Chapter 6) and only the bitwise operators are supported, so no addition or subtraction of `byte` values is allowed. A variable of type `byte` can be initialized with any value from `0` to `255` and converted back to an integer value with the function `to_integer()` (functions are covered in Chapter 4). +This type is actually implemented as an `enum class` (see Chapter 6) and only the bitwise operators are supported, so addition or subtraction of `byte` values is not allowed. A variable of type `byte` can be initialized with any value from `0` to `255` and converted back to an integer value with the function `to_integer()` (functions are covered in Chapter 4). ```cpp std::byte b{ 254 }; @@ -179,7 +195,7 @@ auto i = std::to_integer(b); // This is ugly but is shown here for referenc ## Literal prefixes and suffixes -Digits can be grouped into, for example, groups of three for decimal numbers, using apostrophe (`'`) as the delimiter: +Digits can be grouped, for example into groups of three for decimal numbers, using apostrophe (`'`) as the delimiter: ```cpp auto million = 1'000'000; @@ -197,7 +213,7 @@ If some of the example literals in the last table look unfamiliar then the follo (Note: hexadecimal floating-point literals use `P` or `p` as the radix separator, while decimal floating point literals use an `E` or `e` to separate the exponent from the mantissa.) -Suffixes can apply to either integer or floating point literals (both in the case of `L`). Also, `U` and `u` can be combined with `L`, `l`, `LL` and `ll`. +Suffixes can apply to either integer or floating point literals (or to both in the case of `L`).
Also, `U` and `u` can be combined with `L`, `l`, `LL`, `ll`, `Z` and `z`. | Suffix | Meaning | Usage | |:------:|:----------------------------------------:|:------------------:| @@ -205,47 +221,50 @@ Suffixes can apply to either integer or floating point literals (both in the cas | l, L | extended precision float OR long integer | 100'000l, 3.3L | | u, U | unsigned integer | 65536u, -1U | | ll, LL | long long integer (64 bits) | 0ll, -1'234'567LL | -| z * | unsigned size type (size_t) | 0z, 4'294'967'296z | +| uz, UZ | unsigned size type (std::size_t) | 0uz, 4'294'967'296UZ | +| z, Z | signed size type (std::ssize_t) | 0z, -2'147'483'648Z | -* Suppport for this literal has to be explicitly enabled, currently +Note there is no literal for `short int` and there is unlikely to ever be one, as the `s` suffix is used for seconds when using the `<chrono>` header (and `string` when used with the `<string>` header). Also, the integer literal suffixes don't ever actually need to be used in Modern C++, as source-code literals in all bases are automatically *promoted* (widened) to a type that can hold the value of the literal. -Note there is no literal for `short int` and is unlikely to ever be so, as the `s` suffix is used for seconds when using the `<chrono>` header. Also, the integer literal suffixes don't ever actually need to be used in modern C++, source-code literals in all bases are automatically *promoted* (widened) to a type that can hold the value of the literal. To enable all the literal suffixes in the Standard Library use: +To enable all the literal suffixes in the Standard Library after referencing the necessary header(s) use: ```cpp using namespace std::literals; // This is also implied by "using namespace std;" ``` +Note: this is **not** necessary for suffixes of the built-in types, being `F`, `f`, `U`, `u`, `L`, `l`, `LL` and `ll`. + **Experiment** -* Make up some variable assignments from various literals. Use `auto`, and output the variables using `cout`.
See if the output is what you expected. +* Make up some variable assignments from various literals. Use `auto`, and output the variables using `print()`. See if the output is what you expected. -* Alter your assignments to specify the correct type instead of `auto`, such as `long long`. Check the tables above if you're not sure, and use uniform initialization. Try to avoid always using the biggest types `long long` and `long double` as this may not be optimal in terms of memory and performance footprint. +* Now specify the correct built-in type instead of `auto`, such as `long long`. Check the tables above if you're not sure, and use uniform initialization. Try to avoid always using the biggest types `long long` and `long double` regardless of the value or calculation, as this may not be optimal in terms of memory footprint and performance. ## Local and global scopes -Variables defined outside of any function scope are called *global* variables, while those defined within functions (including `main()`) are called *local* variables. Global variables have memory reserved for them and are initialized before `main()` is entered, although the order in which they are initialized is **not** guaranteed. Local variables have space reserved for their contents from the function stack when the function is entered, and are available for use after program flow reaches their definition within the function. +Variables defined outside of any function scope are called *global* variables, while those defined within functions (including `main()`) are called *local* variables. Global variables have memory reserved for them and are initialized before `main()` is entered, although the order in which they are initialized is **not** guaranteed across multiple translation units (these being approximately C++ source files, discussed later in this Chapter). 
Local variables have space reserved for their contents from the function stack when the function is entered, and are available for use after program flow reaches their definition within the function. -A local variable with the same name as a previously defined global variable temporarily takes precedence over, or *shadows*, the global variable until it goes *out of scope*. Variables defined within a function go out of scope at the end of the function, and the space reserved for them is released. +A local variable with the same name as a previously defined global variable temporarily takes precedence over, or *shadows*, the global variable until it goes *out of scope*. Variables defined within a function go out of scope at the end of the function, and the space reserved for them is then released. -It is also possible to nest scopes within functions up to an arblitrary level. The delimiters `{` and `}` are used for this purpose, mirroring their use to introduce a function scope. Code within *sub-scopes* is typically indented an extra level. (Sub-scopes which can contain scoped variable definitions are also introduced by a variety of C++ keywords including `if` and `while`.) Variable names re-defined within sub-scopes lose visibility at the closing brace and can no longer be referenced (the memory they use may not be released until the function exits, however). +It is also possible to nest scopes within functions up to an arbitrary level. The delimiters `{` and `}` are used for this purpose, mirroring their use to introduce a function scope. Code within *sub-scopes* is typically written indented to an extra level. (Sub-scopes which can contain scoped variable definitions are also introduced by a variety of C++ keywords including `if` and `while`.) Variable names which are re-defined within sub-scopes lose visibility at the closing brace and can no longer be referenced (the memory they use may not be released until the function exits, however). 
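As a small sketch of shadowing inside a sub-scope introduced by a keyword rather than by bare braces, consider the fragment below. The names `level` and `shadow_demo()` are invented for this illustration, and the function is written so that each step of name lookup shows up in the running total.

```cpp
#include <cassert>

int level{ 10 };                // global variable

// Hypothetical helper: adds up whichever `level` is visible at each point.
int shadow_demo() {
    int total{ level };         // only the global is visible here: total is 10
    int level{ 3 };             // local definition shadows the global from now on
    total += level;             // adds the local: total is 13
    if (level > 0) {            // the braces of the if statement open a sub-scope
        int level{ 100 };       // shadows the local until the closing brace
        total += level;         // adds the innermost: total is 113
    }
    total += level;             // the function-level local is visible again: 116
    assert(level == 3);         // the sub-scope never modified it
    return total;
}
```

Calling `shadow_demo()` returns `116`. Each definition of `level` is a separate variable, so nothing here assigns to the global; some compilers can be asked to warn about shadowing like this, as it is easy to do by accident.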
The following program defines and initializes the variable `a` three times. This does not violate the One Definition Rule (ODR) because of one simple fact: *the three variables exist in different scopes*. ```cpp // 02-scopes.cpp : define three variables with the same name in one program -#include <iostream> +#include <print> using namespace std; auto a{ 1.5f }; int main() { - cout << "(1) " << a << '\n'; + println("(1) {}", a); auto a{ 2u }; - cout << "(2) " << a << '\n'; + println("(2) {}", a); { auto a{ 2.5 }; - cout << "(3) " << a << '\n'; + println("(3) {}", a); } } ``` @@ -262,19 +281,19 @@ Running this program produces the output: * Change the assignments to 1, 2, and 3 (using integer literals with `int` instead of `auto`). Does this still satisfy the ODR? -* Add a fourth `cout << a` line between the two closing curly braces, just before `main()` exits. Is the output what you expected? +* Add `println("(4) {}", a);` between the two closing curly braces, just before `main()` exits. Is the output what you expected? -* Change the output command `<< a <<` to `<< ::a <<` in each of the three times it appears in the program. What appears to happen? (Explanation: the global scope resolution operator `::` selects the global `a` over any other `a` that may be visible.) +* Change the output command `, a)` to `, ::a)` in each of the three times it appears in the program. What appears to happen? (Explanation: the global scope resolution operator `::` selects the global `a` over any other `a` that may be visible.) ## Static and thread-local variables -Any global variables defined in the program are visible throughout the whole of the program, which unfortunately means that name clashes are possible in different and unrelated portions of code. The traditional way of getting round this problem, inherited from C, was to use the `static` keyword.
All this does in the context of a global variable definition is make the variable local to the *translation unit*, which is the proper name for the each `.cpp` file with all the headers it `#include`s (which compiles to a single `.obj` or `.o` object file). The term *file static* can also be used to describe the visibility of such a variable, referring to the `.cpp` file it is defined in. Thus two `.obj` or `.o` files each with one or more `static` variables of the same name can be linked to form an executable, without generating linker errors. +Any global variables defined in the program are visible throughout the whole of the program, which unfortunately means that name clashes are possible in different and unrelated portions of code. The traditional way of getting round this problem, inherited from C, was to use the `static` keyword. All this does in the context of a global variable definition is make the variable local to the *translation unit*, which is the proper name for each C++ source file with all the headers it `#include`s (which compiles to a single `.obj` or `.o` object file). The term *file static* can also be used to describe the visibility of such a variable, referring to the `.cpp` file it is defined in. Thus two `.obj` or `.o` files each with one or more `static` variables of the same name can be linked to form an executable, without generating linker errors.
``` static int i = 1000; // only visible within this translation unit ``` -The `thread_local` keyword (added in C++11) can be used at global scope and specifies a variable with global visibility which is created (and optionally initialized) when a new thread is launched: +The `thread_local` keyword (added in C++11) can optionally be used at global scope and specifies a variable with global visibility which is created (and optionally initialized) when a new thread is launched: ``` thread_local size_t my_counter{ 0 }; // different variable initialized for each new thread @@ -296,12 +315,12 @@ The keywords `static` and `thread_local` have uses in other contexts too, as we The purpose of namespaces is to solve the problem of global names clashing with each other. (We have already encountered the `std` namespace which contains all of the Standard Library components.) Namespaces can only be introduced at global scope and are delimited with the by now familiar `{` and `}`. Namespaces **can** exist inside other namespaces, with the scope resolution operator `::` also used to separate nested namespace names. Entities (such as variables, functions and classes) defined within namespaces are still globally visible, and can be either made available with `using` statements or directives, or referenced using their *fully qualified names*. -The next program defines two global variables, each in different namespaces, which means they can exist in the same `.cpp` file. Notice that the namespace names are written in sentence case while the variable names are written in snake case, both common conventions: +The next program defines two global variables, each in different namespaces, which means they can exist in the same `.cpp` file. 
Notice that the namespace names have been written in sentence case with the variable names in snake case, both common conventions: ```cpp // 02-height.cpp : define the same variable name in two different namespaces -#include <iostream> +#include <print> using namespace std; namespace Wonderland { @@ -313,11 +332,9 @@ namespace VictorianEngland { } int main() { - cout << "Alice\'s height varies between " - << Wonderland::alice_height_m - << "m and " - << VictorianEngland::alice_height_m - << "m.\n"; + println("Alice\'s height varies between {}m and {}m", + Wonderland::alice_height_m, + VictorianEngland::alice_height_m); } ``` @@ -325,13 +342,13 @@ int main() { * Add the statement `using namespace VictorianEngland;` as the first line of `main()`. Does this change the output in any way? -* Now remove `VictorianEngland::` from the compound `cout` output line. Does the output change now? What do you learn about the connection between `using` directives and unqualified names? +* Now remove `VictorianEngland::` from the output call. Does the output change now? What do you learn about the connection between `using` directives and unqualified names? Namespaces are *open*, that is elements can be added to a namespace from different parts of a program, even from different `.cpp` files. (This means it is technically possible to add to the `std` namespace, but doing so is strongly discouraged as it can create misleading code that may mysteriously fail to compile on other systems or platforms.) Namespaces can be nested in two ways: by either using multiple `namespace` keywords, or using the scope resolution operator, as shown in the code fragments below: -``` +```cpp namespace Wonderland { namespace Animals { auto white_rabbit{ 1 }; @@ -347,7 +364,7 @@ The fully qualified names of both variables defined are very similar, they are: Another feature of namespaces is the curiously named *unnamed namespace*. The syntax is simple, a `namespace` keyword followed immediately by `{`.
The purpose of the unnamed namespace is to replace the use of `static` in definition of global names visible to just the current translation unit. The following code fragment defines and assigns a variable whose fully qualified name in the same translation unit is just `i`, and is not visible in any other. -``` +```cpp namespace { int i = 3000; // variable i is only visible later within this file } @@ -361,23 +378,22 @@ int i = 3000; // variable i is only visible later within this file ## Constants and references -Constants are named entities that have only one value during their lifetime, that is their initial value remains unchanged. (I avoid the use of the word "variable" here, or worse still "`const` variable", to avoid confusion, but most of the rules of variables apply to constants too.) Constants are useful in many places in modern C++ programs, and in some places they can be used where variables cannot, such as when specifying array sizes and template parameters. Similarly to `auto` variable definitions, constants **must** have their value specified when they are defined. +Constants are named entities that have only one value during their lifetime, in other words their initial value remains unchanged. (I avoid the use of the word "variable" here, or worse still "`const` variable", to avoid confusion, but most of the rules of variables apply to constants too.) Constants are useful in many places in Modern C++ programs, and in some places they can be used where variables cannot, such as when specifying array sizes and template parameters. Similarly to `auto` variable definitions, constants **must** have their value specified when they are defined. 
-Constants are defined using the `const` keyword, **either** before or after the mandatory type specifier (or `auto`), as shown in the program below, which defines a global constant and a local constant: +Constants are defined using the `const` keyword, **either** before or after the mandatory type specifier (or `auto`), as shown in the program below which defines a global constant and a local constant: ```cpp // 02-constants.cpp : introducing the const keyword -#include <iostream> +#include <print> using namespace std; const double PI = 3.14159265358979; int main() { auto const APPROX_E = 3; - cout << "pi is almost exactly " << PI - << "e is approximately " << APPROX_E - << '\n'; + println("pi is almost exactly {}, while e is approximately {}", + PI, APPROX_E); } ``` @@ -389,25 +405,25 @@ Notice that the named constants have been specified using upper case, which is a * Try to output the result of adding the two constants together. Is this what you would expect for two variables of different types (implied in the case of `APPROX_E`)? -Constants can be assigned to a variable, and created from a variable at the point it is defined. Interestingly this implies that the value of a C++ constant *is not necessarily known* at compile-time; not all constants therefore can be used as array sizes, for example. (If a constant compile-time value is needed for this purpose, your compiler will refuse to compile such code.) Variables of many types can usefully be declared `const` where their value shouldn't be changed, or where changing them would make no sense. This stricter use of `const` is known as *const-correctness* and is an additional form of type safety which can often be very useful (of course, use of `const` is optional, as in the above program, but its consistent and correct use is strongly encouraged). +Constants can be assigned to a variable, and created from a variable at the point it is defined. 
Interestingly this implies that the value of a C++ constant *is not necessarily known* at compile-time; not all constants therefore can be used as array sizes, for example. (If a constant compile-time value is needed for this purpose, your compiler will refuse to compile such code.) Variables of many types can usefully be declared `const` where their value shouldn't be changed, or where changing them would make no sense. This stricter use of `const` is known as *const-correctness* and is an additional form of type safety which can often be very useful (of course, most uses of `const` are optional, as in the above program, but its consistent and correct use is strongly encouraged). -References are hugely important to C++ and the necessity of fully understanding them in order to become proficient in the language cannot be overstated. There are two types of references, the style that date back to the earliest versions of C++, now known as *l-value references*, and those introduced with C++11, known as *r-value references* (or sometimes *forwarding* or even *universal* references. Only l-value references are discussed here. +References are hugely important to C++ and the necessity of fully understanding them in order to become proficient in the language cannot be overstated. There are two types of references, the style that date back to the earliest versions of C++, now known as *l-value references*, and those introduced with C++11, known as *r-value references* (or sometimes as *forwarding* or even *universal* references). Only l-value references are discussed here. A reference is an *alias* (an alternative name) for another variable **which must already exist**. It is (intentionally) difficult to make a reference outlive the variable it is *bound* to, managing to do so creates a *dangling* reference, which is undefined behavior. 
The primary use of references is to make variables visible from enclosed scopes to outer scopes from which they would not otherwise be accessible, as we shall discover later in the discussion of functions. Changing a reference changes the value of the variable to which it is bound, as shown in the program below: ```cpp // 02-references.cpp : introducing l-value references -#include <iostream> +#include <print> using namespace std; int alice_age{ 9 }; int main() { - cout << "Alice\'s age is " << alice_age << '\n'; + println("Alice\'s age is {}", alice_age); int& alice_age_ref = alice_age; alice_age_ref = 10; - cout << "Alice\'s age is now " << alice_age << '\n'; + println("Alice\'s age is now {}", alice_age); } ``` @@ -417,13 +433,15 @@ int main() { * Make the global `alice_age` constant. Does the code compile now? -* Now remove the `&` on the first line of `main()`. Does this allow the code to compile? What is the output from running this modified program? +* Now make `alice_age_ref` constant instead. Does the code compile? + +* Now remove the `&` on the second line of `main()`. Does this allow the code to compile? What is the output from running this modified program? As shown above, the syntax for creating and initializing a reference is simple, a single ampersand between the type specifier and the variable name. This difference is subtle compared to a conventional definition, so you will need to be on the lookout for it whenever reading code. The property of "reference-ness" and "const-ness" is stripped away from variables that are being assigned from. 
It is possible to initialize a constant from another constant when using `auto`, but this needs to be explicitly specified as a property of the entity being initialized: -``` +```cpp const auto a{ 10 }; // define a as constant auto b = a; // define b as variable copy of a const auto c = a; // define c as constant copy of a @@ -431,7 +449,7 @@ const auto c = a; // define c as constant copy of a It is also possible to explicitly (re-)specify the reference property on the assignee side, but attempting to change the value of a constant value through a non-`const` reference is not allowed: -``` +```cpp const auto d{ 11 }; // define d as constant auto e{ 12 }; // define e as variable const auto& f{ 12 }; // define f as constant reference (to a literal constant value) @@ -456,36 +474,36 @@ Another way of qualifying a definition is with the `constexpr` keyword. This is In fact, `constexpr` expressions can be complex with recent compilers, as long as all parts of the expression are themselves `constexpr`. The following program defines two constants, one of which is `constexpr`. Only the `constexpr` entity can be tested against `static_assert()`, which is a boolean truth test checked at compile-time. Don't worry if the inequality syntax is unfamiliar as this is covered in the next Chapter; the test `PI > 3.141 && PI < 3.143` evaluates the mathematical inequality `3.141 < PI < 3.143` in a way that is valid C++: -``` +```cpp // 02-constexpr.cpp : introducing the constexpr keyword -#include <iostream> +#include <print> #include <cmath> using namespace std; -const double PI1 = acos(-1.0); // acos is not (yet) constexpr +// Note: currently, not all compilers mark `acos` as a +// constexpr function in cmath. The following line might +// not compile with `clang++` for example. 
+constexpr double PI1 = acos(-1.0); constexpr double PI2 = 22.0 / 7.0; -// the following line does not compile and has been commented out -//static_assert(PI1 > 3.141 && PI1 < 3.143); +static_assert(PI1 > 3.141 && PI1 < 3.143); static_assert(PI2 > 3.141 && PI2 < 3.143); int main() { - cout << "PI1 = " << PI1 << '\n'; - cout << "PI2 = " << PI2 << '\n'; + println("PI1 = {}", PI1); + println("PI2 = {}", PI2); } ``` -(Hint: this program is the first to require an additional header to ``; you may need to add `-lm` to the compile command under Linux/MacOS in order to link in the math library containing the `acos()` function.) +(Hint: this program is the first to require an additional header to ``; you may need to add `-lm` to the compile command under Linux in order to link in the math library containing the `acos()` function.) **Experiment** -* Uncomment the first `static_assert()` and try to compile the program. Is the error message user-friendly? - -* Now try to make the second `static_assert()` fail by using an invalid inequality test. +* Try to make the second `static_assert()` fail by using an invalid inequality test. * Now change the program to check the value of *e* at compile time. (Hint: use the expression `exp(1.0)` to get a good approximation of *e*.) -As can be seen from attempting to compile and run this program, both `PI1` and `PI2` have exactly the same value and are both constant, but the first `static_assert()` always fails because it can't evaluate at run-time, so compilation halts. In general, a program that fails to compile is preferable to one that runs incorrectly, so `static_assert()` is a useful tool to have. It also adds no *overhead* cost at run-time. The `static_assert()` test can optionally take a second string literal parameter, this being the error message for the compiler to output if the assertion fails. +As can be seen from attempting to compile this program, `static_assert()` is a useful tool to have, and adds no run-time overhead cost. 
The `static_assert()` test can optionally take a second string literal parameter, this being the error message for the compiler to output if the assertion fails. -*All text and program code ©2019-2022 Richard Spencer, all rights reserved.* +*All text and program code ©2019-2025 Richard Spencer, all rights reserved.* diff --git a/03-conditions-and-operators.md b/03-conditions-and-operators.md index 3672111..d94e5ff 100644 --- a/03-conditions-and-operators.md +++ b/03-conditions-and-operators.md @@ -1,10 +1,24 @@ -# Conditions and Operators +# Conditions and Operators ## Run-time user input -The programs we have seen in the previous two chapters have been a little predictable in how they run as they have a *linear execution path* through the `main()` function. Such simple programs have very little practical use. More complex programs, which alter their *control flow* based on *user input* fall into two types. *Batch programs* take all of their input at the beginning of their execution, usually from any or all of: program parameters, an environment variable, or an input file. *Interactive programs* enact a dialog with the *user* (the computer operator) while the program is executing. This dialog is often two-way as the user is not necessarily expected to know what input is required without being prompted. Interactive programs often use either a console or a *GUI* (Graphical User Interface, historically found on desktop computers, but more often found these days on tablets and smartphones). Interactive console programs often produce output to the console *interleaved* with user input, while batch programs ususally know all of their input at the beginning of their execution and produce all of their output following this with no further user involvement or action. 
As an example of a modern alternative, a purely voice-activated device (possibly without a screen) has an interface which interestingly has more in common with an interactive console program than with a GUI application. +The programs we have seen in the previous two chapters have been a little predictable in how they run, as they have a *linear execution path* through the `main()` function. Such simple programs have very little practical use. More complex programs, which alter their *control flow* based on *user input* fall into two types: -As a compliment to the stream output object `cout`, the stream input object `cin` (an abbreviation of "Character Input") overloads `>>` (the *stream extraction operator*) to allow variables to be set from user input. When a `cin` input expression is reached, the program waits (indefinitely) for the user to type some input and press Enter. The following program outputs a message inviting the user to enter a number, and then prints this number out again on the console. Before `cin` is used, the variable to be used to accept the input into must have already been defined so that the type of the required input can be deduced. Providing an initial value is preferred (empty braces give it a default value) in case the read by `cin` fails due to either invalid input, such as the user typing letters where digits were required, or end-of-input (Ctrl-D or Ctrl-Z): +* *Batch programs* take all of their input at the beginning of their execution, usually from any or all of: program parameters, an environment variable(s), or an input file. + +* *Interactive programs* enact a dialog with the *user* (the computer operator) while the program is executing. This dialog is often two-way as the user is not necessarily expected to know what input is required without being prompted. 
+ +Interactive programs often use either a console or a *GUI* (Graphical User Interface, historically found on desktop computers, but more often found these days on tablets and smartphones). Interactive console programs often produce output to the console *interleaved* with user input, while batch programs usually know all of their input at the beginning of their execution and produce all of their output following this with no further user involvement or action. As an example of a modern alternative, a purely voice-activated device (possibly without a screen) has an interface which interestingly has more in common with an interactive console program than with a GUI application. + +Previously we have encountered `print()` and `println()` for putting formatted output to the console. Interestingly, there is currently no direct equivalent in Modern C++ for reading input. The `getline()` functions are not covered until Chapter 7, and as might be guessed from the name read a textual string which must then be processed further in order to obtain a valid value for numerical (or similar) input. For reasons of simplicity, this Chapter only covers the use of *stream objects* for reading from and writing to the console, and use of these requires the `<iostream>` header. + +As a quick introduction to the stream output object `cout` (an abbreviation of "Character Output"), string literals, character literals, numeric (and other) values and variables are "put to" the console using (possibly multiple) occurrences of `<<` (the *stream insertion operator*). There is no format string as such, the output is created from the object to the right of each `<<`, in order from left to right. For example: + +```cpp +cout << "The answer is: " << 42 << '\n'; // println("The answer is: {}", 42); +``` + +As a complement to `cout`, the stream input object `cin` (an abbreviation of "Character Input") overloads `>>` (the *stream extraction operator*) to allow variables to be set from user input. 
When a `cin` input expression is reached, the program waits (indefinitely) for the user to type some input and press Enter. The following program outputs a message inviting the user to enter a number, and then prints this number out again on the console. Before `cin` is used, the variable to be used to accept the input into must have already been defined so that the type of the required input can be deduced. Providing an initial value is preferred (empty braces give it the default value, zero in this case) in case the read by `cin` fails due to either invalid input, such as the user typing letters where digits were required, or end-of-input (Ctrl-D, or Ctrl-Z under Windows): ```cpp // 03-age1.cpp : get and then display an integer @@ -20,15 +34,15 @@ int main() { } ``` -Use of `cin` from the user's perspective has a few quirks. Perhaps usefully, whitespace (any spaces, tabs or preceding new-lines) is ignored, while perhaps not so usefully, non numerical input is (silently) evaluated to the number zero. Also, the program makes no checks on the range of the input, so numbers such as `200` and `-50` are accepted without complaint, and printed out. In fact, the variable `alice_age` can be set to any value that can be held by type `int`; however the number must (usually) be entered as a decimal; the prefixes for binary, octal and hexadecimal are by default only interpreted at compile-time for literals within program code. +Use of `cin` from the user's perspective has a few quirks. Perhaps usefully, whitespace (any spaces, tabs or preceding new-lines) is ignored, while perhaps not so usefully, non numerical input is (silently) evaluated to the number zero. Also, the program makes no checks on the range of the input, so numbers such as `200` and `-50` are accepted without complaint, and printed out. 
In fact, the variable `alice_age` can be set to any value that can be held by type `int`; however the number must (usually) be entered as a decimal; the prefixes for binary, octal and hexadecimal are by default only interpreted at compile-time for literals within program code, or by conversion functions such as `from_chars()`. ## Conditions and if-else -The keyword `if` is followed by a *conditional expression* in (mandatory) parentheses, which always evaluates to `true` or `false` at run-time (these named boolean values are implicitly convertible both to and from integer `1` and `0` respectively). (To evaluate conditions at compile-time as well the construct `if constexpr` can be used; this is discussed later.) There are a number of symbols that are combined to represent mathematical conditions of equality, greater than, and so on. Some of these symbols together with their meanings are shown in the table below: +The keyword `if` is followed by a *conditional expression* in (mandatory) parentheses, which always evaluates to `true` or `false` at run-time (these named Boolean values are implicitly convertible both to and from integer `1` and `0` respectively). (To evaluate conditions at compile-time as well the construct `if constexpr` can be used; this is discussed later in this Chapter.) There are a number of symbols that are combined to represent mathematical conditions of equality, greater than, and so on. Some of these symbols together with their meanings are shown in the table below: | Symbol | Meaning | |:------:|:---------------------:| -| == | equal* | +| == | equal * | | != | not equal | | > | greater than | | < | less than | @@ -59,19 +73,21 @@ int main() { } ``` -Notice that the scopes for both the `if` and `else` *clauses* are delimited with `{` and `}`, and that indentation is used for the `cout` operations within them. 
Notice also that the `if` and the `else` keywords line up vertically, this style is recommended in order to enable in-editor code folding to work, amongst other reasons. In this program the braces for the `if` and `else` clauses are in fact optional because they comprise only a single statement each, however using braces even where not strictly needed is again strongly recommended in case extra code needs to be added to the clauses later (and because code folding often only works in editors where an opening brace exists). Braces for function **definitions**, including `main()`, are always mandatory, even in the case of single-statement or empty functions. Function **declarations**, by contrast, do not have braces; they are analogous to a C++ **statement** being followed by a semi-colon. +Notice that the scopes for both the `if` and `else` *clauses* are delimited with `{` and `}`, and that indentation is used for the `cout` operations within them. Notice also that the `if` and the `else` keywords line up vertically, this style is recommended in order to enable in-editor code folding to work, amongst other reasons. In this program the braces for the `if` and `else` clauses are in this case optional because they comprise only a single statement each, however using braces even where not strictly needed is again strongly recommended in case extra code needs to be added to the clauses later (and because code folding often only works in editors where an opening brace exists). + +Note: Braces for function **definitions**, including `main()`, are always mandatory, even in the case of single-statement or empty functions. Function **declarations**, by contrast, do not have braces; they are analogous to a C++ **statement** ending with a semi-colon. **Experiment** * What happens if you press Ctrl-D (Ctrl-Z then Enter under Windows) when prompted? Can you explain why this is? -* Change the program to test against non-zero using the "not equal" operator and a `0`. 
+* Change the program to test for non-zero using the "not equal" operator and a `0`. Does this work in the same way? -* Change the program again to test against zero, and change the output statements appropriately so the output remains correct. +* Change the program again to test for "equals" zero (as opposed to "not equal"), and change the output statements appropriately so the same logic remains. Is this program better? Consider whether the *happy path* should be satisfied by the first "if" clause (as opposed to "else"). -* Now alter the original program to test a floating-point (`double`) variable as being zero or non-zero. +* Now alter the original program to test a floating-point (`double`) variable as being zero or non-zero. Do you consider use of `0.0` as being better style? -* Delete the braces surrounding the `if` and `else` clauses. Does the code still compile? What happens if you added a second statement line to the `else` clause? Or the `if` clause? +* Delete the braces surrounding the `if` and `else` clauses. Does the code still compile? What happens if you add a second statement line to the `else` clause? Or the `if` clause? The `if` statement is a binary choice, however some decisions require more than two options. To enable this, `if` statements can be *chained* together. The following program chains a further `if` onto the tail of the first `else` clause. Note that in this special case, using braces for the first `else` clause is **not** recommended as this would indent the code. The combination `else if` (with mandatory space) is unambiguous to readers of your code; a second statement to the first `else` clause (which would necessitate braces) is unlikely to be needed as the (possibly itself chained) `if` which follows counts as a single statement. 
@@ -129,7 +145,7 @@ int main() { **Experiment** -* Change the above program so that the test logic is inverted with the output remaining the same, in other words `alice_age` falling outside the range 6-11 results in a positive condition test. Hint: you will need to use the `or` keyword and change the order of the output statements. +* Change the above program so that the test logic is inverted, while the output remains the same; in other words `alice_age` falling outside the range 6-11 results in a positive condition test. Hint: you will need to use the `or` keyword and change the order of the output statements. * Now change `and` to `&&` in the original program. Does it still compile and run? Which style do you prefer? @@ -176,23 +192,23 @@ int main() { } ``` -Notice that "getting" multiple variables from `cin` allows for the input of three values together, optionally separated by whitespace or newlines. This permissiveness can be useful in some cases but doesn't handle erroneous input very well so is often unsuitable to be used in production code (as error recovery involves clearing the error state, possibly losing input in the process). The four `case` statements each check for a valid integer (actually a character literal) stored in `op` and program flow jumps to the one that matches, if any. The `break` statements are necessary and cause control flow to jump to the closing brace of the switch block; if they were not present flow would *fall through* to the next `case` statement, which is rarely desirable. The `default` case statement is optional, and program flow always continues here if none of the `case` statements match, if it is not present the compiler will often produce a warning. +Notice that "getting" multiple variables from `cin` allows for the input of three values together, optionally separated by whitespace or newlines. 
This permissiveness can be useful in some cases but doesn't handle erroneous input very well so is often unsuitable to be used in production code (as error recovery involves clearing the error state, possibly losing input in the process). The four `case` statements each check for a valid integer (actually a character literal) stored in `op` and program flow jumps to the one that matches, if any. The `break` statements are necessary and cause control flow to jump to the closing brace of the switch block; if they were not present flow would *fall through* to the next `case` statement, which is rarely desirable. The `default` case statement is optional but usually desirable, and program flow always continues here if none of the `case` statements match; if it is not present the compiler will often produce a warning. -Notice also the use of `cerr` to output error messages to the *standard error stream*; by default `cerr` echos to the terminal (the same as for `cout`) but this output can be redirected at run-time to a text file (or a null device). The `if` test for zero divisor should be familiar syntax by now and prevents a possible floating-point exception. In this case, and in the case of an error, the result variable contains the default value zero. +Notice also the use of `cerr` to output error messages to the *standard error stream*; by default `cerr` echoes to the terminal (the same as for `cout`) but this output can be redirected at run-time to a text file (or a null device). The `if` test for zero divisor should be familiar syntax by now and prevents a possible floating-point exception. In this case, and in the case of an error caused by an invalid operator, the result variable `r` contains the default value zero. **Experiment** -* Change the type of input to `double` and make sure the program still compiles and runs correctly. +* Change the type of the input and result variables to `double` and make sure the program still compiles and runs correctly. 
-* Add a `case` clause for the exponentiation operator `'^'` which calls the function `pow(x,y)` (C++ has no built-in exponentiation operator, `'^'` in code actually means bitwise exclusive-or). Hint: you will need `#include <cmath>` and possibly also `-lm` on the link path. +* Add a `case` clause for the exponentiation operator `'^'` which calls the function `pow(x,y)` (C++ has no built-in exponentiation operator, `^` in code actually means bitwise exclusive-or). Hint: you will need `#include <cmath>` and possibly also `-lm` on the link path. -* Go back to using `int` variables and add the modulo operator `%` to the list of valid operators. You will need to add a suitable `case` clause. Note that this operation gives the remainder from a division, so divide-by-zero needs to be caught here as well. +* Go back to using `int` variables and add the modulo operator `%` to the list of valid operators. You will need to add a suitable `case` clause. Note: this operation gives the remainder from a division, so divide-by-zero needs to be caught here as well. -* Rewrite the case values as plain decimal integers, obtained from a table showing ASCII characteres against their numbers. Then try using hexadecimal values, and then octal values. +* Rewrite the case values as plain decimal integers, obtained from a table showing ASCII characters against their numbers. Then try using hexadecimal values, and then octal values. -* Rewrite the whole switch-case block as multiple if-else-if statements. +* Rewrite the whole switch-case block as multiple if-else-if... statements. Test all control-flow paths. -The need for `break` statements at the end of each `case` clause has already been mentioned. Occasionally the behavior of program flow falling through to the next case can be useful. More often, multiple `case` matches using the same code is the desired behavior. 
The following program demonstrates the former of these: +The need for `break` statements at the end of each `case` clause has already been mentioned, however occasionally the behavior of program flow falling through to the next case can be useful. More often, multiple `case` matches using the same code is the desired behavior. The following program demonstrates the former of these: ```cpp // 03-fallthrough.cpp : demonstrate case clauses without break @@ -224,7 +240,7 @@ int main() { } ``` -Notice that `case 1:` falls through into `case 2:`, and `case 0:` falls through into both of these. Some compilers will warn where `break` is missing from a `case` clause as it is a common programming mistake; this warning can be suppressed by writing `[[fallthrough]]` (this is a C++ *attribute*) where the compiler is expecting to find `break` (immediately before the next `case`). Using this attribute in the way shown here provides clarity to both human reader and compiler, it is not necessary where `case` statements follow on immediately with no code between. +Notice that `case 1:` "falls through" into `case 2:`, and `case 0:` falls through into both of these. Some compilers will warn where `break` is missing from a `case` clause as it is a common programming mistake; this warning can be suppressed by writing `[[fallthrough]]` (this is a C++ *attribute*) where the compiler is expecting to find `break` (immediately before the next `case`). Using this attribute in the way shown here provides clarity to both human reader and compiler; it is not necessary where `case` statements follow on immediately with no code between. **Experiment:** @@ -269,7 +285,7 @@ int main() { } ``` -Note that the parentheses around the **whole** conditional expression **are** needed as `<<` has a higher precedence than `?:`. +Note: the parentheses around the **whole** conditional expression **are** needed as `<<` has a higher precedence than `?:`. 
**Experiment** @@ -307,9 +323,9 @@ The variable defined in the initializer can optionally be used in the condition **Experiment** -* Try to use `n` and `m` after the closing brace of the `else` clause. +* Try to use `n` and `m` after the closing brace of the `else` clause. Which, if either, is possible? What does this tell you about the scope of an initializer-defined variable? -* Rewrite this program to use a `switch` statement instead of `if`. Pay close attention to the conditional expression needed. The new program should correctly handle all inputs and produce identical output to the one shown. +* Rewrite this program to use a `switch` statement instead of `if`. The new program should correctly handle all inputs and produce identical output to the one shown. Hint: you may want to use `case` statements which fall through. ## Constexpr if @@ -367,4 +383,4 @@ The table below is intended to be a complete list, and as such introduces operat | throw | right to left | exception throw expression | throw expression | | , | left to right | comma sequencing operator | expression, expression | -*All text and program code ©2019-2022 Richard Spencer, all rights reserved.* +*All text and program code ©2019-2025 Richard Spencer, all rights reserved.* diff --git a/04-functions.md b/04-functions.md index e6c03b7..b790fe1 100644 --- a/04-functions.md +++ b/04-functions.md @@ -1,8 +1,8 @@ -# Functions +# Functions ## Scopes -We have become familiar with the `main()` function, which is automatically called (or *entered*) when the program starts. Variables defined within `main()` have been called local variables because they are local to the scope of `main()`. Importantly they are **not** visible within any functions called by `main()`, even though they retain their state between such calls. The following program defines three variables and also three functions (one of which is `main()`); these three variables have the same name, but different types and values. 
The values of each of these variables are only accessible within the functions they are defined in, that is: the variables are only *visible* within their own defining function's *scope*. +We have become familiar with the `main()` function, which is automatically called (or *entered*) when the program starts. Variables defined within `main()` have been called local variables because they are local to the scope of `main()`. Importantly, they are **not** visible within any functions called by `main()`, even though they retain their state between such calls. The following program defines three variables and also three functions (one of which is `main()`); these three variables have the same name, but different types and values. The values of each of these variables are only accessible within the functions they are defined in, that is: the variables are only *visible* within their own defining function's *scope*. ```cpp // 04-scope.cpp : demonstrate function scope rules @@ -61,7 +61,7 @@ Local variables with the same name (but not necessarily the same type) as one in ## Return value -Functions are declared or defined with a type known to the compiler before the function name, the keyword `auto`, or the keyword `void` if there is none. This type can be a user-defined type as we shall discover later, or perhaps more commonly one of the built-in types such as `int`, `double` and so on. The value thus returned is known as the *return value*; its type is the *return type* of the function. In case of `auto`, the return type is deduced from the entity (entities) after the `return` statement(s); if there is more than one they must return values of the same type. The `return` keyword is implicit at the end of a `void` function; it can also be explicitly used (for example in an `if` clause) to exit from the function early. +Functions are declared or defined with a type known to the compiler before the function name, the keyword `auto`, or the keyword `void` if there is none. 
This type can be a user-defined type as we shall discover later, or perhaps more commonly one of the built-in types such as `int`, `double` and so on. The value thus returned is known as the *return value*; its type is the *return type* of the function. In case of `auto`, the return type is deduced from the entity (entities) after the `return` statement(s); if there is more than one they must return values of the same type. The `return` keyword is implicit at the end of a `void` function; it can also be explicitly used without a value (for example in an `if` clause) to exit from the function early. The `main()` function is always defined to return an `int` (it can also be `void` in C but this is not legal C++). Uniquely to `main()`, a `return 0;` statement is implicit at the function's closing brace. This causes a return value of zero (which indicates successful execution) to be returned to the calling environment or process; this value is sometimes called the *return code* of a program. Other values are used to indicate different error conditions encountered; a return code of either zero or non-zero is allowed at any point within `main()`, including at the end. @@ -90,7 +90,7 @@ int main() { } ``` -In fact, the call of `abs_value()` yielding its return value could be used directly in the second `cout` call, which means a named variable `a` is not needed. Using a (temporary) variable to store the return value of a function could be seen as unnecessary if the value is used only once, however if the return value of a function is needed more than once and is not stored in a variable, the function must be called multiple times which could become inefficient. +In fact, the call of `abs_value()` yielding its return value could be used directly in the second `cout` call, which means a named variable `a` is not needed. 
Using a (temporary) variable to store the return value of a function could be seen as unnecessary if the value is used only once, however if the return value of a function is needed more than once and is not stored in a variable, the function must be called every time its return value is needed, which could become inefficient. **Experiment** @@ -98,7 +98,7 @@ In fact, the call of `abs_value()` yielding its return value could be used direc * Modify `abs_value()` so that the keyword `else` is used. Does this make the code any more obvious in intent? Do you get a warning about there being no `return` keyword outside of the `if`-`else` clauses? What happens if you add a third `return` statement just before the function's closing brace? -* Rearrange the order of the definitions (all beginnning with `int`). What errors do you get? +* Rearrange the order of the variable and/or function definitions (all beginning with `int`). What errors do you get? ## Parameters by value @@ -146,7 +146,7 @@ The way the variable `value` is passed from `main()` to `abs_value()` is describ As we have seen, variables which are defined as references are not copies of existing variables, instead they are an alternative name, or *alias*, of a variable **which already exists**. References become particularly useful when defining them in a **different** scope to the variable they reference. As we have seen, a *callee* function cannot access local variables within the *caller* function, instead it can only reference global variables and variables passed as parameters. -Parameter variables an be defined as references by using a single ampersand (`&`) between the type and the variable name in the parameter list. This small and subtle change completely changes the semantics of the function. 
Changes to a **parameter** variable defined as a *pass by reference* will change the **argument** variable in the calling function, as shown in the following program: +Parameter variables can be defined as references by using a single ampersand (`&`) between the type and the variable name in the parameter list. This small and subtle change completely changes the semantics of the function. Changes to a **parameter** variable defined as a *pass by reference* will change the **argument** variable in the calling function, as shown in the following program: ```cpp // 04-absolute3.cpp : modify a parameter to become its absolute value @@ -187,7 +187,7 @@ The rule for declarations is that an object can be declared multiple times if al A function prototype (or *forward declaration*) is the minimum syntax that needs to have been "seen" before the function can be called. The syntax is simple, the return type, function name and types from the parameter list (the variable names are actually optional, but are often included) each with an optional default value, followed by a semi-colon. This declaration must match *exactly* with the function definition (apart from the presence of default values) for the code to compile and link correctly. The forward declaration of the most recent variant of `abs_value()` is simply: -``` +```cpp void abs_value(int& v); // Function declaration only, not a definition ``` @@ -203,7 +203,7 @@ void abs_value(int& v); // Function declaration only, not a definition ## Default arguments -Providing the wrong number of arguments in a function call always results in a compile-time error. (You may also get errors if the number of parameters in a function definition, or their types, don't match those in a previous function declaration. Unless the name, number of parameters, and their types match **exactly** they will be assumed to be different functions.) 
C++ provides a way for any or all of the parameters in a function call to be optional, and if not present in the argument list are substituted with default values provided in the function declaration only. (Providing them in the function **definition** is not sufficient or even allowed, for technical reasons, unless defined **before** the *call site* with no declaration used). +Providing the wrong number of arguments in a function call always results in a compile-time error. (You may also get errors if the number of parameters in a function definition, or their types, don't match those in a previous function declaration. Unless the number of parameters, and their types match **exactly** they will be assumed to be different functions; the names used are unimportant and can be different, or even omitted altogether, in function declarations.) C++ provides a way for any or all of the parameters in a function call to be optional, and if not present in the argument list are substituted with default values provided in the function declaration only. (Providing them in the function **definition** is not sufficient or even allowed, for technical reasons, unless defined **before** the *call site* with no declaration used). The following program uses *head recursion* to print out a number in any base up to 16 (defaulting to base 10): @@ -243,7 +243,7 @@ This is the most complex program we have seen so far, although it does not conta * The function **declaration** for `print_base_n()` contains `= 10`. This is the *default value* for the second argument, which is substituted at the appropriate point in the parameter list, if necessary. For example, a function call `print_base_n(1021)` is substituted by `print_base_n(1021, 10)`; this substitution takes place at compile-time. 
-* The *recursive* function `print_base_n()`, so called because it conditionally calls itself, checks whether or not we are dealing with the **most** significant digit, calling itself **without** the **least** significant digit otherwise (and also with the second parameter it received). In Modern C++, recursive functions can be used without a prototype having already been seen. +* The *recursive* function `print_base_n()`, so called because it conditionally calls itself, checks whether or not we are dealing with the **most** significant digit, calling itself **without** the **least** significant digit otherwise (and also with the second parameter it received). In Modern C++, recursive functions can be used without a prototype (declaration) having already been seen. * The `cout` line outputs a single character which is an index into a string literal of the **least** significant digit (square brackets `[` and `]` are the array index operators, and we are indexing a string literal as if it were an array, which is perfectly legal C++). @@ -253,7 +253,7 @@ If you're struggling to follow the control flow through the recursion then imagi * Remove the variable names `num` and `base` from the declaration of `print_base_n()`. Does the program still compile? What happens if you choose other names instead? -* Make sure the program works correctly by checking with binary, octal and hexadecimal **literals**, and bases 2, 8 and 16 **at run time** respectively. +* Make sure the program works correctly by checking with binary, octal and hexadecimal **literals**, and bases 2, 8 and 16 **at run time** respectively. (Use of `static_assert()` is not possible because of the use of side-effect producing `cout`.) * Modify the program again, so that numbers printed out in up to base 64 are supported. 
@@ -294,7 +294,7 @@ f(): recieved int: 2 g(): recieved double: 2.5 ``` -Notice that the call `g(1)` promotes the `int` argument to `double` silently, although this is not apparent when printing the number (it doesn't print as `1.0`, but could be made to with stream formatting manipulators, see Chapter 8). Also, notice that the call `f(2.5)` narrows the `double` argument to `int`, so the fractional part is lost. +Notice that the call `g(1)` promotes the `int` argument to `double` silently, although this is not apparent when printing the number (it doesn't print as `1.0`, but could be made to with stream formatting manipulators, see Chapter 8). Also, notice that the call `f(2.5)` silently narrows the `double` argument to `int`, so the fractional part is lost. It is possible to write code that disallows narrowing casts by using universal references and perfect forwarding but demonstrating this is beyond the scope of this Tutorial. You should be aware that in general functions calls may silently produce narrowing effects, however some implicit conversions (such as pointer to integer or floating-point number) are not allowed. @@ -333,7 +333,7 @@ f(): int: 1 f(): double: 2.5 ``` -The function to be used is determined at compile-time from the usage at the call site, as the types of the arguments are always known. A best-match is performed in the case of no exact match, so for example `f('a')` would call `f(int)` while `f(0.5f)` would call `f(double)`. +The function to be used is determined at compile-time from the usage at the call site, as the types of the arguments are always known. A "best-match" is performed in the case of no exact match, so for example `f('a')` would call `f(int)` while `f(0.5f)` would call `f(double)`. 
**Experiment** @@ -346,12 +346,12 @@ Variables declared `static` inside a function body are in fact global variables ```cpp // 04-static-var.cpp : preserving function state in a static variable -#include <iostream> +#include <print> using namespace std; void f() { static int s{1}; - cout << s << '\n'; + println("{}", s); ++s; } @@ -380,7 +380,7 @@ Static local variables are slightly deprecated in C++ because they are not *thre Variables declared `thread_local` within a function have a new copy of the variable created upon launching a new thread, which is independent from others within the calling thread or any other thread. Since the way in C++ to launch a new thread is to specify a function to be called, this behavior is useful in multi-threaded programs. Further discussion of *parallelism* is beyond the scope of this Tutorial. (Variables can also be declared both `static` and `thread_local`.) -Functions can be declared `static` by prefixing the return type in the function declaration and definition with the keyword `static`. As with global variables, this reduces the visibility of the function to the translation unit it is defined within. More useful in most cases are `inline` functions, described later. +Functions can be declared `static` by prefixing the return type in the function declaration and definition with the keyword `static`. As with global variables, this reduces the visibility of the function to the translation unit it is defined within. More useful in most cases are `inline` functions, described later in this Chapter. ## Structured bindings @@ -419,18 +419,18 @@ There are three main new things to notice about this program. **Experiment** -* Make `get_numbers()` return three variables, the third being `unsigned`. Hint: you will need to use `return tuple{ d, i, u };` or similar, use `#include <tuple>`. +* Make `get_numbers()` return three variables, the third being `unsigned`. Hint: you will need to use `return tuple{ d, i, u };` or similar. Hint: use `#include <tuple>`. 
* Rewrite `get_numbers()` to accept and modify two reference parameters, and return results to `main()` in this way. ## Inline functions -Functions can be declared as inline functions by using the keyword `inline` before the return type in the function definition. The main aim of declaring a function `inline` is to remove the time overhead of a function call; the code is replicated for each function call *in place* at the call site(s). Functions declared with `inline` must be present (and identical) in each translation unit that uses them, hence they often appear in header files; this is a special relaxation of the ODR. Overuse of inline functions can lead to *code-bloat*, so they are best reserved for very short functions. The following program demonstrates use of the `inline` keyword: +Functions can be declared as inline functions by using the keyword `inline` before the return type in the function definition. The main aim of declaring a function `inline` is to remove the time overhead of a function call; the function body's code is allowed to be replicated for each function call *in place* at the call site(s). Functions declared with `inline` must be present (and identical) in each translation unit that uses them, hence they often appear in header files; this is a special relaxation of the ODR. Overuse of inline functions can lead to *code-bloat*, so they are best reserved for very short functions. 
The following program demonstrates use of the `inline` keyword: ```cpp // 04-inline.cpp : use of an inline function -#include <iostream> +#include <print> using namespace std; inline void swap(int& x, int& y) { @@ -441,9 +441,9 @@ inline void swap(int& x, int& y) { int main() { int a = 1, b = 2; - cout << "(1) a = " << a << ", b = " << b << '\n'; + println("(1) a = {}, b = {}", a, b); swap(a, b); - cout << "(2) a = " << a << ", b = " << b << '\n'; + println("(2) a = {}, b = {}", a, b); } ``` @@ -504,7 +504,7 @@ Note that it is **not** necessary (or even possible) to use `if constexpr` for t ## Non-returning and noexcept functions -It is possible to write a function which never returns, for example using an infinite loop. Another example might be a function that causes an abnormal early exit from the running program; the Modern C++ way of doing this is to throw an exception, or even call `std::terminate()` directly (the C Standard Library also provides `abort()`, `exit()` and `quick_exit()` but these do not deallocate all global objects correctly). The way to indicate this property to the compiler is to use the `[[noreturn]]` attribute when declaring the function, as shown in this example program: +It is possible to write a function which never returns, for example using an infinite loop. Another example might be a function that causes an abnormal early exit from the running program; the Modern C++ way of doing this is to throw an exception, or even to call `std::terminate()` directly (the C Standard Library also provides `abort()`, `exit()` and `quick_exit()` but these do not deallocate all global objects correctly).
The way to indicate this property to the compiler is to use the `[[noreturn]]` attribute when declaring the function, as shown in this example program: ```cpp // 04-noreturn.cpp : program which does not return from main() @@ -534,27 +534,27 @@ The keyword `noexcept` is used to declare that a function is guaranteed to not t ```cpp // 04-noexcept.cpp : a noexcept function throwing an exception -#include <iostream> +#include <print> #include <stdexcept> using namespace std; -int throw_if_zero(int i) noexcept { +void throw_if_zero(int i) noexcept { if (!i) { throw runtime_error("found a zero"); } - cout << "throw_if_zero(): " << i << '\n'; + println("throw_if_zero(): {}", i); } int main() { - cout << "Entering main()\n"; + println("Entering main()"); try { throw_if_zero(1); throw_if_zero(0); } - catch(...) { - cout << "Caught an exception!\n"; + catch(exception& e) { + println("Caught an exception: {}", e.what()); } - cout << "Leaving main()\n"; + println("Leaving main()"); } ``` @@ -562,4 +562,4 @@ int main() { * Remove the `noexcept` keyword. Does the program compile? What is the output when run? -*All text and program code ©2019-2022 Richard Spencer, all rights reserved.* +*All text and program code ©2019-2025 Richard Spencer, all rights reserved.* diff --git a/05-arrays-pointers-and-loops.md b/05-arrays-pointers-and-loops.md index 57c02a0..df874a9 100644 --- a/05-arrays-pointers-and-loops.md +++ b/05-arrays-pointers-and-loops.md @@ -2,17 +2,17 @@ ## Number and character arrays -A C++ array can be described as a collection of entities *of the same type* and arranged *contiguously* in memory. C++ inherits its *built-in array* syntax from C, sometimes these are referred to as *C-style* arrays. Uniform initialization syntax can be used to assign the contents of an array at the point it is defined (and **only** at this point). 
This is called *aggregate initialization* using a *braced initializer* (the equals sign shown below is in fact optional): +A C++ array can be described as a collection of entities *of the same type* arranged *contiguously* in memory. C++ inherits its *built-in array* syntax from C, and sometimes these are referred to as *C-style* arrays. Uniform initialization syntax can be used to assign the contents of an array at the point it is defined (and **only** at this point). This is called *aggregate initialization* using a *braced initializer* (the equals sign shown below is in fact optional, as for uniform initialization in general): -``` +```cpp int numbers[] = { 1, 2, 3, 4, 5 }; ``` -Notice that the type is `int[]` ("array of `int`"), however the square brackets *bind* to the variable name, in this case `numbers`, not to the type specifier, in this case `int`. The optional number between the square brackets (which must be a **constant** known at compile-time) is the length of the array; this is fixed at compile-time and cannot be changed at run-time. If no value is provided here then it is calculated from the number of *elements* which make up the initializer (in this case the value is 5). The array size must be **at least** as large as the initializer being assigned from, otherwise a compile-time error is produced. If the size of the array is given as greater than the number of elements in the initializer, the remaining elements are default-constructed (zeroized for the built-in types). +Notice that the type is `int[]` ("array of `int`"), however the square brackets *bind* to the variable name, in this case `numbers`, **not** to the type specifier, in this case `int`. The optional number between the square brackets (which must be a **constant** known at compile-time, if present) is the length of the array; this is fixed at compile-time and cannot be changed at run-time. 
If no value is provided here then it is calculated from the number of *elements* which make up the initializer (in this case the value is 5). If provided, the array size must be **at least** as large as the initializer being assigned from, otherwise a compile-time error is produced. If the size of the array is given as greater than the number of elements in the initializer, the remaining elements are default-constructed (zeroized for the built-in types). The array variable `numbers` is writable through subscripting syntax using square brackets `[` and `]`. The array index starts at **zero** (`[0]`) for the first element. Attempting to read or write beyond the last element is undefined behavior, as is use of negative indices. -``` +```cpp numbers[4] = 6; // ok, numbers[] is { 1, 2, 3, 4, 6 } numbers[5] = 99; // not ok, compiles but yields undefined behavior auto i = numbers[0]; // ok, i is 1 @@ -21,21 +21,21 @@ auto j = numbers[-1]; // not ok, compiles but yields undefined behavior A string literal can be thought of as simply an array of characters, thus a string literal can be used to initialize an array of `char`: -``` +```cpp char name[] = "Dinah"; ``` -This type of array is modifiable, so individual letters can be changed using array indexing syntax. (Actually fact the variable contents are writable is not without overhead; the string literal used to initialize the array is stored in a read-only part of the executable binary and is copied into the newly-allocated array at run-time.) A terminating zero-byte is also added to the array, so the array length implicit inside the square brackets is 6, not 5. +This type of array is modifiable, so individual letters can be changed using array indexing syntax. (Actually, the fact that the variable contents are writable is not without overhead; the string literal used to initialize the array is stored in a read-only part of the executable binary and is copied into the newly-allocated array at run-time.) 
A terminating zero-byte is also added to the array, so the array length implicit inside the square brackets is 6, not 5. A braced initializer can also be used with character literals as elements, so the same result could be achieved by using: -``` -char name2[] = { 'D', 'i', 'n', 'a', 'h', '\0' ); +```cpp +char name2[] = { 'D', 'i', 'n', 'a', 'h', '\0' }; ``` This time the terminating zero-byte has to be explicitly specified, if it is desired; both `name` and `name2` are safe to be put to streams such as `cout` as they each have this terminating zero-byte. A single element of each of these variables could also be output, and would produce the same output as a character literal: -``` +```cpp cout << name << '\n'; // outputs "Dinah" followed by new-line cout << name[0]; // outputs "D" ``` @@ -48,7 +48,7 @@ Here, `size(name)` would return 6, while `size_bytes(numbers)` would return 20, We've previously seen that string literals can be output using `cout` without concerning ourselves with the details. The built-in `for` command can be used over a *range of values* applying the same operation(s) to each one in turn. This type of `for` statement is known as a *range-based for loop*, or range-for for short. -A range-for statement can have two or three parts enclosed by parentheses. The initializer statement (the same as for `if` and `switch` statements) is the optional first part, and is followed, if present, with a **semi-colon**. Then follows the *for-loop variable* definition, which can be declared with either `auto` or with an explicit type, and with optional `const` (constant) and `&` (reference) or `&&` (universal reference) semantics. Then a **colon** separates this from the expression to be *iterated* over, known as the *range expression*. This program demonstrates simple use of a two-part range-for (without an initializer): +A range-for statement can have either two or three parts enclosed by parentheses. 
The initializer statement (the same as for `if` and `switch` statements) is the optional first part, and is followed, if present, with a **semi-colon**. Then follows the *for-loop variable* definition, which can be declared with either `auto` or with an explicit type, and with optional `const` (constant) and `&` (reference) or `&&` (universal reference) qualifiers. Then a **colon** separates this from the expression to be *iterated* over, known as the *range expression*. This program demonstrates simple use of a two-part range-for (without an initializer): ```cpp // 05-range-for.cpp : print a string literal vertically @@ -63,7 +63,7 @@ int main() { } ``` -Here the for-loop variable `c` is deduced (due to the use of `auto`) to be of type `char`, the type of a single element of the range expression `"Dinah"`. The contents of the variable is actually a *copy* of a single element in the range expression; if `auto&` were used instead it would be a reference to a single element within the range expression. The for-loop variable is then sent to `cout` as a single character literal. +Here the for-loop variable `c` is deduced (due to the use of `auto`) to be of type `char`, the type of a single element of the range expression `"Dinah"`. The contents of the variable is actually a *copy* of a single element in the range expression; if `auto&` were used instead it would be a reference to a single element within the range expression, and assignment **to** it would mutate the range expression itself. The for-loop variable is then sent to `cout` as a single character literal. **Experiment** @@ -73,36 +73,36 @@ Here the for-loop variable `c` is deduced (due to the use of `auto`) to be of ty * Now change back to use of `auto` and try using the other types of string literal (recall the prefixes `u8`, `u` and `U`). Does the program still compile and produce the correct output? What about when non-UTF7 characters are used in the string literal? 
-* Add an initializer statement of type `bool` and make the program output `D,i,n,a,h` using the same range expression. Hint: you might need to use `if` statements +* Add an initializer statement of type `bool` and make the program output `D,i,n,a,h` using the same range expression. Hint: you might need to use `if` statements. * Now declare a separate variable as an array of `char` and use this named variable as the range expression. -* Declare `c` as a reference variable. Does the program still compile? What could be the use of this when using a writable range expression? +* Declare `c` as a reference variable. Does the program still compile? What could be the use of this when using a non-`const` range expression? -In fact, range-for loops can be used with any type of array or container which supports `std::begin()` and `std::end()`, not just built-in types. However, further discussion of creating your own types which can be iterated over in this way is beyond the scope of this Tutorial. +In fact, range-for loops can be used with any type of array or container which supports `std::begin()` and `std::end()` (or has member functions with these names), not just built-in types. However, further discussion of creating your own types which can be iterated over in this way is beyond the scope of this Tutorial. ## Pointers We have learned that subscripting syntax can be used with string literals and built-in arrays. You may be surprised to learn that subscripting also works with pointers. So what exactly is a pointer in C++? -A pointer is a variable that holds a machine address, therefore on most modern machines it is a 64-bit value. Pointers can be `const` or point to `const` data, or both; they can also be typed or untyped (subscripting only works on typed pointers). In addition they can hold the value `nullptr` instead of a (hopefully valid) memory address. +A pointer is a variable that holds a machine address, and is therefore on most modern machines a 64-bit value. 
Pointers can be `const` or point to `const` data, or both; they can also be typed or untyped (subscripting only works on typed pointers). In addition they can hold the value `nullptr` which safely indicates an invalid memory address. Assigning a string literal directly to a variable declared with `auto` actually assigns a pointer to the first character of the (read-only) string literal. Thus the following two assignments are identical: -``` +```cpp auto s1 = "Dinah"; const char *s2 = "Dinah"; ``` In each case subscripting syntax can be used, in this case from zero up to five, and individual elements can be compared or output. Directly comparing the values of the two pointers, compares the memory addresses, not the value(s) they point to, as shown here: -``` +```cpp if (s1[0] == s2[0]) { /*...*/ } // condition test evaluates to true if (s1 == s2) { /*...*/ } // condition test (probably) evaluates to false // (your compiler may optimize the two data entities into one) ``` -Pointer variables are defined using an asterisk in all cases (it is optional when using `auto`). An asterisk is also used to *dereference* a pointer, that is access the value it "points to". The following program defines a variable `i` and a pointer `p` that points to it (that is `p` holds `i`'s machine address). The type of `i` is `int` while the type of `p` is `int*`. A variable `j` is used to hold user input: +Pointer variables are defined **using an asterisk in all cases** (except that it is optional when using `auto`). An asterisk is also used to *dereference* a pointer, that is access the value it "points to". The following program defines a variable `i` and a pointer `p` that points to it (that is `p` holds `i`'s machine address). The type of `i` is `int` while the type of `p` is `int*`. 
A variable `j` is used to hold user input: ```cpp // 05-pointer.cpp : write a variables value through a pointer @@ -130,7 +130,7 @@ Please enter an integer: 10 (2) p = 0x7ffd3082cf04, *p = 10, i = 10 ``` -In this program the definition `int *p;` makes `p` a pointer to an `int`. At this point in the program it has not been assigned to, and is therefore an *uninitialized pointer*. The syntax `&i` means address-of `i` (the memory address of **any** variable can be obtained by preceding it with an ampersand in this way) and this value is assigned to `p`. Be careful not to confuse this with the definition of a reference, where the ampersand is on the opposite side of the equals sign. +In this program the definition `int *p;` makes `p` a pointer to an `int`. At this point in the program it has not been assigned to, and is therefore an *uninitialized pointer*. (We could have explicitly or implicitly initialized it with `nullptr`, if desired.) The syntax `&i` means address-of `i` (the memory address of **any** variable can be obtained by preceding it with an ampersand in this way) and this value is assigned to `p`. Be careful not to confuse this with the definition of a reference, where the ampersand is **on the opposite side** of the equals sign. The value of the entity `p` points to can be output by sending `*p` to `cout`. Changing `*p` also changes `i`; this behavior might surprise you, it's almost as if `i` has changed without permission. As can be seen from the output of this program, `p` has the same value throughout, while `*p` and `i` change together. @@ -167,15 +167,15 @@ int main() { } ``` -A few new things to notice about this program. +A few new things to notice about this program: -* The size of the array called `str[]` is set by an integer constant, and this value is needed twice more in the program where it is accessible as `size(str)`, a compile-time value. 
An alternative way would be to use a constant or macro at each point the value is needed, and this is the only way to avoid repeated *magic constants* in older versions of C++ which did not provide `std::size()`. +* The size of the array called `str[]` is set by an integer constant, and this value is needed twice more in the program where it is accessible as `size(str)`, a compile-time value. An alternative way would be to use a constant or macro at each point the value is needed, and this was the only way to avoid repeated *magic constants* in older versions of C++ which did not provide `std::size()`. -* The function `cin.getline()` (actually the dot `.` indicates that `getline` is a *member function* of `cin`, more on these later) is called to read keyboard input into `str[]`. This reads input directly into the memory location provided by the first argument up to a maximum number of characters (including the zero terminator) as set by the second argument. A new-line character ('\n') is never stored and any extra input which doesn't fit into `str[]` is saved for future calls to `cin`. +* The function `cin.getline()` (actually the dot `.` indicates that `getline()` is a *member function* of `cin`, more about these in Chapter 6) is called to read keyboard input into `str[]`. This reads input directly into the memory location provided by the first argument up to a maximum number of characters (including the zero terminator) as set by the second argument. A new-line character `'\n'` is **never** stored and any extra input which doesn't fit into `str[]` is saved for future calls to `cin`. -* The pointer `p` is set to the first character of `str[]`, and the type `const char *` specifies that we do not wish to modify it. In fact assigning an array to a (correctly typed) pointer is an implicit conversion, known as *array decay* because the size attribute is "lost". 
This also occurs when calling a function using an array as an argument, to either an array **or** pointer parameter. (In the same manner as `i` and `j` for temporary `int` variables, `p` is a common name for a pointer.) +* The pointer `p` is set to the first character of `str[]`, and the type `const char *` specifies that we do not wish to modify **what it points to**. In fact assigning an array to a (correctly typed) pointer is an implicit conversion, which is known as *array decay* because the size attribute is "lost". This also occurs when calling a function using an array as an argument, to either a pointer **or** (non-sized) array parameter. (In the same manner as when using `i` and `j` for temporary `int` variables, `p` is a common name for a pointer.) -* The **dereferenced** value `*p` is checked against zero by the `while` loop condition test, and if it is non-zero then it is sent to `cout` by the body of the `while` loop. The increment of `p` (actually a pre-increment, the one you should prefer given a choice) is necessary to prevent an infinite loop outputting the first character of `str[]`. This changes the value of `p` by one in order to point it towards the next character (of `str[]`), a process which repeats until the terminating zero-byte is reached. Importantly, `str[]` is left unchanged and remains able to be used again; this is the motivation for assigning `str` to a different (mutable) pointer variable `p`. +* The **dereferenced** value `*p` is checked against zero by the `while` loop condition test, and if it is non-zero then it is sent to `cout` by the body of the `while` loop. The increment of `p` (actually a pre-increment, the one you should prefer given a choice) is necessary to prevent an infinite loop outputting the first character of `str[]`. This increases the value of `p` by one in order to point it towards the next character (of `str[]`), a process which repeats until the terminating zero-byte is reached. 
Importantly, `str[]` is left unchanged and remains able to be used again; this is the motivation for assigning `str` to a different (mutable) pointer variable `p`, instead of using `++str`. **Experiment** @@ -189,7 +189,7 @@ A few new things to notice about this program. ## For loops -A standard `for` loop is similar to a `while` loop in that it has a pre-condition test. A common historical use for `for` loops is to iterate over an array using subscript syntax, rather that pointers. A `for` loop has three parts enclosed within parentheses, any of which can be empty, each part separated by a semi-colon. The first part is an initializer, as in a three clause range-for loop. This typically initializes a single variable known as the *loop counter*, whose scope is the body of the `for` loop (only). The second part is the condition test, which functions exactly the same way as that in a `while` loop; if empty it evaluates to `true`, which causes an *infinite loop*. The third part is an iteration statement to be executed **after** each time the body of the loop has been executed. +A standard `for` loop is similar to a `while` loop in that it has a pre-condition test. A common historical use for `for` loops is to iterate over an array using subscript syntax, rather than pointers. A `for` loop has three parts enclosed within parentheses, any of which can be empty, each part (empty or otherwise) separated by a semi-colon. The first part is an initializer, as in a three clause range-for loop. This typically initializes a single variable known as the *loop counter*, whose scope is the body of the `for` loop (only). The second part is the condition test, which functions exactly the same way as that in a `while` loop; if empty it evaluates to `true`, which causes an *infinite loop*. The third part is an iteration statement to be executed **after** each time the body of the loop has been executed.
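As a rough sketch of how the three parts of a `for` loop map onto an equivalent `while` loop, the two functions below add up the first `n` elements of an array. The names `sum_for()` and `sum_while()` are hypothetical helpers chosen for this illustration and are not part of the tutorial's own programs:

```cpp
// for-loop-parts.cpp : sketch comparing the three parts of a for loop with a while loop
// (sum_for() and sum_while() are hypothetical helper names used for illustration only)

int sum_for(const int *a, int n) {
    int total = 0;
    for (int i = 0; i != n; ++i) { // initializer; condition test; iteration statement
        total += a[i];
    }
    return total;                  // i is no longer in scope here
}

int sum_while(const int *a, int n) {
    int total = 0;
    int i = 0;                     // initializer, moved before the loop
    while (i != n) {               // condition test
        total += a[i];
        ++i;                       // iteration statement, moved to the end of the body
    }
    return total;                  // i is still in scope here
}
```

Given `int a[]{ 9, 8, 7, 6, 5, 4 };`, both `sum_for(a, 6)` and `sum_while(a, 6)` should return `39`. The main practical difference is the scope of the loop counter, which is confined to the loop in the `for` version.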
The following program defines and assigns to an array of `int` called `a`, and outputs each element in turn (on the same line), by subscripting the array with loop counter `i`: @@ -214,15 +214,15 @@ Output from this program: 9 8 7 6 5 4 ``` -Notice that the *loop counter* `i` is initialized to zero and has this value on the first pass through the `for` loop. The test `i != 6` is true exactly `6` times (with `i` having values in turn of: `0`, `1`, `2`, `3`, `4`, `5`); this matches **all** of the the valid array indices of `a[]`. Use of `i != 6` in this way is considered better C++ programming style than `i < 6`, or even worse `i <= 5` (neither of which is actually any safer in practice). This program produces trailing space in its output, which isn't ideal, but we'll ignore this defect for now. The last statement in `main()` outputs a newline, and being outside of the body of the loop is executed only once. +Notice that the *loop counter* `i` is initialized to zero and has this value on the first pass through the `for` loop. The test `i != 6` is true exactly `6` times (with `i` having the values `0`, `1`, `2`, `3`, `4` and `5` in turn); this matches **all** of the valid array indices of `a[]`. Use of `i != 6` in this way is usually considered better C++ programming style than `i < 6`, or the even worse `i <= 5` (neither of which is actually any "safer" in practice). This program produces a trailing space in its output, which isn't ideal, but we'll ignore this defect for now. The last statement in `main()` outputs a newline, and being outside of the body of the loop is executed only once. **Experiment** * Change both the size of `a[]` and the number in the condition test to `10`, without altering the braced initializer. What do you notice about the output? Can this be relied upon? -* Now change the condition test to automatically track the size of the array. Hint: use `size(a)`. +* Now change the condition test to automatically track the size of the array.
Hint: use `std::size()`. -* Rewrite the program to use a `while` loop instead of `for`. What similarities do you notice? +* Rewrite the program to use a `while` loop instead of `for`. What similarities do you notice? What is the scope of the loop counter? * Write a program to accept five `double`'s as user input into a suitable array and then print them out on separate lines. Hint: use two (non-nested) `for` loops @@ -251,19 +251,21 @@ A few things to note about this program: * There are two variables defined and initialized in the loop initializer, therefore there is no clear distinction as to which, if either, is the loop counter. +* The condition test uses `<=` (against previous advice), which also has the effect of indicating the loop counter to be `i`. + * The construct `++i, --j` uses the *sequencing operator* (comma) to sneak two operations in where only a single statement is allowed. This use of comma is rare; another possible use is in ternary expressions. * The add-assign operator (`+=`) is used as a shorthand for an assignment to self followed by an addition. It is often used in C++, as well as other operator-assign expressions. **Experiment** -* Find two different variants of the condition test (`i <= 12`) that continue to work correctly. +* Find two different alternatives to the condition test (`i <= 12`) that continue to work correctly. * Find a way to dispense with the variable `j`. Consider whether this version is clearer to understand. ## Do-while loops -A `do`-`while` loop is unique in having a *post-condition test*; thus the loop body is guaranteed to execute at least once. The loop begins with the `do` keyword followed immediately by the loop body (which would usually be delimited by braces). The loop ends with the `while` keyword followed by the loop post-condition test in parentheses and then a (mandatory) trailing semi-colon. 
+A `do`-`while` loop is unique in having a *post-condition test*; thus the loop body is guaranteed to execute **at least once**. The loop begins with the `do` keyword followed immediately by the loop body (which would usually be delimited by braces). The loop ends with the `while` keyword followed by the loop post-condition test in parentheses and then a (mandatory) trailing semi-colon. Do-while loops are similar to "repeat-until" loops of other languages, except that the post-condition test is logically inverted. Use of `do` and `while` has been criticised in the past, mainly because the indentation of the body of the loop is visually misleading; it is always executed but could be interpreted as not being so from a casual glance. A better result can usually be achieved using a `while` loop and some duplication of code. @@ -285,7 +287,7 @@ int main() { } ``` -The variable `i` is defined before the loop, so it is still in scope after the loop completes. The `do`-`while` loop then repeats indefinitely until a negative number has been entered. To provide a comparison, this is an exactly equivalent program, written with a regular `while` loop instead: +The variable `i` is defined before the loop, so it is still in scope after the loop completes. The `do`-`while` loop then repeats indefinitely until a negative number has been entered. To provide a comparison, here is an exactly equivalent program, written with a regular `while` loop instead: ```cpp // 05-not-do-while.cpp : alternative to post-condition loop @@ -305,11 +307,11 @@ int main() { } ``` -Notice that this pre-condition test (after the `while` keyword) is identical to the previously used post-condition test. Also, the regular `while` loop version offers the opportunity for an addition message such as `"Invalid input! Please try again: "` to be printed in order to aid the user should they get the first input wrong. 
+Notice that this pre-condition test (after the `while` keyword) is identical to the previously used post-condition test. Also, the regular `while` loop version offers the opportunity for an alternate message such as `"Invalid input! Please try again: "` to be printed, in order to aid the user should they get the first input attempt wrong. **Experiment** -* Write a program to output a count-down from a user-entered positive integer down to zero using two `do`-`while` loops. +* Write a program to output a countdown from a user-entered positive integer down to zero using two `do`-`while` loops. ## Break and continue @@ -350,13 +352,15 @@ Notice that no output is produced when entering a negative number, and that the **Experiment** -* Write a program which uses a regular `for` loop with an empty condition test, and an increment operator as the iteration expresssion, which outputs all the even numbers between zero and 20 (inclusive) +* Does the order of the `if` clauses make a difference in this program? Is there a motivation to use `else if`? + +* Write a program which uses a regular `for` loop with an empty condition test, and an increment operator as the iteration expression, which outputs all the **even** numbers between zero and 20 (inclusive). * Write a program which asks for a positive integer, and outputs all positive even numbers between zero and this number (inclusive, if the input is even). ## Array decay and pointer arithmetic -It is possible for a function to accept built-in array as a parameter, however any size information previously known to the compiler is lost. Therefore there is no advantage in declaring the parameter as an array type, as opposed to a pointer type. The following program demonstrates two functions which are equivalent: +It is possible for a function to accept a built-in array as a parameter, however any size information previously known to the compiler is lost.
Therefore there is no advantage in declaring the parameter as a (non-sized) array type, as opposed to a pointer type. The following program demonstrates two functions which are equivalent: ```cpp // 05-array-decay.cpp : demonstrate equivalence of pointer vs array parameters @@ -386,11 +390,11 @@ int main() { A couple of things to note about this program: -* As a constant string literal is passed, both functions need the parameter to be qualified with `const`. This means that the variable `*s` cannot modify what it points to, although it can itself be modified (for example, by being incremented as shown here). It is possible for the pointer to be non-modifiable too, by utilizing a second `const` as in: `const char * const s`. +* As a constant string literal is passed, both functions **need** the parameter to be qualified with `const`. This means that the pointer `s` cannot be used to modify what it points to, although it can itself be modified (for example, by being incremented as shown here). If desired, it is possible for the pointer to be non-modifiable too, by utilizing a second `const` as in: `const char * const s`, however the ability to modify it is needed by both these functions. -* The "array" variable accepted by `print_arr()` is able to be dereferenced and incremented in exactly the same way as the pointer accepted by `print_ptr()`. Once either has been modified, the original reference is lost; notice that the bodies of both functions are identical. +* The non-sized array variable accepted by `print_arr()` is able to be dereferenced and incremented in exactly the same way as the pointer accepted by `print_ptr()`. Once either has been modified, the original reference is lost; notice that the bodies of both functions are identical. -It should be clear that when passing an array to a function, only a pointer to the first element is in fact passed.
Thus it is similar in concept to pass-by-reference, that is a function which modifies an array passed to it also modifies the same entity as visible in the calling function. +It should be understood that when passing an array to a function, only a pointer to the first element is in fact passed. Thus it is similar in concept to pass-by-reference, that is a function which modifies an array passed to it also modifies the same entity as visible in the calling function. **Experiment** @@ -429,7 +433,7 @@ int main() { * What happens if the length argument passed to the function is too small? Or too large? -* Swap the bodies of the two functions over. Does the program still compile? +* Swap the bodies of the two functions over. Does the program still compile and run correctly? * Add a second `const` qualifier to `s` in `print_ptr()`. Does the program still compile? Does this surprise you? @@ -479,7 +483,7 @@ We have seen the traversal of an array by comparing a pointer against zero (as y The following program outputs a list of integers stored within an array: ```cpp -// 05-begin-end.cpp : demostration of the use of begin() and end() +// 05-begin-end.cpp : demonstration of the use of begin() and end() #include using namespace std; @@ -506,7 +510,7 @@ A couple of things to note about this program: In fact, `begin()` and `end()` return pointer values for built-in arrays, these pointers actually contain the address of the first element and the address of "one past the last" element. When referencing arrays which are constant, `cbegin()` and `cend()` can be used, which return `const` pointers. The family is complemented with variants which access a "reversed" array. -The following table lists all eight members of the `begin()`/`end()` family, where `array[]` is the name of a built-in array with elements of any type, and `N` is `size(array)` (the number of elements). Note that `&array[N]` and `&array[-1]` **are** legal pointer values, but they must **never** be dereferenced. 
+The following table lists all eight members of the `begin()`/`end()` family, where `array[]` is the name of a built-in array with elements of any type, and `N` is `std::size(array)` (the number of elements). Note that `&array[N]` and `&array[-1]` **are** legal pointer values, but they must **never** be dereferenced. | Function name | Index Syntax | Pointer Syntax | |:-------------------:|:------------:|:---------------:| @@ -523,4 +527,4 @@ The following table lists all eight members of the `begin()`/`end()` family, whe * Now modify the program so that only the last element of the array is printed out, whatever size the array is. -*All text and program code ©2019-2022 Richard Spencer, all rights reserved.* +*All text and program code ©2019-2025 Richard Spencer, all rights reserved.* diff --git a/06-enums-and-structs.md b/06-enums-and-structs.md index f0d9c76..4a47c4e 100644 --- a/06-enums-and-structs.md +++ b/06-enums-and-structs.md @@ -2,17 +2,17 @@ ## Enumerations -Some variables belong to a small, defined set; that is they can have exactly one of a list of values. The `enum` type and its closely related `enum class` type each define a set of (integer) values which a variable is permitted to have. +Some variables belong to a small, **closed** set; that is they can have exactly one of a list of values. The `enum` type and its closely related `enum class` type each define a set of (integer) values which a variable is permitted to have. -Think of a complete pack of playing cards: each card has a suit and rank. Considering the rank first of all, this is how it can be represented and defined: +Think of a complete pack of playing cards: each card has a suit and rank. 
Considering the rank first of all, here is how it can be represented and defined in C++: ```cpp enum Rank : unsigned short { ace = 1, two, three, four, five, six, seven, eight, nine, ten, jack, queen, king, none = 99 }; ``` -The name of this type is `Rank`, by convention for a user-defined type this is in *SentenceCase*. Following the colon `:` is the *underlying type*; this **must** be a built-in integer type (`char` is also allowed) and defaults to `int` if not specified. Since we have specified `unsigned short` we can assign values from `0` to `65535` (most likely, however strictly speaking this is implementation dependent). Then, within curly braces are a list of comma-separated *enumerators*, each of which can optionally have values specified. We have set `ace = 1` instead of relying on the default value of zero for the first enumerator because it allows the internal value and representation to be the same. Subsequent enumerators take the next available value. +The name of this type is `Rank`, by convention for a user-defined type this is in *SentenceCase*. Following the colon `:` is the *underlying type*; this **must** be a built-in integer type (`char` is also allowed) and defaults to `int` if not specified. Since we have specified `unsigned short` we can assign values from `0` to `65535` (most likely, however strictly speaking this is implementation dependent). Then, within curly braces are a list of comma-separated *enumerators*, each of which can optionally have values specified. We have set `ace = 1` instead of relying on the default value of zero for the first enumerator because it allows both the internal value and its conceptual representation to be the same; although this is not mandatory it is good programming style. Subsequent enumerators take the next sequentially available value. -A variable of type `enum` (also known as *plain* enum) such as our `Rank` can be initialized from any of the enumerators listed in its definition. 
However, care should be taken not to assign values not in its enumeration set; this includes default-initialization: +A variable of type `enum` (also known as *plain* enum), such as `Rank` above, can be initialized from any of the enumerators listed in its definition. However, care should be taken not to assign values not in its enumeration set; this includes default-initialization if zero is not one of the enumerators: ```cpp Rank r1{ ace }; // ok, r1 is value of enumeration constant ace (1) @@ -23,7 +23,7 @@ auto r5 = king; // ok, r5 is of type Rank (not unsigned short) int i = seven; // ok, implicit conversion to integral type ``` -It may be surprising to discover that in most ways `ace`, `two`, `three`, `four` and so on are just "normal" integer constant values. (Indeed in some historical versions of the C language, the way to define constants was by using anonymous `enum`s; this curiosity was given the affectionate name of the "enum hack".) Thus variables of type `enum` can "borrow" enumerators from different types of `enum`s! Even worse, enumerators from different `enum` definitions in the same scope could **not** use the same name without causing a name collision. +It may be surprising to discover that in most ways `ace`, `two`, `three`, `four` and so on are just "normal" integer constant values. (Indeed in some historical versions of the C language, the only way to define constants was by using anonymous `enum`s; this curiosity was given the affectionate name of the "enum hack".) Thus variables of type `enum` can "borrow" enumerators from different types of `enum`s! Even worse, enumerators from different `enum` definitions in the same scope could **not** use the same name without causing a name collision. To address these limitations the C++ `enum class` type was created; this type is also known as *scoped* or *strongly typed* enumeration. 
We can represent the suit of a card using this type: @@ -31,7 +31,7 @@ To address these limitations the C++ `enum class` type was created; this type is enum class Suit : char { spades = 'S', clubs = 'C', hearts = 'H', diamonds = 'D', none = '\?' }; ``` -The difference in syntax is small, we have `enum class Suit` compared to `enum Rank`. However the `none` in `Suit` does not clash with `none` in `Rank`, and related to this feature the enumerators in an `enum class` have to be qualified with the type name, as follows: +The difference in syntax is small, we have `enum class Suit` compared to `enum Rank`, although this time the underlying type is `char` and character literals are used for the enumerators. However the `none` in `Suit` does not clash with `none` in `Rank`, and related to this feature the enumerators in an `enum class` have to be qualified with the type name, as follows: ```cpp Suit s1 = Suit::hearts; // good, types match @@ -53,22 +53,22 @@ Of course, in the context of a pack of playing cards it is not practical to thin ```cpp struct PlayingCard { - Rank r; - Suit s; + Rank rank; + Suit suit; }; ``` This `struct` type is named `PlayingCard`, again using sentence case. The fields of the `struct` are listed between braces like variable definitions, type-then-name, separated by semi-colons; there is also a **mandatory** semi-colon after the closing brace. The order of the fields is not usually significant; we have put `Rank` first as it is a 16-bit value compared to `Suit` being 8-bit, which makes the `struct`'s logical memory layout more sensible. (There is probably no gap between the fields in memory layout in this case, but `PlayingCard` is probably padded out to 32-bits at the end.) Also, this layout matches the usual order of the description of a card, such as "Three of Clubs". 
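The remarks about memory layout can be checked on your own platform with a short sketch such as the one below. It repeats the definitions of `Rank`, `Suit` and `PlayingCard` so that it is self-contained; `padding_bytes()` is a hypothetical helper introduced only for this illustration, and the sizes it reports are implementation dependent:

```cpp
// card-layout.cpp : sketch for inspecting the memory layout of PlayingCard
// (padding_bytes() is a hypothetical helper; actual sizes are implementation dependent)
#include <cstddef>

enum Rank : unsigned short { ace = 1, two, three, four, five, six, seven, eight, nine, ten, jack, queen, king, none = 99 };

enum class Suit : char { spades = 'S', clubs = 'C', hearts = 'H', diamonds = 'D', none = '\?' };

struct PlayingCard {
    Rank rank;
    Suit suit;
};

// Number of bytes in PlayingCard which are not occupied by either field
constexpr std::size_t padding_bytes() {
    return sizeof(PlayingCard) - sizeof(Rank) - sizeof(Suit);
}

// The second field can never start before the end of the first
static_assert(offsetof(PlayingCard, suit) >= sizeof(Rank), "fields are laid out in declaration order");
// An enum with underlying type char always occupies exactly one byte
static_assert(sizeof(Suit) == 1, "Suit has the size of char");
```

On a typical platform with a 16-bit `unsigned short`, `padding_bytes()` would be expected to return `1` (giving a 32-bit `PlayingCard`), but this should not be relied upon.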
-Instances (variables) of type `PlayingCard` are examples of what are often called *objects* (as in *Object Oriented Progamming*, or *OOP*), and they can be defined and initialized in a similar way to containers using uniform initialization syntax. The code below demonstrates how to create the first card in the pack, and how to extract the object's fields back into separate variables: +Instances (variables) of type `PlayingCard` are examples of what are often called *objects* (as in *Object Oriented Programming*, or *OOP*), and they can be defined and initialized in a similar way to arrays and containers using uniform initialization syntax. The code below demonstrates how to create the first card in the pack, and how to extract the object's fields back into separate variables: ```cpp PlayingCard ace_of_spades{ ace, Suit::spades }; -auto the_rank1 = ace_of_spades.r; // the_rank1 = ace, and is of type Rank -auto the_suit1 = ace_of_spades.s; // the_suit1 = Suit::spades, and is of type Suit +auto the_rank1 = ace_of_spades.rank; // the_rank1 = ace, and is of type Rank +auto the_suit1 = ace_of_spades.suit; // the_suit1 = Suit::spades, and is of type Suit -auto [ the_rank2, the_suit2 ] = ace_of_spades; // the_rank2 = ace, the_suit2 = Suit::spades +auto [ the_rank2, the_suit2 ] = ace_of_spades; // the_rank2 = ace, the_suit2 = Suit::spades, types as previously ``` The variables `the_rank1` and `the_suit1` are initialized from the individual fields of `ace_of_spades` separately using *dot-notation*, while `the_rank2` and `the_suit2` are initialized together using a *structured binding* declaration. @@ -79,7 +79,7 @@ The variables `the_rank1` and `the_suit1` are initialized from the individual fi * What error message do you get if you swap `ace` and `Suit::spades` over in the definition of `ace_of_spades`? Would this error be easy to catch if plain `int` values were used instead of typed enumerators? -It may be desirable to create `struct`s with multiple fields of the same type.
An example of this is a simple two-dimensional `Point` class with fields called `x` and `y`, both being signed integers: +It may be desirable to create `struct`s with multiple fields of the same type. An example of this is a simple two-dimensional `Point` class with fields (or data members) called `x` and `y`, both being signed integers: ```cpp struct Point { @@ -107,13 +107,13 @@ It's a valid question, and at the machine level produces (most likely) similar c * Modify this program to manipulate these fields in some way (such as multiplying them by two) and output them. -* Write a function called `mirror_point()` which reflects its input (of type `Point`) in both the x- and y-axes. Experiment with passing by value and `const`-reference (and returning the modified `Point`), and by reference and by pointer (two different `void` functions). Hint: for the last variant pass an address of `Point` and access the fields with `p->x` and `p->y`, and see Chapter 4: [Parameters by value](https://learnmoderncpp.com/functions#topic-2) and [Parameters by reference](https://learnmoderncpp.com/functions#topic-3) for a refresher. Compare all four versions of this function for ease of comprehension and maintainability. +* Write a function called `mirror_point()` which reflects its input (of type `Point`) in both the x- and y-axes. Experiment with passing by value and `const`-reference (and returning the modified `Point`), and by reference and by pointer (two different `void` functions). Hint: for the last variant pass an address of `Point` and access the fields with `p->x` and `p->y`, and see the topics in Chapter 4: "Parameters by value" and "Parameters by reference" for a refresher. Compare all four versions of this function for ease of comprehension and maintainability. 
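Without giving away `mirror_point()` itself, the difference between the pointer and by-value variants can be sketched with a simpler operation. In the code below the definition of `Point` and the function names `shift_point()` and `shifted_point()` are assumptions made for this illustration only:

```cpp
// point-pointer.cpp : sketch of modifying a struct through a pointer, compared with by value
// (Point's definition and both function names are assumptions made for this illustration)

struct Point {
    int x{}, y{};
};

// By pointer: the caller's Point is modified in place via p->x and p->y
void shift_point(Point *p, int dx, int dy) {
    p->x += dx;   // p->x is shorthand for (*p).x
    p->y += dy;
}

// By value: a copy is modified and returned, the caller's Point is left untouched
Point shifted_point(Point p, int dx, int dy) {
    p.x += dx;
    p.y += dy;
    return p;
}
```

A call such as `shift_point(&pt, 1, 1);` makes it visible at the call site that `pt` may change, while `auto pt2 = shifted_point(pt, 1, 1);` leaves `pt` as it was; which of these reads more clearly is part of what the exercise asks you to judge.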
## Inheritance vs composition We have talked about composite types being made up of other types, and in fact types can be *composed* (nested) indefinitely, although many programmers would struggle to comprehend more than a few levels. The other way to create new types with characteristics of previously defined types is through *inheritance*, which is a key concept of OOP. -The following program defines an `enum class` called `Color` (feel free to add more color enumerators) and uses the same `Point` class to create a new `Pixel` class, which has both a location and a color being composed of both `Point` and `Color` fields. +The following program defines an `enum class` called `Color` (feel free to add more color enumerators) and uses the same `Point` class to create a new `Pixel` class, which has both a location and a color, by being composed of both `Point` and `Color` fields. ```cpp // 06-pixel1.cpp : Color and position Pixel type through composition @@ -137,15 +137,13 @@ string_view get_color(Color c) { switch (c) { case Color::red: return "red"; - break; case Color::green: return "green"; - break; case Color::blue: return "blue"; - break; + default: + return ""; } - return ""; } int main() { @@ -175,7 +173,7 @@ Most, if not all, of the syntax should be familiar, however a few things to note * The variable `p2` is set to `Color::blue` explicitly at initialization, with the co-ordinates `-1,2` using nested initializer syntax. -* The member variables `x` and `y` are members of `Point`, `pt` is a member of `Pixel`, so the full names of `p2`'s two co-ordinates are `p2.pt.x` and `p2.pt.y`. This ahows how the member operator `.` can be chained in this way (it works for member functions, too), and operations remain fully type-safe. +* The member variables `x` and `y` are members of `Point`, `pt` is a member of `Pixel`, so the full names of `p2`'s two co-ordinates are `p2.pt.x` and `p2.pt.y`. 
This shows how the member operator `.` can be chained in this way (it works for member functions, too), and operations remain fully type-safe. **Experiment:** @@ -183,9 +181,9 @@ Most, if not all, of the syntax should be familiar, however a few things to note * Can you call `get_pixel()` from `main()` with a third `Pixel`, without using a named variable? Hint: try to use initializer syntax in the function call. -* Change the default `Color` assigned to `p1` to be ``. Hint: this is a simple change. +* Change the default `Color` assigned to `p1` to be ``. Hint: this is a simple change, but is not in `main()`. -The next program accomplishes exactly the same as the previous one, producing the same output, and most likely very similar code of comparable efficiency. However it use *inheritance* instead of composition, which is indicated by a slightly different definition of `Pixel`: +The next program accomplishes exactly the same as the previous one, producing the same output, and most likely very similar code of comparable efficiency. It uses *inheritance* instead of composition, however, which is indicated by a slightly different definition of `Pixel` and different use of dot-notation in `main()`: ```cpp // 06-pixel2.cpp : Color and position Pixel type through inheritance @@ -208,15 +206,13 @@ string_view get_color(Color c) { switch (c) { case Color::red: return "red"; - break; case Color::green: return "green"; - break; case Color::blue: return "blue"; - break; + default: + return ""; } - return ""; } int main() { @@ -296,7 +292,7 @@ A few things to note about this program: * The member variables `x` and `y` are in scope for all of the member functions, so there is no need to fully qualify them as `this->x` and `this->y`. -* The member function returns both `x` and `y` as a `std::pair`. The `auto` return type is used (it's actually `std::pair`) and is declared `const` between the (empty) parameter list and the function body.
The use of `const` in this context means the member function promises not to modify any member variables (its own state). If you remember one thing about member functions, it should be to declare them `const` whenever they do not modify the object. +* The member function returns both `x` and `y` as a `std::pair`. The `auto` return type is used (it's actually `std::pair`) and is declared `const` between the (empty) parameter list and the function body. The use of `const` in this context means the member function promises not to modify any member variables (in other words, the object's own state). The important concept of `const` correctness for member functions is to declare them `const` whenever they do not modify the object, thus enabling them to be called on objects which are themselves constants (such as `const Point`). * The *access specifier* `private:` is used before the member variables `x` and `y` which means that code outside the scope of `Point` (such as in `main()`) cannot use them; they must use the getter and setters. @@ -316,11 +312,11 @@ A few things to note about this program: * Try to modify `x` within `getXY()`. What happens? Now try to return a modified `x` such as `x+1` instead. What happens now? Try both of these having removed the `const` qualifier. -* Change the name of `x` to `super_x` within `Point`, remebering to change all of the member functions which use `x` too. Does the code compile without any changes to `main()`? What does this tell you about another advantage of separating implementation from interface? +* Change the name of `x` to `super_x` at all occurrences within `Point`, remembering to change all of the member functions which use `x` too. Does the code compile without any changes to `main()`? What does this tell you about another advantage of separating implementation from interface? 
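The `const`-correctness point made above can be sketched as follows. This is a minimal illustration only: the class is a hypothetical cut-down `Point` (not the chapter's full listing) and `sum_of_coordinates()` is a name invented for this example. It assumes C++17 for the deduced `std::pair` and the structured binding.

```cpp
#include <utility>

// Hypothetical, cut-down Point used only for this sketch
class Point {
public:
    void setX(int value) { x = value; }               // non-const: modifies the object
    void setY(int value) { y = value; }               // non-const: modifies the object
    auto getXY() const { return std::pair{ x, y }; }  // const: promises not to modify
private:
    int x{}, y{};
};

// Illustrative helper: takes a const reference, so only const
// member functions of Point may be called inside it
int sum_of_coordinates(const Point& p) {
    auto [px, py] = p.getXY();   // allowed, because getXY() is declared const
    // p.setX(0);                // would not compile: setX() is not const
    return px + py;
}
```

Because `getXY()` is declared `const`, the helper can be called with both modifiable objects and objects declared as `const Point`.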
## Static members -In the context of a class definition, `static` member variables (sometimes called *class variables*) are similar to global variables, in that there is only one *instance*. They are said to be *per-class* as opposed to *per-object*; that is, regardless of how many objects of a `struct` (or `class`) there are. Also they are referred to outside of the `struct` definition with a double colon operator (`::`), not dot-notation. +In the context of a class definition, `static` member variables (sometimes called *class variables*) are similar to global variables, in that there is only one *instance*. They are said to be *per-class* as opposed to *per-object*; regardless of how many objects of a `struct` (or `class`) there are, there can be only one instance of any `static` member. Also they are referred to outside of the `struct` definition with a double colon operator (`::`), not dot-notation. The following program extends the `Point` class with two `static` member constants. The member functions `setX()` and `setY()` have been modified, try to guess what they now do from the code: @@ -378,15 +374,15 @@ int main() { A few things to note about this program: -* The static member variables `screenX` and `screenY` are declared both `static` and `const` and are assigned values within the definition of `Point`. +* The static member variables `screenX` and `screenY` are declared both `static` and `const` and are assigned values within the definition of `Point`. Storage is automatically assigned for them due to this being true (non-`const` would need to use `inline static` in order to provide this). -* These variables can be accessed directly from within `main()` as they are defined before the `private:` access specifier. As they are **read-only** it is acceptable for them to be accessed directly. +* These variables can be accessed directly from within `main()` as they are defined before the `private:` access specifier. 
As they are **read-only** it is acceptable for them to be accessed directly while preserving encapsulation. * The default values of `x` and `y` (zero) do not need to be changed as they fall within the permitted values. * The class *invariants* `0 <= x <= screenX` and `0 <= y <= screenY` are not easily able to be broken when `Point` is written with setters which validate their input. -The goal of encapsulation is still achieved with `screenX` and `screenY` being directly accessible from within `main()` because they are constants. If `screenX` and `screenY` could be modified directly, this would no longer be the case, and a setter/getter pair (or similar) should be created. (A similar rule is allowing global *constants*, as opposed to *variables*, without restriction as neither data-races nor accidental reassignment can occur with constants.) +The goal of encapsulation is still achieved with `screenX` and `screenY` being directly accessible from within `main()` because they are constants. If `screenX` and `screenY` could be modified directly, this would no longer be the case, and a setter/getter pair (or similar) should be created. (A similar rule relaxation is allowing global *constants*, as opposed to *variables*, without restriction as neither data-races nor accidental/erroneous reassignment can occur with constants.) **Experiment:** @@ -400,9 +396,9 @@ The goal of encapsulation is still achieved with `screenX` and `screenY` being d ## Operator overloading -There are many operators in C++ and most of these can be adapted (or *overloaded*) to work with user-defined types. (Operators for built-in types are not able to be redefined.) Like many other features of the language their availability and flexibility should be approached with some degree of restraint. +There are many operators in C++ and most of these can be adapted (or *overloaded*) to work with user-defined types. (Operators for built-in types are not able to be redefined.) 
Like many other features of the language, their availability and flexibility should be approached with some degree of restraint. -Operator oveloading works in a similar way to function overloading, so some familiarity is assumed with this concept. C++ resolves operator calls to user-defined types, to function calls, so that `r = a X b` is resolved to `r = operator X (a, b)`. (This is a slight simplification; where `a` is a user-defined type, the member function `r = a.operator X (b)` is used in preference, if available.) +Operator overloading works in a similar way to function overloading, so some familiarity with this concept is assumed. C++ resolves operator calls to user-defined types to function calls, so that `r = a X b` is resolved to `r = operator X (a, b)`. (This is a slight simplification; where `a` is a user-defined type, the member function `r = a.operator X (b)` is used in preference, if available.) The following program demonstrates the `Point` type, simplified back to its original form, with global `operator+` defined for it: @@ -432,15 +428,15 @@ int main() { A few things to note about this program: -* The return type of the `operator+` we define is returned by value; it is a new variable. The return value is declared `const` in order to **prevent** accidental operations on a temporary, such as: `(p1 + p2).x = -99;` +* The return type of the `operator+` we define is returned by value; it is a new variable. The return value is declared `const` in order to **prevent** accidental operations on a temporary, such as: `(p1 + p2).x = -99;` (It also **allows** invocation of `const` member functions, as in: `(p1 + p2).getXY();` assuming `getXY()` exists as a `const` member function.) -* The parameters of this function are passed in by `const` reference. The names `lhs` and `rhs` are very common (for the left-hand-side and right-hand-side to the operator at the *call site* in `main()`). +* The parameters of this function are passed in by `const` reference. 
The names `lhs` and `rhs` are very common (for the left-hand-side and right-hand-side to the operator at the *call site* respectively). -* The function `operator+` needs to access the member variables of the parameters passed in. +* The function `operator+` needs to access the member variables of the parameters passed in, thus member data must be public or have public getters (also, see discussion of friend functions in Chapter 9). * The new values `result.x` and `result.y` are computed independently, as might be expected. -* The statement `p3 = p1 + p2;` invokes a call to `operator+` automatically. +* The statement `p3 = p1 + p2;` invokes the user-defined `operator+` automatically. **Experiment:** @@ -448,7 +444,7 @@ A few things to note about this program: * Write an `operator-` and call it from `main()`. -It is usual to write `operator`s as global (or *free*, or *non-member*) functions when they do not need to access `private:` parts of the types which they operate on. This is not a problem for member function `operator`s as they implicitly have access to all parts of both themselves and the variable they operate on. +It is usual to write `operator`s as global (or *free*, or *non-member*) functions when they do not need to access `private:` parts of the types which they operate on. This is not a problem for **member** `operator`s as they implicitly have access to all parts of both themselves and the variable they operate on. The simplified result of these conventions is demonstrated in the following program: @@ -487,9 +483,9 @@ A few things to note about this program: * The member function `operator+=` takes **one** parameter named `rhs` and modifies its own member variables. It returns a **reference** to a `Point`, this being itself. One of the rare uses of the `this` pointer, dereferenced here with `*`, is shown here without further explanation. 
-* The global `operator+` makes a **copy** of `lhs` and then calls (member) `operator+=` on this with parameter `rhs`. (Both of the `rhs`'s are the same variable as they are passed by reference.) +* The global `operator+` makes a **copy** of `lhs` and then calls (member) `operator+=` on this (with parameter `rhs`). -* Global `operator+` does **not** directly access the member variables of either of its parameters. +* Global `operator+` does **not** directly access the member variables of either of its parameters, this is better C++ style. * The variable `result` is then returned by `const` value, as before. @@ -501,4 +497,4 @@ A few things to note about this program: * Add a `static` function to calculate the diagonal distance between two `Point`s and return it as a `double`. Consider how to implement `operator/` to calculate this value, and whether this would be a suitable use of OO. -*All text and program code ©2019-2022 Richard Spencer, all rights reserved.* +*All text and program code ©2019-2025 Richard Spencer, all rights reserved.* diff --git a/07-strings-containers-and-views.md b/07-strings-containers-and-views.md index efd371c..ab734c7 100644 --- a/07-strings-containers-and-views.md +++ b/07-strings-containers-and-views.md @@ -2,17 +2,21 @@ ## String initialization, concatenation and comparison -Whilst support for read-only string literals is built into C++, we must make use of the Standard Library when we want a string-type which is be able to be manipulated and compared, using operators such as `+` and `==`. The `std::string` type supports all of the operations you would expect to be present, such as concatenation, indexing, sub-string extraction, comparisons and reporting the length. All of the memory management operations necessary are taken care of automatically at run-time; string objects are allowed to use heap memory and interestingly do not use any "special" features of the language not available to the application programmer. 
+Whilst support for read-only string literals is built into C++, we must make use of the Standard Library when we want a string-type which is able to be manipulated and compared, using operators such as `+` (concatenation) and `==` (equality comparison). The `std::string` type supports all of the operations you would expect to be present, such as concatenation, indexing, sub-string extraction, comparisons and reporting the length. It is also possible to directly access the raw string data, if desired, or pass a `std::string` to a (C-style) function expecting a `const char*`. All of the memory management operations necessary are taken care of automatically at run-time; string objects are allowed to use heap memory and interestingly do not use any "special" features of the language not available to the application programmer. (Writing your own string class is a commonly advised exercise in gaining proficiency in C++.) -An empty string object can be created using `string` as the type specifier, either using uniform initializtion syntax, `auto`, or omitting the braces altogether where the type specifier is first: +An empty string object can be created using `string` as the type specifier, either using uniform initialization syntax, or `auto`, or omitting the braces altogether where the type specifier is first: ```cpp string s1; string s2{}; -auto s3 = string{}; +auto s3 = string{}; // s1, s2, s3 are empty (mutable) strings ``` -Other variants exist as well, but these shown are the most modern. When an empty `std::string` is compared against an empty string literal `""` using `==` the result is `true`. +Other variants exist, but these shown are the most modern. When an empty `std::string` is compared against an empty string literal `""` using `==` the result is `true`. 
+ ```cpp +auto is_empty = (s1 == ""); // is_empty has value "true", also for s2 and s3 +``` A `std::string` can be initialized or re-assigned from a string literal: @@ -34,14 +38,14 @@ Single `char` literals can be appended too, although a `std::string` **cannot** ```cpp string s1 = 'A'; // Error! Does not compile -auto s2 = string{} + 'A'; // This version is fine, but maybe nonobvious +auto s2 = string{} + 'A'; // This version is fine, but maybe non-obvious ``` Strings can be reset to empty using a member function, or by assigning to an empty string literal: ```cpp s1 = ""; // Both of these accomplish the same thing -s2.clear(); // (This is the preferred method) +s2.clear(); // (Using clear() is the preferred method) ``` Confusingly, there are two different member functions which return a `std::string`'s length (excluding the `\0` terminator if it was constructed from a string literal), and a third which returns a `bool` (value `true` indicates length is zero): @@ -77,7 +81,7 @@ void string_to_uppercase(string &s) { } int main() { - cout << "Please enter some text in lower-, mixed- or upper-case:\n"; + cout << "Please enter some text in lower, mixed or uppercase:\n"; string input; getline(cin, input); string_to_uppercase(input); @@ -89,13 +93,13 @@ Things to note about this program: * Both variables `s` and `c` are declared as references, thus modifying them changes the variable they refer to, not a copy. -* The type of `c` is, perhaps unsurprisingly, `char&`. +* The type of `c` is deduced by the compiler as `char&`. * The `getline()` function (explained further in Chapter 8) is used to get an arbitrarily long line of input from `cin` and store it in `input`. (Note: don't confuse this with `cin.getline()` which we met in Chapter 5.) **Experiment:** -* Remove one of the `&`s in the function `string_to_uppercase()`. Does the program still compile? Does it produce the expected output when run? Now remove the other `&` instead and try the same thing. 
What does this tell you about the importance of reading code which uses reference semantics very carefully? +* Remove one of the `&`s in the function `string_to_uppercase()`. Does the program still compile? Does it produce the expected output when run? Now remove the other `&` instead and try the same thing. What does this tell you about the importance of reading code which uses reference semantics, very carefully? * Modify `string_to_uppercase()` so that the uppercase string is *appended* to the input. Hint: this is a simple change that just requires some thought. @@ -118,7 +122,7 @@ auto c2 = book.at(99); // throws an exception, possibly terminating the progra * Now modify this program to use **checked** array access. What happens if you make a (deliberate) bounds-checking error? -The member functions `front()` and `back()` return (modifiable) references to the first and last characters of a string; they can be used instead of `s[0]` and `s[s.length() - 1]`. Interestingly, **reading** `s[s.length()]` is not undefined behavior, but instead returns a value which is the default value of the underlying character type (`'\0'` for `char`). +The member functions `front()` and `back()` return (writeable) references to the first and last characters of a string, respectively; they can be used instead of `s[0]` and `s[s.length() - 1]`. Interestingly, **reading** `s[s.length()]` is not undefined behavior, but instead returns a value which is the default value of the underlying character type (`'\0'` for `char`). 
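A small sketch of `front()`, `back()` and reading one past the last character may help here. The function names `decorate()` and `one_past_end()` are invented for this illustration and are not part of the tutorial's programs; note that `front()` and `back()` must not be called on an empty string, hence the check.

```cpp
#include <cctype>
#include <string>

// Illustrative helper: upper-cases the first character and
// overwrites the last one, using the references returned by
// front() and back()
std::string decorate(std::string s) {
    if (!s.empty()) {   // front()/back() on an empty string is undefined behavior
        s.front() = static_cast<char>(
            std::toupper(static_cast<unsigned char>(s.front())));
        s.back() = '!';
    }
    return s;
}

// Illustrative helper: reading s[s.length()] is well defined
// and yields the default character value
char one_past_end(const std::string& s) {
    return s[s.length()];
}
```

Under these assumptions, `decorate("hello.")` yields `"Hello!"` and `one_past_end("abc")` yields `'\0'`.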
To add or remove individual characters or substrings, the `insert()` and `erase()` member functions can be used (don't try to write to `s[s.length()]`): @@ -143,11 +147,12 @@ string wizard = "Gandalf the Gray"; auto s1 = wizard.substr(0, 7); // s1 is "Gandalf" auto s2 = wizard.substr(8, 3); // s2 is "the" auto s3 = wizard.substr(12); // s3 is "Gray" + // or wizard.substr(12, 4) or wizard.substr(12, string::npos) ``` The return type of `substr()` is `std::string`, which is a **new** variable containing a **copy** of (part of) the contents of the original `std::string`. -Finally there is `append()` which is considered better style than using the `+` operator as it is potentially more efficient: +Finally there is `append()` which is considered better style than using the `+=` operator as it is potentially more efficient: ```cpp auto wizard2 = "Saruman"s; // note: suffix produces a string @@ -159,44 +164,44 @@ wizard2.append(" the White"); // wizard2 becomes "Saruman the White" Sometimes it is necessary to convert between a `std::string` and other (often built-in) types, such as converting to and from an integer or floating-point number. The Standard Library function template `to_string` is overloaded to cope with different (built-in) types: ```cpp -auto n1 = 1.23; -auto n2 = 45; +auto n1 = 1.23; // n1 is type double +auto n2 = 45; // n2 is type int auto s1 = to_string(n1); // s1 is "1.230000" auto s2 = to_string(n2); // s2 is "45" ``` -Converting the other way, the group of functions `sto`*x*`()` allow the (exact) output integer or floating-point type from an input `std::string` (often usefully a sub-string). The full list is: `stoi()`, `stol()`, `stoul()`, `stoll()`, `stoull()`, `stof()`, `stod()` and `stold()`. +Converting the other way, the group of functions `sto…()` allow conversion to an integer or floating-point type from an input `std::string` (often usefully a sub-string). 
The full list is: `stoi()`, `stol()`, `stoul()`, `stoll()`, `stoull()`, `stof()`, `stod()` and `stold()`. ```cpp auto n3 = stoi(s2); // n3 is of type int auto n4 = stold(s1); // n4 is of type long double ``` -For the functions which return an integer type, the optional third parameter is the numerical base to be applied (this defaults to 10), while for all of them the optional second parameter is a pointer to `std::size_t` variable used to indicate the index into the `std::string` of the first unused character (this defaults to `nullptr`, that is no index is returned). +For these `sto…()` conversion functions which return an integer type, the optional third parameter is the numerical base to be applied (this defaults to 10), while for all of them the optional second parameter is a pointer to `std::size_t` variable used to indicate the index into the `std::string` of the first unused character (this defaults to `nullptr`, that is no index is written to this pointer address). It is possible to declare `std::string` variables using syntax which is very similar to that for string literals, which uses the *literal suffix* `s`: ```cpp auto h1{ "Merry"s }; // h1 is mutable const auto h2{ "Pippin"s }; // h2 cannot be altered -constexpr auto h3{ "Samwise"s }; // h3 can be used in constexpr contexts, new to C++20 +constexpr auto h3{ "Samwise"s }; // h3 can be used in constexpr contexts ``` In addition, a single (possibly empty) `std::string` literal can be safely concatenated with any number of string and character literals: ```cpp auto alphabet = ""s + "ABCDEF" + ' ' + "abcde" + 'f'; - // alphabet contains "ABCDEF abcdef" and of type std::string + // alphabet contains "ABCDEF abcdef" and is of type std::string ``` -Here `alphabet` has type `std::string`, and the concatenation is performed at run-time (use `constexpr` to make it happen at compile-time). 
+Here `alphabet` has type `std::string`, and the concatenation is usually performed at run-time (use `constexpr` to make it happen at compile-time). -Access to the underlying `char` representation of a `std::string` is provided by the member functions `c_str()` (an abbreviation of "C-String") and `data()`. The difference between the two is that `c_str()` **guarantees** to include a terminating zero-byte and is **not** writable, whereas `data()` **is** writable but with the caveat that there may be not be any terminating zero-byte (it depends on both how the `std::string` was initialized and the library implementation). Thus `c_str()` returns a `const char *` that can be safely used as a parameter to C functions such as `puts()`, or with C++ stream output, whereas `data()` returns a `char *` which is not safe to be used with any function which expects a zero-byte terminator. +A `std::string` provides direct access to its underlying array-of-`char` representation through two member functions: `c_str()` and `data()`. The difference between the two is that `c_str()` returns a **read-only** (`const char *`) pointer to an NTMBS (see Chapter 1), while `data()` returns a **writable** (`char *`) pointer to the same (pre-C++11 did not guarantee the null terminator to be present for `data()`). Where you have the choice, use `c_str()` as it is available for `const std::string` objects (itself being a `const` member function). **Experiment:** -* Modify `string_to_uppercase()` to use `data()` inside a regular for-loop to do its work. Hint: Continue to use a loop index, the syntax may surprise you. +* Modify `string_to_uppercase()` to use `data()` inside a regular for-loop to do its work. Hint: continue to use a loop index, the syntax may surprise you. * Now modify this program to use pointer arithmetic instead of a loop index. 
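As a hedged sketch of the difference between `c_str()` and `data()` described above: the helper names `c_length()` and `blank_out()` are invented for this example, and the writable `data()` overload assumes C++17 or later. Both simply hand the underlying character array to a C library function.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Illustrative helper: c_str() is a const member function, so it can be
// called on a const std::string and passed to a C function expecting a
// null-terminated const char *
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}

// Illustrative helper: data() on a non-const std::string gives a writable
// char *, here passed to a C function which overwrites every character
void blank_out(std::string& s) {
    std::memset(s.data(), '*', s.size());
}
```

Writing through `data()` must stay within `size()` characters; the string's length cannot be changed this way.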
@@ -206,7 +211,7 @@ Access to the underlying `char` representation of a `std::string` is provided by There is a fourth string-like type (besides literal string, built-in array of `char` and `std::string`) called `std::string_view`, which provides a "half-way house" between a fully-fledged string type and raw array access. Typically it is implemented with only two fields (pointer and length); its main advantage over `std::string` is that it can be constructed and passed around more cheaply in many cases. -The `std::string_view` type only provides a subset of the features provided by `std::string`, in particular it does **not** support either in-place modification or concatenation. It also does **not** "own" the resource it refers to, therefore care must be taken to ensure that a `std::string_view` object does not outlive the entity it was constructed from (usually a `std::string` or `const char *`). It is safe when used as a function parameter (where otherwise a `const std::string&` or `const char *` would be used), and sometimes safe as a return type (instead of `const char *`). +The `std::string_view` type only provides a subset of the features provided by `std::string`, in particular it does **not** support either in-place modification or concatenation. It also does **not** "own" the resource it refers to, therefore care must be taken to ensure that a `std::string_view` object does not outlive the entity from which it was constructed (usually a `std::string` or `const char *`—construction from a string literal is always safe.). It is safe when used as a function parameter (as an alternative to `const std::string&` or `const char *`), and is sometimes safe as a return type (instead of `const char *`). Finally, it does not own or include a null terminator, unless the entity from which it is constructed has one; this behavior is useful in cases where a sliding textual "window" over a larger string entity is needed. 
```cpp string_view v1{ "Elrond" }; // string_view constructed from const char * @@ -270,7 +275,7 @@ A few things to note about this program: ## Vectors and iterators -If you remember one thing about C++ container types, of which `std::vector` is one, it should be that elements are meant to be manipulated using *iterators*. (We have seen the `std::string` member functions `insert()` and `erase()` being used with indices, however even these can use iterators instead.) An iterator is a pointer-like object that when dereferenced, yields exactly one object from within a container; the `begin()` and `end()` family of functions should each be thought of as returning an iterator, rather than a pointer. +A key concept of the C++ Standard Library container types, of which `std::vector` is one, is that elements are meant to be manipulated using *iterators*. (We have seen the `std::string` member functions `insert()` and `erase()` being used with indices, however these can use iterators instead.) An iterator is a *pointer-like object* that, when dereferenced, yields exactly one object from within a container; thus the `begin()` and `end()` family of functions should each be thought of as returning an iterator, rather than a pointer. The following program populates a `std::vector` of integers from user input, and then outputs it in numerically sorted order. @@ -309,7 +314,7 @@ A few things to note about this program: * The `push_back()` member function of `vector` is used to make `i` the new last element, this "grows" the container automatically as needed. -* The *Standard Libary algorithm* `std::sort()` gets all of the information about the `vector` that it needs in order to operate from the two iterators provided as parameters. (It can be relied upon to be an efficient algorithm, probably performing better than any hand-written code.) 
+* The Standard Library *algorithm* `std::sort()` gets all of the information about the `vector` that it needs in order to operate from the two iterators provided as parameters. (It can be relied upon to be an efficient algorithm, probably performing better than hand-written code; there is no need for, or advantage in, using C's `qsort()`.) * Instead of a traditional or range-for loop, a second algorithm `std::copy()` is used. As might be guessed this copies everything from the first iterator up to, but not including, the second iterator to its third parameter, which is actually an *output iterator*. There is no "magic" involved, all you need to understand is that a `std::ostream_iterator` *object* takes a single type of its output between triangular brackets (here it is `int`) and the output stream and optional delimiter are specified as parameters. (This is boilerplate code that can be reused in your own programs, possibly with different types and delimiters.) @@ -321,13 +326,13 @@ A few things to note about this program: * Change to using a range-for loop instead of `std::copy()` to output the `vector`. Hint: use `const auto&`. -* Use member functions `begin()` and `end()` in the call to `std::sort()`. Does the compile to the same thing? Which style do you prefer? +* Use **member** functions `begin()` and `end()` in the call to `std::sort()`. Does this compile to the same thing? Which style do you prefer? * Rewrite the second `for`-loop using an index variable and subscript access. Do you still prefer this form? -There are many member functions belonging to `std::vector` and the other standard containers, and even experienced C++ programmers don't remember them all. There are even more (over 100) function templates (algorithms) which operate with the standard containers through iterators; where there is a choice between both the member function should be used as this will be specialized for the container type. +There are many member functions belonging to `std::vector` and the other standard containers, and even experienced C++ programmers don't remember them all. There are also many (over 100) function templates (algorithms) which operate with the standard containers through iterators; where there is a choice between the two, the member function should be used as this will be specialized for the container type (thus potentially more efficient). 
There is almost never a need to write a mini-algorithm which operates within a loop over the elements of a container, as would be needed in C or with built-in arrays; they have been implemented in the Standard Library ready for you to use. +There are many member functions belonging to `std::vector` and the other standard containers, and even experienced C++ programmers don't remember them all. There are also many (over 100) function templates (algorithms) which operate with the standard containers through iterators; where there is a choice between using both, the member function should be used as this will be specialized for the container type (thus potentially more efficient). There is almost never a need to write a mini-algorithm which operates within a loop over the elements of a container, as would be needed in C; they have already been implemented in the Standard Library ready for you to use. -When you reach for a container, `std::vector` is often the best fit, and should be your natural first choice. Should you decide that one of the other container types is needed, this would usually be a design decision made early in the development of your program. There is uniformity in the naming of the member functions, so all containers support `clear()`, for example. However as soon as you delve into the implementation details, such similarity appears superficial. It is important to have a basic understanding of the implementation of each container so that their individual advantages and limitations are understood, in order for the correct one to be chosen and used effectively. +When you reach for a container, `std::vector` is often the best fit, and should be your natural first choice. Should you decide that one of the other container types is needed, this would usually be a design decision made early in the development of your program. There is uniformity in the naming of the member functions, so all containers support `clear()`, for example. 
However as soon as you delve into the implementation details, such similarity appears superficial. It is important to have a basic understanding of the implementation of each container such that their individual advantages and limitations are understood, in order for the correct one to be chosen and used effectively. As an example, consider the use of `std::find()` versus member function `find()` when using `std::string`, `std::vector` and `std::set`; this function finds the first occurence of its parameter in the specified container. The `std::set` container is similar to `std::vector` except that it maintains its elements in sorted order. The differences are: @@ -394,11 +399,11 @@ vector: 1 9 7 3 set: 3 4 6 8 Found in string at position: 2 Found in vector: 7 -Found in set: 4 +Found in set: 6 After: string: helo vector: 1 9 3 -set: 3 6 8 +set: 3 4 8 ``` Take time to study this program as it contains some important concepts: @@ -407,13 +412,13 @@ Take time to study this program as it contains some important concepts: * The first part assigns a `std::string` from a string literal, and a `std::vector` and `std::set` from two different initializer lists. Note that `std::set` can only hold unique values, so the container begins with a size of four, not five as for the initializer list (because of the duplicated value `3`). -* The interesting part of the program is the third part, itself split into three. The logic is the same, search for an element value with the correct form of `find()`, compare it against "not found", and if found then erase it. The form of `erase()` used for `std::string` needs a length for the second parameter, while for `std::vector` and `std::set` it takes an iterator as the single element to erase. +* The interesting part of the program is the third part, itself split into three. 
The logic is the same: search for an element value with the correct form of `find()`, compare the result against the "not found" value for the specific container, and if found then erase it. The form of `erase()` used for `std::string` needs a length for the second parameter, while for `std::vector` and `std::set` the form used takes an iterator as the single element to erase. **Experiment** * Sort the `std::vector` and use a binary search instead of a linear one. Hint: use `std::lower_bound()` not `std::binary_search()`. -* Experiment with adding values to the containers, at the beginning, in the middle and at the end. Use a mixture of member function `push_back()` (where possible) and member function `insert()` or `std::insert()` where applicable. +* Experiment with adding values to the containers, at the beginning, in the middle, and at the end. Use a mixture of member function `push_back()` (where possible) and member function `insert()` or `std::inserter()` where applicable. ## Spans and arrays @@ -421,7 +426,7 @@ It can be very inefficient to copy `std::vector`s by value, as copies of both th **Experiment** -* Write a function called `populate_int()` which takes a `vector` as its parameter and implements the logic of the `for`-loop in `07-vector.cpp`. Call this function from `main()` instead of using a `for`-loop . +* Write a function called `populate_int()` which takes a `vector` as its parameter and implements the logic of the `for`-loop in `07-vector.cpp`. Call this function from `main()` instead of using a `for`-loop. * Now use `double` instead of `int` in the program. How many code changes are needed? @@ -459,13 +464,13 @@ int main() { A few things to note about this program: -* A range-for loop with an initializer field prints out the values of the `std::span` parameter, outputting a separator in-between, but not after, the elements. The trick of reassigning the variable `sep` gets around the limitation of using `std::copy()`.
+* A range-for loop with an initializer field prints out the values of the `std::span` parameter, outputting a separator in-between, but not after, the elements. The trick of reassigning the variable `sep` gets around this limitation of using `std::copy()`. -* The three array-like types are initialized in `main()`. The size of type `std::array` is fixed at compile-time (from its optional second template parameter) and allows it to be allocated on the stack, not using any heap memory. (Due to the fact that `begin()` and `end()` can be used with built-in arrays there are not many cases where `std::array` is useful.) +* The three array-like types are initialized in `main()`. The size of type `std::array` is fixed at compile-time (from its optional second template parameter) and this allows it to be allocated on the stack, not using any heap memory (as for a built-in array). (Due to the fact that `begin()` and `end()` can be used with built-in arrays there are not very many cases where `std::array` is more useful.) * The commented-out call to `print_ints()` doesn't compile as there is no valid conversion from `std::initializer_list` to `std::span`. This is a possible use case for a temporary `std::array`, as in: `print_ints(array{ 9, 8, 7 ,6 });` -Unlike `std::string_view`, `std::span` can modify its elements, although it does not "own" them. Also, a second type of `std::span` takes its size parameter after the type, which is also fixed at compile-time. +Unlike `std::string_view`, `std::span` can modify its elements, even though it does not "own" them. Also, a second form of `std::span` takes its size parameter after the type, which is also fixed at compile-time. **Experiment** @@ -473,13 +478,13 @@ Unlike `std::string_view`, `std::span` can modify its elements, although it does * Perform a sort within `print_ints()` before outputting. -* Now output the containers in `main()` after calling `print_ints()`. Have the orders of these changed? 
+* Now output the containers in `main()` after calling `print_ints()`, without calling it again. Have the elements of these changed order? ## Ordered and unordered sets -A `std::set` holds its contents in sorted order at all times, thus it is called an *ordered container*. Occasionally this is desirable, however there are space and time costs to this convenience so before using this container type you should consider whether a `std::vector` which can be (manually) sorted when required is a better solution. Array access (using `[]`) is not supported for `std::set`; this may be a deciding factor as to its suitability. Ordered containers require that operator less-than (`<`) is defined when using them to hold user-defined types (other ordering critera can be specified, if needed). +A `std::set` holds its contents in sorted order at all times, thus it is called an *ordered container*. Occasionally this is desirable, however there are space and time costs to this convenience so before using this container type you should consider whether a `std::vector`, which can be (manually) sorted when required, is a better solution. Array access (using `[]`) is not supported for `std::set`; this may be a deciding factor as to its suitability. Ordered containers require that `operator<` (less-than) is defined when using them to hold user-defined types (other ordering criteria can be specified, if desired). -A feature of `std::set` is that it cannot hold duplicate values; subsequently inserting a previously held value does not alter the container, while an initializer list containing duplicates is shortened (and sorted) immediately. (The type `std::multiset` does allow duplicate values.) +A feature of `std::set` is that it cannot hold duplicate values; inserting a previously held value does not alter the container, while an initializer list containing duplicates is shortened (and sorted) immediately. (The type `std::multiset` does allow duplicate values.) 
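The behaviour described above can be sketched as follows. This is a minimal illustration rather than one of the tutorial's own programs: the helper names `try_insert()`, `make_example_set()` and `make_example_multiset()`, and the values used, are hypothetical.

```cpp
#include <set>

// Hypothetical helper: returns true only if `value` was not already present.
// std::set::insert() returns a pair whose second member reports whether
// an insertion actually took place.
bool try_insert(std::set<int>& s, int value) {
    return s.insert(value).second;
}

// Builds a set from a list containing a duplicate (the value 3 appears twice);
// the set keeps one copy of each value, held in sorted order.
std::set<int> make_example_set() {
    return std::set<int>{ 4, 3, 8, 3, 6 };
}

// The same list in a std::multiset keeps both copies of 3.
std::multiset<int> make_example_multiset() {
    return std::multiset<int>{ 4, 3, 8, 3, 6 };
}
```

Under these assumptions, `make_example_set()` holds four elements (`3 4 6 8`), calling `try_insert()` on it with `6` returns `false` and leaves it unchanged, while `make_example_multiset()` holds all five values.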
The following program defines a `std::set` with value type `std::string`: @@ -495,7 +500,7 @@ using namespace std; int main() { set s{ - "Rossum, Guido van", + "Stroustrup, Bjarne", "Yukihiro, Matsumoto", "Wall, Larry", "Eich, Brendan" @@ -512,7 +517,7 @@ int main() { * Change the container type to `std::multiset`. Does the program compile and run? What happens if you (deliberately) enter a duplicate name? -* The correct ordering depends on the rule of surname first with capitalized first letter. Remove this second restriction by storing all names in lower-case, capitalizing the first letter for output. +* The correct ordering depends on the rule of surname first with capitalized first letter. Remove this second restriction by storing all names in lower-case, capitalizing the first letter for output. Test with name: "van Rossum, Guido". Lookup for `std::set` is faster than linear searching due to the fact that its elements are always sorted. There is also the container type `std::unordered_set` which can claim to have constant-time lookup in the best case due to utilization of a *hash function*. (To complete the quartet, there exists `std::unordered_multiset`.) @@ -526,11 +531,11 @@ In fact, due to the way that the *unordered containers* are implemented, removal ## Lists and forward-lists -Some operations can be inefficient with `std::vector` because of the way it is implemented by the library; operations such as `insert()` and `erase()` can involve the movement much of the data stored in memory. (In fact this is unavoidable, the Standard dictates that the elements of a `std::vector` are stored contiguously in memory.) Other operations such as `push_front()` are not implemented at all, for the same reason. (Using a `std::deque` instead would resolve this particular limitation.) 
+Some operations can be inefficient with `std::vector` because of the way it is implemented by the library; operations such as `insert()` and `erase()` can involve the movement of much of the data stored in memory. (In fact this is unavoidable since the Standard dictates that the elements of a `std::vector` are stored contiguously in memory.) Other operations such as `push_front()` are not implemented at all, for the same reason. (Using a `std::deque`, as in "double-ended queue", instead would resolve this particular limitation.) -The implementation of `std::list` is fairly straightforward; each element is stored in its own block of assigned memory, together with two pointers; one pointer to the previous element and one pointer to the next element. This does mean that element insertion and deletion can be much quicker than for `std::vector`, however more memory is used by this container in total (the size of two pointers times number of elements, approximately). Lists of "large" objects become more efficient than lists of "small" ones, and as for `std::vector` all elements must be of the same type and size. It follows that the implementation of `std::forward_list` is similar but with only one pointer in each block, pointing to the next element. +The implementation of `std::list` is fairly straightforward; each element is stored in its own block of assigned memory, together with two pointers; one pointer to the previous element and one pointer to the next element. This does mean that element insertion and deletion can be much quicker than for `std::vector`, however more memory is used by this container in total (the difference is the size of two pointers times number of elements, approximately). Lists of "large" objects become more efficient than lists of "small" ones, and as for `std::vector` all elements must be of the same type and size. 
It follows that the implementation of `std::forward_list` is similar but with only one pointer in each block, pointing to the next element. -Some operations that `std::vector` supports, such as indexing using subscript syntax (`[]`) and `std::sort()`, are not supported at all, either because performance would be unacceptably poor or because the algorithm requres a *random-access iterator*. In fact, `std::list` implements its own member function `sort()` which performs a *stable sort* in-place. The iterator type which works with `std::list` is called a *bi-directional iterator*, meaning that pointer arithmetic-style operations on iterators cannot work. The iterator type for `std::forward_list` is called a *forward iterator*. +Some of the operations that `std::vector` supports, such as indexing using subscript syntax (`[]`) and `std::sort()`, are not supported at all by `std::list`, either because performance would be unacceptably poor or because the algorithm requires a *random-access iterator*. In fact, `std::list` implements its own member function `sort()` which performs a *stable sort* in-place. The iterator type which works with `std::list` is called a *bi-directional iterator*, meaning that pointer arithmetic-style operations on iterators cannot work. The iterator type for `std::forward_list` is called a *forward iterator*. The following program demonstrates both `std::forward_list` and `std::list` being used, although it is not intended to be an example of best practice: @@ -585,11 +590,13 @@ A few new things about this program: * Since the input is to be sorted eventually, experiment with other ways of populating `fwd`. Hint: consider `push_front()`. +* Now find a way to avoid the use of `fwd` altogether. + ## Ordered and unordered maps -All of the containers seen so far have stored a number of elements of a single type.
There has been no other information stored with the element, except possibly for `std::vector` where the first element has index `0`, the second has index `1` and so on. This index can be thought of as the *key* as it allows direct access to a single *value*. +All of the containers seen so far have stored a number of elements of a single type. There has been no other information stored with the element, except possibly for `std::vector` where the first element *implicitly* has index `0`, the second has index `1` and so on. This index can be thought of as the *key* as it allows direct access to a single *value*. -This can be generalized so that the key can be of any type, not just a sequence of advancing integers. In C++ all maps operate with a type called `std::pair` which as might be guessed has two fields; these are called `first` and `second`. We could define `std::pair` as follows: +This can be generalized so that the key can be of any type, not just a sequence of advancing integers. In C++ all maps operate with a type called `std::pair` which, as might be guessed, has two fields; these are called `first` and `second`. We could define `std::pair` as follows (see Chapter 10 for a discussion of the `template` and `typename` keywords): ```cpp template @@ -599,7 +606,7 @@ struct pair { }; ``` -However we don't need to do this as the Standard Library provides this definition (or one very similar, the exact implementation details are not important). Maps operate on collections of *key/value pairs* which are provided by this type. +However we don't need to do this as the Standard Library provides this definition in header `<utility>` (or one very similar; the exact implementation details are not important). Maps operate on collections of *key/value pairs* which are provided by this type. The first *associative container* we will look at is `std::map`.
The following program uses a `std::map` to hold the per-weight prices of a list of fruits, which can be added to during a run of the program: @@ -663,11 +670,11 @@ int main() { This is a longer program but does not contain much that is new. A few points to note: -* The `std::map` called `products` is initialized from a nested initializer list, and the key/value types are specified within the angle brackets. The output of floating point numbers is fixed to two decimal places. +* The `std::map` called `products` is initialized from a nested initializer list, and the key and value types must be specified within the angle brackets. The output of floating point numbers is fixed to two decimal places. -* With user option `A`, member function `insert()` is called with a (temporary) `std::pair`. This is usually preferred over using array syntax, while `products[product] = price` would work in most cases it is not the most efficient. +* With user option `A`, member function `insert()` is called with a (temporary) `std::pair`. This is usually preferred over using array subscript syntax, while `products[product] = price` would work in most cases it is not always the most efficient method. -* With user option `C`, all of the products are printed out by a range-for loop which iterates over `products` and outputs the `first` and `second` fields of each element. Then member function `find()` is called to obtain an iterator. This is compared against `end(products)` (which if equal would indicate "not found"), a valid value allows the value as `iter->second` to be retrieved. +* With user option `C`, all of the products are printed out by a range-for loop which iterates over `products` and outputs the `first` and `second` fields of each element. Then member function `find()` is called to obtain an iterator. 
This is compared against `end(products)` (which if equal would indicate "not found"); any other result allows the **value** part to be retrieved as `iter->second` (the **key** would be available as `iter->first`). As explained above, array syntax is not used by this program when adding an entry, nor is it advisable in most cases for element lookup: @@ -693,19 +700,21 @@ There are some other containers and *container adaptors* implemented in the Stan * `std::bitset` also stores binary bits, but has its size fixed at compile-time -* `std::deque` (pronounced "deck") implements a double-ended FIFO similar to `std::vector` +* `std::deque` (pronounced "deck") implements a double-ended container similar to `std::vector`, but with additional operations such as `push_front()` -* `std::stack` implements a LIFO +* `std::stack` implements a LIFO (Last In First Out) -* `std::queue` implements a FIFO +* `std::queue` implements a FIFO (First In First Out) * `std::priority_queue` implements a queue which always serves the highest-priority element first -* `std::flat_map` implements an unordered map essentially as two vectors (new in C++23) +* `std::flat_set` is a sorted container of unique values, typically implemented as a vector + +* `std::flat_map` is a sorted associative container, typically implemented as two vectors (one for keys and one for values) A brief Tutorial such as this is not the place to delve into these, and indeed the other containers covered in this Chapter have much more detail to discover. As a go-to for both tutorial and reference I can highly recommend [CppReference.com](https://en.cppreference.com)[^1] and [Josuttis, "The C++ Standard Library"](http://cppstdlib.com)[^2].
 [^1]: https://en.cppreference.com
 [^2]: http://cppstdlib.com
 
-*All text and program code ©2019-2022 Richard Spencer, all rights reserved.*
+*All text and program code ©2019-2025 Richard Spencer, all rights reserved.*
diff --git a/08-files-and-formatting.md b/08-files-and-formatting.md
index 45ac675..64a9f59 100644
--- a/08-files-and-formatting.md
+++ b/08-files-and-formatting.md
@@ -1,9 +1,125 @@
 # Files and Formatting
 
+## Formatting values and variables for output
+
+We have seen how values and variables can be put to output streams using `<<`, and how `print()` and `println()` can be used to output subsequent parameters using curly braces in the format string. For further control over the way these are output, such as field width and precision, we can use stream manipulators (when outputting to streams) or extra information in the *format string* (when using `print()`/`println()`). Manipulators are covered later in this Chapter; what follows is a discussion of how to use *format specifiers* with `print()`, `println()` and `format()`/`format_to()`.
+
+The following program demonstrates use of format specifiers for some common types:
+
+```cpp
+// 08-format1.cpp : Basic usage of format string
+
+#include <print>
+#include <string>
+using namespace std;
+
+int main() {
+    string s{ "Formatted" };
+    auto d{ 10.0 / 3.0 };
+    auto i{ 20000 };
+    println("{0:20}:{2:8}, {1:12.11}", s, d, i);
+}
+```
+
+This program outputs the text `Formatted` followed by sufficient spaces to pad up to a width of 20 characters, then a colon present in the format string, then the value `20000` right-aligned to a width of 8 characters, then the comma and space present in the format string, and finally the value `3.3333333333` at a "precision" of 11 figures (plus decimal point) padded to a width of 12 characters (only padding, as opposed to truncation, is possible).
+
+**Experiment**:
+
+* Try printing the three parameters in a different order, by changing the numbers before the colon within the curly braces.
+
+* Is it possible to achieve the same results when removing these numbers altogether?
+
+* What happens if you repeat one of `s`, `d`, or `i` in the parameter list? Or take one away?
+
+The format string, and its associated format specifier(s), are evaluated at compile-time for maximum performance. It must therefore be a string literal, not a string-type variable (unless it is `constexpr`). The values of the subsequent parameters referenced by the format specifier(s) can (and probably will) change during the run of the program.
+
+## Format specifiers
+
+As well as describing the field width and precision for all of the built-in types (plus several Standard Library types), format specifiers offer fine-grained control over the output. In fact, all format specifiers are made up of eight optional parts, all of which (if used) appear in order after the colon in the format string. These are listed in the table below:
+
+| Field          | Description                                         | Example | Result                   |
+|----------------|-----------------------------------------------------|---------|--------------------------|
+| Fill-and-align | Optional fill character then: <, >, or ^            | {:@>10} | @@@@123456               |
+| Sign           | One of: +, - (default), or space                    | {:+}    | +1.23                    |
+| #              | Use alternate form                                  | {:#}    | 0x12a, 3.0               |
+| 0              | Pad integers with leading zeros                     | {:06}   | 000123                   |
+| Width          | Minimum field width                                 | {:10}   | "abc       "             |
+| Precision      | FP-precision, maximum field width                   | {:.7}   | 3.333333, "Formatt"      |
+| L              | Use locale-specific setting                         | {:L}    | 12,345, 1.234,56, "faux" |
+| Type           | One of: b, B, d, o, x, X, a, A, e, E, f, F, g, G, ? | {:8.7a} | 1.aaaaaabp+1             |
+
+It is also possible to write custom formatters which operate on arbitrary format specifiers and user-defined classes.
An alternative method would be to create a public `toString()` method in the class and simply invoke this on a parameter of this type (after the format string, which would use plain `{}`).
+
+The format specifiers listed above work with `print()` and `println()` as well as other functions from the `<format>` header (which include wide-character variants). Here is a complete list:
+
+| Function        | Description                                   | Parameters                     | Return value                                        |
+|-----------------|-----------------------------------------------|--------------------------------|-----------------------------------------------------|
+| `print()`       | Output to `stdout`, `FILE*` or `std::ostream` | [dest, ] fmt, ...              | None                                                |
+| `println()`     | As for `print()` with trailing newline        | [dest, ] fmt, ...              | None                                                |
+| `format()`      | Create a string from (wide) format string     | [locale, ] fmt, ...            | `std::string`, `std::wstring`                       |
+| `format_to()`   | Write to a (wide) output iterator             | iter, [locale, ] fmt, ...      | Output iterator past the end of the written text    |
+| `format_to_n()` | As for `format_to()` with size limit          | iter, max, [locale, ] fmt, ... | `std::format_to_n_result` (members `out` and `size`) |
+
+In choosing between the above functions, the aim would be to choose the most performant for the task.
The following program outputs different format strings and parameters utilizing a variety of these functions:
+
+```cpp
+// 08-format2.cpp : Various format string-using functions
+
+#include <print>
+#include <format>
+#include <string>
+#include <iostream>
+#include <iterator>
+#include <array>
+#include <cmath>
+using namespace std;
+
+int main() {
+    string world{ "World" };
+    print(cout, "Hello, {}!\n", world);
+    println("{1} or {0}", false, true);
+
+    constexpr const char *fmt = "Approximation of π = {:.12g}";
+    string s = format(fmt, asin(1.0) * 2);
+    cout << s << '\n';
+
+    constexpr const wchar_t *wfmt = L"Approximation of pi = {:.12g}";
+    wstring ws = format(wfmt, asin(1.0) * 2);
+    wcout << ws << L'\n';
+
+    format_to(ostream_iterator<char>(cout), "Hello, {}!\n", world);
+    wstring ww{ L"World" };
+    array<wchar_t, 9> wa; // size assumed: 8 characters plus the terminating zero
+    auto iter = format_to_n(wa.begin(), 8, L"Hello, {}!\n", ww);
+    *(iter.out) = L'\0';
+    wcout << wa.data() << L'\n';
+}
+```
+
+A few things to note about this program:
+
+* The use of `print()` is straightforward and simply outputs `Hello, World!` on a single line, using the variant that prints to a `std::ostream`, in this case `cout`.
+
+* The call to `println()` reverses the order of its subsequent parameters and outputs them textually: `true or false`. You should be aware that this prints to the C standard output (`stdout`); mixing C++ stream and C output can sometimes cause buffering issues.
+
+* The uses of `format()`, firstly with an 8-bit, and secondly with a wide-character format string, create a temporary (wide-)string and then put this to the (wide-)character output stream.
+
+* The function `format_to()` is called with `ostream_iterator<char>(cout)` which is boilerplate for creating a suitable output iterator from a stream object.
+
+* The use of `format_to_n()` is more involved as it uses a fixed size `std::array` to hold the wide-character output string. The first parameter is the (writable) iterator pointing to the start of the array, and the second is the maximum number of characters to write.
The return value has an `out` data member which is the iterator pointing to the next character in the array, which needs to be set to zero in order to allow putting (`std::array`'s, not `std::string`'s) `data()` to `wcout`. + +**Experiment:** + +* Modify this program to use different field widths. Do they work with wide characters? + +* Try some of the different format specifiers from the table above, together with different built-in types such as `long long` and `double`. + ## Simple file access All of the programs we have seen so far lose their internal state, together with any user input, when they exit. A program which can save and/or restore its state makes use of *persistence*. The way this is usually achieved, of course, is to enable saving to and loading from a disk file, stored on a hard-drive, memory card or network server. +C++ file access using the Standard Library header `<fstream>` is designed to be analogous to use of `cin` and `cout`, using the stream extraction (`>>`) and insertion (`<<`) operators. File access using the C Library's `<cstdio>` header is also possible, and a suitable `FILE *` pointer can be passed as the first parameter to `print()` and `println()` to switch output to that file. + The following program reads from a previously created file and echoes the content to the console. (The filename is provided at run-time as the first command-line argument after the executable name.) This program is only safe to use with text files, so fire up your favorite editor and create a test file to use, including some whitespace such as spaces, tabs and newlines. ```cpp @@ -36,13 +152,13 @@ A few things to note about this program: * An explicit call to close the input file is not needed; this happens automatically when `infile` goes out of scope. -* The only parts of this class we use is the member function `get()`, which confusingly returns an `int`, not a `char` as you might expect, and `ifstream::traits_type::eof()`.
The `int` returned by `get()` can be any of the valid range of `char` (usually 0 to 255, or -128 to 127 if `char` is signed) plus a special marker value outside this range to indicate that the *end-of-file* has been reached and no more characters can be read. (If the double-double-colon syntax confuses you don't worry, this boilerplate can be used without a detailed knowledge of the makeup of the stream classes.) +* The only parts of this class we use are the member function `get()`, which confusingly returns an `int`, not a `char` as you might expect, and `ifstream::traits_type::eof()`. The `int` returned by `get()` can be any of the valid range of `char` (usually 0 to 255, or -128 to 127 if `char` is signed) plus a special marker value outside this range to indicate that the *end-of-file* has been reached and no more characters can be read. (If the double-double-colon syntax confuses you don't worry, this boilerplate can be used without a detailed knowledge of the makeup of the stream classes. Using it is better style than relying on C's `EOF` macro from `<cstdio>`.) -* The `while`-loop body uses a cast to convert the variable `c` from an `int` to a `char` in order that is output as a character and not as a number. +* The `while`-loop body uses a cast to convert the variable `c` from an `int` to a `char` in order to ensure that it is output as a character and not as a number. **Experiment:** -* Try removing `static_cast` and see what happens. Consider if this could ever be desirable. +* Try removing `static_cast<char>` and see what happens. Consider whether this could ever be desirable. * What happens if you change the same line to `cout.put(c);`? @@ -86,15 +202,15 @@ A few differences to note about this program: * Remove the stream manipulator and one of the `>>`'s. What do you notice when the input file contains spaces, tabs etc? -* Add the standalone statement line `infile >> noskipws;` before the `while`-loop. What do you notice now?
(The entity `noskipws` is actually a *manipulator* which modifies the stream it is put to.) +* Add the standalone statement line `infile >> noskipws;` before the `while`-loop, and use plain `infile >> c;` within it. What do you notice now? (The entity `noskipws` is actually a *manipulator* which modifies the stream it is put to.) * Rewrite the loop as a `for`-loop. Can you again remove the need for any statements in the body? ## Files as streams -The member functions `get()` and `put()` are fine for simple character access to C++ streams but are not easily extensible. (Think of the complexity involved in reading a `double` or `std::string` using only these member functions.) When reading input files, the stream extraction operator is overloaded for all of the built-in types, as well as `std::string`. Similarly, the stream insertion operator is overloaded for files being written to, and works identically to the use of `cout` and `cerr` we are familiar with. We will see that you can write your own custom input and output overloads fairly easily, too. +The member functions `get()` and `put()` are adequate for simple character access to C++ streams but are not easily extensible. (Think of the complexity involved in reading a `std::string` or a `double` using only these member functions.) When reading input files, the stream extraction operator is overloaded for all of the built-in types, as well as `std::string`. Similarly, the stream insertion operator is overloaded for files being written to, and works identically to the use of `cout` and `cerr` we are familiar with. We will see that you can write your own custom input and output overloads fairly easily, too. -Saving the state of a program is sometimes called *serialization*, while loading it back is called *deserialization*.
Of course, there are no guarantees that the same platform is being used to load the previously serialized state back in, so considerations such as *endian-ness* (big versus little) and *address width* (32 versus 64 bit) can come into play. A way round this issue is to use plain text representation (solely), and in our example programs we will be using text files exclusively. +Saving the state of a program (possibly in binary format) is sometimes called *serialization*, while loading it back is called *deserialization*. Of course, there are no guarantees that the same platform is being used to load the previously serialized state back in, so considerations such as *endian-ness* (big versus little) and *address width* (32 versus 64 bit) can come into play. A way round this issue is to use plain text representation (solely), and in our example programs we will be using text files exclusively. The following program is our calculator program from previously, modified to read calculations to be performed from a text file. These are read one by one, and the results are output. When all of the input file has been read, the program exits. The functionality which has been seen before is contained in the `calc()` function, with the addition of support for exponent: @@ -195,7 +311,7 @@ You may be tempted to use `noskipws` to help you enter a line of text containing So far we have seen byte (8-bit character) raw input as well as formatted input. However, programs (especially interactive ones) often get their input line-by-line. Lines of input can often be evaluated for errors and processed more reliably than relying on the stream extraction operator and repeatedly checking against `fail()` and `bad()`. In the case of a line of input being found to be invalid, the program can prompt the user to try again. -The following program uses the `getline()` member function to obtain a line of input from the console. 
This function takes two parameters: the address of a C-style array, and its size in bytes. Care must be taken to provide both a valid array address and correct length. The line of text **can** include spaces and is stored in the array **without** a newline and **with** a trailing zero-byte character. +The following program uses the `getline()` **member** function to obtain a line of input from the console. This function takes two parameters: the address of a C-style array, and its size in bytes. Care must be taken to provide both a valid array address and correct length. The line of text **can** include spaces and is stored in the array **without** a newline and **with** a trailing zero-byte character. ```cpp // 08-line1.cpp : obtain a line of input from the user and display it @@ -221,7 +337,7 @@ int main() { As can be found from experimentation, any characters that do not fit into the C-style array are left in the input buffer and are left unprocessed; also the *fail-bit* is set in the input stream's flags, meaning any further calls to `getline()` will return an empty string. The stream fail-bit for `cin` can be unset with `cin.clear()`, after which the unprocessed characters can be read with further call(s) to `getline()`. Optionally, the `ignore()` member function can be used to skip one or more input characters. -There is a non-member function which we met in Chapter 7, perhaps confusingly also named `getline()`, which reads directly from an input stream object into a `std::string`. There is no restriction to the length of the input which can be stored in the `std::string`, and the input ends with a newline (which is not stored). The following program demonstrates the use of this function, with minimal changes from the previous one: +There is also a **non-member** function which we met in Chapter 7, perhaps confusingly also named `getline()`, which reads directly from an input stream object into a `std::string`. 
There is no restriction to the length of the input which can be stored in the `std::string`, and the input ends with a newline (which is not stored). The following program demonstrates the use of this function, with minimal changes from the previous one: ```cpp // 08-line2.cpp : obtain a line of input from the user, store it in a string variable and display it @@ -381,7 +497,7 @@ A few things to note about this program: So far we have encountered `noskipws` which is a *stream manipulator* that works on input streams. The exact details of how this, and other, manipulators work is unimportant for the purposes of using them, however in general they are put to the stream object with either `<<` or `>>`. *Stream flags* can also be explicitly set or cleared using the member functions `setf()` and `unsetf()`, and *stream parameters* can be set using named member functions such as `width()` and `precision()`. -Getting formatted output to "look right" is quite tricky and relies to a great extent on trial-and-error combined with (tedious) manual checking of program's output. For some performance-critical code, using C++ streams and manipulators may not be practical or desirable. Also, providing localization (*l10n*) to the user's language and other settings can be difficult when using interleaved manipulators and messages. These caveats are a large part of the reason for C++20 adopting *libfmt* into the Standard Library (as `std::format` and associated types). +Getting formatted output to "look right" is quite tricky and relies to a great extent on trial-and-error combined with (tedious) manual checking of program's output. For some performance-critical code, using C++ streams and manipulators may not be practical or desirable. Also, providing localization (*l10n*) to the user's language and other settings can be difficult when using interleaved manipulators and messages. For these reasons, considering use of the `print()` and `format()` family in preference is recommended. 
The following program produces a simulated cash-till receipt formatted to a width of 20 characters. This program is longer than most of the ones we've seen, and uses `struct` and `std::vector` introduced in previous Chapters. All of the text formatting functionality is in the `main()` program, so try to run the program and compare its output with the multiple uses of `cout` in the code (note that any product descriptions entered may not contain spaces): @@ -512,7 +628,7 @@ There are quite a lot of stream formatting flags and parameters available, most ## User-defined types and I/O -It is possible, and sometimes desirable, to define how user-defined types are formatted when put to output streams with `<<`. This is done by overloading the global `operator<<` (this, despite appearances, is the **name** of the function for which you must write an overload.) The syntax is ugly, unlike in some other programming languages where you merely provide a `tostring()` method, or similar. +It is possible, and sometimes desirable, to define how user-defined types are formatted when put to output streams with `<<`. This is done by overloading the global `operator<<`, which, despite appearances, is actually the **name** of the function for which you must write an overload. Sadly, the syntax is ugly, unlike in some other programming languages where you merely provide a `tostring()` method, or similar. The following program reintroduces the `Point` type and defines an *overloaded* stream output function (overloaded because the function already exists with a different second parameter for other built-in and user types): @@ -608,4 +724,4 @@ int main() { * Modify this program to read `Pixel`s.
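As a reminder of the `operator<<` overloading pattern discussed in this section, here is a minimal sketch. The `Point` type shown is an assumed two-member version (the tutorial's own definition is not reproduced here), and `toText()` is a hypothetical helper added only to show that the overload works with any `std::ostream`.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Assumed minimal Point type, for illustration only
struct Point {
    int x{}, y{};
};

// Overload of global operator<< with a Point as the second parameter;
// returning the stream by reference allows calls to be chained
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ',' << p.y << ')';
}

// Hypothetical helper: put a Point to a string stream and return the text
std::string toText(const Point& p) {
    std::ostringstream oss;
    oss << p;
    return oss.str();
}
```

Because the overload takes a `std::ostream&`, the same function serves `cout`, file streams and string streams alike.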
-*All text and program code ©2019-2022 Richard Spencer, all rights reserved.* +*All text and program code ©2019-2025 Richard Spencer, all rights reserved.* diff --git a/09-classes-friends-and-polymorphism.md b/09-classes-friends-and-polymorphism.md index 3926e3e..3ff8c6d 100644 --- a/09-classes-friends-and-polymorphism.md +++ b/09-classes-friends-and-polymorphism.md @@ -21,11 +21,11 @@ Person a_person{}; Person genius{ { 1879, 3, 14 }, "Einstein", "Albert" }; // Error: does not (yet) compile ``` -This `Person` class (here defined with `class` as opposed to the `struct` keyword we met in Chapter 6) contains three members: `dob` (itself a user-defined type), `familyname` and `firstname` (both of which are `std::string`s). We can define a variable of type `Person` (here `a_person`) using default-initialization syntax (the braces are in fact optional) but we cannot do a lot else with this object. Its fields will be zero-initialized for `a_person.dob.year`, `a_person.dob.month`, and `a_person.dob.day`, while `a_person.familyname` and `a_person.firstname` are empty strings. This is becuase the access specifier `private:` (which we also met in Chapter 6) is always implied for `class`es. This means we cannot either access the fields (member variables) directly using dot-notation, or use uniform initialization syntax, as with `genius`. +This `Person` class (here defined with `class` as opposed to the `struct` keyword we met in Chapter 6) contains three members: `dob` (itself of a user-defined type called `Date`), `familyname` and `firstname` (both of which are `std::string`s). We can define a variable of type `Person` (here `a_person`) using default-initialization syntax (the braces shown here are in fact optional, while empty parentheses are **not** permitted) but we cannot do a lot else with this object. 
Its fields will be zero-initialized for `a_person.dob.year`, `a_person.dob.month`, and `a_person.dob.day`, while `a_person.familyname` and `a_person.firstname` are empty strings. This is because the access specifier `private:` (which we also met in Chapter 6) is implied by default for `class`es. This means we can neither access the fields (member variables) directly using dot-notation, nor use uniform initialization syntax, as with `genius`. **Experiment:** -* Change the above fragment to use `struct` instead of `class` in order to enable compilation, and also an empty `main()` function. Does the program run? +* Change the above fragment to use `struct` instead of `class` in order to enable compilation, and also write an empty `main()` function. Does the program run? Is it therefore self-contained? * Now try to create `genius` within `main()` using assignment to member variables and uniform initialization. What error messages do you get? Does changing the keyword `class` to `struct` fix this problem in both cases? 
@@ -34,30 +34,29 @@ The key to solving the inability to create `Person`s using uniform initializatio ```cpp // 09-person1.cpp : model Person as a class with constructor +#include #include #include #include using namespace std; - -struct Date { - int year{}, month{}, day{}; -}; +using namespace std::chrono; class Person { public: - Person(const Date& dob, string_view familyname, string_view firstname) + Person(const year_month_day& dob, string_view familyname, string_view firstname) : dob{ dob }, familyname{ familyname }, firstname{ firstname } {} string getName() const { return firstname + ' ' + familyname; } + const year_month_day& getDob() const { return dob; } private: - const Date dob; + const year_month_day dob; string familyname, firstname; }; int main() { - Person genius{ { 1879, 3, 14 }, "Einstein", "Albert" }; - cout << genius.getName() << '\n'; + Person genius{ { 1879y, March, 14d }, "Einstein", "Albert" }; + cout << genius.getName() << " was born " << genius.getDob() << '\n'; } ``` @@ -67,27 +66,31 @@ Quite a few things to note about this program: * The constructor's parameters have names `dob`, `familyname` and `firstname`, these being the same names as for the member variables (this is allowed in Modern C++). The conventions for naming (`private:`) class members vary, historically a trailing underscore is used, but this can become difficult to read. -* The member variables are initialized using uniform initialization syntax; this forbids narrowing conversions, and there shouldn't be any as the parameter types should have been carefully chosen. (Older code may use parentheses here instead of braces.) The order of construction is the same as the way the member fields are laid out (after the `private:` access specifier); the order in the comma-separated initializers is unimportant (although you should try to replicate the order of the member fields, your compiler will warn if they differ). 
The constructor's body is empty here (although it must be present), and this is not unusual. +* The member variables are initialized using uniform initialization syntax; this forbids narrowing conversions, and there shouldn't be any as the parameter types should have been carefully chosen. (Older code may use parentheses here instead of braces.) The order of construction is the same as the way the member fields are laid out (in this class they are all after the `private:` access specifier); the order in the comma-separated initializers is unimportant (although you should try to replicate the order of the member fields, and your compiler will warn if they differ). The constructor's body is empty here (although it must be present), and this is not unusual. -* The `Date` parameter is passed as `const`-reference instead of by value, as it is probably too big to fit in a single register to pass by value. The names are passed by value as `std::string_view` although in older code `const std::string&` would be common. +* The `std::chrono::year_month_day` parameter (itself initialized by uniform initialization) is passed as `const`-reference instead of by value, as it is probably too big to fit in a single register to pass by value. The names are passed by value as `std::string_view` although in older code `const std::string&` would be common. * The member function `getName()` is declared `const` as it is guaranteed not to change any member variables. It returns a newly created `std::string` which must be returned by value. -* The member variable `date` is declared `const` as it will never need to be changed; of course it needs to be initialized by the constructor, but this is allowed. The member variables `familyname` and `firstname` need to be of type `std::string` (not `std::string_view` as for the constructor's parameters) for them to be guaranteed to exist for the lifetime of the class. 
+* The member variable `dob` is declared `const` as it will never need to be changed; of course it needs to be initialized by the constructor, and this case is allowed. The member variables `familyname` and `firstname` need to be of type `std::string` (not `std::string_view` as for the constructor's parameters) for them to be guaranteed to exist for the lifetime of the class (consider factory functions which return a newly-constructed object, as we saw in Chapter 8). + +* The member function `getDob()` is also declared `const` and returns a `const`-reference. It is possible to put this return value directly to a `std::ostream` as the Standard Library overloads `operator<<` for `std::chrono::year_month_day`. **Experiment:** -* Add more `Person` variables to `main()`, and output their names. +* Add more `Person` objects to `main()`, and output their names. * Rewrite the constructor to initialize the member variables in the body, instead of using the comma-separated list of member initializers. -* Write getters (all declared `const`) called `getFamilyName()`, `getFirstName()`, `getDOB()` avoiding creation of unnecessary temporary variables. Modify `main()` to use these. +* Modify this program to use `std::println()` instead of `cout`. Perform most of the formatting in a `const` member function `toString()`, which returns a `std::string`. + +* Write getters (all declared `const`) called `getFamilyName()` and `getFirstName()` avoiding creation of unnecessary temporary variables. Modify `main()` to use these. * Write setters called `setFamilyName()` and `setFirstName()`. Test these from `main()` again. * Modify the original constructor to allow for `firstname` not being present. Hint: use a defaulted function parameter. What other function needs to be changed? -* Try to create a default-constructed `Person`. What do you find? +* Try to create a default-constructed `Person`. What do you find? 
Now try to create a `public:` default constructor (with an empty parameter list). There is a third type of access specifier called `protected:`. Its meaning is the same as for `private:` except when inheritance is in use, when it means that (member functions defined within) derived classes have access to any members in the base class which were declared `protected:`. It's rare to find this in real code, although the next program we shall look at demonstrates its syntax and use. @@ -100,17 +103,18 @@ The following program defines three `class`es, the second and third of which der ```cpp // 09-person2.cpp : model Person, Student and Employee as a class inheritance hierarchy +#include #include #include #include #include using namespace std; +using namespace std::chrono; class Person { public: - struct Date; - Person(Date dob) : dob{ dob } {} - Person(Date dob, string_view familyname, string_view firstname, bool familynamefirst = false) + Person(year_month_day dob) : dob{ dob } {} + Person(year_month_day dob, string_view familyname, string_view firstname, bool familynamefirst = false) : dob{ dob }, familyname{ familyname }, firstname{ firstname }, familynamefirst{ familynamefirst } {} virtual ~Person() {} @@ -128,12 +132,8 @@ public: return firstname + ' ' + familyname; } } - struct Date { - unsigned short year{}; - unsigned char month{}, day{}; - }; protected: - const Date dob; + const year_month_day dob; private: string familyname, firstname; bool familynamefirst{}; @@ -144,7 +144,7 @@ public: enum class Schooling; Student(const Person& person, const vector& attended_classes = {}, Schooling school_type = Schooling::preschool) : Person{ person }, school_type{ school_type }, attended_classes{ attended_classes } {} - const Date& getDOB() const { return dob; } + const year_month_day& getDob() const { return dob; } const vector& getAttendedClasses() const { return attended_classes; } enum class Schooling { preschool, elementary, juniorhigh, highschool, college, 
homeschool, other }; private: @@ -156,7 +156,7 @@ class Employee : public Person { public: Employee(const Person& person, int employee_id, int salary = 0) : Person{ person }, employee_id{ employee_id }, salary{ salary } {} - bool isBirthday(Date today) const { return dob.month == today.month && dob.day == today.day; } + bool isBirthdayToday(year_month_day today) const { return dob.month() == today.month() && dob.day() == today.day(); } void setSalary(int salary) { salary = salary; } auto getDetails() const { return pair{ employee_id, salary }; } private: @@ -165,7 +165,7 @@ private: }; int main() { - Person genius{ { 1879, 3, 14 }, "Einstein", "Albert" }; + Person genius{ { 1879y, March, 14d }, "Einstein", "Albert" }; Student genius_student{ genius, { "math", "physics", "philosophy" }, Student::Schooling::other }; Employee genius_employee{ genius, 1001, 15000 }; @@ -179,8 +179,8 @@ int main() { auto [ id, salary ] = genius_employee.getDetails(); cout << "ID: " << id << ", Salary: $" << salary << '\n'; - Person::Date next_bday{ 2020, 3, 14 }; - if (genius_employee.isBirthday(next_bday)) { + year_month_day next_bday{ 2024y, March, 14d }; + if (genius_employee.isBirthdayToday(next_bday)) { cout << "Happy Birthday!\n"; } } @@ -188,15 +188,11 @@ int main() { Many things to note about this program: -* The `Date` type has been moved to be inside the `Person`; its fully qualified name is therefore `Person::Date`; this has been done to illustrate how `struct`s and `class`es and be nested inside each other. (The type `std::year_month_day`, new to C++20, was not available in my compiler when this program was written.) A forward-declaration `struct Date;` is necessary to avoid having to define `Date` in full before the first constructor. +* A second constructor for `Person` taking only a `std::chrono::year_month_day` has been added. 
Setters can be used later to initialize or modify the other three member variables, which are left defaulted by this constructor (empty for the two `std::string`s and `false` for the `bool`). -* A second constructor for `Person` taking only a `Date` has been added. Setters can be used later to initialize or modify the other three member variables, which are left defaulted by this constructor (empty for the two `std::string`s and `false` for the `bool`). +* A `virtual` destructor has been added to `Person`; a key C++ concept is that base classes often need a virtual destructor. This is so that, for any heap objects of type `Student` or `Employee` assigned to a pointer of type `Person*` (including use of smart pointers), the correct destructor of the **derived** class can be found and thus called, avoiding memory leaks. -* A `virtual` destructor has been addded to `Person`; if you remember one thing about inheritance, it should be that base classes need a virtual destructor. This is so that any heap objects of type `Student` or `Employee` assigned to a pointer of type `Person*` (including use of smart pointers), the correct destructor of the **derived** class can be found and thus called, avoiding memory leaks. - -* The `getName()` function returns the name(s) provided by either the constructor or the setter(s) as a single `std::string`, ordered according to the member variable `familynamefirst`. (I hope this attempt at cultural inclusion doesn't offend anyone!) - -* The `Date` type has been modified to try to fit it into 32-bits, so it is almost certainly passed more efficiently by value in a single register rather than by `const`-reference. +* The `getName()` function returns the name(s) provided by either the constructor or the setter(s) as a single `std::string`, ordered according to the member variable `familynamefirst`. (Hopefully this attempt at cultural inclusion doesn't offend anyone!) 
* The member variable `dob` is declared `protected:`, the other three are `private:`, as before. @@ -208,13 +204,13 @@ Many things to note about this program: * The base class portion of `Student` is initialized as `Person{ person }` where `person` is of type `const Person&`. Then the other two fields of `Student` are initialized. The constructor parameter variable `attended_classes` is passed as a `const vector&` so that only one copy is made, which is when the member variable of the same name is initialized. -* A `public:` member function `getDOB()` makes the `protected:` member of the base class `dob` available to **users** of the derived class. It is declared `const` and returns a `const`-reference. +* A `public:` member function `getDob()` makes the `protected:` data member of the base class `dob` available to **users** of the derived class, in this case `Student`. It is declared `const` and returns a `const`-reference. * The member function `getAttendedClasses()` returns a `const`-reference to `attended_classes`, therefore this `std::vector` is made visible to the function which calls this member function, but is not modifiable. * The `Employee` constructor takes three parameters, the third of which is optional. The base class portion is initialized in the same way as for `Student`. -* The member function `isBirthday()` takes a `Person::Date` as a parameter and compares the `day` and `month` fields with those of `dob`, returning `true` if they are the same, or `false` otherwise. (We're pretending "today" is March 14, 2020.) +* The member function `isBirthdayToday()` takes a `std::chrono::year_month_day` as a parameter and compares the return values of the `day()` and `month()` members with those of `dob`, returning `true` if they are the same, or `false` otherwise. (We're pretending "today" is March 14, 2024, so this function always returns `true`.) * The member variable `employee_id` is not meant to be able to be changed, so is declared `const`. 
The setter `setSalary()` is defined so that `salary` can be updated, while the getter `getDetails()` returns an aggregate of both derived class member variables by value. @@ -232,7 +228,7 @@ Many things to note about this program: * Write a second constructor for `Employee` to accomplish the same thing. -* Add `getDOB()` to `Employee`, as for `Student`. Now try to add it to `Person`, what do you find? Would a single `public:` getter in the base class be more useful than a `protected:` member variable? +* Add `getDob()` to `Employee`, as for `Student`. Now try to add it to `Person`, what do you find? Would a single `public:` getter in the base class be more useful than a `protected:` member variable? * Add member functions `addAttendedClass()` and `removeAttendedClass()` to `Student`. Make them smart enough to handle duplicates/invalid parameters. @@ -327,38 +323,30 @@ A couple of things to note: Friends have access to all members of the `class` that declares them a `friend`, including those declared `private:` or `protected:`. 
Sometimes this is desirable, as shown in the following program: ```cpp -// 09-person3.cpp : define operator== and operator<< for Person class +// 09-person3.cpp : define operator<=> for Person class #include using namespace std; struct Date { int year{}, month{}, day{}; + auto operator<=>(const Date&) const = default; }; -bool operator== (const Date& lhs, const Date& rhs) { - return lhs.year == rhs.year && lhs.month == rhs.month && lhs.day == rhs.day; -} - class Person { public: Person(const Date& dob, string_view familyname, string_view firstname) : dob{ dob }, familyname{ familyname }, firstname{ firstname } {} string getName() const { return firstname + ' ' + familyname; } - friend bool operator== (const Person&, const Person&); + const auto& getDob() const { return dob; } + auto operator<=>(const Person&) const = default; friend ostream& operator<< (ostream&, const Person&); private: - const Date dob; string familyname, firstname; + const Date dob; }; -bool operator== (const Person& lhs, const Person& rhs) { - return lhs.familyname == rhs.familyname - && lhs.firstname == rhs.firstname - && lhs.dob == rhs.dob; -} - ostream& operator<< (ostream& os, const Person& p) { os << "Name: " << p.getName() << ", DOB: " << p.dob.year << '/' << p.dob.month << '/' << p.dob.day; @@ -366,8 +354,8 @@ ostream& operator<< (ostream& os, const Person& p) { } int main() { - Person person1{ { 2000, 1, 1 }, "John", "Doe" }, - person2{ { 1987, 11, 31 }, "John", "Doe" }; + Person person1{ { 2000, 1, 1 }, "Doe", "John" }, + person2{ { 1987, 11, 31 }, "Doe", "John" }; cout << "person1: " << person1 << '\n'; cout << "person2: " << person2 << '\n'; if (person1 == person2) { @@ -376,16 +364,28 @@ int main() { else { cout << "Different person!\n"; } + + cout << "person1 is "; + if (person1.getDob() > person2.getDob()) { + cout << "younger than "; + } + else if (person1.getDob() < person2.getDob()) { + cout << "older than "; + } + else { + cout << "the same age as "; + } + cout << " person2\n"; } 
``` Some things to note about this program: -* Global `operator==` is defined for `Date`. Note that if we used either `std::year_month_day` or `operator<=>` (both new in C++20, but not covered in this Tutorial), this definition would not be necessary. As this `Date` is a `struct` with all members `public:`, use of the keyword `friend` is not needed. +* Member `operator<=>` (the "spaceship operator") is defaulted for this roll-your-own `Date`; this is all that is needed for the equality and ordering comparisons to be defined for this class, with ordering performed member-wise starting with the first data member. -* Within the definition of `Person`, both global `operator==` and global `operator<<` are declared `friend`. This is more boilerplate that you can use in your own classes, changing all occurrences of `Person` to the name of your class. (They are identical to normal function declarations, other than the use of the `friend` keyword.) +* Within the definition of `Person`, global `operator<<` is declared as a `friend` function. This is more boilerplate that you can use in your own classes, changing the parameter type `const Person&` to use the name of your class. (Such declarations are identical to normal function declarations, other than the use of the `friend` keyword.) -* Global `operator==` is defined for `Person`. Here the `std:::string` members are compared explicitly, before the `Date` members are compared, calling the previously defined `operator==` for `Date` automatically. +* Member `operator<=>` is defaulted for `Person`; with this code the `std::string` members will be compared (`familyname` before `firstname`), before the `Date` members are compared. * Global `operator<<` is also defined for `Person`, allowing objects to be put to `cout` (and any other `std::ostream`s) using `<<`. This needs to be a `friend` because it accesses `dob`. @@ -395,11 +395,9 @@ Some things to note about this program: * Now give them different names. What output do you get? 
-* Define global `operator<<` for `Date`. Can you remove the need for `operator<<` for `Person`, to itself be a `friend` of `class Person`? - -* Make global `operator==` for `Person` compare `getName()`s. Can you remove the need for it to be a `friend`? +* Define global `operator<<` for `Date`. Can you remove the need for `operator<<` for `Person` to itself be a `friend` of `class Person`? -* Make `operator==` for `Person` a member function instead of a global function. +* Compare a few `Person` instances with similar or same family names and first names, storing them in a `std::set`. Write code to output them telephone-book style. Are they ordered in the way you would expect? Classes can be declared `friend`s as well as functions, although this use is probably less common. The following program defines two `class`es `A` and `B` which are mutual friends, thus allowing member functions of either to access each other's `private:` members. @@ -479,11 +477,11 @@ The meanings implied for these member functions in the context of the `virtual` * `f()` is a function in a base class or derived class which can (optionally) be redefined (in the derived class). -* `g()` is a *pure-virtual* function of an abstract base class, which is **never** defined in this class and **must** be defined in a class that derives from it, in order for objects of the derived class to able to be created. Objects of an abstract class **cannot** be instantiated; attempting to do so would trigger a compile-time error. +* `g()` is a *pure-virtual* function of an abstract base class, which is not usually defined in this class and **must** be defined in a class that derives from it, in order for objects of the derived class to be able to be created. Objects of an abstract class **cannot** be instantiated; attempting to do so would trigger a compile-time error. 
* `h()` is a function in a derived class which redefines (overrides) a previous definition; the function signature must exactly match that in the base class (including `const` and `noexcept` qualifiers). This function **can** itself be redefined in any subsequently derived class. -* `k()` is the same as `h()` except this function **cannot** be redefined in a subsequently derived class. +* `k()` is the same as `h()` except this function **cannot** again be redefined in a subsequently derived class. The following program demonstrates all of these uses in a more complex hierarchy deriving from an abstract `Shape` class: @@ -628,4 +626,4 @@ A lot of things to note about this program: [^1]: Grady Booch, Robert A. Maksimchuk, Michael W. Engle, Bobbi J. Young, Jim Conallen, Kelli A. Houston *Object-Oriented Analysis and Design with Applications* (3rd ed. Pearson, 2007, ISBN-13: 9780201895513) -*All text and program code ©2019-2022 Richard Spencer, all rights reserved.* +*All text and program code ©2019-2025 Richard Spencer, all rights reserved.* diff --git a/10-templates-exceptions-lambdas-smart-pointers.md b/10-templates-exceptions-lambdas-smart-pointers.md index 53ca361..cd734cf 100644 --- a/10-templates-exceptions-lambdas-smart-pointers.md +++ b/10-templates-exceptions-lambdas-smart-pointers.md @@ -91,6 +91,7 @@ T minimum(const T& a, const T& b) { auto m1 = minimum(3, 2.5); // Error! minimum<int> or minimum<double>? auto m2 = minimum(-2, 1); // m2 is an int with value -2 auto m3 = minimum(-5.5, -6.5); // m3 is a double with value -6.5 +auto m4 = minimum<double>(3.0, 4); // m4 is a double with value 3 ``` Notice that we do not have to specify a type for `T` explicitly unless the deduction from the supplied arguments would be ambiguous (which is the case if the types of the two function arguments are different). 
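The deduction rule can be tried out with a small self-contained sketch. The body of `minimum()` below is an assumption (only its signature is visible in this section), and the two wrapper functions `deducedInt()` and `explicitDouble()` are hypothetical names used purely for demonstration.

```cpp
// Function template with a single type parameter T shared by both arguments
// (the body is assumed; only the signature appears in the text above)
template<typename T>
T minimum(const T& a, const T& b) {
    return (b < a) ? b : a;
}

// Both arguments are int, so T is deduced as int
inline int deducedInt() {
    return minimum(-2, 1);
}

// The arguments have different types, so T is given explicitly;
// the int argument 3 is converted to a temporary double
inline double explicitDouble() {
    return minimum<double>(3, 2.5);
}

// minimum(3, 2.5); // would not compile: T is deduced as both int and double
```

Supplying the template argument explicitly switches deduction off for `T`, after which the usual implicit conversions apply to the function arguments.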
@@ -123,15 +124,15 @@ public: } }; -auto o1 = Opt{ 1.2 }; -auto o2 = Opt{ 3 }; -auto o3 = Opt{}; -auto o4 = Opt<size_t>{}; +auto o1 = Opt{ 1.2 }; // T = double, valid = true +auto o2 = Opt{ 3 }; // T = int, valid = true +auto o3 = Opt{}; // T = char, valid = false +auto o4 = Opt<size_t>{}; // T = size_t, valid = false ``` Some things to note about this program: -* A default type for `T` is required as we make use of a defaulted default-constructor; `char` was chosen as the smallest type (`void` may be in theory preferrable, but cannot be used as the compiler would encounter the construct `void value` when instatiating the class and produce an error). +* A default type for `T` is required as we make use of a defaulted default-constructor; `char` was chosen as the smallest type (`void` may be in theory preferable, but cannot be used as the compiler would encounter the construct `void value` when instantiating the class and produce an error). * The other constructor matches `T` from the type of `value`, storing this in the member variable `value`, and also sets `valid` to `true`. @@ -139,7 +140,7 @@ Some things to note about this program: * Calling member function `hasValue()` is always safe, yielding a `bool`. Calling `get()` on an `Opt` with no value immediately terminates the program (the keyword `throw` is explained later in this Chapter). -Of course, this simple class is of limited practical use, if you need a type to be considered optionally valid without using a "special" value to indicate this, then make use of `std::optional` from the Standard Library. +Of course, this simple class is of limited practical use; if you need a type to be considered optionally valid without using a "special" value to indicate this, then making use of `std::optional` from the Standard Library is recommended. Member functions can be template functions, too. 
The following program defines a `Stringy` class with a `std::string` member, which can be initialized from another `std::string`, a `std::string_view` or a `const char *`: @@ -159,11 +160,11 @@ Stringy sy4{ 'V' }; // initialize from char Stringy sy5{ 5 }; // Error! Attempt to narrow from int to char ``` -Notice that the constructor (only) is defined with both `template` and `explicit`, meaning a new contructor is (attempted to be) generated when called with different types, and takes an r-value reference `T&&`. A function taking an r-value reference promises not to modify it; it can also be safely used with temporaries (such as `"Hello"s + " World"`) and is efficient as the temporary is not copied. (An optimization to use `std::move` when called with a `std::string` (only) r-value is a possiblilty here, however this would entail writing a second `explicit` constructor.) +Notice that the constructor (only) is defined with both `template` and `explicit`, meaning a new constructor is (attempted to be) generated when called with different types, and takes an r-value reference `T&&`. A function taking an r-value reference promises not to modify it; it can also be safely used with temporaries (such as `"Hello"s + " World"`) and is efficient as the temporary is not copied. (An optimization to use `std::move` when called with a `std::string` (only) r-value is a possibility here, however this would entail writing a second `explicit` constructor.) ## Standard exceptions, try, throw and catch -Exceptions are a means of altering program flow (at run-time) and *propagating* error conditions from a callee (sub-)function to its caller function (potentially as far back as `main()`, thus bypassing the usual function return mechanism. Program flow is interrupted at the point where an exception is *thrown*, and resumes at the point the exception is *caught*, which is always within the scope of a caller function (again, possibly `main()`, the beginning of the function call stack). 
Any code designed to handle an exception being thrown is contained within a *try-block*; this is a block of code enclosed in curly braces immediately after the `try` keyword. This try-block **is** allowed to make function/method calls, implicitly enclosing these within the try-block scope. (Any exceptions thrown from functions declared `noexcept`, or thrown from outside of a try-block's scope will terminate the program.) +Exceptions are a means of altering program flow (at run-time) and *propagating* error conditions from a callee (sub-)function to its caller function (potentially as far back as `main()`, thus bypassing the usual function return mechanism). Program flow is interrupted at the point where an exception is *thrown*, and resumes at the point the exception is *caught*, which is always within the scope of a caller function (again, possibly `main()`, the beginning of the function call stack). Any code designed to handle an exception being thrown is contained within a *try-block*; this is a block of code enclosed in curly braces immediately after the `try` keyword. This try-block **is** allowed to make function/method calls, implicitly enclosing these within the try-block scope. (Any exceptions thrown from functions declared `noexcept`, or thrown from outside of a try-block's scope will terminate the program.) An exception is thrown by using the `throw` keyword, followed by the object to be thrown. (If no object is specified then `std::terminate` is called, as for `noexcept` functions.) Usually, you will want to throw an instance of the `std::exception` hierarchy, although **any** user-defined or built-in type can be thrown. @@ -292,7 +293,7 @@ A few new things to note about this program: * It works by throwing a `std::runtime_error` (which is derived from `std::exception`), throwing a plain `std::exception`, throwing an `int` or returning an `int`. 
-* There is no need for `break` statements within the `switch` as no `case:` conditions can fall through (except for `default:`, which never needs `break`). +* There is no need for `break` statements within the `switch` as no `case:` conditions can fall through (except for `default:`, which never needs `break` as it should be the final clause). * The order of the catch-blocks is significant, with ellipsis last and (derived class) `std::runtime_error` first. @@ -372,8 +373,7 @@ struct MinMaxAvg { int main() { vector v{ 3, 5, 2, 6, 2, 4 }; - MinMaxAvg f; - for_each(begin(v), end(v), ref(f)); + MinMaxAvg f = for_each(begin(v), end(v), MinMaxAvg{}); cout << "Min: " << f.min << " Max: " << f.max << " Avg: " << f.avg << " Num: " << f.num << '\n'; } @@ -381,13 +381,11 @@ int main() { A few points to note about this program: -* Only `num` and `first` are required to be set before the first `std::for_each()` call; we have used universal initializtion of the member variables, but this could also be achieved by using a (default-)constructor. +* Only `num` and `first` are required to be set before the `std::for_each()` call; we have used universal initialization of the member variables, but this could also be achieved by using a (default-)constructor. -* The definition of `f` (a `MinMaxAvg` function object) is required **before** the call to `std::for_each()` so that its state is still accessible after the call. It is destroyed at the end of `main()`, as this is the scope it is declared within. The code `for_each(begin(v), end(v), MinMaxAvg{});` would compile, but its result would be lost as the functor would itself be destroyed here. +* The assignment of `f` (a `MinMaxAvg` function object) is the result of the call to `std::for_each()`, being the modified (default-constructed) third parameter. 
-* The syntax `std::ref(f)` passes the function object by reference, with plain `f` a **copy** would be made which would mean the copy's member variables would be discarded at the end of the `for_each()` scope, so again the result would be lost. - -* The function template `std::for_each()` call decomposes to the equivalent of: `f(3); f(5); f(2); f(6); f(2); f(4);`. Of course, a range-for loop could be used to accomplish the same thing, but the *logic* would have to be written (or repeated) within the body of the loop. +* The function template `std::for_each()` call decomposes to the equivalent of: `auto f = MinMaxAvg{}; f(3); f(5); f(2); f(6); f(2); f(4);`. Of course, a range-for loop could be used to accomplish the same thing, but the *logic* within the functor's `operator()` would have to be written (or repeated) within the body of the loop. **Experiment:** @@ -720,7 +718,7 @@ A few things to note about this program: * Then, the sub-scope exits, destroying `p2`; however, the object it points to stays alive because `p1` points to it. -* Finally, `main()` exits, destroying `p1` and `"p2"`. Thus `"p1"` and `"p2"` are destroyed in the **same** order in which they were initialized, unlike for `std::unique_ptr` where it would always be in reverse order. +* Finally, `main()` exits, destroying `p1` and `p2`. Thus `"p1"` and `"p2"` are destroyed in the **same** order in which they were initialized, unlike for `std::unique_ptr` where it would always be in reverse order. Any `std::shared_ptr` object can be passed by **value** to a function, implying a copy of the `std::shared_ptr` and a sharing of ownership. Also a container of `std::shared_ptr`s can share ownership with named `std::shared_ptr`s, or even another container of `std::shared_ptr`s. 
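The sharing of ownership described above can be observed through `use_count()`. The following is a minimal sketch only: the helper names `owners_during_call()` and `owners_with_container()` are hypothetical and do not appear in the Tutorial's own programs, and `use_count()` is only a reliable figure in single-threaded code such as this.

```cpp
// Sketch only: both helper functions are hypothetical illustrations.
#include <memory>
#include <string>
#include <vector>
using namespace std;

// Taking the shared_ptr by value copies it, so the callee co-owns
// the pointed-to object for the duration of the call.
long owners_during_call(shared_ptr<string> p) {
    return p.use_count();   // the caller's pointer(s) plus this copy
}

// A container of shared_ptrs shares ownership with a named shared_ptr.
long owners_with_container() {
    auto named = make_shared<string>("shared");
    vector<shared_ptr<string>> v{ named, named };
    return named.use_count();   // the named pointer plus two container elements
}
```

Calling `owners_during_call(p)` with a single named `p` reports two owners while the call is in progress, and `p.use_count()` drops back to one once the by-value copy has been destroyed on return.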
@@ -854,7 +852,7 @@ int main() { cout << "Name not recognized!\n"; } } -} +} ``` This is one of the larger programs we have seen, and covers much of the contents of this Chapter: @@ -885,4 +883,4 @@ This is one of the larger programs we have seen, and covers much of the contents * Change `Pupil` and `Class` to be `class`es instead of `struct`s, with `private:` data members. Hint: you will need to write some `public:` getters. -*All text and program code ©2019-2022 Richard Spencer, all rights reserved.* +*All text and program code ©2019-2025 Richard Spencer, all rights reserved.* diff --git a/README.md b/README.md index da60838..ad76abe 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,11 @@ ## Contents -All of the source Markdown pages for the Modern C++ Tutorial on https://learnmoderncpp.com/ plus complete, working programs from the course text. +All of the source Markdown pages for the Modern C++ Tutorial on https://learnmoderncpp.com/ plus complete, working programs machine-extracted from the course text. + +**Note:** Some Chapters have had significant changes made to update them to C++23, and not all programs compile successfully yet. In particular, use of `std::println()` with `import std;` does not compile. In case of issues with your compiler please see the Releases page for the C++20 version of the Tutorial. + +**New:** Jupyter Notebooks have been auto-generated from the source Markdown files and are located in the `jupyter-notebooks` directory. You will need the executable `jupyter-lab` available with suitable C++ kernels, see output from running `jupyter kernelspec list` (tested with kernel `cpp23`, which needs to be set on first load). The "headers" subdirectory contains C++ programs with legacy header `#includes`, whilst the "modules" subdirectory contains the same programs using the `import` keyword instead. See https://learnmoderncpp.com/2020/09/05/where-are-c-modules/ for more details about C++ compilers which have support for modules. 
@@ -10,7 +14,7 @@ The "scripts" subdirectory contains a C++ program which extracts all programs fr ## Compiling under Windows -Most programs compile successfully under Windows with Visual Studio 2022 (v17.5 or later), some of the modules versions do not currently compile. +Most programs compile successfully under Windows with Visual Studio 2022 (v17.8 or later), some of the modules versions (as noted above) do not currently compile. The supplied batch scripts `build-vs2022-headers.bat` and `build-vs2022-modules.bat` can be used to compile all of the programs within a Visual Studio command prompt, simply run: @@ -20,7 +24,7 @@ C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvar or similar in a command window, or create a desktop link, and then run either of the build scripts. -Alternatively, to compile individual programs from within a Visual Studio command prompt run either (modules): +Alternatively, to compile individual programs from within a Visual Studio command prompt, run either (modules): ``` cl /EHsc /std:c++latest /MTd 00-example.cpp @@ -39,13 +43,13 @@ substituting the correct file for `00-example.cpp` in either case. Note that the To compile individual programs in the "headers" subdirectory under Linux, use: ``` -g++ -std=c++20 -o 00-example 00-example.cpp +g++ -std=c++23 -o 00-example 00-example.cpp ``` or: ``` -clang++ -std=c++20 -o 00-example 00-example.cpp +clang++ -std=c++23 -o 00-example 00-example.cpp ``` substituting both occurrences of `00-example` with the correct file name. diff --git a/headers/01-hellow.cpp b/headers/01-hellow.cpp index 27c0cad..ffebe56 100644 --- a/headers/01-hellow.cpp +++ b/headers/01-hellow.cpp @@ -1,8 +1,8 @@ // 01-hellow.cpp : prints a line of text to the console -#include <iostream> +#include <print> using namespace std; int main() { - cout << "Hello, World!" 
<< '\n'; + println("Hello, World!"); } diff --git a/headers/01-title.cpp b/headers/01-title.cpp index 666da3f..1211ccd 100644 --- a/headers/01-title.cpp +++ b/headers/01-title.cpp @@ -1,15 +1,15 @@ // 01-title.cpp : output the title page of a well-known book -#include <iostream> +#include <print> using namespace std; int main() { - cout << 1+R"( + print(1+R"( Alice's Adventures In Wonderland by LEWIS CARROLL -)"; +)"); } diff --git a/headers/02-assign.cpp b/headers/02-assign.cpp index 37f7201..16cf86c 100644 --- a/headers/02-assign.cpp +++ b/headers/02-assign.cpp @@ -1,14 +1,14 @@ // 02-assign.cpp : assign to local variables -#include <iostream> +#include <print> using namespace std; int main() { int i = 1, j = 2; unsigned k; - cout << "(1) i = " << i << ", j = " << j << ", k = " << k << '\n'; + println("(1) i = {}, j = {}, k = {}", i, j, k); i = j; j = 3; k = -1; - cout << "(2) i = " << i << ", j = " << j << ", k = " << k << '\n'; + println("(2) i = {}, j = {}, k = {}", i, j, k); } diff --git a/headers/02-constants.cpp b/headers/02-constants.cpp index bfe311c..d26061e 100644 --- a/headers/02-constants.cpp +++ b/headers/02-constants.cpp @@ -1,13 +1,12 @@ // 02-constants.cpp : introducing the const keyword -#include <iostream> +#include <print> using namespace std; const double PI = 3.14159265358979; int main() { auto const APPROX_E = 3; - cout << "pi is almost exactly " << PI - << "e is approximately " << APPROX_E - << '\n'; + println("pi is almost exactly {}, while e is approximately {}", + PI, APPROX_E); } diff --git a/headers/02-constexpr.cpp b/headers/02-constexpr.cpp index 99db913..4e478c7 100644 --- a/headers/02-constexpr.cpp +++ b/headers/02-constexpr.cpp @@ -1,17 +1,19 @@ // 02-constexpr.cpp : introducing the constexpr keyword -#include <iostream> +#include <print> #include <cmath> using namespace std; -const double PI1 = acos(-1.0); // acos is not (yet) constexpr +// Note: currently, not all compilers mark `acos` as a +// constexpr function in cmath. The following line might +// not compile with `clang++` for example. 
+constexpr double PI1 = acos(-1.0); constexpr double PI2 = 22.0 / 7.0; -// the following line does not compile and has been commented out -//static_assert(PI1 > 3.141 && PI1 < 3.143); +static_assert(PI1 > 3.141 && PI1 < 3.143); static_assert(PI2 > 3.141 && PI2 < 3.143); int main() { - cout << "PI1 = " << PI1 << '\n'; - cout << "PI2 = " << PI2 << '\n'; + println("PI1 = {}", PI1); + println("PI2 = {}", PI2); } diff --git a/headers/02-height.cpp b/headers/02-height.cpp index 4cc3cfe..0fe5e61 100644 --- a/headers/02-height.cpp +++ b/headers/02-height.cpp @@ -1,6 +1,6 @@ // 02-height.cpp : define the same variable name in two different namespaces -#include <iostream> +#include <print> using namespace std; namespace Wonderland { @@ -12,9 +12,7 @@ namespace VictorianEngland { } int main() { - cout << "Alice\'s height varies between " - << Wonderland::alice_height_m - << "m and " - << VictorianEngland::alice_height_m - << "m.\n"; + println("Alice\'s height varies between {}m and {}m", + Wonderland::alice_height_m, + VictorianEngland::alice_height_m); } diff --git a/headers/02-references.cpp b/headers/02-references.cpp index 8fcb691..b4ff255 100644 --- a/headers/02-references.cpp +++ b/headers/02-references.cpp @@ -1,13 +1,13 @@ // 02-references.cpp : introducing l-value references -#include <iostream> +#include <print> using namespace std; int alice_age{ 9 }; int main() { - cout << "Alice\'s age is " << alice_age << '\n'; + println("Alice\'s age is {}", alice_age); int& alice_age_ref = alice_age; alice_age_ref = 10; - cout << "Alice\'s age is now " << alice_age << '\n'; + println("Alice\'s age is now {}", alice_age); } diff --git a/headers/02-scopes.cpp b/headers/02-scopes.cpp index 3a5eed3..06624db 100644 --- a/headers/02-scopes.cpp +++ b/headers/02-scopes.cpp @@ -1,16 +1,16 @@ // 02-scopes.cpp : define three variables with the same name in one program -#include <iostream> +#include <print> using namespace std; auto a{ 1.5f }; int main() { - cout << "(1) " << a << '\n'; + println("(1) {}", a); auto a{ 2u }; - cout << "(2) " << 
a << '\n'; + println("(2) {}", a); { auto a{ 2.5 }; - cout << "(3) " << a << '\n'; + println("(3) {}", a); } } diff --git a/headers/02-swap.cpp b/headers/02-swap.cpp index 586d6a1..84d91bf 100644 --- a/headers/02-swap.cpp +++ b/headers/02-swap.cpp @@ -1,13 +1,13 @@ // 02-swap.cpp : attempt to swap the values of an int and a double -#include <iostream> +#include <print> using namespace std; int main() { int a = 1; double b = 2.5; - cout << "(1) a = " << a << ", b = " << b << '\n'; + println("(1) a = {}, b = {}", a, b); a = 2.5; b = 1; - cout << "(2) a = " << a << ", b = " << b << '\n'; + println("(2) a = {}, b = {}", a, b); } diff --git a/headers/02-uniform.cpp b/headers/02-uniform.cpp index 10af780..1ffca37 100644 --- a/headers/02-uniform.cpp +++ b/headers/02-uniform.cpp @@ -1,11 +1,11 @@ // 02-uniform.cpp : avoid compiler error with uniform initialization and explicit narrowing cast -#include <iostream> +#include <print> using namespace std; int main() { // int c = { 2.5 }; // Error: this does NOT compile int c = { static_cast<int>(2.5) }; // while this does double d = { 1 }; // and so does this - cout << "c = " << c << ", d = " << d << '\n'; + println("c = {}, d = {}", c, d); } diff --git a/headers/03-calc.cpp b/headers/03-calc.cpp index c260858..653dab6 100644 --- a/headers/03-calc.cpp +++ b/headers/03-calc.cpp @@ -6,7 +6,7 @@ using namespace std; int main() { int r{}, x{}, y{}; char op{}; - cout << "Please enter a calulation (number op number, op is one of +-*/):\n"; + cout << "Please enter a calculation (number op number, op is one of +-*/):\n"; cin >> x >> op >> y; switch (op) { case '+': diff --git a/headers/04-inline.cpp b/headers/04-inline.cpp index bad26c5..7a3e472 100644 --- a/headers/04-inline.cpp +++ b/headers/04-inline.cpp @@ -1,6 +1,6 @@ // 04-inline.cpp : use of an inline function -#include <iostream> +#include <print> using namespace std; inline void swap(int& x, int& y) { @@ -11,7 +11,7 @@ inline void swap(int& x, int& y) { int main() { int a = 1, b = 2; - cout << "(1) a = " << a << ", b = " << b << '\n'; + 
println("(1) a = {}, b = {}", a, b); swap(a, b); - cout << "(2) a = " << a << ", b = " << b << '\n'; + println("(2) a = {}, b = {}", a, b); } diff --git a/headers/04-noexcept.cpp b/headers/04-noexcept.cpp index 16507fc..5ac841f 100644 --- a/headers/04-noexcept.cpp +++ b/headers/04-noexcept.cpp @@ -1,24 +1,24 @@ // 04-noexcept.cpp : a noexcept function throwing an exception -#include <iostream> +#include <print> #include <stdexcept> using namespace std; -int throw_if_zero(int i) noexcept { +void throw_if_zero(int i) noexcept { if (!i) { throw runtime_error("found a zero"); } - cout << "throw_if_zero(): " << i << '\n'; + println("throw_if_zero(): {}", i); } int main() { - cout << "Entering main()\n"; + println("Entering main()"); try { throw_if_zero(1); throw_if_zero(0); } - catch(...) { - cout << "Caught an exception!\n"; + catch(exception& e) { + println("Caught an exception: {}", e.what()); } - cout << "Leaving main()\n"; + println("Leaving main()"); } diff --git a/headers/04-static-var.cpp b/headers/04-static-var.cpp index 652a72b..bcc64e5 100644 --- a/headers/04-static-var.cpp +++ b/headers/04-static-var.cpp @@ -1,11 +1,11 @@ // 04-static-var.cpp : preserving function state in a static variable -#include <iostream> +#include <print> using namespace std; void f() { static int s{1}; - cout << s << '\n'; + println("{}", s); ++s; } diff --git a/headers/06-pixel1.cpp b/headers/06-pixel1.cpp index b23e1ee..36f6a0d 100644 --- a/headers/06-pixel1.cpp +++ b/headers/06-pixel1.cpp @@ -19,15 +19,13 @@ string_view get_color(Color c) { switch (c) { case Color::red: return "red"; - break; case Color::green: return "green"; - break; case Color::blue: return "blue"; - break; + default: + return ""; } - return ""; } int main() { diff --git a/headers/06-pixel2.cpp b/headers/06-pixel2.cpp index e3dcaf8..360921b 100644 --- a/headers/06-pixel2.cpp +++ b/headers/06-pixel2.cpp @@ -18,15 +18,13 @@ string_view get_color(Color c) { switch (c) { case Color::red: return "red"; - break; case Color::green: return "green"; - break; case 
Color::blue: return "blue"; - break; + default: + return ""; } - return ""; } int main() { diff --git a/headers/06-point3.cpp b/headers/06-point3.cpp index 7d41707..0097ec7 100644 --- a/headers/06-point3.cpp +++ b/headers/06-point3.cpp @@ -7,7 +7,7 @@ struct Point{ int x{}, y{}; }; -Point operator+ (const Point& lhs, const Point& rhs) { +const Point operator+ (const Point& lhs, const Point& rhs) { Point result; result.x = lhs.x + rhs.x; result.y = lhs.y + rhs.y; diff --git a/headers/06-point4.cpp b/headers/06-point4.cpp index 0a86cfe..3bc6781 100644 --- a/headers/06-point4.cpp +++ b/headers/06-point4.cpp @@ -13,7 +13,7 @@ struct Point{ } }; -Point operator+ (const Point& lhs, const Point& rhs) { // non-member operator+ +const Point operator+ (const Point& lhs, const Point& rhs) { // non-member operator+ Point result{ lhs }; result += rhs; return result; diff --git a/headers/08-format1.cpp b/headers/08-format1.cpp new file mode 100644 index 0000000..be0b084 --- /dev/null +++ b/headers/08-format1.cpp @@ -0,0 +1,12 @@ +// 08-format1.cpp : Basic usage of format string + +#include +#include +using namespace std; + +int main() { + string s{ "Formatted" }; + auto d{ 10.0 / 3.0 }; + auto i{ 20000 }; + println("{0:20}:{2:8}, {1:12.11}", s, d, i); +} diff --git a/headers/08-format2.cpp b/headers/08-format2.cpp new file mode 100644 index 0000000..d4e949a --- /dev/null +++ b/headers/08-format2.cpp @@ -0,0 +1,31 @@ +// 08-format2.cpp : Various format string-using functions + +#include +#include +#include +#include +#include +#include +#include +using namespace std; + +int main() { + string world{ "World" }; + print(cout, "Hello, {}!\n", world); + println("{1} or {0}", false, true); + + constexpr const char *fmt = "Approximation of π = {:.12g}"; + string s = format(fmt, asin(1.0) * 2); + cout << s << '\n'; + + constexpr const wchar_t *wfmt = L"Approximation of pi = {:.12g}"; + wstring ws = format(wfmt, asin(1.0) * 2); + wcout << ws << L'\n'; + + format_to(ostream_iterator(cout), 
"Hello, {}!\n", world); + wstring ww{ L"World" }; + array wa; + auto iter = format_to_n(wa.begin(), 8, L"Hello, {}!\n", ww); + *(iter.out) = L'\0'; + wcout << wa.data() << L'\n'; +} diff --git a/headers/09-person1.cpp b/headers/09-person1.cpp index 69dd890..d3931b6 100644 --- a/headers/09-person1.cpp +++ b/headers/09-person1.cpp @@ -1,27 +1,26 @@ // 09-person1.cpp : model Person as a class with constructor +#include #include #include #include using namespace std; - -struct Date { - int year{}, month{}, day{}; -}; +using namespace std::chrono; class Person { public: - Person(const Date& dob, string_view familyname, string_view firstname) + Person(const year_month_day& dob, string_view familyname, string_view firstname) : dob{ dob }, familyname{ familyname }, firstname{ firstname } {} string getName() const { return firstname + ' ' + familyname; } + const year_month_day& getDob() const { return dob; } private: - const Date dob; + const year_month_day dob; string familyname, firstname; }; int main() { - Person genius{ { 1879, 3, 14 }, "Einstein", "Albert" }; - cout << genius.getName() << '\n'; + Person genius{ { 1879y, March, 14d }, "Einstein", "Albert" }; + cout << genius.getName() << " was born " << genius.getDob() << '\n'; } diff --git a/headers/09-person2.cpp b/headers/09-person2.cpp index 488b5cf..2dd318b 100644 --- a/headers/09-person2.cpp +++ b/headers/09-person2.cpp @@ -1,16 +1,17 @@ // 09-person2.cpp : model Person, Student and Employee as a class inheritance hierarchy +#include #include #include #include #include using namespace std; +using namespace std::chrono; class Person { public: - struct Date; - Person(Date dob) : dob{ dob } {} - Person(Date dob, string_view familyname, string_view firstname, bool familynamefirst = false) + Person(year_month_day dob) : dob{ dob } {} + Person(year_month_day dob, string_view familyname, string_view firstname, bool familynamefirst = false) : dob{ dob }, familyname{ familyname }, firstname{ firstname }, familynamefirst{ 
familynamefirst } {} virtual ~Person() {} @@ -28,12 +29,8 @@ class Person { return firstname + ' ' + familyname; } } - struct Date { - unsigned short year{}; - unsigned char month{}, day{}; - }; protected: - const Date dob; + const year_month_day dob; private: string familyname, firstname; bool familynamefirst{}; @@ -44,7 +41,7 @@ class Student : public Person { enum class Schooling; Student(const Person& person, const vector& attended_classes = {}, Schooling school_type = Schooling::preschool) : Person{ person }, school_type{ school_type }, attended_classes{ attended_classes } {} - const Date& getDOB() const { return dob; } + const year_month_day& getDob() const { return dob; } const vector& getAttendedClasses() const { return attended_classes; } enum class Schooling { preschool, elementary, juniorhigh, highschool, college, homeschool, other }; private: @@ -56,7 +53,7 @@ class Employee : public Person { public: Employee(const Person& person, int employee_id, int salary = 0) : Person{ person }, employee_id{ employee_id }, salary{ salary } {} - bool isBirthday(Date today) const { return dob.month == today.month && dob.day == today.day; } + bool isBirthdayToday(year_month_day today) const { return dob.month() == today.month() && dob.day() == today.day(); } void setSalary(int salary) { salary = salary; } auto getDetails() const { return pair{ employee_id, salary }; } private: @@ -65,7 +62,7 @@ class Employee : public Person { }; int main() { - Person genius{ { 1879, 3, 14 }, "Einstein", "Albert" }; + Person genius{ { 1879y, March, 14d }, "Einstein", "Albert" }; Student genius_student{ genius, { "math", "physics", "philosophy" }, Student::Schooling::other }; Employee genius_employee{ genius, 1001, 15000 }; @@ -79,8 +76,8 @@ int main() { auto [ id, salary ] = genius_employee.getDetails(); cout << "ID: " << id << ", Salary: $" << salary << '\n'; - Person::Date next_bday{ 2020, 3, 14 }; - if (genius_employee.isBirthday(next_bday)) { + year_month_day next_bday{ 2023y, 
March, 14d }; + if (genius_employee.isBirthdayToday(next_bday)) { cout << "Happy Birthday!\n"; } } diff --git a/headers/09-person3.cpp b/headers/09-person3.cpp index f876ede..c9d0e1e 100644 --- a/headers/09-person3.cpp +++ b/headers/09-person3.cpp @@ -1,35 +1,27 @@ -// 09-person3.cpp : define operator== and operator<< for Person class +// 09-person3.cpp : define operator<=> for Person class #include using namespace std; struct Date { int year{}, month{}, day{}; + auto operator<=>(const Date&) const = default; }; -bool operator== (const Date& lhs, const Date& rhs) { - return lhs.year == rhs.year && lhs.month == rhs.month && lhs.day == rhs.day; -} - class Person { public: Person(const Date& dob, string_view familyname, string_view firstname) : dob{ dob }, familyname{ familyname }, firstname{ firstname } {} string getName() const { return firstname + ' ' + familyname; } - friend bool operator== (const Person&, const Person&); + const auto& getDob() const { return dob; } + auto operator<=>(const Person&) const = default; friend ostream& operator<< (ostream&, const Person&); private: - const Date dob; string familyname, firstname; + const Date dob; }; -bool operator== (const Person& lhs, const Person& rhs) { - return lhs.familyname == rhs.familyname - && lhs.firstname == rhs.firstname - && lhs.dob == rhs.dob; -} - ostream& operator<< (ostream& os, const Person& p) { os << "Name: " << p.getName() << ", DOB: " << p.dob.year << '/' << p.dob.month << '/' << p.dob.day; @@ -37,8 +29,8 @@ ostream& operator<< (ostream& os, const Person& p) { } int main() { - Person person1{ { 2000, 1, 1 }, "John", "Doe" }, - person2{ { 1987, 11, 31 }, "John", "Doe" }; + Person person1{ { 2000, 1, 1 }, "Doe", "John" }, + person2{ { 1987, 11, 31 }, "Doe", "John" }; cout << "person1: " << person1 << '\n'; cout << "person2: " << person2 << '\n'; if (person1 == person2) { @@ -47,4 +39,16 @@ int main() { else { cout << "Different person!\n"; } + + cout << "person1 is "; + if (person1.getDob() > 
person2.getDob()) { + cout << "younger than "; + } + else if (person1.getDob() < person2.getDob()) { + cout << "older than "; + } + else { + cout << "the same age as "; + } + cout << "person2" << '\n'; } diff --git a/headers/10-functor2.cpp b/headers/10-functor2.cpp index 353a2ce..81ec9ce 100644 --- a/headers/10-functor2.cpp +++ b/headers/10-functor2.cpp @@ -28,8 +28,7 @@ struct MinMaxAvg { int main() { vector v{ 3, 5, 2, 6, 2, 4 }; - MinMaxAvg f; - for_each(begin(v), end(v), ref(f)); + MinMaxAvg f = for_each(begin(v), end(v), MinMaxAvg{}); cout << "Min: " << f.min << " Max: " << f.max << " Avg: " << f.avg << " Num: " << f.num << '\n'; } diff --git a/headers/10-pupils.cpp b/headers/10-pupils.cpp index 3c7d043..5896c9c 100644 --- a/headers/10-pupils.cpp +++ b/headers/10-pupils.cpp @@ -96,4 +96,4 @@ int main() { cout << "Name not recognized!\n"; } } -} +} diff --git a/jupyter-notebooks/01-string-and-character-literals.ipynb b/jupyter-notebooks/01-string-and-character-literals.ipynb new file mode 100644 index 0000000..4f4089a --- /dev/null +++ b/jupyter-notebooks/01-string-and-character-literals.ipynb @@ -0,0 +1,256 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "cbd0eb18", + "metadata": {}, + "source": [ + "# String and Character Literals\n", + "\n", + "## Introducing a Modern C++ program\n", + "\n", + "Convention dictates that a first program should output the programmer's timeless cry of \"Hello, World!\" to the screen, and do no more (or less). This is often useful in order to test that the compilation environment is fully functional in terms of executable paths, header files, link libraries etc. 
The most up-to-date C++ version of this program is shown below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6243faee", + "metadata": {}, + "outputs": [], + "source": [ + "// 01-hellow.cpp : prints a line of text to the console\n", + "\n", + "#include \n", + "using namespace std;\n", + "\n", + "int main() {\n", + " println(\"Hello, World!\");\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "e31585aa", + "metadata": {}, + "source": [ + "If you prefer not to cut-and-paste, this source file is included in the zip archive linked from this site.[^1] If you are reading this as a Jupyter notebook, clicking within the code cell and then either pressing the \"play\" button on the menu bar, or typing Ctrl-Enter, should compile and run the program, showing any resulting output immediately below. (Notebooks previewed on GitHub are not functional in this way, however all of the content is displayed.)\n", + "\n", + "Having seen what this program does (admittedly not that much), let's explore how it is put together:\n", + "\n", + "* The first line is a comment; syntax of comments are discussed in more detail later in this Chapter. I've chosen to repeat the filename of the source code file in the comment, and also to summarize the purpose of the program. This summary is intended to be useful to anybody who later reads the code, possibly including the original author!\n", + "\n", + "* Then comes some *boilerplate* code, which is common code that we'll see again in future programs we write. The *include directive* is a command interpreted by the *pre-processor* which pastes the entire contents of the relevant *header file* (and any other files it `#include`s) at that point into the *compilation unit*. These directives are being phased out of Modern C++ in favor of the `import` keyword (which has the potential to speed up compilation times significantly), but it is likely the transition will take years to complete. 
Both Clang/LLVM and MSVC implement `import` although extra command-line switches are needed currently.\n", + "\n", + "* The next line `using namespace std;` is another directive which makes all of the elements of the Standard Namespace (abbreviated as `std`) available to the scope in which the directive appears (in this case global scope). Many experienced programmers would consider this *namespace pollution* bad form, preferring instead to use the *fully qualified names* of the individual components. I have chosen to use it in all of the example programs we will see in this Tutorial as it facilitates better readability of (and familiarity with) the component names. The name \"Standard\" comes from the definition of the C++ Library's classes, functions and other facilities as defined by the ISO Standardization Committee. Programs can use any part of the Standard Library and be expected to compile on any compiler/platform combination without modification.\n", + "\n", + "* Next we have a function definition, which is for the `main()` function; here the *parentheses*, or brackets, indicate an unused (or empty) *parameter list*. Every executable C++ program has to have exactly one `main()` in order for it to be able to be linked into an executable binary, and this is where execution begins when the program is run. (This is almost true: you should be aware that global *objects*, if any, will have their constructors called before `main()` is entered.)\n", + "\n", + "* The `int` specifies the correct *return type* of `main()`, although, uniquely for `main()`, specifying `return` within the function body is in fact optional; failing to specify it causes a value of zero (indicating successful completion) to be returned to the calling environment or process as the *system return code*.\n", + "\n", + "* Curly braces `{` and `}` are used to delimit a *block* of code, in this case the *body* of the `main()` function. 
The convention of putting the opening brace `{` at the end of the line instead of on a line by itself follows the \"One True Brace Style\" (or *1TBS* for short) popularized for the C programming language. I use it both here and in future example programs because it saves on vertical space, and works better with code-folding modes found in many text editors. Some people feel very strongly about whitespace and formatting conventions in their code; your organization will almost certainly have its own coding standards (which you will have to follow even if you don't agree with them!) I highly recommend the [Clang-Format](https://clang.llvm.org/docs/ClangFormat.html)[^2] utility, which exists as a plugin to many IDEs and can be used to automatically reformat source code to a pre-defined set of rules.\n", + "\n", + "* The only part of this program which appears to perform an action is within the body of `main()`. It is a call to the C++ library function `println()` (new to C++23, previously *stream objects* would have been used) which outputs its *format string* plus parameters (if any) followed by a new-line sequence. (The `print()` function works identically but omits the trailing new-line.) Output is sent directly to `stdout` (the C-Library's Standard Output) which implies that C++ streams are not used at all.\n", + "\n", + "* A *string literal*, used to write the format string within code, is delimited by a matching pair of double quotes (`\"`) on the same line, and can contain any number of printable and escaped, non-printable characters. It would usually be stored verbatim in a read-only data segment of the final executable. A *character literal* however is delimited by a matching pair of single quotes (`'`) and contains exactly one printable or escaped, non-printable character. To be used **within** any literal, both of these types of quotes need to be *escaped* by preceding them with a backslash (`\\`). 
Certain other codes have to be *escape sequences* as well, with `\\n` representing new-line; for a complete list see the table later in this Chapter.\n", + "\n", + "Maybe you've heard about C++ supporting *generics* through the `template` keyword? Even this simple program only works due to the use of *template instantiation* (`std::print()` is in fact a *generic function*), which is in simple terms creating code to be compiled from the provided parameter(s). Hopefully, it soon becomes apparent that support for such capabilities can lead to easily comprehensible client code compared with leaner (\"simpler\") programming languages, such as C.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Adapt the above program (perhaps calling the modified version `01-hellow2.cpp`) to print the new-line character from within the string literal, using the `print()` function instead. Is the output identical?\n", + "\n", + "* Move the using-directive in the original program to within `main()`, and make sure the program still compiles. Does its position within `main()` matter?\n", + "\n", + "* Now use a using-statement `using std::println;` *instead* of `using namespace std;`. Are there any other changes you need to make to the code?\n", + "\n", + "* Finally, go back to the version using `println()` and try omitting any `using` statement at all, and prefix the function call with `std::`. Check this code compiles, and then consider whether you prefer this use of *fully qualified* Standard Library entities. Personally, I feel that for new C++ programmers, fully qualified names in code look too similar to each other, making it harder to learn to recognize the individual names. 
However, you should be aware that having `using namespace std;` in your code does make you look like a beginner to more experienced C++ coders.\n", + "\n", + "## Special characters\n", + "\n", + "Some characters cannot be easily entered into string or character literals within code; this may be because they are ASCII *control* characters (also known as non-printable, being in the range 0-31) or *top-bit-set* characters (in the range 128-255) not available on your keyboard, or because they have special meaning (such as Delete). Some of the more common control and other \"special\" characters have single-letter short forms; we've already encountered `\\n` for new-line; the others are listed in the table below.\n", + "\n", + "Note: a backslash followed by an *octal* (base 8) number of up to three digits (between `\\0` and `\\377`) can be used for any character in the range 0-255 (decimal), as can a backslash followed by `x` and one or two *hexadecimal* (base 16) digits (such as `\\xa` or `\\xA3`) from `\\x00` up to `\\xff`.\n", + "\n", + "| Escape sequence | Description |\n", + "|:---------------:|:------------------:|\n", + "| \\\\n | newline |\n", + "| \\\\t | horizontal tab |\n", + "| \\\\v | vertical tab |\n", + "| \\\\b | backspace |\n", + "| \\\\r | carriage return |\n", + "| \\\\f | form feed |\n", + "| \\\\a | alert (bell) |\n", + "| \\\\\\\\ | backslash |\n", + "| \\\\? 
| question mark |\n", + "| \\\\' | single quote |\n", + "| \\\\\" | double quote |\n", + "| \\\\ooo | octal (0-377) |\n", + "| \\\\xhh | hexadecimal (0-ff) |\n", + "| \\\\uhhhh | Unicode sequence (0-ffff) * |\n", + "| \\\\Uhhhhhhhh | Unicode sequence (0-10ffff) * |\n", + "\n", + "* Note: not all Unicode sequences are allowed in 8-bit string or character literals; however, these escape sequences become more useful with *Unicode string literals*, explained later in this Chapter.\n", + "\n", + "Any escape sequence can be used within single quotes to represent exactly one character literal (including zero `'\\0'`). Zero has a special meaning, as it is the string literal termination character. C++ inherits its string literals from C, and C-strings (as they are sometimes known, and as referred to in this Tutorial) were added as a bit of an afterthought to the C language of the early 1970s. String literals can be thought of as a **read-only** array of characters with an automatically added zero terminator; the space needed to store the string literal `\"Hello\"` is therefore six bytes, and not five as might be assumed.\n", + "\n", + "When outputting a string literal via `print()` or `println()`, the zero byte, or *null terminator*, is not output, but must be present to stop further raw memory being seen to be part of the string. Other terms you may encounter for string literals are *NTMBS* (null-terminated multi-byte string), `zstring` (a common *typedef* to implement the type of zero-terminated string), and `czstring` (`const`ant zero-terminated string).\n", + "\n", + "It is an oversimplification to say that any valid character can fit into a character literal; a character literal is simply an 8-bit type (possibly signed or unsigned) called `char` (pronounced \"car\", at least in the US) which can hold one of 256 possible values. 
Historically, the first 128 characters of *ASCII* (American Standard Code for Information Interchange) were the same on any platform (this was also known as 7-bit ASCII), while the second 128 (\"top-bit-set\") characters could change according to the specification of the chosen *code page* (also known as *extended ASCII*). With the advent and near-universal adoption of UTF-8 (Unicode encoded into an eight-bit octet stream), all *top-bit-set* bytes form part of a two-, three- or four-byte sequence, every byte of which has its top bit set.\n", + "\n", + "The good news is that despite the complexity of implementation of UTF-8, if your editor is set to edit text in UTF-8 (optionally with an identifying *magic number* BOM at the start) and your shell uses a UTF-8 *locale*, then your programs should output text to the console exactly as you type it into your editor. To repeat: raw UTF-8 sequences entered into string literals within the code should display correctly in the console when the program is run. (On Windows it may be necessary to enter `chcp 65001` at the shell prompt once for every shell session before running your program. This changes the active code page to UTF-8 instead of the most likely default Windows-1252, which is a simple eight-bit encoding. Alternatively, you may wish to use UTF-16 in your editor and *wide character* literals and streams, see later in this Chapter.)\n", + "\n", + "**Experiment:**\n", + "\n", + "* Modify `01-hellow.cpp` to output each word on a new line indented by one tab-stop, using only one string literal.\n", + "\n", + "* Modify the sub-string reading `Hello,` to `Hello\\0`, and run the program. Are you surprised by this change?\n", + "\n", + "* Now go back to the `print()`-using version and try outputting the character literal `\\0` at the end instead of `\\n`. 
What do you discover?\n", + "\n", + "* Now try to create a program that can output: `$(USD) £(GBP) €(EUR)` Hint: The Dollar symbol should be on your keyboard, and the Pound and Euro symbols may well be too, but if not use a character picker such as Character Map and a UTF-8 encoding in your editor (and in the console when running your program, remember `chcp 65001` for Windows).\n", + "\n", + "* Use Character Map (or similar) to enter a *pi* symbol into your text editor, and make this program output: `π has the value 3.14159...`\n", + "\n", + "## Raw string literals\n", + "\n", + "String literals are interpreted at compile time and any escape sequences are translated at this point. The resultant *raw string* is then stored in read-only memory, and the running program uses a pointer to the first character. This pointer is in fact a variable (as opposed to being a constant); however, the string data itself is constant, and attempting to change it (for example through subscript assignment) is a compile-time error. With these facts in mind, try to predict the output from changing the string literal parameter in `01-hellow.cpp` to: `\"Hello, World!\\n\" + 7;` Surprised? (Some compilers warn about shifting a pointer with \"pointer arithmetic\" in this way.)\n", + "\n", + "Now consider the usefulness of being able to insert whitespace (particularly tabs and newlines) and unescaped backslashes (particularly in regular expressions, or \"regexes\") into string literals *without* the need for escape characters. Such an entity is called a *raw string literal*, and takes the format: `R\"(a \\raw\\ string literal)\"`\n", + "\n", + "The start of a raw string literal is a capital letter \"R\" followed by a double quote and opening regular parenthesis, none of which form part of the stored or output string. A raw string literal is ended with a closing regular parenthesis and double quote. 
In the (unlikely) event that a raw string literal is required to *contain* a closing parenthesis followed by a double quote, this can be achieved by putting a unique sequence (often a word, or one or more asterisks) between the double quote and parenthesis *at both ends*, for example: `R\"*(can contain )\" here)*\"`\n", + "\n", + "*Pointer arithmetic* combined with raw string literals can serve a useful purpose, as shown in this next example program `01-title.cpp`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4b41231e", + "metadata": {}, + "outputs": [], + "source": [ + "// 01-title.cpp : output the title page of a well-known book\n", + "\n", + "#include <print>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " print(1+R\"(\n", + " Alice's\n", + " Adventures In\n", + " Wonderland\n", + "\n", + " by\n", + " LEWIS CARROLL\n", + ")\");\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "ba6131a8", + "metadata": {}, + "source": [ + "Compile and run this program following the same process as before. Notice that the `1+R\"(` *idiom* omits a blank line before the output, thus the first line output is the correct number of spaces followed by `Alice's`. Using a raw string literal means we don't have to litter the output string with escape characters for new lines, and can begin the output **unindented** as the `1+R\"(` skips the first character, which is (intentionally) a new line in the source file. 
The raw string literal is in this case (again intentionally) terminated at the start of a blank line, separate from the indentation of `print(` within `main()`; this is preferable to including \"invisible\" trailing whitespace in the output string, as would be the case if the `)\"` were itself indented.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Change the program above to output the first stanza from the rhyme at the beginning of the same book (shown below), indenting all **even-numbered** lines by eight spaces. 
Is there more than one way of achieving this?\n", + "\n", + "```\n", + "All in the golden afternoon\n", + "Full leisurely we glide;\n", + "For both our oars, with little skill,\n", + "By little arms are plied,\n", + "While little hands make vain pretence\n", + "Our wanderings to guide.\n", + "```\n", + "\n", + "* Now use a (non-raw) string literal for each line and a single call to `print()` with suitable escape characters. Note: it is possible to *concatenate* the string literals without any operator: concatenation of adjacent string literals is automatically performed by the pre-processor.\n", + "\n", + "* Modify `01-title.cpp` to output the title of your favorite book or film centered on the console window (assume an 80 character fixed width, and change the size of the console window if different).\n", + "\n", + "## Wide characters\n", + "\n", + "Although very popular, and supported in most modern programming languages, UTF-8 encoded string literals are not the only way to manipulate and display characters outside the range of seven or eight-bit ASCII. We've discussed `char` as being the *underlying type* of string and character literals in C++, and there is also the `wchar_t` (possibly pronounced \"dub-car-tee\") type associated with *wide character* stream objects and strings (the names of which also start with \"w\"), and these predate Unicode support in the C++ Standard Library.\n", + "\n", + "Wide character support is platform-specific, and in particular the size of `wchar_t` in bits is not standardized; on many systems it is 32 bits but on Microsoft Windows it is 16 bits (and encodes Unicode UTF-16). If you think you need to use wide character support, and want to find out if it is suitable for your needs, consult your platform's documentation. It is important to note that while your editor/IDE may have support for wide-character/UTF-16/32 support, `print()` only works with eight-bit data (either ASCII 8-bit or UTF-8). 
For stream output of wide-character data, `wcout` can be used, but conversion between encodings using the Standard Library `codecvt` is deprecated. This may lead to differing I/O schemes being necessary if the software targets Windows.\n", + "\n", + "As well as the eight-bit type `char` there is now also `char8_t` which is useful for explicitly specifying that a string is UTF-8, and can encode all UTF-8 code points when using `\\u` and `\\U`. (Note: even plain `char` can include UTF-8 code points under most modern compilers, including those above U+00FF.) Specifying `char8_t` removes the uncertainty of whether `char` is signed or unsigned, which can cause programs to work differently on different platforms in some cases. Also available are `char16_t` and `char32_t`, designed to be the correct size for holding a single UTF-16 code unit or UTF-32 code point, respectively. Whilst these types are built into the language, converting strings between these types is a complex task and requires use of either the Standard Library, or third-party libraries (such as ICU[^3]), further discussion of which is beyond the scope of this Tutorial.\n", + "\n", + "The following table lists C++ types, sizes, target encodings, literals and objects used with normal and wide character sets:\n", + "\n", + "| Type | Bits | Encoding | String Literal | Character Literal | Raw String Literal | String Type | Stream Output | print() |\n", + "|:--------:|:-----:|:--------:|:--------------:|:-----------------:|:------------------:|:-----------:|:-------------:|:-------:|\n", + "| char | 8 | ASCII | \"abcd\" | 'a' | R\"(abcd)\" | string | cout/cerr | yes |\n", + "| char8_t | 8 | UTF-8 | u8\"abcd\" | u8'a' | u8R\"(abcd)\" | u8string | cout/cerr * | yes |\n", + "| char16_t | 16 | UTF-16 | u\"abcd\" | u'a' | uR\"(abcd)\" | u16string | n/a | no |\n", + "| char32_t | 32 | UTF-32 | U\"abcd\" | U'a' | UR\"(abcd)\" | u32string | n/a | no |\n", + "| wchar_t | 16/32 | n/a + | L\"abcd\" | L'a' | LR\"(abcd)\" | 
wstring | wcout/wcerr | no |\n", + "\n", + "* An explicit cast to type `const char *` in `operator<<` may be required when using `cout`/`cerr`, for example: `cout << reinterpret_cast<const char *>(u8\"Hello \\u20AC!\\n\");`.\n", + "\n", + "+ The `wchar_t` encoding and streams under Windows are 16-bit and support UTF-16.\n", + "\n", + "## Code comments\n", + "\n", + "C++ has two types of comments: single line comments which begin anywhere on a line (except inside a quoted string literal) with `//` and continue to the end of the same line, and multi-line comments which are delimited with `/*` at the beginning and `*/` at the end. A variation on this is that multi-line comments can instead have **both** delimiters on the same line, thus only commenting a part of that line." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3a85fd57", + "metadata": {}, + "outputs": [], + "source": [ + "// This is a single line comment, ignored by the compiler.\n", + "\n", + "/* This multi-\n", + " line comment\n", + " is also ignored.\n", + "*/\n", + "\n", + "int main( /* this appears as empty parentheses to the compiler */ ) {}" + ] + }, + { + "cell_type": "markdown", + "id": "e4f783ea", + "metadata": {}, + "source": [ + "Modern C++ code favors the `//` style, with multiple lines of comments possible by starting each one with `//`. Temporarily *commenting-out* a whole block of code, thus preventing it from being compiled, can be achieved by putting `/*` before the beginning and `*/` after the end of the block. Nesting multi-line comments is not possible as the comment always ends at the first `*/` reached; single line comments within a multi-line block are possible, however.\n", + "\n", + "Comments are like strings in that they do not contain program code; instead they are written in natural language (usually English) using the same character encoding as the source program file. (They often contain references to variables/functions etc. 
and it is important that these are kept in-sync with the code.) The content of comments is not formalized, unless you wish to employ a tool such as [Doxygen](https://www.doxygen.nl/manual/docblocks.html)[^4], which generates HTML documentation from source code by reading custom mark-up within comments. Comments within code that comprise paragraphs of text are often formatted to a fixed width, for example 77 characters (the standard for plain text email).\n", + "\n", + "Learning when and how to comment code comes with experience; typically you shouldn't duplicate information that can be easily inferred from the program code itself. Comments such as \"This is correct\" aren't particularly helpful either, instead you should try to be relevant and concise, aiming at the ability level of a fellow programmer (or even yourself in the future) who reads your code. When reading other people's code remember the time-honored saying: if code and comments disagree, then both are wrong.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Going back to `01-hellow.cpp` add a single-line comment sequence to the line beginning `println()`. Does this program compile and run?\n", + "\n", + "* Uncomment this line and use a pair of multi-line delimiters to comment-out the whole of the body of `main()`. 
Does this program compile and run?\n", + "\n", + "[^1]: https://learnmoderncpp.com/2019/08/03/welcome/\n", + "[^2]: https://clang.llvm.org/docs/ClangFormat.html\n", + "[^3]: http://site.icu-project.org/home\n", + "[^4]: https://www.doxygen.nl/manual/docblocks.html\n", + "\n", + "*All text and program code ©2019-2025 Richard Spencer, all rights reserved.*" + ] + } + ], + "metadata": { + "jupytext": { + "cell_metadata_filter": "-all" + }, + "kernelspec": { + "display_name": "C++ 23", + "language": "c++", + "name": "cpp23" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/jupyter-notebooks/02-variables-scopes-and-namespaces.ipynb b/jupyter-notebooks/02-variables-scopes-and-namespaces.ipynb new file mode 100644 index 0000000..9f21e2e --- /dev/null +++ b/jupyter-notebooks/02-variables-scopes-and-namespaces.ipynb @@ -0,0 +1,736 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "b692b1cf", + "metadata": {}, + "source": [ + "# Variables, Scopes and Namespaces\n", + "\n", + "## Declarations, definitions and assignment\n", + "\n", + "A variable is a named entity which can hold a value; thus it has *state*. As the name \"variable\" suggests, this value can, and often does, change during the entity's lifetime. A *declaration* can be thought of as introducing a variable to your program, as if it is saying: \"I exist with this name and have such-and-such type, use me.\" On the other hand, a *definition* is **everything** a declaration is, plus asking: \"Please reserve some memory for me here.\" Additionally, an *assignment* can be combined with a definition, thus stating \"I have this initial value from now until (optional) later reassignment (unless I am a constant).\" Defining a variable without giving it an initial value is usually best avoided, as the variable will likely contain random garbage (dereferencing an uninitialized variable causes undefined behavior in C++; your compiler can and often will warn of this). 
Declarations that are not also definitions are rare for variables of the built-in types, so we will omit further discussion of them here.\n", + "\n", + "C++ is a statically typed language, meaning that the type of each variable is known at compile time (importantly, this is also true of variables defined with the keyword `auto`, see later). Due to the fact that the types are known and fixed, the amount of memory needed for each variable is known at compile time too; this specific amount of memory is called the variable's *storage class*. (Storage class applies to **all** user-defined types, too.) This fact gives rise to the *One Definition Rule* (ODR) which states that a variable can be declared or assigned to multiple times, but must be defined **exactly** once. This is the key concept concerning memory usage of variables in C++, so remember the ODR. By default, C++ reserves space for new local (function or sub-scope) variables on the *stack*, which means that two variables of the same name can exist in different scopes (one scope enclosing the other); however, the *address* of each variable is always unique. The other place variables can be stored is on the *heap*, which is often preferable for large objects or arrays. Again these variables always have a unique address, but continue to use memory until it is explicitly deallocated, with the responsibility being the programmer's, not the C++ runtime's.\n", + "\n", + "The shortest possible name or *identifier* for a variable is a single letter, and these are often the name of choice for variables whose purpose is obvious (such as a loop counter); this convention also provides a symmetry with variable names in Mathematics. 
Variable names must start with a lower- or uppercase letter or an underscore, followed by an arbitrary number of lower- or uppercase letters, underscores or decimal digits in any order.\n", + "\n", + "Reserved names that should not be used as identifiers are:\n", + "\n", + "* Any of the C++ keywords (of which there are just under a hundred at the time of writing).\n", + "* Names beginning with an underscore followed by a capital letter (these are reserved for the Standard Library).\n", + "* Names containing adjacent double underscores (reserved for purposes such as name mangling).\n", + "* At **global scope** any name beginning with an underscore.\n", + "\n", + "Use of top-bit-set characters (including UTF-8 sequences) **is** permitted in variable names with more recent compilers, including as the initial character; such sequences are also recognized by the preprocessor.\n", + "\n", + "Unlike some programming languages, C++ does not mandate different uses of capital letters and so on for different types of entity, but your organization may well follow conventions such as constants in upper case, user-defined types in sentence case and member functions in camel case. The rules for identifiers are the same for `class`, `struct`, `enum` and `union` names, function names, namespace names and macro names. 
Different variable naming styles, the use of which may fall under coding standards requirements at your employer, are listed in the following table:\n", + "\n", + "| Naming Style | Example |\n", + "|:----------------:|:---------------:|\n", + "| Lower Case | avariablename |\n", + "| Sentence Case | AVariableName |\n", + "| Upper Case | AVARIABLENAME |\n", + "| Snake Case | a_variable_name |\n", + "| Upper Snake Case | A_VARIABLE_NAME |\n", + "| Camel Case | aVariableName |\n", + "\n", + "New variables are introduced (defined) by providing a type, an identifier and, optionally (but highly recommended), an initial value given after an equals sign (`=`) and/or within a pair of braces `{` and `}` (which can be empty to assign the default value for the type). Use of equals is historical syntax, while use of braces (where the equals sign becomes optional) is called *uniform initialization* and is discussed in more detail later in this Chapter.\n", + "\n", + "Braces are also used with strings passed to `print()` and `println()` indicating a point in the string where a variable's current value should be substituted. The number of brace pairs must equal the number of additional parameters passed to the functions. 
(To output a literal `{` or `}` use one of the escape sequences `{{` or `}}` respectively.)\n", + "\n", + "The following program defines three variables but only assigns to two of them initially, despite the fact that it prints them all out twice:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1d851450", + "metadata": {}, + "outputs": [], + "source": [ + "// 02-assign.cpp : assign to local variables\n", + "\n", + "#include <print>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " int i = 1, j = 2;\n", + " unsigned k;\n", + " println(\"(1) i = {}, j = {}, k = {}\", i, j, k);\n", + " i = j;\n", + " j = 3;\n", + " k = -1;\n", + " println(\"(2) i = {}, j = {}, k = {}\", i, j, k);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "73fca9d5", + "metadata": {}, + "source": [ + "Running this program produced the output:\n", + "\n", + "```\n", + "(1) i = 1, j = 2, k = 16151149\n", + "(2) i = 2, j = 3, k = 4294967295\n", + "```\n", + "\n", + "There are probably no surprises for the values of `i` and `j` as output the first and second time. Note that the statement `i = j` merely assigns the **current** value of `j` to `i` and does not imply that they point to the same object; the values of `i` and `j` can subsequently change **independently**.\n", + "\n", + "The first time `k` is output its value is essentially random, an example of *undefined behavior* (UB); nothing can be guaranteed about its value other than it is within the valid range for the `unsigned` type. 
Assigning a negative number to an `unsigned` type is (perhaps surprisingly) legal C++, and if you are unsure of why the second output of `k` is what it is, you may want to do some research into \"two's-complement\" binary representation of integers (it's actually the number 2<sup>32</sup>-1 represented as a positive integer).\n", + "\n", + "**Experiment**\n", + "\n", + "* Fix this program by giving `k` an initial value. Experiment with positive and negative integers. 
What do you learn about the `unsigned` type?\n", + "\n", + "* Now unfix it by no longer giving `j` an initial value. How is `i` affected at `(2)`?\n", + "\n", + "* Now fix it again by adding the line: `int j{};` between `int i` and `unsigned k`. What else has to be changed? Is this the ODR manifesting itself?\n", + "\n", + "## Casts and uniform initialization\n", + "\n", + "The following program assigns an integer to a variable `a` of type `int`, and a real number to a variable `b` of type `double`. In case you're wondering, the name for the type of `b` comes from *double precision* as defined in the IEEE Standard for Floating-Point Arithmetic (IEEE 754), which defines how an (accurate) approximation of a real number is stored in 64 bits of memory. Single precision `float` uses 32 bits, and extended precision `long double` typically uses up to 96 bits (the storage class may be different from the number of precision bits used). 
The initial values of `a` and `b` are then reassigned to each other, meaning the second output line is different:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1616ff9f", + "metadata": {}, + "outputs": [], + "source": [ + "// 02-swap.cpp : attempt to swap the values of an int and a double\n", + "\n", + "#include <print>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " int a = 1;\n", + " double b = 2.5;\n", + " println(\"(1) a = {}, b = {}\", a, b);\n", + " a = 2.5;\n", + " b = 1;\n", + " println(\"(2) a = {}, b = {}\", a, b);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "92174bfd", + "metadata": {}, + "source": [ + "Running this program produces the output:\n", + "\n", + "```\n", + "(1) a = 1, b = 2.5\n", + "(2) a = 2, b = 1\n", + "```\n", + "\n", + "The variable assignment statement `a = 2.5` is called a *narrowing cast* because of the reduction in precision and likelihood of information being lost. 
In this case the value is **automatically** *truncated* from `2.5` to `2`, as the fractional part cannot be represented in an `int`. Even though the term being assigned is floating-point (actually it's a double-precision literal, see later in this Chapter) the type of `a` **remains** as `int` (and this is why the fractional part is lost). In contrast, the statement `b = 1` is a *widening cast* with the assumption that there is no chance of information being lost; `b` remains of type `double` holding an integer value (which could be represented explicitly as a literal `1.0`). Both of these casts are *implicit casts* because the compiler makes them happen automatically; the instruction to carry out the type casting is implicit. (We could have used a more verbose `static_cast<int>(2.5)` and `static_cast<double>(1)` to make the casts explicit; we'll see this later.)\n", + "\n", + "**Experiment**\n", + "\n", + "* Modify this program so that no narrowing casts occur.\n", + "\n", + "* Now modify this program again to produce only integer outputs.\n", + "\n", + "* Again, modify the original program to use `static_cast`. (Hint: don't worry if you don't fully understand the syntax yet.)\n", + "\n", + "Implicit casts can happen with variable initialization-and-assignment too; however, this is not always the behavior we want. 
To force the compiler to disallow (possibly unintentional) narrowing casts we can use *uniform initialization* which involves enclosing the assigned value in curly braces:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5faa40b6", + "metadata": {}, + "outputs": [], + "source": [ + "// 02-uniform.cpp : avoid compiler error with uniform initialization and explicit narrowing cast\n", + "\n", + "#include <print>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " // int c = { 2.5 }; // Error: this does NOT compile\n", + " int c = { static_cast<int>(2.5) }; // while this does\n", + " double d = { 1 }; // and so does this\n", + " println(\"c = {}, d = {}\", c, d);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "718abfb1", + "metadata": {}, + "source": [ + "It is important not to confuse a single value in curly braces with an initializer list containing one element when reading code like this; in practice here there is no ambiguity because if we had wanted to initialize an array of `int` with a single element list we would have written `int c[] = {2,};` using a trailing comma inside the braces. Interestingly, the equals sign in uniform initialization is in fact **optional**, so we could have written `int c{ static_cast<int>(2.5) }` and `double d{1}`. Uniform initialization appears elsewhere in C++ so it is a good idea to become familiar with the syntax early on, and know the nuances of its behavior compared to using a time-honored C-style equals sign instead. In Modern C++, uniform initialization is probably considered better style, where you have the choice of the two.\n", + "\n", + "**Experiment**\n", + "\n", + "* Change `int c` to `float c` in the above code. Does this fix the problem? Are you surprised by this?\n", + "\n", + "* Fix the code so that it compiles with `float c` (read on if not sure, and think \"float literal\").\n", + "\n", + "* Now change `{1}` to `{1LL}`. Does the code still compile? 
Try to fix this.\n", + "\n", + "## Numeric types and type inference\n", + "\n", + "C++ has quite a lot of built-in types, most of them inherited from the C language; so far we've met `char`, `int`, `unsigned`, `float` and `double`. On most platforms, integer types can be 8, 16, 32 or 64 bits *wide* in signed and unsigned types, while floating point types of 32, 64 and 96 bits wide are usually available. The table below lists many of the built-in types in C++ together with their typical properties (on most modern machines) and *numeric literal* forms:\n", + "\n", + "| Type | Typical Bits | Minimum Value | Maximum Value | Example Literal |\n", + "|:------------------:|:------------:|:--------------------:|:--------------------:|:------------------------------:|\n", + "| signed char | 8 | -128 | 127 | '\\x20' |\n", + "| unsigned char | 8 | 0 | 255 | '\\xa0' |\n", + "| short | 16 | -32768 | 32767 | n/a (as for int) |\n", + "| unsigned short | 16 | 0 | 65535 | n/a (as for unsigned) |\n", + "| int | 32 | -2147483648 | 2147483647 | -1000, 0x7fff |\n", + "| unsigned | 32 | 0 | 4294967295 | 1000U, 0xffffU |\n", + "| long | 64 (32) + | -2147483648 | 2147483647 | 1L, 0x7fffffffL |\n", + "| unsigned long | 64 (32) + | 0 | 4294967295 | 10000000UL, 0xbbbfUL |\n", + "| long long | 64 + | -9223372036854775808 | 9223372036854775807 | -10000LL, 0x80000000000LL |\n", + "| unsigned long long | 64 + | 0 | 18446744073709551615 | 10000ULL, 0x7fffffffffULL |\n", + "| ssize_t | 64 (32) | -9223372036854775808 | 9223372036854775807 | 0Z * |\n", + "| size_t | 64 (32) | 0 | 18446744073709551615 | 0UZ * |\n", + "| float | 32 | 1.17549e-38 | 3.40282e+38 | 0.f, 3.2e-10f |\n", + "| double | 64 | 2.22507e-308 | 1.79769e+308 | 2.3, 1.2345e200 |\n", + "| long double | 128 | 3.3621e-4932 | 1.18973e+4932 | 100000000.5L, 0.0000345L |\n", + "\n", + "* The \"size types\" `std::size_t` (unsigned) and `std::ssize_t` (signed) are from the Standard Library, and so require a header which defines them, such as 
``. (Negative values for `std::ssize_t` are typically used to represent error values.)\n", + "\n", + "+ On 32-bit machines `long`, `unsigned long`, `ssize_t` and `size_t` are usually 32 bits, and are usually 64 bits on 64-bit machines, while `long long` and `unsigned long long` are guaranteed to be (at least) 64 bits on all platforms.\n", + "\n", + "The variable definition `double n{2.3};` should by now appear familiar and correct; it assigns a floating-point number (actually as shown in the table, a numeric literal) to a double precision variable. In other words it's an exact match between the declared type and the literal type. (If it were a narrowing cast, such as `double n{2.3L}` we would expect compilation to fail.)\n", + "\n", + "The `auto` type specifier has a specific meaning in Modern C++: deduce the type of the variable being assigned **to** from the value, variable or expression being assigned **from**. This means, however, that the variable definition must also *always be an assignment* as uninitialized `auto` variables are not allowed. The reason for this is simple: C++ variables must have their type known at compile time, and this is no different for `auto` variables. 
I'll repeat this as it is so important; *C++ is a statically typed language, and every available use of* `auto` *does not change this*.\n", + "\n", + "Some example usage of `auto` is shown here:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a3225d1b", + "metadata": {}, + "outputs": [], + "source": [ + "int i = 1; // both i and 1 are of type int\n", + "auto j = i; // j is also of type int\n", + "auto k{ 1.0 }; // k has type double (using uniform initialization syntax)\n", + "auto q; // Error: will not compile" + ] + }, + { + "cell_type": "markdown", + "id": "aee9a883", + "metadata": {}, + "source": [ + "Programs can be (re-)written without any use of `auto`; however, you will often encounter it in modern code, so you need to be able to recognize and understand its meaning. It is especially useful where the type in question is overly verbose, such as when using types related to generic classes. Notice from the example shown here the use of uniform initialization syntax with `auto`-assignment for the variable `k`; this usage can be expected to become more common.\n", + "\n", + "## Bool and byte\n", + "\n", + "The boolean type can hold one of exactly two values: `true` and `false`; these map directly to `int` values of `1` and `0` respectively. (Note: **not** `-1`.) A variable defined as `bool` can be used to hold the result of a conditional expression (we'll meet conditions in the next Chapter)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ae2ba5e8", + "metadata": {}, + "outputs": [], + "source": [ + "bool success{ true };\n", + "bool are_equal = (a == b);" + ] + }, + { + "cell_type": "markdown", + "id": "1a637170", + "metadata": {}, + "source": [ + "The `byte` type, often referred to as `std::byte` as it is a type made available from within the Standard Library namespace (in order to avoid name clashes with existing code), is designed to replace `unsigned char` where the variable (or array) contains (8-bit) binary data.\n", + "\n", + "This type is actually implemented as an `enum class` (see Chapter 6) and only the bitwise operators are supported, so addition or subtraction of `byte` values is not allowed. A variable of type `byte` can be initialized with any value from `0` to `255` and converted back to an integer value with the function `to_integer()` (functions are covered in Chapter 4)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2e316447", + "metadata": {}, + "outputs": [], + "source": [ + "std::byte b{ 254 };\n", + "auto i = std::to_integer(b); // This is ugly but is shown here for reference\n", + " // You can compare it with static_cast()" + ] + }, + { + "cell_type": "markdown", + "id": "ef649e53", + "metadata": {}, + "source": [ + "## Literal prefixes and suffixes\n", + "\n", + "Digits can be grouped, for example into groups of three for decimal numbers, using apostrophe (`'`) as the delimiter:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f9944300", + "metadata": {}, + "outputs": [], + "source": [ + "auto million = 1'000'000;" + ] + }, + { + "cell_type": "markdown", + "id": "2a4d93cb", + "metadata": {}, + "source": [ + "This syntax is **only** for numeric literals embedded within code, not for numbers read from the keyboard or a file using stream input.\n", + "\n", + "If some of the example literals in the last table look unfamiliar then the following two tables
should help explain. Prefixes can be applied which specify the number base of the literal; `0` and `0b`/`0B` are for **integer** types only:\n", + "\n", + "| Prefix | Base |\n", + "|:------:|:-----------:|\n", + "| 0b, 0B | Binary |\n", + "| 0 | Octal |\n", + "| 0x, 0X | Hexadecimal |\n", + "\n", + "(Note: hexadecimal floating-point literals use `P` or `p` as the radix separator, while decimal floating point literals use an `E` or `e` to separate the exponent from the mantissa.)\n", + "\n", + "Suffixes can apply to either integer or floating point literals (or to both in the case of `L`). Also, `U` and `u` can be combined with `L`, `l`, `LL`, `ll`, `Z` and `z`.\n", + "\n", + "| Suffix | Meaning | Usage |\n", + "|:------:|:----------------------------------------:|:------------------:|\n", + "| f, F | single-precision float | 3.3f, -0.0123F |\n", + "| l, L | extended precision float OR long integer | 100'000l, 3.3L |\n", + "| u, U | unsigned integer | 65536u, -1U |\n", + "| ll, LL | long long integer (64 bits) | 0ll, -1'234'567LL |\n", + "| uz, UZ | unsigned size type (std::size_t) | 0uz, 4'294'967'296UZ |\n", + "| z, Z | signed size type (std::ssize_t) | 0z, -2'147'483'648Z |\n", + "\n", + "Note there is no literal for `short int` and there is unlikely to ever be one, as the `s` suffix is used for seconds when using the `` header (and `string` when used with the `` header). 
Also, the integer literal suffixes don't ever actually need to be used in Modern C++, as source-code literals in all bases are automatically *promoted* (widened) to a type that can hold the value of the literal.\n", + "\n", + "To enable all the literal suffixes in the Standard Library after referencing the necessary header(s), use:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5553ae2c", + "metadata": {}, + "outputs": [], + "source": [ + "using namespace std::literals; // This is also implied by \"using namespace std;\"" + ] + }, + { + "cell_type": "markdown", + "id": "b2334996", + "metadata": {}, + "source": [ + "Note: this is **not** necessary for suffixes of the built-in types, being `F`, `f`, `U`, `u`, `L`, `l`, `LL` and `ll`.\n", + "\n", + "**Experiment**\n", + "\n", + "* Make up some variable assignments from various literals. Use `auto`, and output the variables using `print()`. See if the output is what you expected.\n", + "\n", + "* Now specify the correct built-in type instead of `auto`, such as `long long`. Check the tables above if you're not sure, and use uniform initialization. Try to avoid always using the biggest types `long long` and `long double` regardless of the value or calculation, as this may not be optimal in terms of memory footprint and performance.\n", + "\n", + "## Local and global scopes\n", + "\n", + "Variables defined outside of any function scope are called *global* variables, while those defined within functions (including `main()`) are called *local* variables. Global variables have memory reserved for them and are initialized before `main()` is entered, although the order in which they are initialized is **not** guaranteed across multiple translation units (these being approximately C++ source files, discussed later in this Chapter).
Local variables have space reserved for their contents from the function stack when the function is entered, and are available for use after program flow reaches their definition within the function.\n", + "\n", + "A local variable with the same name as a previously defined global variable temporarily takes precedence over, or *shadows*, the global variable until it goes *out of scope*. Variables defined within a function go out of scope at the end of the function, and the space reserved for them is then released.\n", + "\n", + "It is also possible to nest scopes within functions up to an arbitrary level. The delimiters `{` and `}` are used for this purpose, mirroring their use to introduce a function scope. Code within *sub-scopes* is typically written indented to an extra level. (Sub-scopes which can contain scoped variable definitions are also introduced by a variety of C++ keywords including `if` and `while`.) Variable names which are re-defined within sub-scopes lose visibility at the closing brace and can no longer be referenced (the memory they use may not be released until the function exits, however).\n", + "\n", + "The following program defines and initializes the variable `a` three times. This does not violate the One Definition Rule (ODR) because of one simple fact: *the three variables exist in different scopes*." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "904a714c", + "metadata": {}, + "outputs": [], + "source": [ + "// 02-scopes.cpp : define three variables with the same name in one program\n", + "\n", + "#include \n", + "using namespace std;\n", + "\n", + "auto a{ 1.5f };\n", + "\n", + "int main() {\n", + " println(\"(1) {}\", a);\n", + " auto a{ 2u };\n", + " println(\"(2) {}\", a);\n", + " {\n", + " auto a{ 2.5 };\n", + " println(\"(3) {}\", a);\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "e882d34e", + "metadata": {}, + "source": [ + "Running this program produces the output:\n", + "\n", + "```\n", + "(1) 1.5\n", + "(2) 2\n", + "(3) 2.5\n", + "```\n", + "\n", + "**Experiment**\n", + "\n", + "* Change the assignments to 1, 2, and 3 (using integer literals with `int` instead of `auto`). Does this still satisfy the ODR?\n", + "\n", + "* Add `println(\"(4) {}\", a);` between the two closing curly braces, just before `main()` exits. Is the output what you expected?\n", + "\n", + "* Change the output command `, a)` to `, ::a)` in each of the three places it appears in the program. What appears to happen? (Explanation: the global scope resolution operator `::` selects the global `a` over any other `a` that may be visible.)\n", + "\n", + "## Static and thread-local variables\n", + "\n", + "Any global variables defined in the program are visible throughout the whole of the program, which unfortunately means that name clashes are possible in different and unrelated portions of code. The traditional way of getting round this problem, inherited from C, was to use the `static` keyword. All this does in the context of a global variable definition is make the variable local to the *translation unit*, which is the proper name for each C++ source file together with all the headers it `#include`s (which compiles to a single `.obj` or `.o` object file).
The term *file static* can also be used to describe the visibility of such a variable, referring to the `.cpp` file it is defined in. Thus two `.obj` or `.o` files each with one or more `static` variables of the same name can be linked to form an executable, without generating linker errors.\n", + "\n", + "```\n", + "static int i = 1000; // only visible within this translation unit\n", + "```\n", + "\n", + "The `thread_local` keyword (added in C++11) can optionally be used at global scope and specifies a variable with global visibility which is created (and optionally initialized) when a new thread is launched:\n", + "\n", + "```\n", + "thread_local size_t my_counter{ 0 }; // different variable initialized for each new thread\n", + "```\n", + "\n", + "Further discussion of multi-threaded coding is beyond the scope of this Tutorial, but you should be aware that global `thread_local` variables may add semi-hidden *time-and-space* costs (lower run-time efficiency in CPU time and increased memory usage) to threaded programs because of the extra initialization that has to be performed whenever a new thread is launched.\n", + "\n", + "The keywords `static` and `thread_local` have uses in other contexts too, as we will discover later (`static` local and class variables and `thread_local` variables in functions). A variable can also be both file `static` and `thread_local`.\n", + "\n", + "**Experiment**\n", + "\n", + "* Create two different `.cpp` files with the above definition of `i`, except using different assigned values. Can they be compiled and linked together with a third file containing a `main()` function? Which, if any, value of `i` can `main()` reference?\n", + "\n", + "* Remove the `static` keyword from both files defining global `i`. What error message do you get trying to link all three files together?\n", + "\n", + "* Now add the `static` keyword back to just one of the `.cpp` files defining `i`. 
Does the program now compile, and if so which `i` is output?\n", + "\n", + "## Namespaces\n", + "\n", + "The purpose of namespaces is to solve the problem of global names clashing with each other. (We have already encountered the `std` namespace which contains all of the Standard Library components.) Namespaces can only be introduced at global scope and are delimited with the by now familiar `{` and `}`. Namespaces **can** exist inside other namespaces, with the scope resolution operator `::` also used to separate nested namespace names. Entities (such as variables, functions and classes) defined within namespaces are still globally visible, and can be either made available with `using` statements or directives, or referenced using their *fully qualified names*.\n", + "\n", + "The next program defines two global variables, each in different namespaces, which means they can exist in the same `.cpp` file. Notice that the namespace names have been written in sentence case with the variable names in snake case, both common conventions:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4ec13200", + "metadata": {}, + "outputs": [], + "source": [ + "// 02-height.cpp : define the same variable name in two different namespaces\n", + "\n", + "#include \n", + "using namespace std;\n", + "\n", + "namespace Wonderland {\n", + " auto alice_height_m{ 0.15 };\n", + "}\n", + "\n", + "namespace VictorianEngland {\n", + " auto alice_height_m{ 0.9 };\n", + "}\n", + "\n", + "int main() {\n", + " println(\"Alice\\'s height varies between {}m and {}m\",\n", + " Wonderland::alice_height_m,\n", + " VictorianEngland::alice_height_m);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "a36144d5", + "metadata": {}, + "source": [ + "**Experiment**\n", + "\n", + "* Add the statement `using namespace VictorianEngland;` as the first line of `main()`. Does this change the output in any way?\n", + "\n", + "* Now remove `VictorianEngland::` from the output call. 
Does the output change now? What do you learn about the connection between `using` directives and unqualified names?\n", + "\n", + "Namespaces are *open*, that is elements can be added to a namespace from different parts of a program, even from different `.cpp` files. (This means it is technically possible to add to the `std` namespace, but doing so is strongly discouraged as it can create misleading code that may mysteriously fail to compile on other systems or platforms.)\n", + "\n", + "Namespaces can be nested in two ways: by either using multiple `namespace` keywords, or using the scope resolution operator, as shown in the code fragments below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "06cb4ed8", + "metadata": {}, + "outputs": [], + "source": [ + "namespace Wonderland {\n", + "namespace Animals {\n", + "auto white_rabbit{ 1 };\n", + "}\n", + "}\n", + "\n", + "namespace Wonderland::Animals {\n", + "auto mouse{ 2 };\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "bb6b791c", + "metadata": {}, + "source": [ + "The fully qualified names of both variables defined are very similar, they are: `Wonderland::Animals::white_rabbit` and `Wonderland::Animals::mouse`. Notice that the definitions within the `namespace` keywords have **not** been indented; this is common practice because of the nature of the code (functions and class definitions) that can often appear within namespaces, which reads better unindented.\n", + "\n", + "Another feature of namespaces is the curiously named *unnamed namespace*. The syntax is simple, a `namespace` keyword followed immediately by `{`. The purpose of the unnamed namespace is to replace the use of `static` in definition of global names visible to just the current translation unit. The following code fragment defines and assigns a variable whose fully qualified name in the same translation unit is just `i`, and is not visible in any other." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8d021854", + "metadata": {}, + "outputs": [], + "source": [ + "namespace {\n", + "int i = 3000; // variable i is only visible later within this file\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "f09a34e4", + "metadata": {}, + "source": [ + "**Experiment**\n", + "\n", + "* Create a program that uses the above fragment to output the value of `i` from within `main()`\n", + "\n", + "* Now move `main()` to a separate file. Does the program still compile? What if there were more than one translation unit containing an unnamed namespace?\n", + "\n", + "## Constants and references\n", + "\n", + "Constants are named entities that have only one value during their lifetime, in other words their initial value remains unchanged. (I avoid the use of the word \"variable\" here, or worse still \"`const` variable\", to avoid confusion, but most of the rules of variables apply to constants too.) Constants are useful in many places in Modern C++ programs, and in some places they can be used where variables cannot, such as when specifying array sizes and template parameters. 
Similarly to `auto` variable definitions, constants **must** have their value specified when they are defined.\n", + "\n", + "Constants are defined using the `const` keyword, **either** before or after the mandatory type specifier (or `auto`), as shown in the program below which defines a global constant and a local constant:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b6a1d978", + "metadata": {}, + "outputs": [], + "source": [ + "// 02-constants.cpp : introducing the const keyword\n", + "\n", + "#include \n", + "using namespace std;\n", + "\n", + "const double PI = 3.14159265358979;\n", + "\n", + "int main() {\n", + " auto const APPROX_E = 3;\n", + " println(\"pi is almost exactly {}, while e is approximately {}\",\n", + " PI, APPROX_E);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "dbf5f8f1", + "metadata": {}, + "source": [ + "Notice that the named constants have been specified using upper case, which is a common convention.\n", + "\n", + "**Experiment**\n", + "\n", + "* Try to re-assign to `PI` within `main()`. What error do you get?\n", + "\n", + "* Try to output the result of adding the two constants together. Is this what you would expect for two variables of different types (implied in the case of `APPROX_E`)?\n", + "\n", + "Constants can be assigned to a variable, and created from a variable at the point it is defined. Interestingly this implies that the value of a C++ constant *is not necessarily known* at compile-time; not all constants therefore can be used as array sizes, for example. (If a constant compile-time value is needed for this purpose, your compiler will refuse to compile such code.) Variables of many types can usefully be declared `const` where their value shouldn't be changed, or where changing them would make no sense. 
This stricter use of `const` is known as *const-correctness* and is an additional form of type safety which can often be very useful (of course, most uses of `const` are optional, as in the above program, but its consistent and correct use is strongly encouraged).\n", + "\n", + "References are hugely important to C++ and the necessity of fully understanding them in order to become proficient in the language cannot be overstated. There are two types of references: the style that dates back to the earliest versions of C++, now known as *l-value references*, and those introduced with C++11, known as *r-value references* (the same `&&` syntax used in a type-deducing context gives what are called *forwarding*, or sometimes *universal*, references). Only l-value references are discussed here.\n", + "\n", + "A reference is an *alias* (an alternative name) for another variable **which must already exist**. It is (intentionally) difficult to make a reference outlive the variable it is *bound* to; managing to do so creates a *dangling* reference, and using one is undefined behavior. The primary use of references is to make variables visible from enclosed scopes to outer scopes from which they would not otherwise be accessible, as we shall discover later in the discussion of functions. Changing a reference changes the value of the variable to which it is bound, as shown in the program below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25990617", + "metadata": {}, + "outputs": [], + "source": [ + "// 02-references.cpp : introducing l-value references\n", + "\n", + "#include \n", + "using namespace std;\n", + "\n", + "int alice_age{ 9 };\n", + "\n", + "int main() {\n", + " println(\"Alice\\'s age is {}\", alice_age);\n", + " int& alice_age_ref = alice_age;\n", + " alice_age_ref = 10;\n", + " println(\"Alice\\'s age is now {}\", alice_age);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "d9fd6e62", + "metadata": {}, + "source": [ + "**Experiment**\n", + "\n", + "* Change both instances of `int` to `auto`.
Does the code still compile?\n", + "\n", + "* Make the global `alice_age` constant. Does the code compile now?\n", + "\n", + "* Now make `alice_age_ref` constant instead. Does the code compile?\n", + "\n", + "* Now remove the `&` on the second line of `main()`. Does this allow the code to compile? What is the output from running this modified program?\n", + "\n", + "As shown above, the syntax for creating and initializing a reference is simple, a single ampersand between the type specifier and the variable name. This difference is subtle compared to a conventional definition, so you will need to be on the lookout for it whenever reading code.\n", + "\n", + "The property of \"reference-ness\" and \"const-ness\" is stripped away from variables that are being assigned from. It is possible to initialize a constant from another constant when using `auto`, but this needs to be explicitly specified as a property of the entity being initialized:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "be965fe7", + "metadata": {}, + "outputs": [], + "source": [ + "const auto a{ 10 }; // define a as constant\n", + "auto b = a; // define b as variable copy of a\n", + "const auto c = a; // define c as constant copy of a" + ] + }, + { + "cell_type": "markdown", + "id": "67fc3f83", + "metadata": {}, + "source": [ + "It is also possible to explicitly (re-)specify the reference property on the assignee side, but attempting to change the value of a constant value through a non-`const` reference is not allowed:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cae9d6c6", + "metadata": {}, + "outputs": [], + "source": [ + "const auto d{ 11 }; // define d as constant\n", + "auto e{ 12 }; // define e as variable\n", + "const auto& f{ 12 }; // define f as constant reference (to a literal constant value)\n", + "const auto& g = d; // define g as constant reference to d\n", + "auto& h = e; // define h as reference to e\n", + "const auto& i = e; // 
define i as constant reference to e\n", + "auto& j = d; // define j as reference to d\n", + "auto& k = f; // define k as reference to f" + ] + }, + { + "cell_type": "markdown", + "id": "1735ce02", + "metadata": {}, + "source": [ + "Of the above, only `b`, `e` and `h` are re-assignable.\n", + "\n", + "**Experiment**\n", + "\n", + "* Try the above definitions and assignments within a `main()` function. Can you get them all to output what you expect?\n", + "\n", + "* Try assigning a new value to each of these eleven variables. What error messages do you expect?\n", + "\n", + "## Constexpr variables\n", + "\n", + "Another way of qualifying a definition is with the `constexpr` keyword. This is a stronger form of `const` which explicitly causes evaluation of the initial (and only) value of the variable **at compile-time**. This keyword has uses in *metaprogramming*, which is essentially causing code to be generated and run at compile-time. Note that use of floating-point values **is** permitted; this is not the case in traditional C++ Template Metaprogramming (TMP) which has been around since C++98.\n", + "\n", + "In fact, `constexpr` expressions can be complex with recent compilers, as long as all parts of the expression are themselves `constexpr`. The following program defines two constants, one of which is `constexpr`. Only the `constexpr` entity can be tested against `static_assert()`, which is a boolean truth test checked at compile-time.
Don't worry if the inequality syntax is unfamiliar as this is covered in the next Chapter; the test `PI > 3.141 && PI < 3.143` evaluates the mathematical inequality `3.141 < PI < 3.143` in a way that is valid C++:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "53b6d322", + "metadata": {}, + "outputs": [], + "source": [ + "// 02-constexpr.cpp : introducing the constexpr keyword\n", + "\n", + "#include \n", + "#include \n", + "using namespace std;\n", + "\n", + "// Note: currently, not all compilers mark `acos` as a\n", + "// constexpr function in cmath. The following line might\n", + "// not compile with `clang++` for example.\n", + "constexpr double PI1 = acos(-1.0);\n", + "constexpr double PI2 = 22.0 / 7.0;\n", + "\n", + "static_assert(PI1 > 3.141 && PI1 < 3.143);\n", + "static_assert(PI2 > 3.141 && PI2 < 3.143);\n", + "\n", + "int main() {\n", + " println(\"PI1 = {}\", PI1);\n", + " println(\"PI2 = {}\", PI2);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "1fd7b42f", + "metadata": {}, + "source": [ + "(Hint: this program is the first to require an additional header to ``; you may need to add `-lm` to the compile command under Linux in order to link in the math library containing the `acos()` function.)\n", + "\n", + "**Experiment**\n", + "\n", + "* Try to make the second `static_assert()` fail by using an invalid inequality test.\n", + "\n", + "* Now change the program to check the value of *e* at compile time. (Hint: use the expression `exp(1.0)` to get a good approximation of *e*.)\n", + "\n", + "As can be seen from attempting to compile this program, `static_assert()` is a useful tool to have, and adds no run-time overhead cost. 
The `static_assert()` test can optionally take a second string literal parameter, this being the error message for the compiler to output if the assertion fails.\n", + "\n", + "*All text and program code ©2019-2025 Richard Spencer, all rights reserved.*" + ] + } + ], + "metadata": { + "jupytext": { + "cell_metadata_filter": "-all" + }, + "kernelspec": { + "display_name": "C++ 23", + "language": "c++", + "name": "cpp23" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/jupyter-notebooks/03-conditions-and-operators.ipynb b/jupyter-notebooks/03-conditions-and-operators.ipynb new file mode 100644 index 0000000..5275b5f --- /dev/null +++ b/jupyter-notebooks/03-conditions-and-operators.ipynb @@ -0,0 +1,553 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "396500d0", + "metadata": {}, + "source": [ + "# Conditions and Operators\n", + "\n", + "## Run-time user input\n", + "\n", + "The programs we have seen in the previous two chapters have been a little predictable in how they run, as they have a *linear execution path* through the `main()` function. Such simple programs have very little practical use. More complex programs, which alter their *control flow* based on *user input* fall into two types:\n", + "\n", + "* *Batch programs* take all of their input at the beginning of their execution, usually from any or all of: program parameters, an environment variable(s), or an input file.\n", + "\n", + "* *Interactive programs* enact a dialog with the *user* (the computer operator) while the program is executing. This dialog is often two-way as the user is not necessarily expected to know what input is required without being prompted.\n", + "\n", + "Interactive programs often use either a console or a *GUI* (Graphical User Interface, historically found on desktop computers, but more often found these days on tablets and smartphones). 
Interactive console programs often produce output to the console *interleaved* with user input, while batch programs usually know all of their input at the beginning of their execution and produce all of their output following this with no further user involvement or action. As an example of a modern alternative, a purely voice-activated device (possibly without a screen) has an interface which interestingly has more in common with an interactive console program than with a GUI application.\n", + "\n", + "Previously we have encountered `print()` and `println()` for putting formatted output to the console. Interestingly, there is currently no direct equivalent in Modern C++ for reading input. The `getline()` functions are not covered until Chapter 7 and, as might be guessed from the name, read a textual string which must then be processed further in order to obtain a valid value for numerical (or similar) input. For reasons of simplicity, this Chapter only covers the use of *stream objects* for reading from and writing to the console, and use of these requires the `` header.\n", + "\n", + "As a quick introduction to the stream output object `cout` (an abbreviation of \"Character Output\"), string literals, character literals, numeric (and other) values and variables are \"put to\" the console using (possibly multiple) occurrences of `<<` (the *stream insertion operator*). There is no format string as such; the output is created from the object to the right of each `<<`, in order from left to right.
For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0124b85c", + "metadata": {}, + "outputs": [], + "source": [ + "cout << \"The answer is: \" << 42 << '\\n'; // println(\"The answer is: {}\", 42);" + ] + }, + { + "cell_type": "markdown", + "id": "d6853143", + "metadata": {}, + "source": [ + "As a complement to `cout`, the stream input object `cin` (an abbreviation of \"Character Input\") overloads `>>` (the *stream extraction operator*) to allow variables to be set from user input. When a `cin` input expression is reached, the program waits (indefinitely) for the user to type some input and press Enter. The following program outputs a message inviting the user to enter a number, and then prints this number out again on the console. Before `cin` is used, the variable to be used to accept the input into must have already been defined so that the type of the required input can be deduced. Providing an initial value is preferred (empty braces give it the default value, zero in this case) in case the read by `cin` fails due to either invalid input, such as the user typing letters where digits were required, or end-of-input (Ctrl-D, or Ctrl-Z under Windows):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1cd669f6", + "metadata": {}, + "outputs": [], + "source": [ + "// 03-age1.cpp : get and then display an integer\n", + "\n", + "#include \n", + "using namespace std;\n", + "\n", + "int main() {\n", + " int alice_age{};\n", + " cout << \"Please enter your guess for Alice\\'s age: \";\n", + " cin >> alice_age;\n", + " cout << \"You guessed \" << alice_age << \"!\\n\";\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "7ee5ea65", + "metadata": {}, + "source": [ + "Use of `cin` from the user's perspective has a few quirks. Perhaps usefully, whitespace (any spaces, tabs or preceding new-lines) is ignored, while perhaps not so usefully, non numerical input is (silently) evaluated to the number zero. 
Also, the program makes no checks on the range of the input, so numbers such as `200` and `-50` are accepted without complaint, and printed out. In fact, the variable `alice_age` can be set to any value that can be held by type `int`; however the number must (usually) be entered as a decimal; the prefixes for binary, octal and hexadecimal are by default only interpreted at compile-time for literals within program code, or by conversion functions such as `from_chars()`.\n", + "\n", + "## Conditions and if-else\n", + "\n", + "The keyword `if` is followed by a *conditional expression* in (mandatory) parentheses, which always evaluates to `true` or `false` at run-time (these named Boolean values are implicitly convertible both to and from integer `1` and `0` respectively). (To evaluate conditions at compile-time as well the construct `if constexpr` can be used; this is discussed later in this Chapter.) There are a number of symbols that are combined to represent mathematical conditions of equality, greater than, and so on. Some of these symbols together with their meanings are shown in the table below:\n", + "\n", + "| Symbol | Meaning |\n", + "|:------:|:---------------------:|\n", + "| == | equal * |\n", + "| != | not equal |\n", + "| > | greater than |\n", + "| < | less than |\n", + "| >= | greater than or equal |\n", + "| <= | less than or equal |\n", + "\n", + "* Note: different from the assignment operator, which is single equals `=` (confusing the two is a common mistake for new C++ programmers).\n", + "\n", + "Variables of any built-in type can be directly tested by an `if` expression; non-zero evaluates as `true` while zero evaluates to `false` (this is the case for both integer and floating-point types). 
The following program asks for an integer, and outputs `zero` or `nonzero` depending upon the value entered:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fa7eb55a", + "metadata": {}, + "outputs": [], + "source": [ + "// 03-zerotest1.cpp : test an integer value against zero\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + "    cout << \"Please enter an integer value: \";\n", + "    int n{};\n", + "    cin >> n;\n", + "    cout << \"The value entered was \";\n", + "    if (n) {\n", + "        cout << \"nonzero\\n\";\n", + "    }\n", + "    else {\n", + "        cout << \"zero\\n\";\n", + "    }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "a593d126", + "metadata": {}, + "source": [ + "Notice that the scopes for both the `if` and `else` *clauses* are delimited with `{` and `}`, and that indentation is used for the `cout` operations within them. Notice also that the `if` and the `else` keywords line up vertically; this style is recommended in order to enable in-editor code folding to work, amongst other reasons. In this program the braces for the `if` and `else` clauses are optional because each clause comprises only a single statement; however, using braces even where not strictly needed is again strongly recommended in case extra code needs to be added to the clauses later (and because code folding often only works in editors where an opening brace exists).\n", + "\n", + "Note: Braces for function **definitions**, including `main()`, are always mandatory, even in the case of single-statement or empty functions. Function **declarations**, by contrast, do not have braces; they are analogous to a C++ **statement** ending with a semi-colon.\n", + "\n", + "**Experiment**\n", + "\n", + "* What happens if you press Ctrl-D (Ctrl-Z then Enter under Windows) when prompted? Can you explain why this is?\n", + "\n", + "* Change the program to test for non-zero using the \"not equal\" operator and a `0`. 
Does this work in the same way?\n", + "\n", + "* Change the program again to test for \"equals\" zero (as opposed to \"not equal\"), and change the output statements appropriately so the same logic remains. Is this program better? Consider whether the *happy path* should be satisfied by the first \"if\" clause (as opposed to \"else\").\n", + "\n", + "* Now alter the original program to test a floating-point (`double`) variable as being zero or non-zero. Do you consider use of `0.0` as being better style?\n", + "\n", + "* Delete the braces surrounding the `if` and `else` clauses. Does the code still compile? What happens if you add a second statement line to the `else` clause? Or the `if` clause?\n", + "\n", + "The `if` statement is a binary choice, however some decisions require more than two options. To enable this, `if` statements can be *chained* together. The following program chains a further `if` onto the tail of the first `else` clause. Note that in this special case, using braces for the first `else` clause is **not** recommended as this would indent the code. The combination `else if` (with mandatory space) is unambiguous to readers of your code; a second statement to the first `else` clause (which would necessitate braces) is unlikely to be needed as the (possibly itself chained) `if` which follows counts as a single statement." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b272195f", + "metadata": {}, + "outputs": [], + "source": [ + "// 03-signtest.cpp : test an integer value for zero, positive or negative\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + "    cout << \"Please enter an integer value: \";\n", + "    int n{};\n", + "    cin >> n;\n", + "    cout << \"The value entered was \";\n", + "    if (!n) {\n", + "        cout << \"zero\\n\";\n", + "    }\n", + "    else if (n < 0) {\n", + "        cout << \"negative\\n\";\n", + "    }\n", + "    else {\n", + "        cout << \"positive\\n\";\n", + "    }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "3e346150", + "metadata": {}, + "source": [ + "Notice that the conditional test `if (!n)` **reverses** the logic of the previous program, that is, it tests `n` against zero and then inverts the previous result, producing `true` for zero and `false` for non-zero. We could have used `if (n == 0)` to get the same result, however the idiom of testing `!n` is preferred as it also works with objects such as `std::ofstream`, leading to consistent syntax.\n", + "\n", + "Also, notice that the final `else` clause catches everything that reached that point, without performing a further test. It is up to the programmer to ensure that by the time control flow reaches here all other possibilities have been tested for (failure to do so is a *semantic*, or logic error, which can't usually be caught by the compiler).\n", + "\n", + "**Experiment**\n", + "\n", + "* Modify the above program by removing the `else` clauses and make it instead perform three different `if` tests. Consider why this is usually seen to be poor style.\n", + "\n", + "From Mathematics you will be familiar with inequality conditions such as 0 ≤ *x* < 10 specifying that the variable *x* is between zero (inclusive) and 10 (exclusive). 
It is not possible to write such conditions directly in C++: the expression `0 <= x < 10` compiles, but because comparisons are evaluated pairwise from left to right it compares the Boolean result of `0 <= x` against `10` (and so is always true). However, the intended test can usually be written by combining condition tests with the keywords `and` and `or` (which operate the same way as `&&` and `||`, which are historically called *logical and* and *logical or*). The following is a variation on a previous program which asks the user to guess an age, and says whether or not it is a good guess:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b888511e", + "metadata": {}, + "outputs": [], + "source": [ + "// 03-age2.cpp : get and then test an integer is within range\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + "    int alice_age{};\n", + "    cout << \"Please enter your guess for Alice\\'s age: \";\n", + "    cin >> alice_age;\n", + "    if (6 <= alice_age and alice_age <= 11) {\n", + "        cout << \"A good guess!\\n\";\n", + "    }\n", + "    else {\n", + "        cout << \"Not a good guess.\\n\";\n", + "    }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "0ff1bb34", + "metadata": {}, + "source": [ + "**Experiment**\n", + "\n", + "* Change the above program so that the test logic is inverted, while the output remains the same; in other words `alice_age` falling outside the range 6-11 results in a positive condition test. Hint: you will need to use the `or` keyword and change the order of the output statements.\n", + "\n", + "* Now change `and` to `&&` in the original program. Does it still compile and run? Which style do you prefer?\n", + "\n", + "* Now change `or` to `||` in the modified program. (Known as the *pipe* symbol, this is often Shift+Backslash on the keyboard.) Does it compile and run as expected?\n", + "\n", + "## Conditions and switch-case\n", + "\n", + "When a test for more than one constant integer value is required, a switch-case block can be employed. 
The *switch expression* follows the `switch` keyword, is enclosed in (again mandatory) parentheses, and must evaluate to a built-in integral type (that is, between `char` and `long long` in size, possibly with the `unsigned` qualifier). The possible values to test for are listed in a set of `case` statements that fall within the switch scope, delimited as usual by braces. This example program shows a simple desktop calculator with four arithmetic functions:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aaf6383c", + "metadata": {}, + "outputs": [], + "source": [ + "// 03-calc.cpp : simple calculator with four functions\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + "    int r{}, x{}, y{};\n", + "    char op{};\n", + "    cout << \"Please enter a calculation (number op number, op is one of +-*/):\\n\";\n", + "    cin >> x >> op >> y;\n", + "    switch (op) {\n", + "        case '+':\n", + "            r = x + y;\n", + "            break;\n", + "        case '-':\n", + "            r = x - y;\n", + "            break;\n", + "        case '*':\n", + "            r = x * y;\n", + "            break;\n", + "        case '/':\n", + "            if (y) {\n", + "                r = x / y;\n", + "            }\n", + "            else {\n", + "                cerr << \"Error: divide by zero.\\n\";\n", + "            }\n", + "            break;\n", + "        default:\n", + "            cerr << \"Error: invalid op.\\n\";\n", + "            break;\n", + "    }\n", + "    cout << \"Result: \" << r << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "b969bdca", + "metadata": {}, + "source": [ + "Notice that \"getting\" multiple variables from `cin` allows for the input of three values together, optionally separated by whitespace or newlines. This permissiveness can be useful in some cases but doesn't handle erroneous input very well so is often unsuitable to be used in production code (as error recovery involves clearing the error state, possibly losing input in the process). 
The four `case` statements each check for a valid integer (actually a character literal) stored in `op` and program flow jumps to the one that matches, if any. The `break` statements are necessary and cause control flow to jump to the closing brace of the switch block; if they were not present flow would *fall through* to the next `case` statement, which is rarely desirable. The `default` case statement is optional but usually desirable, and program flow always continues here if none of the `case` statements match; if it is not present the compiler may produce a warning.\n", + "\n", + "Notice also the use of `cerr` to output error messages to the *standard error stream*; by default `cerr` echoes to the terminal (the same as for `cout`) but this output can be redirected at run-time to a text file (or a null device). The `if` test for zero divisor should be familiar syntax by now and prevents a possible floating-point exception. In this case, and in the case of an error caused by an invalid operator, the result variable `r` contains the default value zero.\n", + "\n", + "**Experiment**\n", + "\n", + "* Change the type of the input and result variables to `double` and make sure the program still compiles and runs correctly.\n", + "\n", + "* Add a `case` clause for the exponentiation operator `'^'` which calls the function `pow(x,y)` (C++ has no built-in exponentiation operator, `^` in code actually means bitwise exclusive-or). Hint: you will need `#include <cmath>` and possibly also `-lm` on the link command line.\n", + "\n", + "* Go back to using `int` variables and add the modulo operator `%` to the list of valid operators. You will need to add a suitable `case` clause. Note: this operation gives the remainder from a division, so divide-by-zero needs to be caught here as well.\n", + "\n", + "* Rewrite the case values as plain decimal integers, obtained from a table showing ASCII characters against their numbers. 
Then try using hexadecimal values, and then octal values.\n", + "\n", + "* Rewrite the whole switch-case block as multiple if-else-if... statements. Test all control-flow paths.\n", + "\n", + "The need for `break` statements at the end of each `case` clause has already been mentioned, however occasionally the behavior of program flow falling through to the next case can be useful. More often, multiple `case` matches using the same code is the desired behavior. The following program demonstrates the former of these:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4f7e4c1e", + "metadata": {}, + "outputs": [], + "source": [ + "// 03-fallthrough.cpp : demonstrate case clauses without break\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + "    cout << \"Please enter an integer between zero and three:\\n\";\n", + "    int n{};\n", + "    cin >> n;\n", + "    switch (n) {\n", + "        case 0:\n", + "            cout << \"Number is less than 1\\n\";\n", + "            [[fallthrough]];\n", + "        case 1:\n", + "            cout << \"Number is less than 2\\n\";\n", + "            [[fallthrough]];\n", + "        case 2:\n", + "            cout << \"Number is less than 3\\n\";\n", + "            break;\n", + "        case 3:\n", + "            cout << \"Number is exactly 3\\n\";\n", + "            break;\n", + "        default:\n", + "            cout << \"Number out of range!\\n\";\n", + "            break;\n", + "    }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "7b3cae96", + "metadata": {}, + "source": [ + "Notice that `case 1:` \"falls through\" into `case 2:`, and `case 0:` falls through into both of these. Some compilers will warn where `break` is missing from a `case` clause as it is a common programming mistake; this warning can be suppressed by writing `[[fallthrough]]` (this is a C++ *attribute*) where the compiler is expecting to find `break` (immediately before the next `case`). 
Using this attribute in the way shown here provides clarity to both human reader and compiler; it is not necessary where `case` statements follow on immediately with no code between.\n", + "\n", + "**Experiment**\n", + "\n", + "* See if you can correctly predict the output of this program with user input of `0` through `3`.\n", + "\n", + "* Remove the attributes and try to compile this program. See if your compiler gives a warning; if not, try to enable a warning flag in your compiler options.\n", + "\n", + "## Conditional expressions\n", + "\n", + "The need to choose between two values based on a condition test is so common that C++ has a built-in operator to do exactly that. Consider the following condition test:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c1eccf4c", + "metadata": {}, + "outputs": [], + "source": [ + "if (condition) {\n", + "    value = first;\n", + "}\n", + "else {\n", + "    value = second;\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "04cb53bd", + "metadata": {}, + "source": [ + "The pseudocode shown here is identical in meaning to the following *conditional expression*:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "59403b54", + "metadata": {}, + "outputs": [], + "source": [ + "value = (condition) ? first : second;" + ] + }, + { + "cell_type": "markdown", + "id": "ccfcb4b7", + "metadata": {}, + "source": [ + "The parentheses around the condition in the conditional expression are in fact optional, because the *ternary operator* `?:` has lower precedence than the (in-)equality tests, however they are often included to aid code readability. Using `if` generates code which is in most cases equally efficient but sometimes a conditional expression is preferred style. 
Note that `first` and `second` need to be of the same type, or convertible to the same common type, as the type of the entity assigned to `value` needs to be determined at compile-time.\n", + "\n", + "The program below is identical to `03-zerotest1.cpp` in operation except that it has been written using the ternary operator:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fbc01288", + "metadata": {}, + "outputs": [], + "source": [ + "// 03-zerotest2.cpp : test an integer value against zero and use conditional expression\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + "    cout << \"Please enter an integer value: \";\n", + "    int n{};\n", + "    cin >> n;\n", + "    cout << \"The value entered was \" << ( (n) ? \"nonzero\\n\" : \"zero\\n\" );\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "fb8990db", + "metadata": {}, + "source": [ + "Note: the parentheses around the **whole** conditional expression **are** needed as `<<` has a higher precedence than `?:`.\n", + "\n", + "**Experiment**\n", + "\n", + "* Modify this program to remove the code duplication `\"zero\\n\"`.\n", + "\n", + "* Change the program to use two **nested** conditional expressions and produce the same output as `03-signtest.cpp` (this is quite tricky to get right).\n", + "\n", + "## If and switch initializer expressions\n", + "\n", + "An extension to the `if` and `switch` conditional expression syntax is to precede the conditional expression with an initializer and a semi-colon. In fact this is quite flexible; just about any legal C++ expression can be used. 
A variable defined in such an initializer has the scope of both the `if` **and** `else` clauses (if present) for an `if` statement, and the `switch` body when used with a `switch` statement.\n", + "\n", + "The following example defines a variable within an `if` statement:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9692888a", + "metadata": {}, + "outputs": [], + "source": [ + "// 03-ifinitializer.cpp : use of variable initializer in if statement\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + "    cout << \"Please enter a positive number:\\n\";\n", + "    unsigned n{};\n", + "    cin >> n;\n", + "    cout << \"The least significant digit was \";\n", + "    if (auto m = n % 10; m < 5) {\n", + "        cout << \"less than five (\" << m << \")\\n\";\n", + "    }\n", + "    else {\n", + "        cout << \"five or more (\" << m << \")\\n\";\n", + "    }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "2b1f73de", + "metadata": {}, + "source": [ + "The variable defined in the initializer can optionally be used in the condition test, as shown here.\n", + "\n", + "**Experiment**\n", + "\n", + "* Try to use `n` and `m` after the closing brace of the `else` clause. Which, if either, is possible? What does this tell you about the scope of an initializer-defined variable?\n", + "\n", + "* Rewrite this program to use a `switch` statement instead of `if`. The new program should correctly handle all inputs and produce identical output to the one shown. Hint: you may want to use `case` statements which fall through.\n", + "\n", + "## Constexpr if\n", + "\n", + "The conditional expression following an `if` statement is evaluated at run-time. However, if the values and entities within the conditional expression are `constexpr` (see the previous article) then the test can be evaluated at compile-time by writing `if constexpr`, meaning that control flow at run-time is both known and fixed (and hence optimizations can be made)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d27cbce4", + "metadata": {}, + "outputs": [], + "source": [ + "// 03-ifconstexpr.cpp : demonstrate compile-time if\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + "    constexpr auto int_size = sizeof(int);\n", + "    if constexpr (int_size == 4) {\n", + "        cout << \"32 bit ints\\n\";\n", + "    }\n", + "    else if constexpr (int_size == 8) {\n", + "        cout << \"64 bit ints\\n\";\n", + "    }\n", + "    else {\n", + "        cout << \"Man, you have weird ints!\\n\";\n", + "    }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "66c7af46", + "metadata": {}, + "source": [ + "Testing this program in the online Compiler Explorer results in only one of the three string literals actually being embedded in the assembly language output, therefore proving that it is a compile-time evaluation. The ability to perform an if-test at compile-time, as well as assign from `constexpr`-returning function calls, means that the `constexpr` functionality of C++ is in fact *Turing Complete*. It also allows floating-point numbers and even some user-defined types (with `constexpr` constructors) as well as Standard Library types to be used and evaluated at compile-time.\n", + "\n", + "**Experiment**\n", + "\n", + "* Rewrite the program testing π from the previous Chapter to use `if constexpr` instead of `static_assert()`.\n", + "\n", + "## Operator precedence\n", + "\n", + "C++ has quite a lot of operators, many of which are inherited from C and operate on the built-in types in exactly the same ways. Some are unary (operate on one value) while others are binary (operate on two values). Unary operators exist which are prefix (written before the object) or postfix (written after the object) and two (`++` and `--`) are both. Binary operators are exclusively infix (written between the two objects they operate on). 
There is also exactly one ternary operator (which we have seen in this Chapter, the conditional operator), which operates on three values. Some operators are left-to-right associative and others are right-to-left associative. Note that the comparison operators are left-to-right associative, so `a < b < c` is parsed as `(a < b) < c`, which is not a mathematical range test.\n", + "\n", + "The table below is intended to be a complete list, and as such introduces operators not previously covered; the highest precedence operators are listed first:\n", + "\n", + "| Operator | Associativity | Description | Pattern |\n", + "|----------|---------------|-------------|---------|\n", + "| ::<br>:: | n/a | global scope (unary)<br>namespace/class scope (binary) | ::name<br>namespace_name::entity, class_name::member_name |\n", + "| ()<br>()<br>()<br>{}<br>type()<br>type{}<br>[]<br>.<br>-><br>++<br>\\-\\-<br>typeid<br>any_cast<br>const_cast<br>dynamic_cast<br>reinterpret_cast<br>static_cast | left to right | parentheses<br>function call<br>initialization<br>uniform initialization<br>function-style cast<br>function-style cast<br>array subscript<br>object member access<br>pointer to object member access<br>post-increment<br>post-decrement<br>run-time type information (RTTI)<br>cast back to type from any<br>cast away const<br>run-time hierarchical down-cast<br>cast pointers and integers<br>type-checked cast | (expression)<br>function_name(parameters)<br>type_name(expression)<br>type_name{expression}<br>new_type(expression)<br>new_type{expression}<br>pointer[expression]<br>object.member_name<br>pointer_to_object->member_name<br>lvalue++<br>lvalue\\-\\-<br>typeid(type) or typeid(expression)<br>any_cast&lt;type&gt;(expression)<br>const_cast&lt;type&gt;(expression)<br>dynamic_cast&lt;type&gt;(expression)<br>reinterpret_cast&lt;type&gt;(expression)<br>static_cast&lt;type&gt;(expression) |\n", + "| +<br>-<br>++<br>\\-\\-<br>!, not<br>~, compl<br>(type)<br>sizeof<br>&<br>*<br>new<br>new[]<br>delete<br>delete[] | right to left | unary plus<br>unary minus<br>pre-increment<br>pre-decrement<br>logical not<br>bitwise not<br>C-style cast<br>size in bytes<br>address of<br>dereference<br>dynamic heap memory allocation<br>dynamic array allocation<br>dynamic heap memory release<br>dynamic array release | +expression<br>-expression<br>++lvalue<br>\\-\\-lvalue<br>!expression<br>~expression<br>(new_type)expression<br>sizeof(type) or sizeof(expression)<br>&lvalue<br>*pointer_expression<br>new type<br>new type[expression]<br>delete pointer<br>delete[] pointer |\n", + "| ->*<br>.* | left to right | member pointer selector<br>member object selector | object_pointer->*pointer_to_member<br>object.*pointer_to_member |\n", + "| *<br>/<br>% | left to right | multiplication<br>division<br>modulo (remainder from division) | expression * expression<br>expression / expression<br>expression % expression |\n", + "| +<br>- | left to right | addition<br>subtraction | expression + expression<br>expression - expression |\n", + "| &lt;&lt;<br>&gt;&gt; | left to right | bitwise shift left<br>bitwise shift right | expression &lt;&lt; expression<br>expression &gt;&gt; expression |\n", + "| &lt;<br>&lt;=<br>&gt;<br>&gt;= | left to right | comparison less than<br>comparison less than or equals<br>comparison greater than<br>comparison greater than or equals | expression &lt; expression<br>expression &lt;= expression<br>expression &gt; expression<br>expression &gt;= expression |\n", + "| ==<br>!= | left to right | test equality<br>test inequality | expression == expression<br>expression != expression |\n", + "| &, bitand | left to right | bitwise and | expression & expression |\n", + "| ^, bitxor | left to right | bitwise exclusive-or | expression ^ expression |\n", + "| \\|, bitor | left to right | bitwise or | expression \\| expression |\n", + "| &&, and | left to right | logical and | expression && expression |\n", + "| \\|\\|, or | left to right | logical or | expression \\|\\| expression |\n", + "| ?:<br>=<br>*=<br>/=<br>%=<br>+=<br>-=<br>&lt;&lt;=<br>&gt;&gt;=<br>&=<br>\\|=<br>^= | right to left | conditional ternary operator<br>assignment<br>multiplication assignment<br>division assignment<br>modulo assignment<br>addition assignment<br>subtraction assignment<br>bitwise shift left assignment<br>bitwise shift right assignment<br>bitwise and assignment<br>bitwise or assignment<br>bitwise exclusive-or assignment | expression ? expression : expression<br>lvalue = expression<br>lvalue *= expression<br>lvalue /= expression<br>lvalue %= expression<br>lvalue += expression<br>lvalue -= expression<br>lvalue &lt;&lt;= expression<br>lvalue &gt;&gt;= expression<br>lvalue &= expression<br>lvalue \\|= expression<br>lvalue ^= expression |\n", + "| throw | right to left | exception throw expression | throw expression |\n", + "| , | left to right | comma sequencing operator | expression, expression |\n", + "\n", + "*All text and program code ©2019-2025 Richard Spencer, all rights reserved.*" + ] + } + ], + "metadata": { + "jupytext": { + "cell_metadata_filter": "-all" + }, + "kernelspec": { + "display_name": "C++ 23", + "language": "c++", + "name": "cpp23" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/jupyter-notebooks/04-functions.ipynb b/jupyter-notebooks/04-functions.ipynb new file mode 100644 index 0000000..339fc40 --- /dev/null +++ b/jupyter-notebooks/04-functions.ipynb @@ -0,0 +1,756 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7de40175", + "metadata": {}, + "source": [ + "# Functions\n", + "\n", + "## Scopes\n", + "\n", + "We have become familiar with the `main()` function, which is automatically called (or *entered*) when the program starts. Variables defined within `main()` have been called local variables because they are local to the scope of `main()`. Importantly, they are **not** visible within any functions called by `main()`, even though they retain their state between such calls. The following program defines three variables and also three functions (one of which is `main()`); these three variables have the same name, but different types and values. The values of each of these variables are only accessible within the functions they are defined in, that is: the variables are only *visible* within their own defining function's *scope*."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "912896f1", + "metadata": {}, + "outputs": [], + "source": [ + "// 04-scope.cpp : demonstrate function scope rules\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int alice_height_m;\n", + "\n", + "void victorian_england() {\n", + "    size_t alice_height_m{ 1 };\n", + "    cout << \"In \\\"victorian_england()\\\", alice_height_m is \" << alice_height_m << \".\\n\";\n", + "}\n", + "\n", + "void wonderland() {\n", + "    double alice_height_m{ 0.15 };\n", + "    cout << \"In \\\"wonderland()\\\", alice_height_m is \" << alice_height_m << \".\\n\";\n", + "}\n", + "\n", + "int main() {\n", + "    cout << \"In \\\"main()\\\", alice_height_m is \" << alice_height_m << \".\\n\";\n", + "    victorian_england();\n", + "    wonderland();\n", + "    cout << \"Back in \\\"main()\\\", alice_height_m is still \" << alice_height_m << \".\\n\";\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "eb49b755", + "metadata": {}, + "source": [ + "There are plenty of new things to notice about this program:\n", + "\n", + "* Both of the functions `victorian_england()` and `wonderland()` are defined before (or above) `main()`. This is necessary for the checks the compiler needs to perform when the function call is reached; C++ function call syntax is the name of the function followed by (possibly empty) brackets and a semi-colon.\n", + "\n", + "* Both of these function definitions begin with the keyword `void`; these are known as `void` functions (analogous to *procedures* in other programming languages) because they do not return a value (unlike `main()` which always returns an `int`, more on this later in this Chapter).\n", + "\n", + "* Program flow begins as usual in `main()` which produces output both before and after calling the two (previously defined) functions. 
If `main()` didn't call these two functions, there would be little evidence they even existed in the source code; they probably wouldn't even be *linked* into the executable binary.\n", + "\n", + "* The global variable `alice_height_m` receives a default value of zero because it has *static storage duration* (being a global variable).\n", + "\n", + "The output from running the program shouldn't surprise you, being:\n", + "\n", + "```\n", + "In \"main()\", alice_height_m is 0.\n", + "In \"victorian_england()\", alice_height_m is 1.\n", + "In \"wonderland()\", alice_height_m is 0.15.\n", + "Back in \"main()\", alice_height_m is still 0.\n", + "```\n", + "\n", + "**Experiment**\n", + "\n", + "* Change all of the variables to type `float`. Does this change the output of the program? Is this what you expected?\n", + "\n", + "* Now give the variables different names, and change the lines beginning with `cout` accordingly. Does this make the program code clearer or less so?\n", + "\n", + "* Now try removing variable definitions from each of the functions in the original program, one by one. Does this change the behavior of the remaining variables?\n", + "\n", + "Local variables with the same name (but not necessarily the same type) as one in the *global scope* (or any *enclosing scope*) temporarily *hide* the other variable for the duration of their own lifetime. After this point, the original variable can be referenced again using the same name.\n", + "\n", + "## Return value\n", + "\n", + "Functions are declared or defined with a return type written before the function name: this is either a type known to the compiler, the keyword `auto`, or the keyword `void` if no value is returned. This type can be a user-defined type as we shall discover later, or perhaps more commonly one of the built-in types such as `int`, `double` and so on. The value thus returned is known as the *return value*; its type is the *return type* of the function. 
In case of `auto`, the return type is deduced from the entity (entities) after the `return` statement(s); if there is more than one they must return values of the same type. The `return` keyword is implicit at the end of a `void` function; it can also be explicitly used without a value (for example in an `if` clause) to exit from the function early.\n", + "\n", + "The `main()` function is always defined to return an `int` (it can also be `void` in C but this is not legal C++). Uniquely to `main()`, a `return 0;` statement is implicit at the function's closing brace. This causes a return value of zero (which indicates successful execution) to be returned to the calling environment or process; this value is sometimes called the *return code* of a program. Other values are used to indicate different error conditions encountered; a return code of either zero or non-zero is allowed at any point within `main()`, including at the end.\n", + "\n", + "The `return` keyword is mandatory at the end of any non-`void` function other than `main()`, together with a return value which is convertible to the return type of the function. 
The following program defines a function called `abs_value()` which always returns a non-negative number:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da24a598", + "metadata": {}, + "outputs": [], + "source": [ + "// 04-absolute1.cpp : return the absolute value of a global variable\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int value;\n", + "\n", + "int abs_value() {\n", + "    if (value < 0) {\n", + "        return -value;\n", + "    }\n", + "    return value;\n", + "}\n", + "\n", + "int main() {\n", + "    cout << \"Please enter a positive or negative integer: \";\n", + "    cin >> value;\n", + "    auto a = abs_value();\n", + "    cout << \"The absolute value is: \" << a << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "73c5fbed", + "metadata": {}, + "source": [ + "In fact, the call of `abs_value()` yielding its return value could be used directly in the second `cout` call, which means a named variable `a` is not needed. Using a (temporary) variable to store the return value of a function could be seen as unnecessary if the value is used only once. However, if the return value of a function is needed more than once and is not stored in a variable, the function must be called every time its return value is needed, which could become inefficient.\n", + "\n", + "**Experiment**\n", + "\n", + "* Modify `main()` so that the variable `a` is not needed.\n", + "\n", + "* Modify `abs_value()` so that the keyword `else` is used. Does this make the code any more obvious in intent? Do you get a warning about there being no `return` keyword outside of the `if`-`else` clauses? What happens if you add a third `return` statement just before the function's closing brace?\n", + "\n", + "* Rearrange the order of the variable and/or function definitions (all beginning with `int`). 
What errors do you get?\n", + "\n", + "## Parameters by value\n", + "\n", + "Having the function `abs_value()` refer to a global variable is clumsy and error-prone, does not scale to larger programs, and is very bad C++ style. What is far better is for the function to have its own local state to operate on, while accepting and returning the desired values. The following program shows how this can be achieved:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "17955da3", + "metadata": {}, + "outputs": [], + "source": [ + "// 04-absolute2.cpp : return the absolute value of a local variable\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int abs_value(int v) {\n", + "    if (v < 0) {\n", + "        return -v;\n", + "    }\n", + "    else {\n", + "        return v;\n", + "    }\n", + "}\n", + "\n", + "int main() {\n", + "    int value{};\n", + "    cout << \"Please enter a positive or negative integer: \";\n", + "    cin >> value;\n", + "    auto a = abs_value(value);\n", + "    cout << \"The absolute value is: \" << a << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "c9ed761e", + "metadata": {}, + "source": [ + "The local variable `v` inside `abs_value()` is a **copy** of `main()`'s `value`; the lifetime of this copy is (exactly) the length of the function call to `abs_value()`. Its type and name appear between the parentheses after the function name where the function is defined, thus `v` is defined in the function's *parameter list*. The name of another variable, or possibly a constant value, appears between the parentheses where the function is called. Thus `v` is the *parameter* (or *formal parameter*) variable of function `abs_value()`, and this function is called with *argument* (or *actual parameter*) `value` from `main()`. 
Parameters can also be declared with `auto`, but be aware that the function then becomes a *generic function* (see Chapter 10) and must always be defined in full, not merely declared.\n", + "\n", + "**Experiment**\n", + "\n", + "* Again, modify `main()` so that the variable `a` is not needed.\n", + "\n", + "* Modify `abs_value()` so that the parameter variable is called `value` (instead of `v`), like in `main()`. Does the program still work correctly?\n", + "\n", + "* Modify `abs_value()` to use the conditional operator (`?:`). Can you make this into a one-line function?\n", + "\n", + "* With `abs_value()` as a one-line function, are the braces surrounding the function body still necessary?\n", + "\n", + "The way the variable `value` is passed from `main()` to `abs_value()` is described as *pass by value*. When passed in this way, a copy of the variable is made that can be changed (or *mutated*) by the function accepting the parameter **without** the original value being changed. In this example we have set the return value of the function to the absolute value of the parameter variable, however there is another common way of extracting a modified variable from a function, which is where it is *passed by reference*.\n", + "\n", + "## Parameters by reference\n", + "\n", + "As we have seen, variables which are defined as references are not copies of existing variables, instead they are an alternative name, or *alias*, of a variable **which already exists**. References become particularly useful when defining them in a **different** scope to the variable they reference. As we have seen, a *callee* function cannot access local variables within the *caller* function, instead it can only reference global variables and variables passed as parameters.\n", + "\n", + "Parameter variables can be defined as references by using a single ampersand (`&`) between the type and the variable name in the parameter list. 
This small and subtle change completely changes the semantics of the function. Changes to a **parameter** variable that is *passed by reference* will change the **argument** variable in the calling function, as shown in the following program:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f93f0571", + "metadata": {}, + "outputs": [], + "source": [ + "// 04-absolute3.cpp : modify a parameter to become its absolute value\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "void abs_value(int& v) {\n", + "    if (v < 0) {\n", + "        v = -v;\n", + "    }\n", + "}\n", + "\n", + "int main() {\n", + "    int value{};\n", + "    cout << \"Please enter a positive or negative integer: \";\n", + "    cin >> value;\n", + "    abs_value(value);\n", + "    cout << \"The absolute value is: \" << value << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "353927f3", + "metadata": {}, + "source": [ + "This time, `abs_value()` has been defined as a `void` function, with reference parameter `int& v`. This variable is then reassigned (negated) if it tests as less than zero. Because `v` is an alias of `main()`'s argument variable `value`, any change to `v` is made directly to `value`; nothing needs to be copied back when `abs_value()` returns. This version of the program is the briefest we have seen so far.\n", + "\n", + "**Experiment**\n", + "\n", + "* Remove the `&` from the parameter list of `abs_value()`. Does the program still compile? Does it work as expected with positive and negative numbers as input?\n", + "\n", + "* Can the sequence `<< value <<` be replaced with `<< abs_value(value) <<` in this program? Why do you think this is?\n", + "\n", + "* Modify `abs_value()` so that the last change above works. Can you see a possible problem with this?\n", + "\n", + "## Forward declarations\n", + "\n", + "We have become used to global variables and functions used by `main()` being written above (*defined before*) the `main()` function. 
For our simple programs this requirement hasn't presented a problem, however it doesn't scale well to larger projects.\n", + "\n", + "The rule for declarations is that an entity can be declared multiple times if all of the declarations are **identical**. (This doesn't violate the ODR, which is to do with definitions.) A declaration implies that an entity is available, at global or local scope depending on the scope of the declaration, without saying where it is defined. (This is left to the linker to resolve; unfortunately, linker errors are often less easy to correct than other, compile-time, errors because the compilation stage has been completed and therefore the source-code is unavailable.)\n", + "\n", + "A function prototype (or *forward declaration*) is the minimum syntax that needs to have been \"seen\" before the function can be called. The syntax is simple: the return type, function name and types from the parameter list (the variable names are actually optional, but are often included), each with an optional default value, followed by a semi-colon. This declaration must match *exactly* with the function definition (apart from the presence of default values) for the code to compile and link correctly. The forward declaration of the most recent variant of `abs_value()` is simply:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f2090f62", + "metadata": {}, + "outputs": [], + "source": [ + "void abs_value(int& v); // Function declaration only, not a definition" + ] + }, + { + "cell_type": "markdown", + "id": "494cbd5d", + "metadata": {}, + "source": [ + "**Experiment**\n", + "\n", + "* Rewrite the four programs introduced so far in this chapter with `main()` as the first function defined, providing suitable function prototypes to forward declare the other functions called. This shouldn't take too long as you can use copy and paste. 
Hint: don't forget the semi-colons after the forward function declarations.\n", + "\n", + "* For the first two programs, write the global variable **definition** below `main()`. To enable compilation, it is necessary to provide a global variable **declaration** before the function(s) which use the global variable; this declaration takes a form similar to: `extern int i;`. Write the necessary global declarations with the correct type and variable name near the start of the program.\n", + "\n", + "* Now try making these declarations local to a function. Does the code compile and link? How is it possible to (deliberately) cause a linker error?\n", + "\n", + "* What happens if the wrong type is used as return type or parameter in a function declaration? Or the wrong type for a global variable declaration? Consider why this strict behavior might be useful.\n", + "\n", + "## Default arguments\n", + "\n", + "Providing the wrong number of arguments in a function call always results in a compile-time error. (You may also get errors if the number of parameters in a function definition, or their types, don't match those in a previous function declaration. Unless the number of parameters and their types match **exactly**, they will be assumed to be different functions; the names used are unimportant and can be different, or even omitted altogether, in function declarations.) C++ provides a way for any or all of the parameters in a function call to be optional; if not present in the argument list, they are substituted with default values provided in the function declaration only. 
(Providing them in the function **definition** is not sufficient or even allowed, for technical reasons, unless defined **before** the *call site* with no declaration used.)\n", + "\n", + "The following program uses *head recursion* to print out a number in any base up to 16 (defaulting to base 10):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1194c0bf", + "metadata": {}, + "outputs": [], + "source": [ + "// 04-base-n.cpp : print out a number to given base\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "void print_base_n(unsigned long long num, unsigned base = 10);\n", + "\n", + "int main() {\n", + "    cout << \"Please enter a number (in decimal): \";\n", + "    long long n{};\n", + "    cin >> n;\n", + "    cout << \"Please enter the required base (2-16): \";\n", + "    int b{};\n", + "    cin >> b;\n", + "    if ((b >= 2) and (b <= 16)) {\n", + "        print_base_n(n, b);\n", + "        cout << '\\n';\n", + "    }\n", + "    else {\n", + "        cerr << \"Base not in range.\\n\";\n", + "    }\n", + "}\n", + "\n", + "void print_base_n(unsigned long long num, unsigned base) {\n", + "    if (num >= base) {\n", + "        print_base_n(num / base, base);\n", + "    }\n", + "    cout << \"0123456789abcdef\"[num % base];\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "1cdc0c66", + "metadata": {}, + "source": [ + "This is the most complex program we have seen so far, although it does not contain much that is new.\n", + "\n", + "* The function **declaration** for `print_base_n()` contains `= 10`. This is the *default value* for the second argument, which is substituted at the appropriate point in the argument list, if necessary. For example, a function call `print_base_n(1021)` is replaced by `print_base_n(1021, 10)`; this substitution takes place at compile-time. 
\n", + "\n", + "* The *recursive* function `print_base_n()`, so called because it conditionally calls itself, checks whether or not we are dealing with the **most** significant digit, calling itself **without** the **least** significant digit otherwise (and also with the second parameter it received). In Modern C++, recursive functions can be used without a prototype (declaration) having already been seen.\n", + "\n", + "* The `cout` line outputs a single character which is an index into a string literal of the **least** significant digit (square brackets `[` and `]` are the array index operators, and we are indexing a string literal as if it were an array, which is perfectly legal C++).\n", + "\n", + "If you're struggling to follow the control flow through the recursion then imagine the function call `print_base_n(9)`, and then `print_base_n(89)`, and then `print_base_n(789)`. (Recursion makes use of the fact that each call of the function retains its own private copy of the parameter variables as well as any other local variables.)\n", + "\n", + "**Experiment**\n", + "\n", + "* Remove the variable names `num` and `base` from the declaration of `print_base_n()`. Does the program still compile? What happens if you choose other names instead?\n", + "\n", + "* Make sure the program works correctly by checking with binary, octal and hexadecimal **literals**, and bases 2, 8 and 16 **at run time** respectively. (Use of `static_assert()` is not possible because of the use of side-effect producing `cout`.)\n", + "\n", + "* Modify the program again, so that numbers printed out in up to base 64 are supported.\n", + "\n", + "* Now modify the program so that `num` being **signed** `long long` is supported.\n", + "\n", + "## Implicit narrowing casts\n", + "\n", + "We have seen that variable initialization using uniform initialization syntax disallows implicit narrowing when defining a new variable. 
Calling a function can also imply a narrowing cast, and this **is** allowed, as demonstrated in the following program:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9dcb29e3", + "metadata": {}, + "outputs": [], + "source": [ + "// 04-no-narrow.cpp : calling a function with different types of arguments and parameters\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "void f(int i) {\n", + "    cout << \"f(): received int: \" << i << '\\n';\n", + "}\n", + "\n", + "void g(double d) {\n", + "    cout << \"g(): received double: \" << d << '\\n';\n", + "}\n", + "\n", + "int main() {\n", + "    f(1);\n", + "    g(1);\n", + "    f(2.5);\n", + "    g(2.5);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "910f77b5", + "metadata": {}, + "source": [ + "Running this program produces the output:\n", + "\n", + "```\n", + "f(): received int: 1\n", + "g(): received double: 1\n", + "f(): received int: 2\n", + "g(): received double: 2.5\n", + "```\n", + "\n", + "Notice that the call `g(1)` converts the `int` argument to `double` silently, although this is not apparent when printing the number (it doesn't print as `1.0`, but could be made to with stream formatting manipulators, see Chapter 8). Also, notice that the call `f(2.5)` silently narrows the `double` argument to `int`, so the fractional part is lost.\n", + "\n", + "It is possible to write code that disallows narrowing casts by using universal references and perfect forwarding but demonstrating this is beyond the scope of this Tutorial. You should be aware that in general function calls may silently produce narrowing effects, however some implicit conversions (such as pointer to integer or floating-point number) are not allowed.\n", + "\n", + "**Experiment**\n", + "\n", + "* Add a third function `h()` which takes parameter `unsigned u`. What happens when you call it with a negative integer or floating-point value? 
Does this surprise you?\n", + "\n", + "## Function overloading\n", + "\n", + "C++ allows multiple definitions of functions with the **same** name if the parameter(s) is/are of **different** types. (This works at the level of the linker by use of name mangling, whereby the name of the function is augmented by its parameter list. It is possible to disable function name mangling by declaring *C linkage*; such functions are declared with `extern \"C\"` and can also be called from C code.) The following program declares two functions again, this time both called `f()`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "23870ae5", + "metadata": {}, + "outputs": [], + "source": [ + "// 04-overload.cpp : calling a function with different types of arguments and parameters\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "void f(int i) {\n", + "    cout << \"f(): int: \" << i << '\\n';\n", + "}\n", + "\n", + "void f(double d) {\n", + "    cout << \"f(): double: \" << d << '\\n';\n", + "}\n", + "\n", + "int main() {\n", + "    f(1);\n", + "    f(2.5);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "2a3e21aa", + "metadata": {}, + "source": [ + "Running this program produces the output:\n", + "\n", + "```\n", + "f(): int: 1\n", + "f(): double: 2.5\n", + "```\n", + "\n", + "The function to be used is determined at compile-time from the usage at the call site, as the types of the arguments are always known. A \"best-match\" is performed in the case of no exact match, so for example `f('a')` would call `f(int)` while `f(0.5f)` would call `f(double)`.\n", + "\n", + "**Experiment**\n", + "\n", + "* Add a third overload `f(unsigned u)`. How can you cause this function to be called?\n", + "\n", + "## Static and thread-local\n", + "\n", + "Variables declared `static` inside a function body are in fact global variables with visibility limited to function scope. 
They are initialized when the program starts, although conceptually they are given an initial value when the function is first called, which is then preserved between function calls. The following program demonstrates this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f092d22f", + "metadata": {}, + "outputs": [], + "source": [ + "// 04-static-var.cpp : preserving function state in a static variable\n", + "\n", + "#include <print>\n", + "using namespace std;\n", + "\n", + "void f() {\n", + "    static int s{1};\n", + "    println(\"{}\", s);\n", + "    ++s;\n", + "}\n", + "\n", + "int main() {\n", + "    f();\n", + "    f();\n", + "    f();\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "73114920", + "metadata": {}, + "source": [ + "The output from running this program is:\n", + "\n", + "```\n", + "1\n", + "2\n", + "3\n", + "```\n", + "\n", + "Static local variables are discouraged in Modern C++ because modifying them is not *thread-safe*; different threads calling the same function that modifies a `static` variable can lead to unpredictable results. In real code there will almost always be a better way of doing things than using a static variable.\n", + "\n", + "**Experiment**\n", + "\n", + "* Modify this program so that it counts from `10` down to `0` and then outputs `Blastoff!`. (Don't use a loop, even if you're tempted to. Loops are covered in the next Chapter.)\n", + "\n", + "* Modify the same program to use a file-`static` variable instead of a function-`static` one (this is a small change to the code, but you should try to understand the difference).\n", + "\n", + "Variables declared `thread_local` within a function have a new copy of the variable created upon launching a new thread, which is independent from others within the calling thread or any other thread. Since the way in C++ to launch a new thread is to specify a function to be called, this behavior is useful in multi-threaded programs. 
Further discussion of *parallelism* is beyond the scope of this Tutorial. (Variables can also be declared both `static` and `thread_local`.)\n", + "\n", + "Functions can be declared `static` by prefixing the return type in the function declaration and definition with the keyword `static`. As with global variables, this reduces the visibility of the function to the translation unit it is defined within. More useful in most cases are `inline` functions, described later in this Chapter.\n", + "\n", + "## Structured bindings\n", + "\n", + "You may be interested to learn that the return type of any function other than `main()` can be declared and defined with `auto` (this includes implicitly `void` functions). As mentioned above, functions that are defined with `auto` as the return type must use the same type for all of their `return` statements for the return type to be correctly deduced.\n", + "\n", + "Use of `auto` return type becomes especially useful when returning two or more values from a function. Such a return type is called an *aggregate*, which is unpacked into single variables using a *structured binding*. 
The following program returns a `double` and an `int` from a function `get_numbers()`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "40e43fd6", + "metadata": {}, + "outputs": [], + "source": [ + "// 04-aggregate.cpp : returning two values from a function and unpacking them with a structured binding\n", + "\n", + "#include <iostream>\n", + "#include <utility>\n", + "using namespace std;\n", + "\n", + "auto get_numbers() {\n", + "    cout << \"Please enter a float and an integer: \";\n", + "    double d{};\n", + "    int i{};\n", + "    cin >> d >> i;\n", + "    return pair{ d, i };\n", + "}\n", + "\n", + "int main() {\n", + "    auto [ a, b ] = get_numbers();\n", + "    cout << \"You entered \" << a << \" and \" << b << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "8ae3623b", + "metadata": {}, + "source": [ + "There are three main new things to notice about this program.\n", + "\n", + "* The function `get_numbers()` is declared with `auto` return type.\n", + "\n", + "* The last line of this function returns a Standard Library `pair`, which is initialized with uniform initialization syntax.\n", + "\n", + "* This `pair` is *unpacked* into the variables `a` and `b` in `main()` using structured binding syntax, which again uses `auto`. The types of `a` and `b` are determined (at compile-time) from the aggregate return type of `get_numbers()`.\n", + "\n", + "**Experiment**\n", + "\n", + "* Make `get_numbers()` return three variables, the third being `unsigned`. Hint: you will need to use `return tuple{ d, i, u };` or similar. Hint: use `#include <tuple>`.\n", + "\n", + "* Rewrite `get_numbers()` to accept and modify two reference parameters, and return results to `main()` in this way.\n", + "\n", + "## Inline functions\n", + "\n", + "Functions can be declared as inline functions by using the keyword `inline` before the return type in the function definition. 
The main aim of declaring a function `inline` is to remove the time overhead of a function call; the function body's code is allowed to be replicated for each function call *in place* at the call site(s). Functions declared with `inline` must be present (and identical) in each translation unit that uses them, hence they often appear in header files; this is a special relaxation of the ODR. Overuse of inline functions can lead to *code-bloat*, so they are best reserved for very short functions. The following program demonstrates use of the `inline` keyword:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9942db11", + "metadata": {}, + "outputs": [], + "source": [ + "// 04-inline.cpp : use of an inline function\n", + "\n", + "#include <print>\n", + "using namespace std;\n", + "\n", + "inline void swap(int& x, int& y) {\n", + "    auto z = x;\n", + "    x = y;\n", + "    y = z;\n", + "}\n", + "\n", + "int main() {\n", + "    int a = 1, b = 2;\n", + "    println(\"(1) a = {}, b = {}\", a, b);\n", + "    swap(a, b);\n", + "    println(\"(2) a = {}, b = {}\", a, b);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "189204ca", + "metadata": {}, + "source": [ + "Running the above code produces:\n", + "\n", + "```\n", + "(1) a = 1, b = 2\n", + "(2) a = 2, b = 1\n", + "```\n", + "\n", + "The `swap()` function swaps over two `int`s *in-place* by using reference parameters and a local variable. (In real code you would want to use the Standard Library's `std::swap()` template, rather than writing your own version.)\n", + "\n", + "**Experiment**\n", + "\n", + "* Remove the `inline` keyword from the above program. Does it still compile? Experiment with the online Compiler Explorer to see if it produces more efficient code when present.\n", + "\n", + "* Now try moving the `swap()` function to below `main()`, adding a function declaration before `main()`. 
Can the function be made `inline` again?\n", + "\n", + "* Modify the program `04-absolute2.cpp` so that `abs_value()` is an `inline` function. (This change is trivial to make.) Does it compile as expected? Does it still run correctly?\n", + "\n", + "## Constexpr functions\n", + "\n", + "Functions can be defined with the `constexpr` keyword before the return type in the function definition. Like `constexpr` variables and `if constexpr`, this allows the compiler to generate and run code at compile-time. The following program shows how compile-time `static_assert()` can be used with the return value of a `constexpr` function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8d72cbd2", + "metadata": {}, + "outputs": [], + "source": [ + "// 04-constexpr.cpp : use of a constexpr function with static_assert\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "constexpr int factorial(int n) {\n", + "    if (n < 2) {\n", + "        return 1;\n", + "    }\n", + "    else {\n", + "        return n * factorial(n - 1);\n", + "    }\n", + "}\n", + "\n", + "static_assert(factorial(0) == 1);\n", + "static_assert(factorial(5) == 120);\n", + "\n", + "int main() {\n", + "    cout << \"Please enter a number: \";\n", + "    int n{};\n", + "    cin >> n;\n", + "    cout << n << \"! = \" << factorial(n) << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "7f2bdc8a", + "metadata": {}, + "source": [ + "Note that it is **not** necessary (or even possible) to use `if constexpr` for the condition test within the function; the `constexpr` function is nevertheless able to be evaluated at compile-time as well as run-time. A constexpr function is **not** allowed to modify global state (such as `cout`), amongst other restrictions.\n", + "\n", + "**Experiment**\n", + "\n", + "* Experiment with invalid input (i.e. negative numbers, overly large numbers or Ctrl-D/Ctrl-Z). 
Consider how you could modify the program to deal with this.\n", + "\n", + "* Write a program to calculate the N-th Fibonacci number, where *fib(0) = 0*, *fib(1) = 1* and *fib(n) = fib(n-1) + fib(n-2)* for *n >= 2*. Hint: utilize recursion again.\n", + "\n", + "## Non-returning and noexcept functions\n", + "\n", + "It is possible to write a function which never returns, for example using an infinite loop. Another example might be a function that causes an abnormal early exit from the running program; the Modern C++ way of doing this is to throw an exception, or even to call `std::terminate()` directly (the C Standard Library also provides `abort()`, `exit()` and `quick_exit()` but these do not deallocate all global objects correctly). The way to indicate this property to the compiler is to use the `[[noreturn]]` attribute when declaring the function, as shown in this example program:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9c74d772", + "metadata": {}, + "outputs": [], + "source": [ + "// 04-noreturn.cpp : program which does not return from main()\n", + "\n", + "#include <exception>\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "[[noreturn]] void report_fatal_error(int e) {\n", + "    cerr << \"Fatal error code: \" << e << '\\n';\n", + "    terminate();\n", + "}\n", + "\n", + "int main() {\n", + "    cout << \"Entering main()\\n\";\n", + "    cout << \"Calling report_fatal_error()\\n\";\n", + "    report_fatal_error(-1);\n", + "    cout << \"Leaving main()\\n\";\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "fea4e5f8", + "metadata": {}, + "source": [ + "A function declared with `[[noreturn]]` should be a `void` function (as having a return type is meaningless if the function never returns). The compiler should warn if any code path can achieve a natural return from such a function.\n", + "\n", + "The keyword `noexcept` is used to declare that a function is guaranteed to not throw an exception. 
This guarantee is preserved over function calls, thus a non-`noexcept` function called by a `noexcept` function is effectively `noexcept` too, because any exception it throws cannot propagate out of its `noexcept` caller. The motivation behind this keyword is that the compiler and run-time do not have to support stack unwinding used by the keyword `throw`, which can give a significant time and space advantage to your code.\n", + "\n", + "**Any** exception thrown by an explicitly or implicitly `noexcept` function, or any library routine it may call, causes a call to `std::terminate()` as above. The following program demonstrates this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f1d28aef", + "metadata": {}, + "outputs": [], + "source": [ + "// 04-noexcept.cpp : a noexcept function throwing an exception\n", + "\n", + "#include <print>\n", + "#include <stdexcept>\n", + "using namespace std;\n", + "\n", + "void throw_if_zero(int i) noexcept {\n", + "    if (!i) {\n", + "        throw runtime_error(\"found a zero\");\n", + "    }\n", + "    println(\"throw_if_zero(): {}\", i);\n", + "}\n", + "\n", + "int main() {\n", + "    println(\"Entering main()\");\n", + "    try {\n", + "        throw_if_zero(1);\n", + "        throw_if_zero(0);\n", + "    }\n", + "    catch(exception& e) {\n", + "        println(\"Caught an exception: {}\", e.what());\n", + "    }\n", + "    println(\"Leaving main()\");\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "8bb77a96", + "metadata": {}, + "source": [ + "**Experiment**\n", + "\n", + "* Remove the `noexcept` keyword. Does the program compile? 
What is the output when run?\n", + "\n", + "*All text and program code ©2019-2025 Richard Spencer, all rights reserved.*" + ] + } + ], + "metadata": { + "jupytext": { + "cell_metadata_filter": "-all" + }, + "kernelspec": { + "display_name": "C++ 23", + "language": "c++", + "name": "cpp23" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/jupyter-notebooks/05-arrays-pointers-and-loops.ipynb b/jupyter-notebooks/05-arrays-pointers-and-loops.ipynb new file mode 100644 index 0000000..601a438 --- /dev/null +++ b/jupyter-notebooks/05-arrays-pointers-and-loops.ipynb @@ -0,0 +1,781 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "977041d2", + "metadata": {}, + "source": [ + "# Arrays, Pointers and Loops\n", + "\n", + "## Number and character arrays\n", + "\n", + "A C++ array can be described as a collection of entities *of the same type* arranged *contiguously* in memory. C++ inherits its *built-in array* syntax from C, and sometimes these are referred to as *C-style* arrays. Uniform initialization syntax can be used to assign the contents of an array at the point it is defined (and **only** at this point). This is called *aggregate initialization* using a *braced initializer* (the equals sign shown below is in fact optional, as for uniform initialization in general):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f070355e", + "metadata": {}, + "outputs": [], + "source": [ + "int numbers[] = { 1, 2, 3, 4, 5 };" + ] + }, + { + "cell_type": "markdown", + "id": "d2549599", + "metadata": {}, + "source": [ + "Notice that the type is `int[]` (\"array of `int`\"), however the square brackets *bind* to the variable name, in this case `numbers`, **not** to the type specifier, in this case `int`. The optional number between the square brackets (which must be a **constant** known at compile-time, if present) is the length of the array; this is fixed at compile-time and cannot be changed at run-time. 
If no value is provided here then it is calculated from the number of *elements* which make up the initializer (in this case the value is 5). If provided, the array size must be **at least** as large as the initializer being assigned from, otherwise a compile-time error is produced. If the size of the array is given as greater than the number of elements in the initializer, the remaining elements are default-constructed (zeroized for the built-in types).\n", + "\n", + "The array variable `numbers` is writable through subscripting syntax using square brackets `[` and `]`. The array index starts at **zero** (`[0]`) for the first element. Attempting to read or write beyond the last element is undefined behavior, as is use of negative indices." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "02da11d3", + "metadata": {}, + "outputs": [], + "source": [ + "numbers[4] = 6; // ok, numbers[] is { 1, 2, 3, 4, 6 }\n", + "numbers[5] = 99; // not ok, compiles but yields undefined behavior\n", + "auto i = numbers[0]; // ok, i is 1\n", + "auto j = numbers[-1]; // not ok, compiles but yields undefined behavior" + ] + }, + { + "cell_type": "markdown", + "id": "9285687c", + "metadata": {}, + "source": [ + "A string literal can be thought of as simply an array of characters, thus a string literal can be used to initialize an array of `char`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "825c544f", + "metadata": {}, + "outputs": [], + "source": [ + "char name[] = \"Dinah\";" + ] + }, + { + "cell_type": "markdown", + "id": "46575e3f", + "metadata": {}, + "source": [ + "This type of array is modifiable, so individual letters can be changed using array indexing syntax. (Actually, the fact that the variable contents are writable is not without overhead; the string literal used to initialize the array is stored in a read-only part of the executable binary and is copied into the newly-allocated array at run-time.) 
A terminating zero-byte is also added to the array, so the array length implicit inside the square brackets is 6, not 5.\n", + "\n", + "A braced initializer can also be used with character literals as elements, so the same result could be achieved by using:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "879b6340", + "metadata": {}, + "outputs": [], + "source": [ + "char name2[] = { 'D', 'i', 'n', 'a', 'h', '\\0' };" + ] + }, + { + "cell_type": "markdown", + "id": "1cddef76", + "metadata": {}, + "source": [ + "This time the terminating zero-byte has to be explicitly specified, if it is desired; both `name` and `name2` are safe to be put to streams such as `cout` as they each have this terminating zero-byte. A single element of each of these variables could also be output, and would produce the same output as a character literal:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "58d79e3b", + "metadata": {}, + "outputs": [], + "source": [ + "cout << name << '\\n'; // outputs \"Dinah\" followed by new-line\n", + "cout << name[0];      // outputs \"D\"" + ] + }, + { + "cell_type": "markdown", + "id": "a8cc92ac", + "metadata": {}, + "source": [ + "Because the size of an array is known to the compiler, this size can be used in code. The Standard Library *function template* `std::size()` and the built-in `sizeof` operator can be used to provide the number of array elements and the amount of memory used (in bytes), respectively. The value produced by either of these can be used in expressions declared with `constexpr`.\n", + "\n", + "Here, `size(name)` would return 6, while `sizeof(numbers)` would yield 20, assuming 32-bit `int`s. Older variants of C++ only provided the built-in `sizeof` operator, so the number of elements had to be calculated as `sizeof(numbers) / sizeof(numbers[0])`.\n", + "\n", + "## Range-for\n", + "\n", + "We've previously seen that string literals can be output using `cout` without concerning ourselves with the details. 
The built-in `for` statement can be used over a *range of values*, applying the same operation(s) to each one in turn. This type of `for` statement is known as a *range-based for loop*, or range-for for short.\n", + "\n", + "A range-for statement can have either two or three parts enclosed by parentheses. The initializer statement (the same as for `if` and `switch` statements) is the optional first part, and is followed, if present, by a **semi-colon**. Then follows the *for-loop variable* definition, which can be declared with either `auto` or with an explicit type, and with optional `const` (constant) and `&` (reference) or `&&` (universal reference) qualifiers. Then a **colon** separates this from the expression to be *iterated* over, known as the *range expression*. This program demonstrates simple use of a two-part range-for (without an initializer):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5c192696", + "metadata": {}, + "outputs": [], + "source": [ + "// 05-range-for.cpp : print a string literal vertically\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + "    for (auto c : \"Dinah\") {\n", + "        cout << \"- \" << c << '\\n';\n", + "    }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "d5161b67", + "metadata": {}, + "source": [ + "Here the for-loop variable `c` is deduced (due to the use of `auto`) to be of type `char`, the type of a single element of the range expression `\"Dinah\"`. The content of the variable is actually a *copy* of a single element in the range expression; if `auto&` were used instead it would be a reference to a single element within the range expression, and assignment **to** it would mutate the range expression itself (provided its elements are not `const`). The for-loop variable is then sent to `cout` as a single character.\n", + "\n", + "**Experiment**\n", + "\n", + "* Does this program produce any unwanted output? 
Find two different ways of fixing this.\n", + "\n", + "* Change `auto` to `char` and try to note any changes. Change this again to `int`. What is the output now?\n", + "\n", + "* Now change back to use of `auto` and try using the other types of string literal (recall the prefixes `u8`, `u` and `U`). Does the program still compile and produce the correct output? What about when non-UTF7 characters are used in the string literal?\n", + "\n", + "* Add an initializer statement of type `bool` and make the program output `D,i,n,a,h` using the same range expression. Hint: you might need to use `if` statements.\n", + "\n", + "* Now declare a separate variable as an array of `char` and use this named variable as the range expression.\n", + "\n", + "* Declare `c` as a reference variable. Does the program still compile? What could be the use of this when using a non-`const` range expression?\n", + "\n", + "In fact, range-for loops can be used with any type of array or container which supports `std::begin()` and `std::end()` (or has member functions with these names), not just built-in types. However, further discussion of creating your own types which can be iterated over in this way is beyond the scope of this Tutorial.\n", + "\n", + "## Pointers\n", + "\n", + "We have learned that subscripting syntax can be used with string literals and built-in arrays. You may be surprised to learn that subscripting also works with pointers. So what exactly is a pointer in C++?\n", + "\n", + "A pointer is a variable that holds a machine address, and is therefore on most modern machines a 64-bit value. Pointers can be `const` or point to `const` data, or both; they can also be typed or untyped (subscripting only works on typed pointers). 
In addition they can hold the value `nullptr`, which safely indicates that the pointer does not refer to a valid memory address.\n", + "\n", + "Assigning a string literal directly to a variable declared with `auto` actually assigns a pointer to the first character of the (read-only) string literal. Thus the following two definitions are equivalent:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a46b7470", + "metadata": {}, + "outputs": [], + "source": [ + "auto s1 = \"Dinah\";\n", + "const char *s2 = \"Dinah\";" + ] + }, + { + "cell_type": "markdown", + "id": "44c84165", + "metadata": {}, + "source": [ + "In each case subscripting syntax can be used, with indices from zero up to five, and individual elements can be compared or output. Directly comparing the values of the two pointers compares the memory addresses, not the value(s) they point to, as shown here:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a3d7ae94", + "metadata": {}, + "outputs": [], + "source": [ + "if (s1[0] == s2[0]) { /*...*/ } // condition test evaluates to true\n", + "if (s1 == s2) { /*...*/ }      // condition test (probably) evaluates to false\n", + "                               // (your compiler may optimize the two data entities into one)" + ] + }, + { + "cell_type": "markdown", + "id": "2129cb7b", + "metadata": {}, + "source": [ + "Pointer variables are defined **using an asterisk in all cases** (except that it is optional when using `auto`). An asterisk is also used to *dereference* a pointer, that is, to access the value it \"points to\". The following program defines a variable `i` and a pointer `p` that points to it (that is, `p` holds `i`'s machine address). The type of `i` is `int` while the type of `p` is `int*`. 
A variable `j` is used to hold user input:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5af214cb", + "metadata": {}, + "outputs": [], + "source": [ + "// 05-pointer.cpp : write a variable's value through a pointer\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + "    int i{ -1 }, j{};\n", + "    int *p; // define p as an int*\n", + "    p = &i; // set p to address of i\n", + "    cout << \"(1) p = \" << p << \", *p = \" << *p << \", i = \" << i << '\\n';\n", + "    cout << \"Please enter an integer: \";\n", + "    cin >> j;\n", + "    *p = j; // assign the value of j to the variable p points to\n", + "    cout << \"(2) p = \" << p << \", *p = \" << *p << \", i = \" << i << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "f9fd98ec", + "metadata": {}, + "source": [ + "Running this program produced the following output, with the input value of `j` being `10`:\n", + "\n", + "```\n", + "(1) p = 0x7ffd3082cf04, *p = -1, i = -1\n", + "Please enter an integer: 10\n", + "(2) p = 0x7ffd3082cf04, *p = 10, i = 10\n", + "```\n", + "\n", + "In this program the definition `int *p;` makes `p` a pointer to an `int`. At this point in the program it has not been assigned to, and is therefore an *uninitialized pointer*. (We could have initialized it with `nullptr`, either explicitly with `int *p{ nullptr };` or implicitly with `int *p{};`, if desired.) The syntax `&i` means address-of `i` (the memory address of **any** variable can be obtained by preceding it with an ampersand in this way) and this value is assigned to `p`. Be careful not to confuse this with the definition of a reference, where the ampersand is **on the opposite side** of the equals sign.\n", + "\n", + "The value of the entity `p` points to can be output by sending `*p` to `cout`. Changing `*p` also changes `i`. This behavior might surprise you: it's almost as if `i` has changed without permission. 
As can be seen from the output of this program, `p` has the same value throughout, while `*p` and `i` change together.\n", + "\n", + "**Experiment**\n", + "\n", + "* Modify this program so that `p` is defined on the same line as `i` and `j`.\n", + "\n", + "* Now modify the program so the variable `j` is not needed. Is the output the same?\n", + "\n", + "* Now modify the program again so that `p` is initialized on the same line as `i`. The `main()` function should now comprise just 5 lines.\n", + "\n", + "## While loops\n", + "\n", + "We have tested for a condition being true or false at a single point during the program, using the `if` statement. Often, an operation needs to continue for as long as a certain condition is met. The `while` statement begins a loop with a *pre-condition test*, where the body is only ever executed if the condition evaluates to `true`. The condition is written within parentheses and the test is done **before** the loop is entered, thus it is perfectly legal and possible for a `while` loop to execute zero times.\n", + "\n", + "The following program shows a `while` loop being used to iterate over the contents of an array of `char`, which has been entered by the user, as long as the element pointed to by `p` is non-zero:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "031ca83a", + "metadata": {}, + "outputs": [], + "source": [ + "// 05-while.cpp : print a user-entered string vertically\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + "    char str[20];\n", + "    cout << \"Please enter a string (up to \"\n", + "        << size(str) - 1 << \" characters):\\n\";\n", + "    cin.getline(str, size(str));\n", + "    const char *p = str;\n", + "    while (*p) {\n", + "        cout << \"- \" << *p << '\\n';\n", + "        ++p;\n", + "    }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "494b0347", + "metadata": {}, + "source": [ + "A few new things to notice about this program:\n", + "\n", + "* The 
size of the array called `str[]` is set by an integer constant, and this value is needed twice more in the program where it is accessible as `size(str)`, a compile-time value. An alternative way would be to use a constant or macro at each point the value is needed, and this was the only way to avoid repeated *magic constants* in older versions of C++ which did not provide `std::size()`.\n", + "\n", + "* The function `cin.getline()` (actually the dot `.` indicates that `getline()` is a *member function* of `cin`, more about these in Chapter 6) is called to read keyboard input into `str[]`. This reads input directly into the memory location provided by the first argument up to a maximum number of characters (including the zero terminator) as set by the second argument. A new-line character `'\\n'` is **never** stored and any extra input which doesn't fit into `str[]` is left unread in the input stream. (In this case `cin` is also put into a failed state, which must be reset with `cin.clear()` before any further input can be read.)\n", + "\n", + "* The pointer `p` is set to point to the first character of `str[]`, and the type `const char *` specifies that we do not wish to modify **what it points to**. In fact assigning an array to a (correctly typed) pointer is an implicit conversion, which is known as *array decay* because the size attribute is \"lost\". This also occurs when calling a function using an array as an argument, to either a pointer **or** (non-sized) array parameter. (In the same manner as when using `i` and `j` for temporary `int` variables, `p` is a common name for a pointer.)\n", + "\n", + "* The **dereferenced** value `*p` is checked against zero by the `while` loop condition test, and if it is non-zero then it is sent to `cout` by the body of the `while` loop. The increment of `p` (actually a pre-increment, the one you should prefer given a choice) is necessary to prevent an infinite loop outputting the first character of `str[]`. 
This increases the value of `p` by one in order to point it towards the next character (of `str[]`), a process which repeats until the terminating zero-byte is reached. Importantly, `str[]` is left unchanged and remains able to be used again; this is the motivation for assigning `str` to a different (mutable) pointer variable `p`, instead of using `++str` (which would not compile in any case, as an array name cannot be modified).\n", + "\n", + "**Experiment**\n", + "\n", + "* Enter a blank line as input. Is there any output produced? Why is this?\n", + "\n", + "* Swap the lines starting `cin.getline` and `const char *p`. Does the program still compile? Why do you think this is?\n", + "\n", + "* Change the body of the `while` loop to be a single line, by moving the `++` operator into the `cout` statement. Hint: you will have to use `p++` instead of `++p`; do you understand why?\n", + "\n", + "* Now remove the braces from the body of the `while` loop. Does the code still compile? What would happen if another line were added to the body of the loop?\n", + "\n", + "## For loops\n", + "\n", + "A standard `for` loop is similar to a `while` loop in that it has a pre-condition test. A common historical use for `for` loops is to iterate over an array using subscript syntax, rather than pointers. A `for` loop has three parts enclosed within parentheses, any of which can be empty, with the parts (empty or otherwise) separated by semi-colons. The first part is an initializer, as in a three-part range-for loop. This typically initializes a single variable known as the *loop counter*, whose scope is the `for` loop (only). The second part is the condition test, which functions exactly the same way as that in a `while` loop; if empty it evaluates to `true`, which causes an *infinite loop*. 
The third part is an iteration expression which is executed **after** each pass through the body of the loop.\n", + "\n", + "The following program defines and initializes an array of `int` called `a`, and outputs each element in turn (on the same line), by subscripting the array with loop counter `i`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bfa4e836", + "metadata": {}, + "outputs": [], + "source": [ + "// 05-for.cpp : output an array using for\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + "    int a[]{ 9, 8, 7, 6, 5, 4 };\n", + "    for (int i{ 0 }; i != 6; ++i) {\n", + "        cout << a[i] << ' ';\n", + "    }\n", + "    cout << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "cef5a372", + "metadata": {}, + "source": [ + "Output from this program:\n", + "\n", + "```\n", + "9 8 7 6 5 4 \n", + "```\n", + "\n", + "Notice that the *loop counter* `i` is initialized to zero and has this value on the first pass through the `for` loop. The test `i != 6` is true exactly `6` times (with `i` having the values `0`, `1`, `2`, `3`, `4` and `5` in turn); this matches **all** of the valid array indices of `a[]`. Use of `i != 6` in this way is usually considered better C++ programming style than `i < 6`, or the even worse `i <= 5` (neither of which is actually any \"safer\" in practice). This program produces a trailing space in its output, which isn't ideal, but we'll ignore this defect for now. The last statement in `main()` outputs a newline, and being outside of the body of the loop is executed only once.\n", + "\n", + "**Experiment**\n", + "\n", + "* Change both the size of `a[]` and the number in the condition test to `10`, without altering the braced initializer. What do you notice about the output? Can this be relied upon?\n", + "\n", + "* Now change the condition test to automatically track the size of the array. 
Hint: use `std::size()`.\n", + "\n", + "* Rewrite the program to use a `while` loop instead of `for`. What similarities do you notice? What is the scope of the loop counter?\n", + "\n", + "* Write a program to accept five `double`s as user input into a suitable array and then print them out on separate lines. Hint: use two (non-nested) `for` loops.\n", + "\n", + "* Now write a program to output a countdown from `10` to `0` inclusive, printing `Blastoff!` at the end. Hint: use a suitable condition test and iteration expression with a `for` loop.\n", + "\n", + "Loops in production code are often more difficult to decipher than the one shown above, partly because the simplest cases are handled by range-for. The following program has two loop counters, either of which could be tested in the condition test part of the `for` loop:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b0c7484f", + "metadata": {}, + "outputs": [], + "source": [ + "// 05-christmas.cpp : calculate total number of gifts from the popular song\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + "    int gifts{ 0 };\n", + "    for (int i{ 1 }, j{ 12 }; i <= 12; ++i, --j) {\n", + "        gifts += i * j;\n", + "    }\n", + "    cout << gifts << \" gifts in total.\\n\";\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "b50dc4d2", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* The variable `gifts` is defined and initialized outside of the `for` loop and is able to be referenced after the loop has completed.\n", + "\n", + "* There are two variables defined and initialized in the loop initializer, therefore there is no clear distinction as to which, if either, is the loop counter.\n", + "\n", + "* The condition test uses `<=` (against previous advice), which also has the effect of indicating the loop counter to be `i`.\n", + "\n", + "* The construct `++i, --j` uses the *comma operator* (also known as the sequencing operator) to sneak two 
operations in where only a single expression is allowed. This use of comma is rare; another possible use is in ternary expressions.\n", + "\n", + "* The add-assign operator (`+=`) is used as a shorthand for an addition followed by an assignment back to the same variable, so `gifts += i * j` means `gifts = gifts + i * j`. It is often used in C++, as are the other operator-assign expressions (such as `-=` and `*=`).\n", + "\n", + "**Experiment**\n", + "\n", + "* Find two different alternatives to the condition test (`i <= 12`) that continue to work correctly.\n", + "\n", + "* Find a way to dispense with the variable `j`. Consider whether this version is clearer to understand.\n", + "\n", + "## Do-while loops\n", + "\n", + "A `do`-`while` loop is unique in having a *post-condition test*; thus the loop body is guaranteed to execute **at least once**. The loop begins with the `do` keyword followed immediately by the loop body (which would usually be delimited by braces). The loop ends with the `while` keyword followed by the loop post-condition test in parentheses and then a (mandatory) trailing semi-colon.\n", + "\n", + "Do-while loops are similar to \"repeat-until\" loops of other languages, except that the post-condition test is logically inverted. Use of `do` and `while` has been criticised in the past, mainly because the indentation of the body of the loop is visually misleading; it is always executed but could be interpreted as not being so from a casual glance. 
A better result can usually be achieved using a `while` loop and some duplication of code.\n", + "\n", + "The following program asks for input from the user repeatedly until the input provided is valid:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12207f3f", + "metadata": {}, + "outputs": [], + "source": [ + "// 05-do-while.cpp : use of post-condition loop to validate user input\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + "    int i{};\n", + "    do {\n", + "        cout << \"Please enter a negative number: \";\n", + "        cin >> i;\n", + "    } while (i >= 0);\n", + "    cout << \"You entered: \" << i << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "471a7e5e", + "metadata": {}, + "source": [ + "The variable `i` is defined before the loop, so it is still in scope after the loop completes. The `do`-`while` loop then repeats until a negative number has been entered. To provide a comparison, here is an exactly equivalent program, written with a regular `while` loop instead:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aa22a123", + "metadata": {}, + "outputs": [], + "source": [ + "// 05-not-do-while.cpp : alternative to post-condition loop\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + "    int i{};\n", + "    cout << \"Please enter a negative number: \";\n", + "    cin >> i;\n", + "    while (i >= 0) {\n", + "        cout << \"Please enter a negative number: \";\n", + "        cin >> i;\n", + "    }\n", + "    cout << \"You entered: \" << i << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "504b2564", + "metadata": {}, + "source": [ + "Notice that this pre-condition test (after the `while` keyword) is identical to the previously used post-condition test. Also, the regular `while` loop version offers the opportunity for an alternate message such as `\"Invalid input! 
Please try again: \"` to be printed, in order to aid the user should they get the first input attempt wrong.\n", + "\n", + "**Experiment**\n", + "\n", + "* Write a program to output a countdown from a user-entered positive integer down to zero using two `do`-`while` loops.\n", + "\n", + "## Break and continue\n", + "\n", + "All of the loop constructs we have met so far have repeated predictably so long as a condition is met (either a boolean condition with `while`, `for` and `do`-`while`, or until the range expression has been iterated over fully in the case of range-for).\n", + "\n", + "Similar to its use with `switch`, `break` allows us to *break out of* a loop early; typically it is used within an `if` clause. When encountering a `break` statement, control flow jumps immediately to the first statement **after** the loop, which causes an early exit from the loop into its immediately enclosing scope.\n", + "\n", + "The `continue` keyword is used to skip the remainder of the loop body and move on to the next iteration; again it is typically used within an `if` clause. When encountering `continue` in a `while` loop, control flow returns to the condition test. With regular `for` the same happens, but the iteration expression is executed first. For `do`-`while` loops control flow jumps forward to the post-condition test, while for range-for the for-loop variable gets the value of the next element of the range expression.\n", + "\n", + "In the case of nested loops, use of `break` may not operate as required as it only breaks out of the innermost loop. 
In this case a `goto` statement inside the loop, and a target label after the end of the outermost loop, can be used, although this is very rarely needed.\n", + "\n", + "The following program shows `break` and `continue` operating in what would otherwise be an infinite loop:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a51c88aa", + "metadata": {}, + "outputs": [], + "source": [ + "// 05-break-continue.cpp : use of control flow commands in loop\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + "    for (;;) {\n", + "        int i{};\n", + "        cout << \"Please enter a positive number (zero to quit): \";\n", + "        cin >> i;\n", + "        if (i == 0) {\n", + "            break;\n", + "        }\n", + "        if (i < 0) {\n", + "            continue;\n", + "        }\n", + "        cout << \"You entered: \" << i << '\\n';\n", + "    }\n", + "    cout << \"Program ended\\n\";\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "09b5c656", + "metadata": {}, + "source": [ + "Notice that no output is produced when entering a negative number, and that the only way to quit the program is to enter `0`. The `for (;;)` loop (read as \"forever\") iterates repeatedly because the empty condition test always evaluates to `true`, as mentioned previously.\n", + "\n", + "**Experiment**\n", + "\n", + "* Does the order of the `if` clauses make a difference in this program? 
Is there a motivation to use `else if`?\n", + "\n", + "* Write a program which uses a regular `for` loop with an empty condition test, and an increment operator as the iteration expression, which outputs all the **even** numbers between zero and 20 (inclusive).\n", + "\n", + "* Write a program which asks for a positive integer, and outputs all positive even numbers between zero and this number (inclusive, if the input is even).\n", + "\n", + "## Array decay and pointer arithmetic\n", + "\n", + "It is possible for a function to accept a built-in array as a parameter, however any size information previously known to the compiler is lost. Therefore there is no advantage in declaring the parameter as a (non-sized) array type, as opposed to a pointer type. The following program demonstrates two functions which are equivalent:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "92a223c1", + "metadata": {}, + "outputs": [], + "source": [ + "// 05-array-decay.cpp : demonstrate equivalence of pointer vs array parameters\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "void print_arr(const char s[]) {\n", + "    while (*s) {\n", + "        cout << *s++;\n", + "    }\n", + "    cout << '\\n';\n", + "}\n", + "\n", + "void print_ptr(const char *s) {\n", + "    while (*s) {\n", + "        cout << *s++;\n", + "    }\n", + "    cout << '\\n';\n", + "}\n", + "\n", + "int main() {\n", + "    print_arr(\"Hello\");\n", + "    print_ptr(\"World\");\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "83ab17fd", + "metadata": {}, + "source": [ + "A couple of things to note about this program:\n", + "\n", + "* As a constant string literal is passed, both functions **need** the parameter to be qualified with `const`. This means that the variable `s` cannot be used to modify what it points to, although it can itself be modified (for example, by being incremented as shown here). 
If desired, it is possible for the pointer to be non-modifiable too, by utilizing a second `const` as in: `const char * const s`, however the ability to modify it is needed by both these functions.\n", + "\n", + "* The non-sized array variable accepted by `print_arr()` is able to be dereferenced and incremented in exactly the same way as the pointer accepted by `print_ptr()`. Once either has been modified, the original reference is lost; notice that the bodies of both functions are identical.\n", + "\n", + "It should be understood that when passing an array to a function, only a pointer to the first element is in fact passed. Thus it is similar in concept to pass-by-reference, that is a function which modifies an array passed to it also modifies the same entity as visible in the calling function.\n", + "\n", + "**Experiment**\n", + "\n", + "* Remove the `const` qualifiers and modify `main()` so that the program still compiles. Hint: use built-in arrays instead of string literals.\n", + "\n", + "* Now try to modify the first letter within either function. Does this change the array within `main()`?\n", + "\n", + "Should the length of an array be needed in the callee function, then this needs to be passed as a separate parameter. 
Both pointer and array parameters can be indexed using either square brackets or *pointer arithmetic*, as shown in this example program:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5d207c15", + "metadata": {}, + "outputs": [], + "source": [ + "// 05-pointer-index.cpp : demonstrate array indexing and pointer arithmetic\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "void print_arr(const char s[], size_t n) {\n", + "    for (size_t i = 0; i != n; ++i) {\n", + "        cout << s[i];\n", + "    }\n", + "}\n", + "\n", + "void print_ptr(const char *s, size_t n) {\n", + "    for (size_t i = 0; i != n; ++i) {\n", + "        cout << *(s + i);\n", + "    }\n", + "}\n", + "\n", + "int main() {\n", + "    print_arr(\"Hello, \", 7);\n", + "    print_ptr(\"World!\", 6);\n", + "    cout << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "366ba749", + "metadata": {}, + "source": [ + "**Experiment**\n", + "\n", + "* What happens if the length argument passed to the function is too small? Or too large?\n", + "\n", + "* Swap the bodies of the two functions over. Does the program still compile and run correctly?\n", + "\n", + "* Add a second `const` qualifier to `s` in `print_ptr()`. Does the program still compile? Does this surprise you?\n", + "\n", + "It is important to understand that there is **no difference** between `array[0]` and `*array`, nor between `array[n]` and `*(array + n)`. Pointer arithmetic works by adding *n times sizeof element* to the array or pointer variable, which itself is a machine address. (The only type of pointer that does not support this is `void *`.)\n", + "\n", + "## Environment\n", + "\n", + "We have met the concept of the return value of `main()` being passed back to the calling environment. A different definition of `main()` accepts arguments from the calling environment, as a pointer to a list of pointers to string literals, as well as the length of this list. 
Of course, you are free to name these variables however you wish, but by convention they are called `argv` and `argc`.\n", + "\n", + "The following program prints out all of the arguments it is called with at run-time:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7877c65b", + "metadata": {}, + "outputs": [], + "source": [ + "// 05-args.cpp : print out all arguments\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main(int argc, char *argv[]) {\n", + "    for (int i{ 0 }; i != argc; ++i) {\n", + "        cout << \"Argument \" << i << \": \" << argv[i] << '\\n';\n", + "    }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "cb4a2eea", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* A range-for loop cannot be used here, as `argv[]` contains no size information within `main()` (array decay).\n", + "\n", + "* The second parameter can instead be declared as `char **argv` with no change in meaning.\n", + "\n", + "* Each string literal is accessed as `argv[N]`, the first character of which would be `argv[N][0]`. The memory holding the string literals should be regarded as read-only. (It is possible, and common, to declare `argc` and `argv` as `const`.)\n", + "\n", + "* The string literal `argv[0]` is the name of the program as called when run, while `argv[argc]` is always a null pointer.\n", + "\n", + "**Experiment**\n", + "\n", + "* Run the program with no arguments. Is there any output? What is the value of `argc`?\n", + "\n", + "* Now try multiple arguments inside double quotes (`\"`). How are these handled?\n", + "\n", + "* Now try options beginning with `-`, `--` or `/`. 
Could you construct a program which recognizes these as run-time option switches?\n", + "\n", + "## The begin() and end() family\n", + "\n", + "We have seen the traversal of an array by comparing a pointer against zero (as you will recall a string literal is a read-only zero-terminated array of `char`) using a `while` loop, and have iterated over an array using a `for` loop and index variable with array syntax, and have also used range-for. There is another way which does not require the use of a special sentinel value at the end of the array, nor a loop counter. It does, however, require the use of pointers and comparison of **addresses**. (In fact, this is how range-for operates \"under the hood\".)\n", + "\n", + "The following program outputs a list of integers stored within an array:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2e282392", + "metadata": {}, + "outputs": [], + "source": [ + "// 05-begin-end.cpp : demonstration of the use of begin() and end()\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " int numbers[] = { 3, 4, 1, 5, 6, 2 };\n", + " for (auto p = begin(numbers); p != end(numbers); ++p) {\n", + " cout << \"- \" << *p << '\\n';\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "9abeb2e7", + "metadata": {}, + "source": [ + "A couple of things to note about this program:\n", + "\n", + "* The `for`-loop is not range-for, however it almost reads like English.\n", + "\n", + "* There is no loop counter, only the pointer `p`, which is dereferenced to provide a single element for output.\n", + "\n", + "**Experiment**\n", + "\n", + "* Change the type of `numbers[]` to double. Is this the only change needed? Add some fractional values to the array. Do they print out correctly? What does this tell you about the preservation of type information in pointers?\n", + "\n", + "* Rewrite the `for`-loop without using `auto`, `begin()` or `end()`. 
Hint: don't change the body, and use `double*` in the initialization and `numbers + size(numbers)` in the condition test.\n", + "\n", + "In fact, `begin()` and `end()` return pointer values for built-in arrays; these pointers contain the address of the first element and the address of \"one past the last\" element. When referencing arrays which are constant, `cbegin()` and `cend()` can be used, which return pointers to `const`. The family is complemented with variants which access a \"reversed\" array.\n", + "\n", + "The following table lists all eight members of the `begin()`/`end()` family, where `array[]` is the name of a built-in array with elements of any type, and `N` is `std::size(array)` (the number of elements). Note that `&array[N]` **is** a legal pointer value, but it must **never** be dereferenced; `&array[-1]` is shown only as a conceptual position, since forming a pointer before the start of an array is undefined behavior (which is one reason `rbegin()` and `rend()` return `std::reverse_iterator` objects rather than raw pointers).\n", + "\n", + "| Function name | Index Syntax | Pointer Syntax |\n", + "|:-------------------:|:------------:|:---------------:|\n", + "| begin(), cbegin() | &array[0] | array |\n", + "| end(), cend() | &array[N] | (array + N) |\n", + "| rbegin(), crbegin() | &array[N-1] | (array + N - 1) |\n", + "| rend(), crend() | &array[-1] | (array - 1) |\n", + "\n", + "**Experiment**\n", + "\n", + "* Print out `numbers[]` backwards. Is the use of `++p` still correct? Does this surprise you? Hint: use `rbegin()` and `rend()`.\n", + "\n", + "* Now try to fill `numbers[]` with all `1`s. 
What error message do you get when using `cbegin()`/`cend()`?\n", + "\n", + "* Now modify the program so that only the last element of the array is printed out, whatever size the array is.\n", + "\n", + "*All text and program code ©2019-2025 Richard Spencer, all rights reserved.*" + ] + } + ], + "metadata": { + "jupytext": { + "cell_metadata_filter": "-all" + }, + "kernelspec": { + "display_name": "C++ 23", + "language": "c++", + "name": "cpp23" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/jupyter-notebooks/06-enums-and-structs.ipynb b/jupyter-notebooks/06-enums-and-structs.ipynb new file mode 100644 index 0000000..d1a7bd5 --- /dev/null +++ b/jupyter-notebooks/06-enums-and-structs.ipynb @@ -0,0 +1,691 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "fb297cee", + "metadata": {}, + "source": [ + "# Enums and Structs\n", + "\n", + "## Enumerations\n", + "\n", + "Some variables belong to a small, **closed** set; that is they can have exactly one of a list of values. The `enum` type and its closely related `enum class` type each define a set of (integer) values which a variable is permitted to have.\n", + "\n", + "Think of a complete pack of playing cards: each card has a suit and rank. Considering the rank first of all, here is how it can be represented and defined in C++:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "58da13aa", + "metadata": {}, + "outputs": [], + "source": [ + "enum Rank : unsigned short { ace = 1, two, three, four, five, six, seven, eight, nine, ten, jack, queen, king, none = 99 };" + ] + }, + { + "cell_type": "markdown", + "id": "50af910f", + "metadata": {}, + "source": [ + "The name of this type is `Rank`, by convention for a user-defined type this is in *SentenceCase*. Following the colon `:` is the *underlying type*; this **must** be a built-in integer type (`char` is also allowed) and defaults to `int` if not specified. 
Since we have specified `unsigned short` we can assign values from `0` to `65535` (most likely, however strictly speaking this is implementation dependent). Then, within curly braces are a list of comma-separated *enumerators*, each of which can optionally have values specified. We have set `ace = 1` instead of relying on the default value of zero for the first enumerator because it allows both the internal value and its conceptual representation to be the same; although this is not mandatory it is good programming style. Subsequent enumerators take the next sequentially available value.\n", + "\n", + "A variable of type `enum` (also known as *plain* enum), such as `Rank` above, can be initialized from any of the enumerators listed in its definition. However, care should be taken not to assign values not in its enumeration set; this includes default-initialization if zero is not one of the enumerators:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3398e8f7", + "metadata": {}, + "outputs": [], + "source": [ + "Rank r1{ ace }; // ok, r1 is value of enumeration constant ace (1)\n", + "Rank r2{}; // possible problem, r2 has value zero which is not in enumeration set\n", + "Rank r3; // worse, r3 has \"random\" (uninitialized) value\n", + "Rank r4{ 15 }; // possible problem, r4 has a value not within enumeration set\n", + "auto r5 = king; // ok, r5 is of type Rank (not unsigned short)\n", + "int i = seven; // ok, implicit conversion to integral type" + ] + }, + { + "cell_type": "markdown", + "id": "62a459fc", + "metadata": {}, + "source": [ + "It may be surprising to discover that in most ways `ace`, `two`, `three`, `four` and so on are just \"normal\" integer constant values. (Indeed in some historical versions of the C language, the only way to define constants was by using anonymous `enum`s; this curiosity was given the affectionate name of the \"enum hack\".) 
Thus variables of type `enum` can \"borrow\" enumerators from different types of `enum`s! Even worse, enumerators from different `enum` definitions in the same scope could **not** use the same name without causing a name collision.\n", + "\n", + "To address these limitations the C++ `enum class` type was created; this type is also known as *scoped* or *strongly typed* enumeration. We can represent the suit of a card using this type:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c7a6f247", + "metadata": {}, + "outputs": [], + "source": [ + "enum class Suit : char { spades = 'S', clubs = 'C', hearts = 'H', diamonds = 'D', none = '\\?' };" + ] + }, + { + "cell_type": "markdown", + "id": "5890f166", + "metadata": {}, + "source": [ + "The difference in syntax is small, we have `enum class Suit` compared to `enum Rank`, although this time the underlying type is `char` and character literals are used for the enumerators. However the `none` in `Suit` does not clash with `none` in `Rank`, and related to this feature the enumerators in an `enum class` have to be qualified with the type name, as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "51102220", + "metadata": {}, + "outputs": [], + "source": [ + "Suit s1 = Suit::hearts; // good, types match\n", + "Suit s2{}; // possible problem, s2 has value zero (NUL-byte)\n", + "Suit s3{ 'S' }; // ok, perhaps surprisingly\n", + "auto s4 = Suit::diamonds; // s4 is of type Suit\n", + "char c = Suit::none; // error: no implicit conversion to underlying type, static_cast needed" + ] + }, + { + "cell_type": "markdown", + "id": "a09ee1f8", + "metadata": {}, + "source": [ + "**Experiment:**\n", + "\n", + "* Write a program to populate a 13-element array of `Rank` with element `[0]` taking `ace`. 
Cause it to print this array in reverse order.\n", + "\n", + "* Write a function which outputs one of `Spades`, `Clubs`, `Hearts` or `Diamonds` based on its single parameter of type `Suit`.\n", + "\n", + "## Member variables\n", + "\n", + "Of course, in the context of a pack of playing cards it is not practical to think of \"suit\" and \"rank\" as separate entities: each playing card has both. The term *composite types* is used to describe objects composed from other types (either built-in or user-defined). The following `struct` definition is an example of a composite type, containing two fields (also called *member variables*) using the `enum` and `enum class` types already introduced:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5752fed3", + "metadata": {}, + "outputs": [], + "source": [ + "struct PlayingCard {\n", + " Rank rank;\n", + " Suit suit;\n", + "};" + ] + }, + { + "cell_type": "markdown", + "id": "2c99bb44", + "metadata": {}, + "source": [ + "This `struct` type is named `PlayingCard`, again using sentence case. The fields of the `struct` are listed between braces like variable definitions, type-then-name, separated by semi-colons; there is also a **mandatory** semi-colon after the closing brace. The order of the fields is not usually significant; we have put `Rank` first as it is a 16-bit value compared to `Suit` being 8-bit, which makes the `struct`'s logical memory layout more sensible. (There is probably no gap between the fields in memory layout in this case, but `PlayingCard` is probably padded out to 32-bits at the end.) Also, this layout matches the usual order of the description of a card, such as \"Three of Clubs\".\n", + "\n", + "Instances (variables) of type `PlayingCard` are examples of what are often called *objects* (as in *Object Oriented Programming*, or *OOP*), and they can be defined and initialized in a similar way to arrays and containers using uniform initialization syntax. 
The code below demonstrates how to create the first card in the pack, and how to extract the object's fields back into separate variables:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7185eab1", + "metadata": {}, + "outputs": [], + "source": [ + "PlayingCard ace_of_spades{ ace, Suit::spades };\n", + "\n", + "auto the_rank1 = ace_of_spades.rank; // the_rank1 = ace, and is of type Rank\n", + "auto the_suit1 = ace_of_spades.suit; // the_suit1 = Suit::spades, and is of type Suit\n", + "\n", + "auto [ the_rank2, the_suit2 ] = ace_of_spades; // the_rank2 = ace, the_suit2 = Suit::spades, types as previously" + ] + }, + { + "cell_type": "markdown", + "id": "86a77269", + "metadata": {}, + "source": [ + "The variables `the_rank1` and `the_suit1` are initialized from the individual fields of `ace_of_spades` separately using *dot-notation*, while `the_rank2` and `the_suit2` are initialized together using *structured binding* syntax.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Put the definitions of `Rank`, `Suit` and `PlayingCard` into the same source file, together with a `main()` program which defines `ace_of_spades` as shown above. Does the order of the `enum` and `struct` definitions matter?\n", + "\n", + "* What error message do you get if you swap `ace` and `Suit::spades` over in the definition of `ace_of_spades`? Would this error be easy to catch if plain `int` values were used instead of typed enumerators?\n", + "\n", + "It may be desirable to create `struct`s with multiple fields of the same type. 
An example of this is a simple two-dimensional `Point` class with fields (or data members) called `x` and `y`, both being signed integers:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "57f86d61", + "metadata": {}, + "outputs": [], + "source": [ + "struct Point {\n", + " int x{}, y{};\n", + "};\n", + "\n", + "Point p1{ 2, 3 };" + ] + }, + { + "cell_type": "markdown", + "id": "af9e33de", + "metadata": {}, + "source": [ + "As the field variables are of the same type they can be defined together, separated by a comma. The empty braces `{}` mean the same thing as for `int` variable definitions: `x` and `y` will get the default value of this type, being zero.\n", + "\n", + "A question you may ask is: \"Why not simply use a two-element array type?\", such as:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eef6f498", + "metadata": {}, + "outputs": [], + "source": [ + "using PointA = int[2];\n", + "\n", + "PointA p2{ 4, 5 };" + ] + }, + { + "cell_type": "markdown", + "id": "aa2662b7", + "metadata": {}, + "source": [ + "It's a valid question, and at the machine level produces (most likely) similar code. In this case using a `struct` has the edge because it default-initializes, and having fields called `p1.x` and `p1.y` is more intuitive and less error-prone than having to use subscripting syntax `p2[0]` and `p2[1]`.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Write a program to obtain the two fields of a previously defined `Point` object from `cin`. Don't use any temporary variables.\n", + "\n", + "* Modify this program to manipulate these fields in some way (such as multiplying them by two) and output them.\n", + "\n", + "* Write a function called `mirror_point()` which reflects its input (of type `Point`) in both the x- and y-axes. Experiment with passing by value and `const`-reference (and returning the modified `Point`), and by reference and by pointer (two different `void` functions). 
Hint: for the last variant pass an address of `Point` and access the fields with `p->x` and `p->y`, and see the topics in Chapter 4: \"Parameters by value\" and \"Parameters by reference\" for a refresher. Compare all four versions of this function for ease of comprehension and maintainability.\n", + "\n", + "## Inheritance vs composition\n", + "\n", + "We have talked about composite types being made up of other types, and in fact types can be *composed* (nested) indefinitely, although many programmers would struggle to comprehend more than a few levels. The other way to create new types with characteristics of previously defined types is through *inheritance*, which is a key concept of OOP.\n", + "\n", + "The following program defines an `enum class` called `Color` (feel free to add more color enumerators) and uses the same `Point` class to create a new `Pixel` class, which has both a location and a color, by being composed of both `Point` and `Color` fields." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "33f94f55", + "metadata": {}, + "outputs": [], + "source": [ + "// 06-pixel1.cpp : Color and position Pixel type through composition\n", + "\n", + "#include <iostream>\n", + "#include <string_view>\n", + "using namespace std;\n", + "\n", + "struct Point {\n", + " int x{}, y{};\n", + "};\n", + "\n", + "enum class Color { red, green, blue };\n", + "\n", + "struct Pixel {\n", + " Point pt;\n", + " Color col{};\n", + "};\n", + "\n", + "string_view get_color(Color c) {\n", + " switch (c) {\n", + " case Color::red:\n", + " return \"red\";\n", + " case Color::green:\n", + " return \"green\";\n", + " case Color::blue:\n", + " return \"blue\";\n", + " default:\n", + " return \"\";\n", + " }\n", + "}\n", + "\n", + "int main() {\n", + " Pixel p1;\n", + " cout << \"Pixel p1 has color \" << get_color(p1.col);\n", + " cout << \" and co-ordinates \" << p1.pt.x;\n", + " cout << ',' << p1.pt.y << '\\n';\n", + "\n", + " Pixel p2{ { -1, 2 }, Color::blue };\n", + " cout << \"Pixel 
p2 has color \" << get_color(p2.col);\n", + " cout << \" and co-ordinates \" << p2.pt.x;\n", + " cout << ',' << p2.pt.y << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "f84dac4b", + "metadata": {}, + "source": [ + "Most, if not all, of the syntax should be familiar, however a few things to note:\n", + "\n", + "* `Point` and `Color` must be defined before `Pixel`, as the fields of `Pixel` are variables of these two types.\n", + "\n", + "* Inside `Point`, `x` and `y` are default-initialized (to zero). This means that in `Pixel`, `pt` has already been automatically default-initialized, while `col` has to be initialized explicitly.\n", + "\n", + "* The function `get_color()` uses a `Color` as the `switch`-variable; this is permitted because `enum` and `enum class` always have an integer as the underlying type.\n", + "\n", + "* This function returns a `std::string_view`, although `std::string` or `const char *` would work equally well. The data that the `std::string_view` refers to is guaranteed to outlive the scope of the function `get_color()` because they are read-only string literals; no copy is ever made. (The type `std::string_view` is covered in more detail in Chapter 7.)\n", + "\n", + "* In `main()` the variable `p1` is default-initialized to `Color::red` and `0,0` because of the default-initialization syntax in the `struct` definitions. The member variable `p1.col` is `red` because that is the enumerator with value zero (from default initialization with `{}`).\n", + "\n", + "* The variable `p2` is set to `Color::blue` explicitly at initialization, with the co-ordinates `-1,2` using nested initializer syntax.\n", + "\n", + "* The member variables `x` and `y` are members of `Point`, `pt` is a member of `Pixel`, so the full names of `p2`'s two co-ordinates are `p2.pt.x` and `p2.pt.y`. 
This shows how the member operator `.` can be chained in this way (it works for member functions, too), and operations remain fully type-safe.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Write a function `get_pixel()` which returns information about a `Pixel`, and remove the code duplication in calls to `cout` from `main()`. Hint: the return type should be `std::string` and it should call `get_color()`; the code in `main()` should read: `cout << get_pixel(p1) << '\\n';`\n", + "\n", + "* Can you call `get_pixel()` from `main()` with a third `Pixel`, without using a named variable? Hint: try to use initializer syntax in the function call.\n", + "\n", + "* Change the default `Color` assigned to `p1` to be ``. Hint: this is a simple change, but is not in `main()`.\n", + "\n", + "The next program accomplishes exactly the same as the previous one, producing the same output, and most likely very similar code of comparable efficiency. It uses *inheritance* instead of composition, however, which is indicated by a slightly different definition of `Pixel` and different use of dot-notation in `main()`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c04b11b0", + "metadata": {}, + "outputs": [], + "source": [ + "// 06-pixel2.cpp : Color and position Pixel type through inheritance\n", + "\n", + "#include <iostream>\n", + "#include <string_view>\n", + "using namespace std;\n", + "\n", + "struct Point {\n", + " int x{}, y{};\n", + "};\n", + "\n", + "enum class Color { red, green, blue };\n", + "\n", + "struct Pixel : Point {\n", + " Color col{};\n", + "};\n", + "\n", + "string_view get_color(Color c) {\n", + " switch (c) {\n", + " case Color::red:\n", + " return \"red\";\n", + " case Color::green:\n", + " return \"green\";\n", + " case Color::blue:\n", + " return \"blue\";\n", + " default:\n", + " return \"\";\n", + " }\n", + "}\n", + "\n", + "int main() {\n", + " Pixel p1;\n", + " cout << \"Pixel p1 has color \" << get_color(p1.col);\n", + " cout << \" and co-ordinates \" << 
p1.x;\n", + " cout << ',' << p1.y << '\\n';\n", + "\n", + " Pixel p2{ { -1, 2}, Color::blue};\n", + " cout << \"Pixel p2 has color \" << get_color(p2.col);\n", + " cout << \" and co-ordinates \" << p2.x;\n", + " cout << ',' << p2.y << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "7ce3a76c", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* The definition syntax `struct Pixel : Point {...};` causes `Pixel` to be *derived* from `Point`, meaning `Pixel` inherits **all** of `Point`'s members. `Pixel` is therefore the *derived* class, while `Point` is the *base* class. Sometimes the terms *sub-class* and *super-class* are used to refer to derived and base respectively. (Due to the fact we are using inheritance, it might be considered natural to refer to the `struct` types defined here as \"classes\".)\n", + "\n", + "* The `pt` member variable has been removed as it is no longer used.\n", + "\n", + "* The member variables `x` and `y` are now direct members of `p1` and `p2`, accessed using `p1.x` etc.\n", + "\n", + "**Experiment:**\n", + "\n", + "* What error message do you get if you change `p1.x` back to `p1.pt.x`? Would you understand what the compiler was saying?\n", + "\n", + "* Modify `get_pixel()` (written previously) to work with this program. Hint: The necessary changes should be very small.\n", + "\n", + "* Now try to inherit from both `Point` and `Color` (the syntax is: `struct Pixel : Point, Color {...};`). Does this work as expected? Why do you think this is?\n", + "\n", + "The concepts of inheritance and composition introduced here pose the question: \"Which is better?\" The literature tells us that inheritance represents *is-a* modeling and composition represents *has-a*. So which is more accurate: `Pixel` is-a `Point` (with a color), or `Pixel` has-a `Point` (and a color)? 
Personally, I think the first one is a better description, and would suggest that *is-a* inheritance should be used wherever practical to do so. In Chapter 9 we will meet inheritance again when describing more complex classes.\n", + "\n", + "## Member functions\n", + "\n", + "We have seen that the `struct Point` fields defined as `int x{}, y{};` can be accessed as member variables of objects using dot-notation such as `p1.x` and `p1.y`. Changing the types of `x` and/or `y` (to `double` for example) does not cause any problems, but renaming the fields to something different causes a compilation error as we would be trying to reference former members of `Point` which no longer exist.\n", + "\n", + "Our `struct Point` is said to have *zero encapsulation*; its internals are open to public view, inspection and modification. Sometimes this is acceptable, but more often we want to *separate implementation from interface*. Use of member functions can be a way to provide an interface between the *user* of a type (the programmer who uses objects of that user-defined type) and the *implementor* of that type (the programmer who created the user-defined type). This interface is a *contract* between the two, which should always be considered and designed carefully.\n", + "\n", + "Let us consider what we need in order to rewrite `Point` with some degree of encapsulation. The following program is our, by now familiar, `Point` type with three member functions (sometimes called *methods* in other programming languages) defined in the body of the `struct` definition, that is, within the braces. These functions can read and write the values of the member variables `x` and `y`, and logically enough are known as *getters* and *setters*. They are said to be defined *inline* when written in full between the braces of the `struct` definition, and as such are often automatically inlined by the compiler. 
This means that there may well be no function call overhead, so performance considerations should not be a reason to disregard encapsulation. (Types with methods are usually known as classes in other languages; from now on we will use this word to mean any C++ composite type declared with `struct` or `class`, except for `enum class`.)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "02a1074a", + "metadata": {}, + "outputs": [], + "source": [ + "// 06-point1.cpp : a Point type with getter and setters\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "struct Point {\n", + " void setX(int nx) {\n", + " x = nx;\n", + " }\n", + " void setY(int ny) {\n", + " y = ny;\n", + " }\n", + " auto getXY() const {\n", + " return pair{x, y};\n", + " }\n", + "private:\n", + " int x{}, y{};\n", + "};\n", + "\n", + "int main() {\n", + " Point p;\n", + " int user_x{}, user_y{};\n", + " cout << \"Please enter x and y for Point:\\n\";\n", + " cin >> user_x >> user_y;\n", + " p.setX(user_x);\n", + " p.setY(user_y);\n", + " auto [ px, py ] = p.getXY();\n", + " cout << \"px = \" << px << \", py = \" << py << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "532b73d6", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* The member functions appear before the member variables in the definition of `Point`; this is just a convention since members can in most cases appear in any order. The member function names have been written in *camelCase* which is a common convention.\n", + "\n", + "* The member variables `x` and `y` are in scope for all of the member functions, so there is no need to fully qualify them as `this->x` and `this->y`.\n", + "\n", + "* The member function `getXY()` returns both `x` and `y` as a `std::pair`. The `auto` return type is used (it's actually `std::pair<int, int>`) and the function is declared `const` between the (empty) parameter list and the function body. 
The use of `const` in this context means the member function promises not to modify any member variables (in other words, the object's own state). The important concept of `const` correctness for member functions is to declare them `const` whenever they do not modify the object, thus enabling them to be called on objects which are themselves constants (such as `const Point`).\n", + "\n", + "* The *access specifier* `private:` is used before the member variables `x` and `y` which means that code outside the scope of `Point` (such as in `main()`) cannot use them; they must use the getter and setters.\n", + "\n", + "* The variable `p` of type `Point` in `main()` is default-initialized, other variants such as `Point p{ 0, 1 };` are not possible (due to `x` and `y` being `private:`), this would need a class constructor to be written (see Chapter 9).\n", + "\n", + "* The `int`s `user_x`, `user_y`, `px` and `py` are all defined local to `main()`. The member access operator `.` is used as in `p.setX(user_x);` to call the member functions; this is another use of dot-notation.\n", + "\n", + "* The return value of member function `getXY()` is unpacked into `px` and `py` using a structured binding, and these variables are then output.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Write a function `setXY()` which modifies both member variables of `Point`, and use this instead of `setX()` and `setY()` in `main()`.\n", + "\n", + "* Write two functions `moveByX()` and `moveByY()` which add their parameter's value to the `x` and `y` members respectively.\n", + "\n", + "* Change `pair` to `tuple` in `getXY()`. Does the code still compile? What does this indicate about the generality of structured bindings from return values?\n", + "\n", + "* Try to modify `x` within `getXY()`. What happens? Now try to return a modified `x` such as `x+1` instead. What happens now? 
Try both of these having removed the `const` qualifier.\n", + "\n", + "* Change the name of `x` to `super_x` at all occurrences within `Point`, remembering to change all of the member functions which use `x` too. Does the code compile without any changes to `main()`? What does this tell you about another advantage of separating implementation from interface?\n", + "\n", + "## Static members\n", + "\n", + "In the context of a class definition, `static` member variables (sometimes called *class variables*) are similar to global variables, in that there is only one *instance*. They are said to be *per-class* as opposed to *per-object*; regardless of how many objects of a `struct` (or `class`) there are, there can be only one instance of any `static` member. Also they are referred to outside of the `struct` definition with a double colon operator (`::`), not dot-notation.\n", + "\n", + "The following program extends the `Point` class with two `static` member constants. The member functions `setX()` and `setY()` have been modified, try to guess what they now do from the code:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "438384ee", + "metadata": {}, + "outputs": [], + "source": [ + "// 06-point2.cpp : a Point type with getter and setters which check for values being within range\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "struct Point {\n", + " void setX(int nx)\n", + " {\n", + " if (nx < 0) {\n", + " x = 0;\n", + " }\n", + " else if (nx > screenX) {\n", + " x = screenX;\n", + " }\n", + " else {\n", + " x = nx;\n", + " }\n", + " }\n", + " void setY(int ny)\n", + " {\n", + " if (ny < 0) {\n", + " y = 0;\n", + " }\n", + " else if (ny > screenY) {\n", + " y = screenY;\n", + " }\n", + " else {\n", + " y = ny;\n", + " }\n", + " }\n", + " auto getXY() const {\n", + " return pair{x, y};\n", + " }\n", + " static const int screenX{ 639 }, screenY{ 479 };\n", + "private:\n", + " int x{}, y{};\n", + "};\n", + "\n", + "int main() 
{\n", + " cout << \"Screen is \" << Point::screenX + 1 << \" by \" << Point::screenY + 1 << '\\n';\n", + " Point p;\n", + " int user_x{}, user_y{};\n", + " cout << \"Please enter x and y for Point:\\n\";\n", + " cin >> user_x >> user_y;\n", + " p.setX(user_x);\n", + " p.setY(user_y);\n", + " auto [ px, py ] = p.getXY();\n", + " cout << \"px = \" << px << \", py = \" << py << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "07b887de", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* The static member variables `screenX` and `screenY` are declared both `static` and `const` and are assigned values within the definition of `Point`. Storage is automatically assigned for them due to this being true (non-`const` would need to use `inline static` in order to provide this).\n", + "\n", + "* These variables can be accessed directly from within `main()` as they are defined before the `private:` access specifier. As they are **read-only** it is acceptable for them to be accessed directly while preserving encapsulation.\n", + "\n", + "* The default values of `x` and `y` (zero) do not need to be changed as they fall within the permitted values.\n", + "\n", + "* The class *invariants* `0 <= x <= screenX` and `0 <= y <= screenY` are not easily able to be broken when `Point` is written with setters which validate their input.\n", + "\n", + "The goal of encapsulation is still achieved with `screenX` and `screenY` being directly accessible from within `main()` because they are constants. If `screenX` and `screenY` could be modified directly, this would no longer be the case, and a setter/getter pair (or similar) should be created. 
(A similar rule relaxation is allowing global *constants*, as opposed to *variables*, without restriction as neither data-races nor accidental/erroneous reassignment can occur with constants.)\n", + "\n", + "**Experiment:**\n", + "\n", + "* Refactor the code logic of `setX()` and `setY()` into a utility function `within()` which is called by both. Declare `within()` to be `static`. Can you call it from within `main()`? How would this be accomplished, and is it desirable?\n", + "\n", + "* Can you find a utility function from the Standard Library which does the same task as `within()`?\n", + "\n", + "* Remove the `const` qualifier from `screenX` and `screenY`'s definition. What other change is necessary?\n", + "\n", + "* Move these two variables after the `private:` access specifier, and write a getter/setter pair called `getScreenXY()` and `setScreenXY()`. Modify `main()` to accommodate this change. Is it easily possible to maintain the invariants of this type for `Point`s already created, that is existing `Point`s that are now outside the screen area?\n", + "\n", + "## Operator overloading\n", + "\n", + "There are many operators in C++ and most of these can be adapted (or *overloaded*) to work with user-defined types. (Operators for built-in types are not able to be redefined.) Like many other features of the language, their availability and flexibility should be approached with some degree of restraint.\n", + "\n", + "Operator overloading works in a similar way to function overloading, so some familiarity with this concept is assumed. C++ resolves operator calls to user-defined types to function calls, so that `r = a X b` is resolved to `r = operator X (a, b)`. 
(This is a slight simplification; where `a` is a user-defined type, the member function `r = a.operator X (b)` is used in preference, if available.)\n", + "\n", + "The following program demonstrates the `Point` type, simplified back to its original form, with global `operator+` defined for it:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "87c1d592", + "metadata": {}, + "outputs": [], + "source": [ + "// 06-point3.cpp : Point type with global operator+ defined\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "struct Point{\n", + " int x{}, y{};\n", + "};\n", + "\n", + "const Point operator+ (const Point& lhs, const Point& rhs) {\n", + " Point result;\n", + " result.x = lhs.x + rhs.x;\n", + " result.y = lhs.y + rhs.y;\n", + " return result;\n", + "}\n", + "\n", + "int main() {\n", + " Point p1{ 100, 200 }, p2{ 200, -50 }, p3;\n", + " p3 = p1 + p2; // use overloaded \"operator+\"\n", + " cout << \"p3 = (\" << p3.x << ',' << p3.y << \")\\n\";\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "70e43d4a", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* The result of the `operator+` we define is returned by value; it is a new variable. The return value is declared `const` in order to **prevent** accidental operations on a temporary, such as: `(p1 + p2).x = -99;` (It also **allows** invocation of `const` member functions, as in: `(p1 + p2).getXY();` assuming `getXY()` exists as a `const` member function.)\n", + "\n", + "* The parameters of this function are passed in by `const` reference. 
The names `lhs` and `rhs` are very common (for the left-hand-side and right-hand-side to the operator at the *call site* respectively).\n", + "\n", + "* The function `operator+` needs to access the member variables of the parameters passed in, thus member data must be public or have public getters (also, see discussion of friend functions in Chapter 9).\n", + "\n", + "* The new values `result.x` and `result.y` are computed independently, as might be expected.\n", + "\n", + "* The statement `p3 = p1 + p2;` invokes the user-defined `operator+` automatically.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Rewrite `operator+` to avoid the need for a named temporary variable `result`.\n", + "\n", + "* Write an `operator-` and call it from `main()`.\n", + "\n", + "It is usual to write `operator`s as global (or *free*, or *non-member*) functions when they do not need to access `private:` parts of the types which they operate on. This is not a problem for **member** `operator`s as they implicitly have access to all parts of both themselves and the variable they operate on.\n", + "\n", + "The simplified result of these conventions is demonstrated in the following program:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ac53f4c2", + "metadata": {}, + "outputs": [], + "source": [ + "// 06-point4.cpp : Point type with global operator+ and member operator+= defined\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "struct Point{\n", + " int x{}, y{};\n", + "\n", + " Point& operator+= (const Point& rhs) { // member operator +=\n", + " x += rhs.x;\n", + " y += rhs.y;\n", + " return *this;\n", + " }\n", + "};\n", + "\n", + "const Point operator+ (const Point& lhs, const Point& rhs) { // non-member operator+\n", + " Point result{ lhs };\n", + " result += rhs;\n", + " return result;\n", + "}\n", + "\n", + "int main() {\n", + " Point p1{ 100, 200 }, p2{ 200, -50 }, p3;\n", + " p3 = p1 + p2;\n", + " cout << \"p3 = (\" << p3.x << ',' << p3.y 
<< \")\\n\";\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "bbffd94d", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* The `main()` function and its output are the same as for the previous program.\n", + "\n", + "* The member function `operator+=` takes **one** parameter named `rhs` and modifies its own member variables. It returns a **reference** to a `Point`, this being itself. One of the rare uses of the `this` pointer, dereferenced here with `*`, is shown here without further explanation.\n", + "\n", + "* The global `operator+` makes a **copy** of `lhs` and then calls (member) `operator+=` on this (with parameter `rhs`).\n", + "\n", + "* Global `operator+` does **not** directly access the member variables of either of its parameters, this is better C++ style.\n", + "\n", + "* The variable `result` is then returned by `const` value, as before.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Modify this program to implement and test `operator-=` and `operator-`.\n", + "\n", + "* Now modify the program to use the encapsulated version of the `Point` from the program `06-point2.cpp`. What difficulty would you encounter if you tried using only global `operator+`?\n", + "\n", + "* Add a `static` function to calculate the diagonal distance between two `Point`s and return it as a `double`. 
Consider how to implement `operator/` to calculate this value, and whether this would be a suitable use of OO.\n", + "\n", + "*All text and program code ©2019-2025 Richard Spencer, all rights reserved.*" + ] + } + ], + "metadata": { + "jupytext": { + "cell_metadata_filter": "-all" + }, + "kernelspec": { + "display_name": "C++ 23", + "language": "c++", + "name": "cpp23" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/jupyter-notebooks/07-strings-containers-and-views.ipynb b/jupyter-notebooks/07-strings-containers-and-views.ipynb new file mode 100644 index 0000000..d05a464 --- /dev/null +++ b/jupyter-notebooks/07-strings-containers-and-views.ipynb @@ -0,0 +1,1055 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d17d441a", + "metadata": {}, + "source": [ + "# Strings, Containers and Views\n", + "\n", + "## String initialization, concatenation and comparison\n", + "\n", + "Whilst support for read-only string literals is built into C++, we must make use of the Standard Library when we want a string-type which can be manipulated and compared, using operators such as `+` (concatenation) and `==` (equality comparison). The `std::string` type supports all of the operations you would expect to be present, such as concatenation, indexing, sub-string extraction, comparisons and reporting the length. It is also possible to directly access the raw string data, if desired, or pass a `std::string` to a (C-style) function expecting a `const char*`. All of the memory management operations necessary are taken care of automatically at run-time; string objects are allowed to use heap memory and interestingly do not use any \"special\" features of the language not available to the application programmer. 
(Writing your own string class is a commonly advised exercise in gaining proficiency in C++.)\n", + "\n", + "An empty string object can be created using `string` as the type specifier, either using uniform initialization syntax, or `auto`, or omitting the braces altogether where the type specifier is first:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "74e26daf", + "metadata": {}, + "outputs": [], + "source": [ + "string s1;\n", + "string s2{};\n", + "auto s3 = string{}; // s1, s2, s3 are empty (mutable) strings" + ] + }, + { + "cell_type": "markdown", + "id": "dc914d72", + "metadata": {}, + "source": [ + "Other variants exist, but these shown are the most modern. When an empty `std::string` is compared against an empty string literal `\"\"` using `==` the result is `true`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "46a2364a", + "metadata": {}, + "outputs": [], + "source": [ + "auto is_empty = (s1 == \"\"); // is_empty has value \"true\", also for s2 and s3" + ] + }, + { + "cell_type": "markdown", + "id": "c0e5803e", + "metadata": {}, + "source": [ + "A `std::string` can be initialized or re-assigned from a string literal:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4e294964", + "metadata": {}, + "outputs": [], + "source": [ + "string name1;\n", + "name1 = \"Bilbo\"; // (re-)assignment after definition\n", + "\n", + "string name2{ \"Frodo\" }; // assignment combined with definition" + ] + }, + { + "cell_type": "markdown", + "id": "b41b0d0f", + "metadata": {}, + "source": [ + "Both `name1` and `name2` are able to be modified, for example by concatenation using `+`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "225a11c7", + "metadata": {}, + "outputs": [], + "source": [ + "name1 = name1 + \" Baggins\"; // name1 has value \"Bilbo Baggins\"\n", + "name2 += \" Baggins\"; // name2 has value \"Frodo Baggins\"" + ] + }, + { + "cell_type": "markdown", + "id": 
"9ea259d9", + "metadata": {}, + "source": [ + "Single `char` literals can be appended too, although a `std::string` **cannot** be created from a single `char`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a69075a5", + "metadata": {}, + "outputs": [], + "source": [ + "string s1 = 'A'; // Error! Does not compile\n", + "auto s2 = string{} + 'A'; // This version is fine, but maybe non-obvious" + ] + }, + { + "cell_type": "markdown", + "id": "1d328640", + "metadata": {}, + "source": [ + "Strings can be reset to empty using a member function, or by assigning to an empty string literal:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b39f942e", + "metadata": {}, + "outputs": [], + "source": [ + "s1 = \"\"; // Both of these accomplish the same thing\n", + "s2.clear(); // (Using clear() is the preferred method)" + ] + }, + { + "cell_type": "markdown", + "id": "c33053e5", + "metadata": {}, + "source": [ + "Confusingly, there are two different member functions which return a `std::string`'s length (excluding the `\\0` terminator if it was constructed from a string literal), and a third which returns a `bool` (value `true` indicates length is zero):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "85778d2e", + "metadata": {}, + "outputs": [], + "source": [ + "if (name1.length() == name2.size()) { // historically std::string did not have size()\n", + " cout << \"Equal length\\n\";\n", + "}\n", + "\n", + "if (s2.empty()) { // use in preference to \"s2.size() == 0\"\n", + " cout << \"Empty string\\n\";\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "a24aac25", + "metadata": {}, + "source": [ + "The member functions `size()` and `empty()` are present in other containers as well, so it's a good idea to get to know them.\n", + "\n", + "## Subscripting and string methods\n", + "\n", + "It is possible to iterate over a `std::string` using a range-for loop; the following program demonstrates this:" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f726998f", + "metadata": {}, + "outputs": [], + "source": [ + "// 07-string-upper.cpp : function to make a std::string uppercase in-place\n", + "\n", + "#include <cctype>\n", + "#include <iostream>\n", + "#include <string>\n", + "using namespace std;\n", + "\n", + "void string_to_uppercase(string &s) {\n", + " for (auto& c : s) {\n", + " c = toupper(c);\n", + " }\n", + "}\n", + "\n", + "int main() {\n", + " cout << \"Please enter some text in lower, mixed or uppercase:\\n\";\n", + " string input;\n", + " getline(cin, input);\n", + " string_to_uppercase(input);\n", + " cout << \"The same text in uppercase is:\\n\" << input << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "077fd279", + "metadata": {}, + "source": [ + "Things to note about this program:\n", + "\n", + "* Both variables `s` and `c` are declared as references, thus modifying them changes the variable they refer to, not a copy.\n", + "\n", + "* The type of `c` is deduced by the compiler as `char&`.\n", + "\n", + "* The `getline()` function (explained further in Chapter 8) is used to get an arbitrarily long line of input from `cin` and store it in `input`. (Note: don't confuse this with `cin.getline()` which we met in Chapter 5.)\n", + "\n", + "**Experiment:**\n", + "\n", + "* Remove one of the `&`s in the function `string_to_uppercase()`. Does the program still compile? Does it produce the expected output when run? Now remove the other `&` instead and try the same thing. What does this tell you about the importance of reading code which uses reference semantics very carefully?\n", + "\n", + "* Modify `string_to_uppercase()` so that the uppercase string is *appended* to the input. Hint: this is a simple change that just requires some thought.\n", + "\n", + "* Modify `string_to_uppercase()` so that a new uppercase string is returned as a `std::string`. Is the construct `input = string_to_uppercase(input);` within `main()` now legal? 
Does it work as expected? Do you still need to use reference semantics?\n", + "\n", + "Individual characters of a `std::string` can be selected for read- or write-access using the subscript operator `[]`, which works in the same way as on built-in arrays. The index value is **not** checked for being within valid range (which is between `0` and `length() - 1` inclusive); if this check is required the member function `at()` should be used:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d22688cc", + "metadata": {}, + "outputs": [], + "source": [ + "string book = \"a hobbit\";\n", + " \n", + "book[0] = 'A'; // book is now \"A hobbit\"\n", + "\n", + "auto c1 = book[99]; // undefined behavior, probably garbage assigned to c1\n", + "auto c2 = book.at(99); // throws an exception, possibly terminating the program (no value is assigned to c2)" + ] + }, + { + "cell_type": "markdown", + "id": "81969f95", + "metadata": {}, + "source": [ + "**Experiment:**\n", + "\n", + "* Modify `string_to_uppercase()` to use a regular for-loop and an index variable, together with unchecked array access syntax.\n", + "\n", + "* Now modify this program to use **checked** array access. What happens if you make a (deliberate) bounds-checking error?\n", + "\n", + "The member functions `front()` and `back()` return (writable) references to the first and last characters of a string, respectively; they can be used instead of `s[0]` and `s[s.length() - 1]`. 
Interestingly, **reading** `s[s.length()]` is not undefined behavior, but instead returns a value which is the default value of the underlying character type (`'\\0'` for `char`).\n", + "\n", + "To add or remove individual characters or substrings, the `insert()` and `erase()` member functions can be used (don't try to write to `s[s.length()]`):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cc1bc1d8", + "metadata": {}, + "outputs": [], + "source": [ + "string book = \"a hobbit\";\n", + "\n", + "book = \"In \" + book;\n", + " // book is now \"In a hobbit\"\n", + "book.insert(5, \"hole in the ground, there lived a \");\n", + " // book is now \"In a hole in the ground, there lived a hobbit\"\n", + "book.erase(10, 21);\n", + " // book is now \"In a hole lived a hobbit\"" + ] + }, + { + "cell_type": "markdown", + "id": "4ab12713", + "metadata": {}, + "source": [ + "There is also `replace()`, which is a combination of both of `erase()` and `insert()`.\n", + "\n", + "Substrings can be extracted from a `std::string` using the `substr()` member function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "67700f55", + "metadata": {}, + "outputs": [], + "source": [ + "string wizard = \"Gandalf the Gray\";\n", + "\n", + "auto s1 = wizard.substr(0, 7); // s1 is \"Gandalf\"\n", + "auto s2 = wizard.substr(8, 3); // s2 is \"the\"\n", + "auto s3 = wizard.substr(12); // s3 is \"Gray\"\n", + " // or wizard.substr(12, 4) or wizard.substr(12, string::npos)" + ] + }, + { + "cell_type": "markdown", + "id": "66963c31", + "metadata": {}, + "source": [ + "The return type of `substr()` is `std::string`, which is a **new** variable containing a **copy** of (part of) the contents of the original `std::string`.\n", + "\n", + "Finally there is `append()` which is considered better style than using the `+=` operator as it is potentially more efficient:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e43ae8da", + "metadata": 
{}, + "outputs": [], + "source": [ + "auto wizard2 = \"Saruman\"s; // note: suffix produces a string\n", + "wizard2.append(\" the White\"); // wizard2 becomes \"Saruman the White\"" + ] + }, + { + "cell_type": "markdown", + "id": "362a5de2", + "metadata": {}, + "source": [ + "## Conversions and literals\n", + "\n", + "Sometimes it is necessary to convert between a `std::string` and other (often built-in) types, such as converting to and from an integer or floating-point number. The Standard Library function `to_string()` is overloaded to cope with different (built-in) types:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4e71be9e", + "metadata": {}, + "outputs": [], + "source": [ + "auto n1 = 1.23; // n1 is type double\n", + "auto n2 = 45; // n2 is type int\n", + "\n", + "auto s1 = to_string(n1); // s1 is \"1.230000\"\n", + "auto s2 = to_string(n2); // s2 is \"45\"" + ] + }, + { + "cell_type": "markdown", + "id": "2fb766d5", + "metadata": {}, + "source": [ + "Converting the other way, the group of functions `sto…()` allow conversion to an integer or floating-point type from an input `std::string` (often usefully a sub-string). The full list is: `stoi()`, `stol()`, `stoul()`, `stoll()`, `stoull()`, `stof()`, `stod()` and `stold()`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7be2b3de", + "metadata": {}, + "outputs": [], + "source": [ + "auto n3 = stoi(s2); // n3 is of type int\n", + "auto n4 = stold(s1); // n4 is of type long double" + ] + }, + { + "cell_type": "markdown", + "id": "55e0bb00", + "metadata": {}, + "source": [ + "For these `sto…()` conversion functions which return an integer type, the optional third parameter is the numerical base to be applied (this defaults to 10), while for all of them the optional second parameter is a pointer to `std::size_t` variable used to indicate the index into the `std::string` of the first unused character (this defaults to `nullptr`, that is no index is written to this pointer address).\n", + "\n", + "It is possible to declare `std::string` variables using syntax which is very similar to that for string literals, which uses the *literal suffix* `s`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4f95966f", + "metadata": {}, + "outputs": [], + "source": [ + "auto h1{ \"Merry\"s }; // h1 is mutable\n", + "const auto h2{ \"Pippin\"s }; // h2 cannot be altered\n", + "constexpr auto h3{ \"Samwise\"s }; // h3 can be used in constexpr contexts" + ] + }, + { + "cell_type": "markdown", + "id": "ab2cbb8e", + "metadata": {}, + "source": [ + "In addition, a single (possibly empty) `std::string` literal can be safely concatenated with any number of string and character literals:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "86f52872", + "metadata": {}, + "outputs": [], + "source": [ + "auto alphabet = \"\"s + \"ABCDEF\" + ' ' + \"abcde\" + 'f';\n", + " // alphabet contains \"ABCDEF abcdef\" and is of type std::string" + ] + }, + { + "cell_type": "markdown", + "id": "8555e2c9", + "metadata": {}, + "source": [ + "Here `alphabet` has type `std::string`, and the concatenation is usually performed at run-time (use `constexpr` to make it happen at compile-time).\n", + "\n", + "A 
`std::string` provides direct access to its underlying array-of-`char` representation through two member functions: `c_str()` and `data()`. The difference between the two is that `c_str()` returns a **read-only** (`const char *`) pointer to an NTMBS (see Chapter 1), while `data()` returns a **writable** (`char *`) pointer to the same (pre-C++11 did not guarantee the null terminator to be present for `data()`). Where you have the choice, use `c_str()` as it is available for `const std::string` objects (itself being a `const` member function).\n", + "\n", + "**Experiment:**\n", + "\n", + "* Modify `string_to_uppercase()` to use `data()` inside a regular for-loop to do its work. Hint: continue to use a loop index, the syntax may surprise you.\n", + "\n", + "* Now modify this program to use pointer arithmetic instead of a loop index.\n", + "\n", + "* Modify this program again to use `begin()` and `end()`.\n", + "\n", + "## String views\n", + "\n", + "There is a fourth string-like type (besides literal string, built-in array of `char` and `std::string`) called `std::string_view`, which provides a \"half-way house\" between a fully-fledged string type and raw array access. Typically it is implemented with only two fields (pointer and length); its main advantage over `std::string` is that it can be constructed and passed around more cheaply in many cases.\n", + "\n", + "The `std::string_view` type only provides a subset of the features provided by `std::string`, in particular it does **not** support either in-place modification or concatenation. It also does **not** \"own\" the resource it refers to, therefore care must be taken to ensure that a `std::string_view` object does not outlive the entity from which it was constructed (usually a `std::string` or `const char *`; construction from a string literal is always safe). 
It is safe when used as a function parameter (as an alternative to `const std::string&` or `const char *`), and is sometimes safe as a return type (instead of `const char *`). Finally, it does not own or include a null terminator, unless the entity from which it is constructed has one; this behavior is useful in cases where a sliding textual \"window\" over a larger string entity is needed." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "74051ab6", + "metadata": {}, + "outputs": [], + "source": [ + "string_view v1{ \"Elrond\" }; // string_view constructed from const char *\n", + "string_view v2{ \"Arwen\"s }; // Error! compiles, but v2 dangles: the temporary std::string is destroyed immediately\n", + "auto v3{ \"Galadriel\"sv }; // std::string_view literal\n", + "\n", + "cout << v1 << '\\n'; // outputs \"Elrond\"\n", + "cout << v2[0]; // undefined behavior: v2 refers to a destroyed std::string\n", + "cout << v3.data() << '\\n'; // Possibly unsafe, no guarantee of terminating '\\0'\n", + "v3[0] = 'C'; // Error! no write access\n", + "v3.data()[0] = 'C'; // Error! no write access\n", + "auto elves = v1 + v3; // Error! operator+ (concatenation) not supported" + ] + }, + { + "cell_type": "markdown", + "id": "3902dbff", + "metadata": {}, + "source": [ + "The following program demonstrates a function called `print_reversed()` with a `std::string_view` as a function parameter, called with each of: a pointer, a string literal, a `char`-array, a `std::string` and a `std::string_view`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "101ec7e5", + "metadata": {}, + "outputs": [], + "source": [ + "// 07-reversed.cpp : output different \"string\" types using a string_view\n", + "\n", + "#include <iostream>\n", + "#include <string>\n", + "#include <string_view>\n", + "using namespace std;\n", + "\n", + "void print_reversed(string_view sv) {\n", + " for (auto iter = crbegin(sv); iter != crend(sv); ++iter) {\n", + " cout << *iter;\n", + " }\n", + " cout << '\\n';\n", + "}\n", + "\n", + "int main() {\n", + " const char *s1 = \"Elf\";\n", + " char s2[] = \"Dwarf\";\n", + " string s3 = \"Hobbit\"s;\n", + " string_view s4 = \"Orc\"sv;\n", + "\n", + " print_reversed(s1);\n", + " print_reversed(s2);\n", + " print_reversed(s3);\n", + " print_reversed(s4);\n", + " print_reversed(\"Man\");\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "4957b2dc", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* The function `print_reversed()` uses a constant reverse iterator to loop over its input. If you're not familiar with this idiom, you may wish to review program `05-begin-end.cpp` and the table below it from Chapter 5.\n", + "\n", + "* The iterator variable `iter` is dereferenced as `*iter` to obtain a single character for printing.\n", + "\n", + "* The function `print_reversed()` is called from `main()` repeatedly but with differently typed arguments, all of which are resolved at compile-time and are *implicitly convertible* to `std::string_view`.\n", + "\n", + "**Experiment**\n", + "\n", + "* Change the type of `print_reversed()`'s parameter to `const string&`. Does the program compile?\n", + "\n", + "* What about if you use a `const char[]`? Hint: remember array decay.\n", + "\n", + "* Now modify this function so that it **returns** its input reversed. 
What parameter type and return type would you choose?\n", + "\n", + "## Vectors and iterators\n", + "\n", + "A key concept of C++ is that the elements of the Standard Library container types, of which `std::vector` is one, are meant to be manipulated using *iterators*. (We have seen the `std::string` member functions `insert()` and `erase()` being used with indices, however these can use iterators instead.) An iterator is a *pointer-like object* that when dereferenced, yields exactly one object from within a container; thus the `begin()` and `end()` family of functions should each be thought of as returning an iterator, rather than a pointer.\n", + "\n", + "The following program populates a `std::vector` of integers from user input, and then outputs it in numerically sorted order." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f360942d", + "metadata": {}, + "outputs": [], + "source": [ + "// 07-vector.cpp : read integers from user, sort them and then output\n", + "\n", + "#include <algorithm>\n", + "#include <iostream>\n", + "#include <iterator>\n", + "#include <vector>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " vector<int> v;\n", + " for (;;) {\n", + " cout << \"Please enter a number (99 to quit): \";\n", + " int i{};\n", + " cin >> i;\n", + " if (i == 99) {\n", + " break;\n", + " }\n", + " v.push_back(i);\n", + " }\n", + "\n", + " sort(begin(v), end(v));\n", + " copy(begin(v), end(v), ostream_iterator<int>(cout, \" \"));\n", + " cout << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "e5eeb63b", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* We start with an empty `std::vector`, this means that the `int` element type **must** be specified (within angle brackets) as it cannot be otherwise deduced. 
This type is fixed at compile-time.\n", + "\n", + "* A \"forever\" loop with local variable `i` is used to avoid the need for either multiple `cout` statements (which would be code duplication) or a do-while loop (which would most likely cause `99` to be appended to the `vector`, which we don't want).\n", + "\n", + "* The `push_back()` member function of `vector` is used to make `i` the new last element; this \"grows\" the container automatically as needed.\n", + "\n", + "* The Standard Library *algorithm* `std::sort()` gets all of the information about the `vector` that it needs in order to operate from the two iterators provided as parameters. (It can be relied upon to be an efficient algorithm, probably performing better than hand-written code; there is no need or advantage of using C's `qsort()`.)\n", + "\n", + "* Instead of a traditional or range-for loop, a second algorithm `std::copy()` is used. As might be guessed this copies everything from the first iterator up to, but not including, the second iterator to its third parameter, which is actually an *output iterator*. There is no \"magic\" involved, all you need to understand is that a `std::ostream_iterator` *object* takes the single type of its output between angle brackets (here it is `int`) and the output stream and optional delimiter are specified as parameters. (This is boilerplate code that can be reused in your own programs, possibly with different types and delimiters.)\n", + "\n", + "**Experiment:**\n", + "\n", + "* Type `99` as the first input value. Does the program crash? Does this surprise you? Check that `v` is `empty()`.\n", + "\n", + "* Modify the program to print out the highest numbers first. Hint: use `crbegin()` and `crend()`.\n", + "\n", + "* Change to using a range-for loop instead of `std::copy()` to output the `vector`. Hint: use `const auto&`.\n", + "\n", + "* Use **member** functions `begin()` and `end()` in the call to `std::sort()`. Does the program compile to the same thing? 
Which style do you prefer?\n", + "\n", + "* Rewrite the second `for`-loop using an index variable and subscript access. Do you still prefer this form?\n", + "\n", + "There are many member functions belonging to `std::vector` and the other standard containers, and even experienced C++ programmers don't remember them all. There are also many (over 100) function templates (algorithms) which operate with the standard containers through iterators; where there is a choice between the two, the member function should be used as this will be specialized for the container type (thus potentially more efficient). There is almost never a need to write a mini-algorithm which operates within a loop over the elements of a container, as would be needed in C; they have already been implemented in the Standard Library ready for you to use.\n", + "\n", + "When you reach for a container, `std::vector` is often the best fit, and should be your natural first choice. Should you decide that one of the other container types is needed, this would usually be a design decision made early in the development of your program. There is uniformity in the naming of the member functions, so all containers support `clear()`, for example. However as soon as you delve into the implementation details, such similarity appears superficial. It is important to have a basic understanding of the implementation of each container such that their individual advantages and limitations are understood, in order for the correct one to be chosen and used effectively.\n", + "\n", + "As an example, consider the use of `std::find()` versus member function `find()` when using `std::string`, `std::vector` and `std::set`; this function finds the first occurrence of its parameter in the specified container. The `std::set` container is similar to `std::vector` except that it maintains its elements in sorted order. 
The differences are:\n", + "\n", + "* For `std::string`, member function `find()` returns an **index** having performed a *linear search* (checking each character element in turn).\n", + "\n", + "* For `std::vector`, algorithm `std::find()` returns an **iterator**, again performing a linear search.\n", + "\n", + "* For `std::set`, member function `find()` returns an **iterator** having performed a *binary search* (repeatedly dividing the range of values in half). This is quicker than using `std::find()`.\n", + "\n", + "For `std::string`s if the search criterion is not found, special value `std::string::npos` is returned (\"no position\"), whilst the iterator which indicates not found is `end()` (**not** `nullptr` or zero). The following program demonstrates this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5034d573", + "metadata": {}, + "outputs": [], + "source": [ + "// 07-find.cpp : find and erase a single element\n", + "\n", + "#include <algorithm>\n", + "#include <iostream>\n", + "#include <iterator>\n", + "#include <set>\n", + "#include <string>\n", + "#include <vector>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " string a{ \"hello\" };\n", + " vector<int> v{ 1, 9, 7, 3 };\n", + " set<int> s{ 3, 8, 6, 4, 3 };\n", + " \n", + " cout << \"Before:\\nstring: \" << a << \"\\nvector: \";\n", + " copy(begin(v), end(v), ostream_iterator<int>(cout, \" \")); \n", + " cout << \"\\nset: \";\n", + " copy(begin(s), end(s), ostream_iterator<int>(cout, \" \"));\n", + " cout << '\\n';\n", + "\n", + " auto f1 = a.find('l');\n", + " if (f1 != string::npos) {\n", + " cout << \"Found in string at position: \" << f1 << '\\n';\n", + " a.erase(f1, 1);\n", + " }\n", + " auto f2 = find(begin(v), end(v), 7);\n", + " if (f2 != end(v)) {\n", + " cout << \"Found in vector: \" << *f2 << '\\n';\n", + " v.erase(f2);\n", + " }\n", + " auto f3 = s.find(6);\n", + " if (f3 != end(s)) {\n", + " cout << \"Found in set: \" << *f3 << '\\n';\n", + " s.erase(f3);\n", + " }\n", + "\n", + " cout << \"After:\\nstring: \" << a << 
\"\\nvector: \";\n", + " copy(begin(v), end(v), ostream_iterator<int>(cout, \" \")); \n", + " cout << \"\\nset: \";\n", + " copy(begin(s), end(s), ostream_iterator<int>(cout, \" \"));\n", + " cout << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "d894a4bb", + "metadata": {}, + "source": [ + "The output from running this program is:\n", + "\n", + "```\n", + "Before:\n", + "string: hello\n", + "vector: 1 9 7 3\n", + "set: 3 4 6 8\n", + "Found in string at position: 2\n", + "Found in vector: 7\n", + "Found in set: 6\n", + "After:\n", + "string: helo\n", + "vector: 1 9 3\n", + "set: 3 4 8\n", + "```\n", + "\n", + "Take time to study this program as it contains some important concepts:\n", + "\n", + "* The `main()` program consists of four parts, the second and fourth of which are near-identical and simply print out all the containers before and after modification.\n", + "\n", + "* The first part assigns a `std::string` from a string literal, and a `std::vector` and `std::set` from two different initializer lists. Note that `std::set` can only hold unique values, so the container begins with a size of four, not five as for the initializer list (because of the duplicated value `3`).\n", + "\n", + "* The interesting part of the program is the third part, itself split into three. The logic is the same, search for an element value with the correct form of `find()`, compare it against the \"not found\" type for the specific container, and if found then erase it. The form of `erase()` used for `std::string` needs a length for the second parameter, while for `std::vector` and `std::set` the form used takes an iterator as the single element to erase.\n", + "\n", + "**Experiment**\n", + "\n", + "* Sort the `std::vector` and use a binary search instead of a linear one. Hint: use `std::lower_bound()` not `std::binary_search()`.\n", + "\n", + "* Experiment with adding values to the containers, at the beginning, in the middle, and at the end. 
Use a mixture of member function `push_back()` (where possible) and member function `insert()` or `std::insert()` where applicable.\n", + "\n", + "## Spans and arrays\n", + "\n", + "It can be very inefficient to copy `std::vector`s by value, as copies of both the `vector` object, and also the array it manages, need to be made. The preferred way to pass a `vector` to a function is to use a reference, or a `const` reference in cases where the original `vector` should not be modified.\n", + "\n", + "**Experiment**\n", + "\n", + "* Write a function called `populate_int()` which takes a `vector` as its parameter and implements the logic of the `for`-loop in `07-vector.cpp`. Call this function from `main()` instead of using a `for`-loop.\n", + "\n", + "* Now use `double` instead of `int` in the program. How many code changes are needed?\n", + "\n", + "In the style of `std::string_view` being implicitly constructible from character sequences, there exists the type `std::span` which provides a similar function for array-style containers (those which hold their elements contiguously in memory). 
The following program contains a function `print_ints()` whose parameter is of type `std::span` and which iterates over this sequence:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "85fb9835", + "metadata": {}, + "outputs": [], + "source": [ + "// 07-span.cpp : convert different container types to span and print them out\n", + "\n", + "#include \n", + "#include \n", + "#include \n", + "#include \n", + "#include \n", + "using namespace std;\n", + "\n", + "void print_ints(span s) {\n", + " for (auto sep{ \"\"sv }; auto& e : s) {\n", + " cout << sep << e;\n", + " sep = \", \"sv;\n", + " }\n", + " cout << '\\n';\n", + "}\n", + "\n", + "int main() {\n", + " int c_array[] = { 1, 2, 3 };\n", + " vector vec = { 2, 6, 4, 3 };\n", + " array std_array = { 7, 6, 5 };\n", + "\n", + " print_ints(c_array);\n", + " print_ints(vec);\n", + " print_ints(std_array);\n", + " // print_ints({ 9, 8, 7, 6 }); // Error: does not compile\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "0869e70c", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* A range-for loop with an initializer field prints out the values of the `std::span` parameter, outputting a separator in-between, but not after, the elements. The trick of reassigning the variable `sep` gets around this limitation of using `std::copy()`.\n", + "\n", + "* The three array-like types are initialized in `main()`. The size of type `std::array` is fixed at compile-time (from its optional second template parameter) and this allows it to be allocated on the stack, not using any heap memory (as for a built-in array). (Due to the fact that `begin()` and `end()` can be used with built-in arrays there are not very many cases where `std::array` is more useful.)\n", + "\n", + "* The commented-out call to `print_ints()` doesn't compile as there is no valid conversion from `std::initializer_list` to `std::span`. 
This is a possible use case for a temporary `std::array`, as in: `print_ints(array{ 9, 8, 7 ,6 });`\n", + "\n", + "Unlike `std::string_view`, `std::span` can modify its elements, even though it does not \"own\" them. Also, a second form of `std::span` takes its size parameter after the type, which is also fixed at compile-time.\n", + "\n", + "**Experiment**\n", + "\n", + "* Remove the size field (`4`) from the definition of `std_array`. What is the inferred size of this `std::array` now?\n", + "\n", + "* Perform a sort within `print_ints()` before outputting.\n", + "\n", + "* Now output the containers in `main()` after calling `print_ints()`, without calling it again. Have the elements of these changed order?\n", + "\n", + "## Ordered and unordered sets\n", + "\n", + "A `std::set` holds its contents in sorted order at all times, thus it is called an *ordered container*. Occasionally this is desirable, however there are space and time costs to this convenience so before using this container type you should consider whether a `std::vector`, which can be (manually) sorted when required, is a better solution. Array access (using `[]`) is not supported for `std::set`; this may be a deciding factor as to its suitability. Ordered containers require that `operator<` (less-than) is defined when using them to hold user-defined types (other ordering criteria can be specified, if desired).\n", + "\n", + "A feature of `std::set` is that it cannot hold duplicate values; inserting a previously held value does not alter the container, while an initializer list containing duplicates is shortened (and sorted) immediately. 
(The type `std::multiset` does allow duplicate values.)\n", + "\n", + "The following program defines a `std::set` with value type `std::string`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "44ccf988", + "metadata": {}, + "outputs": [], + "source": [ + "// 07-set.cpp : demonstrate automatic ordering of a set\n", + "\n", + "#include <set>\n", + "#include <string>\n", + "#include <algorithm>\n", + "#include <iterator>\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " set<string> s{\n", + " \"Stroustrup, Bjarne\",\n", + " \"Yukihiro, Matsumoto\",\n", + " \"Wall, Larry\",\n", + " \"Eich, Brendan\"\n", + " };\n", + "\n", + " s.insert(\"Lerdorf, Rasmus\");\n", + " copy(begin(s), end(s), ostream_iterator<string>(cout, \"\\n\"));\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "1a23e90a", + "metadata": {}, + "source": [ + "**Experiment**\n", + "\n", + "* Modify this program so that further names can be added with user input. Is the sorted order preserved?\n", + "\n", + "* Change the container type to `std::multiset`. Does the program compile and run? What happens if you (deliberately) enter a duplicate name?\n", + "\n", + "* The correct ordering depends on the rule of surname first with capitalized first letter. Remove this second restriction by storing all names in lower-case, capitalizing the first letter for output. Test with name: \"van Rossum, Guido\".\n", + "\n", + "Lookup for `std::set` is faster than linear searching (logarithmic rather than linear time) because its elements are always kept sorted. There is also the container type `std::unordered_set` which offers constant-time lookup on average by making use of a *hash function*. (To complete the quartet, there exists `std::unordered_multiset`.)\n", + "\n", + "**Experiment**\n", + "\n", + "* Modify the original `07-set.cpp` to use `std::unordered_set` as the only change.\n", + "\n", + "* Now add the ability to add subsequent entries from user input, and print out the whole collection on each change. 
Does anything surprise you?\n", + "\n", + "In fact, due to the way that the *unordered containers* are implemented, removal or addition of even a single element can change the whole apparent \"order\" of the elements. Note that a hash function needs to be provided for user-defined types stored in an unordered container (this is achieved by providing a *specialization* of `std::hash`).\n", + "\n", + "## Lists and forward-lists\n", + "\n", + "Some operations can be inefficient with `std::vector` because of the way it is implemented by the library; operations such as `insert()` and `erase()` can involve the movement of much of the data stored in memory. (In fact this is unavoidable since the Standard dictates that the elements of a `std::vector` are stored contiguously in memory.) Other operations such as `push_front()` are not implemented at all, for the same reason. (Using a `std::deque`, as in \"double-ended queue\", instead would resolve this particular limitation.)\n", + "\n", + "The implementation of `std::list` is fairly straightforward; each element is stored in its own block of assigned memory, together with two pointers; one pointer to the previous element and one pointer to the next element. This does mean that element insertion and deletion can be much quicker than for `std::vector`, however more memory is used by this container in total (the difference is the size of two pointers times number of elements, approximately). Lists of \"large\" objects become more efficient than lists of \"small\" ones, and as for `std::vector` all elements must be of the same type and size. 
It follows that the implementation of `std::forward_list` is similar but with only one pointer in each block, pointing to the next element.\n", + "\n", + "Some of the operations that `std::vector` supports, such as indexing using subscript syntax (`[]`) and `std::sort()`, are not supported at all by `std::list`, either because performance would be unacceptably poor or because the algorithm requires a *random-access iterator*. In fact, `std::list` implements its own member function `sort()` which performs a *stable sort* in-place. The iterator type which works with `std::list` is called a *bi-directional iterator*, meaning that pointer arithmetic-style operations on iterators cannot work. The iterator type for `std::forward_list` is called a *forward iterator*.\n", + "\n", + "The following program demonstrates both `std::forward_list` and `std::list` being used, although it is not intended to be an example of best practice:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "74a77ce5", + "metadata": {}, + "outputs": [], + "source": [ + "// 07-lists.cpp : forward and bi-directional lists\n", + "\n", + "#include <forward_list>\n", + "#include <list>\n", + "#include <string>\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " forward_list<string> fwd;\n", + " auto iter = fwd.before_begin(); // note: member function\n", + " cout << \"Please enter some words (blank line to end):\\n\";\n", + " for (;;) {\n", + " string s;\n", + " getline(cin, s);\n", + " if (s.empty()) {\n", + " break;\n", + " }\n", + " fwd.insert_after(iter, s); // note: member function\n", + " ++iter; // note: must \"keep up\"\n", + " }\n", + "\n", + " list lst(begin(fwd), end(fwd)); // copy fwd into lst\n", + " lst.sort();\n", + " for (const auto& e : lst) {\n", + " cout << \"- \" << e << '\\n';\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "6d35fe55", + "metadata": {}, + "source": [ + "A few new things about this program:\n", + "\n", + "* There are two lists, a `std::forward_list` 
called `fwd` and a `std::list` called `lst`.\n", + "\n", + "* The variable `iter` is initialized by member function `before_begin()`, this is unique to `std::forward_list`.\n", + "\n", + "* In the first loop the `std::string` in variable `s` is inserted after `iter`'s position using member function `insert_after()`.\n", + "\n", + "* The variable `iter` is then incremented, still in the first loop.\n", + "\n", + "* The empty `std::list` is initialized from the `begin()` and `end()` of `fwd`. This is the standard way of making a copy of a container.\n", + "\n", + "* Member function `sort()` is called on `lst`, which is then printed out.\n", + "\n", + "**Experiment**\n", + "\n", + "* Consider how to memberwise compare `fwd` and `lst`.\n", + "\n", + "* Since the input is to be sorted eventually, experiment with other ways of populating `fwd`. Hint: consider `push_front()`.\n", + "\n", + "* Now find a way to avoid the use of `fwd` altogether.\n", + "\n", + "## Ordered and unordered maps\n", + "\n", + "All of the containers seen so far have stored a number of elements of a single type. There has been no other information stored with the element, except possibly for `std::vector` where the first element *implicitly* has index `0`, the second has index `1` and so on. This index can be thought of as the *key* as it allows direct access to a single *value*.\n", + "\n", + "This can be generalized so that the key can be of any type, not just a sequence of advancing integers. In C++ all maps operate with a type called `std::pair` which as might be guessed has two fields; these are called `first` and `second`. 
We could define `std::pair` as follows (see Chapter 10 for a discussion of the `template` and `typename` keywords):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13ec6dd9", + "metadata": {}, + "outputs": [], + "source": [ + "template <typename Key, typename Value>\n", + "struct pair {\n", + " Key first;\n", + " Value second;\n", + "};" + ] + }, + { + "cell_type": "markdown", + "id": "b3ddf3e1", + "metadata": {}, + "source": [ + "However we don't need to do this as the Standard Library provides this definition in header `<utility>` (or one very similar, the exact implementation details are not important). Maps operate on collections of *key/value pairs* which are provided by this type.\n", + "\n", + "The first *associative container* we will look at is `std::map`. The following program uses a `std::map` to hold the per-weight prices of a list of fruits, which can be added to during a run of the program:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c122a9c0", + "metadata": {}, + "outputs": [], + "source": [ + "// 07-map.cpp : calculate prices from associative array of products and per-weight cost\n", + "\n", + "#include <map>\n", + "#include <string>\n", + "#include <cctype>\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " map<string, double> products{\n", + " { \"Apples\", 0.65 },\n", + " { \"Oranges\", 0.85 },\n", + " { \"Bananas\", 0.45 },\n", + " { \"Pears\", 0.50 }\n", + " };\n", + " cout.precision(2);\n", + " cout << fixed;\n", + " for (;;) {\n", + " cout << \"Please choose: Add product, Calculate price, Quit\\nEnter one of A, C, Q: \";\n", + " char opt;\n", + " cin >> opt;\n", + " opt = toupper(opt);\n", + " if (opt == 'Q') {\n", + " break;\n", + " }\n", + " else if (opt == 'A') {\n", + " cout << \"Enter product and price-per-kilo: \";\n", + " string product;\n", + " double price;\n", + " cin >> product >> price;\n", + " product.front() = toupper(product.front());\n", + " products.insert(pair{ product, price });\n", + " }\n", + " else if (opt == 
'C') {\n", + " for (const auto& p : products) {\n", + " cout << p.first << '\\t' << p.second << \"/kg\\n\";\n", + " } \n", + " cout << \"Enter product and quantity: \";\n", + " string product;\n", + " double quantity;\n", + " cin >> product >> quantity;\n", + " product.front() = toupper(product.front());\n", + " auto iter = products.find(product);\n", + " if (iter != end(products)) {\n", + " cout << \"Price: \" << iter->second * quantity << '\\n';\n", + " }\n", + " else {\n", + " cout << \"Could not find \\\"\" << product << \"\\\"\\n\";\n", + " }\n", + " }\n", + " else {\n", + " cout << \"Option not recognized.\\n\";\n", + " }\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "7bd1daa3", + "metadata": {}, + "source": [ + "This is a longer program but does not contain much that is new. A few points to note:\n", + "\n", + "* The `std::map` called `products` is initialized from a nested initializer list, and the key and value types must be specified within the angle brackets. The output of floating point numbers is fixed to two decimal places.\n", + "\n", + "* With user option `A`, member function `insert()` is called with a (temporary) `std::pair`. This is usually preferred over using array subscript syntax, while `products[product] = price` would work in most cases it is not always the most efficient method.\n", + "\n", + "* With user option `C`, all of the products are printed out by a range-for loop which iterates over `products` and outputs the `first` and `second` fields of each element. Then member function `find()` is called to obtain an iterator. 
This is compared against `end(products)` (which if equal would indicate \"not found\"), being other than this allows the **value** part as `iter->second` to be retrieved (the **key** would be available as `iter->first`).\n", + "\n", + "As explained above, use of array syntax is not used by this program when adding an entry, nor is it advisable in most cases for element lookup:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c2c54afb", + "metadata": {}, + "outputs": [], + "source": [ + "auto price_per_product1 = products[\"Apples\"]; // would work, but find() is better\n", + "auto price_per_product2 = products[\"Missing1\"]; // Warning: creates entry with key \"Missing1\" and value 0.0\n", + "auto price_per_product3 = products.at(\"Missing2\"); // Warning: might throw an exception, which would need to be caught" + ] + }, + { + "cell_type": "markdown", + "id": "22817af4", + "metadata": {}, + "source": [ + "**Experiment**\n", + "\n", + "* What happens if you try to change an existing entry with a different price-per-kilo? Can you discover a way to warn the user?\n", + "\n", + "* Use `std::unordered_map` instead of `std::map`. Does the program still compile and run? 
What is the effect of adding one or more products on the order in which they are output?\n", + "\n", + "To complete the list of associative containers, `std::multimap` and `std::unordered_multimap` allow duplicate occurrences of keys with the same or different values, thus `insert()` always succeeds.\n", + "\n", + "## Other containers and adaptors\n", + "\n", + "There are some other containers and *container adaptors* implemented in the Standard Library:\n", + "\n", + "* `std::vector<bool>` is a specialization of `std::vector` that stores binary bits as packed data\n", + "\n", + "* `std::bitset` also stores binary bits, but has its size fixed at compile-time\n", + "\n", + "* `std::deque` (pronounced \"deck\") implements a double-ended container similar to `std::vector`, but with additional operations such as `push_front()`\n", + "\n", + "* `std::stack` implements a LIFO (Last In First Out) stack\n", + "\n", + "* `std::queue` implements a FIFO (First In First Out) queue\n", + "\n", + "* `std::priority_queue` implements a queue from which the highest-priority element (by default the largest) is always retrieved first\n", + "\n", + "* `std::flat_set` implements a sorted container of unique values, typically implemented as a vector\n", + "\n", + "* `std::flat_map` implements a sorted associative container, typically implemented as two vectors (one for keys and one for values)\n", + "\n", + "A brief Tutorial such as this is not the place to delve into these, and indeed the other containers covered in this Chapter have much more detail to discover. 
As a go-to for both tutorial and reference I can highly recommend [CppReference.com](https://en.cppreference.com)[^1] and [Josuttis, \"The C++ Standard Library\"](http://cppstdlib.com)[^2].\n", + "\n", + "[^1]: https://en.cppreference.com\n", + "[^2]: http://cppstdlib.com\n", + "\n", + "*All text and program code ©2019-2025 Richard Spencer, all rights reserved.*" + ] + } + ], + "metadata": { + "jupytext": { + "cell_metadata_filter": "-all" + }, + "kernelspec": { + "display_name": "C++ 23", + "language": "c++", + "name": "cpp23" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/jupyter-notebooks/08-files-and-formatting.ipynb b/jupyter-notebooks/08-files-and-formatting.ipynb new file mode 100644 index 0000000..3e570a5 --- /dev/null +++ b/jupyter-notebooks/08-files-and-formatting.ipynb @@ -0,0 +1,918 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4f2e56bd", + "metadata": {}, + "source": [ + "# Files and Formatting\n", + "\n", + "## Formatting values and variables for output\n", + "\n", + "We have seen how values and variables can be put to output streams using `<<`, and how `print()` and `println()` can be used to output subsequent parameters using curly braces in the format string. For further control over the way these are output, such as field width, accuracy etc. we can specify this using stream manipulators (when outputting to streams) or extra information in the *format string* (when using `print()`/`println()`). 
Manipulators are covered later in this Chapter; what follows is a discussion of how to use *format specifiers* with `print()`, `println()` and `format()`/`format_to()`.\n", + "\n", + "The following program demonstrates use of format specifiers for some common types:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a6648d92", + "metadata": {}, + "outputs": [], + "source": [ + "// 08-format1.cpp : Basic usage of format string\n", + "\n", + "#include <print>\n", + "#include <string>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " string s{ \"Formatted\" };\n", + " auto d{ 10.0 / 3.0 };\n", + " auto i{ 20000 };\n", + " println(\"{0:20}:{2:8}, {1:12.11}\", s, d, i);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "18d7f292", + "metadata": {}, + "source": [ + "This program outputs the text `Formatted` followed by sufficient spaces to pad up to a width of 20 characters, then a colon present in the format string, then the value `20000` right-aligned to a width of 8 characters, then the comma and space present in the format string, and finally the value 3.3333333333 at a \"precision\" of 11 figures (plus decimal point) padded to a width of 12 characters (only padding, as opposed to truncation, is possible).\n", + "\n", + "**Experiment**:\n", + "\n", + "* Try printing the three parameters in a different order, by changing the numbers before the colon within the curly braces.\n", + "\n", + "* Is it possible to achieve the same results when removing these numbers altogether?\n", + "\n", + "* What happens if you repeat one of `s`, `d`, or `i` in the parameter list? Or take one away?\n", + "\n", + "The format string, and its associated format specifier(s), are evaluated at compile-time for maximum performance. It must therefore be a string literal, not a string-type variable (unless it is `constexpr`). 
The values of the subsequent parameters referenced by the format specifier(s) can (and probably will) change during the run of the program.\n", + "\n", + "## Format specifiers\n", + "\n", + "As well as describing the field width and precision for all of the built-in types (plus several Standard Library types), format specifiers offer fine-grained control over the output. In fact, all format specifiers are made up of eight optional parts, all of which (if used) appear in order after the colon in the format string. These are listed in the table below:\n", + "\n", + "| Field | Description | Example | Result |\n", + "|----------------|-----------------------------------------------------|---------|--------------------------|\n", + "| Fill-and-align | Optional fill character then: <, >, or ^ | {:@>10} | @@@@123456 |\n", + "| Sign | One of: +, - (default), or space | {:+} | +1.23 |\n", + "| # | Use alternate form | {:#} | 0x12a, 3.0 |\n", + "| 0 | Pad integers with leading zeros | {:06} | 000123 |\n", + "| Width | Minimum field width | {:10} | \"abc \" |\n", + "| Precision | FP-precision, maximum field width | {:.7} | 3.333333, \"Formatt\" |\n", + "| L | Use locale-specific setting | {:L} | 12,345, 1.234,56, \"faux\" |\n", + "| Type | One of: b, B, d, o, x, X, a, A, e, E, f, F, g, G, ? | {:8.7a} | 1.aaaaaabp+1 |\n", + "\n", + "It is also possible to write custom formatters which operate on arbitrary format specifiers and user-defined classes. An alternative method would be to create a public `toString()` member function in the class and simply invoke this on a parameter of this type (after the format string, which would use plain `{}`).\n", + "\n", + "The format specifiers listed above work with `print()` and `println()` as well as other functions from the `<format>` header (which include wide-character variants). 
Here is a complete list:\n", + "\n", + "| Function | Description | Parameters | Return value |\n", + "|---------------|-------------------------------------------------|--------------------------------|---------------------------------|\n", + "| `print()` | Output to `stdout`, `FILE*` or `std::ostream` | [dest, ] fmt, ... | None |\n", + "| `println()` | As for `print()` with trailing newline | [dest, ] fmt, ... | None |\n", + "| `format()` | Create a string from (wide) format string | [locale, ] fmt, ... | `std::string`, `std::wstring` |\n", + "| `format_to()` | Write to a (wide) output iterator | iter, [locale, ] fmt, ... | Output iterator past the last character written |\n", + "| `format_to_n()` | As for `format_to()` with size limit | iter, max, [locale, ] fmt, ... | Result object whose `out` member is the output iterator |\n", + "\n", + "In choosing between the above functions, the aim would be to choose the most performant for the task. The following program outputs different format strings and parameters utilizing a variety of these functions:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fda6960f", + "metadata": {}, + "outputs": [], + "source": [ + "// 08-format2.cpp : Various format string-using functions\n", + "\n", + "#include <print>\n", + "#include <format>\n", + "#include <string>\n", + "#include <iostream>\n", + "#include <iterator>\n", + "#include <array>\n", + "#include <cmath>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " string world{ \"World\" };\n", + " print(cout, \"Hello, {}!\\n\", world);\n", + " println(\"{1} or {0}\", false, true);\n", + " \n", + " constexpr const char *fmt = \"Approximation of π = {:.12g}\";\n", + " string s = format(fmt, asin(1.0) * 2);\n", + " cout << s << '\\n';\n", + " \n", + " constexpr const wchar_t *wfmt = L\"Approximation of pi = {:.12g}\";\n", + " wstring ws = format(wfmt, asin(1.0) * 2);\n", + " wcout << ws << L'\\n';\n", + " \n", + " format_to(ostream_iterator<char>(cout), \"Hello, {}!\\n\", world);\n", + " wstring ww{ L\"World\" };\n", + " array<wchar_t, 9> wa; // size is an assumption: 8 characters plus terminator\n", + " auto iter 
= format_to_n(wa.begin(), 8, L\"Hello, {}!\\n\", ww);\n", + " *(iter.out) = L'\\0';\n", + " wcout << wa.data() << L'\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "58726fa4", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* The use of `print()` is straightforward and simply outputs `Hello, World!` on a single line, using the variant that prints to a `std::ostream`, in this case `cout`.\n", + "\n", + "* The call to `println()` reverses the order of its subsequent parameters and outputs them textually: `true or false`. You should be aware that this prints to the C standard output (`stdout`); mixing C++ stream and C output can sometimes cause buffering issues.\n", + "\n", + "* The uses of `format()`, firstly with an 8-bit, and secondly with a wide-character format string, create a temporary (wide-)string and then put this to the (wide-)character output stream.\n", + "\n", + "* The function `format_to()` is called with `ostream_iterator<char>(cout)` which is boilerplate for creating a suitable output iterator from a stream object.\n", + "\n", + "* The use of `format_to_n()` is more involved as it uses a fixed size `std::array` to hold the wide-character output string. The first parameter is the (writable) iterator pointing to the start of the array, and the second is the maximum number of characters to write. The return value has an `out` data member which is the iterator pointing to the next character in the array, which needs to be set to zero in order to allow putting (`std::array`'s, not `std::string`'s) `data()` to `wcout`.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Modify this program to use different field widths. 
Do they work with wide characters?\n", + "\n", + "* Try some of the different format specifiers from the table above, together with different built-in types such as `long long` and `double`.\n", + "\n", + "## Simple file access\n", + "\n", + "All of the programs we have seen so far lose their internal state, together with any user input, when they exit. A program which can save and/or restore its state makes use of *persistence*. The way this is usually achieved, of course, is to enable saving to and loading from a disk file, stored on a hard-drive, memory card or network server.\n", + "\n", + "C++ file access using the Standard Library header `<fstream>` is designed to be analogous to use of `cin` and `cout`, using the stream extraction (`>>`) and insertion (`<<`) operators. File access using the C Library's `<cstdio>` header is also possible, and a suitable `FILE *` pointer can be passed as the first parameter to `print()` and `println()` to switch output to that file.\n", + "\n", + "The following program reads from a previously created file and echoes the content to the console. (The filename is provided at run-time as the first command-line argument after the executable name.) This program is only safe to use with text files, so fire up your favorite editor and create a test file to use, including some whitespace such as spaces, tabs and newlines."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aca5f2e1", + "metadata": {}, + "outputs": [], + "source": [ + "// 08-file1.cpp : echo disk file to console\n", + "\n", + "#include <iostream>\n", + "#include <fstream>\n", + "using namespace std;\n", + "\n", + "int main(int argc, const char *argv[]) {\n", + " if (argc != 2) {\n", + " cerr << \"Syntax: \" << argv[0] << \" \\n\";\n", + " return 1;\n", + " }\n", + " ifstream infile{ argv[1] };\n", + " \n", + " int c = infile.get();\n", + " while (c != ifstream::traits_type::eof()) {\n", + " cout << static_cast<char>(c);\n", + " c = infile.get();\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "d8e87e50", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* A sanity check is made on the number of command-line arguments in order that executing the program without a filename argument won't cause a null-pointer dereference. The value in `argv[0]` is the name the program was executed as.\n", + "\n", + "* A `std::ifstream` object called `infile`, which encapsulates the functionality of an input file stream as a class, is created by providing a filename as its constructor argument.\n", + "\n", + "* An explicit call to close the input file is not needed; this happens automatically when `infile` goes out of scope.\n", + "\n", + "* The only parts of this class we use are the member function `get()`, which confusingly returns an `int`, not a `char` as you might expect, and `ifstream::traits_type::eof()`. The `int` returned by `get()` can be any of the valid range of `char` (usually 0 to 255, or -128 to 127 if `char` is signed) plus a special marker value outside this range to indicate that the *end-of-file* has been reached and no more characters can be read. (If the double-double-colon syntax confuses you don't worry, this boilerplate can be used without a detailed knowledge of the makeup of the stream classes. 
Using it is better style than relying on C's `EOF` macro from `<cstdio>`.)\n", + "\n", + "* The `while`-loop body uses a cast to convert the variable `c` from an `int` to a `char` in order to ensure that it is output as a character and not as a number.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Try removing `static_cast<char>` and see what happens. Consider whether this could ever be desirable.\n", + "\n", + "* What happens if you change the same line to `cout.put(c);`?\n", + "\n", + "* Rewrite the loop to be a `for`-loop. Can you remove the need for any statements in the loop body?\n", + "\n", + "The above program can be modified to no longer need the check against `ifstream::traits_type::eof()`. This involves use of a `char` variable and the stream extraction operator `>>`, as shown in the following program:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eab947d0", + "metadata": {}, + "outputs": [], + "source": [ + "// 08-file2.cpp : echo disk file to console\n", + "\n", + "#include <iostream>\n", + "#include <fstream>\n", + "using namespace std;\n", + "\n", + "int main(int argc, const char *argv[]) {\n", + " if (argc != 2) {\n", + " cerr << \"Syntax: \" << argv[0] << \" <filename>\\n\";\n", + " return 1;\n", + " }\n", + " ifstream infile{ argv[1] };\n", + "\n", + " char c;\n", + " while (!infile.eof()) {\n", + " infile >> noskipws >> c;\n", + " if (infile.fail()) {\n", + " break; // the read failed, so there is no new character to output\n", + " }\n", + " cout << c;\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "3083b394", + "metadata": {}, + "source": [ + "A few differences to note about this program:\n", + "\n", + "* There is no read operation before the loop body.\n", + "\n", + "* The member function `eof()` is used to check for end-of-file; this returns a boolean.\n", + "\n", + "* The member function `fail()` is checked after each read, because `eof()` only becomes `true` once a read has been attempted past the end of the file; without this check the last character would be output twice.\n", + "\n", + "* A *stream manipulator* called `noskipws` is used to prevent skipping of whitespace (such as tabs) from the input file.\n", + "\n", + "* No cast is needed to output `c`.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Remove the stream manipulator and one of the `>>`'s. 
What do you notice when the input file contains spaces, tabs, etc.?\n", + "\n", + "* Add the standalone statement line `infile >> noskipws;` before the `while`-loop, and use plain `infile >> c;` within it. What do you notice now? (The entity `noskipws` is actually a *manipulator* which modifies the stream it is put to.)\n", + "\n", + "* Rewrite the loop as a `for`-loop. Can you again remove the need for any statements in the body?\n", + "\n", + "## Files as streams\n", + "\n", + "The member functions `get()` and `put()` are adequate for simple character access to C++ streams but are not easily extensible. (Think of the complexity involved in reading a `std::string` or a `double` using only these member functions.) When reading input files, the stream extraction operator is overloaded for all of the built-in types, as well as `std::string`. Similarly, the stream insertion operator is overloaded for files being written to, and works identically to the use of `cout` and `cerr` we are familiar with. We will see that you can write your own custom input and output overloads fairly easily, too.\n", + "\n", + "Saving the state of a program (possibly in binary format) is sometimes called *serialization*, while loading it back is called *deserialization*. Of course, there are no guarantees that the same platform is being used to load the previously serialized state back in, so considerations such as *endian-ness* (big versus little) and *address width* (32 versus 64 bit) can come into play. One way around this issue is to use only a plain-text representation, and in our example programs we will be using text files exclusively.\n", + "\n", + "The following program is our calculator program from earlier, modified to read calculations to be performed from a text file. These are read one by one, and the results are output. When all of the input file has been read, the program exits. 
The functionality which has been seen before is contained in the `calc()` function, with the addition of support for exponentiation:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "989490e0", + "metadata": {}, + "outputs": [], + "source": [ + "// 08-calc.cpp : read from a file and perform calculations\n", + "\n", + "#include <iostream>\n", + "#include <fstream>\n", + "#include <cmath>\n", + "using namespace std;\n", + "\n", + "double calc(char op, double x, double y) {\n", + " double r{};\n", + " switch (op) {\n", + " case '+':\n", + " r = x + y;\n", + " break;\n", + " case '-':\n", + " r = x - y;\n", + " break;\n", + " case '*':\n", + " r = x * y;\n", + " break;\n", + " case '/':\n", + " if (y) {\n", + " r = x / y;\n", + " }\n", + " else {\n", + " cerr << \"Error: divide by zero.\\n\";\n", + " }\n", + " break;\n", + " case '^':\n", + " r = pow(x, y);\n", + " break;\n", + " default:\n", + " cerr << \"Error: invalid op.\\n\";\n", + " }\n", + " return r;\n", + "}\n", + "\n", + "int main(int argc, const char *argv[]) {\n", + " if (argc != 2) {\n", + " cerr << \"Syntax: \" << argv[0] << \" <filename>\\n\";\n", + " return 1;\n", + " }\n", + " ifstream infile{argv[1]};\n", + "\n", + " while (!infile.eof()) {\n", + " double x, y;\n", + " char op;\n", + " infile >> x >> op >> y;\n", + " if (infile.fail() || infile.bad()) {\n", + " cerr << \"Error in input.\\n\";\n", + " break;\n", + " }\n", + " auto r = calc(op, x, y);\n", + " cout << x << ' ' << op << ' ' << y << \" = \" << r << '\\n';\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "ad7695ab", + "metadata": {}, + "source": [ + "**Experiment:**\n", + "\n", + "* Does it matter if the last line of the input file is blank? Follow the logic of the main loop of the program and try to understand why this is.\n", + "\n", + "* Put all of the calculations on one line, with a space between each one (instead of a newline). Does the program still produce the same output? 
Why do you think this is?\n", + "\n", + "* Modify the program to write its output to a `std::ofstream` called `outfile`, initialized with `argv[2]`. (Remember to check for `argc` being exactly three.) What happens if there is an error in the input file?\n", + "\n", + "It is also possible to use the stream extraction operator to read a `std::string`. The following program demonstrates this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "47e920ce", + "metadata": {}, + "outputs": [], + "source": [ + "// 08-string.cpp : read a string using the stream extraction operator\n", + "\n", + "#include <iostream>\n", + "#include <string>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " cout << \"Please enter your first name:\\n\";\n", + " string name;\n", + " cin >> name;\n", + " cout << \"Hi, \\'\" << name << \"\\', nice to meet you!\\n\";\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "efb800cb", + "metadata": {}, + "source": [ + "**Experiment:**\n", + "\n", + "* What happens if your input contains whitespace at the beginning or end?\n", + "\n", + "* What happens if your input contains whitespace in the middle (e.g. you enter your full name)?\n", + "\n", + "* Modify the program to accept a first name and surname, in that order. Can you see any limitations with that approach?\n", + "\n", + "You may be tempted to use `noskipws` to help you enter a line of text containing whitespace; however, this only includes preceding whitespace in the input string. The requirement of getting a line of input, possibly containing whitespace, is a common one, and we'll cover this next.\n", + "\n", + "## Lines and buffers\n", + "\n", + "So far we have seen byte (8-bit character) raw input as well as formatted input. However, programs (especially interactive ones) often get their input line-by-line. 
Lines of input can often be evaluated for errors and processed more reliably than relying on the stream extraction operator and repeatedly checking against `fail()` and `bad()`. In the case of a line of input being found to be invalid, the program can prompt the user to try again.\n", + "\n", + "The following program uses the `getline()` **member** function to obtain a line of input from the console. This function takes two parameters: the address of a C-style array, and its size in bytes. Care must be taken to provide both a valid array address and correct length. The line of text **can** include spaces and is stored in the array **without** a newline and **with** a trailing zero-byte character." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f8cd9345", + "metadata": {}, + "outputs": [], + "source": [ + "// 08-line1.cpp : obtain a line of input from the user and display it\n", + "\n", + "#include <iostream>\n", + "#include <iterator> // for size()\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " cout << \"Please enter your full name:\\n\";\n", + " char line[32];\n", + " cin.getline(line, size(line));\n", + " cout << \"You entered: \\'\" << line << \"\\'\\n\";\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "c15648cd", + "metadata": {}, + "source": [ + "**Experiment:**\n", + "\n", + "* Try entering a blank line (just press Enter); does the program cope well with this? How does this differ from program `08-string.cpp`?\n", + "\n", + "* Now try reducing the size of the array to something less than the text you will type in; how well does the program cope?\n", + "\n", + "* Try checking `cin.fail()` after typing in something bigger than the size of the array.\n", + "\n", + "As can be found from experimentation, any characters that do not fit into the C-style array remain unprocessed in the input buffer; also the *fail-bit* is set in the input stream's flags, meaning any further calls to `getline()` will return an empty string. 
The stream fail-bit for `cin` can be unset with `cin.clear()`, after which the unprocessed characters can be read with further call(s) to `getline()`. Optionally, the `ignore()` member function can be used to skip one or more input characters.\n", + "\n", + "There is also a **non-member** function which we met in Chapter 7, perhaps confusingly also named `getline()`, which reads directly from an input stream object into a `std::string`. There is no restriction on the length of the input which can be stored in the `std::string`, and the input ends with a newline (which is not stored). The following program demonstrates the use of this function, with minimal changes from the previous one:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fd0c1299", + "metadata": {}, + "outputs": [], + "source": [ + "// 08-line2.cpp : obtain a line of input from the user, store it in a string variable and display it\n", + "\n", + "#include <iostream>\n", + "#include <string>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " cout << \"Please enter your full name:\\n\";\n", + " string s;\n", + " getline(cin, s);\n", + " cout << \"You entered: \\'\" << s << \"\\'\\n\";\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "fdc931ca", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* The `std::string` variable `s` grows to the length of the input from non-member `getline()` dynamically, limited only by available memory.\n", + "\n", + "* Input is terminated by entering a newline, which is discarded (not appended to `s`).\n", + "\n", + "* This `getline()` function can work in this way as it takes both of its parameters by reference, giving it the ability to modify them.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Modify the program (taking parts from others seen recently) to copy a text-file line by line from one disk file into another, so that they are identical. 
Compare the input and output files with a binary editor, checksum, or similar.\n", + "\n", + "* Try the same program on a binary file (such as an executable). Are the input and output files identical? Hint: under Windows it is necessary to open both input and output files in binary mode; this is achieved by providing `ios_base::binary` as the second parameter to both `std::ifstream` and `std::ofstream` constructors.\n", + "\n", + "The non-member `getline()` has a third parameter, which is the character value which terminates input (this defaults to `'\\n'`). This is less commonly used, but setting it to `'\\0'` can enable reading of an entire text-file at once (so long as it does not contain any NUL-byte characters). The following program demonstrates this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e112f87f", + "metadata": {}, + "outputs": [], + "source": [ + "// 08-line3.cpp : read a text file into a string variable and display it\n", + "\n", + "#include <iostream>\n", + "#include <fstream>\n", + "#include <string>\n", + "using namespace std;\n", + "\n", + "int main(int argc, const char *argv[]) {\n", + " if (argc != 2) {\n", + " cerr << \"Syntax: \" << argv[0] << \" <filename>\\n\";\n", + " return 1;\n", + " }\n", + " ifstream infile{argv[1]};\n", + "\n", + " string s;\n", + " getline(infile, s, '\\0');\n", + " cout << s;\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "b753ac9b", + "metadata": {}, + "source": [ + "**Experiment:**\n", + "\n", + "* Does the program cope with whitespace, and control characters such as BEL (`'\\a'`)?\n", + "\n", + "* Adapt this program to copy a binary input file to an output file, making sure they are identical. (Hint: Binary files can, and often do, contain NUL-byte characters, so this needs to be catered for.)\n", + "\n", + "* Modify this program to use the stream **member** functions `read()`, `gcount()` and `write()` instead of `getline()` and `cout`. 
Hint: these functions work with C-style arrays and fixed-size binary data; look up further details in an online resource or reference book.\n", + "\n", + "## String streams\n", + "\n", + "The concept of string streams is a simple one: read from or write to a `std::string` as if it were a file or stream object. There are three types of string stream:\n", + "\n", + "* *Input string streams* are created from a `std::string`, and can subsequently read values into previously defined variables using the stream extraction operator.\n", + "\n", + "* *Output string streams* are created empty, and can be written to using the stream insertion operator, with the resulting `std::string` available from its `str()` member function.\n", + "\n", + "* A third string stream type which combines both of the above sets of functionality.\n", + "\n", + "There are a number of possible advantages to using string streams as opposed to reading from or writing to files or stream objects directly:\n", + "\n", + "* Performance may be better when caching line(s) of output (or input) in memory when writing to (or reading from) a disk file or network socket.\n", + "\n", + "* Flags set (or manipulators used) on string stream objects don't affect global `cin`/`cout`/`cerr` or `std::ifstream`/`std::ofstream` formatting states.\n", + "\n", + "* For output string streams the buffer grows dynamically so there is no risk of either buffer overflow or truncation.\n", + "\n", + "* For input string streams that set the fail-bit or bad-bit, backtracking and/or error recovery may be quicker and easier.\n", + "\n", + "Output string stream functionality is encapsulated in the `std::ostringstream` class. Firstly a default constructed object is defined, which is then written to with variables or constants using the stream insertion operator in exactly the same way as for `cout`. 
When the `std::ostringstream` is complete (all values have been written to it), the underlying `std::string` object can be extracted with the `str()` member function. The following simple program demonstrates this, writing its output to `puts()` (a function from the C Standard Library which accepts a `const char *`):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "09d43227", + "metadata": {}, + "outputs": [], + "source": [ + "// 08-stringstream1.cpp : write to string stream\n", + "\n", + "#include <iostream>\n", + "#include <sstream>\n", + "#include <cstdio>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " ostringstream oss{};\n", + " oss.precision(3);\n", + " oss << fixed << 1 << '+' << 3.2 << \" = \" << 1 + 3.2;\n", + " puts(oss.str().c_str());\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "62f16cb0", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* There is (deliberately) no call to `cout`, but the syntax to write to the output string stream will be familiar.\n", + "\n", + "* The member function `precision()` and manipulator `fixed` are used to tune the formatting; these are explained further soon.\n", + "\n", + "* The *chained method call* `oss.str().c_str()` is used to avoid explicitly defining a `std::string` variable (recall that `c_str()` is the method on a `std::string` object, in this case the return value from `str()`, which returns a zero-terminated C-string).\n", + "\n", + "In the case of a function whose purpose is to build up a `std::ostringstream`, the `std::string` obtained from its `str()` member function would usually be the function's return value. Don't be tempted to write a `std::ostringstream` to `cout` directly; the correct form would be: `cout << oss.str();`.\n", + "\n", + "Input string stream functionality is encapsulated in the `std::istringstream` class. 
An object is constructed from a `std::string`, which may have been read using `getline()` (or similar) from either user input or a file. This is then processed using the stream extraction operator, reading values into previously defined variables, until the member function `eof()` returns `true`. The following program demonstrates use of an input string stream; however, it doesn't actually perform the calculation:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "15a77ac5", + "metadata": {}, + "outputs": [], + "source": [ + "// 08-stringstream2.cpp : validate input to calculator function\n", + "\n", + "#include <iostream>\n", + "#include <sstream>\n", + "#include <string>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " string s;\n", + " double a, b;\n", + " char op;\n", + " for (;;) {\n", + " cout << \"Please enter a calculation to perform (Number Operator Number):\\n\";\n", + " getline(cin, s);\n", + " if (s.empty()) {\n", + " break;\n", + " }\n", + " istringstream iss{s};\n", + " iss >> a >> op >> b;\n", + " if (iss.fail() || !iss.eof()) {\n", + " cout << \"Bad input!\\n\";\n", + " }\n", + " else {\n", + " cout << \"Input read successfully.\\n\";\n", + " }\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "480e44e8", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* Whitespace is handled as if reading from `cin` (it **is** present in `s` but is **not** read from `iss`).\n", + "\n", + "* Entering a letter instead of a number (for example) causes `fail()` to return `true`.\n", + "\n", + "* Extra input at the end of the line causes `eof()` to return `false`, which flags a \"Bad input\" error.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Write a program which creates a times-table in the form: `1 x n = ` up to `12 x n =` writing each line to a `std::ostringstream`. 
Try to find a way to line up the multiplication signs correctly, and output the resulting `std::string` to both the console and a disk-file.\n", + "\n", + "* Modify the calculator program in `08-calc.cpp` to validate its input using a `std::istringstream`.\n", + "\n", + "## Manipulators and flags\n", + "\n", + "So far we have encountered `noskipws` which is a *stream manipulator* that works on input streams. The exact details of how this, and other, manipulators work are unimportant for the purposes of using them; in general they are put to the stream object with either `<<` or `>>`. *Stream flags* can also be explicitly set or cleared using the member functions `setf()` and `unsetf()`, and *stream parameters* can be set using named member functions such as `width()` and `precision()`.\n", + "\n", + "Getting formatted output to \"look right\" is quite tricky and relies to a great extent on trial-and-error combined with (tedious) manual checking of the program's output. For some performance-critical code, using C++ streams and manipulators may not be practical or desirable. Also, providing localization (*l10n*) to the user's language and other settings can be difficult when using interleaved manipulators and messages. For these reasons, it is recommended to consider using the `print()` and `format()` family in preference.\n", + "\n", + "The following program produces a simulated cash-till receipt formatted to a width of 20 characters. This program is longer than most of the ones we've seen, and uses `struct` and `std::vector` introduced in previous Chapters. 
All of the text formatting functionality is in the `main()` program, so try running the program and comparing its output with the multiple uses of `cout` in the code (note that any product descriptions entered may not contain spaces):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "961c55ef", + "metadata": {}, + "outputs": [], + "source": [ + "// 08-receipt.cpp : output a till-receipt from user input\n", + "\n", + "#include <iostream>\n", + "#include <sstream>\n", + "#include <string>\n", + "#include <vector>\n", + "using namespace std;\n", + "\n", + "struct Entry {\n", + " string product;\n", + " size_t quantity;\n", + " double unit_price;\n", + " inline static double total{};\n", + "};\n", + "\n", + "Entry add_entry(const string& input) {\n", + " Entry e;\n", + " istringstream iss{ input };\n", + " iss >> e.product >> e.quantity >> e.unit_price;\n", + " if (iss.fail()) {\n", + " cerr << \"Bad entry.\\n\";\n", + " e.quantity = 0;\n", + " }\n", + " else {\n", + " Entry::total += e.quantity * e.unit_price;\n", + " }\n", + " return e;\n", + "}\n", + "\n", + "int main() {\n", + " vector<Entry> sales;\n", + " cout << \"Please enter: PRODUCT QTY PRICE (eg. 
\\'Apple 6 0.50\\')\\n\";\n", + " string s;\n", + " getline(cin, s);\n", + " while(!s.empty()) {\n", + " sales.emplace_back(add_entry(s));\n", + " cout << \"Please enter: PRODUCT QTY PRICE (blank line to finish)\\n\";\n", + " getline(cin, s);\n", + " }\n", + "\n", + " cout << \"====================\\n\";\n", + " auto f = cout.flags();\n", + " auto p = cout.precision(2);\n", + " cout.setf(ios_base::fixed, ios_base::floatfield);\n", + " for (const auto& line : sales) {\n", + " if (line.quantity) {\n", + " cout.setf(ios_base::left, ios_base::adjustfield);\n", + " cout.width(11);\n", + " cout << line.product;\n", + " cout.unsetf(ios_base::adjustfield);\n", + " cout.width(3);\n", + " cout << line.quantity;\n", + " cout.width(6);\n", + " cout << line.unit_price << '\\n';\n", + " }\n", + " }\n", + " cout << \"====================\\n\";\n", + " cout << \"Total:\";\n", + " cout.width(14);\n", + " cout << Entry::total << '\\n';\n", + " cout.flags(f);\n", + " cout.precision(p);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "39dbb381", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* A `struct Entry` has three fields `product`, `quantity` and `unit_price` as well as a `static` variable `total`. The `static` variable `total` is declared `inline` so that it can both have a non-`const` value and not need to be defined outside of the `struct` (recall that `struct` and `class` definitions can often be found in headers, and that the ODR must not be violated.)\n", + "\n", + "* The *factory function* `add_entry()` makes an `Entry` object from a `std::string` using a `std::istringstream`. In case of error, the `quantity` field is set to zero, meaning that the `total` is unchanged. 
The parameter is passed as `const std::string&` in order to save the overhead of copying a `std::string` object.\n", + "\n", + "* The return value of this function is appended to the `vector` of `Entry`s called `sales`; `emplace_back()` is used because the `Entry` is a temporary (as opposed to a named variable). Your compiler should support Return-Value Optimization (RVO) and move semantics, meaning that a copy operation should not be needed.\n", + "\n", + "* When a blank line is entered program flow moves on to the part of `main()` which formats the input data into a receipt. The current state of (all of) `cout`'s formatting flags is read into the variable `f`, and the floating-point precision into `p` while setting it to `2`. These variables are stored because changing `cout` has a global effect, so it is good practice for a part of a program which modifies them to always restore their previous values.\n", + "\n", + "* Before the range-for loop `cout.precision(2)` sets the number of digits for floating-point numbers, while `cout.setf(ios_base::fixed, ios_base::floatfield)` selects fixed-point notation, so that all floating-point values are output with exactly two digits after the decimal point. See the table below for more of the available flags.\n", + "\n", + "* Inside the loop `cout.width()` is used to set the field width of (only) the immediately following field, while `cout.setf(ios_base::left, ios_base::adjustfield)` sets left-alignment for the product name and `cout.unsetf(ios_base::adjustfield)` resets this, so that the numeric fields which follow are right-aligned (the default).\n", + "\n", + "**Experiment:**\n", + "\n", + "* Add a `static` member variable `lines` of type `size_t` to `struct Entry` in order to keep a tally of the total number of lines sold, and print this out with the total.\n", + "\n", + "* Remove the need for the two `static` member variables. Is this an improvement to the code or not?\n", + "\n", + "* Output the prices with your regional currency symbol (or `$`), while keeping the formatting width the same. 
(Hint: this is slightly more complicated than it may at first seem; for example you need to find a way to treat `$1.99` as a single entity to be right-aligned.)\n", + "\n", + "There are quite a lot of stream formatting flags and parameters available, most of which have an equivalent stream manipulator. All of these modify the state of the stream for all subsequent output, with the exception of member function `width()` and stream manipulator `setw()`, which only modify the immediately following output field. This table is intended to be a reference to the more common member functions, flags and manipulators; the best way to learn about formatting variants is to write (small) test programs and observe their output.\n", + "\n", + "| Manipulator | Member function | Action |\n", + "|:---------------------------------:|:--------------------------------------------------------------------------:|:----------------------------------------------:|\n", + "| `s<<internal` | `s.setf(ios_base::internal, ios_base::adjustfield)` | Pad between sign and numeric value |\n", + "| `s<<left` | `s.setf(ios_base::left, ios_base::adjustfield)` | Pad after the value |\n", + "| `s<<right` | `s.setf(ios_base::right, ios_base::adjustfield)` | Pad before the value |\n", + "| `s<<dec` | `s.setf(ios_base::dec, ios_base::basefield)` | Output integers in decimal |\n", + "| `s<<hex` | `s.setf(ios_base::hex, ios_base::basefield)` | Output integers in hexadecimal |\n", + "| `s<<oct` | `s.setf(ios_base::oct, ios_base::basefield)` | Output integers in octal |\n", + "| `s<<fixed` | `s.setf(ios_base::fixed, ios_base::floatfield)` | Output floating-point values as ddd.dd |\n", + "| `s<<scientific` | `s.setf(ios_base::scientific, ios_base::floatfield)` | Output floating-point values as d.ddddEdd |\n", + "| `s<<hexfloat` | `s.setf(ios_base::fixed \\| ios_base::scientific, ios_base::floatfield)` | Output floating-point values in hexadecimal notation |\n", + "| `s<<boolalpha`, `s>>boolalpha` | `s.setf(ios_base::boolalpha)` | Booleans output or input as 'true' or 'false' |\n", + "| `s<<noboolalpha`, `s>>noboolalpha` | `s.unsetf(ios_base::boolalpha)` | Booleans output or input as '1' or '0' |\n", + "| `s>>skipws` | `s.setf(ios_base::skipws)` | Skip preceding whitespace |\n", + "| `s>>noskipws` | `s.unsetf(ios_base::skipws)` | Don't skip input whitespace |\n", + "\n", + "**Experiment:**\n", + "\n", + "* Change the last program to use manipulators on `cout` instead of member functions. (Hint: You will need `left`, `right`, `fixed`, `setw()` and `setprecision()`, and possibly the header `<iomanip>`.)\n", + "\n", + "## User-defined types and I/O\n", + "\n", + "It is possible, and sometimes desirable, to define how user-defined types are formatted when put to output streams with `<<`. This is done by overloading the global `operator<<`, which, despite appearances, is actually the **name** of the function for which you must write an overload. 
Sadly, the syntax is ugly, unlike in some other programming languages where you merely provide a `tostring()` method, or similar.\n", + "\n", + "The following program reintroduces the `Point` type and defines an *overloaded* stream output function (overloaded because the function already exists with a different second parameter for other built-in and user types):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37bd77fd", + "metadata": {}, + "outputs": [], + "source": [ + "// 08-point1.cpp : a Point class with ostream formatter\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "struct Point {\n", + " int x{}, y{};\n", + "};\n", + "\n", + "ostream& operator<< (ostream& os, const Point& p) {\n", + " os << '(' << p.x << ',' << p.y << ')';\n", + " return os;\n", + "}\n", + "\n", + "int main() {\n", + " Point p{ 1, 2 };\n", + " cout << p << '\\n';\n", + " cout << Point{ 3, 4 } << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "6e4f3fea", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* Custom stream output functions are often trivial to write, in this case only three significant lines of code. They are **always** global functions.\n", + "\n", + "* The object being output should be passed as a `const`-reference to the stream output function. The function will often need to access the internals of the object, so in the case of a `class` with `private` members the function should either be a `friend` (see Chapter 9), or rely on getters.\n", + "\n", + "* The return type of `operator<<` is a reference to the (newly modified) `std::ostream` object passed as the first parameter. This is important as it allows chaining of output operations. 
(If the return type used were `void`, `cout << p;` would be legal but `cout << p << '\\n';` would not be.)\n", + "\n", + "**Experiment:**\n", + "\n", + "* Modify this program to output `Pixel`s in the format: `red@(0,0)`, using the version of `Pixel` that derives from `Point`. Hint: you will need to use a `switch` statement, and a `static_cast` to avoid code duplication.\n", + "\n", + "* Define `operator<<` for `Entry`s and use `cout << line << '\\n';` in `main()`'s for-loop in the program `08-receipt.cpp` above.\n", + "\n", + "Less common, but sometimes necessary, is allowing input of user-defined types from input streams using the stream extraction operator. Care must be taken to allow for incorrect or invalid input, setting the stream's fail-bit in this case. Also, the stream flags and parameters should not be modified unless care is taken to reset them (all) before returning the modified `std::istream` object.\n", + "\n", + "The following program reads `Point`s from `cin` in an infinite loop, informing the user whether or not the input was successful. Notice that this user feedback is provided in the **calling** function (in this case `main()`) and not the overloaded `operator>>` function. The second parameter to `operator>>` is **not** declared `const` as it is modified by this function (in order to return its newly read value to the caller function)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ac0754d2", + "metadata": {}, + "outputs": [], + "source": [ + "// 08-point2.cpp : read Points from input stream\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "struct Point {\n", + " int x{}, y{};\n", + "};\n", + "\n", + "istream& operator>> (istream& is, Point& p) {\n", + " char a{}, b{}, c{};\n", + " int px, py;\n", + " is >> a >> px >> b >> py >> c;\n", + " if (is.good()) {\n", + " if (a == '(' && b == ',' && c == ')') {\n", + " p.x = px;\n", + " p.y = py;\n", + " }\n", + " else {\n", + " is.setstate(ios_base::failbit);\n", + " }\n", + " }\n", + " return is;\n", + "}\n", + "\n", + "int main() {\n", + " cout << \"Please enter Points, in the form \\'(2,-3)\\'\\n\";\n", + " Point p;\n", + " while (!cin.eof()) {\n", + " cin >> p;\n", + " if (cin.good()) {\n", + " cout << \"Point read successfully!\\n\";\n", + " }\n", + " else {\n", + " cout << \"Error in input!\\n\";\n", + " cin.clear();\n", + " }\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "4e0aa2a1", + "metadata": {}, + "source": [ + "**Experiment:**\n", + "\n", + "* Try entering multiple points on one line. Does the program work as expected?\n", + "\n", + "* Deliberately enter some invalid data, followed by valid `Point`s. Does error recovery work as expected?\n", + "\n", + "* Modify this program to check each input field as entered and act on any incorrect input. Hint: you will need five successive uses of `is >>`. 
Does error recovery work in the same way?\n", + "\n", + "* Modify this program to read `Pixel`s.\n", + "\n", + "*All text and program code ©2019-2025 Richard Spencer, all rights reserved.*" + ] + } + ], + "metadata": { + "jupytext": { + "cell_metadata_filter": "-all" + }, + "kernelspec": { + "display_name": "C++ 23", + "language": "c++", + "name": "cpp23" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/jupyter-notebooks/09-classes-friends-and-polymorphism.ipynb b/jupyter-notebooks/09-classes-friends-and-polymorphism.ipynb new file mode 100644 index 0000000..18e3a0f --- /dev/null +++ b/jupyter-notebooks/09-classes-friends-and-polymorphism.ipynb @@ -0,0 +1,784 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1e9e9430", + "metadata": {}, + "source": [ + "# Classes, Friends and Polymorphism\n", + "\n", + "## Private, protected and public\n", + "\n", + "Classes are used to model abstract and concrete entities in ways that combine state with functionality. (The state is held in member variables and the functionality is provided by member functions.) The literature tells us that classes encapsulate both data and the methods (functions) that act upon that data. Objects are *instances* of a particular class in the same way that (normal) variables are instances of a (built-in) type.\n", + "\n", + "Let us consider a minimalist `Person` class, which we will later extend to `Student` and `Employee` through inheritance. 
Our first attempt might look like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e6e2a548", + "metadata": {}, + "outputs": [], + "source": [ + "struct Date {\n", + " int year{}, month{}, day{};\n", + "};\n", + "\n", + "class Person {\n", + " Date dob;\n", + " string familyname, firstname;\n", + "};\n", + "\n", + "Person a_person{};\n", + "\n", + "Person genius{ { 1879, 3, 14 }, \"Einstein\", \"Albert\" }; // Error: does not (yet) compile" + ] + }, + { + "cell_type": "markdown", + "id": "75d60705", + "metadata": {}, + "source": [ + "This `Person` class (here defined with `class` as opposed to the `struct` keyword we met in Chapter 6) contains three members: `dob` (itself of a user-defined type called `Date`), `familyname` and `firstname` (both of which are `std::string`s). We can define a variable of type `Person` (here `a_person`) using default-initialization syntax (the braces shown here are in fact optional, while empty parentheses are **not** permitted) but we cannot do a lot else with this object. Its fields will be zero-initialized for `a_person.dob.year`, `a_person.dob.month`, and `a_person.dob.day`, while `a_person.familyname` and `a_person.firstname` are empty strings. The reason we cannot do more is that the access specifier `private:` (which we also met in Chapter 6) is implied by default for `class`es. This means we can neither access the fields (member variables) directly using dot-notation, nor use uniform initialization syntax, as with `genius`.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Change the above fragment to use `struct` instead of `class` in order to enable compilation, and also write an empty `main()` function. Does the program run? Is it therefore self-contained?\n", + "\n", + "* Now try to create `genius` within `main()` using assignment to member variables and uniform initialization. What error messages do you get? 
Does changing the keyword `class` to `struct` fix this problem in both cases?\n", + "\n", + "The inability to create `Person`s using uniform initialization syntax is solved by writing a *constructor*. Access to member variables after the object has been created is achieved using *getters* and *setters*, which we met previously in Chapter 6. In order to be useful, a constructor must be declared after a `public:` access specifier. The following program demonstrates this, together with a `main()` function which produces output:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d37195e0", + "metadata": {}, + "outputs": [], + "source": [ + "// 09-person1.cpp : model Person as a class with constructor\n", + "\n", + "#include <chrono>\n", + "#include <iostream>\n", + "#include <string>\n", + "#include <string_view>\n", + "using namespace std;\n", + "using namespace std::chrono;\n", + "\n", + "class Person {\n", + "public:\n", + " Person(const year_month_day& dob, string_view familyname, string_view firstname)\n", + " : dob{ dob }, familyname{ familyname }, firstname{ firstname }\n", + " {}\n", + " string getName() const { return firstname + ' ' + familyname; }\n", + " const year_month_day& getDob() const { return dob; }\n", + "private:\n", + " const year_month_day dob;\n", + " string familyname, firstname;\n", + "};\n", + "\n", + "\n", + "int main() {\n", + " Person genius{ { 1879y, March, 14d }, \"Einstein\", \"Albert\" };\n", + " cout << genius.getName() << \" was born \" << genius.getDob() << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "847e8aae", + "metadata": {}, + "source": [ + "Quite a few things to note about this program:\n", + "\n", + "* A constructor has the format *ClassName* ( *parameter-list* ) : *member initializers* { *possibly empty function body* } and does **not** have a return type declared.\n", + "\n", + "* The constructor's parameters have names `dob`, `familyname` and `firstname`, these being the same names as for the member 
variables (this is allowed in Modern C++). The conventions for naming (`private:`) class members vary; historically a trailing underscore has been used, but this can become difficult to read.\n", + "\n", + "* The member variables are initialized using uniform initialization syntax; this forbids narrowing conversions, and there shouldn't be any as the parameter types should have been carefully chosen. (Older code may use parentheses here instead of braces.) The order of construction is the same as the way the member fields are laid out (in this class they are all after the `private:` access specifier); the order in the comma-separated initializers is unimportant (although you should try to replicate the order of the member fields, and your compiler will warn if they differ). The constructor's body is empty here (although it must be present), and this is not unusual.\n", + "\n", + "* The `std::chrono::year_month_day` parameter (itself initialized by uniform initialization) is passed by `const`-reference instead of by value, as it is probably too big to fit in a single register to pass by value. The names are passed by value as `std::string_view` although in older code `const std::string&` would be common.\n", + "\n", + "* The member function `getName()` is declared `const` as it is guaranteed not to change any member variables. It returns a newly created `std::string` which must be returned by value.\n", + "\n", + "* The member variable `dob` is declared `const` as it will never need to be changed; of course it needs to be initialized by the constructor, and this case is allowed. 
The member variables `familyname` and `firstname` need to be of type `std::string` (not `std::string_view` as for the constructor's parameters) so that the character data they hold is guaranteed to exist for the lifetime of the object (consider factory functions which return a newly-constructed object, as we saw in Chapter 8).\n", + "\n", + "* The member function `getDob()` is also declared `const` and returns a `const`-reference. It is possible to put this return value directly to a `std::ostream` as the Standard Library overloads `operator<<` for `std::chrono::year_month_day`.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Add more `Person` objects to `main()`, and output their names.\n", + "\n", + "* Rewrite the constructor to initialize the member variables in the body, instead of using the comma-separated list of member initializers.\n", + "\n", + "* Modify this program to use `std::println()` instead of `cout`. Perform most of the formatting in a `const` member function `toString()`, which returns a `std::string`.\n", + "\n", + "* Write getters (all declared `const`) called `getFamilyName()` and `getFirstName()` avoiding creation of unnecessary temporary variables. Modify `main()` to use these.\n", + "\n", + "* Write setters called `setFamilyName()` and `setFirstName()`. Test these from `main()` again.\n", + "\n", + "* Modify the original constructor to allow for `firstname` not being present. Hint: use a defaulted function parameter. What other function needs to be changed?\n", + "\n", + "* Try to create a default-constructed `Person`. What do you find? Now try to create a `public:` default constructor (with an empty parameter list).\n", + "\n", + "There is a third type of access specifier called `protected:`. Its meaning is the same as for `private:` except when inheritance is in use, when it means that (member functions defined within) derived classes have access to any members in the base class which were declared `protected:`. 
It's rare to find this in real code, although the next program we shall look at demonstrates its syntax and use.\n", + "\n", + "Unlike with the `struct`s we met in Chapter 6, *private inheritance* is the default for `class`es. This means that any members which were public in the base class are not visible to users of the derived class. The literature tells us that (public) inheritance describes an *is-a* relationship, while the much rarer private inheritance describes an *is-implemented-by* relationship. Typically, this means that a **privately** derived class must provide (public) member functions which in turn call member functions of the base class. Interestingly, this doesn't necessarily mean that the size and binary layout of the privately derived class is different from that of the base class, unless it has additional member variables.\n", + "\n", + "(*Protected inheritance*, as opposed to protected members, is even more unusual, and quite possibly has no useful purpose. It isn't discussed further here.)\n", + "\n", + "The following program defines three `class`es, the second and third of which derive from the first. A collection of related classes that utilize inheritance is sometimes called an *inheritance hierarchy*. 
Quite a few changes have been made to `Person` so it is probably worth studying this first, before moving on to the new (derived) `Student` and `Employee` classes (quite a few member functions have been written on one line, to save space):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ad397213", + "metadata": {}, + "outputs": [], + "source": [ + "// 09-person2.cpp : model Person, Student and Employee as a class inheritance hierarchy\n", + "\n", + "#include <chrono>\n", + "#include <iostream>\n", + "#include <string>\n", + "#include <string_view>\n", + "#include <vector>\n", + "using namespace std;\n", + "using namespace std::chrono;\n", + "\n", + "class Person {\n", + "public:\n", + " Person(year_month_day dob) : dob{ dob } {}\n", + " Person(year_month_day dob, string_view familyname, string_view firstname, bool familynamefirst = false)\n", + " : dob{ dob }, familyname{ familyname }, firstname{ firstname },\n", + " familynamefirst{ familynamefirst } {}\n", + " virtual ~Person() {}\n", + " void setFamilyName(string_view familyname) { this->familyname = familyname; }\n", + " void setFirstName(string_view firstname) { this->firstname = firstname; }\n", + " void setFamilyNameFirst(bool familynamefirst) { this->familynamefirst = familynamefirst; }\n", + " string getName() const {\n", + " if (familyname.empty() || firstname.empty()) {\n", + " return familyname + firstname;\n", + " }\n", + " else if (familynamefirst) {\n", + " return familyname + ' ' + firstname;\n", + " }\n", + " else {\n", + " return firstname + ' ' + familyname;\n", + " }\n", + " }\n", + "protected:\n", + " const year_month_day dob;\n", + "private:\n", + " string familyname, firstname;\n", + " bool familynamefirst{};\n", + "};\n", + "\n", + "class Student : public Person {\n", + "public:\n", + " enum class Schooling;\n", + " Student(const Person& person, const vector<string>& attended_classes = {}, Schooling school_type = Schooling::preschool)\n", + " : Person{ person }, school_type{ school_type }, attended_classes{ attended_classes } {}\n", + " const 
year_month_day& getDob() const { return dob; }\n", + " const vector<string>& getAttendedClasses() const { return attended_classes; }\n", + " enum class Schooling { preschool, elementary, juniorhigh, highschool, college, homeschool, other };\n", + "private:\n", + " Schooling school_type;\n", + " vector<string> attended_classes;\n", + "};\n", + "\n", + "class Employee : public Person {\n", + "public:\n", + " Employee(const Person& person, int employee_id, int salary = 0)\n", + " : Person{ person }, employee_id{ employee_id }, salary{ salary } {}\n", + " bool isBirthdayToday(year_month_day today) const { return dob.month() == today.month() && dob.day() == today.day(); }\n", + " void setSalary(int salary) { this->salary = salary; }\n", + " auto getDetails() const { return pair{ employee_id, salary }; }\n", + "private:\n", + " const int employee_id;\n", + " int salary;\n", + "};\n", + "\n", + "int main() {\n", + " Person genius{ { 1879y, March, 14d }, \"Einstein\", \"Albert\" };\n", + " Student genius_student{ genius, { \"math\", \"physics\", \"philosophy\" }, Student::Schooling::other };\n", + " Employee genius_employee{ genius, 1001, 15000 };\n", + "\n", + " cout << \"Full name: \" << genius_student.getName() << '\\n';\n", + "\n", + " cout << \"School classes: \";\n", + " for (const auto& the_class : genius_student.getAttendedClasses()) {\n", + " cout << the_class << ' ';\n", + " }\n", + " cout << '\\n';\n", + "\n", + " auto [ id, salary ] = genius_employee.getDetails();\n", + " cout << \"ID: \" << id << \", Salary: $\" << salary << '\\n';\n", + " year_month_day next_bday{ 2024y, March, 14d };\n", + " if (genius_employee.isBirthdayToday(next_bday)) {\n", + " cout << \"Happy Birthday!\\n\";\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "ba92d6c6", + "metadata": {}, + "source": [ + "Many things to note about this program:\n", + "\n", + "* A second constructor for `Person` taking only a `std::chrono::year_month_day` has been added. 
Setters can be used later to initialize or modify the other three member variables, which are left defaulted by this constructor (empty for the two `std::string`s and `false` for the `bool`).\n", + "\n", + "* A `virtual` destructor has been added to `Person`; a key C++ concept is that base classes often need a virtual destructor. This is so that, for any heap object of type `Student` or `Employee` assigned to a pointer of type `Person*` (including use of smart pointers), the correct destructor of the **derived** class can be found and thus called, avoiding memory leaks.\n", + "\n", + "* The `getName()` function returns the name(s) provided by either the constructor or the setter(s) as a single `std::string`, ordered according to the member variable `familynamefirst`. (Hopefully this attempt at cultural inclusion doesn't offend anyone!)\n", + "\n", + "* The member variable `dob` is declared `protected:`, the other three are `private:`, as before.\n", + "\n", + "* The `Student` type is derived from `Person` using the keyword `public`. If this keyword were omitted, none of `Person`'s `public:` members would be visible to users of `Student`, as `class`es default to private inheritance. The syntax is exactly as for `Pixel` inheriting from `Point` in Chapter 6.\n", + "\n", + "* An `enum class` called `Schooling` is also forward-declared so that it is able to be used as a constructor parameter.\n", + "\n", + "* The three `Student` constructor parameters are an existing `Person` object used to initialize the base class part, an optional `vector<string>` (taken by `const`-reference, with an empty default), and an optional value from the enumeration set `Schooling`.\n", + "\n", + "* The base class portion of `Student` is initialized as `Person{ person }` where `person` is of type `const Person&`. Then the other two fields of `Student` are initialized. 
The constructor parameter variable `attended_classes` is passed as a `const vector<string>&` so that only one copy is made, which is when the member variable of the same name is initialized.\n", + "\n", + "* A `public:` member function `getDob()` makes the `protected:` data member of the base class `dob` available to **users** of the derived class, in this case `Student`. It is declared `const` and returns a `const`-reference.\n", + "\n", + "* The member function `getAttendedClasses()` returns a `const`-reference to `attended_classes`, therefore this `std::vector` is made visible to the function which calls this member function, but is not modifiable.\n", + "\n", + "* The `Employee` constructor takes three parameters, the third of which is optional. The base class portion is initialized in the same way as for `Student`.\n", + "\n", + "* The member function `isBirthdayToday()` takes a `std::chrono::year_month_day` as a parameter and compares the return values of the `day()` and `month()` members with those of `dob`, returning `true` if they are the same, or `false` otherwise. (We're pretending \"today\" is March 14, 2024, so this function always returns `true`.)\n", + "\n", + "* The member variable `employee_id` is not meant to be able to be changed, so is declared `const`. The setter `setSalary()` is defined so that `salary` can be updated, while the getter `getDetails()` returns an aggregate of both derived class member variables by value.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Modify `main()` to remove the need for the variable `Person genius`. Hint: there will be some necessary code duplication.\n", + "\n", + "* Add some other `Student`s and `Employee`s. 
Experiment with minimalist and partial initializations.\n", + "\n", + "* Experiment with the member functions not previously called from `main()`.\n", + "\n", + "* Write a getter/setter pair to retrieve/modify `school_type` for `Student`.\n", + "\n", + "* Write a second constructor for `Student` which takes (in addition) the parameters needed to define a `Person`. Initialize the `Person` base class from these parameters. Should these parameters come before or after the ones specific to `Student`? Can they be defaulted?\n", + "\n", + "* Write a second constructor for `Employee` to accomplish the same thing.\n", + "\n", + "* Add `getDob()` to `Employee`, as for `Student`. Now try to add it to `Person`; what do you find? Would a single `public:` getter in the base class be more useful than a `protected:` member variable?\n", + "\n", + "* Add member functions `addAttendedClass()` and `removeAttendedClass()` to `Student`. Make them smart enough to handle duplicates/invalid parameters.\n", + "\n", + "* Add the field `job_title` to `Employee` as well as support for this in the relevant getters/setters/constructors.\n", + "\n", + "## Copying and comparisons\n", + "\n", + "So far we have created stack objects and accessed their member functions. Often, you will want to make copies of these objects, whether it's passing them to, or returning them from, functions, or storing them in a container. Sometimes they are passed by reference instead, and this is preferred for (larger) user-defined types, as passing by value causes a (potentially) expensive copy to be made. However, the class designer needs to be aware of all of the copy and move operations that might be required of object instances, and must ensure they are implemented correctly.\n", + "\n", + "There are **six** operations which are involved in this discussion: three constructors, two assignment operators and the destructor. All of these can be explicitly declared `= default` or `= delete`. 
(We have already discovered that defining a constructor which takes parameters causes the default constructor to no longer be generated.) The other two constructors and the two assignment operators come in copy and move forms, as shown in the next code fragment, which uses `Person` as the name of the class and gives, in a comment, an example of when each operation would be called. (The *boilerplate* code shown here can be copied verbatim for other `class`es, simply changing every occurrence of `Person` to the name of the class. The actual parameter name, often `rhs`, has been omitted; these are the minimalist forms of the member function declarations.)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5054e4cb", + "metadata": {}, + "outputs": [], + "source": [ + "class Person {\n", + "// rest of class definition omitted\n", + "public:\n", + " // \"default constructor\"\n", + " Person() = delete; // Person p1{}, p2{}, p3;\n", + " // \"copy constructor\"\n", + " Person(const Person&) = delete; // Person p4{ p1 }, p5(p2);\n", + " // \"copy assignment operator\"\n", + " Person& operator= (const Person&) = delete; // Person p6; p6 = p1;\n", + " // \"move constructor\"\n", + " Person(Person&&) = delete; // Person p7{ std::move(p2) };\n", + " // \"move assignment operator\"\n", + " Person& operator= (Person&&) = delete; // Person p8; p8 = std::move(p3);\n", + " // \"destructor\"\n", + " ~Person() = delete; // Any Person object going out of scope\n", + "};" + ] + }, + { + "cell_type": "markdown", + "id": "1d00f192", + "metadata": {}, + "source": [ + "**Experiment:**\n", + "\n", + "* Add the above code to the end of the definition of the `Person` class from `09-person1.cpp`. Why doesn't the code compile now? Hint: read the error message carefully. Fix this by `= default`ing just one of the operations.\n", + "\n", + "* Try to create `p4` to `p8` as above, `= default`ing the operations as necessary. Are the (copied/assigned) objects in a valid state? 
Hint: try to use their member functions.\n", + "\n", + "* Now use `auto` instead of `Person`, for example: `auto p1{};`. Does the code still compile? Are the objects valid?\n", + "\n", + "As can be seen, we are aided by the compiler in the provision of object duplication, as many (probably most) of the classes you will write have valid (`= default`) *special member functions* generated as they are needed. (The exact rules of when and which of them are generated automatically are slightly arcane; you may find references to the \"rule of five\" for Modern C++ online or in literature.) However, the exception proves the rule, and I would suggest declaring the first five of these `= delete` when writing a new class, enabling them one by one with `= default` as any compiler errors present themselves, ensuring that objects can be copied and moved correctly. Most member variable types are compatible with default copy/move semantics, the obvious exception being raw pointers. Writing custom special member functions for derived classes is sometimes tricky, as it involves manually invoking the correct special member function on the base class. (Hopefully you won't have to do this very often; further explanation is beyond the scope of this Tutorial.) Be aware that if the member variables (of a base or derived class) themselves obey the usual rules of copying (such as `int`, `double`, `std::string`, `std::shared_ptr`, but not `char *` for example) then the `= default` special member functions will always work correctly.\n", + "\n", + "Often we will want to compare objects for equivalence. Some containers, such as `std::unordered_map`, mandate that `operator==` is defined for the key type, while others, such as `std::map`, require `operator<`, so we can only use objects as keys in associative containers if the required `operator`s have been defined. 
The following code defines a rudimentary member `operator==` for the `Person` class from `09-person1.cpp`; the syntax from Chapter 6 should be familiar:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "186fde34", + "metadata": {}, + "outputs": [], + "source": [ + "class Person {\n", + "// rest of class definition omitted\n", + "public:\n", + " bool operator== (const Person& rhs) const { return getName() == rhs.getName(); }\n", + "};" + ] + }, + { + "cell_type": "markdown", + "id": "fe592def", + "metadata": {}, + "source": [ + "Alternatively, global `operator==` can be overloaded for `Person`, as demonstrated here:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d2ad794f", + "metadata": {}, + "outputs": [], + "source": [ + "bool operator== (const Person& lhs, const Person& rhs) {\n", + " return lhs.getName() == rhs.getName();\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "d7ebc231", + "metadata": {}, + "source": [ + "Defining either one of these variants of `operator==` is sufficient to make the following code compile:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "15aa05b4", + "metadata": {}, + "outputs": [], + "source": [ + "int main() {\n", + " Person person1 { { 2000y, January, 1d }, \"Smith\", \"John\" };\n", + " auto person2{ person1 };\n", + " if (person1 == person2) {\n", + " cout << \"Same!\\n\";\n", + " }\n", + " else {\n", + " cout << \"Different!\\n\";\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "ded6dab8", + "metadata": {}, + "source": [ + "A couple of things to note:\n", + "\n", + "* The return type of both variants is `bool` (not `Person&`).\n", + "\n", + "* The member function version has access to its own member variables and those of `rhs` (even though it doesn't access them directly), while the global (free) function version relies on public getters.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Define `person2` with a different date of birth. 
How do they compare now?\n", + "\n", + "* Can you fix this problem by modifying the member `operator==`?\n", + "\n", + "* Can you do the same with global `operator==`?\n", + "\n", + "* Write a member `operator==` for `Employee` from `09-person2.cpp`, that compares the `employee_id` member (only) for equality, and then test this operator.\n", + "\n", + "## Friend functions and classes\n", + "\n", + "Friends have access to all members of the `class` that declares them a `friend`, including those declared `private:` or `protected:`. Sometimes this is desirable, as shown in the following program:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7483a6db", + "metadata": {}, + "outputs": [], + "source": [ + "// 09-person3.cpp : define operator<=> for Person class\n", + "\n", + "#include <compare>\n", + "#include <iostream>\n", + "#include <string>\n", + "#include <string_view>\n", + "using namespace std;\n", + "\n", + "struct Date {\n", + " int year{}, month{}, day{};\n", + " auto operator<=>(const Date&) const = default;\n", + "};\n", + "\n", + "class Person {\n", + "public:\n", + " Person(const Date& dob, string_view familyname, string_view firstname)\n", + " : familyname{ familyname }, firstname{ firstname }, dob{ dob }\n", + " {}\n", + " string getName() const { return firstname + ' ' + familyname; }\n", + " const auto& getDob() const { return dob; }\n", + " auto operator<=>(const Person&) const = default;\n", + " friend ostream& operator<< (ostream&, const Person&);\n", + "private:\n", + " string familyname, firstname;\n", + " const Date dob;\n", + "};\n", + "\n", + "ostream& operator<< (ostream& os, const Person& p) {\n", + " os << \"Name: \" << p.getName() << \", DOB: \"\n", + " << p.dob.year << '/' << p.dob.month << '/' << p.dob.day;\n", + " return os;\n", + "}\n", + "\n", + "int main() {\n", + " Person person1{ { 2000, 1, 1 }, \"Doe\", \"John\" },\n", + " person2{ { 1987, 11, 31 }, \"Doe\", \"John\" };\n", + " cout << \"person1: \" << person1 << '\\n';\n", + " cout << \"person2: \" << person2 << '\\n';\n", + " if (person1 == person2) 
{\n", + " cout << \"Same person!\\n\";\n", + " }\n", + " else {\n", + " cout << \"Different person!\\n\";\n", + " }\n", + "\n", + " cout << \"person1 is \";\n", + " if (person1.getDob() > person2.getDob()) {\n", + " cout << \"younger than \";\n", + " }\n", + " else if (person1.getDob() < person2.getDob()) {\n", + " cout << \"older than \";\n", + " }\n", + " else {\n", + " cout << \"the same age as \";\n", + " }\n", + " cout << \"person2\\n\";\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "08fdf66d", + "metadata": {}, + "source": [ + "Some things to note about this program:\n", + "\n", + "* Member `operator<=>` (the \"spaceship operator\") is defaulted for this roll-your-own `Date`; this is all that is needed for the equality and ordering comparisons to be defined for this class, with ordering performed member-wise starting with the first data member.\n", + "\n", + "* Within the definition of `Person`, global `operator<<` is declared as a `friend` function. This is more boilerplate that you can use in your own classes, changing the parameter type `const Person&` to refer to your class. (Friend declarations are identical to normal function declarations, other than the use of the `friend` keyword.)\n", + "\n", + "* Member `operator<=>` is defaulted for `Person`; with this code the `std::string` members will be compared (`familyname` before `firstname`), before the `Date` members are compared.\n", + "\n", + "* Global `operator<<` is also defined for `Person`, allowing objects to be put to `cout` (and any other `std::ostream`s) using `<<`. This needs to be a `friend` because it accesses `dob`.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Give `person2` the same date of birth as `person1`. Does the program produce the expected output?\n", + "\n", + "* Now give them different names. What output do you get?\n", + "\n", + "* Define global `operator<<` for `Date`. 
Can you remove the need for `operator<<` for `Person` to itself be a `friend` of `class Person`?\n", + "\n", + "* Compare a few `Person` instances with similar or same family names and first names, storing them in a `std::set`. Write code to output them telephone-book style. Are they ordered in the way you would expect?\n", + "\n", + "Classes can be declared `friend`s as well as functions, although this use is probably less common. The following program defines two `class`es `A` and `B` which are mutual friends, thus allowing member functions of either to access each other's `private:` members." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "07a3077d", + "metadata": {}, + "outputs": [], + "source": [ + "// 09-friends.cpp : two classes as friends of each other\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "class B;\n", + "\n", + "class A {\n", + "public:\n", + " friend class B;\n", + " void a(B& other);\n", + "private:\n", + " int m_a{42};\n", + "};\n", + "\n", + "class B {\n", + "public:\n", + " friend class A;\n", + " void b(A& other) { cout << \"b():\" << other.m_a << '\\n'; }\n", + "private:\n", + " double m_b{1.414};\n", + "};\n", + "\n", + "void A::a(B& other)\n", + "{\n", + " cout << \"a():\" << other.m_b << '\\n';\n", + "}\n", + "\n", + "int main() {\n", + " A obj_a{};\n", + " B obj_b{};\n", + "\n", + " obj_a.a(obj_b);\n", + " obj_b.b(obj_a);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "79c07aaa", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* In order for `friend class B` to be written within `class A`, the **declaration** `class B;` must appear beforehand. This forward declaration allows a reference (or pointer, including smart pointer) to `B` to be taken and used, but members cannot (yet) be accessed.\n", + "\n", + "* The definition of `class A`'s member function `a()` must be written outside of the class body, **after** the definition of `class B`. 
It is important to appreciate that it is **still** a member function, not a global function, when written after the class definition *non-inline* (or *out-of-line*) in this way using the scope resolution operator (`::`).\n", + "\n", + "* The definition of `class B` declares `friend class A` and its member function `b()` can access `other.m_a` for this reason.\n", + "\n", + "* The member variables need a prefix (such as `m_`) because member functions called `a()` and `b()` are used, and the names would clash.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Change the types of the member variables and their values. Does the program compile without further changes?\n", + "\n", + "* Add a defaulted second parameter to `a()` and `b()` which is used to set the value of other class's member variable.\n", + "\n", + "* Go back to `Person` from `09-person1.cpp` and define `getName()` outside the class body. Does it still need a declaration inside the class body? Can this definition now appear after `main()`? What do free functions and non-inline member functions have in common?\n", + "\n", + "## Polymorphism\n", + "\n", + "The literature tells us that polymorphism \"is a concept in type theory wherein a name may denote instances of many different classes as long as they are related by some common superclass\" (Booch, \"Object-Oriented Analysis and Design with Applications\"[^1]). What this means in practice is that derived class objects can be manipulated through a pointer or reference to a base class type, with member function selection being resolved **at run-time**. This probably doesn't sound too exciting, but is important in order for C++ to be classified as an object-**oriented** programming language, as opposed to merely object-**based**. 
Member functions whose selection is determined at run-time are called *virtual* functions, and are defined with the `virtual` keyword (which we have already met when discussing virtual destructors in base classes).\n", + "\n", + "The following code defines (part of) an *abstract base class* called `Virtual`; it makes use of virtual functions in the following forms:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b2defe26", + "metadata": {}, + "outputs": [], + "source": [ + "class Virtual {\n", + "public:\n", + " virtual void f();\n", + " virtual void g() = 0;\n", + " virtual void h() override;\n", + " virtual void k() override final;\n", + "};" + ] + }, + { + "cell_type": "markdown", + "id": "ae447743", + "metadata": {}, + "source": [ + "The meanings implied for these member functions in the context of the `virtual` keyword are as follows:\n", + "\n", + "* `f()` is a function in a base class or derived class which can (optionally) be redefined (in the derived class).\n", + "\n", + "* `g()` is a *pure-virtual* function of an abstract base class, which is not usually defined in this class and **must** be defined in a class that derives from it, in order for objects of the derived class to be able to be created. Objects of an abstract class **cannot** be instantiated; attempting to do so would trigger a compile-time error.\n", + "\n", + "* `h()` is a function in a derived class which redefines (overrides) a previous definition; the function signature must exactly match that in the base class (including `const` and `noexcept` qualifiers). 
This function **can** itself be redefined in any subsequently derived class.\n", + "\n", + "* `k()` is the same as `h()` except this function **cannot** again be redefined in a subsequently derived class.\n", + "\n", + "The following program demonstrates all of these uses in a more complex hierarchy deriving from an abstract `Shape` class:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ee8b8d6b", + "metadata": {}, + "outputs": [], + "source": [ + "// 09-shape.cpp : Shape class hierarchy demonstrating polymorphism\n", + "\n", + "#include <iostream>\n", + "#include <string>\n", + "#include <vector>\n", + "using namespace std;\n", + "\n", + "class Shape {\n", + "public:\n", + " struct Point;\n", + " Shape(int sides) : sides{ sides } {}\n", + " Shape(int sides, Point center) : sides{ sides }, center{ center } {}\n", + " virtual void draw(ostream& os) const = 0;\n", + " virtual string getSides() const { return to_string(sides); }\n", + " void moveBy(int dx, int dy) { center.x += dx; center.y += dy; }\n", + " const Point& getCenter() const { return center; }\n", + " virtual ~Shape() { cerr << \"~Shape()\\n\"; }\n", + " struct Point {\n", + " int x{}, y{};\n", + " };\n", + "private:\n", + " int sides;\n", + " Point center;\n", + "};\n", + "\n", + "ostream& operator<< (ostream& os, const Shape::Point& pt) {\n", + " return os << '(' << pt.x << ',' << pt.y << ')';\n", + "}\n", + "\n", + "class Triangle final : public Shape {\n", + "public:\n", + " Triangle(int side) : Shape{ 3 }, side{ side } {}\n", + " Triangle(int x, int y, int side) : Shape{ 3, {x, y} }, side{ side } {}\n", + " virtual void draw(ostream& os) const override {\n", + " os << \" /\\\\\\n/__\\\\\\nSide: \" << side << \"\\nAt: \" << getCenter() << '\\n';\n", + " }\n", + "private:\n", + " int side;\n", + "};\n", + "\n", + "class Circle : public Shape {\n", + "public:\n", + " Circle(int radius) : Shape{ 0 }, radius{ radius } {}\n", + " Circle(int x, int y, int radius) : Shape{ 0, {x, y} }, radius{ radius } {}\n", 
+ " virtual void draw(ostream& os) const override final {\n", + " os << \" _\\n(_)\\nRadius: \" << radius << \"\\nAt: \" << getCenter() << '\\n';\n", + " }\n", + " virtual string getSides() const override final { return \"infinite\"; }\n", + "private:\n", + " int radius;\n", + "};\n", + "\n", + "class Rectangle : public Shape {\n", + "public:\n", + " Rectangle(int side_x, int side_y) : Shape{ 4 }, side_x{ side_x }, side_y{ side_y } {}\n", + " Rectangle(int x, int y, int side_x, int side_y)\n", + " : Shape{ 4, {x ,y} }, side_x{ side_x }, side_y{ side_y } {}\n", + " virtual void draw(ostream& os) const override {\n", + " os << \" ____\\n|____|\\nSize: \" << side_x << 'x' << side_y << \"\\nAt: \" << getCenter() << '\\n';\n", + " }\n", + "protected:\n", + " int side_x, side_y;\n", + "};\n", + "\n", + "class Square final : public Rectangle {\n", + "public:\n", + " Square(int side) : Rectangle{ side, side } {}\n", + " Square(int x, int y, int side) : Rectangle{ x, y, side, side } {}\n", + " virtual void draw(ostream& os) const override final {\n", + " os << \" _\\n|_|\\nSide: \" << side_x << \"\\nAt: \" << getCenter() << '\\n';\n", + " }\n", + "};\n", + "\n", + "int main() {\n", + " vector<Shape *> shapes;\n", + " shapes.push_back(new Circle{ 10 });\n", + " shapes.push_back(new Triangle{ 10, 20, 15 });\n", + " shapes.push_back(new Rectangle{ 10, 5 });\n", + " shapes.push_back(new Square{ 25, 100, 50 });\n", + " shapes[0]->moveBy(20, 50);\n", + "\n", + " for (auto& s : shapes) {\n", + " s->draw(cout);\n", + " cout << \"Sides: \" << s->getSides() << '\\n';\n", + " delete s;\n", + " s = nullptr;\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "fbc28d94", + "metadata": {}, + "source": [ + "A lot of things to note about this program:\n", + "\n", + "* The definition of `Shape` contains two constructors, one pure-virtual member function, one virtual function, two non-virtual functions and a virtual destructor. It also defines `Point` as a local `struct`. 
In addition, two `private:` member variables are defined.\n", + "\n", + "* Both of the member variables are guaranteed to be initialized whenever a derived class calls either one of the `Shape` constructors.\n", + "\n", + "* The two non-virtual member functions are not meant to be redefined in derived classes, and provide functionality for both the member variables that `Shape` defines (member functions of a base class **cannot** access member functions or variables of a derived class).\n", + "\n", + "* To reduce code duplication, an overload of `operator<<` which handles `Shape::Point`s is provided (above the derived classes which use it).\n", + "\n", + "* All of the derived classes provide an implementation of `draw()`. In addition, `Circle` provides its own implementation of `getSides()`.\n", + "\n", + "* The definition of `Triangle` is the simplest of those which derive from `Shape` and represents an equilateral triangle; public inheritance needs to be specified as for all the other derived classes in this hierarchy. This class definition is qualified with the `final` keyword, which means that no class can derive from `Triangle` (it is therefore the \"final\" class of that inheritance \"branch\"). The constructors both call a `Shape` constructor. A single member variable `side` is defined which is used by the definition of `draw()`.\n", + "\n", + "* The definition of `Circle` is very similar to that of `Triangle`; this is a common theme with class hierarchies. It redefines `getSides()` in addition to defining and using a member variable `radius`.\n", + "\n", + "* The definition of `Rectangle` defines two `protected:` member variables which are initialized by both constructors and output by `draw()`.\n", + "\n", + "* The definition of `Square` is, as for `Triangle`, qualified with `final`, and inherits from `Rectangle` (instead of directly from `Shape`). 
Since `Rectangle`'s constructor calls that for `Shape`, neither of `Square`'s constructors need to call it directly. It can access the `protected:` member variables of `Rectangle` within its definition of `draw()`.\n", + "\n", + "* In `main()` a `std::vector<Shape *>` (vector of raw pointers to `Shape`) is created, and then populated with the return values from `new`; no intermediate pointer is used. (Since `Shape` is an abstract type it is not possible to create a `std::vector<Shape>`, as these would need to be able to be default-initialized in order for the container to be created.)\n", + "\n", + "* The output from the range-for loop proves that polymorphism is being used, as the loop variable `s` is a (reference to a) pointer to the base class type of the hierarchy.\n", + "\n", + "* The member functions `draw()` and `getSides()` can be called from `main()` because they were declared `public:` in **all** of the base and derived classes.\n", + "\n", + "* The `Shape` objects are deleted as soon as they have been output; setting a pointer that is finished with to `nullptr` straightaway is good practice as it protects against the possibility of trying to access or delete a *dangling pointer*. Better practice still would be to use a `std::vector` of smart pointers.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Try to create an object of type `Shape()` that would normally use the first constructor (which takes a single `int` parameter). What do you find?\n", + "\n", + "* Move all the calls to `getCenter()` into `main()`.\n", + "\n", + "* Write an overload of `operator<<` which handles `const Shape&`. Does this need to be a `friend` function? Decide whether you think this is neat, or just being too clever.\n", + "\n", + "* Remove the `&` from the range-for loop in `main()`. What is the (single, invisible) difference about the program?\n", + "\n", + "* Write a (`virtual`) destructor for all of the classes besides `Shape()`, observing how the output changes. 
What do you learn about order of destruction in a class hierarchy? What happens if you omit `Shape`'s own destructor?\n", + "\n", + "* Try to derive an empty class from `Square`. What compilation error do you get?\n", + "\n", + "* Try removing the `const` qualifier from the overload of `getSides()` in `Circle`. Does the code still compile? What does this tell you about the effect of `const` on a member function's signature?\n", + "\n", + "* Try to derive from `Circle`. What happens if you try to override `draw()`?\n", + "\n", + "* Consider the best way (least code duplication) to add a member function `getArea()` (which returns a `double`) to `Shape`, and implement this for all classes in the hierarchy.\n", + "\n", + "[^1]: Grady Booch, Robert A. Maksimchuk, Michael W. Engle, Bobbi J. Young, Jim Conallen, Kelli A. Houston *Object-Oriented Analysis and Design with Applications* (3rd ed. Pearson, 2007, ISBN-13: 9780201895513)\n", + "\n", + "*All text and program code ©2019-2025 Richard Spencer, all rights reserved.*" + ] + } + ], + "metadata": { + "jupytext": { + "cell_metadata_filter": "-all" + }, + "kernelspec": { + "display_name": "C++ 23", + "language": "c++", + "name": "cpp23" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/jupyter-notebooks/10-templates-exceptions-lambdas-smart-pointers.ipynb b/jupyter-notebooks/10-templates-exceptions-lambdas-smart-pointers.ipynb new file mode 100644 index 0000000..6a4bdc9 --- /dev/null +++ b/jupyter-notebooks/10-templates-exceptions-lambdas-smart-pointers.ipynb @@ -0,0 +1,1197 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "b1fedb81", + "metadata": {}, + "source": [ + "# Templates, Exceptions, Lambdas, Smart Pointers\n", + "\n", + "## Types as parameters\n", + "\n", + "The use of the word *generics* usually implies two things: types as parameters and compiler-generated code. 
We have already met this concept without introducing it in detail; in the following definition, the type specified within angle-brackets is the *class template type parameter* for `std::vector`, which is used to describe the element type for the container:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a6ceb620", + "metadata": {}, + "outputs": [], + "source": [ + "vector<double> vd; // vd is an empty vector with its element type fixed" + ] + }, + { + "cell_type": "markdown", + "id": "affb1250", + "metadata": {}, + "source": [ + "What is not necessarily apparent is that the Standard Library does not actually contain a *specialization* of `vector` for the element type `double`. The code required for the *instance* `std::vector<double>` is generated automatically by the compiler, and this code is then compiled. Whilst it is true that the type parameter is optional in some circumstances, it must still be able to be deduced somehow at compile-time in order for the *template* (the code for generic `std::vector`) to be *instantiated* (turned into compilable code) and then itself compiled:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c3830ec0", + "metadata": {}, + "outputs": [], + "source": [ + "vector vi{ 0, 1, 2, 3, 4 }; // vi is actually a vector<int>, as deduced by the compiler" + ] + }, + { + "cell_type": "markdown", + "id": "131470ee", + "metadata": {}, + "source": [ + "Note that the types of all the elements in the `std::initializer_list` used to create `vi` must be the same, so that this type can be deduced unambiguously.\n", + "\n", + "Another way to understand generics is to compare them with other features that the language offers, such as function overloading. 
Consider a simple function called `average()` which returns the result of two `double`s (its two function parameters) added together and divided by two:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dc1dbf2b", + "metadata": {}, + "outputs": [], + "source": [ + "double average(double a, double b) {\n", + " return (a + b) / 2.0;\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "32250a3d", + "metadata": {}, + "source": [ + "We can overload this function for `int`s without violating the ODR:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6c69d572", + "metadata": {}, + "outputs": [], + "source": [ + "int average(int a, int b) {\n", + " return (a + b) / 2;\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "aa1ce9d4", + "metadata": {}, + "source": [ + "Notice that if we call `average()` with two `double`s, the return type is `double`. If we call it with two `int`s, the return type is `int`, possibly leading to rounding:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3005cc96", + "metadata": {}, + "outputs": [], + "source": [ + "auto a1 = average(3.5, 3.0); // a1 is a double with value 3.25\n", + "auto a2 = average(3, 4); // a2 is an int with value 3" + ] + }, + { + "cell_type": "markdown", + "id": "5f946f3b", + "metadata": {}, + "source": [ + "This can become unwieldy if averages of many different types are required (a new function needs to be written out for each one), and is inflexible (there is no way to specify the return type; this is not part of the function signature and so cannot be used to select which overloaded function is to be called).\n", + "\n", + "Let us rewrite `average()` as a function template (this code is probably not for production use as the Standard Library provides `std::midpoint`, although with this function the return type is always the same as for the parameters):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "05c18533", + 
"metadata": {}, + "outputs": [], + "source": [ + "template \n", + "U average(const T& a, const T& b) {\n", + " return (a + b) / U{ 2 };\n", + "}\n", + "\n", + "auto a3 = average(3.5, 3.0); // a3 is a double with value 3.25 (as for overloaded function)\n", + "auto a4 = average(3, 4); // a4 is a double with value 3.5 (change from overloaded function)\n", + "auto a5 = average(3, 4); // a5 is an int with value 3 (as for overloaded function)\n", + "auto a6 = average(3.5, 3); // a6 is a double with value 3.25 (as for overloaded function)" + ] + }, + { + "cell_type": "markdown", + "id": "d4e15044", + "metadata": {}, + "source": [ + "A couple of things to note about this syntax:\n", + "\n", + "* The use of `int` or `double` as function parameter types is replaced by the *template type parameter* `T`; this can be deduced from the input parameter types. (Template type parameters are often named: `T`, `U`, or `V`, or sometimes `T1`, `T2`, and so on.)\n", + "\n", + "* The use of `int`or `double` as the return type is replaced by the template parameter `U`, which defaults to type `double`; it does not need to be specified explicitly in the call to the template function.\n", + "\n", + "* Both `a` and `b` **must** be of the same type unless the type is specified, otherwise the template specification is ambiguous. 
They are passed by `const`-reference in case they need to be used with (large, expensive to copy) user-defined types in the future (this type of usage should be anticipated when the generic form of the function is written).\n", + "\n", + "* The calculation is very similar to those in the non-template versions, except for the *constructor syntax* `U{ 2 }`, which forces promotion to be applied to the division if `U` is of floating-point type.\n", + "\n", + "## Variable, function and class templates\n", + "\n", + "The simplest type of template is the variable template; here is an example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "95c5200c", + "metadata": {}, + "outputs": [], + "source": [ + "template <typename T = long double>\n", + "constexpr T pi{ 3.1415926535897932385L }; // note: long double literals end with L\n", + "\n", + "auto circ = pi<float> * 2.0f * 1.5f; // circ is of type float\n", + "auto area = pi<double> * 1.5 * 1.5; // area is of type double\n", + "auto pi2 = pi<> * 2.0L; // pi2 is of type long double" + ] + }, + { + "cell_type": "markdown", + "id": "094288b2", + "metadata": {}, + "source": [ + "Notice that angle brackets are **always** necessary when referring to variable templates (they may be empty if a default type is specified, as it is here), however explicit narrowing casts are not needed. The specializations `pi<float>` and `pi<double>` are useful where automatic promotion of the floating-point type in an expression is not desired.\n", + "\n", + "Template functions can be specified with one or more type parameters, as we have seen. Here is an example function `minimum()`, which returns the smaller of two values (production code could use `std::min` from the Standard Library):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "787c4a25", + "metadata": {}, + "outputs": [], + "source": [ + "template <typename T>\n", + "T minimum(const T& a, const T& b) {\n", + " return (a < b) ? a : b;\n", + "}\n", + "\n", + "auto m1 = minimum(3, 2.5); // Error! 
minimum<int> or minimum<double>?\n", + "auto m2 = minimum(-2, 1); // m2 is an int with value -2\n", + "auto m3 = minimum(-5.5, -6.5); // m3 is a double with value -6.5\n", + "auto m4 = minimum<double>(3.0, 4); // m4 is a double with value 3" + ] + }, + { + "cell_type": "markdown", + "id": "d0d2cd62", + "metadata": {}, + "source": [ + "Notice that we do not have to specify a type for `T` explicitly unless the deduction from the supplied arguments would be ambiguous (which is the case if the types of the two function arguments are different).\n", + "\n", + "Template classes typically have one or more members of the template type. Here is an example class which holds a type `T` (as a member variable) and a `bool` (which indicates whether the value is valid)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d452588c", + "metadata": {}, + "outputs": [], + "source": [ + "template <typename T = char>\n", + "class Opt {\n", + " bool valid{ false };\n", + " T value;\n", + "public:\n", + " Opt() = default;\n", + " Opt(const T& value) : value{ value }, valid{ true } {}\n", + " Opt& operator= (const T& new_value) {\n", + " value = new_value;\n", + " valid = true;\n", + " return *this;\n", + " }\n", + " bool hasValue() const {\n", + " return valid;\n", + " }\n", + " const T& get() const {\n", + " if (!valid) {\n", + " throw;\n", + " }\n", + " else {\n", + " return value;\n", + " }\n", + " }\n", + "};\n", + "\n", + "auto o1 = Opt{ 1.2 }; // T = double, valid = true\n", + "auto o2 = Opt{ 3 }; // T = int, valid = true\n", + "auto o3 = Opt{}; // T = char, valid = false\n", + "auto o4 = Opt<size_t>{}; // T = size_t, valid = false" + ] + }, + { + "cell_type": "markdown", + "id": "0c275758", + "metadata": {}, + "source": [ + "Some things to note about this program:\n", + "\n", + "* A default type for `T` is required as we make use of a defaulted default-constructor; `char` was chosen as the smallest type (`void` may be in theory preferable, but cannot be used as the compiler would encounter the construct 
`void value` when instantiating the class and produce an error).\n", + "\n", + "* The other constructor matches `T` from the type of `value`, storing this in the member variable `value`, and also sets `valid` to `true`.\n", + "\n", + "* The definition of `operator=` allows us to (re-)define a value (but not its type) that the `Opt` class will hold.\n", + "\n", + "* Calling member function `hasValue()` is always safe, yielding a `bool`. Calling `get()` on an `Opt` with no value immediately terminates the program (the keyword `throw` is explained later in this Chapter).\n", + "\n", + "Of course, this simple class is of limited practical use; if you need a type to be considered optionally valid without using a \"special\" value to indicate this, then making use of `std::optional` from the Standard Library is recommended.\n", + "\n", + "Member functions can be template functions, too. The following program defines a `Stringy` class with a `std::string` member, which can be initialized from another `std::string`, a `std::string_view` or a `const char *`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c844e12c", + "metadata": {}, + "outputs": [], + "source": [ + "class Stringy {\n", + " string str;\n", + "public:\n", + " template <typename T> explicit Stringy(T&& str)\n", + " : str{ str } {}\n", + " string get() const { return str; }\n", + "};\n", + "\n", + "Stringy sy1{ \"Star\" }; // initialize from const char *\n", + "Stringy sy2{ \"Wars\"s }; // initialize from std::string\n", + "Stringy sy3{ \"Trilogy\"sv }; // initialize from std::string_view\n", + "Stringy sy4{ 'V' }; // initialize from char\n", + "Stringy sy5{ 5 }; // Error! 
Attempt to narrow from int to char" + ] + }, + { + "cell_type": "markdown", + "id": "444680b4", + "metadata": {}, + "source": [ + "Notice that the constructor (only) is defined with both `template` and `explicit`, meaning a new constructor is (attempted to be) generated when called with different types, and takes a forwarding reference `T&&` (which binds to both l-values and r-values). This constructor does not modify its argument; it can also be safely used with temporaries (such as `\"Hello\"s + \" World\"`) and is efficient as the temporary is not copied. (An optimization to use `std::move` when called with a `std::string` (only) r-value is a possibility here, however this would entail writing a second `explicit` constructor.)\n", + "\n", + "## Standard exceptions, try, throw and catch\n", + "\n", + "Exceptions are a means of altering program flow (at run-time) and *propagating* error conditions from a callee (sub-)function to its caller function (potentially as far back as `main()`, thus bypassing the usual function return mechanism). Program flow is interrupted at the point where an exception is *thrown*, and resumes at the point the exception is *caught*, which is always within the scope of a caller function (again, possibly `main()`, the beginning of the function call stack). Any code designed to handle an exception being thrown is contained within a *try-block*; this is a block of code enclosed in curly braces immediately after the `try` keyword. This try-block **is** allowed to make function/method calls, implicitly enclosing these within the try-block scope. (Any exceptions thrown from functions declared `noexcept`, or thrown from outside of a try-block's scope will terminate the program.)\n", + "\n", + "An exception is thrown by using the `throw` keyword, followed by the object to be thrown. (If no object is specified then `std::terminate` is called, as for `noexcept` functions.) 
Usually, you will want to throw an instance of the `std::exception` hierarchy, although **any** user-defined or built-in type can be thrown.\n", + "\n", + "An exception is caught by a catch-block immediately following the try-block and `catch` statement. There can be multiple consecutive catch-blocks and the order of these **is** significant; the first type-matching catch-block (in the case of a class hierarchy this is the base class) will be entered. The caught object should be named by reference (for example, `std::exception&`) and this becomes the *current exception*. The `throw` keyword by itself has a special meaning in the context of a catch-block, where it means to rethrow the current exception object further back down the function call stack.\n", + "\n", + "The following program demonstrates use of the keywords `try`, `throw` and `catch`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "affe5ae4", + "metadata": {}, + "outputs": [], + "source": [ + "// 10-throw1.cpp : simple exception demonstration, throw and catch\n", + "\n", + "#include <iostream>\n", + "#include <exception>\n", + "using namespace std;\n", + "\n", + "template <typename T>\n", + "void getInteger(T& value) {\n", + " cout << \"Please enter an integer (0 to throw): \";\n", + " cin >> value;\n", + " if (!value) {\n", + " throw exception{};\n", + " }\n", + "}\n", + "\n", + "int main() {\n", + " long long v{};\n", + " try {\n", + " getInteger(v);\n", + " }\n", + " catch (...) 
{\n", + " cerr << \"Caught exception!\\n\";\n", + " return 1;\n", + " }\n", + " cout << \"Got value: \" << v << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "28e02e9b", + "metadata": {}, + "source": [ + "Some new features of C++ introduced by this program:\n", + "\n", + "* The `getInteger()` function prompts for an input `value`, and throws a `std::exception` if zero is entered.\n", + "\n", + "* Due to the fact that it is a function template, the variable `value` is returned by a reference parameter, so that its type can be automatically deduced.\n", + "\n", + "* Within `main()` the variable `v` must be defined outside the try-block as its value is needed after the end of the catch-block.\n", + "\n", + "* The try-block has just one statement, the call to `getInteger()`. If an exception is thrown by this function it is caught by the catch-block which follows immediately.\n", + "\n", + "* The catch-block begins with an ellipsis `(...)` meaning \"catch any type\". Without the `return` statement (which causes early exit from `main()`) control flow would fall through to the first line after the catch-block.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Omit the line with the `return` statement in the catch-block. Does the program work as expected? Is the output a valid value?\n", + "\n", + "* Now change the type of the function `getInteger()` to return `T` by **value**. What other change needs to be made?\n", + "\n", + "* Try to catch the `std::exception` by **reference**, calling the variable `e`, **instead** of utilizing an ellipsis.\n", + "\n", + "* Now experiment with by-value and by-pointer catching. Hint: the second of these will require `throw new ...`.\n", + "\n", + "The Standard Library `std::exception` class is designed to be inherited from, and in fact the Standard Library includes an extensive class hierarchy with `std::exception` as the base class. 
Almost always, you will want any custom exception classes you derive to inherit from `std::exception`; this implies a constructor taking a `const char *` or `const std::string&` (**not** a `std::string_view`) which initializes a (`private:`) member which can be examined in the catch-block using the member function `what()`.\n", + "\n", + "Catching exceptions by reference means that there is no possibility of *slicing*; this is where an object of derived class type is truncated to the size of its base class when passed/thrown by value. Catching by reference also means that there is no possibility of a memory leak, as is the case with catching a pointer. There may be multiple catch-blocks following the try-block, each introducing its own scope, and the first (and only) one of these which matches the type thrown is entered. For this reason catch-blocks should be organized with the derived class(es) first; your compiler will probably warn you if a base class precedes a derived class in the catch-block order.\n", + "\n", + "Exception support is useful even in small- to medium-sized projects, however two things need to be recognized: firstly, there is no way to return control flow back up the function stack to where the exception was originally thrown; and secondly, exception handling and use adds a significant performance overhead. 
In code required to be as fast as possible, exceptions should not be thrown and the functions should be declared with the `noexcept` keyword; this disables exception support, meaning that any use of `throw` simply calls `std::terminate`.\n", + "\n", + "The following program defines a simple *event loop* which waits for user input and performs various actions:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e6b7c9e0", + "metadata": {}, + "outputs": [], + "source": [ + "// 10-throw2.cpp : throw and catch exceptions from within and outside std::exception hierarchy\n", + "\n", + "#include <iostream>\n", + "#include <exception>\n", + "#include <stdexcept>\n", + "using namespace std;\n", + "\n", + "int throwing() {\n", + " cout << 1+R\"(\n", + "Please choose:\n", + "1) throw std::runtime_error\n", + "2) throw std::exception\n", + "3) throw int\n", + "4) quit\n", + "Enter 1-4: )\";\n", + " int option;\n", + " cin >> option;\n", + " switch(option) {\n", + " case 1:\n", + " throw runtime_error{\"std::runtime_error thrown\"};\n", + " case 2:\n", + " throw exception{};\n", + " case 3:\n", + " throw 99;\n", + " case 4:\n", + " return 1;\n", + " default:\n", + " cout << \"Error: unrecognized option\\n\";\n", + " }\n", + " return 0;\n", + "}\n", + "\n", + "int main() {\n", + " for (;;) {\n", + " int action{};\n", + " try {\n", + " action = throwing();\n", + " }\n", + " catch (runtime_error& e) {\n", + " cerr << \"Caught std::runtime_error! (\" << e.what() << \")\\n\";\n", + " }\n", + " catch (exception& e) {\n", + " cerr << \"Caught std::exception!\\n\";\n", + " }\n", + " catch (...) {\n", + " cerr << \"Caught something other than std::exception! 
Quitting.\\n\";\n", + " return 1;\n", + " }\n", + " if (action) {\n", + " break;\n", + " }\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "246831c3", + "metadata": {}, + "source": [ + "A few new things to note about this program:\n", + "\n", + "* The function `throwing()` always returns control to `main()` from all paths whatever the user inputs.\n", + "\n", + "* It works by throwing a `std::runtime_error` (which is derived from `std::exception`), throwing a plain `std::exception`, throwing an `int` or returning an `int`.\n", + "\n", + "* There is no need for `break` statements within the `switch` as no `case:` conditions can fall through (except for `default:`, which never needs `break` as it should be the final clause).\n", + "\n", + "* The order of the catch-blocks is significant, with ellipsis last and (derived class) `std::runtime_error` first.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Modify `main()` so that it returns to the environment the value thrown as `int`.\n", + "\n", + "* Create a class type derived from `std::runtime_error` called `FatalError`; it should have an `int` as a `public:` data member which is the value returned to the environment from `main()` when a `FatalError` is caught there. Write a suitable catch-block in the correct place, and modify the function `throwing()`.\n", + "\n", + "* Now make the `int` a `private:` data member, and utilize a getter function called `getRC()`.\n", + "\n", + "## Function objects\n", + "\n", + "It is possible to overload the *function call operator* `operator()` for `struct`s and `class`es; this enables objects created from them to masquerade as functions. Sometimes these objects are called *functors*; essentially this means that they are *callable* in the same sense as free functions, member functions and *lambdas* (which are discussed later in this Chapter). 
The term functor can be used to describe both the `struct` or `class` definition and the instance objects it creates.\n", + "\n", + "The following program demonstrates overloading `operator()` (it is also possible to overload on several different parameter types, if needed):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e1703bd5", + "metadata": {}, + "outputs": [], + "source": [ + "// 10-functor1.cpp : simple function object demonstration\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "struct Average {\n", + " int operator()(int a, int b) {\n", + " cout << \"Calculating average...\\n\";\n", + " return (a + b) / 2;\n", + " }\n", + "};\n", + "\n", + "int main() {\n", + " Average a;\n", + " cout << \"Please enter two integers:\\n\";\n", + " int x{}, y{};\n", + " cin >> x >> y;\n", + " auto avg = a(x, y);\n", + " cout << \"The average is: \" << avg << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "aef7ae72", + "metadata": {}, + "source": [ + "**Experiment:**\n", + "\n", + "* Define another functor which calculates the average of two `double`s, giving it a different name.\n", + "\n", + "* Move these functor definitions to within `main()`. Does the code still compile?\n", + "\n", + "More usefully, functors can store state in data members, preserving it between (object-as-function) calls. 
The following program shows a function object definition which can calculate the (running) minimum, maximum and average of a `std::vector`; in fact any container type for which `begin()` and `end()` are defined could be used:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3298cc6c", + "metadata": {}, + "outputs": [], + "source": [ + "// 10-functor2.cpp : function object maintaining state\n", + "\n", + "#include <algorithm>\n", + "#include <iostream>\n", + "#include <vector>\n", + "using namespace std;\n", + "\n", + "struct MinMaxAvg {\n", + " void operator()(int i) {\n", + " if (first) {\n", + " min = max = avg = i;\n", + " first = false;\n", + " return;\n", + " }\n", + " if (i < min) {\n", + " min = i;\n", + " }\n", + " if (i > max) {\n", + " max = i;\n", + " }\n", + " avg = ((avg * num) + i) / (num + 1);\n", + " ++num;\n", + " }\n", + " int min, max, num{ 1 };\n", + " double avg;\n", + " bool first{ true };\n", + "};\n", + "\n", + "int main() {\n", + " vector v{ 3, 5, 2, 6, 2, 4 };\n", + " MinMaxAvg f = for_each(begin(v), end(v), MinMaxAvg{});\n", + " cout << \"Min: \" << f.min << \" Max: \" << f.max\n", + " << \" Avg: \" << f.avg << \" Num: \" << f.num << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "e581c54e", + "metadata": {}, + "source": [ + "A few points to note about this program:\n", + "\n", + "* Only `num` and `first` are required to be set before the `std::for_each()` call; we have used universal initialization of the member variables, but this could also be achieved by using a (default-)constructor.\n", + "\n", + "* The assignment of `f` (a `MinMaxAvg` function object) is the result of the call to `std::for_each()`, being the modified (default-constructed) third parameter.\n", + "\n", + "* The function template `std::for_each()` call decomposes to the equivalent of: `auto f = MinMaxAvg{}; f(3); f(5); f(2); f(6); f(2); f(4);`. 
Of course, a range-for loop could be used to accomplish the same thing, but the *logic* within the functor's `operator()` would have to be written (or repeated) within the body of the loop.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Replace `vector` with `auto`. Does the code still compile? What type is `v`?\n", + "\n", + "* Replace `auto v` with `const int v[] =`. Does the code still compile now?\n", + "\n", + "* Turn this functor into a template which can be instantiated with different types for `min`/`max` and `avg`.\n", + "\n", + "## Lambdas\n", + "\n", + "A *lambda* (or sometimes \"lambda function\") is a callable entity which works much like a free function, whilst having some of the properties of a local variable. Unlike a `constexpr` function it can access global state, such as `cout`. Unlike free functions, lambdas can access (or *capture*) variables from within the scope in which they are defined. Lambdas do have a lot in common with functors; in fact, compilers will implement lambdas by using a functor \"behind the scenes\". Interestingly, it has been said that if you fully understand C++ lambdas, you also understand much of C++, so knowledge of them is very useful.\n", + "\n", + "Lambda expressions begin with an opening square bracket `[` (in a different context to array syntax), while the body of the lambda is enclosed within curly braces, as for a regular function. An **optional** parameter list enclosed in parentheses goes between the closing square bracket and the opening curly brace. The minimalist lambda is therefore simply `[]{}`. 
(The return type being `void` in this case is implicit.)\n", + "\n", + "The following program demonstrates a very simple lambda:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7b730671", + "metadata": {}, + "outputs": [], + "source": [ + "// 10-lambda1.cpp : simple lambda which produces output\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " auto l = []{ cout << \"Lambda says Hi!\\n\"; };\n", + " \n", + " l();\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "1faf56a3", + "metadata": {}, + "source": [ + "A few points to note:\n", + "\n", + "* Lambdas can be assigned to variables, in this case the variable `l`. (Actually, other types of function can be assigned to variables too, using the `&` address-of operator.)\n", + "\n", + "* Every lambda expression has its own unique, unnamed *closure type* generated by the compiler. As this type cannot be written out by hand, the use of `auto` here is almost universal.\n", + "\n", + "* A semi-colon is necessary after both statement(s) in the body of the lambda **and** after the closing curly brace of the body (as with a `struct` or `class` definition).\n", + "\n", + "* Lambdas can be easily identified where the sequence `= [` occurs, which is different (to both compilers and humans) from array subscript syntax.\n", + "\n", + "* A lambda is *invoked* by stating its name with (possibly empty) parentheses and a semi-colon, with exactly the same semantics as a free function call.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Add an empty parameter list between the closing square bracket and opening curly brace of the lambda definition. Does the program still compile?\n", + "\n", + "* Omit the line `l();` and try to compile the program. What is its output now?\n", + "\n", + "* Define and assign `l` on separate lines, omitting the use of `auto`. 
Hint: there are at least four ways of doing this, some more tricky than others; they involve using: `typedef`, `using`, `std::function` or `decltype`. You may need to do some research.\n", + "\n", + "The following program calls its lambda with a parameter, altering its output:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "01a46e6e", + "metadata": {}, + "outputs": [], + "source": [ + "// 10-lambda2.cpp : another simple lambda which produces output\n", + "\n", + "#include <iostream>\n", + "#include <string_view>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " auto l = [](string_view s){ cout << \"Lambda says \" << s << '\\n'; };\n", + " \n", + " l(\"Hola\");\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "47f07bae", + "metadata": {}, + "source": [ + "A couple of things to note about this program:\n", + "\n", + "* The parameter list is non-empty and takes the same format as that of a regular free or member function.\n", + "\n", + "* Parameters can be accepted by value or reference.\n", + "\n", + "* The match between argument `\"Hola\"` and parameter `string_view s` is performed at compile-time.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Change the program to accept `s` by reference. What other (small) change needs to be made?\n", + "\n", + "* Change the program so `l` is called with run-time user input.\n", + "\n", + "* Try to implement this lambda as a functor.\n", + "\n", + "* Change `string_view` to `auto`. Does the program still compile? What is the type of `s` now?\n", + "\n", + "* Try to implement this new version of the lambda as a functor. (Hint: you will need to use template syntax to do this properly.) Experiment with both this functor and the lambda version using different literal types (not just string-like types).\n", + "\n", + "Lambdas can return a value in the same way as for free and member functions; the type of this value is deduced from the `return` statement(s) in the body of the lambda. 
(Lambdas **can** have more than one flow and return path, but all `return` statements must have the same type.) When defining the lambda after an `auto` keyword, no change needs to be made in order to specify that the lambda returns a value.\n", + "\n", + "The following program implements a lambda which returns the average of two numbers:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "424050d1", + "metadata": {}, + "outputs": [], + "source": [ + "// 10-lambda3.cpp : lambda function which calculates average of two values\n", + "\n", + "#include <iostream>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " auto l = [](int a, int b) {\n", + " cout << \"Calculating average...\\n\";\n", + " return (a + b) / 2;\n", + " };\n", + "\n", + " cout << \"Please enter two integers:\\n\";\n", + " int x{}, y{};\n", + " cin >> x >> y;\n", + " auto avg = l(x, y);\n", + " cout << \"The average is: \" << avg << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "eb56f5cb", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* This program is essentially an adaptation of `10-functor1.cpp`; it may be valuable to review these two programs side-by-side.\n", + "\n", + "* The lambda `l` is defined over multiple lines; whitespace conventions for doing this vary, however the closing curly brace and semi-colon are often on a line by themselves.\n", + "\n", + "* The value returned by `l(x, y)` is stored in the variable `avg` which is declared `auto`. It is important to recognize that this usage is different from that of `l`, also being declared `auto`.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Change the lambda calculation to `/ 2.0`. Which variable has its type changed as a result?\n", + "\n", + "* Instead, change the parameter list in the lambda to declare `a` and `b` as `auto`. 
Does this alter the behavior of the program when the average would be fractional?\n", + "\n", + "* Now change the type definition of `x` and `y` to `double`. Experiment with integer and fractional numbers and results. Hint: you should also change the `cout` message. What does this tell you about *generic lambdas* such as this?\n", + "\n", + "All of the lambdas introduced so far have been *stateless lambdas*. Whilst not necessarily pure functions (they can modify global state), they have only been able to operate on the parameters passed in. They have also only been able to return a single entity (although this could usefully be a `std::pair`, `std::tuple` or `struct`).\n", + "\n", + "It is possible to enable lambdas to read from and write to variables in the (immediately) enclosing scope; this is the scope of the function in which the lambda is defined. Two things worthy of note stem from this: firstly, variables captured in this way are analogous to member variables of a functor, and secondly, the scope in which a lambda is defined is not necessarily the same as the one in which it is called. 
Care must be taken when capturing variables by reference not to cause dangling references; the captured variable must not have passed out of scope when the lambda is invoked.\n", + "\n", + "The following program revisits the second functor example, calculating the minimum, maximum and mean average of the elements in a container:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3b673333", + "metadata": {}, + "outputs": [], + "source": [ + "// 10-lambda4.cpp : lambda accessing scoped variables by value and reference\n", + "\n", + "#include <algorithm>\n", + "#include <iostream>\n", + "#include <vector>\n", + "using namespace std;\n", + "\n", + "int main() {\n", + " vector v{ 3, 5, 2, 6, 2, 4 };\n", + " int min, max, num{ 1 };\n", + " double avg;\n", + " bool first{ true };\n", + " \n", + " auto l = [&](int i) {\n", + " if (first) {\n", + " min = max = avg = i;\n", + " first = false;\n", + " return;\n", + " }\n", + " if (i < min) {\n", + " min = i;\n", + " }\n", + " if (i > max) {\n", + " max = i;\n", + " }\n", + " avg = ((avg * num) + i) / (num + 1);\n", + " ++num;\n", + " };\n", + "\n", + " for_each(begin(v), end(v), l);\n", + " cout << \"Min: \" << min << \" Max: \" << max\n", + " << \" Avg: \" << avg << \" Num: \" << num << '\\n';\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "05e3bd34", + "metadata": {}, + "source": [ + "A few important things to note about this program:\n", + "\n", + "* This program is essentially an adaptation of `10-functor2.cpp`; it may be valuable to review these two programs side-by-side.\n", + "\n", + "* The former member variables of the functor have been defined before the lambda in the same scope as the container.\n", + "\n", + "* The ampersand inside the square brackets before the lambda's parameter list (`[&]`) indicates by-reference capture, which in this case means that all the variables in the scope of `main()` are accessible within the body of the lambda. 
(This includes `v` but since no copy is made this does not harm performance. It is possible to specify exactly which variables to capture, using a *capture list* `[&min,&max,&num,&avg,&first]` in order to avoid capturing `v`.)\n", + "\n", + "* The body of the lambda and its parameter list are both the same as for the overloaded `operator()` of the functor.\n", + "\n", + "* The lambda object `l` can be passed into `for_each()` by value (`std::ref` is not needed).\n", + "\n", + "* The variables `min`, `max`, `avg` and `num` can be accessed directly after the lambda has modified them.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Change the parameter list of the lambda to accept a variable declared `auto`. Does the program still compile and run?\n", + "\n", + "* Change the types of `min` and `max` to double, along with the elements assigned to `v`.\n", + "\n", + "* Change the capture to by-value (`[=]`). Does the program compile now?\n", + "\n", + "* Change the capture to be empty. What error messages do you get?\n", + "\n", + "* Define and assign to a second `std::vector` named `v2`. What other changes need to be made to the program in order to be able to call `std::for_each()` again for `v2`?\n", + "\n", + "## Smart pointers\n", + "\n", + "Smart pointers are entities which bind heap objects to scoped lifetimes. The advantage over using \"plain\" (or \"naked\") `new`/`delete` is that all return paths from a function or sub-scope are automatically covered, even if an exception is thrown. There are three C++ smart pointer classes, they are: `std::unique_ptr`, `std::shared_ptr` and `std::weak_ptr`.\n", + "\n", + "The simplest smart pointer is `std::unique_ptr` which encapsulates the most common functionality associated with a raw pointer, that is *exclusive ownership*. 
The following program, which does not delete `p2` if called with arguments, demonstrates its use:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aa24757b", + "metadata": {}, + "outputs": [], + "source": [ + "// 10-smartptr1.cpp : use of unique_ptr\n", + "\n", + "#include <iostream>\n", + "#include <memory>\n", + "#include <string>\n", + "#include <string_view>\n", + "using namespace std;\n", + "\n", + "class Simple {\n", + " string str;\n", + "public:\n", + " Simple(string_view s) : str{s}\n", + " { cout << \"Simple(): \" << str << '\\n'; }\n", + " ~Simple()\n", + " { cout << \"~Simple(): \" << str << '\\n'; }\n", + "};\n", + "\n", + "int main(int argc, const char *argv[]) {\n", + " unique_ptr<Simple> p1{ new Simple(\"p1\") };\n", + " Simple *p2 = new Simple(\"p2\");\n", + " {\n", + " auto p3 = make_unique<Simple>(\"p3\");\n", + " if (argc > 1) {\n", + " return 1;\n", + " }\n", + " delete p2;\n", + " p2 = nullptr;\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "6bd795be", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* A `std::unique_ptr` is initialized with a pointer to a heap object. The pointer type needs to be provided in case it needs to be a (different) base class type, such as a `std::unique_ptr` to a base class initialized with a `new Triangle()`.\n", + "\n", + "* This initialization has a direct analogy with initialization of raw pointers.\n", + "\n", + "* An alternative way to create a `std::unique_ptr` is by using the helper function `std::make_unique` as used to create `p3`. This is the preferred way in many cases due to its exception safety, so you will find this in code.\n", + "\n", + "* A `std::unique_ptr` created within a sub-scope is destroyed at the end of the sub-scope. 
Otherwise, as with all stack objects, they are destroyed in the reverse order in which they were created.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Modify the above program so that `p2` is always deleted, regardless of the value of `argc`. 
Hint: keep it as a raw pointer.\n", + "\n", + "* Now modify the program so that a `std::exception` is thrown if `argc > 1`. Are the destructors called?\n", + "\n", + "* Modify the program `09-shape.cpp` to use a `vector` of `std::unique_ptr`s, making any other necessary changes.\n", + "\n", + "It is possible to specify *custom deleters* for `std::unique_ptr`s, which can be any callable object which encapsulates the correct behavior to destroy the object. The following example demonstrates this for a `FILE*` (in case you didn't know, the C library functions `fopen()` and `fclose()` return and accept a `FILE*` pointer):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e70bcb05", + "metadata": {}, + "outputs": [], + "source": [ + "// 10-smartptr2.cpp : encapsulate a FILE* in a unique_ptr\n", + "\n", + "#include <cstdio>\n", + "#include <iostream>\n", + "#include <memory>\n", + "using namespace std;\n", + "\n", + "int main(int argc, const char *argv[]) {\n", + " if (argc != 2) {\n", + " cerr << \"Syntax: \" << argv[0] << \" \\n\";\n", + " return 1;\n", + " }\n", + "\n", + " unique_ptr<FILE, decltype(&fclose)> fp{ fopen(argv[1], \"rb\"), fclose };\n", + "\n", + " if (fp) {\n", + " int c;\n", + " while ((c = fgetc(fp.get())) != EOF) {\n", + " putchar(c);\n", + " }\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "cd545911", + "metadata": {}, + "source": [ + "A couple of things to note about this program:\n", + "\n", + "* Here `std::make_unique` cannot be used, nor can the type of the `std::unique_ptr` be deduced automatically. 
Also, we have to specify the type of the deleter explicitly: `decltype(&fclose)` provides us with a function pointer type.\n", + "\n", + "* The member function `get()` is used to access the raw pointer needed for the call to the C Library function `fgetc()`.\n", + "\n", + "The member function `reset()` changes the object owned by the `std::unique_ptr`; calling `reset(nullptr)` releases and destroys the object early. 
Also, `std::unique_ptr`s cannot be copied as this would make no semantic sense (a deep copy cannot be initiated by a pointer-to-object, and a shallow copy would mean either shared ownership or dangling pointers). They can however be moved, either explicitly using `std::move` or as a return value from a function (they are very useful as return types for factory functions).\n", + "\n", + "The next smart pointer type is `std::shared_ptr`; this allows an object to become *reference counted* and only deletes it when the **last** pointer referring to it goes out of scope. The following program creates a `Simple` object in a sub-scope, yet destroys it in an outer scope:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2bcb0b82", + "metadata": {}, + "outputs": [], + "source": [ + "// 10-smartptr3.cpp : use of shared_ptr\n", + "\n", + "#include <iostream>\n", + "#include <memory>\n", + "#include <string>\n", + "#include <string_view>\n", + "using namespace std;\n", + "\n", + "class Simple {\n", + " string str;\n", + "public:\n", + " Simple(string_view s) : str{s}\n", + " { cout << \"Simple(): \" << str << '\\n'; }\n", + " ~Simple()\n", + " { cout << \"~Simple(): \" << str << '\\n'; }\n", + "};\n", + "\n", + "int main() {\n", + " cout << \"main(): 1\\n\";\n", + " shared_ptr<Simple> p1{ new Simple(\"p1\") };\n", + " cout << \"main(): 2\\n\";\n", + " {\n", + " cout << \"main(): 3\\n\";\n", + " auto p2 = make_shared<Simple>(\"p2\");\n", + " cout << \"main(): 4\\n\";\n", + " p1 = p2;\n", + " cout << \"main(): 5\\n\";\n", + " }\n", + " cout << \"main(): 6\\n\";\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "2b513c1f", + "metadata": {}, + "source": [ + "A few things to note about this program:\n", + "\n", + "* Every other statement in `main()` produces output, so the exact workings of `std::shared_ptr` are demonstrated.\n", + "\n", + "* The use of `std::make_shared` is shown as an alternative to using a raw pointer to initialize a 
`std::shared_ptr`.\n", + "\n", + "* Firstly, `p1` is created in the scope of `main()`.\n", + "\n", + "* 
Secondly, `p2` is created in a sub-scope.\n", + "\n", + "* Thirdly, `p2` is assigned to `p1`, thus object `\"p1\"` is deleted. Also, the lifetime of object `\"p2\"` is **extended** from the sub-scope to that of `main()`.\n", + "\n", + "* Then, the sub-scope exits, destroying `p2`; however, the object it points to stays alive because `p1` points to it.\n", + "\n", + "* Finally, `main()` exits, destroying `p1` and with it object `\"p2\"`. Thus `\"p1\"` and `\"p2\"` are destroyed in the **same** order in which they were initialized, unlike for `std::unique_ptr` where it would always be in reverse order.\n", + "\n", + "Any `std::shared_ptr` object can be passed by **value** to a function, implying a copy of the `std::shared_ptr` and a sharing of ownership. Also a container of `std::shared_ptr`s can share ownership with named `std::shared_ptr`s, or even another container of `std::shared_ptr`s.\n", + "\n", + "Some programming tasks involve use of pointers, often in containers, where the pointee needs to point back to the pointer. Use of `std::shared_ptr` may be unsuitable in this case because of the *dependency cycle* created. The key symptom of this is objects not being deleted within the lifetime of the program because the reference count cannot drop to zero for either the pointer or pointee. 
An example of subtly incorrect code is shown in the program fragment below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20ef7aa5", + "metadata": {}, + "outputs": [], + "source": [ + "struct Pupil; // forward declaration to allow shared_ptr in definition of Class\n", + "\n", + "struct Class {\n", + " int room;\n", + " string subject, teacher_name;\n", + " vector<shared_ptr<Pupil>> pupils;\n", + "};\n", + "\n", + "struct Pupil {\n", + " string name;\n", + " vector<shared_ptr<Class>> subjects; // compiles but is incorrect!\n", + "};\n", + "\n", + "vector<shared_ptr<Class>> AllClasses;\n", + "vector<shared_ptr<Pupil>> AllPupils;" + ] + }, + { + "cell_type": "markdown", + "id": "1f1d2710", + "metadata": {}, + "source": [ + "This code will probably compile without a warning being issued, and `Class` and `Pupil` objects can be created and made to point to each other. However, when the containers `AllClasses` and `AllPupils` are destroyed or go out of scope, this does not cause the destructors of the `Class` and `Pupil` objects to be called correctly; this is caused by the semantics being wrong as a `Class` **cannot** \"own\" its `Pupil`s if the `Pupil`s **also** \"own\" the `Class`. Luckily there is a third smart pointer type `std::weak_ptr`, a non-owning smart pointer which can be initialized from a `std::shared_ptr`. A `std::weak_ptr` cannot be dereferenced directly, but has a member function `lock()` which returns a suitable `std::shared_ptr` (within the scope of the call to `lock()`) which **can** be dereferenced. 
The change to the code is simple (assuming that `Class` is desired to own its `Pupil`s, rather than the other way about), and is shown below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "84b31302", + "metadata": {}, + "outputs": [], + "source": [ + "struct Pupil {\n", + " string name;\n", + " vector<weak_ptr<Class>> subjects;\n", + "};" + ] + }, + { + "cell_type": "markdown", + "id": "f3c694d4", + "metadata": {}, + "source": [ + "The corrected sample code is replicated in the complete program shown below; lambdas have been shown (instead of `friend` or `static` member functions) as ways to create and manipulate the `Class` and `Pupil` types, thus the program has only two global `struct` definitions and a fairly large `main()` function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56916ee1", + "metadata": {}, + "outputs": [], + "source": [ + "// 10-pupils.cpp : use of shared_ptr and weak_ptr to avoid dependency cycle\n", + "\n", + "#include <algorithm>\n", + "#include <iostream>\n", + "#include <memory>\n", + "#include <string>\n", + "#include <string_view>\n", + "#include <vector>\n", + "using namespace std;\n", + "\n", + "struct Pupil;\n", + "\n", + "struct Class {\n", + " Class(int r, string_view s, string_view t)\n", + " : room{ r }, subject{ s }, teacher_name{ t } {}\n", + " int room;\n", + " string subject, teacher_name;\n", + " vector<shared_ptr<Pupil>> pupils;\n", + "};\n", + "\n", + "struct Pupil {\n", + " Pupil(string_view n) : name{ n } {}\n", + " string name;\n", + " vector<weak_ptr<Class>> classes;\n", + "};\n", + "\n", + "int main() {\n", + " vector<shared_ptr<Class>> AllClasses{\n", + " make_shared<Class>(101, \"English\", \"Mr White\"),\n", + " make_shared<Class>(150, \"Math\", \"Miss Black\")\n", + " };\n", + " vector<shared_ptr<Pupil>> AllPupils{\n", + " make_shared<Pupil>(\"Paul\"),\n", + " make_shared<Pupil>(\"Percy\"),\n", + " make_shared<Pupil>(\"Perry\"),\n", + " 
make_shared<Pupil>(\"Phoebe\"),\n", + " make_shared<Pupil>(\"Penny\"),\n", + " make_shared<Pupil>(\"Patricia\")\n", + " };\n", + "\n", + " auto add_to_class = [&](string_view c, string_view p) {\n", + " auto iter_c = 
find_if(cbegin(AllClasses), cend(AllClasses),\n", + " [&](auto ec){ return c == ec->subject; });\n", + " auto iter_p = find_if(cbegin(AllPupils), cend(AllPupils),\n", + " [&](auto ep){ return p == ep->name; });\n", + " if (iter_c != cend(AllClasses) && iter_p != cend(AllPupils)) {\n", + " (*iter_c)->pupils.push_back(*iter_p);\n", + " (*iter_p)->classes.push_back(*iter_c);\n", + " }\n", + " else {\n", + " cerr << \"Could not add \" << p << \" to \" << c << '\\n';\n", + " }\n", + " };\n", + "\n", + " add_to_class(\"English\", \"Paul\");\n", + " add_to_class(\"English\", \"Percy\");\n", + " add_to_class(\"English\", \"Phoebe\");\n", + " add_to_class(\"English\", \"Penny\");\n", + " add_to_class(\"Math\", \"Paul\");\n", + " add_to_class(\"Math\", \"Perry\");\n", + " add_to_class(\"Math\", \"Phoebe\");\n", + " add_to_class(\"Math\", \"Patricia\");\n", + "\n", + " AllClasses.emplace_back(make_shared<Class>(260, \"IT\", \"Mrs Brown\"));\n", + " add_to_class(\"IT\", \"Percy\");\n", + " add_to_class(\"IT\", \"Perry\");\n", + "\n", + " for (const auto& c : AllClasses) {\n", + " cout << \"Room: \" << c->room << \"\\nSubject: \" << c->subject\n", + " << \"\\nTeacher: \" << c->teacher_name << \"\\nPupils: \";\n", + " for (const auto& p : c->pupils) {\n", + " cout << p->name << ' ';\n", + " }\n", + " cout << '\\n';\n", + " }\n", + " \n", + " for (;;) {\n", + " cout << \"Please enter a pupil name (blank line to quit): \";\n", + " string s;\n", + " getline(cin, s);\n", + " if (s.empty()) {\n", + " break;\n", + " }\n", + " auto iter_p = find_if(cbegin(AllPupils), cend(AllPupils),\n", + " [&](auto ep){ return s == ep->name; });\n", + " if (iter_p != cend(AllPupils)) {\n", + " cout << \"Classes: \";\n", + " for (const auto& c : (*iter_p)->classes) {\n", + " if (auto pc = c.lock(); pc) {\n", + " cout << pc->subject << ' ';\n", + " }\n", + " }\n", + " cout << '\\n';\n", + " }\n", + " else {\n", + " cout << \"Name not recognized!\\n\";\n", + " }\n", + " }\n", + "}" + ] + }, + { + 
"cell_type": 
"markdown", + "id": "ba08f6c0", + "metadata": {}, + "source": [ + "This is one of the larger programs we have seen, and covers much of the contents of this Chapter:\n", + "\n", + "* Due to the fact that a `std::vector` of `std::shared_ptr` is used, a factory function needs to be used to generate the elements for `AllClasses` and `AllPupils`. The function template `std::make_shared` (introduced in `10-smartptr3.cpp`) is used, which forwards its arguments to the constructor for the type specified within the angle brackets.\n", + "\n", + "* The Standard Library `std::find` cannot easily be used to search for a matching `std::shared_ptr` element, so `std::find_if` is used (twice) instead in the lambda function `add_to_class()`. This lambda needs to capture `AllClasses` and `AllPupils` by reference, and iterates through these with a *predicate* lambda that returns a `bool`. It is worthwhile becoming familiar with this syntax; `std::find()` was used in Chapter 7.\n", + "\n", + "* The variables `iter_c` and `iter_p` assume value `cend(AllClasses)` and `cend(AllPupils)` respectively if the corresponding predicate never returns `true`. In either case the lambda `add_to_class()` reports an error and makes no changes.\n", + "\n", + "* The linking of the elements of `AllClasses` and `AllPupils` is done by dereferencing `iter_c` and `iter_p` to produce a `std::shared_ptr` in each case. These are then dereferenced to obtain the container data members `pupils` and `classes` respectively, which are invoked with `push_back()`.\n", + "\n", + "* A range-for loop cycles through `AllClasses` printing out all of the `Class`es and their `Pupil`s.\n", + "\n", + "* The last interactive part of the program accepts a `Pupil` name and cycles through the data member `classes` (a `std::vector` of `std::weak_ptr`s). A `std::shared_ptr` to each `Class` is obtained via the `lock()` member function of each `std::weak_ptr`. 
The syntax may be slightly tricky to follow, but does not contain anything not seen before.\n", + "\n", + "**Experiment:**\n", + "\n", + "* Add some more classes and pupils to the program, and check that they are associated correctly. Experiment with `AllPupils.push_back()` and `AllClasses.push_back()`, as well as removing (using `find()` and `erase()`) from these containers.\n", + "\n", + "* Add a destructor to `Class` and `Pupil` which produces some output. Ensure that **all** objects in the program are correctly deleted.\n", + "\n", + "* Change the lambda captures from `[&]` to specify exactly which variables to capture. Experiment with by-reference and by-value capture for each variable.\n", + "\n", + "* Rewrite the program to use a member function `add_to_class()` instead of a lambda. This should be a member of `Pupil` and take a reference parameter of `std::shared_ptr`. You will also need to find a way for this function to access `AllClasses` and `AllPupils`, as well as a way of identifying an element of `AllPupils` by name within `main()`. (Hint: a `static` data member as a *class variable* may be useful here.)\n", + "\n", + "* Make it impossible to add the same pupil to the same class twice.\n", + "\n", + "* Change `Pupil` and `Class` to be `class`es instead of `struct`s, with `private:` data members. Hint: you will need to write some `public:` getters.\n", + "\n", + "*All text and program code ©2019-2025 Richard Spencer, all rights reserved.*" + ] + } + ], + "metadata": { + "jupytext": { + "cell_metadata_filter": "-all" + }, + "kernelspec": { + "display_name": "C++ 23", + "language": "c++", + "name": "cpp23" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/jupyter-notebooks/README.md b/jupyter-notebooks/README.md new file mode 100644 index 0000000..ed62591 --- /dev/null +++ b/jupyter-notebooks/README.md @@ -0,0 +1,14 @@ +## Jupyter Notebooks + +The complete Tutorial is available as a set of Jupyter Notebooks listed above. 
Please note that they are intended for use with JupyterLab—it is possible to preview them on GitHub but the interactive features will not be present. + +Installation of JupyterLab under Debian Linux is as follows: + +```bash +sudo apt update +sudo apt install jupyter jupyterlab # Note: many large dependencies +sudo pip install --break-system-packages jupyter-cpp-kernel +sudo apt install build-essential g++ # Note: minimum to compile C++ +cd /path/to/notebooks +jupyter lab # Note: copy weblink token from console and open in browser +``` diff --git a/modules/01-hellow.cpp b/modules/01-hellow.cpp index ae0478e..4696156 100644 --- a/modules/01-hellow.cpp +++ b/modules/01-hellow.cpp @@ -4,5 +4,5 @@ import std; using namespace std; int main() { - cout << "Hello, World!" << '\n'; + println("Hello, World!"); } diff --git a/modules/01-title.cpp b/modules/01-title.cpp index 99213c6..6d7cd74 100644 --- a/modules/01-title.cpp +++ b/modules/01-title.cpp @@ -4,12 +4,12 @@ import std; using namespace std; int main() { - cout << 1+R"( + print(1+R"( Alice's Adventures In Wonderland by LEWIS CARROLL -)"; +)"); } diff --git a/modules/02-assign.cpp b/modules/02-assign.cpp index 8a7d41e..41b84a2 100644 --- a/modules/02-assign.cpp +++ b/modules/02-assign.cpp @@ -6,9 +6,9 @@ using namespace std; int main() { int i = 1, j = 2; unsigned k; - cout << "(1) i = " << i << ", j = " << j << ", k = " << k << '\n'; + println("(1) i = {}, j = {}, k = {}", i, j, k); i = j; j = 3; k = -1; - cout << "(2) i = " << i << ", j = " << j << ", k = " << k << '\n'; + println("(2) i = {}, j = {}, k = {}", i, j, k); } diff --git a/modules/02-constants.cpp b/modules/02-constants.cpp index 0ad0716..0d047cb 100644 --- a/modules/02-constants.cpp +++ b/modules/02-constants.cpp @@ -7,7 +7,6 @@ const double PI = 3.14159265358979; int main() { auto const APPROX_E = 3; - cout << "pi is almost exactly " << PI - << "e is approximately " << APPROX_E - << '\n'; + println("pi is almost exactly {}, while e is approximately 
{}", + PI, APPROX_E); } diff --git a/modules/02-constexpr.cpp b/modules/02-constexpr.cpp index 77b5447..6e138c1 100644 --- a/modules/02-constexpr.cpp +++ b/modules/02-constexpr.cpp @@ -3,14 +3,16 @@ import std; using namespace std; -const double PI1 = acos(-1.0); // acos is not (yet) constexpr +// Note: currently, not all compilers mark `acos` as a +// constexpr function in cmath. The following line might +// not compile with `clang++` for example. +constexpr double PI1 = acos(-1.0); constexpr double PI2 = 22.0 / 7.0; -// the following line does not compile and has been commented out -//static_assert(PI1 > 3.141 && PI1 < 3.143); +static_assert(PI1 > 3.141 && PI1 < 3.143); static_assert(PI2 > 3.141 && PI2 < 3.143); int main() { - cout << "PI1 = " << PI1 << '\n'; - cout << "PI2 = " << PI2 << '\n'; + println("PI1 = {}", PI1); + println("PI2 = {}", PI2); } diff --git a/modules/02-height.cpp b/modules/02-height.cpp index 933b86c..9265bc3 100644 --- a/modules/02-height.cpp +++ b/modules/02-height.cpp @@ -12,9 +12,7 @@ namespace VictorianEngland { } int main() { - cout << "Alice\'s height varies between " - << Wonderland::alice_height_m - << "m and " - << VictorianEngland::alice_height_m - << "m.\n"; + println("Alice\'s height varies between {}m and {}m", + Wonderland::alice_height_m, + VictorianEngland::alice_height_m); } diff --git a/modules/02-references.cpp b/modules/02-references.cpp index dcdb236..f464efc 100644 --- a/modules/02-references.cpp +++ b/modules/02-references.cpp @@ -6,8 +6,8 @@ using namespace std; int alice_age{ 9 }; int main() { - cout << "Alice\'s age is " << alice_age << '\n'; + println("Alice\'s age is {}", alice_age); int& alice_age_ref = alice_age; alice_age_ref = 10; - cout << "Alice\'s age is now " << alice_age << '\n'; + println("Alice\'s age is now {}", alice_age); } diff --git a/modules/02-scopes.cpp b/modules/02-scopes.cpp index f79984f..0071b06 100644 --- a/modules/02-scopes.cpp +++ b/modules/02-scopes.cpp @@ -6,11 +6,11 @@ using namespace 
std; auto a{ 1.5f }; int main() { - cout << "(1) " << a << '\n'; + println("(1) {}", a); auto a{ 2u }; - cout << "(2) " << a << '\n'; + println("(2) {}", a); { auto a{ 2.5 }; - cout << "(3) " << a << '\n'; + println("(3) {}", a); } } diff --git a/modules/02-swap.cpp b/modules/02-swap.cpp index fc62e8a..d691e37 100644 --- a/modules/02-swap.cpp +++ b/modules/02-swap.cpp @@ -6,8 +6,8 @@ using namespace std; int main() { int a = 1; double b = 2.5; - cout << "(1) a = " << a << ", b = " << b << '\n'; + println("(1) a = {}, b = {}", a, b); a = 2.5; b = 1; - cout << "(2) a = " << a << ", b = " << b << '\n'; + println("(2) a = {}, b = {}", a, b); } diff --git a/modules/02-uniform.cpp b/modules/02-uniform.cpp index 8e41066..1012358 100644 --- a/modules/02-uniform.cpp +++ b/modules/02-uniform.cpp @@ -7,5 +7,5 @@ int main() { // int c = { 2.5 }; // Error: this does NOT compile int c = { static_cast<int>(2.5) }; // while this does double d = { 1 }; // and so does this - cout << "c = " << c << ", d = " << d << '\n'; + println("c = {}, d = {}", c, d); } diff --git a/modules/04-inline.cpp b/modules/04-inline.cpp index 175c71b..be9f29a 100644 --- a/modules/04-inline.cpp +++ b/modules/04-inline.cpp @@ -11,7 +11,7 @@ inline void swap(int& x, int& y) { int main() { int a = 1, b = 2; - cout << "(1) a = " << a << ", b = " << b << '\n'; + println("(1) a = {}, b = {}", a, b); swap(a, b); - cout << "(2) a = " << a << ", b = " << b << '\n'; + println("(2) a = {}, b = {}", a, b); } diff --git a/modules/04-noexcept.cpp b/modules/04-noexcept.cpp index 2329b9c..dec1d95 100644 --- a/modules/04-noexcept.cpp +++ b/modules/04-noexcept.cpp @@ -3,21 +3,21 @@ import std; using namespace std; -int throw_if_zero(int i) noexcept { +void throw_if_zero(int i) noexcept { if (!i) { throw runtime_error("found a zero"); } - cout << "throw_if_zero(): " << i << '\n'; + println("throw_if_zero(): {}", i); } int main() { - cout << "Entering main()\n"; + println("Entering main()"); try { throw_if_zero(1);
throw_if_zero(0); } - catch(...) { - cout << "Caught an exception!\n"; + catch(exception& e) { + println("Caught an exception: {}", e.what()); } - cout << "Leaving main()\n"; + println("Leaving main()"); } diff --git a/modules/04-static-var.cpp b/modules/04-static-var.cpp index 18fe691..4757f44 100644 --- a/modules/04-static-var.cpp +++ b/modules/04-static-var.cpp @@ -5,7 +5,7 @@ using namespace std; void f() { static int s{1}; - cout << s << '\n'; + println("{}", s); ++s; } diff --git a/modules/05-begin-end.cpp b/modules/05-begin-end.cpp index 374c968..610e0fe 100644 --- a/modules/05-begin-end.cpp +++ b/modules/05-begin-end.cpp @@ -1,4 +1,4 @@ -// 05-begin-end.cpp : demostration of the use of begin() and end() +// 05-begin-end.cpp : demonstration of the use of begin() and end() import std; using namespace std; diff --git a/modules/06-pixel1.cpp b/modules/06-pixel1.cpp index 8d961d9..7dfbeab 100644 --- a/modules/06-pixel1.cpp +++ b/modules/06-pixel1.cpp @@ -18,15 +18,13 @@ string_view get_color(Color c) { switch (c) { case Color::red: return "red"; - break; case Color::green: return "green"; - break; case Color::blue: return "blue"; - break; + default: + return ""; } - return ""; } int main() { diff --git a/modules/06-pixel2.cpp b/modules/06-pixel2.cpp index 9408bb6..831ceb0 100644 --- a/modules/06-pixel2.cpp +++ b/modules/06-pixel2.cpp @@ -17,15 +17,13 @@ string_view get_color(Color c) { switch (c) { case Color::red: return "red"; - break; case Color::green: return "green"; - break; case Color::blue: return "blue"; - break; + default: + return ""; } - return ""; } int main() { diff --git a/modules/07-set.cpp b/modules/07-set.cpp index 89e8cce..b0587da 100644 --- a/modules/07-set.cpp +++ b/modules/07-set.cpp @@ -5,7 +5,7 @@ using namespace std; int main() { set s{ - "Rossum, Guido van", + "Stroustrup, Bjarne", "Yukihiro, Matsumoto", "Wall, Larry", "Eich, Brendan" diff --git a/modules/07-string-upper.cpp b/modules/07-string-upper.cpp index a56bc90..19fa08e 100644 
--- a/modules/07-string-upper.cpp +++ b/modules/07-string-upper.cpp @@ -10,7 +10,7 @@ void string_to_uppercase(string &s) { } int main() { - cout << "Please enter some text in lower-, mixed- or upper-case:\n"; + cout << "Please enter some text in lower, mixed or uppercase:\n"; string input; getline(cin, input); string_to_uppercase(input); diff --git a/modules/08-format1.cpp b/modules/08-format1.cpp new file mode 100644 index 0000000..447e45b --- /dev/null +++ b/modules/08-format1.cpp @@ -0,0 +1,11 @@ +// 08-format1.cpp : Basic usage of format string + +import std; +using namespace std; + +int main() { + string s{ "Formatted" }; + auto d{ 10.0 / 3.0 }; + auto i{ 20000 }; + println("{0:20}:{2:8}, {1:12.11}", s, d, i); +} diff --git a/modules/08-format2.cpp b/modules/08-format2.cpp new file mode 100644 index 0000000..657a5d4 --- /dev/null +++ b/modules/08-format2.cpp @@ -0,0 +1,25 @@ +// 08-format2.cpp : Various format string-using functions + +import std; +using namespace std; + +int main() { + string world{ "World" }; + print(cout, "Hello, {}!\n", world); + println("{1} or {0}", false, true); + + constexpr const char *fmt = "Approximation of π = {:.12g}"; + string s = format(fmt, asin(1.0) * 2); + cout << s << '\n'; + + constexpr const wchar_t *wfmt = L"Approximation of pi = {:.12g}"; + wstring ws = format(wfmt, asin(1.0) * 2); + wcout << ws << L'\n'; + + format_to(ostream_iterator(cout), "Hello, {}!\n", world); + wstring ww{ L"World" }; + array wa; + auto iter = format_to_n(wa.begin(), 8, L"Hello, {}!\n", ww); + *(iter.out) = L'\0'; + wcout << wa.data() << L'\n'; +} diff --git a/modules/09-person1.cpp b/modules/09-person1.cpp index 32157db..f4da006 100644 --- a/modules/09-person1.cpp +++ b/modules/09-person1.cpp @@ -2,24 +2,22 @@ import std; using namespace std; - -struct Date { - int year{}, month{}, day{}; -}; +using namespace std::chrono; class Person { public: - Person(const Date& dob, string_view familyname, string_view firstname) + Person(const 
year_month_day& dob, string_view familyname, string_view firstname) : dob{ dob }, familyname{ familyname }, firstname{ firstname } {} string getName() const { return firstname + ' ' + familyname; } + const year_month_day& getDob() const { return dob; } private: - const Date dob; + const year_month_day dob; string familyname, firstname; }; int main() { - Person genius{ { 1879, 3, 14 }, "Einstein", "Albert" }; - cout << genius.getName() << '\n'; + Person genius{ { 1879y, March, 14d }, "Einstein", "Albert" }; + cout << genius.getName() << " was born " << genius.getDob() << '\n'; } diff --git a/modules/09-person2.cpp b/modules/09-person2.cpp index 952c5e5..b957f8c 100644 --- a/modules/09-person2.cpp +++ b/modules/09-person2.cpp @@ -2,12 +2,12 @@ import std; using namespace std; +using namespace std::chrono; class Person { public: - struct Date; - Person(Date dob) : dob{ dob } {} - Person(Date dob, string_view familyname, string_view firstname, bool familynamefirst = false) + Person(year_month_day dob) : dob{ dob } {} + Person(year_month_day dob, string_view familyname, string_view firstname, bool familynamefirst = false) : dob{ dob }, familyname{ familyname }, firstname{ firstname }, familynamefirst{ familynamefirst } {} virtual ~Person() {} @@ -25,12 +25,8 @@ class Person { return firstname + ' ' + familyname; } } - struct Date { - unsigned short year{}; - unsigned char month{}, day{}; - }; protected: - const Date dob; + const year_month_day dob; private: string familyname, firstname; bool familynamefirst{}; @@ -41,7 +37,7 @@ class Student : public Person { enum class Schooling; Student(const Person& person, const vector& attended_classes = {}, Schooling school_type = Schooling::preschool) : Person{ person }, school_type{ school_type }, attended_classes{ attended_classes } {} - const Date& getDOB() const { return dob; } + const year_month_day& getDob() const { return dob; } const vector& getAttendedClasses() const { return attended_classes; } enum class Schooling { 
preschool, elementary, juniorhigh, highschool, college, homeschool, other }; private: @@ -53,7 +49,7 @@ class Employee : public Person { public: Employee(const Person& person, int employee_id, int salary = 0) : Person{ person }, employee_id{ employee_id }, salary{ salary } {} - bool isBirthday(Date today) const { return dob.month == today.month && dob.day == today.day; } + bool isBirthdayToday(year_month_day today) const { return dob.month() == today.month() && dob.day() == today.day(); } void setSalary(int salary) { this->salary = salary; } auto getDetails() const { return pair{ employee_id, salary }; } private: @@ -62,7 +58,7 @@ class Employee : public Person { }; int main() { - Person genius{ { 1879, 3, 14 }, "Einstein", "Albert" }; + Person genius{ { 1879y, March, 14d }, "Einstein", "Albert" }; Student genius_student{ genius, { "math", "physics", "philosophy" }, Student::Schooling::other }; Employee genius_employee{ genius, 1001, 15000 }; @@ -76,8 +72,8 @@ int main() { auto [ id, salary ] = genius_employee.getDetails(); cout << "ID: " << id << ", Salary: $" << salary << '\n'; - Person::Date next_bday{ 2020, 3, 14 }; - if (genius_employee.isBirthday(next_bday)) { + year_month_day next_bday{ 2024y, March, 14d }; + if (genius_employee.isBirthdayToday(next_bday)) { cout << "Happy Birthday!\n"; } } diff --git a/modules/09-person3.cpp b/modules/09-person3.cpp index f01e950..1bcd8c2 100644 --- a/modules/09-person3.cpp +++ b/modules/09-person3.cpp @@ -1,35 +1,27 @@ -// 09-person3.cpp : define operator== and operator<< for Person class +// 09-person3.cpp : define operator<=> for Person class import std; using namespace std; struct Date { int year{}, month{}, day{}; + auto operator<=>(const Date&) const = default; }; -bool operator== (const Date& lhs, const Date& rhs) { - return lhs.year == rhs.year && lhs.month == rhs.month && lhs.day == rhs.day; -} - class Person { public: Person(const Date& dob, string_view familyname, string_view firstname) : dob{ dob }, familyname{ 
familyname }, firstname{ firstname } {} string getName() const { return firstname + ' ' + familyname; } - friend bool operator== (const Person&, const Person&); + const auto& getDob() const { return dob; } + auto operator<=>(const Person&) const = default; friend ostream& operator<< (ostream&, const Person&); private: - const Date dob; string familyname, firstname; + const Date dob; }; -bool operator== (const Person& lhs, const Person& rhs) { - return lhs.familyname == rhs.familyname - && lhs.firstname == rhs.firstname - && lhs.dob == rhs.dob; -} - ostream& operator<< (ostream& os, const Person& p) { os << "Name: " << p.getName() << ", DOB: " << p.dob.year << '/' << p.dob.month << '/' << p.dob.day; @@ -37,8 +29,8 @@ ostream& operator<< (ostream& os, const Person& p) { } int main() { - Person person1{ { 2000, 1, 1 }, "John", "Doe" }, - person2{ { 1987, 11, 31 }, "John", "Doe" }; + Person person1{ { 2000, 1, 1 }, "Doe", "John" }, + person2{ { 1987, 11, 31 }, "Doe", "John" }; cout << "person1: " << person1 << '\n'; cout << "person2: " << person2 << '\n'; if (person1 == person2) { @@ -47,4 +39,16 @@ int main() { else { cout << "Different person!\n"; } + + cout << "person1 is "; + if (person1.getDob() > person2.getDob()) { + cout << "younger than "; + } + else if (person1.getDob() < person2.getDob()) { + cout << "older than "; + } + else { + cout << "the same age as "; + } + cout << " person2\n"; } diff --git a/modules/10-functor2.cpp b/modules/10-functor2.cpp index 48cec8d..fd468e1 100644 --- a/modules/10-functor2.cpp +++ b/modules/10-functor2.cpp @@ -26,8 +26,7 @@ struct MinMaxAvg { int main() { vector v{ 3, 5, 2, 6, 2, 4 }; - MinMaxAvg f; - for_each(begin(v), end(v), ref(f)); + MinMaxAvg f = for_each(begin(v), end(v), MinMaxAvg{}); cout << "Min: " << f.min << " Max: " << f.max << " Avg: " << f.avg << " Num: " << f.num << '\n'; } diff --git a/modules/10-pupils.cpp b/modules/10-pupils.cpp index fd650c5..077c7fb 100644 --- a/modules/10-pupils.cpp +++ 
b/modules/10-pupils.cpp @@ -91,4 +91,4 @@ int main() { cout << "Name not recognized!\n"; } } -} +} diff --git a/modules/10-smartptr2.cpp b/modules/10-smartptr2.cpp index 49aef6e..4f4de3b 100644 --- a/modules/10-smartptr2.cpp +++ b/modules/10-smartptr2.cpp @@ -1,7 +1,6 @@ // 10-smartptr2.cpp : encapsulate a FILE* in a unique_ptr import std; -import std.compat; using namespace std; int main(int argc, const char *argv[]) { @@ -14,7 +13,7 @@ int main(int argc, const char *argv[]) { if (fp) { int c; - while ((c = fgetc(fp.get())) != ifstream::traits_type::eof()) { + while ((c = fgetc(fp.get())) != EOF) { putchar(c); } } diff --git a/modules/build/build-clang-modules.sh b/modules/build/build-clang-modules.sh index a82554c..a74f599 100755 --- a/modules/build/build-clang-modules.sh +++ b/modules/build/build-clang-modules.sh @@ -1,14 +1,37 @@ #!/bin/sh # This script will compile all files with .cpp extension in the parent directory -# Clang version 12 (or newer) C++ modules version +# Clang version 16 (or newer) C++ modules version + +if [ -z "$CLANG_PREFIX" ] ; then + CLANG_PREFIX="/usr" +fi + +if [ -z "$CLANG_PCM" ] ; then + CLANG_PCM="./std.pcm" +fi + +CLANG="$CLANG_PREFIX/bin/clang++" + +if [ ! -f "$CLANG_PCM" ] ; then + echo "Compiling library module..." + if [ ! -f "$CLANG_PREFIX/share/libc++/v1/std.cppm" ] ; then + echo "Error: Could not find file $CLANG_PREFIX/share/libc++/v1/std.cppm" + echo "Please set environment variable CLANG_PREFIX and re-run script" + exit 1 + fi + LD_LIBRARY_PATH="$CLANG_PREFIX/lib":"$CLANG_PREFIX/lib/x86_64-unknown-linux-gnu" \ + "$CLANG" -std=c++23 -stdlib=libc++ -Wno-reserved-identifier -Wno-reserved-module-identifier \ + --precompile -o "$CLANG_PCM" "$CLANG_PREFIX/share/libc++/v1/std.cppm" +fi failures=0 for PROGRAM in ../*.cpp ; do BASE="$(basename $PROGRAM)" echo "$BASE..." 
failed="" - clang++ -fmodules -std=c++20 -stdlib=libc++ -o ${BASE%.cpp} $PROGRAM >/dev/null 2>&1 || failed="y" + LD_LIBRARY_PATH="$CLANG_PREFIX/lib":"$CLANG_PREFIX/lib/x86_64-unknown-linux-gnu" \ + "$CLANG" -fmodule-file=std="$CLANG_PCM" -std=c++23 -stdlib=libc++ -o ${BASE%.cpp} $PROGRAM || failed="y" if [ -n "$failed" ] ; then echo "Failed to compile $BASE" failures=$((failures+1)) diff --git a/scripts/make_notebooks.sh b/scripts/make_notebooks.sh new file mode 100755 index 0000000..cab6677 --- /dev/null +++ b/scripts/make_notebooks.sh @@ -0,0 +1,10 @@ +#!/bin/bash + +# Note: in order to be recognized as code blocks, must start with: ```python + +srcdir=$(dirname $(readlink -f "$0"))/.. +destdir=$srcdir/jupyter-notebooks +for f in $srcdir/{01,02,03,04,05,06,07,08,09,10}-*.md ; do + out=$destdir/$(basename $f) + cat $f | sed 's/```cpp/```python/' | jupytext --from md --to notebook -k cpp23 -o ${out%%.md}.ipynb +done