|
1 | 1 | ChangeLog for PCRE
|
2 | 2 | ------------------
|
3 | 3 |
|
| 4 | +Version 5.0 13-Sep-04 |
| 5 | +--------------------- |
| 6 | + |
| 7 | + 1. Internal change: literal characters are no longer packed up into items |
| 8 | + containing multiple characters in a single byte-string. Each character |
| 9 | + is now matched using a separate opcode. However, there may be more than one |
| 10 | + byte in the character in UTF-8 mode. |
| 11 | + |
| 12 | + 2. The pcre_callout_block structure has two new fields: pattern_position and |
| 13 | + next_item_length. These contain the offset in the pattern to the next match |
| 14 | + item, and its length, respectively. |
| 15 | + |
| 16 | + 3. The PCRE_AUTO_CALLOUT option for pcre_compile() requests the automatic |
| 17 | + insertion of callouts before each pattern item. Added the /C option to |
| 18 | + pcretest to make use of this. |
| 19 | + |
| 20 | + 4. On the advice of a Windows user, the lines |
| 21 | + |
| 22 | + #if defined(_WIN32) || defined(WIN32) |
| 23 | + _setmode( _fileno( stdout ), 0x8000 ); |
| 24 | + #endif /* defined(_WIN32) || defined(WIN32) */ |
| 25 | + |
| 26 | + have been added to the source of pcretest. This apparently does useful |
| 27 | + magic in relation to line terminators. |
| 28 | + |
| 29 | + 5. Changed "r" and "w" in the calls to fopen() in pcretest to "rb" and "wb" |
| 30 | + for the benefit of those environments where the "b" makes a difference. |
| 31 | + |
| 32 | + 6. The icc compiler has the same options as gcc, but "configure" doesn't seem |
| 33 | + to know about it. I have put a hack into configure.in that adds in code |
| 34 | + to set GCC=yes if CC=icc. This seems to end up at a point in the |
| 35 | + generated configure script that is early enough to affect the setting of |
| 36 | + compiler options, which is what is needed, but I have no means of testing |
| 37 | + whether it really works. (The user who reported this had patched the |
| 38 | + generated configure script, which of course I cannot do.) |
| 39 | + |
| 40 | + LATER: After change 23 below (new libtool files), the configure script |
| 41 | + seems to know about icc (and also ecc). Therefore, I have commented out |
| 42 | + this hack in configure.in. |
| 43 | + |
| 44 | + 7. Added support for pkg-config (2 patches were sent in). |
| 45 | + |
| 46 | + 8. Negated POSIX character classes that used a combination of internal tables |
| 47 | + were completely broken. These were [[:^alpha:]], [[:^alnum:]], and |
| 48 | + [[:^ascii:]]. Typically, they would match almost any character. The other |
| 49 | + POSIX classes were not broken in this way. |
| 50 | + |
| 51 | + 9. Matching the pattern "\b.*?" against "ab cd", starting at offset 1, failed |
| 52 | + to find the match, as PCRE was deluded into thinking that the match had to |
| 53 | + start at the start point or following a newline. The same bug applied to |
| 54 | + patterns with negative forward assertions or any backward assertions |
| 55 | + preceding ".*" at the start, unless the pattern required a fixed first |
| 56 | + character. This was a failing pattern: "(?!.bcd).*". The bug is now fixed. |
| 57 | + |
| 58 | +10. In UTF-8 mode, when moving forwards in the subject after a failed match |
| 59 | + starting at the last subject character, bytes beyond the end of the subject |
| 60 | + string were read. |
| 61 | + |
| 62 | +11. Renamed the variable "class" as "classbits" to make life easier for C++ |
| 63 | + users. (Previously there was a macro definition, but it apparently wasn't |
| 64 | + enough.) |
| 65 | + |
| 66 | +12. Added the new field "tables" to the extra data so that tables can be passed |
| 67 | + in at exec time, or the internal tables can be re-selected. This allows |
| 68 | + a compiled regex to be saved and re-used at a later time by a different |
| 69 | + program that might have everything at different addresses. |
| 70 | + |
| 71 | +13. Modified the pcre-config script so that, when run on Solaris, it shows a |
| 72 | + -R library as well as a -L library. |
| 73 | + |
| 74 | +14. The debugging options of pcretest (-d on the command line or D on a |
| 75 | + pattern) showed incorrect output for anything following an extended class |
| 76 | + that contained multibyte characters and which was followed by a quantifier. |
| 77 | + |
| 78 | +15. Added optional support for general category Unicode character properties |
| 79 | + via the \p, \P, and \X escapes. Unicode property support implies UTF-8 |
| 80 | + support. It adds about 90K to the size of the library. The meanings of the |
| 81 | + inbuilt class escapes such as \d and \s have NOT been changed. |
| 82 | + |
| 83 | +16. Updated pcredemo.c to include calls to free() to release the memory for the |
| 84 | + compiled pattern. |
| 85 | + |
| 86 | +17. The generated file chartables.c was being created in the source directory |
| 87 | + instead of in the building directory. This caused the build to fail if the |
| 88 | + source directory was different from the building directory, and was |
| 89 | + read-only. |
| 90 | + |
| 91 | +18. Added some sample Win commands from Mark Tetrode into the NON-UNIX-USE |
| 92 | + file. No doubt somebody will tell me if they don't make sense... Also added |
| 93 | + Dan Mooney's comments about building on OpenVMS. |
| 94 | + |
| 95 | +19. Added support for partial matching via the PCRE_PARTIAL option for |
| 96 | + pcre_exec() and the \P data escape in pcretest. |
| 97 | + |
| 98 | +20. Extended pcretest with 3 new pattern features: |
| 99 | + |
| 100 | + (i) A pattern option of the form ">rest-of-line" causes pcretest to |
| 101 | + write the compiled pattern to the file whose name is "rest-of-line". |
| 102 | + This is a straight binary dump of the data, with the saved pointer to |
| 103 | + the character tables forced to be NULL. The study data, if any, is |
| 104 | + written too. After writing, pcretest reads a new pattern. |
| 105 | + |
| 106 | + (ii) If, instead of a pattern, "<rest-of-line" is given, pcretest reads a |
| 107 | + compiled pattern from the given file. There must not be any |
| 108 | + occurrences of "<" in the file name (pretty unlikely); if there are, |
| 109 | + pcretest will instead treat the initial "<" as a pattern delimiter. |
| 110 | + After reading in the pattern, pcretest goes on to read data lines as |
| 111 | + usual. |
| 112 | + |
| 113 | + (iii) The F pattern option causes pcretest to flip the bytes in the 32-bit |
| 114 | + and 16-bit fields in a compiled pattern, to simulate a pattern that |
| 115 | + was compiled on a host of opposite endianness. |
| 116 | + |
| 117 | +21. The pcre_exec() function can now cope with patterns that were compiled on |
| 118 | + hosts of opposite endianness, with this restriction: |
| 119 | + |
| 120 | + As for any compiled expression that is saved and used later, the tables |
| 121 | + pointer field cannot be preserved; the extra_data field in the arguments |
| 122 | + to pcre_exec() should be used to pass in a tables address if tables |
| 123 | + other than the default internal ones were used at compile time. |
| 124 | + |
| 125 | +22. Calling pcre_exec() with a negative value of the "ovecsize" parameter is |
| 126 | + now diagnosed as an error. Previously, most of the time, a negative number |
| 127 | + would have been treated as zero, but if in addition "ovector" was passed as |
| 128 | + NULL, a crash could occur. |
| 129 | + |
| 130 | +23. Updated the files ltmain.sh, config.sub, config.guess, and aclocal.m4 with |
| 131 | + new versions from the libtool 1.5 distribution (the last one is a copy of |
| 132 | + a file called libtool.m4). This seems to have fixed the need to patch |
| 133 | + "configure" to support Darwin 1.3 (which I used to do). However, I still |
| 134 | + had to patch ltmain.sh to ensure that ${SED} is set (it isn't on my |
| 135 | + workstation). |
| 136 | + |
| 137 | +24. Changed the PCRE licence to be the more standard "BSD" licence. |
| 138 | + |
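Item 10 above describes reading past the end of the subject while stepping forwards in UTF-8 mode. The sketch below is not PCRE's code; it only illustrates the general guard, assuming a hypothetical helper named next_char_start() that skips UTF-8 continuation bytes (those of the form 10xxxxxx) and checks the end pointer before every read.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: return a pointer to the start of
   the character that follows the one at p, never moving beyond end.
   "end" points one past the last byte of the subject. */
static const unsigned char *next_char_start(const unsigned char *p,
                                            const unsigned char *end)
{
    if (p >= end) return end;          /* already at or past the end */
    p++;                               /* step over the lead byte */
    /* Skip continuation bytes, testing the bound BEFORE dereferencing. */
    while (p < end && (*p & 0xc0) == 0x80) p++;
    return p;
}
```

With the subject bytes 'a', 0xc3, 0xa9 (the letter "a" followed by a two-byte character), calling the helper at the second byte returns the end pointer rather than reading a fourth byte.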
| 139 | + |
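Items 20 (iii) and 21 above refer to flipping the bytes of 32-bit and 16-bit fields so that a pattern compiled on a host of opposite endianness can be used. The following is a minimal sketch of that idea only. The structure demo_header, its field names, the constant DEMO_MAGIC and the function demo_normalize() are all assumptions made for illustration and do not reproduce the real compiled-pattern layout.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t flip32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) <<  8) |
           ((v & 0x00ff0000u) >>  8) |
           ((v & 0xff000000u) >> 24);
}

/* Reverse the byte order of a 16-bit value. */
static uint16_t flip16(uint16_t v)
{
    return (uint16_t)(((v & 0x00ffu) << 8) | ((v & 0xff00u) >> 8));
}

/* Hypothetical header for illustration; NOT the real PCRE structure. */
struct demo_header {
    uint32_t magic;        /* known constant, used to detect byte order */
    uint32_t size;         /* a 32-bit field */
    uint16_t top_bracket;  /* a 16-bit field */
};

#define DEMO_MAGIC 0x50435245u   /* assumed marker value for this sketch */

/* Returns 1 if the header is usable (flipping it in place when it was
   written with the opposite byte order), or 0 if the marker is not
   recognised in either byte order. */
static int demo_normalize(struct demo_header *h)
{
    if (h->magic == DEMO_MAGIC) return 1;           /* native order */
    if (flip32(h->magic) != DEMO_MAGIC) return 0;   /* not recognised */
    h->magic = flip32(h->magic);
    h->size = flip32(h->size);
    h->top_bracket = flip16(h->top_bracket);
    return 1;
}
```

Detecting the byte order from a known marker value means the reader of a saved pattern does not need to be told where it was compiled. As item 21 notes, a pointer such as the tables address cannot be recovered this way and has to be supplied again at exec time.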
4 | 140 | Version 4.5 01-Dec-03
|
5 | 141 | ---------------------
|
6 | 142 |
|
|