|
1 | 1 | ChangeLog for PCRE
|
2 | 2 | ------------------
|
3 | 3 |
|
| 4 | +Version 4.5 01-Dec-03 |
| 5 | +--------------------- |
| 6 | + |
| 7 | + 1. There has been some re-arrangement of the code for the match() function so |
| 8 | + that it can be compiled in a version that does not call itself recursively. |
| 9 | + Instead, it keeps those local variables that need separate instances for |
| 10 | + each "recursion" in a frame on the heap, and gets/frees frames whenever it |
| 11 | + needs to "recurse". Keeping track of where control must go is done by means |
| 12 | + of setjmp/longjmp. The whole thing is implemented by a set of macros that |
| 13 | + hide most of the details from the main code, and operates only if |
| 14 | + NO_RECURSE is defined while compiling pcre.c. If PCRE is built using the |
| 15 | + "configure" mechanism, "--disable-stack-for-recursion" turns on this way of |
| 16 | + operating. |
| 17 | + |
| 18 | + To make it easier for callers to provide specially tailored get/free |
| 19 | + functions for this usage, two new functions, pcre_stack_malloc and |
| 20 | + pcre_stack_free, are used. They are always called in strict stacking order, |
| 21 | + and the size of block requested is always the same. |
| 22 | + |
| 23 | + The PCRE_CONFIG_STACKRECURSE info parameter can be used to find out whether |
| 24 | + PCRE has been compiled to use the stack or the heap for recursion. The |
| 25 | + -C option of pcretest uses this to show which version is compiled. |
| 26 | + |
| 27 | + A new data escape, \S, is added to pcretest; it causes the amounts of store |
| 28 | + obtained and freed by both kinds of malloc/free at match time to be added |
| 29 | + to the output. |
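
Because the frame get/free calls in item 1 are described as strictly stacked and always the same size, a caller-supplied allocator can be very simple. The following is a minimal sketch of that idea only, not code from PCRE: the names frame_pool, pool_get, pool_free, pool_release and the POOL_FRAMES limit are all hypothetical, and the fixed depth limit is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical limit on nesting depth; chosen only for illustration. */
#define POOL_FRAMES 64

/* Hypothetical pool type: none of these names come from PCRE. */
typedef struct frame_pool {
    size_t frame_size;           /* size of every block, fixed on first use */
    size_t depth;                /* number of frames currently handed out */
    void  *frames[POOL_FRAMES];  /* cached blocks, reused between requests */
} frame_pool;

static frame_pool pool;          /* zero-initialized: empty pool */

/* Get a frame. Reuses the cached block for this depth if there is one. */
void *pool_get(size_t size)
{
    if (pool.frame_size == 0) pool.frame_size = size;
    assert(size == pool.frame_size);   /* relies on "always the same size" */
    if (pool.depth >= POOL_FRAMES) return NULL;
    if (pool.frames[pool.depth] == NULL)
        pool.frames[pool.depth] = malloc(size);
    if (pool.frames[pool.depth] == NULL) return NULL;
    return pool.frames[pool.depth++];
}

/* Free a frame. It must be the most recently obtained one (stacking order). */
void pool_free(void *block)
{
    assert(pool.depth > 0 && pool.frames[pool.depth - 1] == block);
    (void)block;                 /* unused when assertions are compiled out */
    pool.depth--;                /* the block stays cached for the next get */
}

/* Return all cached memory to the system once no frames are in use. */
void pool_release(void)
{
    size_t i;
    assert(pool.depth == 0);
    for (i = 0; i < POOL_FRAMES; i++) {
        free(pool.frames[i]);
        pool.frames[i] = NULL;
    }
    pool.frame_size = 0;
}
```

If, as the PCRE API documentation suggests, pcre_stack_malloc and pcre_stack_free are global function pointers, a caller would assign pool_get and pool_free to them before matching. The benefit of the design is that a block is obtained from malloc() at most once per depth, and later "recursions" to the same depth reuse it.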
| 30 | + |
| 31 | + 2. Changed the locale test to use "fr_FR" instead of "fr" because that's |
| 32 | + what's available on my current Linux desktop machine. |
| 33 | + |
| 34 | + 3. When matching a UTF-8 string, the test for a valid string at the start has |
| 35 | + been extended. If start_offset is not zero, PCRE now checks that it points |
| 36 | + to a byte that is the start of a UTF-8 character. If not, it returns |
| 37 | + PCRE_ERROR_BADUTF8_OFFSET (-11). Note: the whole string is still checked; |
| 38 | + this is necessary because there may be backward assertions in the pattern. |
| 39 | + When matching the same subject several times, it may save resources to use |
| 40 | + PCRE_NO_UTF8_CHECK on all but the first call if the string is long. |
| 41 | + |
| 42 | + 4. The code for checking the validity of UTF-8 strings has been tightened so |
| 43 | + that it rejects (a) strings containing 0xfe or 0xff bytes and (b) strings |
| 44 | + containing "overlong sequences". |
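
The checks in items 3 and 4 can be illustrated with a short sketch. This is not PCRE's own validation code. The function names are hypothetical, and the sketch assumes the older 1-to-6-byte form of UTF-8, so it does not apply the later restrictions (a 4-byte maximum, no surrogates) that a modern validator would add.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if s[offset] can start a UTF-8 character,
   that is, it is not a continuation byte of the form 10xxxxxx. */
int is_utf8_char_start(const unsigned char *s, size_t offset)
{
    assert(s != NULL);
    return (s[offset] & 0xc0) != 0x80;
}

/* Hypothetical validator: returns 1 if the first len bytes of s pass the
   checks described above, 0 otherwise. */
int utf8_is_valid(const unsigned char *s, size_t len)
{
    /* Smallest value that genuinely needs n bytes, indexed by n. */
    static const unsigned long min_for_len[] =
        { 0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000 };
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i++];
        unsigned long cp;
        int n, k;

        if (c < 0x80) continue;      /* single-byte character */
        if (c < 0xc0) return 0;      /* continuation byte with no lead byte */
        if (c >= 0xfe) return 0;     /* 0xfe and 0xff are never valid */

        if (c < 0xe0)      { n = 2; cp = c & 0x1f; }
        else if (c < 0xf0) { n = 3; cp = c & 0x0f; }
        else if (c < 0xf8) { n = 4; cp = c & 0x07; }
        else if (c < 0xfc) { n = 5; cp = c & 0x03; }
        else               { n = 6; cp = c & 0x01; }

        if (len - i < (size_t)(n - 1)) return 0;   /* truncated sequence */
        for (k = 1; k < n; k++) {
            unsigned char d = s[i++];
            if ((d & 0xc0) != 0x80) return 0;      /* not a continuation */
            cp = (cp << 6) | (unsigned long)(d & 0x3f);
        }
        if (cp < min_for_len[n]) return 0;         /* overlong sequence */
    }
    return 1;
}
```

An overlong sequence is detected here by decoding the value and comparing it with the smallest value that needs that many bytes; for example, the two bytes 0xc0 0x80 decode to zero, which fits in one byte, so the sequence is rejected.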
| 45 | + |
| 46 | + 5. Fixed a bug (appearing twice) that I could not find any way of exploiting! |
| 47 | + I had written "if ((digitab[*p++] && chtab_digit) == 0)" where the "&&" |
| 48 | + should have been "&", but it just so happened that all the cases this let |
| 49 | + through by mistake were picked up later in the function. |
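
The difference between the two operators in item 5 is easy to see in isolation. The sketch below uses hypothetical function names and takes the table entry and the flag as plain parameters; it does not reproduce the real digitab table or the surrounding code in pcre.c.

```c
#include <assert.h>

/* Buggy form: "&&" only asks whether both operands are nonzero, so any
   entry with at least one flag set looks as if it has the wanted bit. */
int lacks_bit_buggy(unsigned char entry, unsigned char mask)
{
    assert(mask != 0);
    return (entry && mask) == 0;
}

/* Fixed form: "&" tests the specific bit(s) selected by mask. */
int lacks_bit_fixed(unsigned char entry, unsigned char mask)
{
    assert(mask != 0);
    return (entry & mask) == 0;
}
```

With an entry of 0x02 and a mask of 0x01, the fixed form reports that the bit is missing, while the buggy form reports that it is present. The two forms agree whenever the entry is zero or actually has the masked bit set, which is consistent with the bug having been hard to notice.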
| 50 | + |
| 51 | + 6. I had used a variable called "isblank" - this is a C99 function, causing |
| 52 | + some compilers to warn. To avoid this, I renamed it (as "blankclass"). |
| 53 | + |
| 54 | + 7. Cosmetic: (a) only output another newline at the end of pcretest if it is |
| 55 | + prompting; (b) run "./pcretest /dev/null" at the start of the test script |
| 56 | + so the version is shown; (c) stop "make test" echoing "./RunTest". |
| 57 | + |
| 58 | + 8. Added patches from David Burgess to enable PCRE to run on EBCDIC systems. |
| 59 | + |
| 60 | + 9. The prototype for memmove() for systems that don't have it was using |
| 61 | + size_t, but the inclusion of the header that defines size_t was later. I've |
| 62 | + moved the #includes for the C headers earlier to avoid this. |
| 63 | + |
| 64 | +10. Added some adjustments to the code to make it easier to compile on certain |
| 65 | + special systems: |
| 66 | + |
| 67 | + (a) Some "const" qualifiers were missing. |
| 68 | + (b) Added the macro EXPORT before all exported functions; by default this |
| 69 | + is defined to be empty. |
| 70 | + (c) Changed the dftables auxiliary program (that builds chartables.c) so |
| 71 | + that it reads its output file name as an argument instead of writing |
| 72 | + to the standard output and assuming this can be redirected. |
| 73 | + |
| 74 | +11. In UTF-8 mode, if a recursive reference (e.g. (?1)) followed a character |
| 75 | + class containing characters with values greater than 255, PCRE compilation |
| 76 | + went into a loop. |
| 77 | + |
| 78 | +12. A recursive reference to a subpattern that was within another subpattern |
| 79 | + that had a minimum quantifier of zero caused PCRE to crash. For example, |
| 80 | + (x(y(?2))z)? provoked this bug with a subject that got as far as the |
| 81 | + recursion. If the recursively-called subpattern itself had a zero repeat, |
| 82 | + that was OK. |
| 83 | + |
| 84 | +13. In pcretest, the buffer for reading a data line was set at 30K, but the |
| 85 | + buffer into which it was copied (for escape processing) was still set at |
| 86 | + 1024, so long lines caused crashes. |
| 87 | + |
| 88 | +14. A pattern such as /[ab]{1,3}+/ failed to compile, giving the error |
| 89 | + "internal error: code overflow...". This applied to any character class |
| 90 | + that was followed by a possessive quantifier. |
| 91 | + |
| 92 | +15. Modified the Makefile to add libpcre.la as a prerequisite for |
| 93 | + libpcreposix.la because I was told this is needed for a parallel build to |
| 94 | + work. |
| 95 | + |
| 96 | +16. If a pattern that contained .* following optional items at the start was |
| 97 | + studied, the wrong optimizing data was generated, leading to matching |
| 98 | + errors. For example, studying /[ab]*.*c/ concluded, erroneously, that any |
| 99 | + matching string must start with a or b or c. The correct conclusion for |
| 100 | + this pattern is that a match can start with any character. |
| 101 | + |
| 102 | + |
| 103 | +Version 4.4 13-Aug-03 |
| 104 | +--------------------- |
| 105 | + |
| 106 | + 1. In UTF-8 mode, a character class containing characters with values between |
| 107 | + 127 and 255 was not handled correctly if the compiled pattern was studied. |
| 108 | + In fixing this, I have also improved the studying algorithm for such |
| 109 | + classes (slightly). |
| 110 | + |
| 111 | + 2. Three internal functions had redundant arguments passed to them. Removal |
| 112 | + might give a very teeny performance improvement. |
| 113 | + |
| 114 | + 3. Documentation bug: the value of the capture_top field in a callout is *one |
| 115 | + more than* the number of the highest numbered captured substring. |
| 116 | + |
| 117 | + 4. The Makefile linked pcretest and pcregrep with -lpcre, which could result |
| 118 | + in incorrectly linking with a previously installed version. They now link |
| 119 | + explicitly with libpcre.la. |
| 120 | + |
| 121 | + 5. configure.in no longer needs to recognize Cygwin specially. |
| 122 | + |
| 123 | + 6. A problem in pcre.in for Windows platforms is fixed. |
| 124 | + |
| 125 | + 7. If a pattern was successfully studied, and the -d (or /D) flag was given to |
| 126 | + pcretest, it used to include the size of the study block as part of its |
| 127 | + output. Unfortunately, the structure contains a field that has a different |
| 128 | + size on different hardware architectures. This meant that the tests that |
| 129 | + showed this size failed. As the block is currently always of a fixed size, |
| 130 | + this information isn't actually particularly useful in pcretest output, so |
| 131 | + I have just removed it. |
| 132 | + |
| 133 | + 8. Three pre-processor statements accidentally did not start in column 1. |
| 134 | + Sadly, there are *still* compilers around that complain, even though |
| 135 | + standard C has not required this for well over a decade. Sigh. |
| 136 | + |
| 137 | + 9. In pcretest, the code for checking callouts passed small integers in the |
| 138 | + callout_data field, which is a void * field. However, some picky compilers |
| 139 | + complained about the casts involved for this on 64-bit systems. Now |
| 140 | + pcretest passes the address of the small integer instead, which should get |
| 141 | + rid of the warnings. |
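
The change in item 9 amounts to storing the address of an integer in the void * field instead of casting the integer itself to a pointer. The sketch below shows the pattern with a hypothetical stand-in structure and callback; only the callout_data field name is taken from the text above, and this is not the pcretest code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a callout block with a void * user-data field. */
typedef struct demo_callout_block {
    void *callout_data;
} demo_callout_block;

/* Hypothetical callback: reads the integer through the pointer rather than
   converting the pointer value back to an integer. */
int demo_callout(demo_callout_block *cb)
{
    int *value;
    assert(cb != NULL);
    value = (int *)cb->callout_data;
    return (value == NULL) ? 0 : *value;
}
```

A caller would write "int n = 42; cb.callout_data = &n;" rather than "cb.callout_data = (void *)n;". The second form converts between an int and a pointer of a different size on typical 64-bit systems, which is the kind of cast that compilers warn about. The integer must stay alive for as long as the callback may read it.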
| 142 | + |
| 143 | +10. By default, when in UTF-8 mode, PCRE now checks for valid UTF-8 strings at |
| 144 | + both compile and run time, and gives an error if an invalid UTF-8 sequence |
| 145 | + is found. There is an option for disabling this check in cases where the |
| 146 | + string is known to be correct and/or the maximum performance is wanted. |
| 147 | + |
| 148 | +11. In response to a bug report, I changed one line in Makefile.in from |
| 149 | + |
| 150 | + -Wl,--out-implib,.libs/lib@WIN_PREFIX@pcreposix.dll.a \ |
| 151 | + to |
| 152 | + -Wl,--out-implib,.libs/@WIN_PREFIX@libpcreposix.dll.a \ |
| 153 | + |
| 154 | + to look similar to other lines, but I have no way of telling whether this |
| 155 | + is the right thing to do, as I do not use Windows. No doubt I'll get told |
| 156 | + if it's wrong... |
| 157 | + |
| 158 | + |
4 | 159 | Version 4.3 21-May-03
|
5 | 160 | ---------------------
|
6 | 161 |
|
|