Commit 8ea7744

chris-b1 authored and jreback committed
PERF: ascii c string functions (#23981)
1 parent cb862e4 commit 8ea7744

5 files changed: +175 -34 lines changed

LICENSES/MUSL_LICENSE

+132
@@ -0,0 +1,132 @@
+musl as a whole is licensed under the following standard MIT license:
+
+----------------------------------------------------------------------
+Copyright © 2005-2014 Rich Felker, et al.
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+----------------------------------------------------------------------
+
+Authors/contributors include:
+
+Anthony G. Basile
+Arvid Picciani
+Bobby Bingham
+Boris Brezillon
+Brent Cook
+Chris Spiegel
+Clément Vasseur
+Emil Renner Berthing
+Hiltjo Posthuma
+Isaac Dunham
+Jens Gustedt
+Jeremy Huntwork
+John Spencer
+Justin Cormack
+Luca Barbato
+Luka Perkov
+M Farkas-Dyck (Strake)
+Michael Forney
+Nicholas J. Kain
+orc
+Pascal Cuoq
+Pierre Carrier
+Rich Felker
+Richard Pennington
+sin
+Solar Designer
+Stefan Kristiansson
+Szabolcs Nagy
+Timo Teräs
+Valentin Ochs
+William Haddon
+
+Portions of this software are derived from third-party works licensed
+under terms compatible with the above MIT license:
+
+The TRE regular expression implementation (src/regex/reg* and
+src/regex/tre*) is Copyright © 2001-2008 Ville Laurikari and licensed
+under a 2-clause BSD license (license text in the source files). The
+included version has been heavily modified by Rich Felker in 2012, in
+the interests of size, simplicity, and namespace cleanliness.
+
+Much of the math library code (src/math/* and src/complex/*) is
+Copyright © 1993,2004 Sun Microsystems or
+Copyright © 2003-2011 David Schultz or
+Copyright © 2003-2009 Steven G. Kargl or
+Copyright © 2003-2009 Bruce D. Evans or
+Copyright © 2008 Stephen L. Moshier
+and labelled as such in comments in the individual source files. All
+have been licensed under extremely permissive terms.
+
+The ARM memcpy code (src/string/armel/memcpy.s) is Copyright © 2008
+The Android Open Source Project and is licensed under a two-clause BSD
+license. It was taken from Bionic libc, used on Android.
+
+The implementation of DES for crypt (src/misc/crypt_des.c) is
+Copyright © 1994 David Burren. It is licensed under a BSD license.
+
+The implementation of blowfish crypt (src/misc/crypt_blowfish.c) was
+originally written by Solar Designer and placed into the public
+domain. The code also comes with a fallback permissive license for use
+in jurisdictions that may not recognize the public domain.
+
+The smoothsort implementation (src/stdlib/qsort.c) is Copyright © 2011
+Valentin Ochs and is licensed under an MIT-style license.
+
+The BSD PRNG implementation (src/prng/random.c) and XSI search API
+(src/search/*.c) functions are Copyright © 2011 Szabolcs Nagy and
+licensed under following terms: "Permission to use, copy, modify,
+and/or distribute this code for any purpose with or without fee is
+hereby granted. There is no warranty."
+
+The x86_64 port was written by Nicholas J. Kain. Several files (crt)
+were released into the public domain; others are licensed under the
+standard MIT license terms at the top of this file. See individual
+files for their copyright status.
+
+The mips and microblaze ports were originally written by Richard
+Pennington for use in the ellcc project. The original code was adapted
+by Rich Felker for build system and code conventions during upstream
+integration. It is licensed under the standard MIT terms.
+
+The powerpc port was also originally written by Richard Pennington,
+and later supplemented and integrated by John Spencer. It is licensed
+under the standard MIT terms.
+
+All other files which have no copyright comments are original works
+produced specifically for use as part of this library, written either
+by Rich Felker, the main author of the library, or by one or more
+contibutors listed above. Details on authorship of individual files
+can be found in the git version control history of the project. The
+omission of copyright and license comments in each file is in the
+interest of source tree size.
+
+All public header files (include/* and arch/*/bits/*) should be
+treated as Public Domain as they intentionally contain no content
+which can be covered by copyright. Some source modules may fall in
+this category as well. If you believe that a file is so trivial that
+it should be in the Public Domain, please contact the authors and
+request an explicit statement releasing it from copyright.
+
+The following files are trivial, believed not to be copyrightable in
+the first place, and hereby explicitly released to the Public Domain:
+
+All public headers: include/*, arch/*/bits/*
+Startup files: crt/*

doc/source/whatsnew/v0.24.0.rst

+1
@@ -1242,6 +1242,7 @@ Performance Improvements
 - Improved performance of :func:`pd.concat` for `Series` objects (:issue:`23404`)
 - Improved performance of :meth:`DatetimeIndex.normalize` and :meth:`Timestamp.normalize` for timezone naive or UTC datetimes (:issue:`23634`)
 - Improved performance of :meth:`DatetimeIndex.tz_localize` and various ``DatetimeIndex`` attributes with dateutil UTC timezone (:issue:`23772`)
+- Fixed a performance regression on Windows with Python 3.7 of :func:`pd.read_csv` (:issue:`23516`)
 - Improved performance of :class:`Categorical` constructor for `Series` objects (:issue:`23814`)

 .. _whatsnew_0240.docs:

pandas/_libs/src/headers/portable.h

+6
@@ -5,4 +5,10 @@
 #define strcasecmp( s1, s2 ) _stricmp( s1, s2 )
 #endif

+// GH-23516 - works around locale perf issues
+// from MUSL libc, MIT Licensed - see LICENSES
+#define isdigit_ascii(c) ((unsigned)c - '0' < 10)
+#define isspace_ascii(c) (c == ' ' || (unsigned)c-'\t' < 5)
+#define toupper_ascii(c) (((unsigned)c-'a' < 26) ? (c & 0x5f) : c)
+
 #endif
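
The macros in this hunk rely on unsigned wrap-around so that a single comparison checks both ends of a character range. The block below is only a sketch of that idea, not the committed code: it adds defensive parentheses around the argument, and it includes a `tolower_ascii` companion, which is an assumption here because the portable.h hunk shown defines no such macro even though `lowercase()` in parse_helper.h calls one. The `ascii_macros_self_check` function is a hypothetical name used purely for illustration.

```c
#include <assert.h>

/* Parenthesized variants of the macros in the hunk above. The argument is
 * still evaluated more than once, so avoid side effects such as *p++. */
#define isdigit_ascii(c) (((unsigned)(c) - '0') < 10u)
#define isspace_ascii(c) (((c) == ' ') || (((unsigned)(c) - '\t') < 5u))
#define toupper_ascii(c) ((((unsigned)(c) - 'a') < 26u) ? ((c) & 0x5f) : (c))
/* Hypothetical companion, NOT part of the hunk shown above. */
#define tolower_ascii(c) ((((unsigned)(c) - 'A') < 26u) ? ((c) | 0x20) : (c))

/* Why one comparison suffices: for c below the start of the range, the
 * unsigned subtraction wraps to a huge value, so "< N" rejects it just as
 * it rejects values past the end of the range. */
void ascii_macros_self_check(void) {
    assert(isdigit_ascii('0') && isdigit_ascii('9'));
    assert(!isdigit_ascii('/') && !isdigit_ascii(':'));
    assert(isspace_ascii(' ') && isspace_ascii('\t') && isspace_ascii('\r'));
    assert(!isspace_ascii('a') && !isspace_ascii('\0'));
    assert(toupper_ascii('e') == 'E' && toupper_ascii('E') == 'E');
    assert(tolower_ascii('E') == 'e' && tolower_ascii('1') == '1');
}
```

Unlike the `<ctype.h>` functions, these macros never consult the current locale, which is the property the GH-23516 comment refers to; the trade-off is that only 7-bit ASCII characters are classified.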

pandas/_libs/src/parse_helper.h

+7 -7
@@ -138,11 +138,11 @@ int floatify(PyObject *str, double *result, int *maybe_int) {
 //

 PANDAS_INLINE void lowercase(char *p) {
-    for (; *p; ++p) *p = tolower(*p);
+    for (; *p; ++p) *p = tolower_ascii(*p);
 }

 PANDAS_INLINE void uppercase(char *p) {
-    for (; *p; ++p) *p = toupper(*p);
+    for (; *p; ++p) *p = toupper_ascii(*p);
 }

 static double xstrtod(const char *str, char **endptr, char decimal, char sci,
@@ -177,7 +177,7 @@ static double xstrtod(const char *str, char **endptr, char decimal, char sci,
     num_decimals = 0;

     // Process string of digits
-    while (isdigit(*p)) {
+    while (isdigit_ascii(*p)) {
         number = number * 10. + (*p - '0');
         p++;
         num_digits++;
@@ -188,7 +188,7 @@ static double xstrtod(const char *str, char **endptr, char decimal, char sci,
         *maybe_int = 0;
         p++;

-        while (isdigit(*p)) {
+        while (isdigit_ascii(*p)) {
             number = number * 10. + (*p - '0');
             p++;
             num_digits++;
@@ -207,7 +207,7 @@ static double xstrtod(const char *str, char **endptr, char decimal, char sci,
     if (negative) number = -number;

     // Process an exponent string
-    if (toupper(*p) == toupper(sci)) {
+    if (toupper_ascii(*p) == toupper_ascii(sci)) {
         *maybe_int = 0;

         // Handle optional sign
@@ -222,7 +222,7 @@ static double xstrtod(const char *str, char **endptr, char decimal, char sci,
         // Process string of digits
         num_digits = 0;
         n = 0;
-        while (isdigit(*p)) {
+        while (isdigit_ascii(*p)) {
             n = n * 10 + (*p - '0');
             num_digits++;
             p++;
@@ -263,7 +263,7 @@ static double xstrtod(const char *str, char **endptr, char decimal, char sci,

     if (skip_trailing) {
         // Skip trailing whitespace
-        while (isspace(*p)) p++;
+        while (isspace_ascii(*p)) p++;
     }

     if (endptr) *endptr = p;
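
The hunks above swap locale-aware `<ctype.h>` calls for ASCII-only macros inside the digit and whitespace loops of `xstrtod`. The sketch below mirrors just those two loops in a stand-alone helper. It is not the pandas function: `parse_digits_ascii` and its signature are hypothetical, the macros are parenthesized restatements of the portable.h ones, and sign, decimal point, exponent, and overflow handling are all left out.

```c
/* Parenthesized restatements of the portable.h macros (see that hunk). */
#define isdigit_ascii(c) (((unsigned)(c) - '0') < 10u)
#define isspace_ascii(c) (((c) == ' ') || (((unsigned)(c) - '\t') < 5u))

/* Hypothetical helper: accumulates the leading run of ASCII digits in str
 * as a double, then skips trailing ASCII whitespace. If endptr is not NULL
 * it receives the first unconsumed character, as xstrtod's endptr does. */
double parse_digits_ascii(const char *str, const char **endptr) {
    const char *p = str;
    double number = 0.0;

    /* Process string of digits, same accumulation as in the diff. */
    while (isdigit_ascii(*p)) {
        number = number * 10. + (*p - '0');
        p++;
    }

    /* Skip trailing whitespace without consulting the locale. */
    while (isspace_ascii(*p)) p++;

    if (endptr) *endptr = p;
    return number;
}
```

Calling `parse_digits_ascii("1234  x", &end)` would consume the digits and the two spaces, leaving `end` pointing at the `x`; an input with no leading digits yields 0 and leaves `end` at the start of the string.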
