Commit 342bbc3

Import GotoBLAS2 1.13 BSD version codes.
0 parents  commit 342bbc3

1,685 files changed

+1382682
-0
lines changed

00License.txt

+32
@@ -0,0 +1,32 @@
1+
2+
Copyright 2009, 2010 The University of Texas at Austin.
3+
All rights reserved.
4+
5+
Redistribution and use in source and binary forms, with or without
6+
modification, are permitted provided that the following conditions are
7+
met:
8+
9+
1. Redistributions of source code must retain the above copyright
10+
notice, this list of conditions and the following disclaimer.
11+
12+
2. Redistributions in binary form must reproduce the above copyright
13+
notice, this list of conditions and the following disclaimer in
14+
the documentation and/or other materials provided with the
15+
distribution.
16+
17+
THIS SOFTWARE IS PROVIDED BY THE UNIVERSITY OF TEXAS AT AUSTIN ``AS IS''
18+
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
19+
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
20+
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY OF TEXAS AT
21+
AUSTIN OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
22+
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
23+
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
24+
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
25+
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
26+
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
27+
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
28+
29+
The views and conclusions contained in the software and documentation
30+
are those of the authors and should not be interpreted as representing
31+
official policies, either expressed or implied, of The University of
32+
Texas at Austin.

01Readme.txt

+93
@@ -0,0 +1,93 @@
1+
Optimized GotoBLAS2 libraries version 1.13
2+
3+
By Kazushige Goto <kgoto@tacc.utexas.edu>
4+
5+
# This is the last update, made on 5th Feb. 2010.
6+
7+
0. License
8+
9+
See 00License.txt.
10+
11+
1. Supported OS
12+
13+
Linux
14+
FreeBSD (it may also work on NetBSD)
15+
OSX
16+
Solaris
17+
Windows 2k, XP, Server 2003 and 2008 (both 32bit and 64bit)
18+
AIX
19+
Tru64 UNIX
20+
21+
2. Supported Architecture
22+
23+
X86 : Pentium3 Katmai
24+
Coppermine
25+
Athlon (not well optimized, though)
26+
PentiumM Banias, Yonah
27+
Pentium4 Northwood
28+
Nocona (Prescott)
29+
Core 2 Woodcrest
30+
Core 2 Penryn
31+
Nehalem-EP Corei{3,5,7}
32+
Atom
33+
AMD Opteron
34+
AMD Barcelona, Shanghai, Istanbul
35+
VIA NANO
36+
37+
X86_64: Pentium4 Nocona
38+
Core 2 Woodcrest
39+
Core 2 Penryn
40+
Nehalem
41+
Atom
42+
AMD Opteron
43+
AMD Barcelona, Shanghai, Istanbul
44+
VIA NANO
45+
46+
IA64 : Itanium2
47+
48+
Alpha : EV4, EV5, EV6
49+
50+
POWER : POWER4
51+
PPC970/PPC970FX
52+
PPC970MP
53+
CELL (PPU only)
54+
POWER5
55+
PPC440 (QCDOC)
56+
PPC440FP2(BG/L)
57+
POWERPC G4(PPC7450)
58+
POWER6
59+
60+
SPARC : SPARC IV
61+
SPARC VI, VII (Fujitsu chip)
62+
63+
MIPS64/32: Sicortex
64+
65+
3. Supported compiler
66+
67+
C compiler : GNU CC
68+
Cygwin, MinGW
69+
Other commercial compilers (especially for x86/x86_64)
70+
71+
Fortran Compiler : GNU G77, GFORTRAN
72+
G95
73+
Open64
74+
Compaq
75+
F2C
76+
IBM
77+
Intel
78+
PathScale
79+
PGI
80+
SUN
81+
Fujitsu
82+
83+
4. Supported precision
84+
85+
The x86/x86_64 versions now support 80bit FP precision in addition to
86+
normal double precision and single precision. Currently only
87+
gfortran supports 80bit FP, with "REAL*10".
88+
89+
90+
5. How to build library?
91+
92+
Please see 02QuickInstall.txt or just type "make".
93+

02QuickInstall.txt

+118
@@ -0,0 +1,118 @@
1+
Quick installation for GotoBLAS2
2+
3+
***************************************************************************
4+
***************************************************************************
5+
** **
6+
** **
7+
** Just type "make" <<return>>. **
8+
** **
9+
** If you're not satisfied with this library, **
10+
** please read following instruction and customize it. **
11+
** **
12+
** **
13+
***************************************************************************
14+
***************************************************************************
15+
16+
17+
1. REALLY REALLY quick way to build library
18+
19+
Type "make" or "gmake".
20+
21+
$shell> make
22+
23+
The script will detect the Fortran compiler, the number of cores and
24+
the architecture you're using. If the default gcc binary type is
25+
64bit, a 64bit library will be created. Otherwise a 32bit library
26+
will be created.
27+
28+
After the compile finishes, you'll see various information about
29+
the generated library.
30+
31+
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
32+
33+
GotoBLAS2 build complete.
34+
35+
OS ... Linux
36+
Architecture ... x86_64
37+
BINARY ... 64bit
38+
C compiler ... GCC (command line : gcc)
39+
Fortran compiler ... PATHSCALE (command line : pathf90)
40+
Library Name ... libgoto_barcelonap-r1.27.a (Multi threaded; Max
41+
num-threads is 16)
42+
43+
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
44+
45+
46+
2. Specifying 32bit or 64bit library
47+
48+
If you need 32bit binary,
49+
50+
$shell> make BINARY=32
51+
52+
If you need 64bit binary,
53+
54+
$shell> make BINARY=64
55+
56+
57+
3. Specifying target architecture
58+
59+
If you need a library for a different architecture, you can use the
60+
TARGET option. The currently available options are listed at the top
61+
of getarch.c. For example, if you need a library for the Intel Core 2
62+
architecture, you'll find the FORCE_CORE2 option in getarch.c, so you
63+
can pass TARGET=CORE2 (drop the FORCE_ prefix) to make.
64+
65+
$shell> make TARGET=CORE2
66+
67+
Also, if you want GotoBLAS2 to support multiple architectures,
68+
69+
$shell> make DYNAMIC_ARCH=1
70+
71+
All kernels will be included in the library, and the best one for the
72+
architecture is selected dynamically at run time.
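
As a rough sketch of the naming rule above, the following shell function collects every FORCE_* token from a file and strips the prefix to give candidate TARGET values. The name list_targets is hypothetical and not part of GotoBLAS2. It assumes each option appears literally as FORCE_<NAME> in getarch.c, and it relies on "grep -o", which GNU and BSD grep provide.

```shell
# Hypothetical helper (not shipped with GotoBLAS2).
# Prints one candidate TARGET name per line, sorted and de-duplicated.
list_targets() {
    # $1: path to getarch.c (or any file that contains FORCE_* tokens)
    grep -o 'FORCE_[A-Za-z0-9_][A-Za-z0-9_]*' "$1" |
        sed 's/^FORCE_//' |
        sort -u
}
```

Running "list_targets getarch.c" in the source directory should then print names that can be passed as TARGET=<name>. Whether every printed name is actually accepted by the build is not guaranteed by this sketch.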
73+
74+
75+
4. Enabling or disabling multi-threading
76+
77+
The script will detect the number of cores and will enable the multi
78+
threaded library if the number of cores is more than two. If you still
79+
want to create a single threaded library,
80+
81+
$shell> make USE_THREAD=0
82+
83+
Or, if you need to force a threaded library,
84+
85+
$shell> make USE_THREAD=1
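
To see how many cores the machine reports before choosing USE_THREAD, something like the following may help. It is only a sketch: count_cores is a hypothetical helper, and the commands it tries (getconf, nproc, sysctl) are assumptions about the host, not necessarily what the GotoBLAS2 build script itself uses.

```shell
# Hypothetical helper: print the number of online cores, or 1 if unknown.
count_cores() {
    # Try the common interfaces in turn; each one may be missing on a given OS.
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) ||
    n=$(nproc 2>/dev/null) ||
    n=$(sysctl -n hw.ncpu 2>/dev/null) ||
    n=1
    # Fall back to 1 if the answer is empty or not a plain number.
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}
```

If count_cores prints 1, passing USE_THREAD=0 explicitly gives the single threaded build regardless of what the script detects.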
86+
87+
88+
5. Specifying target OS
89+
90+
The target architecture will be determined by CC. If you
91+
specify a cross compiler for MIPS, you can create a library for
92+
the MIPS architecture.
93+
94+
$shell> make CC=mips64el-linux-gcc TARGET=SICORTEX
95+
96+
Or you can specify your favorite C compiler with an absolute path.
97+
98+
$shell> make CC=/opt/intel/cc/32/10.0.026/bin/icc TARGET=BARCELONA
99+
100+
The binary type (32bit/64bit) is determined by checking CC, so you
101+
can control the binary type with this option.
102+
103+
$shell> make CC="pathcc -m32"
104+
105+
In this case, 32bit library will be created.
106+
107+
108+
6. Specifying Fortran compiler
109+
110+
If you need to support another Fortran compiler, you can specify it with
111+
the FC option.
112+
113+
$shell> make FC=gfortran
114+
115+
116+
7. Other useful options
117+
118+
You'll find other useful options in Makefile.rule.

03FAQ.txt

+119
@@ -0,0 +1,119 @@
1+
GotoBLAS2 FAQ
2+
3+
1. General
4+
5+
1.1 Q Can I find a useful paper about GotoBLAS2?
6+
7+
A You may check the following URL.
8+
9+
http://www.cs.utexas.edu/users/flame/Publications/index.htm
10+
11+
11. Kazushige Goto and Robert A. van de Geijn, "Anatomy of
12+
High-Performance Matrix Multiplication," ACM Transactions on
13+
Mathematical Software, accepted.
14+
15+
15. Kazushige Goto and Robert van de Geijn, "High-Performance
16+
Implementation of the Level-3 BLAS." ACM Transactions on
17+
Mathematical Software, submitted.
18+
19+
20+
1.2 Q Does GotoBLAS2 work with Hyperthread (SMT)?
21+
22+
A Yes, it will work. GotoBLAS2 detects Hyperthreading and
23+
avoids scheduling threads on the same core.
24+
25+
26+
1.3 Q When I type "make", the following error occurred. What's wrong?
27+
28+
$shell> make
29+
"./Makefile.rule", line 58: Missing dependency operator
30+
"./Makefile.rule", line 61: Need an operator
31+
...
32+
33+
A This error occurs because you didn't use GNU make. Some binary
34+
packages install GNU make as "gmake", so it's worth trying that.
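
One way to check which command is GNU make is sketched below. The function names are hypothetical. The check relies on GNU make printing a banner that starts with "GNU Make" when run with --version, and it assumes that other make implementations either reject that option or print something else.

```shell
# Return success if the given command identifies itself as GNU make.
is_gnu_make() {
    "$1" --version 2>/dev/null | grep -q 'GNU Make'
}

# Print the first GNU make found among "gmake" and "make"; fail if none is.
find_gnu_make() {
    for candidate in gmake make; do
        if command -v "$candidate" >/dev/null 2>&1 && is_gnu_make "$candidate"; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}
```

The printed name can then be used in place of "make", for example by running "$(find_gnu_make)" in the source directory.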
35+
36+
37+
1.4 Q Function "xxx" is slow. Why?
38+
39+
A GotoBLAS2 has many well optimized functions, but it's far
40+
from perfect. Level 1/2 function performance in particular
41+
depends on how you call BLAS. You should understand what
42+
happens between your function and GotoBLAS2 by using the profile
43+
enabled version or hardware performance counters. Again, please
44+
don't regard GotoBLAS2 as a black box.
45+
46+
47+
1.5 Q I have a commercial C compiler and want to compile GotoBLAS2 with
48+
it. Is it possible?
49+
50+
A All functions that affect performance are written in assembler,
51+
and C code is only used for wrappers of assembler functions or for
52+
complicated functions. I also use many inline assembler functions, and
53+
unfortunately most commercial compilers can't handle inline
54+
assembler. Therefore you should use gcc.
55+
56+
57+
1.6 Q I use an OpenMP compiler. How can I use GotoBLAS2 with it?
58+
59+
A Please understand that OpenMP is a compromised method of using
60+
threads. If you want to use OpenMP based code with GotoBLAS2, you
61+
should enable "USE_OPENMP=1" in Makefile.rule.
62+
63+
64+
1.7 Q Could you tell me how to use the profiled library?
65+
66+
A You need to build and link your application with the -pg
67+
option. After executing your application, "gmon.out" is
68+
generated in your current directory.
69+
70+
$shell> gprof <your application name> gmon.out
71+
72+
Each sample counts as 0.01 seconds.
73+
  %   cumulative   self              self     total
74+
 time   seconds   seconds    calls  Ks/call  Ks/call  name
75+
89.86    975.02    975.02    79317     0.00     0.00  .dgemm_kernel
76+
 4.19   1020.47     45.45       40     0.00     0.00  .dlaswp00N
77+
 2.28   1045.16     24.69     2539     0.00     0.00  .dtrsm_kernel_LT
78+
 1.19   1058.03     12.87    79317     0.00     0.00  .dgemm_otcopy
79+
 1.05   1069.40     11.37     4999     0.00     0.00  .dgemm_oncopy
80+
....
81+
82+
I think the profiled BLAS library is really useful for your
83+
research. Please find the bottleneck of your application and
84+
improve it.
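
To pick the hot spots out of a flat profile like the one above, a small filter can help. This is a sketch, not part of GotoBLAS2 or gprof: hot_functions is a hypothetical name, and it assumes the first column is the percentage of time and the last column is a function name without spaces.

```shell
# Hypothetical helper: read gprof flat-profile text on stdin and print
# "<percent> <name>" for every row whose %-time is at least the threshold.
hot_functions() {
    # $1: minimum percentage of total time, e.g. 4
    # Header lines are skipped because their first field is not a number.
    awk -v min="$1" '
        $1 ~ /^[0-9]+(\.[0-9]+)?$/ && $1 + 0 >= min + 0 { print $1, $NF }
    '
}
```

For example, piping the gprof output into "hot_functions 4" keeps only the functions that account for at least 4% of the run time.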
85+
86+
1.8 Q Is the number of threads limited?
87+
88+
A Basically, there is no limit on the number of threads. You
89+
can specify as many threads as you want, but a larger
90+
number of threads will consume extra resources. I recommend that you
91+
specify the minimum number of threads.
92+
93+
94+
2. Architecture Specific Issues or Implementation
95+
96+
2.1 Q GotoBLAS2 seems to support any combination of OS and
97+
architecture. Is it possible?
98+
99+
A Combinations are limited to existing OSes and architectures. For
100+
example, the combination of OSX with SPARC is impossible. But it
101+
will be possible with slight modification if such a combination
102+
appears in front of us.
103+
104+
105+
2.2 Q I have POWER architecture systems. Do I need extra work?
106+
107+
A Although the POWER architecture defines a special instruction
108+
like CPUID to detect the correct architecture, it's privileged
109+
and can't be accessed by a user process. So you have to set
110+
the architecture that you have manually in getarch.c.
111+
112+
113+
2.3 Q I can't create DLL on Cygwin (Error 53). What's wrong?
114+
115+
A You have to make sure that lib.exe and mspdb80.dll are in the Microsoft
116+
Visual Studio PATH. The easiest way is to use the 'which' command.
117+
118+
$shell> which lib.exe
119+
/cygdrive/c/Program Files/Microsoft Visual Studio/VC98/bin/lib.exe

04Windows64bit.txt

+13
@@ -0,0 +1,13 @@
1+
2+
Quick guide to build library for Windows 64bit.
3+
4+
1. What you need
5+
6+
a. Windows Server 2003 or later
7+
b. Cygwin environment (make, gcc, g77, perl, sed, wget)
8+
c. MinGW64 compiler
9+
d. Microsoft Visual Studio (lib.exe and mspdb80.dll are required to create a DLL)
10+
11+
2. Run ./quickbuild.win64
12+
13+
Good luck
