Skip to content
This repository was archived by the owner on Nov 1, 2021. It is now read-only.

Modify AsmJS SizeType to match Wasm #10

Closed

Conversation

jgravelle-google
Copy link
Contributor

AsmJS uses unsigned ints for size_t, which causes some libcxx functions to have different symbols in AsmJS than with Wasm, which uses unsigned longs.
In principle, long-term, we want AsmJS and Wasm to have the same ABI, so this serves as a first step toward ensuring that they match.

AsmJS uses unsigned ints for size_t, which causes some libcxx functions
to have different symbols in AsmJS than with Wasm, which uses unsigned
longs.
In principle we want AsmJS and Wasm to have the same ABI, so this serves
as a first step toward ensuring that they match.
@kripken
Copy link
Member

kripken commented Aug 30, 2016

What is the benefit for making the change this way versus the reverse?

@jgravelle-google
Copy link
Contributor Author

The two main reasons I know of are 1) TargetInfo defaults to UnsignedLong, so it seems more consistent with more targets, although I don't know how to easily tell how many targets do what. 2) For 64-bit wasm, we'll need to use long (or long long), so keeping it long makes that consistent across asmjs, wasm32, and wasm64, which means they can all have the same symbols.

I'm not fully convinced either of those are particularly compelling. What are some benefits to the reverse?

@kripken
Copy link
Member

kripken commented Aug 30, 2016

Reasonable points, yeah.

A downside is that it means current emscripten users need to rebuild their projects. We can (and must) bump the version number which would force a system lib rebuild, but don't have a solution for user project code. So this could cause confusion and problems.

@dschuff
Copy link
Member

dschuff commented Aug 30, 2016

Aren't we already making them recompile because of the other ABI changes for emscripten-core/emscripten#4340 ? This is no different from those.

@kripken
Copy link
Member

kripken commented Aug 30, 2016

Well, it depends how we resolve that issue :) Last I heard @sunfishcode was going to try out a long double trick.

@kripken
Copy link
Member

kripken commented Aug 30, 2016

@juj, curious what you think the risk is for existing users here.

@dschuff
Copy link
Member

dschuff commented Aug 30, 2016

It's not just long doubles though, is it? The ABIs have to be the same top to bottom if we are going to have compatible bitcode for e.g. the unity use case.

@dschuff
Copy link
Member

dschuff commented Aug 30, 2016

Basically it means that you won't be able to link bitcode object files compiled with 2 different versions of the toolchain. This isn't a common use case in the absence of dynamic linking, people generally would do a clean build if they change their toolchain anyway (and is why the fact that we haven't promised a stable ABI so far is not huge deal). There could be cases where someone is shipping a proprietary bitcode library to users who include it in their builds, where they will have to ensure that the versions match. But those users would also get a big benefit from the asm.js and wasm ABIs matching, so I think this would be good for them on balance anyway.

@kripken
Copy link
Member

kripken commented Aug 30, 2016

Yeah, the risk is people with a bitcode blob of library code, or that happen to have a project where some files haven't been touched in a while so they weren't rebuilt (and their build system doesn't rebuild on a new compiler version). I wish we had a way to warn people about this (like if we stored the ABI hash in bitcode files), but currently don't.

@dschuff
Copy link
Member

dschuff commented Aug 30, 2016

Maybe the first question was the one asked upthread, which is "which one to use?" Using unsigned long makes us the same as ARM (and much of our other ABI surface is like ARM already) and the same as a hypothetical wasm64 (for whatever that's worth; maybe not much). On the other hand unsigned int makes us the same as most other ILP32 targets (i386, X32, NaCl, Mips32, PPC32), so it's not 100% clear.

@kripken
Copy link
Member

kripken commented Aug 30, 2016

One thing I don't get - what would be the benefit for wasm32 to be the same as wasm64 in this respect? Surely they can't be linked anyhow.

@dschuff
Copy link
Member

dschuff commented Aug 30, 2016

Yeah, they can't be linked. But libraries that export mangled names (e.g. libcxx) would export the same set of symbols, which... might be nice?

@jgravelle-google
Copy link
Contributor Author

For what it's worth, I ran into this in the first place because wasm and asm use different symbols, so a test case that only uses operator new(size_t) from libcxx didn't realized it needed to link against libcxx and failed for wasm. This is because we currently check a precomputed libcxx symbols file, which we could extend to symbols-asm, symbols-wasm, and symbols-wasm64, or some other mechanism. But that's one case where identically-named symbols is convenient... assuming there's no other differently-defined things between 64 and 32 bit wasm.

@dschuff
Copy link
Member

dschuff commented Aug 30, 2016

BTW that discrepancy is actually the proximate cause that motivated this change in the first place. Currently asm.js and wasm have different signatures (and therefore different mangled names) for operator new because its argument is size_t but the mangling uses the underlying type. So asm and wasm have a different set of symbols, and because emscripten uses system/lib/libcxx/symbols instead of reading the library directly when linking, we were seeing link failures for wasm.

@kripken
Copy link
Member

kripken commented Aug 30, 2016

I see. Then another option might be to keep asm.js as is, and have different .symbols files for asm.js and wasm.

@dschuff
Copy link
Member

dschuff commented Aug 31, 2016

Right. Or even get rid of precomputed symbols files (although getting the symbols with llvm-nm is slower than nm on other platforms so maybe we don't want to do that). But I thought we discussed previously that we did want the ABIs to match (wasn't that why you filed emscripten-core/emscripten#4340 in the first place? I do think it's a good idea).

@kripken
Copy link
Member

kripken commented Aug 31, 2016

Note that we use the precomputed symbol files for another purpose, to know which system libs need to be built. When a user installs emscripten, there are no system libs, and if they compile a file that needs libc++, we build libc++ based on the symbol file info.

I agree having the same ABI makes sense, but how to do it and when are open questions. Two symbol files might be a temporary solution to unblock us for now. (e.g. maybe we can do that until we find a way to warn people about bitcode incompatibilities).

For my side, I'm mostly waiting to hear from @juj (who is in another time zone), he has more experience than me with game devs using emscripten, and I want to hear his estimate of the risk of an ABI change like this.

@dschuff
Copy link
Member

dschuff commented Aug 31, 2016

Right, makes sense. I think we should move away from building target libraries on demand, and just ship a sysroot, but that's a discussion for another bug :)

So I pinged Roland for advice about int vs long (turns out I was wrong about EABI ARM, it's unsigned int too), and while there's (as we thought) not too many reasons to use one or the other on ILP32, he suggested unsigned int. So maybe we can just switch wasm to match all the other ILP32 targets in this case.

@jgravelle-google
Copy link
Contributor Author

Which winds up being a smaller change overall because we won't need to do anything with symbols and wasm will just match it.
We'll need to eventually figure out what to do with symbols for wasm64, but there's at least two other ways to handle that so it's a nonissue.

I'm ok with closing this then.

@kripken
Copy link
Member

kripken commented Aug 31, 2016

cc @sunfishcode

@dschuff
Copy link
Member

dschuff commented Aug 31, 2016

Per our discussion about the broader merging of the asm.js and wasm ABI, triple, etc: we eventually want to merge the ABIs. Also it might be nice to have the same symbols for wasm32, wasm64 and asm.js, which would mean unsigned long for size_t. But there are a few steps between there and here, and we don't want a bunch of incremental breakage of the asm.js ABI in the meantime. So I think the plan is for now to switch the wasm frontend to unsigned long to fix the breakage for now, and then move everything at once later.

@jgravelle-google
Copy link
Contributor Author

Switching to wasm frontend using unsigned ints here: https://reviews.llvm.org/D24134

juj pushed a commit that referenced this pull request Jan 31, 2018
Summary:
The test being added in this patch used to cause an assertion failure:

/build/./bin/clang -cc1 -internal-isystem /build/lib/clang/5.0.0/include -nostdsysteminc -verify -fsyntax-only -std=c++11 -Wshadow-all /src/tools/clang/test/SemaCXX/warn-shadow.cpp
--
Exit Code: 134

Command Output (stderr):
--
clang: /src/tools/clang/lib/AST/ASTDiagnostic.cpp:424: void clang::FormatASTNodeDiagnosticArgument(DiagnosticsEngine::ArgumentKind, intptr_t, llvm::StringRef, llvm::StringRef, ArrayRef<DiagnosticsEngine::ArgumentValue>, SmallVectorImpl<char> &, void *, ArrayRef<intptr_t>): Assertion `isa<NamedDecl>(DC) && "Expected a NamedDecl"' failed.
#0 0x0000000001c7a1b4 PrintStackTraceSignalHandler(void*) (/build/./bin/clang+0x1c7a1b4)
#1 0x0000000001c7a4e6 SignalHandler(int) (/build/./bin/clang+0x1c7a4e6)
#2 0x00007f30880078d0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0xf8d0)
#3 0x00007f3087054067 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x35067)
#4 0x00007f3087055448 abort (/lib/x86_64-linux-gnu/libc.so.6+0x36448)
#5 0x00007f308704d266 (/lib/x86_64-linux-gnu/libc.so.6+0x2e266)
#6 0x00007f308704d312 (/lib/x86_64-linux-gnu/libc.so.6+0x2e312)
#7 0x00000000035b7f22 clang::FormatASTNodeDiagnosticArgument(clang::DiagnosticsEngine::ArgumentKind, long, llvm::StringRef, llvm::StringRef, llvm::ArrayRef<std::pair<clang::DiagnosticsEngine::ArgumentKind, long> >, llvm::SmallVectorImpl<char>&, void*, llvm::ArrayRef<long>) (/build/
./bin/clang+0x35b7f22)
#8 0x0000000001ddbae4 clang::Diagnostic::FormatDiagnostic(char const*, char const*, llvm::SmallVectorImpl<char>&) const (/build/./bin/clang+0x1ddbae4)
#9 0x0000000001ddb323 clang::Diagnostic::FormatDiagnostic(char const*, char const*, llvm::SmallVectorImpl<char>&) const (/build/./bin/clang+0x1ddb323)
#10 0x00000000022878a4 clang::TextDiagnosticBuffer::HandleDiagnostic(clang::DiagnosticsEngine::Level, clang::Diagnostic const&) (/build/./bin/clang+0x22878a4)
#11 0x0000000001ddf387 clang::DiagnosticIDs::ProcessDiag(clang::DiagnosticsEngine&) const (/build/./bin/clang+0x1ddf387)
#12 0x0000000001dd9dea clang::DiagnosticsEngine::EmitCurrentDiagnostic(bool) (/build/./bin/clang+0x1dd9dea)
#13 0x0000000002cad00c clang::Sema::EmitCurrentDiagnostic(unsigned int) (/build/./bin/clang+0x2cad00c)
#14 0x0000000002d91cd2 clang::Sema::CheckShadow(clang::NamedDecl*, clang::NamedDecl*, clang::LookupResult const&) (/build/./bin/clang+0x2d91cd2)

Stack dump:
0.      Program arguments: /build/./bin/clang -cc1 -internal-isystem /build/lib/clang/5.0.0/include -nostdsysteminc -verify -fsyntax-only -std=c++11 -Wshadow-all /src/tools/clang/test/SemaCXX/warn-shadow.cpp
1.      /src/tools/clang/test/SemaCXX/warn-shadow.cpp:214:23: current parser token ';'
2.      /src/tools/clang/test/SemaCXX/warn-shadow.cpp:213:26: parsing function body 'handleLinkageSpec'
3.      /src/tools/clang/test/SemaCXX/warn-shadow.cpp:213:26: in compound statement ('{}')
/build/tools/clang/test/SemaCXX/Output/warn-shadow.cpp.script: line 1: 15595 Aborted                 (core dumped) /build/./bin/clang -cc1 -internal-isystem /build/lib/clang/5.0.0/include -nostdsysteminc -verify -fsyntax-only -std=c++11 -Wshadow-all /src/tools/clang/test/SemaCXX/warn-shadow.cpp

Reviewers: rsmith

Reviewed By: rsmith

Subscribers: krytarowski, cfe-commits

Differential Revision: https://reviews.llvm.org/D33207

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@303325 91177308-0d34-0410-b5e6-96231b3b80d8
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants