Introduce -Zsplit-metadata to split metadata out of rlibs/dylibs #137535

Open
Kobzol wants to merge 6 commits into master

@Kobzol Kobzol commented Feb 24, 2025

This is a continuation of #120855 (I was mentored by @bjorn3 to move it forward). Most of the original code was written by bjorn3; I tried to clean it up a bit and added some documentation and tests.

This PR introduces a new unstable compiler flag called -Zsplit-metadata (see #57076 for context). When used, rustc will only store a small metadata stub inside rlibs/dylibs instead of the full metadata, to keep their size smaller. It should be used in combination with --emit=metadata, so that the users of such a compiled library can still read the metadata from the corresponding .rmeta file. This comment shows an example of binary/artifact size wins that can be achieved using this approach.

Contrary to #120855, this PR only introduces the new flag, along with a couple of run-make tests and documentation, but does not yet use it in bootstrap to actually compile rustc. I plan to do that as a follow-up step (along with integration in Cargo, which should ideally just always pass this flag to reduce the size of target directories).

There is one unresolved thing that I wanted to discuss with the reviewer(s) first, and one smaller nit.

Lookup of .rmeta files

With -Zsplit-metadata, when rustc tries to link against an rlib/dylib that only has the stub metadata, it needs to be able to find the corresponding .rmeta file somehow. Currently, rustc takes one of two code paths for this dependency lookup:

  • If you don't specify --extern, the usual list of possible dependency locations is searched (including -L). The compiler already does this and finds the corresponding .rmeta files, so this also works out of the box with -Zsplit-metadata in this PR.

  • If you do specify --extern, the step above is skipped, and the --extern paths are searched instead. This is not fully supported yet in this PR. For now, a simple heuristic adds the path of a .rmeta file located in the same directory as the corresponding rlib/dylib file to the search paths. This will probably work for the majority of use cases, but it is not a fully general solution. If the .rmeta file is not found at this location, we should probably fall back to the general library search path lookup described above. However, the locator code is currently structured in a way that makes this non-trivial. So I wanted to ask which approach you consider better:

    • First run find_commandline_library, and record whether we only found a stub. Then, if we found a stub but no .rmeta file, invoke find_library_crate to try to find a corresponding .rmeta file, and use it instead (if found).
    • Refactor the crate locating logic so that find_commandline_library is not exclusive with find_library_crate: the library search paths and the --extern paths are first combined (with --extern probably taking priority), and only then is the final SVH/metadata lookup performed. This would pretty much make split metadata work as well. I found it quite surprising that when --extern is passed, the -L search logic is completely skipped, so this also looks like a nice cleanup. I'm not sure whether it would cause some unexpected behavior change, though.
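
To make the first option concrete, here is a minimal sketch of the lookup order it describes: try a .rmeta file next to the rlib/dylib, then fall back to the library search directories. This is not the PR's code. `sibling_rmeta`, `locate_rmeta`, and the injected `exists` callback are hypothetical names used only for illustration, and the SVH check that the real locator performs is omitted.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: the `.rmeta` path that sits next to an rlib/dylib,
/// e.g. `deps/libfoo.rlib` -> `deps/libfoo.rmeta`.
fn sibling_rmeta(lib: &Path) -> PathBuf {
    lib.with_extension("rmeta")
}

/// Hypothetical lookup: prefer the sibling `.rmeta`, then try each search
/// directory in order. `exists` is injected so the logic can be exercised
/// without touching the filesystem.
fn locate_rmeta(
    lib: &Path,
    search_dirs: &[PathBuf],
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    let sibling = sibling_rmeta(lib);
    if exists(sibling.as_path()) {
        return Some(sibling);
    }
    // Fall back to `<dir>/<same file name>` for every search directory.
    let file_name = sibling.file_name()?;
    search_dirs
        .iter()
        .map(|dir| dir.join(file_name))
        .find(|candidate| exists(candidate.as_path()))
}
```

Against a real filesystem, the callback could simply be `|p| p.exists()`.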

Note that it is currently not possible to pass a full path to the corresponding .rmeta file explicitly, other than by using -L, but I don't think that should be needed. In theory, we could also store the (relative) path to the .rmeta file on disk in the stub metadata header, but that's probably not ideal either, because the .rmeta, rlib, and dylib files could be moved independently of one another.

Interaction with --emit=metadata

Currently, if the user does not use --emit=metadata with -Zsplit-metadata, no metadata will be emitted (there will be no .rmeta file, and the rlib/dylib will only contain a stub). This is most likely a usage error, but in theory it could be a valid use case (just as we allow, e.g., --emit=metadata for binaries, which produces an empty .rmeta file without any warning). As an alternative, we could emit a warning/error about this, or automatically fill in --emit=metadata when -Zsplit-metadata is used and no --emit flag was specified.
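
The three alternatives above (stay silent, warn, or fill in --emit=metadata automatically) can be compared with a small model. This is a sketch under stated assumptions, not rustc's option handling: `MissingMetadataPolicy` and `resolve_emit` are hypothetical names, emit kinds are modelled as plain strings, and an empty list stands for "no --emit flag was specified" (rustc's default outputs are not modelled).

```rust
/// Hypothetical policies for `-Zsplit-metadata` without `--emit=metadata`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MissingMetadataPolicy {
    /// Current behavior described above: emit nothing extra, say nothing.
    Allow,
    /// Keep the emit kinds unchanged but report a warning.
    Warn,
    /// Add `metadata`, but only if no `--emit` flag was given at all.
    AutoEmit,
}

/// Returns the effective emit kinds plus an optional warning message.
fn resolve_emit(
    split_metadata: bool,
    emit: &[&str],
    policy: MissingMetadataPolicy,
) -> (Vec<String>, Option<String>) {
    let mut kinds: Vec<String> = emit.iter().map(|k| k.to_string()).collect();
    let has_metadata = kinds.iter().any(|k| k == "metadata");
    if !split_metadata || has_metadata {
        // Nothing to decide: either no split, or the .rmeta is emitted anyway.
        return (kinds, None);
    }
    match policy {
        MissingMetadataPolicy::Allow => (kinds, None),
        MissingMetadataPolicy::Warn => (
            kinds,
            Some("-Zsplit-metadata used without --emit=metadata".to_string()),
        ),
        MissingMetadataPolicy::AutoEmit => {
            if kinds.is_empty() {
                kinds.push("metadata".to_string());
            }
            (kinds, None)
        }
    }
}
```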

Fixes #23366
Closes #29511
Fixes #57076

Another attempt at #93945 and #120855.

r? @petrochenkov

@rustbot rustbot added A-run-make Area: port run-make Makefiles to rmake.rs S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 24, 2025

@Kobzol
Contributor Author

Kobzol commented Feb 24, 2025

(The test failure is the one situation not yet implemented, which is described in the PR description - --extern plus .rmeta in a different directory than the rlib/dylib).

@petrochenkov
Contributor

> I found it quite surprising that when --extern is passed, the -L search logic is completely skipped

That's by design: --extern with paths disables all directory search, makes the build system own the input files, and makes rustc just do what the build system tells it.
I'd actually consider this the primary mode of operation for rustc, and all the directory search a sort of legacy (well, except -L dependency=PATH).

--extern supports passing multiple paths for the same crate, to specify all its rmeta, rlib and dylib versions.
So in the case of split metadata, I'd also expect cargo (or another build system) to pass the path to the split rmeta explicitly, and rustc not to apply any heuristics.

In #120855 the heuristics were a temporary measure to enable split metadata in bootstrap, while cargo still doesn't support it.
It should be fine to add them in your follow-up PR implementing the bootstrap changes as well, but in the long run I don't think they should exist.
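
As an illustration of what a build system would then do, here is a minimal sketch that produces one --extern argument pair per artifact of the same crate. `extern_args` is a hypothetical helper, not Cargo code, and repeating the flag once per path is an assumption about how the multiple paths are spelled on the command line.

```rust
/// Hypothetical helper: builds `--extern <crate>=<path>` argument pairs,
/// one pair per artifact (e.g. the rlib and its split-out rmeta).
fn extern_args(crate_name: &str, artifacts: &[&str]) -> Vec<String> {
    let mut args = Vec::with_capacity(artifacts.len() * 2);
    for path in artifacts {
        args.push("--extern".to_string());
        args.push(format!("{}={}", crate_name, path));
    }
    args
}
```

The resulting vector could be handed to `std::process::Command::args` when spawning rustc.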

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 13, 2025
@Kobzol
Contributor Author

Kobzol commented Mar 13, 2025

Interesting. So if I understood it correctly, you would like to remove all heuristics from rustc (so no lookup of .rmeta in known library search dirs and no ".rmeta in the same directory as .rlib" heuristic) and let Cargo pass essentially both .rlib and .rmeta for every dependency through --extern? I guess that's cleaner in a sense.

On the other hand, it would essentially double the length of the rustc command line, because for every library that you depend on, Cargo would have to pass both .rmeta and .rlib. Do you think that wouldn't be an issue? I suppose that it would only happen for the leaf linked artifact, because the intermediate libraries seem to only receive .rmeta files from Cargo, while the final artifact receives .rlib.

@petrochenkov
Contributor

> So if I understood it correctly, you would like to remove all heuristics from rustc...

Yes.

> On the other hand, it would essentially double the length of the rustc command line, because for every library that you depend on, Cargo would have to pass both .rmeta and .rlib.

My impression was that it is already the case with pipelined builds, but apparently it's indeed only .rmetas for dependencies and only .rlibs for the root crate.
I think it's fine if it happens only for direct dependencies of binary crates. In any case, we can reconsider heuristics if there are any complaints about this.

@rustbot
Collaborator

rustbot commented Mar 14, 2025

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

This PR modifies run-make tests.

cc @jieyouxu

@Kobzol
Contributor Author

Kobzol commented Mar 14, 2025

Ok, rebased and removed the heuristics. I added an error message when the full metadata is not found, but I'm not sure if we should emit the error eagerly (left a comment in code). Could you please take a look at fe29fec? Thank you!

@rust-log-analyzer
Collaborator

The job mingw-check-tidy failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
fmt check
fmt: checked 5908 files
tidy check
tidy error: /checkout/compiler/rustc_metadata/messages.ftl: message `metadata_full_metadata_not_found` appears before `metadata_fail_create_file_encoder`, but is alphabetically later than it
run `./x.py test tidy --bless` to sort the file correctly
some tidy checks failed
Command has failed. Rerun with -v to see more details.
Build completed unsuccessfully in 0:01:49
  local time: Fri Mar 14 10:56:30 UTC 2025
  network time: Fri, 14 Mar 2025 10:56:30 GMT
##[error]Process completed with exit code 1.
Post job cleanup.

*slot = Some((hash, metadata, lib.clone()));

// Question: maybe we shouldn't error eagerly here, because it is possible that
// e.g. a rlib only contains a stub, but a (later-resolved) dylib contains the full
Member

@bjorn3 bjorn3 Mar 14, 2025


-Zsplit-metadata applies to all outputs, and both the rlib and the dylib must be compiled by the same rustc invocation for the crate hash to be guaranteed to match (and thus for them to be considered interchangeable in dependency resolution). As such, if you end up with a stub-only rlib and a full dylib, it is not guaranteed that the dylib contains correct metadata for the rlib anyway. I don't think we need to support that use case, at least until we guarantee the exact set of cases where two separate rustc invocations result in interchangeable crate metadata.
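
The eager error under discussion can be modelled roughly as follows: a stub is only usable if a full metadata candidate with the same crate hash is available, and otherwise resolution fails immediately. This is a simplified sketch, not the code under review. `CrateMetadata`, `ResolveError`, and `resolve_full` are hypothetical names, the hash is reduced to a `u64`, and the idea that the stub carries the crate hash is an assumption of the model.

```rust
/// Simplified stand-in for what a crate file can carry (an assumption for
/// illustration; rustc's real metadata types differ).
#[derive(Debug, Clone, PartialEq)]
enum CrateMetadata {
    /// Only a small stub; assumed here to record the crate hash.
    Stub { hash: u64 },
    /// Complete metadata, as found in a `.rmeta` file.
    Full { hash: u64 },
}

#[derive(Debug, PartialEq)]
enum ResolveError {
    /// No full metadata with a matching hash was provided for a stub.
    FullMetadataNotFound { hash: u64 },
}

/// If `lib` already carries full metadata, use it. If it is a stub, pick the
/// first full candidate with the same hash, or fail eagerly.
fn resolve_full(
    lib: &CrateMetadata,
    rmeta_candidates: &[CrateMetadata],
) -> Result<CrateMetadata, ResolveError> {
    match lib {
        CrateMetadata::Full { .. } => Ok(lib.clone()),
        CrateMetadata::Stub { hash } => rmeta_candidates
            .iter()
            .find(|c| matches!(**c, CrateMetadata::Full { hash: h } if h == *hash))
            .cloned()
            .ok_or(ResolveError::FullMetadataNotFound { hash: *hash }),
    }
}
```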

Contributor Author


Ok, that makes sense. Thanks!

Labels
A-run-make Area: port run-make Makefiles to rmake.rs S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.