-
Notifications
You must be signed in to change notification settings - Fork 351
Split up debug_abbrev and debug_info sections into multiple blocks #4778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split up debug_abbrev and debug_info sections into multiple blocks #4778
Conversation
a03fe80 to
867bd94
Compare
|
@swift-ci please test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks pretty reasonable.
llvm/include/llvm/ExecutionEngine/JITLink/DWARFRecordSectionSplitter.h
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this copy operation necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitting will introduce new blocks, which would invalidate iterators in the range-based for below.
It looks like this has removed the pre-built caching though? I don't think we want to do that (or at least it couldn't be upstreamed) -- it'd affect performance for performance-sensitive operations like splitting eh-frames.
llvm/lib/ExecutionEngine/JITLink/DWARFRecordSectionSplitter.cpp
Outdated
Show resolved
Hide resolved
llvm/lib/ExecutionEngine/JITLink/DWARFRecordSectionSplitter.cpp
Outdated
Show resolved
Hide resolved
llvm/lib/ExecutionEngine/JITLink/DWARFRecordSectionSplitter.cpp
Outdated
Show resolved
Hide resolved
llvm/lib/ExecutionEngine/JITLink/DWARFRecordSectionSplitter.cpp
Outdated
Show resolved
Hide resolved
|
Can you provide some more details on DWARF abbreviations, and how this patch aims to use/support them? Why does this functionality need to be merged with the record splitter, rather than being run on the graph as a separate pass before or after splitting? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitting will introduce new blocks, which would invalidate iterators in the range-based for below.
It looks like this has removed the pre-built caching though? I don't think we want to do that (or at least it couldn't be upstreamed) -- it'd affect performance for performance-sensitive operations like splitting eh-frames.
|
@lhames Hi Lang, the debug_abbrev section contains abbreviations for every compile unit. According to the DWARF5 standard: The abbreviations table describes the formats of the entries in the entry pool. Like the DWARF abbreviations table in the .debug_abbrev section, it defines one or more abbreviation codes. Each abbreviation code provides a DWARF tag value followed by a list of pairs that defines an attribute and form code used by entries with that abbreviation code. The abbreviations essentially describe what a specific compile unit's raw bytes mean, as I understand it. Maybe @adrian-prantl can also pitch in here. To split up debug info effectively in the CAS, we decided to make it so that every function gets its own compile unit in the debug_info section. To see true deduplication, however, we would need to split up the abbreviation section so that every compile unit has its own, unique contribution into the debug_abbrev section. If we have a function such as You are right however, the If you think, it is a better design to not have it there, I can certainly move it. Thanks! |
867bd94 to
6c291ca
Compare
|
@lhames @cachemeifyoucan @benlangmuir @akyrtzi This patch has been in the works for a long time, and has been changed significantly. Splitting the blocks in the JITLink graph is not enough, because for a function defined in a header file, the abbreviation offset for its compile unit can be different, which will cause it to not deduplicate. Therefore I fixed that by zeroing out the abbrev_offset field in the block and then adding a reference from the debug_info block to its corresponding abbreviation block. Could you all please take a look and review the patch? Thank you! |
|
@swift-ci please test |
…d add options to control which debug information blocks to split
6c291ca to
319abbc
Compare
|
@swift-ci please test |
From my read this pass is now running in one of three modes, depending on the state of
These three modes don't share any code as far as I can tell, so I think it makes more sense to break the new code out into its own pass. |
…ebug_abbrev split code out of DWARFRecordSectionSplitter
|
@swift-ci please test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see you create a new kind of node in the ObjectFormatSchema that is not BlockRef and use that node to create DebugInfo block. Then you can get the stats better since it will be a separate entry in the stats.
| uint64_t AlignmentOffset, Optional<StringRef> Content, | ||
| ArrayRef<Fixup> Fixups, SmallVectorImpl<char> &Data); | ||
| ArrayRef<Fixup> Fixups, SmallVectorImpl<char> &Data, | ||
| bool IsDebugInfoBlock = false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this belong here. The encode function here is simply the helper function how to encode a data block with fixups.
Debug Info Block should be a different block from block ref and the branch to select how to encode debug info block should be above this function:
if (IsDebugInfoBlock)
DebugInfoBlock::encode(...)
else
BlockData::encode(...)
With #4771 clang can now make the abbreviation contributions for each compile be unique, to improve deduplication, we need to add support in llvm-cas-object-format to recognize these unique debug abbreviation contributions and the unique compile units and then split them up to be unique blocks in the cas for deduplication.
There is another issue where the compile units will not deduplicate because the abbrev_offset for the same compile unit (a function defined in a header file, but included at different locations in two .cpp file) will be different. So this patch addresses that by zeroing out the abbrev_offset field in the block and then adding a reference from the debug_info block to its corresponding abbreviation block.
I also added some new options to llvm-cas-object-format to control which blocks to split, this helps Adrian and I determine the kind of improvement we get by splitting different sections.