Split up debug_abbrev and debug_info sections into multiple blocks #4778

rastogishubham · 2022-05-31T22:59:41Z

With #4771, clang can now make the abbreviation contribution for each compile unit unique. To take advantage of this for deduplication, we need to add support in llvm-cas-object-format to recognize these unique debug abbreviation contributions and the unique compile units, and then split them into separate blocks in the CAS.

There is another issue: the compile units will not deduplicate, because the abbrev_offset for the same compile unit (for example, a function defined in a header file that is included from two different .cpp files) will be different. This patch addresses that by zeroing out the abbrev_offset field in the block and then adding a reference from the debug_info block to its corresponding abbreviation block.
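A minimal sketch of the zeroing step, assuming little-endian input and standard DWARF 2 to 5 compile-unit headers (in DWARF 5, debug_abbrev_offset follows unit_type and address_size; in earlier versions it directly follows version). The names findAbbrevOffset and zeroAbbrevOffset are hypothetical and are not the patch's code; the original offset returned here is what a caller would turn into a reference to the matching abbreviation block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Position and width of debug_abbrev_offset inside a compile-unit header.
struct AbbrevOffsetField {
  std::size_t Pos;  // byte offset of the field from the start of the unit
  std::size_t Size; // 4 for 32-bit DWARF, 8 for 64-bit DWARF
};

// Reads a little-endian unsigned integer of Size bytes starting at Pos.
static uint64_t readLE(const std::vector<uint8_t> &B, std::size_t Pos,
                       std::size_t Size) {
  assert(Pos + Size <= B.size() && Size <= 8);
  uint64_t V = 0;
  for (std::size_t I = 0; I < Size; ++I)
    V |= uint64_t(B[Pos + I]) << (8 * I);
  return V;
}

// Hypothetical helper: locates debug_abbrev_offset in a DWARF 2-5 CU header.
std::optional<AbbrevOffsetField>
findAbbrevOffset(const std::vector<uint8_t> &CU) {
  if (CU.size() < 4)
    return std::nullopt;
  std::size_t Pos = 4, Width = 4;
  if (readLE(CU, 0, 4) == 0xffffffffu) { // 64-bit DWARF: escape + 8-byte length
    Pos = 12;
    Width = 8;
  }
  if (CU.size() < Pos + 2)
    return std::nullopt;
  auto Version = static_cast<uint16_t>(readLE(CU, Pos, 2));
  Pos += 2;
  if (Version >= 5)
    Pos += 2; // DWARF 5 puts unit_type and address_size before the offset
  if (CU.size() < Pos + Width)
    return std::nullopt;
  return AbbrevOffsetField{Pos, Width};
}

// Hypothetical helper: zeroes the field in place and returns its old value,
// which the caller would record as a reference to the abbreviation block.
std::optional<uint64_t> zeroAbbrevOffset(std::vector<uint8_t> &CU) {
  std::optional<AbbrevOffsetField> F = findAbbrevOffset(CU);
  if (!F)
    return std::nullopt;
  uint64_t Old = readLE(CU, F->Pos, F->Size);
  for (std::size_t I = 0; I < F->Size; ++I)
    CU[F->Pos + I] = 0;
  return Old;
}
```

After zeroing, two copies of the same header-defined function's compile unit differ only if their remaining bytes differ, so the abbreviation link has to be carried by the reference rather than by the embedded offset.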

I also added some new options to llvm-cas-object-format to control which blocks to split; this helps Adrian and me determine how much improvement we get from splitting each kind of section.

rastogishubham · 2022-06-07T20:46:33Z

@swift-ci please test

adrian-prantl

I think this looks pretty reasonable.

llvm/include/llvm/ExecutionEngine/JITLink/DWARFRecordSectionSplitter.h

adrian-prantl · 2022-06-13T21:50:22Z

llvm/lib/ExecutionEngine/JITLink/DWARFRecordSectionSplitter.cpp

Why is this copy operation necessary?

lhames · 2022-06-14T00:27:35Z

Splitting will introduce new blocks, which would invalidate iterators in the range-based for below.

It looks like this has removed the pre-built caching, though? I don't think we want to do that (or at least it couldn't be upstreamed): it would hurt performance-sensitive operations like splitting eh-frames.

llvm/lib/ExecutionEngine/JITLink/DWARFRecordSectionSplitter.cpp

lhames · 2022-06-14T00:48:41Z

Can you provide some more details on DWARF abbreviations, and how this patch aims to use/support them?

Why does this functionality need to be merged with the record splitter, rather than being run on the graph as a separate pass before or after splitting?
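A sketch of why the copy is needed, using hypothetical Block and Section types rather than JITLink's: splitting appends new blocks to the container that a range-based for would be walking, which can invalidate its iterators. Taking a snapshot of the pre-existing block pointers first makes the loop safe. Blocks are heap-allocated here, so the snapshot pointers stay valid when the container grows.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a block: only its size matters for this sketch.
struct Block {
  std::size_t Size;
};

// Hypothetical stand-in for a section that owns its blocks.
struct Section {
  std::vector<std::unique_ptr<Block>> Blocks;

  // Splits B into PieceSize chunks: B keeps the remainder and every other
  // chunk is appended to Blocks, which may reallocate and invalidate iterators.
  void splitBlock(Block &B, std::size_t PieceSize) {
    assert(PieceSize > 0 && "zero-sized pieces would never terminate");
    while (B.Size > PieceSize) {
      B.Size -= PieceSize;
      Blocks.push_back(std::make_unique<Block>(Block{PieceSize}));
    }
  }
};

// Iterating Sec.Blocks directly while calling splitBlock would be undefined
// behavior, so iterate over a snapshot of the blocks that existed beforehand.
void splitAll(Section &Sec, std::size_t PieceSize) {
  std::vector<Block *> Snapshot;
  Snapshot.reserve(Sec.Blocks.size());
  for (auto &B : Sec.Blocks)
    Snapshot.push_back(B.get());
  for (Block *B : Snapshot) // splitBlock never touches Snapshot
    Sec.splitBlock(*B, PieceSize);
}
```

The snapshot costs one pointer per block; lhames's separate concern about pre-built caching is not addressed by this sketch.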

rastogishubham · 2022-06-14T17:04:49Z

@lhames Hi Lang, the debug_abbrev section contains the abbreviations for every compile unit. The DWARF 5 standard, describing the analogous abbreviations table of the .debug_names name index, puts it this way:

> The abbreviations table describes the formats of the entries in the entry pool. Like the DWARF abbreviations table in the .debug_abbrev section, it defines one or more abbreviation codes. Each abbreviation code provides a DWARF tag value followed by a list of pairs that defines an attribute and form code used by entries with that abbreviation code.

The abbreviations essentially describe what a specific compile unit's raw bytes mean, as I understand it. Maybe @adrian-prantl can also pitch in here.
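To make "what the raw bytes mean" concrete: each abbreviation declaration in .debug_abbrev is a ULEB128 code, a ULEB128 tag, a DW_CHILDREN byte, and attribute/form pairs terminated by (0, 0); a unit's table ends with code 0. The parser below is a hypothetical sketch of that layout, not LLVM's DWARF parser.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// One abbreviation declaration from .debug_abbrev.
struct AbbrevDecl {
  uint64_t Code = 0;
  uint64_t Tag = 0;         // DW_TAG_* value
  bool HasChildren = false; // DW_CHILDREN_yes (1) or DW_CHILDREN_no (0)
  std::vector<std::pair<uint64_t, uint64_t>> Specs; // (DW_AT_*, DW_FORM_*)
};

// Decodes an unsigned LEB128 value at Pos and advances Pos past it.
static uint64_t readULEB128(const std::vector<uint8_t> &B, std::size_t &Pos) {
  uint64_t Result = 0;
  unsigned Shift = 0;
  while (true) {
    if (Pos >= B.size())
      throw std::runtime_error("truncated LEB128");
    uint8_t Byte = B[Pos++];
    if (Shift < 64)
      Result |= uint64_t(Byte & 0x7f) << Shift;
    if ((Byte & 0x80) == 0)
      return Result;
    Shift += 7;
  }
}

// Parses one unit's abbreviation table starting at Pos; a code of 0 ends it.
std::vector<AbbrevDecl> parseAbbrevTable(const std::vector<uint8_t> &Sec,
                                         std::size_t Pos) {
  assert(Pos <= Sec.size());
  std::vector<AbbrevDecl> Decls;
  while (true) {
    AbbrevDecl D;
    D.Code = readULEB128(Sec, Pos);
    if (D.Code == 0)
      return Decls;
    D.Tag = readULEB128(Sec, Pos);
    if (Pos >= Sec.size())
      throw std::runtime_error("truncated abbreviation declaration");
    D.HasChildren = Sec[Pos++] != 0;
    while (true) {
      uint64_t Attr = readULEB128(Sec, Pos);
      uint64_t Form = readULEB128(Sec, Pos);
      if (Attr == 0 && Form == 0)
        break;
      if (Form == 0x21) // DW_FORM_implicit_const: skip its inline SLEB128 value
        (void)readULEB128(Sec, Pos);
      D.Specs.emplace_back(Attr, Form);
    }
    Decls.push_back(std::move(D));
  }
}
```

A debugging information entry in .debug_info starts with one of these codes, and the matching declaration says which attributes follow and how each is encoded.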

To split up debug info effectively in the CAS, we decided to give every function its own compile unit in the debug_info section. To see true deduplication, however, we would need to split up the abbreviation section so that every compile unit has its own, unique contribution to the debug_abbrev section.

If we have a function such as StringRef::copy(), every translation unit that includes StringRef.h will have a copy of that function's definition, but we want the CAS to contain only one block for it (since every function has its own compile unit). Therefore, its abbreviation contribution also has to be split out as a unique block in the CAS so that it is deduplicated as well.
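A toy illustration of the deduplication argument, assuming a store that keys blocks by their full contents (a real CAS would key on a hash of the contents): identical abbreviation contributions collapse to one stored block, while compile units that differ only in an embedded abbrev_offset do not. ToyCAS is a hypothetical name, not part of llvm-cas-object-format.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy content-addressed store: identical bytes get the same ID and are
// stored once. Keying on the full content stands in for a content hash.
class ToyCAS {
  std::map<std::string, std::size_t> IDs;

public:
  // Returns the ID for Content, storing it only if it has not been seen.
  std::size_t store(const std::string &Content) {
    return IDs.try_emplace(Content, IDs.size()).first->second;
  }

  std::size_t numUniqueBlocks() const { return IDs.size(); }
};
```

Storing the same abbreviation contribution from two translation units yields one block; two otherwise identical compile units whose bytes still embed different abbrev offsets yield two.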

You are right, however: the DWARFRecordSectionSplitter doesn't need to contain the code to split the debug_abbrev section. I put it there because that is also where we split other DWARF sections, so it seemed like the logical choice.

If you think it is a better design not to have it there, I can certainly move it. Thanks!

rastogishubham · 2022-07-14T23:00:07Z

@lhames @cachemeifyoucan @benlangmuir @akyrtzi This patch has been in the works for a long time and has changed significantly. Splitting the blocks in the JITLink graph is not enough: for a function defined in a header file, the abbreviation offset in its compile unit can differ between object files, which prevents the compile unit from deduplicating. I fixed that by zeroing out the abbrev_offset field in the block and then adding a reference from the debug_info block to its corresponding abbreviation block.

Could you all please take a look and review the patch? Thank you!

rastogishubham · 2022-07-14T23:03:49Z

@swift-ci please test

rastogishubham · 2022-07-19T19:48:19Z

@swift-ci please test

lhames · 2022-07-19T20:55:55Z

> You are right however, the DWARFRecordSectionSplitter doesn't need to have the code to split the debug_abbrev section, I just put it there, because that is also where we split other DWARF sections, so it just seemed like the logical choice to me.
>
> If you think, it is a better design to not have it there, I can certainly move it. Thanks!

From my read, this pass is now running in one of three modes, depending on the state of AbbrevOffsets:

- null pointer: regular DWARF record section split.
- pointer to an empty vector: split the debug info section and record the abbrev offsets.
- pointer to a non-empty vector: split the abbrev section.

These three modes don't share any code as far as I can tell, so I think it makes more sense to break the new code out into its own pass.
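One way the new code could look as two independent passes, sketched under stated assumptions: 32-bit little-endian DWARF 2 to 5, hypothetical function names, and byte ranges instead of JITLink blocks. The first pass walks .debug_info by unit_length and records each unit's abbreviation offset; the second cuts .debug_abbrev at those offsets, so neither needs a mode flag.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct Range {
  std::size_t Begin, End; // half-open byte range [Begin, End)
};

static uint64_t readLE(const std::vector<uint8_t> &B, std::size_t Pos,
                       std::size_t Size) {
  assert(Pos + Size <= B.size() && Size <= 8);
  uint64_t V = 0;
  for (std::size_t I = 0; I < Size; ++I)
    V |= uint64_t(B[Pos + I]) << (8 * I);
  return V;
}

// Pass 1 (hypothetical): one range per unit in .debug_info, plus each unit's
// debug_abbrev_offset. Handles 32-bit, little-endian DWARF 2-5 only.
std::vector<Range> splitDebugInfo(const std::vector<uint8_t> &Info,
                                  std::vector<uint64_t> &AbbrevOffsets) {
  std::vector<Range> Units;
  std::size_t Pos = 0;
  while (Pos + 4 <= Info.size()) {
    uint64_t Len = readLE(Info, Pos, 4);
    if (Len >= 0xfffffff0u)
      throw std::runtime_error("64-bit DWARF is not handled in this sketch");
    std::size_t End = Pos + 4 + static_cast<std::size_t>(Len);
    if (Len < 2 || End > Info.size())
      throw std::runtime_error("bad unit_length");
    auto Version = static_cast<uint16_t>(readLE(Info, Pos + 4, 2));
    std::size_t OffPos = Pos + 6 + (Version >= 5 ? 2 : 0);
    if (OffPos + 4 > End)
      throw std::runtime_error("truncated unit header");
    AbbrevOffsets.push_back(readLE(Info, OffPos, 4));
    Units.push_back({Pos, End});
    Pos = End;
  }
  return Units;
}

// Pass 2 (hypothetical): cut .debug_abbrev at every recorded offset, so each
// unit's contribution becomes its own piece. Assumes offset 0 is among them.
std::vector<Range> splitDebugAbbrev(std::size_t AbbrevSize,
                                    std::vector<uint64_t> Offsets) {
  std::sort(Offsets.begin(), Offsets.end());
  Offsets.erase(std::unique(Offsets.begin(), Offsets.end()), Offsets.end());
  std::vector<Range> Pieces;
  for (std::size_t I = 0; I < Offsets.size(); ++I) {
    auto Begin = static_cast<std::size_t>(Offsets[I]);
    std::size_t End = I + 1 < Offsets.size()
                          ? static_cast<std::size_t>(Offsets[I + 1])
                          : AbbrevSize;
    if (Begin < End)
      Pieces.push_back({Begin, End});
  }
  return Pieces;
}
```

The only data passed between the passes is the offset list, which is what makes running them separately (before or after the record splitter) straightforward.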

rastogishubham · 2022-07-20T07:37:54Z

@swift-ci please test

cachemeifyoucan

I would like to see you create a new kind of node in the ObjectFormatSchema that is not BlockRef, and use that node to create DebugInfo blocks. That would also give you better stats, since debug info would be a separate entry in the stats.

cachemeifyoucan · 2022-07-20T17:09:11Z

llvm/include/llvm/CASObjectFormats/Data.h

                     uint64_t AlignmentOffset, Optional<StringRef> Content,
-                     ArrayRef<Fixup> Fixups, SmallVectorImpl<char> &Data);
+                     ArrayRef<Fixup> Fixups, SmallVectorImpl<char> &Data,
+                     bool IsDebugInfoBlock = false);


I don't think this belongs here. The encode function here is simply a helper for encoding a data block with fixups.

A debug info block should be a different kind of block from BlockRef, and the branch that selects how to encode a debug info block should sit above this function:

if (IsDebugInfoBlock) DebugInfoBlock::encode(...) else BlockData::encode(...)
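A sketch of where that branch could live, with hypothetical NodeKind, Stats, BlockData, and DebugInfoBlock types (the "B:" and "D:" prefixes are placeholders for real encodings, not the schema's format): BlockData::encode keeps no debug-info flag, and counting by node kind gives debug info its own entry in the stats, as suggested above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical node kinds; the real ObjectFormatSchema types differ.
enum class NodeKind { Block, DebugInfoBlock };

// One counter per node kind, so debug info shows up as its own stats entry.
struct Stats {
  std::map<NodeKind, std::size_t> Count;
};

struct BlockData {
  // Placeholder for the generic "data block with fixups" encoding.
  static std::string encode(const std::string &Content) { return "B:" + Content; }
};

struct DebugInfoBlock {
  // Placeholder for a debug-info-specific encoding.
  static std::string encode(const std::string &Content) { return "D:" + Content; }
};

// The kind selection happens here, above both encoders.
std::string encodeNode(const std::string &Content, bool IsDebugInfoBlock,
                       Stats &S) {
  NodeKind K = IsDebugInfoBlock ? NodeKind::DebugInfoBlock : NodeKind::Block;
  ++S.Count[K];
  return IsDebugInfoBlock ? DebugInfoBlock::encode(Content)
                          : BlockData::encode(Content);
}
```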

rastogishubham force-pushed the DebugAbbrevSplitJITLink branch 2 times, most recently from a03fe80 to 867bd94 June 7, 2022 20:45

rastogishubham requested review from adrian-prantl, akyrtzi, benlangmuir and cachemeifyoucan June 7, 2022 20:46

rastogishubham requested a review from lhames June 13, 2022 18:41

adrian-prantl reviewed Jun 13, 2022

lhames self-assigned this Jun 14, 2022

lhames requested changes Jun 14, 2022

rastogishubham force-pushed the DebugAbbrevSplitJITLink branch from 867bd94 to 6c291ca July 14, 2022 22:54

rastogishubham changed the title ~~Use DWARFRecordSectionSplitter to split up Debug Abbrev and Debug Info Sections into multiple blocks~~ Split up debug_abbrev and debug_info sections into multiple blocks Jul 14, 2022

rastogishubham requested review from adrian-prantl and lhames July 14, 2022 23:00

rastogishubham mentioned this pull request Jul 15, 2022

Change the --print-cas-tree output to print the Block CAS ID instead of the BlockData CAS ID for the NestedV1 ingestion schema #4972

Merged

rastogishubham added 3 commits July 19, 2022 10:15

Split up debug info and debug abbrev section in the flatv1 schema, and add options to control which debug information blocks to split

b4b1103

Nestedv1 also splits debug_abbrev and debug_info blocks properly

70a75ef

Fix merge conflict issues

319abbc

rastogishubham force-pushed the DebugAbbrevSplitJITLink branch from 6c291ca to 319abbc July 19, 2022 19:47

Address changes requested in PR review by moving the debug_info and debug_abbrev split code out of DWARFRecordSectionSplitter

7d23851

cachemeifyoucan requested changes Jul 20, 2022

rastogishubham closed this Sep 15, 2022
