
Commit 24d4291

[CSSPGO] Pseudo probes for function calls.
An indirect call site needs to be probed for its potential call targets. With CSSPGO, a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks, which take the form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction itself, so a separate instruction would not work.

One possible way of attaching a probe to a call instruction is to use special metadata that carries information about the probe. That metadata would have to make its way through the optimization pipeline down to object emission, which requires additional effort to maintain it in various places. Given that `!dbg` metadata is first-class metadata with all essential support in place, leveraging it as a channel to encode pseudo probe information is probably the easiest solution. With the requirement of not inflating `!dbg` metadata, which is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field, which mainly serves AutoFDO, can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions, and with pseudo probes such support is not required. In this change we use the discriminator field to encode the ID and type of a callsite probe; the encoded value is unpacked and consumed right before object emission.

When a callsite is inlined, the callsite discriminator field goes with the inlined instructions. The `!dbg` metadata of an inlined instruction is in the form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. Except for the top of the stack, all other elements of the stack refer to the nested inlined callsites, whose discriminator fields (each of which represents a callsite probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst.

To avoid collision with baseline AutoFDO in the various places that handle DWARF discriminators where a check against the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell a pseudo probe discriminator apart from a regular discriminator. For a regular discriminator, if all of the lowest 3 bits are set (0x7), the discriminator is basically empty and the higher 29 bits can be reserved for pseudo probe use.

Callsite pseudo probes are inserted in `SampleProfileProbePass`, and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`.

Note that with this work the switch `-debug-info-for-profiling` no longer works with `-pseudo-probe-for-profiling`. They cannot be used at the same time.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91756
1 parent dad5d95 commit 24d4291
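
The commit message describes the inline context as a stack of callsite probes recovered from the `!dbg` scope stack. A minimal sketch of that idea follows. `MockLocation` and `inlineContext` are hypothetical names that stand in for `DILocation` and its inlined-at chain; they are not part of this commit, and only the bit layout is taken from it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a debug location: a discriminator plus
// a link to the location of the callsite this instruction was inlined at.
struct MockLocation {
  uint32_t Discriminator;
  const MockLocation *InlinedAt; // nullptr when the location is not inlined
};

// Same test as the one this commit adds to DILocation.
inline bool isPseudoProbeDiscriminator(uint32_t D) {
  return ((D & 0x7) == 0x7) && (D & 0xFFFFFFF8u) != 0;
}

// Walks every element below the top of the scope stack and collects the probe
// id of each inlined callsite, returned outermost callsite first.
inline std::vector<uint32_t> inlineContext(const MockLocation &Leaf) {
  std::vector<uint32_t> Context;
  for (const MockLocation *L = Leaf.InlinedAt; L; L = L->InlinedAt) {
    if (isPseudoProbeDiscriminator(L->Discriminator))
      Context.push_back((L->Discriminator >> 3) & 0xFFFF); // bits [18:3]
  }
  std::reverse(Context.begin(), Context.end());
  return Context;
}
```

The leaf's own discriminator is deliberately skipped, since only the enclosing callsites describe the context the instruction was inlined into.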


17 files changed, +292 -19 lines changed


Diff for: clang/lib/CodeGen/BackendUtil.cpp

+1
@@ -555,6 +555,7 @@ static bool initTargetOptions(DiagnosticsEngine &Diags,
   Options.ForceDwarfFrameSection = CodeGenOpts.ForceDwarfFrameSection;
   Options.EmitCallSiteInfo = CodeGenOpts.EmitCallSiteInfo;
   Options.EnableAIXExtendedAltivecABI = CodeGenOpts.EnableAIXExtendedAltivecABI;
+  Options.PseudoProbeForProfiling = CodeGenOpts.PseudoProbeForProfiling;
   Options.ValueTrackingVariableLocations =
       CodeGenOpts.ValueTrackingVariableLocations;
   Options.XRayOmitFunctionIndex = CodeGenOpts.XRayOmitFunctionIndex;

Diff for: llvm/include/llvm/CodeGen/CommandFlags.h

+2
@@ -127,6 +127,8 @@ bool getEnableMachineFunctionSplitter();

 bool getEnableDebugEntryValues();

+bool getPseudoProbeForProfiling();
+
 bool getValueTrackingVariableLocations();

 bool getForceDwarfFrameSection();

Diff for: llvm/include/llvm/CodeGen/Passes.h

+3
@@ -475,6 +475,9 @@ namespace llvm {
   /// Create Hardware Loop pass. \see HardwareLoops.cpp
   FunctionPass *createHardwareLoopsPass();

+  /// This pass inserts pseudo probe annotation for callsite profiling.
+  FunctionPass *createPseudoProbeInserter();
+
   /// Create IR Type Promotion pass. \see TypePromotion.cpp
   FunctionPass *createTypePromotionPass();

Diff for: llvm/include/llvm/IR/DebugInfoMetadata.h

+12
@@ -1698,6 +1698,18 @@ class DILocation : public MDNode {

   inline unsigned getDiscriminator() const;

+  // For the regular discriminator, it stands for all empty components if all
+  // the lowest 3 bits are set and all higher 29 bits are unused (zero by
+  // default). Here we fully leverage the higher 29 bits for pseudo probe use.
+  // This is the format:
+  // [2:0] - 0x7
+  // [31:3] - pseudo probe fields guaranteed to be non-zero as a whole
+  // So if the lower 3 bits are all set and the others have at least one
+  // non-zero bit, it is guaranteed to be a pseudo probe discriminator
+  inline static bool isPseudoProbeDiscriminator(unsigned Discriminator) {
+    return ((Discriminator & 0x7) == 0x7) && (Discriminator & 0xFFFFFFF8);
+  }
+
   /// Returns a new DILocation with updated \p Discriminator.
   inline const DILocation *cloneWithDiscriminator(unsigned Discriminator) const;
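
To make the encoding rule concrete, here is a standalone sketch of the same predicate applied to a few sample values. It is a free-function copy written for illustration (marked `constexpr` so the checks run at compile time), not the `DILocation` member itself.

```cpp
#include <cstdint>

// Standalone copy of the check: the low 3 bits must all be set and at least
// one of the upper 29 bits must be non-zero.
constexpr bool isPseudoProbeDiscriminator(uint32_t Discriminator) {
  return ((Discriminator & 0x7) == 0x7) && (Discriminator & 0xFFFFFFF8u) != 0;
}

// 0x0: no discriminator at all.
static_assert(!isPseudoProbeDiscriminator(0x0), "zero is not a probe");
// 0x7: low bits set but no probe payload, so still a regular (empty) value.
static_assert(!isPseudoProbeDiscriminator(0x7), "empty payload is not a probe");
// 0x8: payload bits present, but the 0x7 marker is missing.
static_assert(!isPseudoProbeDiscriminator(0x8), "marker bits are required");
// 0xF: marker 0x7 plus probe id 1 in bits [18:3].
static_assert(isPseudoProbeDiscriminator(0xF), "marker plus payload is a probe");
```

Requiring a non-zero payload is what lets code that has no access to the `-pseudo-probe-for-profiling` switch still distinguish the two kinds of discriminator.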

Diff for: llvm/include/llvm/IR/PseudoProbe.h

+52
@@ -0,0 +1,52 @@
+//===- PseudoProbe.h - Pseudo Probe IR Helpers ------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Pseudo probe IR intrinsic and dwarf discriminator manipulation routines.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_IR_PSEUDOPROBE_H
+#define LLVM_IR_PSEUDOPROBE_H
+
+#include <cassert>
+#include <cstdint>
+
+namespace llvm {
+
+enum class PseudoProbeType { Block = 0, IndirectCall, DirectCall };
+
+struct PseudoProbeDwarfDiscriminator {
+  // The following APIs encodes/decodes per-probe information to/from a
+  // 32-bit integer which is organized as:
+  //  [2:0] - 0x7, this is reserved for regular discriminator,
+  //          see DWARF discriminator encoding rule
+  //  [18:3] - probe id
+  //  [25:19] - reserved
+  //  [28:26] - probe type, see PseudoProbeType
+  //  [31:29] - reserved for probe attributes
+  static uint32_t packProbeData(uint32_t Index, uint32_t Type) {
+    assert(Index <= 0xFFFF && "Probe index too big to encode, exceeding 2^16");
+    assert(Type <= 0x7 && "Probe type too big to encode, exceeding 7");
+    return (Index << 3) | (Type << 26) | 0x7;
+  }
+
+  static uint32_t extractProbeIndex(uint32_t Value) {
+    return (Value >> 3) & 0xFFFF;
+  }
+
+  static uint32_t extractProbeType(uint32_t Value) {
+    return (Value >> 26) & 0x7;
+  }
+
+  static uint32_t extractProbeAttributes(uint32_t Value) {
+    return (Value >> 29) & 0x7;
+  }
+};
+} // end namespace llvm
+
+#endif // LLVM_IR_PSEUDOPROBE_H
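
A worked round trip through this layout may help when reading the rest of the diff. The sketch below is a standalone copy of the helpers, not the header itself. As assumptions made for illustration, the functions are `constexpr`, the range asserts are dropped, the enum is given an explicit `uint32_t` underlying type, and `packProbeData` takes the enum directly.

```cpp
#include <cstdint>

// Assumption: explicit underlying type, added here for the static_cast below.
enum class PseudoProbeType : uint32_t { Block = 0, IndirectCall, DirectCall };

// [2:0] = 0x7 marker, [18:3] = probe id, [28:26] = probe type.
constexpr uint32_t packProbeData(uint32_t Index, PseudoProbeType Type) {
  return (Index << 3) | (static_cast<uint32_t>(Type) << 26) | 0x7;
}
constexpr uint32_t extractProbeIndex(uint32_t V) { return (V >> 3) & 0xFFFF; }
constexpr uint32_t extractProbeType(uint32_t V) { return (V >> 26) & 0x7; }
constexpr uint32_t extractProbeAttributes(uint32_t V) { return (V >> 29) & 0x7; }

// Probe id 5 on a direct call: (5 << 3) = 0x28, (2 << 26) = 0x08000000.
constexpr uint32_t Packed = packProbeData(5, PseudoProbeType::DirectCall);
static_assert(Packed == 0x0800002F, "0x28 | 0x08000000 | 0x7");
static_assert(extractProbeIndex(Packed) == 5, "id survives the round trip");
static_assert(extractProbeType(Packed) == 2, "DirectCall is enumerator 2");
static_assert(extractProbeAttributes(Packed) == 0, "attribute bits unused here");
```

Because the id occupies bits [18:3] and the type bits [28:26], the two fields never overlap, which is why extraction needs only a shift and a mask.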

Diff for: llvm/include/llvm/InitializePasses.h

+1
@@ -361,6 +361,7 @@ void initializeProfileSummaryInfoWrapperPassPass(PassRegistry&);
 void initializePromoteLegacyPassPass(PassRegistry&);
 void initializePruneEHPass(PassRegistry&);
 void initializeRABasicPass(PassRegistry&);
+void initializePseudoProbeInserterPass(PassRegistry &);
 void initializeRAGreedyPass(PassRegistry&);
 void initializeReachingDefAnalysisPass(PassRegistry&);
 void initializeReassociateLegacyPassPass(PassRegistry&);

Diff for: llvm/include/llvm/Passes/PassBuilder.h

+8
@@ -63,6 +63,14 @@ struct PGOOptions {
     // PseudoProbeForProfiling needs to be true.
     assert(this->Action != NoAction || this->CSAction != NoCSAction ||
            this->DebugInfoForProfiling || this->PseudoProbeForProfiling);
+
+    // Pseudo probe emission does not work with -fdebug-info-for-profiling since
+    // they both use the discriminator field of debug lines but for different
+    // purposes.
+    if (this->DebugInfoForProfiling && this->PseudoProbeForProfiling) {
+      report_fatal_error(
+          "Pseudo probes cannot be used with -debug-info-for-profiling", false);
+    }
   }
   std::string ProfileFile;
   std::string CSProfileGenFile;

Diff for: llvm/include/llvm/Target/TargetOptions.h

+5 -2
@@ -138,8 +138,8 @@ namespace llvm {
           EnableMachineFunctionSplitter(false), SupportsDefaultOutlining(false),
           EmitAddrsig(false), EmitCallSiteInfo(false),
           SupportsDebugEntryValues(false), EnableDebugEntryValues(false),
-          ValueTrackingVariableLocations(false), ForceDwarfFrameSection(false),
-          XRayOmitFunctionIndex(false),
+          PseudoProbeForProfiling(false), ValueTrackingVariableLocations(false),
+          ForceDwarfFrameSection(false), XRayOmitFunctionIndex(false),
           FPDenormalMode(DenormalMode::IEEE, DenormalMode::IEEE) {}

     /// DisableFramePointerElim - This returns true if frame pointer elimination
@@ -309,6 +309,9 @@ namespace llvm {
     /// production.
     bool ShouldEmitDebugEntryValues() const;

+    /// Emit pseudo probes into the binary for sample profiling
+    unsigned PseudoProbeForProfiling : 1;
+
     // When set to true, use experimental new debug variable location tracking,
     // which seeks to follow the values of variables rather than their location,
     // post isel.

Diff for: llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h

+7 -1
@@ -17,6 +17,7 @@

 #include "llvm/ADT/DenseMap.h"
 #include "llvm/IR/PassManager.h"
+#include "llvm/IR/PseudoProbe.h"
 #include "llvm/Target/TargetMachine.h"
 #include <unordered_map>

@@ -25,10 +26,10 @@ namespace llvm {
 class Module;

 using BlockIdMap = std::unordered_map<BasicBlock *, uint32_t>;
+using InstructionIdMap = std::unordered_map<Instruction *, uint32_t>;

 enum class PseudoProbeReservedId { Invalid = 0, Last = Invalid };

-enum class PseudoProbeType { Block = 0 };

 /// Sample profile pseudo prober.
 ///
@@ -42,13 +43,18 @@ class SampleProfileProber {
 private:
   Function *getFunction() const { return F; }
   uint32_t getBlockId(const BasicBlock *BB) const;
+  uint32_t getCallsiteId(const Instruction *Call) const;
   void computeProbeIdForBlocks();
+  void computeProbeIdForCallsites();

   Function *F;

   /// Map basic blocks to the their pseudo probe ids.
   BlockIdMap BlockProbeIds;

+  /// Map indirect calls to their pseudo probe ids.
+  InstructionIdMap CallProbeIds;
+
   /// The ID of the last probe, Can be used to number a new probe.
   uint32_t LastProbeId;
 };
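
The body of `computeProbeIdForCallsites` is not part of this excerpt, so the following is only a sketch of how callsite numbering and discriminator packing could fit together. It assumes that callsite ids simply continue from the last probe id and that the probe type depends on whether the callee is known. `MockCall` and `assignCallsiteProbes` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class PseudoProbeType : uint32_t { Block = 0, IndirectCall, DirectCall };

// Hypothetical stand-in for a call instruction and its !dbg discriminator.
struct MockCall {
  bool HasKnownCallee;    // true for a direct call, false for an indirect one
  uint32_t Discriminator; // filled in by assignCallsiteProbes
};

// Numbers each call after LastProbeId (assumption: ids continue from the last
// probe id) and packs id and type into the discriminator layout from
// PseudoProbe.h. Returns the updated last probe id.
inline uint32_t assignCallsiteProbes(std::vector<MockCall> &Calls,
                                     uint32_t LastProbeId) {
  for (MockCall &Call : Calls) {
    uint32_t Id = ++LastProbeId;
    assert(Id <= 0xFFFF && "probe id must fit in 16 bits");
    PseudoProbeType Type = Call.HasKnownCallee ? PseudoProbeType::DirectCall
                                               : PseudoProbeType::IndirectCall;
    Call.Discriminator = (Id << 3) | (static_cast<uint32_t>(Type) << 26) | 0x7;
  }
  return LastProbeId;
}
```

Sharing one id counter means a single 16-bit probe id is enough to identify a probe within a function, with the type field saying which kind it is.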

Diff for: llvm/lib/CodeGen/CMakeLists.txt

+1
@@ -122,6 +122,7 @@ add_llvm_component_library(LLVMCodeGen
   PreISelIntrinsicLowering.cpp
   ProcessImplicitDefs.cpp
   PrologEpilogInserter.cpp
+  PseudoProbeInserter.cpp
   PseudoSourceValue.cpp
   RDFGraph.cpp
   RDFLiveness.cpp

Diff for: llvm/lib/CodeGen/CommandFlags.cpp

+7
@@ -91,6 +91,7 @@ CGOPT(bool, EnableAddrsig)
 CGOPT(bool, EmitCallSiteInfo)
 CGOPT(bool, EnableMachineFunctionSplitter)
 CGOPT(bool, EnableDebugEntryValues)
+CGOPT(bool, PseudoProbeForProfiling)
 CGOPT(bool, ValueTrackingVariableLocations)
 CGOPT(bool, ForceDwarfFrameSection)
 CGOPT(bool, XRayOmitFunctionIndex)
@@ -434,6 +435,11 @@ codegen::RegisterCodeGenFlags::RegisterCodeGenFlags() {
       cl::init(false));
   CGBINDOPT(EnableDebugEntryValues);

+  static cl::opt<bool> PseudoProbeForProfiling(
+      "pseudo-probe-for-profiling", cl::desc("Emit pseudo probes for AutoFDO"),
+      cl::init(false));
+  CGBINDOPT(PseudoProbeForProfiling);
+
   static cl::opt<bool> ValueTrackingVariableLocations(
       "experimental-debug-variable-locations",
       cl::desc("Use experimental new value-tracking variable locations"),
@@ -548,6 +554,7 @@ codegen::InitTargetOptionsFromCodeGenFlags(const Triple &TheTriple) {
   Options.EmitAddrsig = getEnableAddrsig();
   Options.EmitCallSiteInfo = getEmitCallSiteInfo();
   Options.EnableDebugEntryValues = getEnableDebugEntryValues();
+  Options.PseudoProbeForProfiling = getPseudoProbeForProfiling();
   Options.ValueTrackingVariableLocations = getValueTrackingVariableLocations();
   Options.ForceDwarfFrameSection = getForceDwarfFrameSection();
   Options.XRayOmitFunctionIndex = getXRayOmitFunctionIndex();

Diff for: llvm/lib/CodeGen/PseudoProbeInserter.cpp

+95
@@ -0,0 +1,95 @@
+//===- PseudoProbeInserter.cpp - Insert annotation for callsite profiling -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements PseudoProbeInserter pass, which inserts pseudo probe
+// annotations for call instructions with a pseudo-probe-specific dwarf
+// discriminator. such discriminator indicates that the call instruction comes
+// with a pseudo probe, and the discriminator value holds information to
+// identify the corresponding counter.
+//===----------------------------------------------------------------------===//
+
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineInstr.h"
+#include "llvm/CodeGen/TargetInstrInfo.h"
+#include "llvm/IR/DebugInfoMetadata.h"
+#include "llvm/IR/PseudoProbe.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Target/TargetMachine.h"
+#include <unordered_map>
+
+#define DEBUG_TYPE "pseudo-probe-inserter"
+
+using namespace llvm;
+
+namespace {
+class PseudoProbeInserter : public MachineFunctionPass {
+public:
+  static char ID;
+
+  PseudoProbeInserter() : MachineFunctionPass(ID) {
+    initializePseudoProbeInserterPass(*PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override { return "Pseudo Probe Inserter"; }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.setPreservesAll();
+    MachineFunctionPass::getAnalysisUsage(AU);
+  }
+
+  bool runOnMachineFunction(MachineFunction &MF) override {
+    const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();
+    bool Changed = false;
+    for (MachineBasicBlock &MBB : MF) {
+      for (MachineInstr &MI : MBB) {
+        if (MI.isCall()) {
+          if (DILocation *DL = MI.getDebugLoc()) {
+            auto Value = DL->getDiscriminator();
+            if (DILocation::isPseudoProbeDiscriminator(Value)) {
+              BuildMI(MBB, MI, DL, TII->get(TargetOpcode::PSEUDO_PROBE))
+                  .addImm(getFuncGUID(MF.getFunction().getParent(), DL))
+                  .addImm(
+                      PseudoProbeDwarfDiscriminator::extractProbeIndex(Value))
+                  .addImm(
+                      PseudoProbeDwarfDiscriminator::extractProbeType(Value))
+                  .addImm(PseudoProbeDwarfDiscriminator::extractProbeAttributes(
+                      Value));
+              Changed = true;
+            }
+          }
+        }
+      }
+    }
+
+    return Changed;
+  }
+
+private:
+  uint64_t getFuncGUID(Module *M, DILocation *DL) {
+    auto *SP = DL->getScope()->getSubprogram();
+    auto Name = SP->getLinkageName();
+    if (Name.empty())
+      Name = SP->getName();
+    return Function::getGUID(Name);
+  }
+};
+} // namespace
+
+char PseudoProbeInserter::ID = 0;
+INITIALIZE_PASS_BEGIN(PseudoProbeInserter, DEBUG_TYPE,
+                      "Insert pseudo probe annotations for value profiling",
+                      false, false)
+INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
+INITIALIZE_PASS_END(PseudoProbeInserter, DEBUG_TYPE,
+                    "Insert pseudo probe annotations for value profiling",
+                    false, false)
+
+FunctionPass *llvm::createPseudoProbeInserter() {
+  return new PseudoProbeInserter();
+}
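
The filtering that `runOnMachineFunction` performs can be tried outside LLVM with mock types. The sketch below is an illustration only: `MockInstr`, `ProbeRecord` and `collectCallsiteProbes` are hypothetical names, and the function GUID operand is left out because it depends on `Function::getGUID`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a MachineInstr: just the three facts the pass
// looks at.
struct MockInstr {
  bool IsCall;
  bool HasDebugLoc;
  uint32_t Discriminator;
};

// Hypothetical record of the immediates a PSEUDO_PROBE would carry (without
// the function GUID).
struct ProbeRecord {
  uint32_t Index;
  uint32_t Type;
  uint32_t Attributes;
};

inline bool isPseudoProbeDiscriminator(uint32_t D) {
  return ((D & 0x7) == 0x7) && (D & 0xFFFFFFF8u) != 0;
}

// Mirrors the nested conditions of the pass: only calls that have a debug
// location with a probe-encoded discriminator produce a probe.
inline std::vector<ProbeRecord>
collectCallsiteProbes(const std::vector<MockInstr> &Block) {
  std::vector<ProbeRecord> Probes;
  for (const MockInstr &MI : Block) {
    if (!MI.IsCall || !MI.HasDebugLoc)
      continue;
    uint32_t Value = MI.Discriminator;
    if (!isPseudoProbeDiscriminator(Value))
      continue;
    Probes.push_back(
        {(Value >> 3) & 0xFFFF, (Value >> 26) & 0x7, (Value >> 29) & 0x7});
  }
  return Probes;
}
```

A call whose discriminator is a regular AutoFDO value, or that has no debug location, yields no probe.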

Diff for: llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp

+2 -1
@@ -26,6 +26,7 @@
 #include "llvm/CodeGen/TargetSubtargetInfo.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/DebugInfo.h"
+#include "llvm/IR/PseudoProbe.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/MathExtras.h"
@@ -1133,7 +1134,7 @@ EmitSpecialNode(SDNode *Node, bool IsClone, bool IsCloned,
     BuildMI(*MBB, InsertPos, Node->getDebugLoc(), TII->get(TarOp))
         .addImm(Guid)
         .addImm(Index)
-        .addImm(0) // 0 for block probes
+        .addImm((uint8_t)PseudoProbeType::Block)
         .addImm(Attr);
     break;
   }

Diff for: llvm/lib/CodeGen/TargetPassConfig.cpp

+4
@@ -1040,6 +1040,10 @@ void TargetPassConfig::addMachinePasses() {
   // Add passes that directly emit MI after all other MI passes.
   addPreEmitPass2();

+  // Insert pseudo probe annotation for callsite profiling
+  if (TM->Options.PseudoProbeForProfiling)
+    addPass(createPseudoProbeInserter());
+
   AddingMachinePasses = false;
 }

Diff for: llvm/lib/Target/X86/X86TargetMachine.cpp

+1
@@ -83,6 +83,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeX86Target() {
   initializeX86LoadValueInjectionRetHardeningPassPass(PR);
   initializeX86OptimizeLEAPassPass(PR);
   initializeX86PartialReductionPass(PR);
+  initializePseudoProbeInserterPass(PR);
 }

 static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {
