Commit cef9b96

[CSSPGO] Report zero-count probe in profile instead of dangling probes.
Previously, dangling probes were represented by UINT64_MAX in the sample profile, while probes that were never executed were not reported. This was based on the observation that dangling probes made up a smaller portion of all probes than zero-count probes. However, with compiler optimizations, dangling probes end up being a large portion of all probes in general, and reporting them does not make sense from a profile size point of view.

This change flips sample reporting by reporting zero-count probes instead. A dangling probe can then be represented by nothing at all (a missing entry in the profile). This has several benefits:

1. Reduced sample profile size in optimize mode, even when the number of non-executed probes exceeds the number of dangling probes, since UINT64_MAX takes more space to encode than 0.
2. Binary size savings. There is no need to encode dangling probes anymore, since missing probes are treated as dangling by the profile reader.
3. Less compiler work to track dangling probes. However, for probes that are truly dead and removed, we still need the compiler to identify them so that they can be reported as zero-count instead of being mistreated as dangling probes.
4. Improved counts quality by respecting the counts already collected on the non-dangling copy of a probe. A probe, when duplicated, gets two copies at runtime. If one of them is dangling while the other is not, merging the two probes at profile generation time causes the real samples collected on the non-dangling one to be discarded. Not reporting the dangling counterpart keeps the real samples.
5. Better readability.
6. Consistency with the non-CS DWARF line-number-based profile. Zero counts are trusted by the compiler's counts inferencer, while missing counts are inferred by the compiler.

Note that the current patch does not include any work for item 3. There will be follow-up changes.

For item 1, on a large Facebook service the text profile is reduced by 7%. For the extbinary profile, the size of LBRProfileSection is reduced by 35%.

For item 4, general counts quality for SPEC2017 is improved by 10%.

Reviewed By: wenlei, wlei, wmi
Differential Revision: https://reviews.llvm.org/D104129
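To make the size argument in item 1 concrete, here is a minimal sketch that counts the bytes needed for a ULEB128-style variable-length encoding of a count. It assumes the binary profile formats store counts this way, which is my understanding of the writer but is not shown in this commit; `uleb128Size` and `checkEncodingSizes` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of bytes a ULEB128 encoding of Value occupies.
// Each output byte carries 7 payload bits.
constexpr unsigned uleb128Size(uint64_t Value) {
  unsigned Size = 0;
  do {
    Value >>= 7;
    ++Size;
  } while (Value != 0);
  return Size;
}

// Compares the old dangling sentinel against a zero count.
inline void checkEncodingSizes() {
  assert(uleb128Size(0) == 1);           // zero count: a single byte
  assert(uleb128Size(UINT64_MAX) == 10); // old sentinel: ten bytes
}
```

The text format shows the same asymmetry: the old sentinel prints as the 20-digit value 18446744073709551615, while a zero count prints as a single character.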
1 parent 619bfe8 commit cef9b96

11 files changed, 43 insertions(+), 54 deletions(-)

llvm/include/llvm/ProfileData/SampleProf.h

+2 -18
@@ -598,21 +598,9 @@ class FunctionSamples {
   ErrorOr<uint64_t> findSamplesAt(uint32_t LineOffset,
                                   uint32_t Discriminator) const {
     const auto &ret = BodySamples.find(LineLocation(LineOffset, Discriminator));
-    if (ret == BodySamples.end()) {
-      // For CSSPGO, in order to conserve profile size, we no longer write out
-      // locations profile for those not hit during training, so we need to
-      // treat them as zero instead of error here.
-      if (FunctionSamples::ProfileIsCS || FunctionSamples::ProfileIsProbeBased)
-        return 0;
+    if (ret == BodySamples.end())
       return std::error_code();
-    } else {
-      // Return error for an invalid sample count which is usually assigned to
-      // dangling probe.
-      if (FunctionSamples::ProfileIsProbeBased &&
-          ret->second.getSamples() == FunctionSamples::InvalidProbeCount)
-        return std::error_code();
-      return ret->second.getSamples();
-    }
+    return ret->second.getSamples();
   }

   /// Returns the call target map collected at a given location.
@@ -890,10 +878,6 @@ class FunctionSamples {
       const DILocation *DIL,
       SampleProfileReaderItaniumRemapper *Remapper = nullptr) const;

-  // The invalid sample count is used to represent samples collected for a
-  // dangling probe.
-  static constexpr uint64_t InvalidProbeCount = UINT64_MAX;
-
   static bool ProfileIsProbeBased;

   static bool ProfileIsCS;
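The simplified lookup above means a present entry, even one holding zero, is a trusted count, while a missing entry is reported as "unknown" and left for inference. A self-contained sketch of that contract follows; it uses `std::optional` in place of `ErrorOr` and a plain `std::map` in place of `BodySamples`, and `ProbeCounts`, `findCount`, and `checkLookup` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical stand-in for FunctionSamples::BodySamples, keyed by probe index.
using ProbeCounts = std::map<uint32_t, uint64_t>;

// A present entry (including an explicit 0) yields a trusted count.
// A missing entry yields nullopt, meaning the count must be inferred.
std::optional<uint64_t> findCount(const ProbeCounts &Counts, uint32_t ProbeId) {
  auto It = Counts.find(ProbeId);
  if (It == Counts.end())
    return std::nullopt;
  return It->second;
}

inline void checkLookup() {
  ProbeCounts Counts = {{1, 23}, {4, 0}};
  assert(findCount(Counts, 1).value() == 23); // executed probe
  assert(findCount(Counts, 4).value() == 0);  // reported, never executed
  assert(!findCount(Counts, 2).has_value());  // missing: treated as dangling
}
```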

llvm/lib/ProfileData/ProfileSummaryBuilder.cpp

-2
@@ -113,8 +113,6 @@ void SampleProfileSummaryBuilder::addRecord(
   }
   for (const auto &I : FS.getBodySamples()) {
     uint64_t Count = I.second.getSamples();
-    if (!sampleprof::FunctionSamples::ProfileIsProbeBased ||
-        (Count != sampleprof::FunctionSamples::InvalidProbeCount))
     addCount(Count);
   }
   for (const auto &I : FS.getCallsiteSamples())

llvm/lib/ProfileData/SampleProf.cpp

+1 -10
@@ -119,16 +119,7 @@ raw_ostream &llvm::sampleprof::operator<<(raw_ostream &OS,
 sampleprof_error SampleRecord::merge(const SampleRecord &Other,
                                      uint64_t Weight) {
   sampleprof_error Result;
-  // With pseudo probes, merge a dangling sample with a non-dangling sample
-  // should result in a dangling sample.
-  if (FunctionSamples::ProfileIsProbeBased &&
-      (getSamples() == FunctionSamples::InvalidProbeCount ||
-       Other.getSamples() == FunctionSamples::InvalidProbeCount)) {
-    NumSamples = FunctionSamples::InvalidProbeCount;
-    Result = sampleprof_error::success;
-  } else {
-    Result = addSamples(Other.getSamples(), Weight);
-  }
+  Result = addSamples(Other.getSamples(), Weight);
   for (const auto &I : Other.getCallTargets()) {
     MergeResult(Result, addCalledTarget(I.first(), I.second, Weight));
   }
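The merge change is what delivers item 4 of the commit message. The toy model below contrasts the two behaviours under simplifying assumptions: weights and saturating arithmetic are ignored, and `mergeOld`, `mergeNew`, and `checkMerge` are hypothetical names rather than LLVM functions.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Sentinel removed by this commit, reproduced here only to model old behaviour.
constexpr uint64_t InvalidProbeCount = UINT64_MAX;

// Old behaviour: a dangling copy poisons the merged count.
uint64_t mergeOld(uint64_t A, uint64_t B) {
  if (A == InvalidProbeCount || B == InvalidProbeCount)
    return InvalidProbeCount;
  return A + B;
}

// New behaviour: a dangling copy is simply absent (nullopt), so the real
// samples collected on the other copy survive the merge.
std::optional<uint64_t> mergeNew(std::optional<uint64_t> A,
                                 std::optional<uint64_t> B) {
  if (!A)
    return B;
  if (!B)
    return A;
  return *A + *B;
}

inline void checkMerge() {
  assert(mergeOld(15, InvalidProbeCount) == InvalidProbeCount); // 15 is lost
  assert(mergeNew(15, std::nullopt).value() == 15);             // 15 is kept
  assert(!mergeNew(std::nullopt, std::nullopt).has_value());    // still unknown
}
```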

llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-inline.prof

+6
@@ -6,6 +6,9 @@
 1: 23
 2: 382920
 3: 382915
+4: 0
+5: 0
+6: 0
 !CFGChecksum: 138828622701
 [bar]:23:23
 1: 23
@@ -15,4 +18,7 @@
 1: 23
 2: 382920
 3: 382915
+4: 0
+5: 0
+6: 0
 !CFGChecksum: 138828622701

llvm/test/tools/llvm-profgen/fname-canonicalization.test

-2
@@ -22,8 +22,6 @@
 ; CHECK-PROBE-FNAME: !CFGChecksum: 563088904013236
 ; CHECK-PROBE-FNAME:[main:2 @ foo:8 @ _ZL3barii.__uniq.26267048767521081047744692097241227776]:30:15
 ; CHECK-PROBE-FNAME: 1: 15
-; CHECK-PROBE-FNAME: 2: 18446744073709551615
-; CHECK-PROBE-FNAME: 3: 18446744073709551615
 ; CHECK-PROBE-FNAME: 4: 15
 ; CHECK-PROBE-FNAME: !CFGChecksum: 72617220756

llvm/test/tools/llvm-profgen/inline-cs-dangling-pseudoprobe.test

+5 -2
@@ -2,14 +2,17 @@
 ; RUN: FileCheck %s --input-file %t

 ; CHECK: [main:2 @ foo]:58:0
+; CHECK-NEXT: 1: 0
 ; CHECK-NEXT: 2: 15
 ; CHECK-NEXT: 3: 14
+; CHECK-NEXT: 4: 0
 ; CHECK-NEXT: 5: 14
 ; CHECK-NEXT: 6: 15
+; CHECK-NEXT: 7: 0
+; CHECK-NEXT: 9: 0
 ; CHECK-NEXT: !CFGChecksum: 138950591924
 ; CHECK:[main:2 @ foo:8 @ bar]:1:0
-; CHECK-NEXT: 2: 18446744073709551615
-; CHECK-NEXT: 3: 18446744073709551615
+; CHECK-NEXT: 1: 0
 ; CHECK-NEXT: 4: 1
 ; CHECK-NEXT: !CFGChecksum: 72617220756

llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test

+3 -2
@@ -2,17 +2,18 @@
 ; RUN: FileCheck %s --input-file %t

 ; CHECK: [main:2 @ foo]:74:0
+; CHECK-NEXT: 1: 0
 ; CHECK-NEXT: 2: 15
 ; CHECK-NEXT: 3: 15
 ; CHECK-NEXT: 4: 14
 ; CHECK-NEXT: 5: 1
 ; CHECK-NEXT: 6: 15
+; CHECK-NEXT: 7: 0
 ; CHECK-NEXT: 8: 14 bar:14
+; CHECK-NEXT: 9: 0
 ; CHECK-NEXT: !CFGChecksum: 138950591924
 ; CHECK:[main:2 @ foo:8 @ bar]:28:14
 ; CHECK-NEXT: 1: 14
-; CHECK-NEXT: 2: 18446744073709551615
-; CHECK-NEXT: 3: 18446744073709551615
 ; CHECK-NEXT: 4: 14
 ; CHECK-NEXT: !CFGChecksum: 72617220756

llvm/test/tools/llvm-profgen/merge-cold-profile.test

+13 -4
@@ -16,10 +16,10 @@

 ; CHECK: [fa]:14:4
 ; CHECK-NEXT: 1: 4
-; CHECK-NEXT: 2: 18446744073709551615
 ; CHECK-NEXT: 3: 4
 ; CHECK-NEXT: 4: 2
 ; CHECK-NEXT: 5: 1
+; CHECK-NEXT: 6: 0
 ; CHECK-NEXT: 7: 2 fb:2
 ; CHECK-NEXT: 8: 1 fa:1
 ; CHECK-NEXT: !CFGChecksum: 120515930909
@@ -28,6 +28,7 @@
 ; CHECK-NEXT: 1: 4
 ; CHECK-NEXT: 2: 3
 ; CHECK-NEXT: 3: 1
+; CHECK-NEXT: 4: 0
 ; CHECK-NEXT: 5: 4 fb:4
 ; CHECK-NEXT: 6: 1 fa:1
 ; CHECK-NEXT: !CFGChecksum: 72617220756
@@ -36,16 +37,17 @@
 ; CHECK-KEEP-COLD-NEXT: 1: 6
 ; CHECK-KEEP-COLD-NEXT: 2: 3
 ; CHECK-KEEP-COLD-NEXT: 3: 3
+; CHECK-KEEP-COLD-NEXT: 4: 0
 ; CHECK-KEEP-COLD-NEXT: 5: 4 fb:4
 ; CHECK-KEEP-COLD-NEXT: 6: 3 fa:3
 ; CHECK-KEEP-COLD-NEXT: !CFGChecksum: 72617220756
 ; CHECK-KEEP-COLD-NEXT: !Attributes: 0
 ; CHECK-KEEP-COLD-NEXT:[fa]:14:4
 ; CHECK-KEEP-COLD-NEXT: 1: 4
-; CHECK-KEEP-COLD-NEXT: 2: 18446744073709551615
 ; CHECK-KEEP-COLD-NEXT: 3: 4
 ; CHECK-KEEP-COLD-NEXT: 4: 2
 ; CHECK-KEEP-COLD-NEXT: 5: 1
+; CHECK-KEEP-COLD-NEXT: 6: 0
 ; CHECK-KEEP-COLD-NEXT: 7: 2 fb:2
 ; CHECK-KEEP-COLD-NEXT: 8: 1 fa:1
 ; CHECK-KEEP-COLD-NEXT: !CFGChecksum: 120515930909
@@ -54,6 +56,7 @@
 ; CHECK-UNMERGED-NEXT: 1: 4
 ; CHECK-UNMERGED-NEXT: 2: 3
 ; CHECK-UNMERGED-NEXT: 3: 1
+; CHECK-UNMERGED-NEXT: 4: 0
 ; CHECK-UNMERGED-NEXT: 5: 4 fb:4
 ; CHECK-UNMERGED-NEXT: 6: 1 fa:1
 ; CHECK-UNMERGED-NEXT: !CFGChecksum: 72617220756
@@ -64,32 +67,38 @@
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 1: 4
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 2: 3
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 3: 1
+; CHECK-COLD-CONTEXT-LENGTH-NEXT: 4: 0
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 5: 4 fb:4
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 6: 1 fa:1
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: !CFGChecksum: 72617220756
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: !Attributes: 0
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT:[fb:6 @ fa]:10:3
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 1: 3
-; CHECK-COLD-CONTEXT-LENGTH-NEXT: 2: 18446744073709551615
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 3: 3
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 4: 1
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 5: 1
+; CHECK-COLD-CONTEXT-LENGTH-NEXT: 6: 0
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 7: 1 fb:1
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 8: 1 fa:1
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: !CFGChecksum: 120515930909
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: !Attributes: 0
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT:[fa:7 @ fb]:6:2
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 1: 2
+; CHECK-COLD-CONTEXT-LENGTH-NEXT: 2: 0
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 3: 2
+; CHECK-COLD-CONTEXT-LENGTH-NEXT: 4: 0
+; CHECK-COLD-CONTEXT-LENGTH-NEXT: 5: 0
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 6: 2 fa:2
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: !CFGChecksum: 72617220756
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: !Attributes: 0
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT:[fa:8 @ fa]:4:1
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 1: 1
-; CHECK-COLD-CONTEXT-LENGTH-NEXT: 2: 18446744073709551615
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 3: 1
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 4: 1
+; CHECK-COLD-CONTEXT-LENGTH-NEXT: 5: 0
+; CHECK-COLD-CONTEXT-LENGTH-NEXT: 6: 0
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: 7: 1 fb:1
+; CHECK-COLD-CONTEXT-LENGTH-NEXT: 8: 0
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: !CFGChecksum: 120515930909
 ; CHECK-COLD-CONTEXT-LENGTH-NEXT: !Attributes: 0

llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test

+4 -2
@@ -2,16 +2,18 @@
 ; RUN: FileCheck %s --input-file %t

 ; CHECK: [main:2 @ foo]:75:0
+; CHECK-NEXT: 1: 0
 ; CHECK-NEXT: 2: 15
 ; CHECK-NEXT: 3: 15
 ; CHECK-NEXT: 4: 15
+; CHECK-NEXT: 5: 0
 ; CHECK-NEXT: 6: 15
+; CHECK-NEXT: 7: 0
 ; CHECK-NEXT: 8: 15 bar:15
+; CHECK-NEXT: 9: 0
 ; CHECK-NEXT: !CFGChecksum: 138950591924
 ; CHECK:[main:2 @ foo:8 @ bar]:30:15
 ; CHECK-NEXT: 1: 15
-; CHECK-NEXT: 2: 18446744073709551615
-; CHECK-NEXT: 3: 18446744073709551615
 ; CHECK-NEXT: 4: 15
 ; CHECK-NEXT: !CFGChecksum: 72617220756

llvm/test/tools/llvm-profgen/truncated-pseudoprobe.test

+4 -2
@@ -2,17 +2,19 @@
 ; RUN: FileCheck %s --input-file %t

 ; CHECK: [foo]:75:0
+; CHECK-NEXT: 1: 0
 ; CHECK-NEXT: 2: 15
 ; CHECK-NEXT: 3: 15
 ; CHECK-NEXT: 4: 15
+; CHECK-NEXT: 5: 0
 ; CHECK-NEXT: 6: 15
+; CHECK-NEXT: 7: 0
 ; CHECK-NEXT: 8: 15 bar:15
+; CHECK-NEXT: 9: 0
 ; CHECK-NEXT: !CFGChecksum: 563088904013236
 ; CHECK-NEXT: !Attributes: 0
 ; CHECK: [foo:8 @ bar]:30:15
 ; CHECK-NEXT: 1: 15
-; CHECK-NEXT: 2: 18446744073709551615
-; CHECK-NEXT: 3: 18446744073709551615
 ; CHECK-NEXT: 4: 15
 ; CHECK-NEXT: !CFGChecksum: 72617220756
 ; CHECK-NEXT: !Attributes: 1

llvm/tools/llvm-profgen/ProfileGenerator.cpp

+5 -10
@@ -555,19 +555,14 @@ void PseudoProbeCSProfileGenerator::populateBodySamplesWithProbes(
     }
   }

-  // Report dangling probes for frames that have real samples collected.
-  // Dangling probes are the probes associated to an empty block. With this
-  // place holder, sample count on a dangling probe will not be trusted by the
-  // compiler and we will rely on the counts inference algorithm to get the
-  // probe a reasonable count. Use InvalidProbeCount to mark sample count for
-  // a dangling probe.
+  // Assign zero count for remaining probes without sample hits to
+  // differentiate from probes optimized away, of which the counts are unknown
+  // and will be inferred by the compiler.
   for (auto &I : FrameSamples) {
     auto *FunctionProfile = I.second;
     for (auto *Probe : I.first->getProbes()) {
-      if (Probe->isDangling()) {
-        FunctionProfile->addBodySamplesForProbe(
-            Probe->Index, FunctionSamples::InvalidProbeCount);
-      }
+      if (!Probe->isDangling())
+        FunctionProfile->addBodySamplesForProbe(Probe->Index, 0);
     }
   }
 }
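The new loop adds a zero to every non-dangling probe, which creates an explicit `N: 0` entry for probes that collected no samples and leaves probes with real hits unchanged. A simplified, self-contained model of that step is sketched below; `Probe`, `BodySamples`, `fillZeroCounts`, and `checkFill` are hypothetical stand-ins, and the real `addBodySamplesForProbe` also handles weights and overflow, which this sketch omits.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, reduced view of a pseudo probe.
struct Probe {
  uint32_t Index;
  bool Dangling;
};

// Hypothetical stand-in for a function profile's body samples.
using BodySamples = std::map<uint32_t, uint64_t>;

// Give every non-dangling probe an entry. Adding 0 creates a zero-count entry
// when none exists and is a no-op for probes that already have samples.
// Dangling probes get no entry, so the reader sees them as missing.
void fillZeroCounts(const std::vector<Probe> &Probes, BodySamples &Samples) {
  for (const Probe &P : Probes)
    if (!P.Dangling)
      Samples[P.Index] += 0;
}

inline void checkFill() {
  BodySamples Samples = {{2, 15}};
  std::vector<Probe> Probes = {{1, false}, {2, false}, {3, true}};
  fillZeroCounts(Probes, Samples);
  assert(Samples.at(1) == 0);       // never hit: reported as zero
  assert(Samples.at(2) == 15);      // real samples preserved
  assert(Samples.count(3) == 0);    // dangling: omitted from the profile
}
```

This matches the updated test expectations, where the `18446744073709551615` lines disappear and `N: 0` lines appear for probes that were never executed.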
