
Commit d05be9f

committed (no message)
1 parent aa6172b commit d05be9f

4 files changed: +155 −14 lines changed


METRICS.md

+7-1
@@ -2,15 +2,21 @@
 ### Recall@K (R@K)
 The earliest and most widely accepted metric in scene graph generation, first adopted by [Visual relationship detection with language priors](https://arxiv.org/abs/1608.00187). Since the ground-truth relationship annotations are incomplete, simple accuracy is an improper metric. Lu et al. therefore recast evaluation as a retrieval problem: relationships must not only be classified correctly, but also scored as high as possible, so that they can be retrieved from among the many 'none' relationship pairs.
 
-### No Graph Constraint Recall@K (ngR@K)
+### No Graph Constraint Recall@K (ng-R@K)
 It was first used by [Pixel2Graph](https://arxiv.org/abs/1706.07365) and named by [Neural-MOTIFS](https://arxiv.org/abs/1711.06640). The former paper significantly improves R@K by allowing each pair to have multiple predicates: for each subject-object pair, all 50 predicates take part in the recall ranking, not just the one with the highest score. Since predicates are not mutually exclusive, 'on' and 'riding' can both be correct, and this setting significantly improves R@K. To compare fairly with other methods, [Neural-MOTIFS](https://arxiv.org/abs/1711.06640) named it No Graph Constraint Recall@K (ng-R@K).
 
 ### Mean Recall@K (mR@K)
 It was proposed by our work [VCTree](https://arxiv.org/abs/1812.01880) and Chen et al.'s [KERN](https://arxiv.org/abs/1903.03326) at the same time (CVPR 2019), although we didn't make it our main contribution and only listed the full results in the [supplementary material](https://zpascal.net/cvpr2019/Tang_Learning_to_Compose_CVPR_2019_supplemental.pdf). We also acknowledge the contribution of [KERN](https://arxiv.org/abs/1903.03326), which reported mR@K results for more of the previous methods. The main motivation of Mean Recall@K (mR@K) is that the VisualGenome dataset is biased towards dominant predicates: if the 10 most frequent predicates are correctly classified, accuracy reaches 90% even if the remaining 40 predicate categories are all wrong, which is definitely not what we want. Therefore, Mean Recall@K (mR@K) calculates Recall@K for each predicate category independently and then reports their mean.
 
+### No Graph Constraint Mean Recall@K (ng-mR@K)
+The same Mean Recall metric, but for each pair of objects, all possible predicates are valid candidates (the original Mean Recall@K only considers the predicate with the maximum score of each pair as the valid candidate when calculating Recall).
+
 ### Zero Shot Recall@K (zR@K)
 It was first used by [Visual relationship detection with language priors](https://arxiv.org/abs/1608.00187) for the VRD dataset, and first reported by [Unbiased Scene Graph Generation from Biased Training](https://arxiv.org/abs/2002.11949) for the VisualGenome dataset. In short, it only calculates Recall@K for those subject-predicate-object combinations that do not occur in the training set.
 
+### No Graph Constraint Zero Shot Recall@K (ng-zR@K)
+The same zero-shot Recall metric, but for each pair of objects, all possible predicates are valid candidates (the original zero-shot Recall@K only considers the predicate with the maximum score of each pair as the valid candidate when calculating Recall).
+
 ### Top@K Accuracy (A@K)
 It is actually caused by a misunderstanding of the PredCls and SGCls protocols. [Contrastive Losses](https://arxiv.org/abs/1903.02728) reported Recall@K for PredCls and SGCls by giving not only the ground-truth bounding boxes but also the ground-truth subject-object pairs, so no ranking is involved. The results can only be considered Top@K Accuracy (A@K) for the given K ground-truth subject-object pairs.
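The graph-constraint vs. no-graph-constraint distinction above boils down to how the candidate list for the recall ranking is built. Below is a minimal, self-contained sketch of that difference; the array names and sizes are illustrative, not code from this repository.

# Toy sketch (not code from this repo): how R@K and ng-R@K build their
# candidate lists before the top-K triplets are matched against ground truth.
import numpy as np

num_pairs, num_predicates = 4, 50
rng = np.random.default_rng(0)
pair_scores = rng.random((num_pairs, num_predicates))  # hypothetical predicate scores

# Graph constraint: each subject-object pair contributes only its best predicate.
gc_candidates = [(p, int(pair_scores[p].argmax()), float(pair_scores[p].max()))
                 for p in range(num_pairs)]

# No graph constraint: every (pair, predicate) combination joins the ranking,
# so non-exclusive predicates such as 'on' and 'riding' can both be retrieved.
ng_candidates = [(p, q, float(pair_scores[p, q]))
                 for p in range(num_pairs) for q in range(num_predicates)]

gc_top20 = sorted(gc_candidates, key=lambda t: -t[2])[:20]
ng_top20 = sorted(ng_candidates, key=lambda t: -t[2])[:20]  # 50x larger pool, hence higher recall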

README.md

+5-6
@@ -6,13 +6,17 @@
 
 Our paper [Unbiased Scene Graph Generation from Biased Training](https://arxiv.org/abs/2002.11949) has been accepted by CVPR 2020 (Oral).
 
+## Recent Updates
+
+- [x] 2020.06.23 [No Graph Constraint Mean Recall@K (ng-mR@K) and No Graph Constraint Zero-Shot Recall@K (ng-zR@K)](METRICS.md#explanation-of-our-metrics)
+
 ## Contents
 
 1. [Overview](#Overview)
 2. [Install the Requirements](INSTALL.md)
 3. [Prepare the Dataset](DATASET.md)
 4. [Metrics and Results for our Toolkit](METRICS.md)
-    - [Explanation of R@K, ngR@K, mR@K, zR@K, A@K, S2G](METRICS.md#explanation-of-our-metrics)
+    - [Explanation of R@K, mR@K, zR@K, ng-R@K, ng-mR@K, ng-zR@K, A@K, S2G](METRICS.md#explanation-of-our-metrics)
     - [Output Format](METRICS.md#output-format-of-our-code)
     - [Reported Results](METRICS.md#reported-results)
 5. [Faster R-CNN Pre-training](#pretrained-models)
@@ -181,11 +185,6 @@ The proposed unbiased counterfactual inference in our paper [Unbiased Scene Grap
 
 If you think about our advice, you may realize that the only rule is to keep the independent causal influence from each branch to the target node as stable as possible, and to use causal-influence fusion functions that are explicit and explainable. This is probably because the causal effect is very human-centric/subjective/recognizable (sorry, I'm not sure which word best expresses my intuition), so unexplainable fusion functions and an implicit combined single loss (without auxiliary losses when multiple branches are involved) will mix up influences from different sources.
 
-## To Do List
-
-- [x] Publish Visualization Tool for SGG
-- [ ] Reorganize Code and Instructions of S2G Retrieval
-
 ## Citations
 
 If you find this project helps your research, please kindly consider citing our papers in your publications.
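To make the advice above concrete, here is a minimal sketch of an explicit fusion function with per-branch auxiliary losses; the function and tensor names are hypothetical and heavily simplified relative to the actual multi-branch heads in this codebase.

# Illustrative sketch only (hypothetical names, not this repo's API): an explicit
# sum fusion of branch logits, with an auxiliary loss per branch so each branch
# keeps an independent, stable causal influence on the target node.
import torch
import torch.nn.functional as F

def fused_multi_branch_loss(branch_logits, target):
    # explicit, explainable fusion: a plain sum keeps every branch's contribution traceable
    fused = torch.stack(branch_logits, dim=0).sum(dim=0)
    loss = F.cross_entropy(fused, target)
    # auxiliary losses: supervise each branch separately instead of one implicit combined loss
    for logits in branch_logits:
        loss = loss + F.cross_entropy(logits, target)
    return loss

# usage with fake data: three branches, batch of 8, 51 predicate classes
branches = [torch.randn(8, 51, requires_grad=True) for _ in range(3)]
labels = torch.randint(0, 51, (8,))
fused_multi_branch_loss(branches, labels).backward()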

maskrcnn_benchmark/data/datasets/evaluation/vg/sgg_eval.py

+124-6
@@ -46,7 +46,7 @@ def register_container(self, mode):
     def generate_print_string(self, mode):
         result_str = 'SGG eval: '
         for k, v in self.result_dict[mode + '_recall'].items():
-            result_str += ' R @ %d: %.4f; ' % (k, np.mean(v))
+            result_str += '    R @ %d: %.4f; ' % (k, np.mean(v))
         result_str += ' for mode=%s, type=Recall(Main).' % mode
         result_str += '\n'
         return result_str
@@ -105,7 +105,7 @@ def register_container(self, mode):
     def generate_print_string(self, mode):
         result_str = 'SGG eval: '
         for k, v in self.result_dict[mode + '_recall_nogc'].items():
-            result_str += 'ngR @ %d: %.4f; ' % (k, np.mean(v))
+            result_str += ' ng-R @ %d: %.4f; ' % (k, np.mean(v))
         result_str += ' for mode=%s, type=No Graph Constraint Recall(Main).' % mode
         result_str += '\n'
         return result_str
@@ -142,11 +142,15 @@ def calculate_recall(self, global_container, local_container, mode):
             phrdet=mode=='phrdet',
         )
 
+        local_container['nogc_pred_to_gt'] = nogc_pred_to_gt
+
         for k in self.result_dict[mode + '_recall_nogc']:
             match = reduce(np.union1d, nogc_pred_to_gt[:k])
             rec_i = float(len(match)) / float(gt_rels.shape[0])
             self.result_dict[mode + '_recall_nogc'][k].append(rec_i)
 
+        return local_container
+
 """
 Zero Shot Scene Graph
 Only calculate triplets that do not occur in the training set
@@ -161,7 +165,7 @@ def register_container(self, mode):
     def generate_print_string(self, mode):
         result_str = 'SGG eval: '
         for k, v in self.result_dict[mode + '_zeroshot_recall'].items():
-            result_str += ' zR @ %d: %.4f; ' % (k, np.mean(v))
+            result_str += '   zR @ %d: %.4f; ' % (k, np.mean(v))
         result_str += ' for mode=%s, type=Zero Shot Recall.' % mode
         result_str += '\n'
         return result_str
@@ -192,6 +196,50 @@ def calculate_recall(self, global_container, local_container, mode):
                 self.result_dict[mode + '_zeroshot_recall'][k].append(zero_rec_i)
 
 
+"""
+No Graph Constraint Zero Shot Recall
+"""
+class SGNGZeroShotRecall(SceneGraphEvaluation):
+    def __init__(self, result_dict):
+        super(SGNGZeroShotRecall, self).__init__(result_dict)
+
+    def register_container(self, mode):
+        self.result_dict[mode + '_ng_zeroshot_recall'] = {20: [], 50: [], 100: []}
+
+    def generate_print_string(self, mode):
+        result_str = 'SGG eval: '
+        for k, v in self.result_dict[mode + '_ng_zeroshot_recall'].items():
+            result_str += 'ng-zR @ %d: %.4f; ' % (k, np.mean(v))
+        result_str += ' for mode=%s, type=No Graph Constraint Zero Shot Recall.' % mode
+        result_str += '\n'
+        return result_str
+
+    def prepare_zeroshot(self, global_container, local_container):
+        gt_rels = local_container['gt_rels']
+        gt_classes = local_container['gt_classes']
+        zeroshot_triplets = global_container['zeroshot_triplet']
+
+        sub_id, ob_id, pred_label = gt_rels[:, 0], gt_rels[:, 1], gt_rels[:, 2]
+        gt_triplets = np.column_stack((gt_classes[sub_id], gt_classes[ob_id], pred_label))  # num_rel, 3
+
+        self.zeroshot_idx = np.where(intersect_2d(gt_triplets, zeroshot_triplets).sum(-1) > 0)[0].tolist()
+
+    def calculate_recall(self, global_container, local_container, mode):
+        pred_to_gt = local_container['nogc_pred_to_gt']
+
+        for k in self.result_dict[mode + '_ng_zeroshot_recall']:
+            # Zero Shot Recall
+            match = reduce(np.union1d, pred_to_gt[:k])
+            if len(self.zeroshot_idx) > 0:
+                if not isinstance(match, (list, tuple)):
+                    match_list = match.tolist()
+                else:
+                    match_list = match
+                zeroshot_match = len(self.zeroshot_idx) + len(match_list) - len(set(self.zeroshot_idx + match_list))
+                zero_rec_i = float(zeroshot_match) / float(len(self.zeroshot_idx))
+                self.result_dict[mode + '_ng_zeroshot_recall'][k].append(zero_rec_i)
+
+
 """
 Give Ground Truth Object-Subject Pairs
 Calculate Recall for SG-Cls and Pred-Cls
@@ -210,7 +258,7 @@ def generate_print_string(self, mode):
         for k, v in self.result_dict[mode + '_accuracy_hit'].items():
             a_hit = np.mean(v)
             a_count = np.mean(self.result_dict[mode + '_accuracy_count'][k])
-            result_str += ' A @ %d: %.4f; ' % (k, a_hit/a_count)
+            result_str += '    A @ %d: %.4f; ' % (k, a_hit/a_count)
         result_str += ' for mode=%s, type=TopK Accuracy.' % mode
         result_str += '\n'
         return result_str
@@ -262,7 +310,7 @@ def register_container(self, mode):
     def generate_print_string(self, mode):
         result_str = 'SGG eval: '
         for k, v in self.result_dict[mode + '_mean_recall'].items():
-            result_str += ' mR @ %d: %.4f; ' % (k, float(v))
+            result_str += '   mR @ %d: %.4f; ' % (k, float(v))
         result_str += ' for mode=%s, type=Mean Recall.' % mode
         result_str += '\n'
         if self.print_detail:
@@ -313,6 +361,76 @@ def calculate_mean_recall(self, mode):
             self.result_dict[mode + '_mean_recall'][k] = sum_recall / float(num_rel_no_bg)
         return
 
+
+"""
+No Graph Constraint Mean Recall
+"""
+class SGNGMeanRecall(SceneGraphEvaluation):
+    def __init__(self, result_dict, num_rel, ind_to_predicates, print_detail=False):
+        super(SGNGMeanRecall, self).__init__(result_dict)
+        self.num_rel = num_rel
+        self.print_detail = print_detail
+        self.rel_name_list = ind_to_predicates[1:]  # remove __background__
+
+    def register_container(self, mode):
+        self.result_dict[mode + '_ng_mean_recall'] = {20: 0.0, 50: 0.0, 100: 0.0}
+        self.result_dict[mode + '_ng_mean_recall_collect'] = {20: [[] for i in range(self.num_rel)], 50: [[] for i in range(self.num_rel)], 100: [[] for i in range(self.num_rel)]}
+        self.result_dict[mode + '_ng_mean_recall_list'] = {20: [], 50: [], 100: []}
+
+    def generate_print_string(self, mode):
+        result_str = 'SGG eval: '
+        for k, v in self.result_dict[mode + '_ng_mean_recall'].items():
+            result_str += 'ng-mR @ %d: %.4f; ' % (k, float(v))
+        result_str += ' for mode=%s, type=No Graph Constraint Mean Recall.' % mode
+        result_str += '\n'
+        if self.print_detail:
+            for n, r in zip(self.rel_name_list, self.result_dict[mode + '_ng_mean_recall_list'][100]):
+                result_str += '({}:{:.4f}) '.format(str(n), r)
+            result_str += '\n'
+
+        return result_str
+
+    def collect_mean_recall_items(self, global_container, local_container, mode):
+        pred_to_gt = local_container['nogc_pred_to_gt']
+        gt_rels = local_container['gt_rels']
+
+        for k in self.result_dict[mode + '_ng_mean_recall_collect']:
+            # the following code is copied from Neural-MOTIFS
+            match = reduce(np.union1d, pred_to_gt[:k])
+            # NOTE: by kaihua, calculate Mean Recall for each category independently
+            # this metric is proposed by: CVPR 2019 oral paper "Learning to Compose Dynamic Tree Structures for Visual Contexts"
+            recall_hit = [0] * self.num_rel
+            recall_count = [0] * self.num_rel
+            for idx in range(gt_rels.shape[0]):
+                local_label = gt_rels[idx, 2]
+                recall_count[int(local_label)] += 1
+                recall_count[0] += 1
+
+            for idx in range(len(match)):
+                local_label = gt_rels[int(match[idx]), 2]
+                recall_hit[int(local_label)] += 1
+                recall_hit[0] += 1
+
+            for n in range(self.num_rel):
+                if recall_count[n] > 0:
+                    self.result_dict[mode + '_ng_mean_recall_collect'][k][n].append(float(recall_hit[n] / recall_count[n]))
+
+    def calculate_mean_recall(self, mode):
+        for k, v in self.result_dict[mode + '_ng_mean_recall'].items():
+            sum_recall = 0
+            num_rel_no_bg = self.num_rel - 1
+            for idx in range(num_rel_no_bg):
+                if len(self.result_dict[mode + '_ng_mean_recall_collect'][k][idx+1]) == 0:
+                    tmp_recall = 0.0
+                else:
+                    tmp_recall = np.mean(self.result_dict[mode + '_ng_mean_recall_collect'][k][idx+1])
+                self.result_dict[mode + '_ng_mean_recall_list'][k].append(tmp_recall)
+                sum_recall += tmp_recall
+
+            self.result_dict[mode + '_ng_mean_recall'][k] = sum_recall / float(num_rel_no_bg)
+        return
+
 """
 Accumulate Recall:
 calculate recall on the whole dataset instead of each image
@@ -327,7 +445,7 @@ def register_container(self, mode):
     def generate_print_string(self, mode):
         result_str = 'SGG eval: '
        for k, v in self.result_dict[mode + '_accumulate_recall'].items():
-            result_str += ' aR @ %d: %.4f; ' % (k, float(v))
+            result_str += '   aR @ %d: %.4f; ' % (k, float(v))
         result_str += ' for mode=%s, type=Accumulate Recall.' % mode
         result_str += '\n'
         return result_str
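To see the mean-recall bookkeeping of SGMeanRecall/SGNGMeanRecall in isolation, here is a self-contained toy run with made-up matches; it mirrors the collect/calculate logic above without the surrounding evaluator classes.

# Toy run of the mean-recall bookkeeping (hypothetical numbers, two foreground
# predicates plus __background__ at index 0; not real results from this repo):
import numpy as np
from functools import reduce

num_rel = 3                       # __background__, 'on', 'riding'
gt_rels = np.array([[0, 1, 1],    # subject idx, object idx, predicate label
                    [0, 2, 2]])
pred_to_gt = [[0], [], [1]]       # per-prediction GT matches, already ranked by score

match = reduce(np.union1d, pred_to_gt[:100])
recall_hit, recall_count = [0] * num_rel, [0] * num_rel
for idx in range(gt_rels.shape[0]):
    recall_count[int(gt_rels[idx, 2])] += 1
for idx in range(len(match)):
    recall_hit[int(gt_rels[int(match[idx]), 2])] += 1

# average per-predicate recall over foreground categories only
per_predicate = [recall_hit[n] / recall_count[n] for n in range(1, num_rel) if recall_count[n] > 0]
print('mR@100 = %.4f' % np.mean(per_predicate))   # (1/1 + 1/1) / 2 = 1.0000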

maskrcnn_benchmark/data/datasets/evaluation/vg/vg_eval.py

+19-1
@@ -12,7 +12,7 @@
 from maskrcnn_benchmark.structures.bounding_box import BoxList
 from maskrcnn_benchmark.structures.boxlist_ops import boxlist_iou
 from maskrcnn_benchmark.utils.miscellaneous import intersect_2d, argsort_desc, bbox_overlaps
-from maskrcnn_benchmark.data.datasets.evaluation.vg.sgg_eval import SGRecall, SGNoGraphConstraintRecall, SGZeroShotRecall, SGPairAccuracy, SGMeanRecall, SGAccumulateRecall
+from maskrcnn_benchmark.data.datasets.evaluation.vg.sgg_eval import SGRecall, SGNoGraphConstraintRecall, SGZeroShotRecall, SGNGZeroShotRecall, SGPairAccuracy, SGMeanRecall, SGNGMeanRecall, SGAccumulateRecall
 
 def do_vg_evaluation(
     cfg,
@@ -129,6 +129,11 @@ def do_vg_evaluation(
     eval_zeroshot_recall = SGZeroShotRecall(result_dict)
     eval_zeroshot_recall.register_container(mode)
     evaluator['eval_zeroshot_recall'] = eval_zeroshot_recall
+
+    # test on no graph constraint zero-shot recall
+    eval_ng_zeroshot_recall = SGNGZeroShotRecall(result_dict)
+    eval_ng_zeroshot_recall.register_container(mode)
+    evaluator['eval_ng_zeroshot_recall'] = eval_ng_zeroshot_recall
 
     # used by https://github.com/NVIDIA/ContrastiveLosses4VRD for sgcls and predcls
     eval_pair_accuracy = SGPairAccuracy(result_dict)
@@ -140,6 +145,11 @@ def do_vg_evaluation(
     eval_mean_recall.register_container(mode)
     evaluator['eval_mean_recall'] = eval_mean_recall
 
+    # used for no graph constraint mean Recall@K
+    eval_ng_mean_recall = SGNGMeanRecall(result_dict, num_rel_category, dataset.ind_to_predicates, print_detail=True)
+    eval_ng_mean_recall.register_container(mode)
+    evaluator['eval_ng_mean_recall'] = eval_ng_mean_recall
+
     # prepare all inputs
     global_container = {}
     global_container['zeroshot_triplet'] = zeroshot_triplet
@@ -156,12 +166,15 @@ def do_vg_evaluation(
 
     # calculate mean recall
     eval_mean_recall.calculate_mean_recall(mode)
+    eval_ng_mean_recall.calculate_mean_recall(mode)
 
     # print result
     result_str += eval_recall.generate_print_string(mode)
     result_str += eval_nog_recall.generate_print_string(mode)
     result_str += eval_zeroshot_recall.generate_print_string(mode)
+    result_str += eval_ng_zeroshot_recall.generate_print_string(mode)
     result_str += eval_mean_recall.generate_print_string(mode)
+    result_str += eval_ng_mean_recall.generate_print_string(mode)
 
     if cfg.MODEL.ROI_RELATION_HEAD.USE_GT_BOX:
         result_str += eval_pair_accuracy.generate_print_string(mode)
@@ -246,6 +259,7 @@ def evaluate_relation_of_one_image(groundtruth, prediction, global_container, ev
 
     # to calculate the prior label based on statistics
     evaluator['eval_zeroshot_recall'].prepare_zeroshot(global_container, local_container)
+    evaluator['eval_ng_zeroshot_recall'].prepare_zeroshot(global_container, local_container)
 
     if mode == 'predcls':
         local_container['pred_boxes'] = local_container['gt_boxes']
@@ -296,8 +310,12 @@ def evaluate_relation_of_one_image(groundtruth, prediction, global_container, ev
         evaluator['eval_pair_accuracy'].calculate_recall(global_container, local_container, mode)
     # Mean Recall
     evaluator['eval_mean_recall'].collect_mean_recall_items(global_container, local_container, mode)
+    # No Graph Constraint Mean Recall
+    evaluator['eval_ng_mean_recall'].collect_mean_recall_items(global_container, local_container, mode)
     # Zero shot Recall
    evaluator['eval_zeroshot_recall'].calculate_recall(global_container, local_container, mode)
+    # No Graph Constraint Zero-Shot Recall
+    evaluator['eval_ng_zeroshot_recall'].calculate_recall(global_container, local_container, mode)
 
     return
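For reference, the new evaluators assemble print lines like the one below. This sketch reuses the format strings added in sgg_eval.py above, but the recall values are made up purely to show the output format, not real results.

# Sketch of the new ng-zR print string, with made-up recall values:
import numpy as np

ng_zr = {20: [0.05, 0.07], 50: [0.10, 0.12], 100: [0.15, 0.17]}
line = 'SGG eval: '
for k, v in ng_zr.items():
    line += 'ng-zR @ %d: %.4f; ' % (k, np.mean(v))
line += ' for mode=%s, type=No Graph Constraint Zero Shot Recall.' % 'predcls'
print(line)
# SGG eval: ng-zR @ 20: 0.0600; ng-zR @ 50: 0.1100; ng-zR @ 100: 0.1600;  for mode=predcls, type=No Graph Constraint Zero Shot Recall.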
