
Commit d05be9f

committed (no message)
1 parent aa6172b commit d05be9f

4 files changed: +155 −14 lines changed


METRICS.md

+7-1
@@ -2,15 +2,21 @@
 ### Recall@K (R@K)
 The earliest and most widely accepted metric in scene graph generation, first adopted by [Visual relationship detection with language priors](https://arxiv.org/abs/1608.00187). Since the ground-truth relationship annotations are incomplete, simple accuracy is an improper metric. Lu et al. therefore recast evaluation as a retrieval problem: relationships must not only be classified correctly, but also scored as high as possible, so that they can be retrieved from among the many 'none' relationship pairs.
 
-### No Graph Constraint Recall@K (ngR@K)
+### No Graph Constraint Recall@K (ng-R@K)
 It was first used by [Pixel2Graph](https://arxiv.org/abs/1706.07365) and named by [Neural-MOTIFS](https://arxiv.org/abs/1711.06640). The former paper significantly improves R@K by allowing each pair to have multiple predicates: for each subject-object pair, all 50 predicates take part in the recall ranking, not just the one with the highest score. Since predicates are not mutually exclusive, 'on' and 'riding' can both be correct, and this setting significantly improves R@K. To compare fairly with other methods, [Neural-MOTIFS](https://arxiv.org/abs/1711.06640) named it No Graph Constraint Recall@K (ng-R@K).
 
 ### Mean Recall@K (mR@K)
 It was proposed by our work [VCTree](https://arxiv.org/abs/1812.01880) and Chen et al.'s [KERN](https://arxiv.org/abs/1903.03326) at the same time (CVPR 2019), although we didn't make it our main contribution and only listed the full results in the [supplementary material](https://zpascal.net/cvpr2019/Tang_Learning_to_Compose_CVPR_2019_supplemental.pdf). We also acknowledge the contribution of [KERN](https://arxiv.org/abs/1903.03326), which reported mR@K results for more of the previous methods. The main motivation of Mean Recall@K (mR@K) is that the VisualGenome dataset is biased towards dominant predicates: if the 10 most frequent predicates are correctly classified, accuracy reaches 90% even if the remaining 40 predicate categories are all wrong, which is definitely not what we want. Therefore, Mean Recall@K (mR@K) calculates Recall@K for each predicate category independently and then reports their mean.
 
+### No Graph Constraint Mean Recall@K (ng-mR@K)
+The same Mean Recall metric, but for each pair of objects, all possible predicates are valid candidates (the original Mean Recall@K only considers the predicate with the maximum score of each pair as the valid candidate when calculating Recall).
+
 ### Zero Shot Recall@K (zR@K)
 It was first used by [Visual relationship detection with language priors](https://arxiv.org/abs/1608.00187) for the VRD dataset, and first reported by [Unbiased Scene Graph Generation from Biased Training](https://arxiv.org/abs/2002.11949) for the VisualGenome dataset. In short, it only calculates Recall@K for those subject-predicate-object combinations that do not occur in the training set.
 
+### No Graph Constraint Zero Shot Recall@K (ng-zR@K)
+The same zero-shot Recall metric, but for each pair of objects, all possible predicates are valid candidates (the original zero-shot Recall@K only considers the predicate with the maximum score of each pair as the valid candidate when calculating Recall).
+
 ### Top@K Accuracy (A@K)
 It is actually caused by a misunderstanding of the PredCls and SGCls protocols. [Contrastive Losses](https://arxiv.org/abs/1903.02728) reported Recall@K for PredCls and SGCls by giving not only the ground-truth bounding boxes but also the ground-truth subject-object pairs, so no ranking is involved. The results can only be considered Top@K Accuracy (A@K) for the given K ground-truth subject-object pairs.
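The graph-constraint vs. no-graph-constraint distinction above boils down to how the candidate list for the recall ranking is built. Below is a minimal, self-contained sketch of that difference; the array names and sizes are illustrative, not code from this repository.

# Toy sketch (not code from this repo): how R@K and ng-R@K build their
# candidate lists before the top-K triplets are matched against ground truth.
import numpy as np

num_pairs, num_predicates = 4, 50
rng = np.random.default_rng(0)
pair_scores = rng.random((num_pairs, num_predicates))  # hypothetical predicate scores

# Graph constraint: each subject-object pair contributes only its best predicate.
gc_candidates = [(p, int(pair_scores[p].argmax()), float(pair_scores[p].max()))
                 for p in range(num_pairs)]

# No graph constraint: every (pair, predicate) combination joins the ranking,
# so non-exclusive predicates such as 'on' and 'riding' can both be retrieved.
ng_candidates = [(p, q, float(pair_scores[p, q]))
                 for p in range(num_pairs) for q in range(num_predicates)]

gc_top20 = sorted(gc_candidates, key=lambda t: -t[2])[:20]
ng_top20 = sorted(ng_candidates, key=lambda t: -t[2])[:20]  # 50x larger pool, hence higher recall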

README.md

+5-6
@@ -6,13 +6,17 @@
 
 Our paper [Unbiased Scene Graph Generation from Biased Training](https://arxiv.org/abs/2002.11949) has been accepted by CVPR 2020 (Oral).
 
+## Recent Updates
+
+- [x] 2020.06.23 [No Graph Constraint Mean Recall@K (ng-mR@K) and No Graph Constraint Zero-Shot Recall@K (ng-zR@K)](METRICS.md#explanation-of-our-metrics)
+
 ## Contents
 
 1. [Overview](#Overview)
 2. [Install the Requirements](INSTALL.md)
 3. [Prepare the Dataset](DATASET.md)
 4. [Metrics and Results for our Toolkit](METRICS.md)
-    - [Explanation of R@K, ngR@K, mR@K, zR@K, A@K, S2G](METRICS.md#explanation-of-our-metrics)
+    - [Explanation of R@K, mR@K, zR@K, ng-R@K, ng-mR@K, ng-zR@K, A@K, S2G](METRICS.md#explanation-of-our-metrics)
     - [Output Format](METRICS.md#output-format-of-our-code)
     - [Reported Results](METRICS.md#reported-results)
 5. [Faster R-CNN Pre-training](#pretrained-models)
@@ -181,11 +185,6 @@ The proposed unbiased counterfactual inference in our paper [Unbiased Scene Grap
 
 If you think about our advice, you may realize that the only rule is to keep the independent causal influence from each branch to the target node as stable as possible, and to use causal-influence fusion functions that are explicit and explainable. This is probably because the causal effect is very human-centric/subjective/recognizable (sorry, I'm not sure which word best expresses my intuition), so unexplainable fusion functions and an implicit combined single loss (without auxiliary losses when multiple branches are involved) will mix up influences from different sources.
 
-## To Do List
-
-- [x] Publish Visualization Tool for SGG
-- [ ] Reorganize Code and Instructions of S2G Retrieval
-
 ## Citations
 
 If you find this project helps your research, please kindly consider citing our papers in your publications.
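To make the advice above concrete, here is a minimal sketch of an explicit fusion function with per-branch auxiliary losses; the function and tensor names are hypothetical and heavily simplified relative to the actual multi-branch heads in this codebase.

# Illustrative sketch only (hypothetical names, not this repo's API): an explicit
# sum fusion of branch logits, with an auxiliary loss per branch so each branch
# keeps an independent, stable causal influence on the target node.
import torch
import torch.nn.functional as F

def fused_multi_branch_loss(branch_logits, target):
    # explicit, explainable fusion: a plain sum keeps every branch's contribution traceable
    fused = torch.stack(branch_logits, dim=0).sum(dim=0)
    loss = F.cross_entropy(fused, target)
    # auxiliary losses: supervise each branch separately instead of one implicit combined loss
    for logits in branch_logits:
        loss = loss + F.cross_entropy(logits, target)
    return loss

# usage with fake data: three branches, batch of 8, 51 predicate classes
branches = [torch.randn(8, 51, requires_grad=True) for _ in range(3)]
labels = torch.randint(0, 51, (8,))
fused_multi_branch_loss(branches, labels).backward()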

maskrcnn_benchmark/data/datasets/evaluation/vg/sgg_eval.py

+124-6
@@ -46,7 +46,7 @@ def register_container(self, mode):
     def generate_print_string(self, mode):
         result_str = 'SGG eval: '
         for k, v in self.result_dict[mode + '_recall'].items():
-            result_str += ' R @ %d: %.4f; ' % (k, np.mean(v))
+            result_str += '    R @ %d: %.4f; ' % (k, np.mean(v))
         result_str += ' for mode=%s, type=Recall(Main).' % mode
         result_str += '\n'
         return result_str
@@ -105,7 +105,7 @@ def register_container(self, mode):
     def generate_print_string(self, mode):
         result_str = 'SGG eval: '
         for k, v in self.result_dict[mode + '_recall_nogc'].items():
-            result_str += 'ngR @ %d: %.4f; ' % (k, np.mean(v))
+            result_str += ' ng-R @ %d: %.4f; ' % (k, np.mean(v))
         result_str += ' for mode=%s, type=No Graph Constraint Recall(Main).' % mode
         result_str += '\n'
         return result_str
@@ -142,11 +142,15 @@ def calculate_recall(self, global_container, local_container, mode):
             phrdet=mode=='phrdet',
         )
 
+        local_container['nogc_pred_to_gt'] = nogc_pred_to_gt
+
         for k in self.result_dict[mode + '_recall_nogc']:
             match = reduce(np.union1d, nogc_pred_to_gt[:k])
             rec_i = float(len(match)) / float(gt_rels.shape[0])
             self.result_dict[mode + '_recall_nogc'][k].append(rec_i)
 
+        return local_container
+
 """
 Zero Shot Scene Graph
 Only calculate triplets that do not occur in the training set
@@ -161,7 +165,7 @@ def register_container(self, mode):
     def generate_print_string(self, mode):
         result_str = 'SGG eval: '
         for k, v in self.result_dict[mode + '_zeroshot_recall'].items():
-            result_str += ' zR @ %d: %.4f; ' % (k, np.mean(v))
+            result_str += '   zR @ %d: %.4f; ' % (k, np.mean(v))
         result_str += ' for mode=%s, type=Zero Shot Recall.' % mode
         result_str += '\n'
         return result_str
@@ -192,6 +196,50 @@ def calculate_recall(self, global_container, local_container, mode):
                 self.result_dict[mode + '_zeroshot_recall'][k].append(zero_rec_i)
 
 
+"""
+No Graph Constraint Zero Shot Recall
+"""
+class SGNGZeroShotRecall(SceneGraphEvaluation):
+    def __init__(self, result_dict):
+        super(SGNGZeroShotRecall, self).__init__(result_dict)
+
+    def register_container(self, mode):
+        self.result_dict[mode + '_ng_zeroshot_recall'] = {20: [], 50: [], 100: []}
+
+    def generate_print_string(self, mode):
+        result_str = 'SGG eval: '
+        for k, v in self.result_dict[mode + '_ng_zeroshot_recall'].items():
+            result_str += 'ng-zR @ %d: %.4f; ' % (k, np.mean(v))
+        result_str += ' for mode=%s, type=No Graph Constraint Zero Shot Recall.' % mode
+        result_str += '\n'
+        return result_str
+
+    def prepare_zeroshot(self, global_container, local_container):
+        gt_rels = local_container['gt_rels']
+        gt_classes = local_container['gt_classes']
+        zeroshot_triplets = global_container['zeroshot_triplet']
+
+        sub_id, ob_id, pred_label = gt_rels[:, 0], gt_rels[:, 1], gt_rels[:, 2]
+        gt_triplets = np.column_stack((gt_classes[sub_id], gt_classes[ob_id], pred_label))  # num_rel, 3
+
+        self.zeroshot_idx = np.where(intersect_2d(gt_triplets, zeroshot_triplets).sum(-1) > 0)[0].tolist()
+
+    def calculate_recall(self, global_container, local_container, mode):
+        pred_to_gt = local_container['nogc_pred_to_gt']
+
+        for k in self.result_dict[mode + '_ng_zeroshot_recall']:
+            # Zero Shot Recall
+            match = reduce(np.union1d, pred_to_gt[:k])
+            if len(self.zeroshot_idx) > 0:
+                if not isinstance(match, (list, tuple)):
+                    match_list = match.tolist()
+                else:
+                    match_list = match
+                zeroshot_match = len(self.zeroshot_idx) + len(match_list) - len(set(self.zeroshot_idx + match_list))
+                zero_rec_i = float(zeroshot_match) / float(len(self.zeroshot_idx))
+                self.result_dict[mode + '_ng_zeroshot_recall'][k].append(zero_rec_i)
+
+
 """
 Give Ground Truth Object-Subject Pairs
 Calculate Recall for SG-Cls and Pred-Cls
@@ -210,7 +258,7 @@ def generate_print_string(self, mode):
         for k, v in self.result_dict[mode + '_accuracy_hit'].items():
             a_hit = np.mean(v)
             a_count = np.mean(self.result_dict[mode + '_accuracy_count'][k])
-            result_str += ' A @ %d: %.4f; ' % (k, a_hit/a_count)
+            result_str += '    A @ %d: %.4f; ' % (k, a_hit/a_count)
         result_str += ' for mode=%s, type=TopK Accuracy.' % mode
         result_str += '\n'
         return result_str
@@ -262,7 +310,7 @@ def register_container(self, mode):
     def generate_print_string(self, mode):
         result_str = 'SGG eval: '
         for k, v in self.result_dict[mode + '_mean_recall'].items():
-            result_str += ' mR @ %d: %.4f; ' % (k, float(v))
+            result_str += '   mR @ %d: %.4f; ' % (k, float(v))
         result_str += ' for mode=%s, type=Mean Recall.' % mode
         result_str += '\n'
         if self.print_detail:
@@ -313,6 +361,76 @@ def calculate_mean_recall(self, mode):
             self.result_dict[mode + '_mean_recall'][k] = sum_recall / float(num_rel_no_bg)
         return
 
+
+"""
+No Graph Constraint Mean Recall
+"""
+class SGNGMeanRecall(SceneGraphEvaluation):
+    def __init__(self, result_dict, num_rel, ind_to_predicates, print_detail=False):
+        super(SGNGMeanRecall, self).__init__(result_dict)
+        self.num_rel = num_rel
+        self.print_detail = print_detail
+        self.rel_name_list = ind_to_predicates[1:]  # remove __background__
+
+    def register_container(self, mode):
+        self.result_dict[mode + '_ng_mean_recall'] = {20: 0.0, 50: 0.0, 100: 0.0}
+        self.result_dict[mode + '_ng_mean_recall_collect'] = {20: [[] for i in range(self.num_rel)], 50: [[] for i in range(self.num_rel)], 100: [[] for i in range(self.num_rel)]}
+        self.result_dict[mode + '_ng_mean_recall_list'] = {20: [], 50: [], 100: []}
+
+    def generate_print_string(self, mode):
+        result_str = 'SGG eval: '
+        for k, v in self.result_dict[mode + '_ng_mean_recall'].items():
+            result_str += 'ng-mR @ %d: %.4f; ' % (k, float(v))
+        result_str += ' for mode=%s, type=No Graph Constraint Mean Recall.' % mode
+        result_str += '\n'
+        if self.print_detail:
+            for n, r in zip(self.rel_name_list, self.result_dict[mode + '_ng_mean_recall_list'][100]):
+                result_str += '({}:{:.4f}) '.format(str(n), r)
+            result_str += '\n'
+
+        return result_str
+
+    def collect_mean_recall_items(self, global_container, local_container, mode):
+        pred_to_gt = local_container['nogc_pred_to_gt']
+        gt_rels = local_container['gt_rels']
+
+        for k in self.result_dict[mode + '_ng_mean_recall_collect']:
+            # the following code is copied from Neural-MOTIFS
+            match = reduce(np.union1d, pred_to_gt[:k])
+            # NOTE: by kaihua, calculate Mean Recall for each category independently
+            # this metric is proposed by: CVPR 2019 oral paper "Learning to Compose Dynamic Tree Structures for Visual Contexts"
+            recall_hit = [0] * self.num_rel
+            recall_count = [0] * self.num_rel
+            for idx in range(gt_rels.shape[0]):
+                local_label = gt_rels[idx, 2]
+                recall_count[int(local_label)] += 1
+                recall_count[0] += 1
+
+            for idx in range(len(match)):
+                local_label = gt_rels[int(match[idx]), 2]
+                recall_hit[int(local_label)] += 1
+                recall_hit[0] += 1
+
+            for n in range(self.num_rel):
+                if recall_count[n] > 0:
+                    self.result_dict[mode + '_ng_mean_recall_collect'][k][n].append(float(recall_hit[n] / recall_count[n]))
+
+    def calculate_mean_recall(self, mode):
+        for k, v in self.result_dict[mode + '_ng_mean_recall'].items():
+            sum_recall = 0
+            num_rel_no_bg = self.num_rel - 1
+            for idx in range(num_rel_no_bg):
+                if len(self.result_dict[mode + '_ng_mean_recall_collect'][k][idx+1]) == 0:
+                    tmp_recall = 0.0
+                else:
+                    tmp_recall = np.mean(self.result_dict[mode + '_ng_mean_recall_collect'][k][idx+1])
+                self.result_dict[mode + '_ng_mean_recall_list'][k].append(tmp_recall)
+                sum_recall += tmp_recall
+
+            self.result_dict[mode + '_ng_mean_recall'][k] = sum_recall / float(num_rel_no_bg)
+        return
+
 """
 Accumulate Recall:
 calculate recall on the whole dataset instead of each image
@@ -327,7 +445,7 @@ def register_container(self, mode):
     def generate_print_string(self, mode):
         result_str = 'SGG eval: '
        for k, v in self.result_dict[mode + '_accumulate_recall'].items():
-            result_str += ' aR @ %d: %.4f; ' % (k, float(v))
+            result_str += '   aR @ %d: %.4f; ' % (k, float(v))
         result_str += ' for mode=%s, type=Accumulate Recall.' % mode
         result_str += '\n'
         return result_str
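To see the mean-recall bookkeeping of SGMeanRecall/SGNGMeanRecall in isolation, here is a self-contained toy run with made-up matches; it mirrors the collect/calculate logic above without the surrounding evaluator classes.

# Toy run of the mean-recall bookkeeping (hypothetical numbers, two foreground
# predicates plus __background__ at index 0; not real results from this repo):
import numpy as np
from functools import reduce

num_rel = 3                       # __background__, 'on', 'riding'
gt_rels = np.array([[0, 1, 1],    # subject idx, object idx, predicate label
                    [0, 2, 2]])
pred_to_gt = [[0], [], [1]]       # per-prediction GT matches, already ranked by score

match = reduce(np.union1d, pred_to_gt[:100])
recall_hit, recall_count = [0] * num_rel, [0] * num_rel
for idx in range(gt_rels.shape[0]):
    recall_count[int(gt_rels[idx, 2])] += 1
for idx in range(len(match)):
    recall_hit[int(gt_rels[int(match[idx]), 2])] += 1

# average per-predicate recall over foreground categories only
per_predicate = [recall_hit[n] / recall_count[n] for n in range(1, num_rel) if recall_count[n] > 0]
print('mR@100 = %.4f' % np.mean(per_predicate))   # (1/1 + 1/1) / 2 = 1.0000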

maskrcnn_benchmark/data/datasets/evaluation/vg/vg_eval.py

+19-1
@@ -12,7 +12,7 @@
 from maskrcnn_benchmark.structures.bounding_box import BoxList
 from maskrcnn_benchmark.structures.boxlist_ops import boxlist_iou
 from maskrcnn_benchmark.utils.miscellaneous import intersect_2d, argsort_desc, bbox_overlaps
-from maskrcnn_benchmark.data.datasets.evaluation.vg.sgg_eval import SGRecall, SGNoGraphConstraintRecall, SGZeroShotRecall, SGPairAccuracy, SGMeanRecall, SGAccumulateRecall
+from maskrcnn_benchmark.data.datasets.evaluation.vg.sgg_eval import SGRecall, SGNoGraphConstraintRecall, SGZeroShotRecall, SGNGZeroShotRecall, SGPairAccuracy, SGMeanRecall, SGNGMeanRecall, SGAccumulateRecall
 
 def do_vg_evaluation(
     cfg,
@@ -129,6 +129,11 @@ def do_vg_evaluation(
     eval_zeroshot_recall = SGZeroShotRecall(result_dict)
     eval_zeroshot_recall.register_container(mode)
     evaluator['eval_zeroshot_recall'] = eval_zeroshot_recall
+
+    # test on no graph constraint zero-shot recall
+    eval_ng_zeroshot_recall = SGNGZeroShotRecall(result_dict)
+    eval_ng_zeroshot_recall.register_container(mode)
+    evaluator['eval_ng_zeroshot_recall'] = eval_ng_zeroshot_recall
 
     # used by https://github.com/NVIDIA/ContrastiveLosses4VRD for sgcls and predcls
     eval_pair_accuracy = SGPairAccuracy(result_dict)
@@ -140,6 +145,11 @@ def do_vg_evaluation(
     eval_mean_recall.register_container(mode)
     evaluator['eval_mean_recall'] = eval_mean_recall
 
+    # used for no graph constraint mean Recall@K
+    eval_ng_mean_recall = SGNGMeanRecall(result_dict, num_rel_category, dataset.ind_to_predicates, print_detail=True)
+    eval_ng_mean_recall.register_container(mode)
+    evaluator['eval_ng_mean_recall'] = eval_ng_mean_recall
+
     # prepare all inputs
     global_container = {}
     global_container['zeroshot_triplet'] = zeroshot_triplet
@@ -156,12 +166,15 @@ def do_vg_evaluation(
 
     # calculate mean recall
     eval_mean_recall.calculate_mean_recall(mode)
+    eval_ng_mean_recall.calculate_mean_recall(mode)
 
     # print result
     result_str += eval_recall.generate_print_string(mode)
     result_str += eval_nog_recall.generate_print_string(mode)
     result_str += eval_zeroshot_recall.generate_print_string(mode)
+    result_str += eval_ng_zeroshot_recall.generate_print_string(mode)
     result_str += eval_mean_recall.generate_print_string(mode)
+    result_str += eval_ng_mean_recall.generate_print_string(mode)
 
     if cfg.MODEL.ROI_RELATION_HEAD.USE_GT_BOX:
         result_str += eval_pair_accuracy.generate_print_string(mode)
@@ -246,6 +259,7 @@ def evaluate_relation_of_one_image(groundtruth, prediction, global_container, ev
 
     # to calculate the prior label based on statistics
     evaluator['eval_zeroshot_recall'].prepare_zeroshot(global_container, local_container)
+    evaluator['eval_ng_zeroshot_recall'].prepare_zeroshot(global_container, local_container)
 
     if mode == 'predcls':
         local_container['pred_boxes'] = local_container['gt_boxes']
@@ -296,8 +310,12 @@ def evaluate_relation_of_one_image(groundtruth, prediction, global_container, ev
         evaluator['eval_pair_accuracy'].calculate_recall(global_container, local_container, mode)
     # Mean Recall
     evaluator['eval_mean_recall'].collect_mean_recall_items(global_container, local_container, mode)
+    # No Graph Constraint Mean Recall
+    evaluator['eval_ng_mean_recall'].collect_mean_recall_items(global_container, local_container, mode)
     # Zero shot Recall
    evaluator['eval_zeroshot_recall'].calculate_recall(global_container, local_container, mode)
+    # No Graph Constraint Zero-Shot Recall
+    evaluator['eval_ng_zeroshot_recall'].calculate_recall(global_container, local_container, mode)
 
     return
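For reference, the new evaluators assemble print lines like the one below. This sketch reuses the format strings added in sgg_eval.py above, but the recall values are made up purely to show the output format, not real results.

# Sketch of the new ng-zR print string, with made-up recall values:
import numpy as np

ng_zr = {20: [0.05, 0.07], 50: [0.10, 0.12], 100: [0.15, 0.17]}
line = 'SGG eval: '
for k, v in ng_zr.items():
    line += 'ng-zR @ %d: %.4f; ' % (k, np.mean(v))
line += ' for mode=%s, type=No Graph Constraint Zero Shot Recall.' % 'predcls'
print(line)
# SGG eval: ng-zR @ 20: 0.0600; ng-zR @ 50: 0.1100; ng-zR @ 100: 0.1600;  for mode=predcls, type=No Graph Constraint Zero Shot Recall.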
