METRICS.md
### Recall@K (R@K)
The earliest and most widely accepted metric in scene graph generation, first adopted by [Visual relationship detection with language priors](https://arxiv.org/abs/1608.00187). Since the ground-truth annotations of relationships are incomplete, it is improper to use simple accuracy as the metric. Therefore, Lu et al. recast evaluation as a retrieval problem: a relationship must not only be correctly classified, it must also be scored as high as possible, so that it can be retrieved from the many unannotated ('none') relationship pairs.
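To make the ranking formulation concrete, here is a minimal per-image sketch (not the toolkit's actual code) of Recall@K over label triplets. It assumes ground-truth boxes are already matched, so triplets can be compared by their (subject, predicate, object) indices; `pred_triplets` and `gt_triplets` are illustrative names.

```python
# Hypothetical per-image Recall@K sketch; box IoU matching is omitted for brevity.
# pred_triplets: list of (subj_idx, pred_label, obj_idx, score)
# gt_triplets:   set of (subj_idx, pred_label, obj_idx)

def recall_at_k(pred_triplets, gt_triplets, k=50):
    # Rank every candidate triplet by its score and keep only the top K.
    top_k = sorted(pred_triplets, key=lambda t: t[3], reverse=True)[:k]
    # A ground-truth triplet is recalled if it appears among the top-K candidates.
    hits = {t[:3] for t in top_k} & set(gt_triplets)
    return len(hits) / max(len(gt_triplets), 1)
```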
### No Graph Constraint Recall@K (ng-R@K)
It was first used by [Pixel2Graph](https://arxiv.org/abs/1706.07365) and named by [Neural-MOTIFS](https://arxiv.org/abs/1711.06640). The former paper significantly improves the R@K results by allowing each pair to have multiple predicates: for each subject-object pair, all 50 predicates are involved in the recall ranking, not just the one with the highest score. Since predicates are not exclusive, 'on' and 'riding' can both be correct. To fairly compare with other methods, [Neural-MOTIFS](https://arxiv.org/abs/1711.06640) named this setting the No Graph Constraint Recall@K (ng-R@K).
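The difference can be sketched as follows, assuming a hypothetical `pair_scores` dict that maps each subject-object pair to its per-predicate scores (these names are illustrative, not the toolkit's API); the resulting candidates would then be ranked by the `recall_at_k` sketch above.

```python
# With the graph constraint, each pair contributes one candidate (its best predicate);
# without it (ng-R@K), each pair contributes one candidate per predicate.

def build_candidates(pair_scores, graph_constraint=True):
    candidates = []  # list of (subj, predicate, obj, score)
    for (subj, obj), predicate_scores in pair_scores.items():
        if graph_constraint:
            best_pred, best_score = max(predicate_scores.items(), key=lambda kv: kv[1])
            candidates.append((subj, best_pred, obj, best_score))
        else:
            candidates.extend(
                (subj, pred, obj, score) for pred, score in predicate_scores.items()
            )
    return candidates
```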
### Mean Recall@K (mR@K)
It was proposed by our work [VCTree](https://arxiv.org/abs/1812.01880) and by Chen et al.'s [KERN](https://arxiv.org/abs/1903.03326) at the same time (both CVPR 2019), although we did not present it as our main contribution and only listed the full results in the [supplementary material](https://zpascal.net/cvpr2019/Tang_Learning_to_Compose_CVPR_2019_supplemental.pdf). We also acknowledge the contribution of [KERN](https://arxiv.org/abs/1903.03326), which reported mR@K results for more of the previous methods. The main motivation of Mean Recall@K (mR@K) is that the VisualGenome dataset is biased towards dominant predicates: if the 10 most frequent predicates are correctly classified, the accuracy can reach 90% even if the remaining 40 predicate categories are all wrong. This is definitely not what we want. Therefore, Mean Recall@K (mR@K) calculates Recall@K for each predicate category independently and then reports their mean.
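As a rough per-image sketch (reusing the illustrative `recall_at_k` above; in the actual evaluation the per-class recalls are accumulated over the whole test set before averaging):

```python
from collections import defaultdict

def mean_recall_at_k(pred_triplets, gt_triplets, num_predicates=50, k=100):
    # Group ground-truth triplets by their predicate class.
    gt_by_predicate = defaultdict(set)
    for triplet in gt_triplets:
        gt_by_predicate[triplet[1]].add(triplet)
    # Compute Recall@K against each predicate's ground truth independently.
    per_class_recall = [
        recall_at_k(pred_triplets, gt_by_predicate[p], k=k)
        for p in range(num_predicates)
        if gt_by_predicate[p]  # skip predicates absent from this image's ground truth
    ]
    # Report the unweighted mean, so rare predicates count as much as frequent ones.
    return sum(per_class_recall) / max(len(per_class_recall), 1)
```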
### No Graph Constraint Mean Recall@K (ng-mR@K)
The same Mean Recall metric, but for each pair of objects, all possible predicates are valid candidates (the original Mean Recall@K only considers the predicate with the highest score of each pair as the valid candidate when calculating Recall).
### Zero Shot Recall@K (zR@K)
It was first used by [Visual relationship detection with language priors](https://arxiv.org/abs/1608.00187) for the VRD dataset, and first reported by [Unbiased Scene Graph Generation from Biased Training](https://arxiv.org/abs/2002.11949) for the VisualGenome dataset. In short, it only calculates Recall@K for those subject-predicate-object combinations that do not occur in the training set.
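A minimal sketch, assuming a precomputed set `train_triplet_labels` of (subject-class, predicate-class, object-class) combinations seen in training and a `gt_labels` mapping from each ground-truth triplet to its class labels (all names illustrative; `recall_at_k` is the sketch above):

```python
def zero_shot_recall_at_k(pred_triplets, gt_triplets, gt_labels,
                          train_triplet_labels, k=100):
    # Keep only ground-truth triplets whose label combination never appears in training.
    zero_shot_gt = {t for t in gt_triplets if gt_labels[t] not in train_triplet_labels}
    if not zero_shot_gt:
        return None  # this image contributes nothing to zR@K
    return recall_at_k(pred_triplets, zero_shot_gt, k=k)
```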
### No Graph Constraint Zero Shot Recall@K (ng-zR@K)
The same Zero-Shot Recall metric, but for each pair of objects, all possible predicates are valid candidates (the original Zero-Shot Recall@K only considers the predicate with the highest score of each pair as the valid candidate when calculating Recall).
### Top@K Accuracy (A@K)
This metric actually stems from a misunderstanding of the PredCls and SGCls protocols. [Contrastive Losses](https://arxiv.org/abs/1903.02728) reported Recall@K for PredCls and SGCls while providing not only the ground-truth bounding boxes but also the ground-truth subject-object pairs, so no ranking is involved. The results can therefore only be considered Top@K Accuracy (A@K) over the given K ground-truth subject-object pairs.
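In other words, once the ground-truth pairs are given, the "recall" degenerates into plain predicate-classification accuracy over those pairs; a minimal sketch with illustrative names:

```python
def top_k_accuracy(predicted_predicate, gt_predicate):
    # predicted_predicate / gt_predicate: dicts keyed by (subj_idx, obj_idx),
    # covering exactly the K ground-truth pairs, so no ranking is needed.
    pairs = list(gt_predicate)
    correct = sum(predicted_predicate[p] == gt_predicate[p] for p in pairs)
    return correct / max(len(pairs), 1)
```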
README.md
Our paper [Unbiased Scene Graph Generation from Biased Training](https://arxiv.org/abs/2002.11949) has been accepted by CVPR 2020 (Oral).
## Recent Updates
- [x] 2020.06.23 [No Graph Constraint Mean Recall@K (ng-mR@K) and No Graph Constraint Zero-Shot Recall@K (ng-zR@K)](METRICS.md#explanation-of-our-metrics)
## Contents
1. [Overview](#Overview)
2. [Install the Requirements](INSTALL.md)
3. [Prepare the Dataset](DATASET.md)
4. [Metrics and Results for our Toolkit](METRICS.md)
    - [Explanation of R@K, mR@K, zR@K, ng-R@K, ng-mR@K, ng-zR@K, A@K, S2G](METRICS.md#explanation-of-our-metrics)
If you think about our advice, you may realize that the only rule is to keep the independent causal influence from each branch to the target node as stable as possible, and to use causal-influence fusion functions that are explicit and explainable. This is probably because the causal effect is very human-centric/subjective/recognizable (sorry, I am not sure which word best expresses my intuition), so unexplainable fusion functions and an implicit combined single loss (without auxiliary losses when multiple branches are involved) will mix up influences from different sources.
## To Do List
- [x] Publish Visualization Tool for SGG
- [ ] Reorganize Code and Instructions of S2G Retrieval
## Citations
If you find this project helpful for your research, please kindly consider citing our papers in your publications.