*October 2019*

tl;dr: Extend the work of deep3Dbox by regressing residual center positions.

#### Overall impression

The introduction gives a good summary of monocular 3D object detection (mono 3DOD).

The geometric constraints are formulated as a closed-form problem. This is similar to deep3Dbox, with a slight difference (over-constrained vs. exactly constrained).
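To make the linear structure of these constraints concrete, here is a minimal sketch of the over-constrained, least-squares variant: each side of the 2D box is assumed to be touched by one projected 3D corner, which gives four equations that are linear in the three unknowns of the translation. The function names, the KITTI-style yaw convention, and the normal-equation solver are assumptions for illustration, not the paper's exact formulation.

```python
import math

def rot_y(yaw):
    """Rotation about the camera y-axis (KITTI-style global yaw, assumed)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matvec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def project(K, yaw, t, corner):
    """Pixel (u, v) of an object-frame corner under pose (yaw, t)."""
    p = [r + ti for r, ti in zip(matvec(rot_y(yaw), corner), t)]
    u, v, z = matvec(K, p)
    return u / z, v / z

def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve_translation(K, yaw, corners, box2d):
    """Least-squares translation from four corner-touches-side constraints.

    K       : 3x3 camera intrinsics
    corners : four object-frame corners, ordered to match box2d
              (this corner-to-side assignment is one hypothetical configuration)
    box2d   : (xmin, ymin, xmax, ymax) of the detected 2D box
    """
    R = rot_y(yaw)
    rows, rhs = [], []
    for i, (corner, side) in enumerate(zip(corners, box2d)):
        axis = i % 2  # 0 for xmin/xmax (u constraint), 1 for ymin/ymax (v constraint)
        # (K[axis] - side * K[2]) . (R X + t) = 0 is linear in t.
        a = [K[axis][j] - side * K[2][j] for j in range(3)]
        rows.append(a)
        rhs.append(-sum(aj * xj for aj, xj in zip(a, matvec(R, corner))))
    # Normal equations (A^T A) t = A^T b, solved with Cramer's rule.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    d = det3(ata)
    t = []
    for k in range(3):
        m = [row[:] for row in ata]
        for i in range(3):
            m[i][k] = atb[i]
        t.append(det3(m) / d)
    return t
```

In practice the corner-to-side assignment is unknown, so a solver of this kind is run once per candidate configuration and the best result is kept.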

The ideas of Shift R-CNN and FQNet are quite similar. Both build on deep3Dbox and refine the first guess. But FQNet passively and densely samples around the GT and trains a regressor to tell the difference to GT, whereas Shift R-CNN actively learns to regress the difference. The follow-up work to FQNet is RAR-Net, which also actively predicts the offset, but does so iteratively with a DRL agent.

#### Key ideas

- RoiAligned feature to regress 3D orientation and 3D dimension.
- Optimization to solve for the 3D bbox location $t'$.
- ShiftNet is a 2-layer FC network that regresses an improved final translation of the 3D center, $t''$. The input features are $t'$, the 2D bbox, dimensions, local yaw, global yaw, and the camera projection matrix.
- The volume displacement loss is decomposed into 3 sums of 3 terms; each term has the form $\Delta x \times h \times w$ or similar, where $w$ and $h$ are the estimated 3D dimensions.
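As a rough illustration of the volume displacement idea, the sketch below computes one sum of three terms, where each term multiplies a center offset along one axis by the area of a box face. Only the $\Delta x \times h \times w$ term comes from the notes above. The pairing of the other two offsets with faces and the function name are assumptions, and how the paper combines such sums into the full loss is not reproduced here.

```python
def volume_displacement_sum(pred_t, gt_t, dims):
    """One sum of three |offset| x face-area terms (illustrative only).

    pred_t, gt_t : (x, y, z) predicted and ground-truth 3D centers
    dims         : (h, w, l) estimated 3D dimensions
    """
    h, w, l = dims
    dx, dy, dz = (abs(p - g) for p, g in zip(pred_t, gt_t))
    # dx * h * w follows the notes; the dy and dz pairings are assumed by analogy.
    return dx * h * w + dy * w * l + dz * h * l
```

Weighting each offset by a face area expresses the error as a swept volume, so the same metric offset costs more for a large object than for a small one.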

#### Technical details

- They use the best IoU to pick the best configuration. This differs from previous methods such as FQNet and Deep3DBox, which pick the configuration that minimizes the residual of the least-squares fit. The best-IoU selection is also used in MVRA.
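A minimal sketch of IoU-based selection, assuming each candidate configuration has already been solved for a translation and its 3D box reprojected to an axis-aligned 2D box. The candidate format and function names are hypothetical, and the choice of which two boxes are compared is an assumption, not a detail confirmed by the notes.

```python
def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def pick_best_configuration(candidates, detected_box):
    """Return the (translation, projected_box) pair with the highest IoU.

    candidates   : list of (translation, projected_box) pairs, one per
                   corner-to-side configuration (hypothetical format)
    detected_box : the detector's 2D box, (xmin, ymin, xmax, ymax)
    """
    return max(candidates, key=lambda c: iou_2d(c[1], detected_box))
```

Compared with choosing the smallest least-squares residual, this criterion scores each candidate by how well the whole reprojected box overlaps the detection rather than by how well four individual constraints are met.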

#### Notes

- Questions and notes on how to improve/revise the current work