# Mission Impossible: Direct Optimization of Rank-Based Loss Functions

#### Background

$$\frac{1}{m}\sum_{i=1}^{m} \mathbb I (f(x_i) \neq y_i)$$

$$1 - AUC = l_{rank} = \frac{1}{m^+m^-}\sum_{x^+\in D^+} \sum_{x^-\in D^-} \mathbb I (f(x^+) < f(x^-))$$

$$S_1 = \{0.9, 0.8, 0.3, 0.7, 0.6\}\\ S_2 = \{0.8, 0.7, 0.6, 0.9, 0.3\}$$
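To make the rank loss concrete, here is a minimal Python sketch. The original does not state which samples in $S_1$ and $S_2$ are positive; assuming the first three are positives (a hypothetical labeling), both score sets make the same single mistake at threshold 0.5, yet their rank losses differ:

```python
def rank_loss(pos_scores, neg_scores):
    """1 - AUC: fraction of (positive, negative) pairs ranked the wrong way."""
    bad = sum(1 for p in pos_scores for n in neg_scores if p < n)
    return bad / (len(pos_scores) * len(neg_scores))

# hypothetical labeling: first three samples positive, last two negative
s1_pos, s1_neg = [0.9, 0.8, 0.3], [0.7, 0.6]
s2_pos, s2_neg = [0.8, 0.7, 0.6], [0.9, 0.3]

print(rank_loss(s1_pos, s1_neg))  # 2 of 6 pairs misordered -> 1/3
print(rank_loss(s2_pos, s2_neg))  # 3 of 6 pairs misordered -> 1/2
```

So two classifiers with identical 0-1 error can still rank very differently, which is exactly why AUC needs its own treatment.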

#### Preliminary

\begin{align} tp &= \sum_{i \in Y^+} \left(1 - l_{01}(f_b, x_i, y_i)\right)\\ fp &= \sum_{i \in Y^-} l_{01}(f_b, x_i, y_i) \end{align}

\begin{align} tp_l &= \sum_{i \in Y^+} \left(1 - l_h(f_b, x_i, y_i)\right) \leq tp\\ fp^u &= \sum_{i \in Y^-} l_h(f_b, x_i, y_i) \geq fp \end{align}
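These bounds are easy to check numerically. A minimal sketch (the scores below are toy values, not from the paper) using the hinge loss $l_h(s, y) = \max(0, 1 - ys)$, which upper-bounds the 0-1 loss:

```python
import numpy as np

def hinge(scores, y):
    """Hinge loss; upper-bounds the 0-1 loss for y in {+1, -1}."""
    return np.maximum(0.0, 1.0 - y * scores)

def zero_one(scores, y):
    """0-1 loss of the sign classifier."""
    pred = np.where(scores > 0, 1.0, -1.0)
    return (pred != y).astype(float)

# toy thresholded scores f_b(x) = f(x) - b
pos = np.array([2.0, 0.5, -1.0])   # scores on Y+
neg = np.array([-2.0, 0.3])        # scores on Y-

tp   = np.sum(1 - zero_one(pos, +1.0))   # exact true-positive count
fp   = np.sum(zero_one(neg, -1.0))       # exact false-positive count
tp_l = np.sum(1 - hinge(pos, +1.0))      # lower bound: hinge >= 0-1
fp_u = np.sum(hinge(neg, -1.0))          # upper bound: hinge >= 0-1

assert tp_l <= tp and fp_u >= fp
```

Because the hinge surrogate is convex and differentiable almost everywhere, these bounds can go into a gradient-based optimizer where the raw counts cannot.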

#### Maximize R@P

$$R@P_{\alpha} = \max R(f) \quad s.t. \quad P(f) \geq \alpha$$

\begin{align} P &= \frac{tp}{tp+fp} \\ R &= \frac{tp}{tp + fn} = \frac{tp}{|Y^+|} \end{align}

$$\max \frac{1}{|Y^+|} tp \quad s.t. \quad tp \geq \alpha(tp + fp)$$

$$\overline{R@P_{\alpha}} = \max \frac{1}{|Y^+|}tp_l \quad s.t. \quad (1 - \alpha)tp_l \geq \alpha fp^u$$

$$\mathcal{L}^+ = \sum_{i \in Y^+} l_h(f, x_i, y_i), \qquad \mathcal{L}^- = \sum_{i \in Y^-} l_h(f, x_i, y_i)$$

$$\overline{R@P_{\alpha}} = \max\; 1 - \frac{\mathcal{L}^+}{|Y^+|} \\ s.t. \quad (1 - \alpha)(|Y^+| - \mathcal{L}^+) \geq \alpha \mathcal{L}^-$$

\begin{align}&\min \mathcal{L}^+\\ &s.t. \quad \alpha \mathcal{L}^- + (1 - \alpha)(\mathcal{L}^+ - |Y^+|) \leq 0\end{align}

$$\min_{f} \max_{\lambda \geq 0} \mathcal{L}^+ + \lambda[ \alpha \mathcal{L}^- + (1 - \alpha)(\mathcal{L}^+ - |Y^+|) ]$$

$$\min_{f} \max_{\lambda \geq 0} [1 + \lambda(1 - \alpha)]\mathcal{L}^+ + \lambda \alpha \mathcal{L}^- - \lambda(1 - \alpha)|Y^+|$$
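A standard way to solve this saddle-point problem is alternating projected gradient steps: descend on the model parameters, ascend on $\lambda$, and clip $\lambda$ at zero (cf. Nedić & Ozdaglar, 2009). Below is a toy numpy sketch with a linear scorer and hinge surrogate; the data, learning rate, and iteration count are all illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy 2-D data: positives around +1, negatives around -1 (illustrative only)
Xp = rng.normal(+1.0, 1.0, size=(50, 2))
Xn = rng.normal(-1.0, 1.0, size=(200, 2))

w, b = np.zeros(2), 0.0          # linear scorer f(x) = w.x + b
lam, alpha, lr = 0.0, 0.9, 1e-3  # multiplier, precision target, step size

def hinge_and_grad(X, y, w, b):
    """Summed hinge loss over X (all labels = y) and its gradients."""
    m = 1.0 - y * (X @ w + b)
    active = m > 0
    loss = np.sum(np.maximum(0.0, m))
    gw = -y * np.sum(X[active], axis=0)
    gb = -y * np.sum(active)
    return loss, gw, gb

for _ in range(2000):
    Lp, gwp, gbp = hinge_and_grad(Xp, +1.0, w, b)
    Ln, gwn, gbn = hinge_and_grad(Xn, -1.0, w, b)
    # descent on (w, b): gradient of [1 + lam(1-alpha)] L+ + lam*alpha*L-
    cp, cn = 1.0 + lam * (1.0 - alpha), lam * alpha
    w -= lr * (cp * gwp + cn * gwn)
    b -= lr * (cp * gbp + cn * gbn)
    # ascent on lam: gradient is the constraint value, then project to lam >= 0
    lam = max(0.0, lam + lr * (alpha * Ln + (1.0 - alpha) * (Lp - len(Xp))))
```

At a saddle point, $\lambda$ settles where the precision constraint is active, so the positive and negative hinge losses end up reweighted exactly as in the objective above.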

#### Here We Go

$$AUCPR = \max_f \int R@P_{\alpha} d\alpha$$

$$\min_{f,b_1,\dots, b_k} \max_{\lambda_1,\dots, \lambda_k \geq 0}\sum_{t=1}^k \Delta_t \left\{ [1 + \lambda_t(1 - \alpha_t)]\mathcal{L}^+(f, b_t) + \lambda_t \alpha_t \mathcal{L}^-(f, b_t) - \lambda_t(1 - \alpha_t)|Y^+| \right\}$$
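A sketch of how this k-anchor objective can be assembled: one bias $b_t$ and multiplier $\lambda_t$ per precision anchor $\alpha_t$, weighted by the interval width $\Delta_t$. The anchor values below are placeholders, not the paper's choices:

```python
import numpy as np

def hinge(scores, y):
    """Summed hinge loss for labels y in {+1, -1}."""
    return np.sum(np.maximum(0.0, 1.0 - y * scores))

def aucpr_lagrangian(scores_pos, scores_neg, biases, lams, alphas, deltas):
    """Assemble the k-anchor AUCPR surrogate: each anchor alpha_t gets its
    own bias b_t and multiplier lambda_t, weighted by width Delta_t."""
    total = 0.0
    for b, lam, a, d in zip(biases, lams, alphas, deltas):
        Lp = hinge(scores_pos - b, +1.0)   # L+(f, b_t)
        Ln = hinge(scores_neg - b, -1.0)   # L-(f, b_t)
        total += d * ((1.0 + lam * (1.0 - a)) * Lp + lam * a * Ln
                      - lam * (1.0 - a) * len(scores_pos))
    return total
```

Because the biases decouple across anchors while the scorer $f$ is shared, one backward pass through this sum trains a single model for the whole precision-recall curve.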

#### What’s More

The algorithm from the paper is implemented in both PyText and TensorFlow Research. The source code is thoroughly commented, so I will not walk through it here; you can follow the links and read it directly. Notably, the TensorFlow Research implementation of AUCROC does not follow the paper's approach. Instead, it treats AUCROC as the probability of correct pairwise ranking discussed earlier, and maximizes that probability within each batch, in the hope of reaching the global optimum.
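That per-batch ranking surrogate is in the spirit of Yan et al. (2003): replace the indicator in $l_{rank}$ with a smooth penalty on score differences. The exact form and hyperparameters in the TensorFlow code may differ; `gamma` and `p` below are illustrative:

```python
import numpy as np

def pairwise_auc_surrogate(pos, neg, gamma=0.3, p=2):
    """Smooth WMW-style approximation of 1 - AUC on a batch:
    penalize (positive, negative) pairs whose score gap is below gamma."""
    diff = pos[:, None] - neg[None, :]          # all pairwise score gaps
    return np.mean(np.maximum(0.0, gamma - diff) ** p)
```

Well-separated pairs contribute zero, so minimizing this surrogate on each batch directly pushes positives above negatives, i.e. maximizes the probability of correct ranking.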

#### Some Thoughts

References:

Eban, E. E., Schain, M., Mackey, A., Gordon, A., Saurous, R. A., & Elidan, G. (2016). Scalable learning of non-decomposable objectives. arXiv preprint arXiv:1608.04802.
Narasimhan, H., Kar, P., & Jain, P. (2015, June). Optimizing non-decomposable performance measures: A tale of two classes. In International Conference on Machine Learning (pp. 199-208).
Nedić, A., & Ozdaglar, A. (2009). Subgradient methods for saddle-point problems. Journal of Optimization Theory and Applications, 142(1), 205-228.
Sanyal, A., Kumar, P., Kar, P., Chawla, S., & Sebastiani, F. (2018). Optimizing non-decomposable measures with deep networks. Machine Learning, 107(8-10), 1597-1620.
Yan, L., Dodier, R. H., Mozer, M., & Wolniewicz, R. H. (2003). Optimizing classifier performance via an approximation to the Wilcoxon-Mann-Whitney statistic. In Proceedings of the 20th International Conference on Machine Learning (ICML-03) (pp. 848-855).