# Optimal Approximation - Smoothness Tradeoffs for Soft-Max Functions

NeurIPS 2020


Abstract

A soft-max function has two main efficiency measures: (1) approximation, which corresponds to how well it approximates the maximum function, and (2) smoothness, which shows how sensitive it is to changes in its input. Our goal is to identify the optimal approximation-smoothness tradeoffs for different measures of approximation and smoothn…


Introduction

- A soft-max function is a mechanism for choosing one out of a number of options, given the value of each option.
- The authors show that if the ℓp norm and the Rényi divergence are used to measure distances in the domain and the range, respectively, the exponential mechanism achieves the lowest possible Lipschitz constant among all δ-approximate soft-max functions.
- The authors show that for these distance measures, there is no soft-max function with a bounded Lipschitz constant that can guarantee worst-case approximation.
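
For reference, the exponential mechanism discussed throughout is the familiar temperature-scaled softmax. A minimal NumPy sketch (the function name and temperature parameter `t` are our notation, not the paper's):

```python
import numpy as np

def exponential_softmax(x, t=1.0):
    """Temperature-scaled exponential soft-max.

    Smaller t concentrates mass on the maximum entry (better
    approximation); larger t makes the output less sensitive to
    input changes (better smoothness)."""
    z = np.asarray(x, dtype=float) / t
    z -= z.max()                 # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

p = exponential_softmax([1.0, 2.0, 4.0], t=0.5)
assert abs(p.sum() - 1.0) < 1e-12   # output is a valid distribution
assert p.argmax() == 2              # most mass on the largest value
```

The single knob `t` is exactly the approximation-smoothness dial the paper studies.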

Highlights

- A soft-max function is a mechanism for choosing one out of a number of options, given the value of each option
- We focus on (ℓp, ℓq)-Lipschitz functions, which are the soft-max functions used in mechanism design and in machine learning settings.
- If the revenue objective function w has bounded sensitivity Sq(w) for some q < log(d), then using PLSoftMax gives a significantly better revenue vs. incentive-compatibility tradeoff than using the exponential mechanism.
- Based on the above definition, we prove that the error of the power mechanism, under the assumption of O(1)-multiplicative insensitivity, is asymptotically better than that of the exponential mechanism.
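
The approximation half of the tradeoff can be sanity-checked numerically: for the exponential soft-max with temperature t, the expected value of the sampled option is within an additive t · log d of the true maximum (a classical bound for the exponential mechanism, not a result specific to this paper). A small sketch under that bound, with illustrative values:

```python
import numpy as np

def exp_softmax(x, t):
    # temperature-scaled exponential soft-max (our notation)
    z = np.asarray(x, dtype=float) / t
    z -= z.max()
    w = np.exp(z)
    return w / w.sum()

d = 8
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, d)
for t in [1.0, 0.1, 0.01]:
    p = exp_softmax(x, t)
    gap = x.max() - p @ x                 # additive approximation error
    assert gap <= t * np.log(d) + 1e-9    # classical max - E[x] <= t log d bound
```

Shrinking t tightens the approximation, at the price of a larger Lipschitz constant; that tension is exactly the tradeoff the paper optimizes.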

Results

- The authors construct a soft-max function that achieves a Lipschitz constant of O(1/δ) and is δ-approximate in the worst case.
- The authors prove that even only requiring δ-approximation in expectation, no soft-max function can achieve a Lipschitz constant of o(1/δ) for these distance measures.
- The authors show that with the standard p-norm distance as the domain distance measure, no soft-max function with bounded Lipschitz constant and multiplicative approximation guarantee exists.
- Let c, δ > 0, and assume f : Rd → ∆d−1 is a soft-max function that is δ-approximate and (ℓ∞, ℓ1)-Lipschitz continuous with a Lipschitz constant of at most c.
- The (ℓp, ℓ1)-Lipschitz constant of a δ-approximate exponential soft-max function is at least (log d)/(2δ).
- The combination of the above result and Theorem 4.3 shows that, in terms of the (ℓp, ℓ1)-Lipschitz constant, there is a gap of Θ(log d) between the exponential function and PLSoftMax.
- Let f : Rd → ∆d−1 be a soft-max function that is δ-approximate and (ℓp, χ)-Lipschitz for a distance measure χ.
- Applying this proposition to PLSoftMax, the authors obtain a soft-max function called LogPLSoftMax that is δ-multiplicative-approximate in the worst case and (Log-ℓp, ℓq)-Lipschitz.
- A soft-max function can be used to design an incentive compatible mechanism as follows: assume f : Rd → ∆d−1 is (χ, ℓ1)-Lipschitz with respect to some domain distance measure χ.
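
The smoothness side can likewise be probed empirically. For the exponential soft-max with temperature t, the max-divergence between outputs is bounded by (2/t) · ‖x − y‖∞; this is the standard calculation behind the exponential mechanism, and the sketch below (all names are ours) checks it on perturbed random inputs:

```python
import numpy as np

def exp_softmax(x, t):
    z = np.asarray(x, dtype=float) / t
    z -= z.max()
    w = np.exp(z)
    return w / w.sum()

def d_inf(p, q):
    # max-divergence D_inf(p || q) = max_i log(p_i / q_i)
    return np.max(np.log(p) - np.log(q))

rng = np.random.default_rng(1)
t = 0.5
x = rng.uniform(0, 1, 6)
y = x + rng.uniform(-0.1, 0.1, 6)       # small perturbation of the input
lhs = d_inf(exp_softmax(x, t), exp_softmax(y, t))
assert lhs <= 2 * np.linalg.norm(x - y, np.inf) / t + 1e-9
```

Lowering t (better approximation) inflates the right-hand side, which is the smoothness cost the lower bounds above quantify.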

Conclusions

- If the soft-max function f has a low (χ, ℓ1)-Lipschitz constant, and the objective w has low sensitivity with respect to χ, the authors can use the following theorem to obtain an ε-incentive compatible mechanism.
- For some distance metric χ on Rd+, a soft-max function f that satisfies D∞(f(x) ‖ f(y)) ≤ L · χ(x, y) for all x, y ∈ Rd+ can be used to design differentially private algorithms when the objective function has low χ-sensitivity, according to the following lemma.
- In these experiments the authors randomly perturbed a submodular optimization instance and measured how the output distribution of a differentially private soft-max (the Power and the Exponential mechanism with a given parameter) is affected by the perturbation (x-axis).
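
As a concrete instance of the differentially private use described above, the standard exponential mechanism samples an option with probability proportional to exp(ε · score / (2Δ)). A hedged sketch with illustrative scores (the function name and parameters are ours):

```python
import numpy as np

def exponential_mechanism(scores, eps, sensitivity, rng):
    """Sample one option with probability proportional to
    exp(eps * score / (2 * sensitivity)); this is eps-differentially
    private when changing one data point shifts every score by at
    most `sensitivity`."""
    z = eps * np.asarray(scores, dtype=float) / (2 * sensitivity)
    z -= z.max()                  # numerical stability
    p = np.exp(z)
    p /= p.sum()
    return rng.choice(len(scores), p=p)

rng = np.random.default_rng(0)
picks = [exponential_mechanism([0.1, 0.9, 0.4], eps=4.0, sensitivity=1.0, rng=rng)
         for _ in range(2000)]
# the highest-scoring option should be chosen most often
assert max(set(picks), key=picks.count) == 1
```

Replacing the exponential weighting with a soft-max of lower (χ, D∞)-Lipschitz constant, as the paper proposes, improves the utility side of this construction.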

Related work

- A lot of work has been done on designing soft-max functions that fit specific applications better. In deep learning applications, the exponential mechanism does not allow one to take advantage of the sparsity of the categorical targets during training. Several methods have been proposed to exploit this sparsity. Hierarchical soft-max uses a heuristically defined hierarchical tree to define a soft-max function with only a few outputs [MB05, MSC+13]. Another direction is the use of a spherically symmetric soft-max function together with a spherical class of loss functions, which makes the back-propagation step much more efficient [VDBB15, dBV15]. Finally, there has been a line of work targeting the design of soft-max functions whose outputs favor sparse distributions [MA16, LCA+18].
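
One of the sparsity-favoring alternatives mentioned above, sparsemax [MA16], is short enough to sketch. This is our own NumPy rendering of its simplex-projection formula, not code from the paper:

```python
import numpy as np

def sparsemax(x):
    """Euclidean projection of x onto the probability simplex
    (the sparsemax of Martins & Astudillo); unlike the exponential
    soft-max, it can assign exactly zero probability to options."""
    x = np.asarray(x, dtype=float)
    z = np.sort(x)[::-1]                      # sort descending
    css = np.cumsum(z)
    k = np.arange(1, len(x) + 1)
    support = 1 + k * z > css                 # entries kept in the support
    k_max = k[support][-1]
    tau = (css[k_max - 1] - 1) / k_max        # threshold
    return np.maximum(x - tau, 0.0)

p = sparsemax([2.0, 1.0, -1.0])
assert abs(p.sum() - 1.0) < 1e-12
assert p[2] == 0.0        # low-value option gets exactly zero mass
```

The exactly-zero outputs are what make sparsemax attractive for sparse categorical targets, at the cost of a different smoothness profile than the exponential soft-max.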

Funding

- MZ was supported by a Google Ph.D. Fellowship.

References

- [BBHM05] Maria-Florina Balcan, Avrim Blum, Jason D. Hartline, and Yishay Mansour. Mechanism design via machine learning. In 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2005), pages 605–614. IEEE, 2005.
- [Bol68] Ludwig Boltzmann. Studien über das Gleichgewicht der lebendigen Kraft. Wissenschaftliche Abhandlungen, 1:49–96, 1868.
- [Bri90a] John S. Bridle. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In Françoise Fogelman Soulié and Jeanny Hérault, editors, Neurocomputing, pages 227–236. Springer, Berlin, Heidelberg, 1990.
- [Bri90b] John S. Bridle. Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimation of parameters. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems 2, pages 211–217, 1990.
- [CD17] Yang Cai and Constantinos Daskalakis. Learning multi-item auctions with (or without) samples. arXiv preprint arXiv:1709.00228, 2017.
- [CR14] Richard Cole and Tim Roughgarden. The sample complexity of revenue maximization. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, pages 243–252. ACM, 2014.
- [dBV15] Alexandre de Brébisson and Pascal Vincent. An exploration of softmax alternatives belonging to the spherical loss family. arXiv preprint arXiv:1511.05042, 2015.
- [DHP16] Nikhil R. Devanur, Zhiyi Huang, and Christos-Alexandros Psomas. The sample complexity of auctions with side information. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing, pages 426–439. ACM, 2016.
- [DP09] Konstantinos Drakakis and Barak A. Pearlmutter. On the calculation of the ℓ2 → ℓ1 induced matrix norm. International Journal of Algebra, 3(5):231–240, 2009.
- [DR14] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.
- [DRV10] Cynthia Dwork, Guy N. Rothblum, and Salil Vadhan. Boosting and differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 51–60. IEEE, 2010.
- [DRY15] Peerapong Dhangwatnotai, Tim Roughgarden, and Qiqi Yan. Revenue maximization with a single sample. Games and Economic Behavior, 91:318–333, 2015.
- [EMZ17] Alessandro Epasto, Vahab Mirrokni, and Morteza Zadimoghaddam. Bicriteria distributed submodular maximization in a few rounds. In SPAA. ACM, 2017.
- [GBC16] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
- [Gib02] J.W. Gibbs. Elementary Principles in Statistical Mechanics: Developed with Especial Reference to the Rational Foundations of Thermodynamics. C. Scribner’s sons, 1902.
- [Gli52] Irving L. Glicksberg. A further generalization of the Kakutani fixed point theorem, with application to Nash equilibrium points. Proceedings of the American Mathematical Society, 3(1):170–174, 1952.
- [HK12] Zhiyi Huang and Sampath Kannan. The exponential mechanism for social welfare: Private, truthful, and nearly optimal. In Proceedings of the 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, FOCS '12, pages 140–149. IEEE Computer Society, 2012.
- [HO10] Julien M. Hendrickx and Alex Olshevsky. Matrix p-norms are NP-hard to approximate if p ≠ 1, 2, ∞. SIAM Journal on Matrix Analysis and Applications, 31(5):2802–2812, 2010.
- [JJ94] Michael I. Jordan and Robert A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6(2):181–214, 1994.
- [LCA+18] Anirban Laha, Saneem Ahmed Chemmengath, Priyanka Agrawal, Mitesh Khapra, Karthik Sankaranarayanan, and Harish G Ramaswamy. On controllable sparse alternatives to softmax. In Advances in Neural Information Processing Systems, pages 6422–6432, 2018.
- [Luc59] R. Duncan Luce. Individual Choice Behavior: A Theoretical Analysis. John Wiley & Sons, 1959.
- [MA16] Andre Martins and Ramon Astudillo. From softmax to sparsemax: A sparse model of attention and multi-label classification. In International Conference on Machine Learning, pages 1614–1623, 2016.
- [MBKK17] Marko Mitrovic, Mark Bun, Andreas Krause, and Amin Karbasi. Differentially private submodular maximization: Data summarization in disguise. In ICML, 2017.
- [MR15] Jamie H. Morgenstern and Tim Roughgarden. On the pseudo-dimension of nearly optimal auctions. In Advances in Neural Information Processing Systems, pages 136–144, 2015.
- [MSC+13] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013.
- [MT07] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS '07), pages 94–103. IEEE, 2007.
- [Mye81] Roger B Myerson. Optimal auction design. Mathematics of operations research, 6(1):58– 73, 1981.
- [Roh00] Jiří Rohn. Computing the norm ‖A‖∞,1 is NP-hard. Linear and Multilinear Algebra, 47(3):195–204, 2000.
- [RTCY12] Tim Roughgarden, Inbal Talgam-Cohen, and Qiqi Yan. Supply-limiting mechanisms. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 844–861. ACM, 2012.
- [SB18] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA, 2018.
- [THB86] Edward Charles Titchmarsh and David Rodney Heath-Brown. The theory of the Riemann zeta-function. Oxford University Press, 1986.
- [VDBB15] Pascal Vincent, Alexandre De Brébisson, and Xavier Bouthillier. Efficient exact gradient update for training deep networks with very large sparse targets. In Advances in Neural Information Processing Systems, pages 1108–1116, 2015.
- 1. Buyers may strategize during the collection of samples: if buyers know that the seller will collect samples to estimate the optimal auction to run, they have an incentive to strategize so that the seller chooses lower prices and they obtain more utility.
- 2. A constant approximation is not always a satisfying guarantee: it is a worst-case bound, so constant-approximation mechanisms may fail to obtain near-optimal revenue even on instances where this is easy. A popular alternative in practical applications of mechanism design is to choose the best from a set of simple mechanisms.
- 1. Initialize S0 = ∅. Let |D| = d and D = {v1, ..., vd}.
  2. For i ∈ [k]:
     a. Define qi : D \ Si−1 → R as qi(v) = h(Si−1 ∪ {v}) − h(Si−1).
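
The k-step greedy procedure sketched above can be written out in full. The selection step, which the fragment truncates, is filled in here with the standard greedy rule of picking the element with the largest marginal gain (an assumption on our part; names are illustrative):

```python
def greedy_max(h, D, k):
    """Greedy selection: repeatedly add the element with the largest
    marginal gain q_i(v) = h(S ∪ {v}) − h(S)."""
    S = []
    for _ in range(k):
        v_best = max((v for v in D if v not in S),
                     key=lambda v: h(S + [v]) - h(S))
        S.append(v_best)
    return S

# toy modular objective: h(S) is the sum of the chosen values
vals = {'a': 3.0, 'b': 1.0, 'c': 2.0}
h = lambda S: sum(vals[v] for v in S)
assert greedy_max(h, list(vals), 2) == ['a', 'c']
```

Running a differentially private soft-max over the marginal gains qi instead of taking the exact argmax is what turns this greedy loop into the private submodular maximization used in the experiments.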
