Imperfect Best-Response Mechanisms

Abstract

Best-response mechanisms, introduced by Nisan et al. (2011), provide a unifying framework for studying various distributed protocols in which the participants are instructed to repeatedly best respond to each other's strategies. Two fundamental features of these mechanisms are convergence and incentive compatibility. This work investigates convergence and incentive compatibility conditions of such mechanisms when players are not guaranteed to always best respond, but rather play an imperfect best-response strategy: at every time step, every player deviates from the prescribed best-response strategy according to some probability parameter. The results explain to what extent convergence and incentive compatibility depend on the assumption that players never make mistakes, and how robust such protocols are to “noise” or “mistakes”.


Notes

  1. When the minimum difference in the payoff of an agent between a best response and a non-best response is γ, this extends easily by taking \(\beta_{\gamma} = \beta\gamma\) in place of β.

  2. The analysis of the Coupon Collector’s problem says that, if we pick one player uniformly at random and repeat this for \(R\geq n\ln n + cn\) time steps, then the probability that some player has never been selected is at most \(e^{-c}\). This probability is smaller than 𝜖 for \(c\geq \ln (1/\epsilon )\).

  3. Nisan et al. [19] assume that each player has also a tie-breaking rule \(\prec _{i}\), i.e., a total order on \(S_{i}\), that depends solely on the player’s private information. In the case that a tie-breaking rule \(\prec _{i}\) has been defined for player i, then \(s_{i}\) is an NBR for i also if \(u_{i}(s_{i}, \mathbf {s}_{-i}) = u_{i}(s_{i}^{\prime }, \mathbf {s}_{-i})\) and \(s_{i} \prec _{i} s_{i}^{\prime }\). However, such a tie-breaking rule can be implemented in a game by means of suitable perturbations of the utility function: with such an implementation our definition becomes equivalent to the one given in [19].

  4. A game is a potential game if there exists a function Φ such that, for all i, for all \(\mathbf{s}_{-i}\), and for all \(s_{i},s_{i}^{\prime }\), it holds that \( {\Phi }(s_{i},\mathbf {s}_{-i}) - {\Phi }(s_{i}^{\prime },\mathbf {s}_{-i}) = u_{i}(s_{i}^{\prime },\mathbf {s}_{-i}) - u_{i}(s_{i},\mathbf {s}_{-i}). \) Such a function Φ is called a potential function of the game.

  5. See Appendix B for a review of the main properties of the total variation distance.

  6. In the conference version of this work [9], we claimed that in a modified version of PageRank games [14], there exists a subgame which is a potential game and thus our results can be combined with [7] to obtain a good approximation of the logit dynamics for these games. Unfortunately, this claim was wrong and the logit dynamics for this subgame is in general not easy to analyze.

References

  1. Alós-Ferrer, C., Netzer, N.: The logit-response dynamics. Games Econ. Behav. 68(2), 413–427 (2010)

  2. Auletta, V., Ferraioli, D., Pasquale, F., Penna, P., Persiano, G.: Convergence to equilibrium of logit dynamics for strategic games. In: Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pp. 197–206. ACM (2011)

  3. Auletta, V., Ferraioli, D., Pasquale, F., Penna, P., Persiano, G.: Logit dynamics with concurrent updates for local interaction games. In: Proceedings of the 21st Annual European Symposium on Algorithms (ESA), vol. 8125 of LNCS, pp. 73–84 (2013)

  4. Auletta, V., Ferraioli, D., Pasquale, F., Persiano, G.: Mixing time and stationary expected social welfare of logit dynamics. Theory Comput. Syst. 53(1), 3–40 (2013)

  5. Auletta, V., Moscardelli, L., Penna, P., Persiano, G.: Interference games in wireless networks. In: Proceedings of the 4th International Workshop On Internet And Network Economics (WINE), vol. 5385 of LNCS, pp 278–285 (2008)

  6. Aumann, R.J.: Subjectivity and correlation in randomized strategies. J. Math. Econ. 1(1), 67–96 (1974)

  7. Blume, L.E.: The statistical mechanics of strategic interaction. Games Econ. Behav. 5, 387–424 (1993)

  8. Candogan, O., Ozdaglar, A., Parrilo, P.A.: Near-potential games: Geometry and dynamics. ACM Trans. Econ. Comput. 1(2), 11:1–11:32 (2013)

  9. Ferraioli, D., Penna, P.: Imperfect best-response mechanisms. In: Proceedings of the 6th International Symposium on Algorithmic Game Theory (SAGT), pp. 243–254 (2013)

  10. Gao, L., Rexford, J.: Stable internet routing without global coordination. IEEE/ACM Trans. Netw. 9(6), 681–692 (2001)

  11. Godfrey, B., Schapira, M., Zohar, A., Shenker, S.: Incentive compatibility and dynamics of congestion control. In: Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), pp. 95–106 (2010)

  12. Goemans, M., Mirrokni, V., Vetta, A.: Sink equilibria and convergence. In: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 142–154. IEEE (2005)

  13. Harsanyi, J.C., Selten, R.: A General Theory of Equilibrium Selection in Games. MIT Press, Cambridge (1988)

  14. Hopcroft, J., Sheldon, D.: Network reputation games. Computing and Information Science Technical Reports, Cornell University (2008). http://hdl.handle.net/1813/11579

  15. Kandori, M., Mailath, G.J., Rob, R.: Learning, mutation, and long run equilibria in games. Econometrica 61(1), 29–56 (1993)

  16. Kandori, M., Rob, R.: Evolution of equilibria in the long run: a general theory and applications. J. Econ. Theory 65(2), 383–414 (1995)

  17. Levin, H., Schapira, M., Zohar, A.: Interdomain routing and games. SIAM J. Comput. 40(6), 1892–1912 (2011)

  18. Nisan, N., Schapira, M., Valiant, G., Zohar, A.: Best-response auctions. In: Proceedings of the 12th ACM Conference on Electronic Commerce (EC), pp. 351–360 (2011)

  19. Nisan, N., Schapira, M., Valiant, G., Zohar, A.: Best-response mechanisms. In: Proceedings of the 2nd Symposium on Innovations in Computer Science (ICS), pp. 155–165 (2011)

  20. Young, H.P.: The evolution of conventions. Econometrica 61(1), 57–84 (1993)

Acknowledgements

Part of this work was done while both authors were at the Università di Salerno, and later while the first author was at Université Paris Dauphine.

Corresponding author

Correspondence to Diodato Ferraioli.

Appendices

Appendix A: Models for limited knowledge and bounded rationality

A.1 Mutation and mistakes models

The mutation and the mistakes models adopt the same response rule: at every time step, each selected player updates her strategy to the best response to what the other players are currently doing, except with probability ε. With such a probability, a mutation or mistake occurs, meaning that the selected player chooses a strategy uniformly at random. That is, suppose player i is selected at time step t and the current strategy profile is \(\mathbf{s}^{t}\). We denote by \(b_{i}(\mathbf {s}^{t})\) the best response of i to profile \(\mathbf{s}^{t}\) (if more than one best response exists and the current strategy \({x^{t}_{i}}\) of i is a best response, then we set \(b_{i}(\mathbf {s}^{t}) = {x^{t}_{i}}\); otherwise we choose one of the best responses uniformly at random). Then, a strategy \(s_{j} \in S_{i}\) will be selected by i with probability

$$p_{ij} = \left\{\begin{array}{ll} (1-\varepsilon) + \varepsilon \cdot \frac{1}{|S_{i}|}, & \text{if } s_{j} = b_{i}(\mathbf{s}^{t});\\ \varepsilon \cdot \frac{1}{|S_{i}|}, & \text{otherwise.} \end{array}\right. $$

The main difference between these models concerns the schedule: the mutation model assumes that at each time step every player is selected for update; the mistakes model assumes that at each time step only one player is selected uniformly at random for update.
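For concreteness, the following is a minimal Python sketch of a single update under this response rule; the game interface (strategy sets as lists and a utility function) is a hypothetical representation chosen only for illustration.

    import random

    def imperfect_best_response(i, profile, strategies, utility, eps, rng=random):
        # One update of player i under the mutation/mistakes response rule.
        # Hypothetical interface: strategies[i] is the strategy set S_i,
        # utility(i, s) returns u_i(s), profile is a tuple of strategies,
        # eps is the mutation/mistake probability.
        S_i = strategies[i]
        values = {s: utility(i, profile[:i] + (s,) + profile[i+1:]) for s in S_i}
        best = [s for s in S_i if values[s] == max(values.values())]
        # Tie breaking: keep the current strategy if it is already a best response.
        b_i = profile[i] if profile[i] in best else rng.choice(best)
        # With probability eps a mistake occurs and a strategy is chosen uniformly at
        # random, so b_i is played with probability (1 - eps) + eps/|S_i| and any other
        # strategy with probability eps/|S_i|, matching the rule above.
        chosen = rng.choice(S_i) if rng.random() < eps else b_i
        return profile[:i] + (chosen,) + profile[i+1:]

    # Mutation model: apply the update to every player in the same time step.
    # Mistakes model: apply it to a single player chosen uniformly at random.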

A.2 Logit dynamics

Logit dynamics is another kind of imperfect best-response dynamics in which the probability of deviating from a best response is determined by a parameter β≥0. The logit dynamics for a game G with parameter β runs as follows. At every time step:

  1. Select one player i uniformly at random;

  2. Update the strategy of player i according to the following probability distribution:

     $$\sigma_{i}(s_{i},\mathbf{s}_{-i}) = \frac{e^{\beta u_{i}(s_{i},\mathbf{s}_{-i})}}{Z(\mathbf{s}_{-i})}, $$

     where \(Z(\mathbf {s}_{-i})={\sum }_{s_{i}^{\prime }}e^{\beta u_{i}(s_{i}^{\prime },\mathbf {s}_{-i})}\) is the normalizing factor and β≥0.

Remark 5

The parameter β is sometimes called the “inverse noise” parameter: for β=0 the player chooses a strategy uniformly at random, while for \(\beta \rightarrow \infty \) the player chooses a best response (if more than one best response is available, she selects one of them uniformly at random). Thus, the probability that i plays a best response is guaranteed to be at least \(1/|S_{i}|\). So, every logit dynamics is p-imperfect for some p<1. Moreover, if the payoffs of a best response and of a non-best response differ by at least γ, then the dynamics is p-imperfect with \(p\leq (m-1)/(m-1+e^{\gamma \beta })\).
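As an illustration, here is a minimal Python sketch of a single logit update; the game interface is the same hypothetical one used in the previous sketch, and the shift by the maximum payoff is only for numerical stability and leaves σ_i unchanged.

    import math
    import random

    def logit_update(i, profile, strategies, utility, beta, rng=random):
        # Resample player i's strategy from sigma_i(., s_-i), proportional to exp(beta * u_i).
        # Hypothetical interface: strategies[i] = S_i, utility(i, s) = u_i(s).
        S_i = strategies[i]
        payoffs = [utility(i, profile[:i] + (s,) + profile[i+1:]) for s in S_i]
        m = max(payoffs)
        weights = [math.exp(beta * (u - m)) for u in payoffs]  # numerically stable softmax
        chosen = rng.choices(S_i, weights=weights)[0]
        return profile[:i] + (chosen,) + profile[i+1:]

    # beta = 0 yields the uniform distribution over S_i; as beta grows the mass
    # concentrates on the best responses, as described in Remark 5.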

The above dynamics defines an ergodic Markov chain [7] over the set of all strategy profiles, with transition probabilities P given by

$$P(\mathbf{s}, \mathbf{s}^{\prime}) = \frac{1}{n} \cdot \left\{\begin{array}{ll} \sigma_{i}(s^{\prime}_{i} ,\mathbf{s}_{-i}), & \quad \text{ if } \mathbf{s}_{-i} = \mathbf{s}^{\prime}_{-i} \text{ and } s_{i} \neq s^{\prime}_{i}; \\ {\sum}_{i=1}^{n} \sigma_{i}(s_{i} , \mathbf{s}_{-i}), & \quad \text{ if } \mathbf{s} = \mathbf{s}^{\prime}; \\ 0, & \quad \text{ otherwise.} \end{array}\right. $$

Since the Markov chain is ergodic, the chain converges to its (unique) stationary distribution π. That is, for any starting profile s and any profile \(\mathbf {s}^{\prime }\),

$$\lim\limits_{t \rightarrow \infty} P^{t}(\mathbf{s},\mathbf{s}^{\prime}) = \pi(\mathbf{s}^{\prime}), $$

where \(P^{t}(\mathbf {s},\mathbf {s}^{\prime })\) is the probability that the dynamics starting with profile s is in profile \(\mathbf {s}^{\prime }\) after t steps. Thus, the stationary distribution represents the equilibrium reached by the logit dynamics, also called the logit equilibrium of game G [4].

The stationary distribution of the logit dynamics is fully understood for the class of so-called potential games. We recall that a game is a potential game if there exists a function Φ such that, for all i, for all \(\mathbf{s}_{-i}\), and for all \(s_{i},s_{i}^{\prime }\), it holds that

$${\Phi}(s_{i},\mathbf{s}_{-i}) - {\Phi}(s_{i}^{\prime},\mathbf{s}_{-i}) = u_{i}(s_{i}^{\prime},\mathbf{s}_{-i}) - u_{i}(s_{i},\mathbf{s}_{-i}). $$

Blume [7] showed that in this case, the stationary distribution of the corresponding logit dynamics is

$$\pi(\mathbf{s}) = \frac{e^{-\beta {\Phi}(\mathbf{s})}}{Z} , $$

where \(Z = {\sum }_{\mathbf {s}^{\prime } \in S} e^{-\beta {\Phi }(\mathbf {s}^{\prime })}\) is the normalizing constant.
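As a sanity check of this characterization, the following sketch builds the full transition matrix of the logit dynamics for a hypothetical two-player coordination game (a potential game with Φ(s)=−1 if the players match and 0 otherwise, under the sign convention above) and verifies numerically that the distribution \(e^{-\beta {\Phi }}/Z\) is stationary; the game and the value of β are illustrative only.

    import itertools
    import math

    n, S, beta = 2, [0, 1], 2.0              # illustrative game size and parameter
    profiles = list(itertools.product(S, repeat=n))

    def u(i, s):                             # coordination game: payoff 1 when players match
        return 1.0 if s[0] == s[1] else 0.0

    def phi(s):                              # potential in the document's sign convention
        return -1.0 if s[0] == s[1] else 0.0

    def sigma(i, x, s):                      # logit choice probability of strategy x at profile s
        num = math.exp(beta * u(i, s[:i] + (x,) + s[i+1:]))
        den = sum(math.exp(beta * u(i, s[:i] + (y,) + s[i+1:])) for y in S)
        return num / den

    # Transition matrix: one player chosen uniformly at random updates via sigma.
    P = {(s, t): 0.0 for s in profiles for t in profiles}
    for s in profiles:
        for i in range(n):
            for x in S:
                t = s[:i] + (x,) + s[i+1:]
                P[(s, t)] += sigma(i, x, s) / n

    # Gibbs distribution pi(s) = exp(-beta * Phi(s)) / Z.
    Z = sum(math.exp(-beta * phi(s)) for s in profiles)
    pi = {s: math.exp(-beta * phi(s)) / Z for s in profiles}

    # Stationarity check: (pi P)(t) == pi(t) for every profile t.
    for t in profiles:
        assert abs(sum(pi[s] * P[(s, t)] for s in profiles) - pi[t]) < 1e-12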

Finally, the time to converge to the logit equilibrium is the so-called mixing time of the corresponding Markov chain [4], defined as follows. For any 𝜖>0, consider

$$t_{mix}(\epsilon) := \min\left\{t \in \mathbb{N} : \max_{\mathbf{s}\in {\Omega}} \left\|P^{t}(\mathbf{s},\cdot) - \pi\right\| \leq \epsilon\right\}, $$

where \(\|{P^{t}(\mathbf {s},\cdot ) - \pi }\| = \frac {1}{2}{\sum }_{\mathbf {s}^{\prime \prime } \in {\Omega }} |P^{t}(\mathbf {s},\mathbf {s}^{\prime \prime }) - \pi (\mathbf {s}^{\prime \prime })|\) is the total variation distance (see Appendix B for more details). This quantity measures the time needed for the chain (dynamics) to get within total variation distance 𝜖 of the stationary distribution. The mixing time of the chain is simply defined as \(t_{mix} := t_{mix}(1/4)\), since it is well known that \(t_{mix}(\epsilon ) \leq t_{mix}\cdot \lceil \log _{2} (1/\epsilon )\rceil \) for any 𝜖>0.
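On small chains the mixing time can be computed by brute force directly from this definition. The sketch below does so for a hypothetical two-state chain and checks the bound \(t_{mix}(\epsilon ) \leq t_{mix}\cdot \lceil \log _{2} (1/\epsilon )\rceil \); the chain and the value of ε are arbitrary examples.

    import math

    def tv(p, q):
        # Total variation distance between two distributions given as lists.
        return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

    def row_of_power(P, s, t):
        # Row s of P^t, by repeated vector-matrix multiplication (fine for tiny chains).
        row = [1.0 if j == s else 0.0 for j in range(len(P))]
        for _ in range(t):
            row = [sum(row[k] * P[k][j] for k in range(len(P))) for j in range(len(P))]
        return row

    def t_mix(P, pi, eps):
        # Smallest t with max_s ||P^t(s, .) - pi|| <= eps.
        t = 0
        while max(tv(row_of_power(P, s, t), pi) for s in range(len(P))) > eps:
            t += 1
        return t

    # A hypothetical two-state chain with stationary distribution (2/3, 1/3).
    P = [[0.9, 0.1],
         [0.2, 0.8]]
    pi = [2 / 3, 1 / 3]

    tm = t_mix(P, pi, 1 / 4)                 # t_mix := t_mix(1/4)
    eps = 0.01
    assert t_mix(P, pi, eps) <= tm * math.ceil(math.log2(1 / eps))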

Appendix B: Total variation distance

The total variation distance between distributions μ and \(\hat \mu \) on an enumerable state space Ω is

$$\|{\mu - \hat \mu}\|:= \frac{1}{2}\sum\limits_{x \in {\Omega}} \left| \mu(x) - \hat \mu(x)\right| = \sum\limits_{\begin{subarray}{c}x \in {\Omega}\\\mu(x) > \hat \mu(x)\end{subarray}} \left(\mu(x) - \hat \mu(x)\right). $$
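A small numeric illustration of the two equivalent expressions (half the ℓ1 distance, and the one-sided sum over the states where μ exceeds \(\hat \mu \)); the two distributions are arbitrary examples.

    mu     = {"a": 0.5, "b": 0.3, "c": 0.2}   # illustrative distributions on a 3-element space
    mu_hat = {"a": 0.2, "b": 0.4, "c": 0.4}

    half_l1  = 0.5 * sum(abs(mu[x] - mu_hat[x]) for x in mu)
    one_side = sum(mu[x] - mu_hat[x] for x in mu if mu[x] > mu_hat[x])

    # Both expressions equal the total variation distance (0.3 for this example).
    assert abs(half_l1 - one_side) < 1e-12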

Note that the total variation distance satisfies the usual triangle inequality of distance measures, i.e.,

$$\|\mu - \hat \mu\| \leq \|{\mu - \mu^{\prime}}\| + \|{\mu^{\prime} - \hat \mu}\|, $$

for any distribution \(\mu ^{\prime }\). Moreover, the following monotonicity properties hold:

$$\|\mu P - \hat{\mu} P\| \leq \|\mu - \hat{\mu}\|, \qquad (8)$$

$$\|\mu P - \mu \hat{P}\| \leq \sup\limits_{x \in {\Omega}} \|{P(x,\cdot) - \hat{P}(x,\cdot)}\|, \qquad (9)$$

$$\|\mu P - \hat{\mu} P\| \leq \sup\limits_{x,y \in {\Omega}} \|{P(x,\cdot) - P(y,\cdot)}\|, \qquad (10)$$

where P and \(\hat P\) are stochastic matrices. Indeed, as for (8) we have

$$\begin{array}{@{}rcl@{}} \|{\mu P - \hat \mu P}\| = \|(\mu - \hat \mu) P\| & =& \frac{1}{2} \sum\limits_{y \in {\Omega}} \left| \sum\limits_{x \in {\Omega}} (\mu(x) - \hat \mu(x)) P(x,y)\right|\\ & \leq& \frac{1}{2} \sum\limits_{x \in {\Omega}} \left| \mu(x) - \hat \mu(x)\right| \sum\limits_{y \in {\Omega}} P(x,y)\\ & =& \|{\mu - \hat \mu}\|. \end{array} $$

As for (9) we observe that

$$\begin{array}{@{}rcl@{}} \|{\mu P - \mu \hat P}\| = \|{\mu (P - \hat P)}\| & =& \frac{1}{2} \sum\limits_{y \in {\Omega}} \left| \sum\limits_{x \in {\Omega}} \mu(x) (P(x,y) - \hat P(x,y))\right|\\ & \leq& \sum\limits_{y \in {\Omega}} \left( \frac{1}{2} \sum\limits_{x \in {\Omega}} \mu(x)\left| P(x,y) - \hat P(x,y)\right| \right)\\ \\ & =& \sum\limits_{x \in {\Omega}} \mu(x)\left( \frac{1}{2} \sum\limits_{y \in {\Omega}} \left| P(x,y) - \hat P(x,y)\right| \right)\\ & \leq& \sup\limits_{x \in {\Omega}} \|{P(x,\cdot) - \hat P(x,\cdot)}\|. \end{array} $$

Finally, for (10) we have

$$\begin{array}{@{}rcl@{}} \|{\mu P - \hat \mu P}\| & =& \left\|{\sum\limits_{z \in {\Omega}} \mu(z) \sum\limits_{w \in {\Omega}} \hat \mu(w) \left( P(z,\cdot) - P(w, \cdot) \right)}\right\|\\ & \leq& \sum\limits_{z \in {\Omega}} \mu(z) \sum\limits_{w \in {\Omega}} \hat \mu(w) \|{P(z,\cdot) - P(w, \cdot)}\|\\ & \leq& \sup\limits_{x,y \in {\Omega}} \|{P(x,\cdot) - P(y,\cdot)}\|. \end{array} $$
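The three inequalities are easy to check numerically; the sketch below does so on randomly generated distributions and stochastic matrices over a small state space (the size and the seed are arbitrary choices).

    import random

    random.seed(0)
    n = 4                                    # illustrative state-space size

    def tv(p, q):
        return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

    def rand_dist(n):
        w = [random.random() for _ in range(n)]
        return [x / sum(w) for x in w]

    def rand_stochastic(n):
        return [rand_dist(n) for _ in range(n)]

    def push(mu, P):
        # The row vector mu P.
        return [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]

    mu, mu_hat = rand_dist(n), rand_dist(n)
    P, P_hat = rand_stochastic(n), rand_stochastic(n)

    # (8):  ||mu P - mu_hat P|| <= ||mu - mu_hat||
    assert tv(push(mu, P), push(mu_hat, P)) <= tv(mu, mu_hat) + 1e-12
    # (9):  ||mu P - mu P_hat|| <= sup_x ||P(x,.) - P_hat(x,.)||
    assert tv(push(mu, P), push(mu, P_hat)) <= max(tv(P[x], P_hat[x]) for x in range(n)) + 1e-12
    # (10): ||mu P - mu_hat P|| <= sup_{x,y} ||P(x,.) - P(y,.)||
    assert tv(push(mu, P), push(mu_hat, P)) <= max(
        tv(P[x], P[y]) for x in range(n) for y in range(n)) + 1e-12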

Appendix C: Equilibria of logit dynamics and NBR-reducible games

In this section we specialize the approach described in Section 5.2 to the case of logit dynamics. First of all, the status “h” is immaterial in this case since the dynamics is by definition Markovian (i.e., the transition probabilities depend only on the current profile).

We next provide a sufficient condition under which the (equilibrium of the) logit dynamics of the subgame is a good approximation of the (equilibrium of the) logit dynamics of the original game. Let G be a game NBR-reducible to \(\hat G\), and let \(\pi_{\beta}\) and \(\hat \pi _{\beta }\) denote the stationary distributions of the logit dynamics with parameter β for G and \(\hat G\), respectively. The following lemma says that \(\pi_{\beta}\) and \(\hat \pi _{\beta }\) are close to each other (in total variation) if β is large enough.

Lemma 5

Let \(\hat \tau _{\beta }\) be the mixing time of the restricted chain given by the logit dynamics with parameter β, and let \(p=p_{\beta}\) be the corresponding probability of selecting an NBR strategy. If \(\lim _{\beta \rightarrow \infty } (p_{\beta }\hat \tau _{\beta }) = 0\), then for every δ>0 there exists a constant \(\beta_{\delta}\) such that

$$\|{\pi_{\beta} - \hat \pi_{\beta}}\| \leq \delta $$

for all \(\beta \geq \beta_{\delta}\).

Proof

Let \(\tau =\hat {t}_{mix}^{(\beta )}(\delta /8)\) be the mixing time of the restricted chain. Consider first two copies of the chain starting in profiles \(\hat {\mathbf {s}}, \hat {\mathbf {s}^{\prime }} \in \hat {G}\) and bound the total variation after τ time steps:

$$\begin{array}{@{}rcl@{}} \|{P}^{\tau}(\hat{\mathbf{s}},\cdot )-{P}^{\tau}(\hat{\mathbf{s}}^{\prime},\cdot)\| &\leq& \|{P}^{\tau}(\hat{\mathbf{s}},\cdot)-\hat{P}^{\tau}(\hat{\mathbf{s}},\cdot)\| + \|\hat{P}^{\tau}(\hat{\mathbf{s}},\cdot)- \hat{\pi}\|\\ &&+\|\hat{\pi}- \hat{P}^{\tau}(\hat{\mathbf{s}}^{\prime},\cdot)\| + \|\hat{P}^{\tau}(\hat{\mathbf{s}}^{\prime},\cdot)- P^{\tau}(\hat{\mathbf{s}}^{\prime},\cdot)\|\\ &\leq& 4 \cdot \frac{\delta}{8} = \delta/2, \end{array} $$

where the last inequality is due to Lemma 4 by taking β sufficiently large (note that \(\eta p \tau = \eta p_{\beta } t^{(\beta )}_{mix}(\delta /8)\leq \eta p_{\beta } \hat \tau _{\beta } \ln (8/\delta )\), which tends to 0 as \(\beta \rightarrow \infty \) by hypothesis). Consider a time interval of length \(T = R \cdot \left \lceil \frac {\ln (8\ell /\delta )}{\ln (1/\varepsilon )}\right \rceil \). By applying Lemma 2 with \(k=\ell\), \((R,\varepsilon )=(T,\delta /4)\), and β sufficiently large, we have that for every \(\mathbf{s} \in G\)

$$\underset{\mathbf{s}}{\text{Pr}}({X_{\ell T} \notin \hat{G}}) \leq \delta/8. $$

Let \(t^{\star} = \ell T+\tau\) and \(Q=P^{\ell T}\). Then, for every \(\mathbf {s},\mathbf {s}^{\prime } \in G\)

$$\begin{array}{@{}rcl@{}} \|{\pi - P^{t^{\star}}(\mathbf{s}^{\prime}, \cdot)}\| & \leq& \|{P^{t^{\star}}(\mathbf{s}, \cdot) - P^{t^{\star}}(\mathbf{s}^{\prime}, \cdot)}\| = \|{Q(\mathbf{s}, \cdot)P^{\tau} - Q(\mathbf{s}^{\prime}, \cdot)P^{\tau}}\|\\ \text{(triangle inequality)} & \leq& \|{Q(\mathbf{s}, \cdot)P^{\tau} - \hat Q(\mathbf{s}, \cdot)P^{\tau}}\| + \|{\hat Q(\mathbf{s}, \cdot)P^{\tau} - \hat Q(\mathbf{s}^{\prime}, \cdot)P^{\tau}}\|\\ &&+ \|{\hat Q(\mathbf{s}^{\prime}, \cdot)P^{\tau} - Q(\mathbf{s}^{\prime}, \cdot)P^{\tau}}\|, \end{array} $$

where, for every \(\mathbf {s}, \mathbf {s}^{\prime } \in G\), we set

$$\hat Q(\mathbf{s},\mathbf{s}^{\prime}) = \left\{\begin{array}{ll} \frac{Q(\mathbf{s},\mathbf{s}^{\prime})}{Q(\mathbf{s},\hat G)}, & \text{if } \mathbf{s}, \mathbf{s}^{\prime} \in \hat G;\\ 0, & \text{otherwise.} \end{array}\right. $$

By (8) we obtain

$$\|{Q(\mathbf{s}, \cdot)P^{\tau} - \hat{Q}(\mathbf{s}, \cdot)P^{\tau}}\| \leq \|{Q(\mathbf{s}, \cdot) - \hat{Q}(\mathbf{s}, \cdot)}\| \leq \underset{\mathbf{s}}{\text{Pr}}\left({X_{\ell T} \notin \hat{G}}\right) \leq \delta/8. $$

By (10) we obtain

$$\|{\hat{Q}(\mathbf{s}, \cdot)P^{\tau} - \hat{Q}(\mathbf{s}^{\prime}, \cdot)P^{\tau}}\| \leq \max_{\hat{\mathbf{s}}, \hat{\mathbf{s}}^{\prime} \in \hat{G}} \|{P^{\tau}(\hat{\mathbf{s}}, \cdot) - P^{\tau}(\hat{\mathbf{s}}^{\prime}, \cdot)}\| \leq \delta/2 $$

and thus \(\|{\pi - P^{t^{\star }}(\mathbf {s}^{\prime }, \cdot )}\| \leq 3\delta /4\). Finally, for every \(\hat {\mathbf {s}} \in \hat {G}\), by triangle inequality

$$\begin{array}{@{}rcl@{}} \|{\pi - \hat{\pi}}\| & \leq& \|{\pi - P^{t^{\star}}(\hat{\mathbf{s}}, \cdot)}\| + \left\|{P^{t^{\star}}(\hat{\mathbf{s}}, \cdot) - \hat{P}^{t^{\star}}(\hat{\mathbf{s}}, \cdot)}\right\| + \left\|{\hat{P}^{t^{\star}}(\hat{\mathbf{s}}, \cdot) - \hat{\pi}}\right\|\\ & \leq &3 \delta/4 + \delta/8 + \delta/8 = \delta. \end{array} $$

Cite this article

Ferraioli, D., Penna, P. Imperfect Best-Response Mechanisms. Theory Comput Syst 57, 681–710 (2015). https://doi.org/10.1007/s00224-014-9597-x
