
Efficient bidding strategies for Cliff-Edge problems


Abstract

In this paper, we propose an efficient agent for competing in Cliff-Edge (CE) and simultaneous Cliff-Edge (SCE) situations. In CE interactions, which include common interactions such as sealed-bid auctions, dynamic pricing and the ultimatum game (UG), the probability of success decreases monotonically as the reward for success increases. This trade-off also exists in SCE interactions, which include simultaneous auctions and various multi-player ultimatum games, where the agent has to decide about more than one offer or bid simultaneously. Our agent competes repeatedly in one-shot interactions, each time against different human opponents. The agent learns the general pattern of the population’s behavior, and its performance is evaluated based on all of the interactions in which it participates. We propose a generic approach which may help the agent compete against unknown opponents in different environments where CE and SCE interactions exist, where the agent has a relatively large number of alternatives and where its achievements in the first several dozen interactions are important. The underlying mechanism we propose for CE interactions is a new meta-algorithm, deviated virtual learning (DVL), which extends existing methods to efficiently cope with environments comprising a large number of alternative decisions at each decision point. Another competitive approach is the Bayesian approach, which learns the opponents’ statistical distribution, given prior knowledge about the type of distribution. For the SCE, we propose the simultaneous deviated virtual reinforcement learning algorithm (SDVRL), the segmentation meta-algorithm as a method for extending different basic algorithms, and a heuristic called fixed success probabilities (FSP). Experiments comparing the performance of the proposed algorithms with algorithms taken from the literature, as well as with other intuitive meta-algorithms, reveal the superiority of the proposed algorithms in average payoff and stability, as well as in accuracy in converging to the optimal action, both in CE and SCE problems.


Notes

  1. Note that each interaction is a one-shot game and is performed against a different human opponent; consequently no opponent has learning opportunities.

  2. To avoid differences in the bidders’ valuations of the item and incomplete information about it, the item auctioned in this case is an amount of money [19].

  3. For reasons of compatibility with other environment models, equal bids yield a gain for the learning agent.

  4. In the work of [12], the exact threshold of the worker is not observed; the worker simply decides whether or not to work, given the offer (salary) obtained from its principal. Consequently, the principal’s beliefs are updated given the success or failure of the offer. Similarly, in our case, the user’s threshold is not observed and the result of an offer is either a success (acceptance of the offer) or a failure. Hence, the beliefs about the society’s distribution are updated, in line with the Bayesian approach, given this Boolean result.

  5. A similar approach, namely the derivative-following strategy, was proposed for the dynamic pricing problem [16]. However, each round (one commerce day) was considered to be a set of several interactions with a number of opponents, which totaled thousands of interactions.

  6. The same rationale is behind the auction environment. However, in the pricing environment the opposite is true: all of the offers lower than a successful offer are considered successful, since the lower the price of an item, the higher the probability it will be purchased by the consumer.

  7. It is worth noting that all of the VL-based algorithms in this paper are presented in a version suitable for auctions and for the UG. In the pricing environment, where increasing the price decreases the acceptance probability, the update ranges in lines 8 and 9 (as well as 12 and 13) need to be swapped.

  8. The difference between the Bayesian approach and Gittins’ strategy is that Gittins’ model is based on some particular properties which are not required in the general Bayesian approach. Some of these properties do not hold, in fact, in the CE domain, so it is used as a heuristic method.

  9. In our finite simulations we used the Bernoulli reward process table with a discount factor of 0.99 (see  [24], p. 237), as in [5, 35].

  10. For simplicity, we considered one opponent in the auction game.

  11. This was also the reason for running auctions on a total amount of money rather than on a particular item, i.e., to be able to ignore the influence of different valuations of the item itself.

  12. Another new approach, the segmentation principle, which is described in Sect. 7.3, attained results which improved the basic algorithms but were lower than the results of the deviation principle in all of the non-simultaneous CE domains [29]. However, the segmentation principle reached competitive results for some of the simultaneous CE domains. Thus, it is presented as a solution in the simultaneous CE domains.

  13. Since the normal assumption did not hold in our tests, we used a non-parametric Friedman test ([20], pp. 410–412), which is appropriate for cases where the assumptions underlying the analysis of variance are not satisfied.

  14. The distribution of thresholds in the pricing environment is expected to have a higher mean than that of the bids in the bidding environment. This is due to the fact that in the pricing environment the buyer has no incentive to state less than its true valuation, while in a first-price auction, as is known from the literature, the bidder is motivated to bid a lower value than its real valuation.

  15. The Friedman test is a non-parametric test for \(K \ge 3\) correlated samples.

  16. In environments where all players have the same threshold, the Bayesian approach attained the best solution. However, such an assumption is not valid in real world CE interactions.

  17. The data of the other environments were quite similar as depicted in Fig. 5.

  18. In another auction protocol which can be abstracted by this model, an item with a common value of N is auctioned in one auction, where each bidder can offer K bids.

  19. We handle a case where all of the perfect complementary items are equal, which is consistent with the other SCE environments. We can justify the reality of such a scenario using the case of someone who tries to simultaneously order several vacation packages for all of his family members. Of course, the whole family wants to go together on the vacation.

  20. In the OMUG and AMUG environments, line 6 is changed to: For \(l_{2}=l_{1}\) to \(N-l_{1}\), since the sum of \(l_{1}\) and \(l_{2}\) cannot exceed \(N\).

  21. In the retractable environments, it can be easily proven that the upper offer \(i_{K}\) always needs to maximize \(P(i_K)(N-i_K)\), independent of other offers (for \(K=2\), e.g., if \(i_{1}\le i_{2},\, Q(i_{1},i_{2}) = P(i_{1})(N-i_{1})+ (1-P(i_{1}))P(i_{2})(N-i_{2})\) ). Thus, for these environments the complexity actually equals \(O(I \cdot N^{k-1})\).

  22. The segment-based approach enables the agent to focus on groups of offers rather than on the offers themselves. This may be the reason why this approach was outstanding in the simultaneous problems, where the set of alternatives of the agent is extremely large (\(N^K\) where \(K\) is the number of simultaneous interactions).

  23. Extending the duration of the first stages, where a few large segments are observed, would have reduced this problem. However, since we considered environments with a total of 50 interactions at most, doing so would not have raised the average total payoff, since it would often harm the later interactions, where the fine tuning is performed. The final fine-tuning stage is critically important, especially in the context of CE environments.

  24. The value of \(\epsilon \) is described in Algorithm 12, Appendix A.5.

  25. In our simulations we used \(H=3\) hierarchies. In the first \(T_1=5\) interactions we focused on five segments of \(S_1=20\) integers each. Then, until the tenth interaction (\(T_2=10\)), we focused on 20 segments of \(S_2=5\) integers each. From then on we focused on 100 segments of single offers (\(S_3=1\)).

  26. Detailed results for these environments can be found in [30].

  27. In the CE domain this quality value is one-dimensional, since we assume that each interaction with an opponent is unique and does not influence future interactions. Thus, only one state exists in the CE game, in which the agent needs to suggest its offer to the new unknown opponent, and the \(Q\)-vector indicates the quality knowledge for each possible offer in this one state.

  28. We assume that the mean of the thresholds is a discrete value between \(0, 1\ldots N\), and the standard deviation is a discrete value between one and \(N\). However, the value of each given threshold is drawn from the continuous normal distribution, where the pair of parameters of the normal distribution is discrete.

References

  1. Aggarwal, G., Goel, A., & Motwani, R. (2006). Truthful auctions for pricing search keywords. In Proceedings of the 7th ACM Conference on Electronic Commerce 06.

  2. Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2–3), 235–256.

  3. Azoulay-Schwartz, R., Kraus, S., & Wilkenfeld, J. (2004). Exploitation vs. exploration: Choosing a supplier in an environment of incomplete information. Decision Support Systems and Electronic Commerce, 38(1), 1–18.

  4. Binmore, K., Shaked, A., & Sutton, J. (1985). Testing noncooperative bargaining theory: A preliminary study. The American Economic Review, 75(5), 1178–1180.

  5. Bourgine, P., & Leloup, B. (2000). May learning explain the ultimatum game paradox? GRID Working Paper No. 00–03, Ecole Polytechnique.

  6. Boutilier, C., Goldszmidt, M., & Sabata, B. (1999). Sequential auctions for the allocation of resources with complementarities. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) 99.

  7. Boyan, J., Greenwald, A., Kirby, R., & Reiter, J. (2001). Bidding algorithms for simultaneous auctions. In IJCAI Workshop on Economic Agents, Models, and Mechanisms (pp. 1–11).

  8. Brenner, T., & Vriend, N. (2006). On the behavior of proposers in ultimatum games. Journal of Economic Behavior and Organization, 61(4), 617–631.

  9. Brent, R. (1973). Algorithms for minimization without derivatives (Chap. 4). Englewood Cliffs, NJ: Prentice-Hall.

  10. Byde, A., Preist, C., & Jennings, N. R. (2002). Decision procedures for multiple auctions. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 02 (pp. 613–620).

  11. Camerer, C. F. (2003). Behavioral game theory. Princeton, NJ: Princeton University Press.

  12. Conitzer, V., & Garera, N. (2006). Learning algorithms for online principal–agent problems (and selling goods online). In Proceedings of the International Conference on Machine Learning (ICML) 06 (pp. 209–216).

  13. Davidson, A., Billings, D., Schaeffer, J., & Szafron, D. (2000). Improved opponent modeling in poker. In Proceedings of the International Conference on Artificial Intelligence (pp. 1467–1473).

  14. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.

  15. Diermeier, D., & Morton, R. (2005). Proportionality versus perfectness: Experiments in majoritarian bargaining. In D. Austen-Smith & J. Duggan (Eds.), Social choice and strategic behavior (pp. 157–196). Berlin: Springer.

  16. DiMicco, J., Greenwald, A., & Maes, P. (2001). Dynamic pricing strategies under a finite time horizon. In Proceedings of the ACM Conference on Electronic Commerce.

  17. Dumas, M., Aldred, L., Governatori, G., & ter Hofstede, A. (2005). Probabilistic automated bidding in multiple auctions. Electronic Commerce Research, 5(1), 25–49.

  18. eBay Inc. (2012). Retrieved March 25, 2012, from www.ebay.com

  19. Fatima, S., Wooldridge, M., & Jennings, N. R. (2005). Sequential auctions for objects with common and private values. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 05 (pp. 635–642).

  20. Ferguson, G. A. (1981). Statistical analysis in psychology and education. New York: McGraw-Hill.

  21. Gal, Y., Pfeffer, A., Marzo, F., & Grosz, B. (2004). Learning social preferences in games. In Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) 04 (pp. 226–231).

  22. Gerding, E. H., Dash, R. K., Yuen, D. C. K., & Jennings, N. R. (2007). Bidding optimally in concurrent second-price auctions of perfectly substitutable goods. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 07 (pp. 267–274).

  23. Ghose, T. K., & Tran, T. (2009). Dynamic pricing in electronic commerce using neural network. In Proceedings of the 4th International MCETECH Conference on e-Technologies (MCETECH) 09 (pp. 227–232).

  24. Gittins, J. (1989). Multi-armed bandit allocation indices. New York: Wiley.

  25. Google Advertising Program. (2012). Retrieved March 25, 2012, from www.google.com/ads

  26. Grosskopf, B. (2003). Reinforcement and directional learning in the ultimatum game with responder competition. Experimental Economics, 6(2), 141–158.

  27. Guth, W., & Huck, S. (1997). From ultimatum bargaining to dictatorship: An experimental study of four games varying in veto power. Metroeconomica, 48(3), 262–299.

  28. Guth, W., Schmittberger, R., & Schwarze, B. (1982). An experimental analysis of ultimatum bargaining. Journal of Economic Behavior and Organization, 3, 367–388.

  29. Katz, R., & Kraus, S. (2006). Efficient agents for Cliff-Edge environments with a large set of decision options. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 06.

  30. Katz, R., & Kraus, S. (2006). Efficient bidding strategies for simultaneous Cliff-Edge environments. In Intelligent Agent Technology 06.

  31. Katz, R., & Kraus, S. (2007). Gender-sensitive automated negotiators. In Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) 07.

  32. Knez, M., & Camerer, C. (1995). Outside options and social comparison in three-player ultimatum game experiments. Games and Economic Behavior, 10, 65–94.

  33. Krishna, V., & Morgan, J. (1997). An analysis of the war of attrition and the all-pay auction. Journal of Economic Theory, 72, 343–362.

  34. Lagarias, J. C., Reeds, J. A., Wright, M. H., & Wright, P. E. (1998). Convergence properties of the Nelder–Mead simplex method in low dimensions. SIAM Journal on Optimization, 9(1), 112–147.

  35. Leloup, B., & Deveaux, L. (2001). Dynamic pricing on the internet: Theory and simulations. Journal of Economic Research, 1(3), 265–276.

  36. Lin, R., Kraus, S., Wilkenfeld, J., & Barry, J. (2006). An automated agent for bilateral negotiation with bounded rational agents with incomplete information. In Proceedings of the European Conference on Artificial Intelligence (ECAI) 06.

  37. Maes, P. (1995). Artificial life meets entertainment: Lifelike autonomous agents. Communications of the ACM, 38(11), 108–114.

  38. Milidiu, R. L., Melcop, T., Liporace, F. T. S., & Lucena, C. J. P. (2003). Simple: A multi-agent system for simultaneous and related auctions. In Intelligent Agent Technology 03.

  39. Minton, S., Johnston, M. D., Philips, A. B., & Laird, P. (1992). Minimizing conflicts: A heuristic repair method for constraint satisfaction and scheduling problems. Artificial Intelligence, 58, 161–205.

  40. Mitchell, M. (1996). An introduction to genetic algorithms. Cambridge, MA: MIT Press.

  41. Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4), 308–313.

  42. Niklasson, L., Engstrom, H., & Johansson, U. (2001). An adaptive ‘rock, scissors and paper’ player based on a tapped delay neural network. In Proceedings of the International Conference on Application and Development of Computer Games in the 21st Century (ADCOG) (pp. 130–136).

  43. Oliveira, E., Fonseca, J. M., & Jennings, N. R. (1999). Learning to be competitive in the market. In AAAI Workshop on Negotiation: Settling Conflicts and Identifying Opportunities (pp. 30–37).

  44. Rosenfeld, A., & Kraus, S. (2009). Modeling agents through bounded rationality theories. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) 09.

  45. Roth, A., & Erev, I. (1995). Learning in extensive form games: Experimental data and simple dynamic models in the intermediate term. Games and Economic Behavior, 8, 164–212.

  46. Strens, M. (2000). A Bayesian framework for reinforcement learning. In Proceedings of the International Conference on Machine Learning (pp. 943–950).

  47. Sutton, R., & Barto, A. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

  48. Todd, P., & Borges, B. (1997). Designing socially intelligent agents for the ultimatum game. In K. Dautenhahn (Ed.), Socially Intelligent Agents: Papers from the 1997 Fall Symposium (pp. 134–136). Menlo Park, CA: AAAI Press.

  49. Vriend, N. (1997). Will reasoning improve learning? Economics Letters, 55(1), 9–18.

  50. Xianyu, B. (2010). Social preference, incomplete information, and the evolution of ultimatum game in the small world networks: An agent-based approach. Journal of Artificial Societies and Social Simulation, 13(2).

  51. Yuen, D., Byde, A., & Jennings, N. R. (2006). Heuristic bidding strategies for multiple heterogeneous auctions. In Proceedings of the European Conference on Artificial Intelligence (ECAI) 06.

  52. Zhong, F., Wu, D., & Kimbrough, S. (2002). Cooperative agent systems: Artificial agents play the ultimatum game. Group Decision and Negotiation, 11(6), 433–447.

  53. Zhu, W., & Wurman, P. R. (2002). Structural leverage and fictitious play in sequential auctions. In Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) 02 (pp. 385–390).


Acknowledgments

We thank the anonymous reviewers for their helpful remarks, which enabled us to significantly improve the quality of this paper. This work is supported in part by the following grants: ERC Grant #267523, MURI Grant Number W911NF-08-1-0144, ARO Grants W911NF0910206 and W911NF1110344, MOST #3-6797, and the Google Inter-university Center for Electronic Markets and Auctions.


Corresponding author

Correspondence to Rina Azoulay.

Additional information

Preliminary results of this research appear in [30] and [31].

Appendices

Appendix A: The details of the proposed algorithms

A.1 A generalized meta-algorithm for CE interactions

In this section we present a detailed description of the proposed meta-algorithm for competing in CE interactions. As a meta-algorithm, it extends basic algorithms and improves their performance. Before each interaction, the algorithm chooses the best offer to suggest and, after the reward is obtained, the results may change future decisions. The current round of the algorithm is the number of the current interaction, e.g., the first interaction is at \(round=1\), the second interaction is at \(round=2\), etc.

We assume that the basic algorithms select their actions according to an evaluation of the expected utility that each decision option would yield, provided it is chosen. The evaluation of the expected utilities is determined according to the results of previous interactions. Hereinafter, we will refer to the database which stores the evaluations of each decision option as the \(Q\)-vector. For each offer \(i\), the \(Q\)-vector specifies the quality value of choosing this offer, as learned from previous interactions (see note 27). The general meta-algorithm framework is described in Algorithm 8.


Each algorithm includes its own UPDATE procedure, which determines how to update the \(Q\)-vector after observing the success of the latest action. In addition, each algorithm includes its own SELECT procedure, which determines how to select the next action, typically according to the current \(Q\)-vector. The SELECT procedure balances exploiting the evaluations stored in the current \(Q\)-vector against exploring options with lower current \(Q\)-values in order to improve the accuracy of the \(Q\)-vector’s evaluations. The first action is chosen randomly in all of the algorithms presented in this paper.
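
To make the framework concrete, here is a minimal Python sketch of the round structure just described. It is an illustration only, not a transcription of Algorithm 8: the names run_ce_agent, select, update and play_interaction are hypothetical, and the payoff of \(N-i\) on a successful offer assumes the auction/UG environments played over a total amount \(N\).

```python
import random

def run_ce_agent(n_rounds, N, select, update, play_interaction):
    """Generic CE meta-algorithm loop: SELECT an offer, observe success or
    failure against a fresh opponent, then UPDATE the Q-vector of estimates.
    select(Q, rnd) returns an offer in 0..N, update(Q, offer, success, rnd)
    revises the estimates, play_interaction(offer) returns True on acceptance."""
    Q = [0.0] * (N + 1)                    # one quality estimate per possible offer
    total_payoff = 0.0
    for rnd in range(1, n_rounds + 1):
        if rnd == 1:
            offer = random.randint(0, N)   # the first action is chosen randomly
        else:
            offer = select(Q, rnd)
        success = play_interaction(offer)  # one-shot game vs. a new human opponent
        if success:
            total_payoff += N - offer      # assumed payoff structure (auction/UG on N)
        update(Q, offer, success, rnd)
    return total_payoff
```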

A.2 RL: Roth and Erev

The basic Roth and Erev algorithm is described in Algorithm 9.

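As a rough illustration of the basic Roth and Erev rule (not a transcription of Algorithm 9), the following sketch assumes the simplest propensity-matching variant, plugged into the loop sketched in A.1: the propensity of the action actually played grows by the payoff it earned, and actions are drawn in proportion to their propensities. Initial propensities, forgetting and experimentation parameters of the full model are omitted, and the payoff \(N-\text{offer}\) on success is again an assumption.

```python
import random

def re_select(Q, rnd):
    """Choose an offer with probability proportional to its propensity."""
    total = sum(Q)
    if total <= 0:
        return random.randrange(len(Q))    # no reinforcement yet: choose uniformly
    r, acc = random.uniform(0, total), 0.0
    for i, q in enumerate(Q):
        acc += q
        if r <= acc:
            return i
    return len(Q) - 1

def re_update(Q, offer, success, rnd, N=100):
    """Reinforce only the action actually played, by the payoff it earned."""
    if success:
        Q[offer] += N - offer
```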

A.3 The Bayesian approach

According to the Bayesian approach, suggested by Conitzer and Garera [12], in each round the principal chooses the reward that maximizes its expected utility, given its beliefs about the agents’ distributions; the principal then updates these beliefs based on the observed result, using Bayes’ rule. Accordingly, if the principal assumes that the type of distribution is exponential with parameter \(\lambda\), it learns the parameter’s value during the interactions.

Following [12], the agent considers some discrete values of the parameter of the underlying distribution. For example, when considering the exponential distribution with parameter \(\lambda\), the agent assumes that \(\lambda\) is one of \(\{0.00,\, 0.01,\ldots, 10.00\}\). The agent has prior beliefs about the probability of each distribution parameter and updates these beliefs according to the results it obtains.

Note also that in our work, the agent knows only whether its offer was better than the responder’s threshold; it does not know what that threshold was. Thus, the probability of each offer is updated, given the Boolean result of accepting or rejecting the agent’s offer, using Bayes’ rule.

The general Bayesian algorithm is presented in Algorithm 10.


A.4 Normal distribution Bayesian algorithm

We present the details of the Bayesian approach algorithm, assuming a normal distribution of the opponents’ thresholds, in Algorithm 11. Similarly to the method used by Conitzer and Garera [12] for other distributions (uniform and exponential), the algorithm considers several possible values of the normal distribution parameters: \(\mu\) (the mean) and \(\sigma\) (the standard deviation).

The decision about the offer is made in lines 6–10: in every step except the first, the algorithm calculates the expected reward of each offer for each pair of distribution parameters in order to decide which offer to propose.

After observing the result (acceptance or rejection) of the offer, the algorithm updates the probability of each pair of distribution parameters of the threshold. The prior probabilities are defined in lines 2–3 and are updated after each result (acceptance or rejection) of an offer, using Bayes’ rule, in lines 12–21.

Note that since the distribution is characterized by a pair of parameters, the probability data is maintained in the matrix \(f[\mu,\sigma]\). The possible parameter values considered are \(\mu = 0, 1, \ldots, 100\) and \(\sigma = 1, 2, \ldots, 100\). Each such pair represents a specific distribution of thresholds, and given the result of the agent’s offer, we can calculate the posterior probability of each pair of distribution parameters \(\mu\) and \(\sigma\) using Bayes’ rule. Note that we consider discrete values for \(\mu\) and \(\sigma\), but given each pair, the value of the threshold is drawn from the continuous normal distribution.


The underlying assumption behind Algorithm 11 is that the opponents’ thresholds are normally distributed. Even if this assumption does not hold, the algorithm may serve as a heuristic for any threshold distribution which is close to the normal distribution (see note 28).

For each possible mean \(\mu\) and standard deviation \(\sigma\), the algorithm maintains the probability that the normal distribution of thresholds has this mean and standard deviation. We start by giving equal probability to each such pair, and after each step of the algorithm we update the probability of each possible pair using Bayes’ rule.

Whenever the agent must choose a new offer \(i\), it calculates the expected utility of each \(i\), based on the probability of each pair of \(\mu\) and \(\sigma\) and on the probability that offer \(i\) wins given that particular pair. It then chooses the offer with the highest expected utility.
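
The following compact sketch illustrates this belief-update-and-maximize scheme on the stated grid (\(\mu = 0,\ldots,100\), \(\sigma = 1,\ldots,100\)). It is not a transcription of Algorithm 11: acceptance is modeled as the offer meeting or exceeding a threshold drawn from \(N(\mu,\sigma)\), the class and method names are hypothetical, and the random first offer and tie-breaking are not handled.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

class NormalBayesAgent:
    def __init__(self, N=100):
        self.N = N
        # uniform prior over the discrete parameter grid mu = 0..N, sigma = 1..N
        self.f = {(mu, sig): 1.0 for mu in range(N + 1) for sig in range(1, N + 1)}
        self._normalize()

    def _normalize(self):
        z = sum(self.f.values())
        for k in self.f:
            self.f[k] /= z

    def accept_prob(self, offer):
        """P(offer accepted) under the current belief over (mu, sigma)."""
        return sum(p * phi((offer - mu) / sig) for (mu, sig), p in self.f.items())

    def select(self):
        # choose the offer maximizing expected payoff P(accept) * (N - offer)
        return max(range(self.N + 1),
                   key=lambda i: self.accept_prob(i) * (self.N - i))

    def update(self, offer, accepted):
        # Bayes' rule given the Boolean accept/reject result
        for (mu, sig) in self.f:
            like = phi((offer - mu) / sig)
            self.f[(mu, sig)] *= like if accepted else (1.0 - like)
        self._normalize()
```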

A.5 A modified version of Zhong, Wu and Kimbrough’s (ZWK) RL algorithm

Algorithm 12 describes the ZWK algorithm, suggested by Zhong et al. [52], which was modified as described in Sect. 3.5.


The values \(\epsilon \) and \(\gamma \) can be gradually decreased during the learning process. In our experiments, we used the following parameters: \(\epsilon =\frac{10}{round+25}\), \(\gamma =\frac{15}{round+25}\) and \(\delta =1\). Clearly, as \(round\) increases, the probability of choosing an offer different than \(m\) decreases. In other words, as the agent becomes more experienced, it tends to choose the best known offer and to reduce the probability of exploration.
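
As a hedged illustration of how these schedules drive exploration (not a reproduction of Algorithm 12, whose UPDATE rule uses \(\gamma\) and \(\delta\) in ways not restated here), an \(\epsilon\)-greedy SELECT consistent with the description might look as follows.

```python
import random

def epsilon(rnd):
    return 10.0 / (rnd + 25)      # exploration probability, as quoted above

def gamma(rnd):
    return 15.0 / (rnd + 25)      # companion parameter, used in the UPDATE step (not shown)

def zwk_like_select(Q, rnd):
    """With probability epsilon(rnd) explore a random offer; otherwise exploit
    the best-valued offer m. As rnd grows, exploration becomes rarer."""
    if random.random() < epsilon(rnd):
        return random.randrange(len(Q))
    best = max(Q)
    return random.choice([i for i, q in enumerate(Q) if q == best])
```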

1.6 A.6 Todd and Borger’s virtual RL algorithm (VRL)

In Algorithm 13 we present an implementation of the virtual learning principle described in Sect. 3.7.


Among the possible implementations, we consider the version that yields the best performance in our environment. This algorithm includes the same SELECT procedure as in ZWK.
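
The sketch below illustrates the virtual-update idea for the auction/UG direction (per footnote 7, the update ranges are swapped in the pricing environment). It assumes simple success/trial counters rather than the exact bookkeeping of Algorithm 13; the resulting success-rate estimates would then feed the ZWK-style SELECT.

```python
def vrl_update(successes, trials, offer, success):
    """Virtual learning for auctions/UG: a success at `offer` implies every
    higher offer would also have succeeded; a failure implies every lower
    offer would also have failed. Both real and virtual outcomes are counted."""
    n = len(trials)
    if success:
        for j in range(offer, n):      # the offer itself and every higher offer
            successes[j] += 1
            trials[j] += 1
    else:
        for j in range(offer + 1):     # the offer itself and every lower offer
            trials[j] += 1             # counted as a (virtual) failure

def success_rates(successes, trials):
    """Estimated success probability of each offer (0 where never counted)."""
    return [s / t if t else 0.0 for s, t in zip(successes, trials)]
```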

A.7 The SDVRL algorithm

According to SDVRL’s UPDATE procedure, we increase the evaluation of the success probabilities, \(P\)-values, of all the offers higher than a successful offer, as well as several offers below this offer (line 10). Similarly, after an offer fails, we reduce the \(P\)-values of all the offers below the actual offer and several offers above the actual offer, as described in line 9. The success probability of each offer is calculated by dividing the number of previous successes by the total number of previous interactions in which the offer was actually or virtually proposed (lines 9, 10, 13, 14).

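A minimal sketch of this UPDATE rule follows. The function name and the deviation range \(D\) are hypothetical choices for illustration (the full SDVRL pseudocode fixes these details); the bookkeeping mirrors the success/trial counters of the virtual-learning sketch in A.6.

```python
def sdvrl_update(successes, trials, offer, success, D=5):
    """SDVRL-style update (illustrative): on success, credit the offer, every
    higher offer, and D offers just below it; on failure, debit the offer,
    every lower offer, and D offers just above it. D=5 is a hypothetical
    deviation range, not a value taken from the paper."""
    n = len(trials)
    if success:
        for j in range(max(0, offer - D), n):
            successes[j] += 1
            trials[j] += 1
    else:
        for j in range(min(n - 1, offer + D) + 1):
            trials[j] += 1
```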

A.8 The SDVG algorithm

In this section, we describe the simultaneous Gittins-index-based algorithm (SDVG), which was introduced in Sect. 7.2. In SDVG, the updating of the \(P\)-values is calculated according to Gittins indices. Therefore, the \(P\)-values are changed to:

$$\begin{aligned} {P(j) = G(S(j),round - S(j))} \end{aligned}$$

The details of the SDVG method are described in Algorithm 15.

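A one-function sketch of this substitution is shown below; gittins_table stands for a precomputed Bernoulli Gittins-index lookup (e.g., for discount factor 0.99, as in note 9) and is an assumption of this illustration, not something specified by Algorithm 15.

```python
def sdvg_p_values(success_counts, rnd, gittins_table):
    """P(j) = G(S(j), rnd - S(j)) for every offer j, per the equation above.
    `gittins_table` maps (successes, failures) to the corresponding
    Bernoulli Gittins index; missing entries default to 0.0."""
    return [gittins_table.get((s, rnd - s), 0.0) for s in success_counts]
```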

Appendix B: The instructions given to human subjects

Four different scenarios are presented below for which you should make your decisions. Please write down your own decisions in the appropriate locations, without consulting others.

  1.

    You have NIS 100. You are participating as a bidder in an auction against another bidder. The amount of money on which you are bidding is $100. The one who places the highest bid gains the $100 but has to pay back the amount of his bid. For example, if you offer amount X and the other bidder offers only X-5, you will win 100-X, while the other one will receive nothing. How much will you offer?

  2.

    This scenario is quite similar to the former, except that now the loser (the bidder with the lowest offer) also has to pay his offer, even though he does not win the money. For example, if you offer amount X and the other bidder offers only X-5, you will win 100-X, while the other bidder will lose X-5. How much will you offer?

  3.

    You are sitting in a room with an unfamiliar person. You will both be given the sum total of $100 if you can agree on how to divide the money. The other person should propose how to divide the money - how much goes to him and how much goes to you. If you accept the proposal, each of you will receive the amounts specified in the proposal. If you reject the proposal, neither of you will receive anything. You will not meet this person ever again. What is the minimal offer that you will accept?

  4.

    This scenario is quite similar to the former, except that now you will always receive the amount proposed to you, whether you accept the offer or not. However, you can decide whether the other person will receive the amount he wanted for himself or not. Therefore, if you accept the offer, each of you will receive the amount specified in the proposal. If you reject the offer, you will receive the amount specified in the proposal, while the proposer will not receive anything. You will not meet this person ever again. What is the minimal offer that you will accept?


About this article

Cite this article

Azoulay, R., Katz, R. & Kraus, S. Efficient bidding strategies for Cliff-Edge problems. Auton Agent Multi-Agent Syst 28, 290–336 (2014). https://doi.org/10.1007/s10458-013-9227-z
