Two-player mixed equilibrium in the "curation game"

in #gametheory6 years ago (edited)

If two players have equal Steem Power, when should they upvote to maximize curation reward?

This is a "toy" game introduced here: https://steemit.com/steemit/@markgritter/what-are-the-nash-equilibria-in-the-curation-game that simplifies the real situation in a number of ways:

  1. Steem Power is treated as equal, which is hardly ever the case
  2. The number of upvoters is known in advance
  3. Players may only upvote at minute intervals; in real life they could probably achieve a block-level accuracy (3 seconds) if not better.
  4. Rewards aren't rounded to the nearest 0.001

The pure strategy S_i in the two-player game is "vote at minute i if the other player has not voted already; if he has, upvote at minute 15." If this results in voting before the other player, the payoff is 0.707 of the curation rewards, reduced for voting early. The payoff for voting second is 0.293, at full value.

The previous post shows the payoff matrix for the pure strategies, and shows that there is no pure strategy equilibrium. But there should be a mixed-strategy equilibrium, and we can find it using a variant of fictitious play. In "standard" fictitious play, each player selects the pure strategy that best counters the observed mixture of pure strategies selected by the opponent. The way I was introduced to fictitious play, we start with the "uniform prior" assumption instead in which each pure strategy is given equal weight. Each round, we adjust our weights by a decreasing factor (using a schedule of weights that forms a divergent sequence.)

Results

The game-theoretic optimal mixed strategy is to vote:

  • In minute 7, 37.1% of the time
  • In minute 8, weighted 46.9%
  • In minute 9, the remaining 16.0% of games

Against an optimal opponent, this produces payoff of at least 0.29412, only slightly better than always voting at minute 15.

We can check to see that this mixture does not give the opponent any way to earn more than that amount (though they can force us to take only 0.293.)

Suppose they choose to vote in minute 6. Then they get payoff 0.707 * 6 / 15 = 0.2828, worse than our 0.293 from waiting. They have reduced our payoff by picking a dominated option, but at the expense of getting less themselves.

How about minute 8? 37.1% of the time we beat them and get 0.707 * 7 / 15 = 0.3299 while they get 0.293. 46.9% of the time we are tied and on average each get 0.500 * 8 /15 = 0.2667. The remaining 16.0% they win and get 0.707 * 8/ 15 = 0.3771 while we get 0.293.

Our payoff = 0.371 * 0.3299 + 0.469 * 0.2667 + 0.160 * 0.293 =~ 0.2943

Their payoff = 0.371 * 0.293 + 0.469 * 0.2667 + 0.160 * 0.3771 =~ 0.2941

Output after 100,000 iterations (probably not enough to get six figures of accuracy, but three digits are stable.)

Final strategy=[0.000001 0.000001 0.000001 0.000001 0.000001 0.000001 0.370687 0.469006 0.160279 0.000011 0.000011 0.000001 0.000001 0.000001 0.000001]
Game value=0.294124714532

Code

# Pure strategies for two-player game:
# S_i = upvote at minute i, if no upvote already exists, otherwise wait for 15
# minutes

# Payoff from i == j: 0.50 * i / 15  (50% chance of being first, partial payout)
# Payoff from i < j: 0.707 * i / 15  (first, but partial payout)
# Payoff from i > j: 0.293           (second, but full payout)

# Strategies are numbered 1 through 15 for convenience.

def payout( i, j ):
    if i == j:
        return 0.50 * i / 15
    elif i < j:
        return 0.707 * i / 15
    else:
        return 0.293

def purePayout( i, mixedB ):
    value = 0.0
    for j in xrange( 1, 16 ):
        value += mixedB[j] * payout( i, j )
    return value

def mixedPayoutAndOptimalPure( mixedA, mixedB ):
    maxPayout = -1.0
    maxStrategy = None
    value = 0.0
    for i in xrange( 1, 16 ):
        unweighted = purePayout( i, mixedB )
        if unweighted > maxPayout:
            maxStrategy = i
            maxPayout = unweighted
        value += mixedA[i] * unweighted
    return ( value, maxStrategy, maxPayout )
    
def initialStrategy():
    numStrategies = 15
    return [0.0] + [ 1.0 / 15 for i in xrange( 1, 16 ) ]

def weightPureStrategy( s, i, weight ):
    dw = 1.0 - weight
    newWeights = [ dw * x for x in s ]
    newWeights[i] += weight
    return newWeights

def strategyRep( s ):
    return "[" + " ".join( "{:.3f}".format( x ) for x in s[1:] ) + "]"

def fictitousPlay( numIterations ):
    s = initialStrategy()
    for n in xrange( 1, numIterations + 1 ):
        ( val, maxStrategy, maxPayout ) = mixedPayoutAndOptimalPure( s, s )
        if n % 100  == 0:
            print "v={:.6f} p={:.6f} s={} max_s={}".format( val, maxPayout,
                                                            strategyRep( s ),
                                                            maxStrategy )
        adjustment = 1.0 / ( n + 1.0 )
        s = weightPureStrategy( s, maxStrategy, adjustment )
    print "Final strategy={}".format( strategyRep( s ) )
    val, _, _ = mixedPayoutAndOptimalPure( s, s )
    print "Game value={}".format( val )
Sort:  

Hello! Your post has been resteemed and upvoted by @ilovecoding because we love coding! Keep up good work! Consider upvoting this comment to support the @ilovecoding and increase your future rewards! ^_^ Steem On!

Reply !stop to disable the comment. Thanks!





This post has been voted on by the SteemSTEM curation team and voting trail in collaboration with @curie.

If you appreciate the work we are doing then consider voting both projects for witness by selecting stem.witness and curie!

For additional information please join us on the SteemSTEM discord and to get to know the rest of the community!

Coin Marketplace

STEEM 0.32
TRX 0.11
JST 0.031
BTC 68072.80
ETH 3780.85
USDT 1.00
SBD 3.72