From CryptoEconLab by Kiran Karra
Introduction
CryptoEconLab has been bootstrapping the Filecoin RetroPGF program. The first round, FIL-RetroPGF-1, followed the Optimism framework as closely as possible. From a voting design perspective, FIL-RetroPGF-1 used the Quorum and Threshold (Q+T) model to convert badgeholder votes into funding decisions. In the Q+T model, badgeholders vote on how much funding they would like to allocate to each project, considering all projects simultaneously. These votes are aggregated by a scoring mechanism that determines the final funds distributed to each project. A key point in Q+T voting is that badgeholders must assess all projects against each other simultaneously.
However, other voting mechanisms are possible. In this post, we quantitatively characterize another voting mechanism, Pairwise, in the context of RetroPGF-based funding. We introduce a new open-source framework, voting_mechanism_design, currently under active development, to compare Pairwise to the Quorum and Threshold voting mechanism.
Using this framework, we show that Pairwise can allocate capital more efficiently than quorum-based voting. We then explore the robustness of Pairwise voting to negative behaviors such as conflict of interest (COI) and collusion.
Pairwise Voting Mechanism
Pairwise voting is a mechanism that lets badgeholders express which projects they would like to see funded. It works by presenting pairs of projects to the badgeholders. For each pair, the badgeholder selects the project they feel deserves more funding. This is repeated for as many pairs as the badgeholder wishes. After all badgeholder votes are collected, a model (e.g., the Bradley-Terry model) is applied to infer a global ranking of projects from the badgeholders' pairwise choices. This is similar to a chess ranking system, where one-on-one match results are aggregated to create a global ranking of all players. Rankings are then mapped to funding amounts according to a pre-defined distribution or mapping (interfaces for pairwise voting have already been implemented).
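To make the ranking step concrete, here is a minimal sketch of fitting Bradley-Terry strengths from pairwise votes with the standard minorization-maximization (MM) update. It is illustrative only and is not taken from the voting_mechanism_design framework: the function name `bradley_terry`, the `(winner, loser)` vote format, and the small `prior` regularizer (a virtual opponent of strength 1 that every project both beats and loses to `prior` times) are assumptions made for this example.

```python
from collections import Counter

def bradley_terry(n_projects, votes, n_iter=200, prior=0.1):
    """Estimate Bradley-Terry strengths from (winner, loser) votes.

    Hypothetical helper: `prior` adds a virtual opponent of strength 1.0
    so that projects with no wins (or no votes) keep a finite strength.
    """
    wins = [0] * n_projects
    pair_counts = Counter()
    for winner, loser in votes:
        wins[winner] += 1
        pair_counts[(min(winner, loser), max(winner, loser))] += 1

    strengths = [1.0] * n_projects
    for _ in range(n_iter):
        # Contribution of the virtual opponent (2 * prior comparisons each).
        denom = [2.0 * prior / (s + 1.0) for s in strengths]
        for (i, j), count in pair_counts.items():
            share = count / (strengths[i] + strengths[j])
            denom[i] += share
            denom[j] += share
        # MM update: (regularized) wins divided by expected comparison weight.
        strengths = [(wins[i] + prior) / denom[i] for i in range(n_projects)]
    return strengths

# Project 0 usually beats 1, and both always beat 2.
votes = [(0, 1)] * 5 + [(1, 0)] + [(1, 2)] * 5 + [(0, 2)] * 5
strengths = bradley_terry(3, votes)
ranking = sorted(range(3), key=lambda i: -strengths[i])
print(ranking)
```

Sorting projects by estimated strength gives the global ranking. Badgeholders do not need to vote on every pair for the fit to work, which is what allows each voter to stop whenever they wish.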
The pairwise voting mechanism differs from the approach used by Optimism and FIL-RetroPGF-1, which we will denote as Q+T. In the “Quorum + Threshold” approach, badgeholders vote on how much funding they would like to go to each project, considering all projects simultaneously. Pairwise instead presents one pair of projects at a time to each badgeholder. The hypotheses motivating Pairwise are:
- It reduces the cognitive load on badgeholders: instead of assessing hundreds of projects simultaneously, each vote is limited in scope to two projects.
- It is a more robust way to create a global ranking of projects since it can be inferred by well-known algorithms used in other related applications.
- It can result in more accurate capital allocation.
We created a pairwise voting simulator to test these hypotheses and to examine the robustness of this voting mechanism to negative behaviors.
Comparing to Quorum + Threshold
We begin by comparing the baseline performance of the two mechanisms. Specifically, we want to compare how aligned the global rankings of each mechanism are to the true project rankings. This measures the efficacy of the voting mechanism’s capital allocation. To incorporate the realities of the RetroPGF process, where badgeholders have limited time and energy to evaluate projects, we assess the capital allocation accuracy as a function of laziness and expertise.
In our framework, badgeholder laziness is a value between 0 and 1 that translates to how many projects the particular badgeholder will vote on. A laziness of 0 indicates that the badgeholder will vote on all projects, whereas a laziness of 1 indicates that badgeholders will not vote on any projects. Expertise is a value between 0 and 1 that translates to how aligned a badgeholder’s votes are with the true project ratings. An expertise of 0 means that the badgeholder is placing fully random votes, whereas an expertise of 1 means that a badgeholder is placing perfectly accurate votes. Interpolation between these values is discussed in the Appendix.
We initialize the simulation by seeding each project with a “true impact” rating, a value between 0 and 1 indicating the impact. We then create a population of badgeholders with particular expertise and laziness values to be tested.
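The setup described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the simulator's actual code: the function `simulate_votes` is hypothetical, expertise is interpolated by mixing a correct vote with a fair coin flip, and laziness is applied directly to the number of pairs (the Appendix discusses why the real mapping needs more care).

```python
import random
from itertools import combinations

def simulate_votes(true_impact, expertise, laziness, rng):
    """Return the (winner, loser) votes cast by one hypothetical badgeholder."""
    pairs = list(combinations(range(len(true_impact)), 2))
    # Laziness sets the fraction of pairs skipped: 0 -> all pairs, 1 -> none.
    n_votes = round((1.0 - laziness) * len(pairs))
    votes = []
    for a, b in rng.sample(pairs, n_votes):
        better, worse = (a, b) if true_impact[a] >= true_impact[b] else (b, a)
        # Assumed interpolation: with probability `expertise` the vote follows
        # the true impact; otherwise it is a fair coin flip.
        if rng.random() < expertise or rng.random() < 0.5:
            votes.append((better, worse))
        else:
            votes.append((worse, better))
    return votes

rng = random.Random(0)
true_impact = [rng.random() for _ in range(10)]  # seeded "true impact" ratings
votes = simulate_votes(true_impact, expertise=0.5, laziness=0.5, rng=rng)
print(len(votes), "votes cast out of 45 possible pairs")
```

With `expertise=1.0` every vote agrees with the true impact, and with `expertise=0.0` every vote is a coin flip, matching the two endpoints of the definition.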
We then simulated how aligned the global rankings of projects were between the two voting mechanisms as a function of badgeholder laziness and expertise. Alignment is measured through rank correlation of the true project rankings to the rankings supplied by the badgeholders through voting. An alignment of 1 is perfect, whereas 0 is purely random. Negative values are allowed because rank correlation can be negative, but this is a technicality.
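As an illustration of the alignment metric, the sketch below computes Spearman's rank correlation between true impact values and inferred scores. The text above only says "rank correlation", so the choice of Spearman (rather than, say, Kendall's tau) is an assumption, as are the helper names; ties are ignored for simplicity.

```python
def ranks(values):
    """Rank positions (0 = largest value); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    positions = [0] * len(values)
    for position, index in enumerate(order):
        positions[index] = position
    return positions

def spearman(x, y):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

true_impact = [0.9, 0.5, 0.1]
print(spearman(true_impact, [3.0, 2.0, 1.0]))  # same ordering
print(spearman(true_impact, [1.0, 2.0, 3.0]))  # reversed ordering
```

An identical ordering gives 1 and a fully reversed ordering gives -1, which is why negative alignment values are possible.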
Fig 1 shows the results of our experiments. The x-axis sweeps across badgeholder expertise, where 0.0 means random guessing and 1.0 means perfect badgeholder voting, scaling linearly between the two. The y-axis measures the alignment between the inferred rankings and the true project impact. Blue dots represent Monte Carlo simulation runs of the pairwise voting mechanism, and green dots represent the Quorum and Threshold mechanism. The darkness of the dots maps to different badgeholder laziness values, as indicated by the legend.
Fig 1: A comparison of the effectiveness of Q+T and Pairwise voting mechanisms
The results indicate that the pairwise voting mechanism is more robust to both low badgeholder expertise and laziness than the Q+T voting mechanism. This is evident because, for corresponding values of expertise and laziness, the blue dot groupings consistently have higher alignment than the green dot groupings.
COI Modeling
Next, we wanted to test the effect of conflict of interest (COI) on the pairwise voting mechanism. COI is defined as the scenario where a badgeholder votes for a particular project in which they have a vested interest (perhaps financial). In our simulations, COI is modeled as a badgeholder who votes for a particular project consistently, even if they are presented with a voting pair where the other project has a higher perceivable impact. The metric we use to determine the effect of COI is the change in a project's relative ranking between a simulation in which it receives COI votes and one in which it does not. Fig 2 shows the effect of COI on a project’s ranking as a function of three variables: a) the project’s true impact, b) the badgeholder’s expertise, and c) the badgeholder’s laziness. The y-axis shows the change in project ranking, and the x-axis is binned from the most impactful to the least impactful project. Color and brightness indicate the badgeholder population’s expertise and laziness.
Fig 2 shows that more impactful projects are less affected by COI voting, and that the effectiveness of COI increases with laziness.
These observations match intuition. Since more impactful projects are already more likely to be voted for by badgeholders, a COI badgeholder applying COI behavior to a highly impactful project will not change the project’s ranking significantly. Similarly, a lazier badgeholder population casts fewer overall votes, so each COI vote carries more weight.
Fig 2: Effect of COI on Project rankings using Pairwise Voting
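The COI metric can be illustrated with a toy example. This sketch makes several simplifying assumptions that are not from the simulator: honest badgeholders have perfect expertise and vote on every pair, projects are ranked by raw win counts instead of a fitted Bradley-Terry model, and all function names are hypothetical.

```python
from itertools import combinations

def cast_votes(true_impact, n_honest, n_biased=0, favored=None):
    """Votes from honest voters plus voters who always pick `favored`."""
    votes = []
    pairs = list(combinations(range(len(true_impact)), 2))
    for voter in range(n_honest + n_biased):
        biased = voter >= n_honest
        for a, b in pairs:
            if biased and favored in (a, b):
                winner = favored  # COI: ignore the other project's impact
            else:
                winner = a if true_impact[a] >= true_impact[b] else b
            votes.append((winner, b if winner == a else a))
    return votes

def rank_of(project, votes, n_projects):
    """Rank (0 = best) when projects are ordered by number of wins."""
    wins = [0] * n_projects
    for winner, _ in votes:
        wins[winner] += 1
    order = sorted(range(n_projects), key=lambda i: (-wins[i], i))
    return order.index(project)

def rank_shift(true_impact, favored, n_honest, n_biased):
    """Places gained by `favored` when n_biased voters show COI behavior."""
    n = len(true_impact)
    honest = cast_votes(true_impact, n_honest + n_biased)
    with_coi = cast_votes(true_impact, n_honest, n_biased, favored)
    return rank_of(favored, honest, n) - rank_of(favored, with_coi, n)

impact = [0.9, 0.7, 0.5, 0.3, 0.1]
print(rank_shift(impact, favored=4, n_honest=3, n_biased=1))
print(rank_shift(impact, favored=0, n_honest=3, n_biased=1))
```

In this toy setting the least impactful project gains a place from a single COI voter, while the most impactful project cannot move because it is already ranked first. Setting `n_biased` above 1 corresponds to several voters favoring the same project.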
Collusion
Next, we measure the effect of collusion on a particular project’s rankings. We define collusion as a coordinated agreement between multiple badgeholders to vote for a particular project regardless of the impact of that project. In our simulations, this manifests as multiple badgeholders voting for a particular project consistently, regardless of the relative impact of the other project. This is essentially an amplified version of the COI voting, so we use the same metrics to determine the effect of collusion on project rankings.
Fig 3 below shows the effect of collusion on project rankings. Light colors indicate only one colluding agent (equal to the COI case), and darker colors indicate more badgeholders colluding for a particular project. The x-axis represents the impact of the project that is being colluded, and the y-axis represents the delta in the project rankings for that particular project.
Fig 3 shows that as the number of agents colluding for a particular project increases, the effect of the collusion is amplified. This matches intuition: each additional colluding badgeholder adds more votes for the project regardless of its merit, so the shift in its overall ranking grows. Additionally, the effect of collusion depends on the project’s true rating, with a more impactful project being less affected by collusion than a less impactful one.
Fig 3: The effect of collusion on project rankings, for Expertise=0.25 and Laziness=0.75.
Conclusion
In this work, we described the Pairwise voting mechanism and quantitatively compared its accuracy against the Quorum+Threshold method. We found that for equivalent values of badgeholder laziness and expertise, Pairwise voting can help to allocate capital more efficiently than the Quorum method. Next, we examined the robustness of Pairwise to negative behaviors such as COI and collusion. For both, our results indicate that the effect on a project depends on both the badgeholder population’s behavior patterns and the project’s true impact. More impactful projects are less affected by COI and collusion behaviors than less impactful projects. Similarly, a more active badgeholder population results in more total votes cast, reducing the effect of COI and collusion.
Our next steps are to implement COI and collusion modeling for the Quorum and Threshold voting mechanism and compare the two voting mechanisms along that dimension. This can help to fully characterize whether Pairwise can be a viable replacement for Quorum and Threshold voting for RetroPGF. Beyond this, we aim to expand the simulator to other voting mechanisms that can be applied to this style of funding, and we welcome contributions from others interested in working in this space!
Please reach out to us at [email protected] if you’re interested in working together on mechanism design, cryptoeconomics, quantitative modeling, or other related subjects for your project.
Appendix
A1 — Expertise Mapping
In this section, we discuss how badgeholder expertise maps to the alignment between the true project rankings and the badgeholder rankings. Fig 4A shows the mapping that was implemented by the original OP simulator. Here, we notice that an expertise of 0 maps to a reasonable correlation between the true project impact and the badgeholder-assigned ratings, which does not align with the definition of expertise given earlier.
Fig 4B shows the updated mapping we used in the simulations above, which is more aligned with our definition of expertise. It also shows the pairwise version.
Fig 4: Mapping Expertise to Project Rankings
Future work can involve making a badgeholder's voting accuracy depend on both the project’s true impact and the expertise factor. The logic here is that it may be easier to vote on projects with high true impact, regardless of expertise, whereas lower-impact projects need more expertise to discern their true impact.
A2 — Aligning Laziness
In this section, we identify how laziness differs between pairwise and Q+T voting. In pairwise voting, projects are presented in pairs, and the number of pairs grows quadratically with the number of projects: n projects yield n(n-1)/2 pairs. For 100 projects, there are 4950 pairs that a badgeholder could vote on. However, in Q+T voting, there are only 100 projects to vote on. If we define laziness as the percentage of projects a badgeholder votes on, then we need to ensure that the mapping is normalized. Consider the following example for 100 projects: if a laziness of 0.5 corresponds to a Q+T badgeholder voting on 50 projects, then we cannot say that a pairwise badgeholder will vote on 0.5*4950 = 2475 pairs in the pairwise method. They may vote on far fewer because even if pairwise is more accessible and less time-consuming, 2475 votes are still sizeable in absolute terms!
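The pair counts quoted above follow from the binomial coefficient "n choose 2", which equals n(n-1)/2. A short check, with a hypothetical helper name:

```python
from math import comb

def pair_count(n_projects):
    """Number of distinct project pairs: n * (n - 1) / 2."""
    return comb(n_projects, 2)

for n in (10, 100, 1000):
    print(n, pair_count(n))
# 10 45
# 100 4950
# 1000 499500
```

Multiplying the number of projects by 10 multiplies the number of pairs by roughly 100, which is why applying the same laziness fraction to both mechanisms would overstate how many pairwise votes a badgeholder realistically casts.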
To normalize the factors in the simulations above, we use the mapping described in Fig 5, which translates badgeholder laziness in the Q+T setting to a badgeholder laziness factor in the pairwise setting. We note that this mapping is imperfect and requires further investigation.
Fig 5: Mapping of laziness between the Q+T and Pairwise schemes.
A3 — Funding
This work was conducted through the OP funding vehicle and supported by Filecoin Impact Fund Public Goods funding.