The CPIDs (Cross Project Identifiers) issued by BOINC servers are used to track a user's contribution to science as they run BOINC projects. These projects include efforts to map the Milky Way, detect cancer, cure Zika, and many other applications. Each user should have one CPID that uniquely identifies them across all the projects they contribute to.
Gridcoin is a blockchain built on top of the BOINC platform that rewards users with cryptocurrency for their BOINC contribution as registered under the user's CPID. Unfortunately, it is possible for a user's contribution to be split across two or more CPIDs, which causes problems on both platforms. On the BOINC side, the user's contribution is not accurately reflected, while on the Gridcoin side the user is not paid for their full contribution.
Over the course of the past month I have suffered a split CPID twice, prompting endless hours of reading up on forums, talking with the community, and fiddling around with my machines. Having now solved the issue, I have compiled below what I hope will help anyone else navigate their way through the problem. This includes background information, prevention, fixes, and how to navigate your way through the endless content on this issue.
Why CPIDs Exist
There are three different types of CPID addresses: host, internal and external. The host CPID address uniquely identifies each machine, and the internal CPID address uniquely identifies each user on each individual project server. The CPID this article focusses on is the external CPID, henceforth just referred to as 'the' CPID.
CPIDs are one half of the identity set you control that is used to match accounts between BOINC project servers. The other piece of information is your email address. For obvious security reasons, projects are not able to export user email addresses as plain text in their statistics files, as these are publicly available. Further, supplying hashed email addresses risks a brute force attack by hackers who could feasibly hash many email addresses to compare them with the hashed addresses issued by the servers. Therefore, CPIDs are used to tie together user statistics in the stats files.
How CPIDs Are Assigned
Every time a user signs up to any BOINC project they have not registered with before, the project's server will concatenate several pieces of data about the user, including a current time stamp. This data is then hashed by the MD5 algorithm to generate the user's internal CPID. The external CPID is the MD5 hash of the concatenation of the internal CPID and the user's email address.
As a result of the process involving a time stamp, no matter what you do, you will NEVER be able to generate your current CPID when signing up to a new project. This is a common myth, and you can go test it out yourself.
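To make the role of the time stamp concrete, here is a minimal Python sketch of the hashing described above. Only the second step (MD5 of the internal CPID joined with the email address) comes from the description; the inputs to the internal CPID, and the function names, are placeholders I have assumed for illustration, since the real servers combine several pieces of data that are not listed here.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of a string as 32 lowercase hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_internal_cpid(user_data, timestamp):
    # Assumption: the real server mixes in several fields. Here a single
    # placeholder string is joined with the sign-up time stamp.
    return md5_hex(f"{user_data}{timestamp}")


def make_external_cpid(internal_cpid, email):
    # As described above: MD5 of the internal CPID joined with the email.
    return md5_hex(internal_cpid + email)


email = "user@example.com"
# Same user, same email, sign-ups one minute apart.
first = make_external_cpid(make_internal_cpid("example-user", 1500000000), email)
second = make_external_cpid(make_internal_cpid("example-user", 1500000060), email)

print(first)
print(second)
print("same CPID:", first == second)
```

Because the time stamp feeds into the first hash, two sign-ups by the same person at different moments give unrelated CPIDs, which is why a fresh sign-up cannot reproduce an existing one.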
What Causes CPID Splitting
After signing up to a new project, your CPID will technically be split immediately due to the generation of a new CPID. However, it will not affect you yet, as you have not made any contributions to the project. What happens next is that each host independently tries to settle on the correct CPID for the combined set of projects it is attached to, and then pushes for that CPID to become the CPID of all these projects. For example, imagine this scenario on a host:
Project A has CPID XXX
Project B has CPID XXX
Project C has CPID XXX
Project D has CPID YYY
The host will decide which CPID, XXX or YYY, it will now advertise as the correct CPID. A lot of factors go into this decision, including the amount of work done for each project, and which project is the oldest based on sign-up date. Therefore, the host could side with either XXX or YYY depending on a lot of external factors. It is not a plain 'majority rules' system.
One of three things will happen to your network of hosts at this stage:
The new CPID will promptly be overwritten by the older CPID, and you will probably never notice it even existed.
The new CPID will propagate throughout your host network, and all project servers will recognise it as your CPID. This means your old CPID is lost (which is fine, but on the Gridcoin side of things means you need to advertise a new beacon).
The new and old CPID will continue to 'compete' and the projects, hosts, and project servers will continue to oscillate between the two CPIDs, potentially indefinitely. There are many users on the BOINC and BOINCstats forums who claim to have waited for over a month without the issue being resolved. The likelihood of this outcome grows exponentially with your number of hosts and project separation between them.
How You Can Prevent Your CPID Splitting
Before you connect the new project to all your hosts, connect it to just one host that has all your projects attached to it. This makes it very likely that the new CPID will be overwritten. Once you have confirmed with the project server that the new project has accepted your old CPID, connect the project to any hosts you like.
This still has a chance to fail, so here is a stepwise method that guarantees adoption of your old CPID (this is not worth the hassle if you are only running BOINC on a few hosts):
1. On an isolated instance of BOINC with no other projects, add the new project. Allow the project to sync with the project server.
2. Shut down your BOINC client (make sure you shut down both the manager and the background process that actually performs the calculations).
3. In your BOINC data directory, find the file client_state.xml and open it with a text editor. Search for the tag <external_cpid>.
4. Replace the value inside the tag (which will be the newly generated CPID) with your old CPID. Save and close the file.
5. Start BOINC and tell the project to update. Your client_state.xml file and the project server will now swap their CPIDs.
6. Refresh the account page of the project, and make sure it has registered your old CPID.
7. Repeat steps 2 through 4 to correct the CPID in your client_state.xml file.
You can now connect your project to any of your hosts without risking the new CPID propagating, as it has already ceased to exist.
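If you would rather not edit the file by hand, the replacement of the value inside <external_cpid> can be scripted. The following is a rough Python sketch, not an official BOINC tool: the function name is my own, it assumes a CPID is a 32-character hexadecimal string (as an MD5 hash is), and it assumes the tag appears on one line as <external_cpid>value</external_cpid>. It writes a .bak copy before changing anything. Only run it while the BOINC client is fully shut down.

```python
import re
import shutil
from pathlib import Path

# Assumed layout: <external_cpid>32 hex characters</external_cpid>
CPID_TAG = re.compile(r"(<external_cpid>)\s*([0-9a-fA-F]{32})\s*(</external_cpid>)")


def replace_external_cpid(state_file, wanted_cpid):
    """Rewrite every <external_cpid> value in state_file.

    Returns the number of values that differed from wanted_cpid.
    """
    if not re.fullmatch(r"[0-9a-f]{32}", wanted_cpid):
        raise ValueError("expected a CPID of 32 lowercase hex characters")

    state_file = Path(state_file)
    text = state_file.read_text(encoding="utf-8")
    changed = 0

    def swap(match):
        nonlocal changed
        if match.group(2).lower() != wanted_cpid:
            changed += 1
        return f"{match.group(1)}{wanted_cpid}{match.group(3)}"

    new_text = CPID_TAG.sub(swap, text)
    if changed:
        # Keep a backup copy next to the original before overwriting it.
        shutil.copy2(state_file, state_file.with_name(state_file.name + ".bak"))
        state_file.write_text(new_text, encoding="utf-8")
    return changed
```

A call would look like replace_external_cpid("client_state.xml", your_old_cpid), where the path points into your own BOINC data directory. A return value of 0 means the file already held the wanted CPID and was left untouched.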
Is My CPID Split?
You can check if your CPID is split by logging into the project server of each BOINC project you are contributing to and checking the Cross-project ID field on your account page. For example, here it is for Moowrap: [screenshot of the Cross-project ID field]
Your CPID should match between all project servers. If it does not, you have a split CPID.
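With more than a handful of projects, comparing the values by eye gets tedious. Here is a small Python sketch of the comparison, assuming you have copied the Cross-project ID from each project's account page into a dictionary by hand. The project names and CPID values below are placeholders, and the function names are my own.

```python
def group_by_cpid(cpid_by_project):
    """Return {cpid: [project, ...]} from a {project: cpid} mapping."""
    groups = {}
    for project, cpid in cpid_by_project.items():
        # Normalise stray whitespace and letter case from copy and paste.
        groups.setdefault(cpid.strip().lower(), []).append(project)
    return groups


def is_split(cpid_by_project):
    """A CPID is split when the projects report more than one distinct value."""
    return len(group_by_cpid(cpid_by_project)) > 1


# Placeholder values standing in for real 32-character CPIDs.
observed = {
    "Project A": "1" * 32,
    "Project B": "1" * 32,
    "Project C": "1" * 32,
    "Project D": "2" * 32,
}

for cpid, projects in group_by_cpid(observed).items():
    print(cpid, "->", ", ".join(projects))
print("split:", is_split(observed))
```

The grouping also shows which projects are the odd ones out, which is useful later when deciding which CPID to keep.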
How To Fix A Split CPID - Classic Approach
The classic advice for fixing your CPID is to ensure your email addresses match across all the projects you signed up to, and to connect all your projects to one host. The second part of this prevents isolation of projects, which means the same result could be achieved with either of these set-ups:
Set-up 1:
Host A has projects X, Y
Host B has projects X, Z
Host C has projects Z, Y
Set-up 2:
Host A has projects X, Y, Z
Host B has project Y
Host C has project Z
An example of the isolation we need to avoid would look like:
Host A has projects X, Y
Host B has project X
Host C has project Z
This would form two clusters. Hosts A and B make up cluster 1 and host C makes up cluster 2. The two clusters share no project, so there is no way for them to communicate and come to CPID consensus.
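If you run many hosts, it can be hard to see by eye whether your set-up contains isolated clusters. The check is a standard connected-components grouping: two hosts belong to the same cluster if they share a project, directly or through other hosts. Below is a minimal Python sketch, with the host and project names taken from the examples above and a function name of my own choosing.

```python
def find_clusters(projects_by_host):
    """Group hosts that are linked, directly or indirectly, by a shared project."""
    clusters = []  # list of (set of hosts, set of projects) pairs
    for host, projects in projects_by_host.items():
        hosts, projs = {host}, set(projects)
        untouched = []
        for other_hosts, other_projs in clusters:
            if projs & other_projs:
                # Shares a project with this cluster: merge the two.
                hosts |= other_hosts
                projs |= other_projs
            else:
                untouched.append((other_hosts, other_projs))
        clusters = untouched + [(hosts, projs)]
    return [sorted(cluster_hosts) for cluster_hosts, _ in clusters]


connected = {"A": ["X", "Y"], "B": ["X", "Z"], "C": ["Z", "Y"]}
isolated = {"A": ["X", "Y"], "B": ["X"], "C": ["Z"]}

print(find_clusters(connected))  # [['A', 'B', 'C']]
print(find_clusters(isolated))   # [['A', 'B'], ['C']]
```

Anything other than a single cluster means at least one group of hosts cannot take part in the consensus with the rest.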
After following these two steps, the common advice is to wait 'up to 14 days' for the CPIDs to settle. This advice works fine for small numbers of hosts, but not for power users. It also does not always work, and I do not understand why as it is a problem internal to BOINC.
How To Fix A Split CPID - New Approach
If your CPID has already split, it can also be fixed immediately. Follow these steps for the quickest and most thorough fix, especially if running a lot of hosts:
1. Ensure your email address is the same on every project server. While you can sync your CPID up with different email addresses, it appears the servers take far longer to reach consensus. I do not know why.
2. On one machine, add all projects.
3. Close all your BOINC clients (managers and background processes), except the one machine with all projects.
4. Update all projects on the remaining machine. Promptly close its client and background processes. All, or almost all, project servers will now display the same CPID. This is the CPID you are going to keep.
5. For every one of your hosts, find the client_state.xml file in your BOINC data directory. Replace all the unwanted CPIDs contained in the <external_cpid> tags with the desired CPID that most project servers are now displaying. Save and close the files.
6. Sign up to the account manager BAM!, and add all your projects. There is a CPID field in BAM! which will display your CPID. This is selected by BAM! as the most common CPID across all the projects connected to it. This should be the CPID we decided to keep in step 4.
7. In your BAM! project settings, set all projects to attach automatically to new hosts.
8. One by one, start up your BOINC clients and add them to BAM!, meaning all your hosts will be connected to both BAM! and all projects.
9. Wait a few minutes after the last host has been added and check the CPID on all project servers. They should all have conformed to the same CPID.
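Before restarting the clients, it is worth confirming that no unwanted CPID survived the manual edit on any host. Here is a read-only Python sketch of such a check. It assumes you have gathered a copy of each host's client_state.xml in one place, that a CPID is a 32-character hexadecimal string, and that the tag appears as <external_cpid>value</external_cpid>. The function names and the example paths are hypothetical.

```python
import re
from pathlib import Path

# Assumed layout: <external_cpid>32 hex characters</external_cpid>
CPID_VALUE = re.compile(r"<external_cpid>\s*([0-9a-fA-F]{32})\s*</external_cpid>")


def stray_cpids(state_text, wanted_cpid):
    """Return every CPID in the text that differs from wanted_cpid."""
    found = {value.lower() for value in CPID_VALUE.findall(state_text)}
    return found - {wanted_cpid.lower()}


def audit(state_files, wanted_cpid):
    """Map each file path to the unwanted CPIDs it still contains."""
    report = {}
    for path in state_files:
        text = Path(path).read_text(encoding="utf-8")
        strays = stray_cpids(text, wanted_cpid)
        if strays:
            report[str(path)] = sorted(strays)
    return report
```

For example, audit(["host1/client_state.xml", "host2/client_state.xml"], wanted) returns an empty dictionary once every file holds only the CPID you decided to keep. Nothing is written to disk, so it is safe to run repeatedly.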
I am sure that many of you will ask:
Why bother changing the client_state.xml file at all? Why not remove all projects and add them back in with BAM!?
The reason is that if one of the project servers is still pushing for the wrong CPID, the hosts that are attached to that project and carry your manually altered CPID will push for a change to the CPID you chose to keep. This will speed up the process, and reduce the chance that the unwanted CPID returns straight after going through the above steps.
How does each project server know which CPID to support?
Each project server will support a CPID based on several factors, some of which I cannot figure out. I am confident that the percentage of hosts that connect to it for each CPID does play a role. I also know that the servers save every CPID connected to them, because I have seen an old CPID reappear momentarily after it was present on none of the hosts.
Why should I care as a BOINC user?
If your CPID is split, your contributions will be split across two accounts which muddies your statistics. Fixing the split will ensure all your contributions are returned to one account. You do not lose ANY credit as a result of the split.
Why should I care as a Gridcoin user?
If your CPID is split, you will only be paid out for contributions attributed to the Gridcoin CPID you have registered and are staking for with your wallet. All contributions registered to the other rogue CPID are minted straight into the void and lost. You cannot recover these coins, even after fixing the split. However, you keep any RAC you built up during the split, and as a result your magnitude too.
Are there any security implications with this method?
Yes, as you can use it to register any unregistered CPID to your wallet with relative ease. A few days ago I noticed there was an unregistered CPID that made up 0.4% of team Gridcoin's BOINC contributions that anyone could have registered. Once a CPID is registered, it cannot be stolen, because collecting its mint requires the public and private keypair from the Gridcoin wallet config file.
I would like to thank all the BOINC and Gridcoin community members who have taken the time to add to this investigation. Without you all, I would have never gotten this one figured out. Special thanks to @vortac, @ravonn, @m3rcos1ty, @neuralminer, @nateonthenet and Deltik.