"TBD: a new model for volunteer computing" - David Anderson


David Anderson, 1 June 2017

A summary of what David Anderson has been doing and thinking about recently.

Originally posted over on the official BOINC website by David Anderson, reposted here to maximize exposure to this news. I'd greatly appreciate any comments regarding the contents of this post so as to provide David feedback from sources outwith the BOINC community, cheers!


After 20 years, volunteer computing has had successes, but has not approached its potential.

  • VC was supposed to enable ground-breaking research by providing more computing power than was available or affordable otherwise. This has happened, but only to a small extent. The set of VC projects has been small and essentially static for 10 years. Of the scientists who use high-throughput computing and could benefit from VC, only a tiny fraction actually do.
  • VC was supposed to greatly increase global public interest in science; this has happened, but only to a small extent. The volunteer population is almost entirely from a single demographic (older, IT-savvy males) and has been gradually shrinking for ~10 years.

These problems can be traced to BOINC's original structural model: the "project ecosystem" model. In this model, there's a dynamic ecosystem of competing projects, the public learns about them and makes informed choices, the best projects get the most computing power, and the public learns and gets excited about science. BOINC is designed to encourage this model (e.g. cross-project IDs and credit).

The model was based on several assumptions:

  • It's sufficiently easy to create and operate a BOINC project that almost any computational science research group can do it.
  • Other than providing the software and a list of projects, BOINC should have no centralized functions or control; projects are autonomous.
  • Volunteers will evaluate the projects (by reading their web sites) and make rational decisions about which ones to support. Furthermore, they will do this repeatedly as new projects arise.
  • Projects will compete for volunteers by making compelling web sites that explain and promote their research.

The model didn't work as envisioned, for a number of reasons:

  • Creating and operating a VC project is harder than we realized: it requires a combination of resources and skills (Win/Mac programming, sysadmin, DB admin, web design, PR/outreach) that few academic research groups have.
  • For a research group, trying to use VC is a risk. There's a substantial investment, with no guarantee of any return, since no one may volunteer. Adding a VC component to a grant proposal adds uncertainty and weakens the proposal.
  • The computing needs of many research groups are sporadic - e.g. they need a big chunk of throughput every now and then. For such groups, buying computing time on a commercial cloud may be cheaper than using VC.
  • Attracting volunteers is a marketing exercise. It's difficult to do effective marketing when there are dozens of competing brands (i.e. project names).
  • Most volunteers aren't interested in, or willing to, survey and assess a large set of projects once, much less repeatedly.
  • We made little effort to interface, technically or politically, with the mainstream HPC/HTC world (Grid, Supercomputing, Condor, etc.). They came to view VC in negative ways: as a threat, a gimmick, etc. Around 2006 there was a brief, modest burst of interest in VC in the academic computer science world. Since then, nothing: no conferences on distributed computing list VC as a topic of interest. This has been damaging to VC; e.g. no one is working on solving the hard problems that arise in VC (such as how to grant credit).

A new model

I think we need to take what we've learned from the project ecosystem model and make a new and better model. I brought up this idea at the 2013 BOINC workshop, and proposed a model consisting of two related parts:

  • Partner with existing HTC computing providers such as supercomputing centers and science portals to add BOINC-based back ends. These projects would be operated by the provider's staff. Tens of thousands of scientists use such computing providers. These scientists would benefit from lower queueing delays, higher throughput and lower cost. But they wouldn't need to do anything; they wouldn't even need to know that VC is being used.
  • Create an account manager (let's call it "TBD" for now) acting as the primary volunteer interface. TBD lets volunteers express their preferences in terms of keywords (scientific areas and locations) rather than selecting specific projects. Based on these preferences, and corresponding keywords of projects and applications, TBD dynamically assigns computers to a set of vetted projects, which would include both existing (single-group) projects, as well as the new computing-center projects.
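To make the keyword idea concrete, here is a minimal Python sketch of how an account manager like TBD might turn one volunteer's keyword preferences into a ranked list of vetted projects. It is an illustration only, not TBD's actual design: the `Project` and `Volunteer` classes, the `choose_projects` function, and the split into "yes" keywords (areas to support) and "no" keywords (areas to avoid) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    keywords: set[str]   # hypothetical tags for scientific areas and locations
    vetted: bool = True  # only vetted projects may receive computers


@dataclass
class Volunteer:
    yes: set[str] = field(default_factory=set)  # keywords to support (assumed)
    no: set[str] = field(default_factory=set)   # keywords to avoid (assumed)


def choose_projects(vol: Volunteer, projects: list[Project]) -> list[str]:
    """Rank vetted projects for one volunteer by keyword overlap.

    Projects that are unvetted or carry any "no" keyword are excluded;
    the rest are ordered by how many "yes" keywords they match.
    """
    ranked = []
    for p in projects:
        if not p.vetted or p.keywords & vol.no:
            continue
        ranked.append((len(p.keywords & vol.yes), p.name))
    # Most matches first; ties broken by name so the result is deterministic.
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in ranked]


if __name__ == "__main__":
    catalog = [
        Project("ClimateSim", {"climate", "europe"}),
        Project("ProteinFold", {"biomed"}),
        Project("ParticleMC", {"physics"}),
        Project("Unvetted", {"biomed"}, vetted=False),
    ]
    vol = Volunteer(yes={"biomed", "climate", "europe"}, no={"physics"})
    print(choose_projects(vol, catalog))  # ['ClimateSim', 'ProteinFold']
```

Projects with no matching keyword are still kept (with a score of zero) unless the volunteer excluded them, so a volunteer with few preferences still gets work from the whole vetted set.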

On a technical level, the new model is enabled by our ability (thanks to Rom Walton and various people from CERN) to run jobs in virtual machines, and recent refinements by Marius Milea to support Docker on top of this. This makes it possible for HTC providers like TACC and nanoHUB (which already use Docker for app deployment) to run hundreds of existing applications with no porting or other per-app work.

This model addresses most of the problems with the previous one. Notes about the new model:

  • It doesn't interfere with or preclude existing BOINC activities. Current projects continue as they are. Scientists can create new single-group projects if they want. Volunteers can attach individual projects as they currently do, or use existing account managers like BAM! and GridRepublic.
  • TBD will act as an allocator of computing power. This will be based in part on user preferences, but there will of necessity also be a higher-level allocation policy, decided on by an organization. The decision process should include merit and need; it may include politics and money as well. NSF has an organization - XSEDE - that does this for NSF-funded computing resources. I'm in contact with XSEDE, and hope to include them in TBD. Involving NSF in the process is important; but this project needs to be international. This part of the model needs to be worked out at a high level.
  • The model focuses on large HPC-provider-level projects, but it actually encourages single-group projects, since they can apply for an allocation from TBD and be assured of computing power prior to making any investment.
  • TBD can serve as a brand for VC marketing purposes. It will also provide a basis for corporate partnerships; if technology or game companies want to support VC, they can support TBD rather than having to select individual projects.
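The allocation role described above could be implemented in many ways, and the actual policy is explicitly left to an organization to decide. As one possible mechanism, this Python sketch uses a simple deficit rule: each project is granted a share of total computing power, and each newly available unit of work goes to the eligible project that is furthest below its share. The function name `next_project`, the share values, and the units of delivered work are assumptions for illustration; "eligible" stands for whichever projects are compatible with the volunteer's preferences.

```python
from __future__ import annotations

from typing import Optional


def next_project(
    shares: dict[str, float],
    delivered: dict[str, float],
    eligible: list[str],
) -> Optional[str]:
    """Return the eligible project furthest below its allocated share.

    shares:    fraction of total power granted to each project (hypothetical
               output of the higher-level allocation policy)
    delivered: computing already delivered to each project, in any
               consistent unit (assumed)
    eligible:  projects compatible with the volunteer's preferences
    """
    total = sum(delivered.values())
    best: Optional[str] = None
    best_deficit = float("-inf")
    for name in eligible:
        share = shares.get(name, 0.0)
        if share <= 0:
            continue  # no allocation granted, so never assign this project
        # Positive deficit means the project has received less than its share.
        deficit = share * total - delivered.get(name, 0.0)
        if deficit > best_deficit:
            best, best_deficit = name, deficit
    return best


if __name__ == "__main__":
    shares = {"A": 0.5, "B": 0.3, "C": 0.2}
    delivered = {"A": 0.0, "B": 0.0, "C": 0.0}
    for _ in range(10):  # hand out 10 units of work, one at a time
        delivered[next_project(shares, delivered, ["A", "B", "C"])] += 1.0
    print(delivered)  # {'A': 5.0, 'B': 3.0, 'C': 2.0}
```

The appeal of a rule like this is that user preferences only restrict the candidate set, while the organization's shares decide the split among what remains; merit, need, or funding considerations would all be expressed through the `shares` table.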

Funding status

In 2014 I started thinking seriously about the new model, and I teamed up with two mainstream computing providers as test cases:

  • nanoHUB, at Purdue University, is a nanoscience portal. It provides web interfaces to computational tools that are used by thousands of scientists; many of these tools create HTC workloads well-suited to VC.
  • Texas Advanced Computing Center (TACC) is a major supercomputer center. A good fraction of their jobs are well-suited to VC.

Our goal is to create success stories that inspire all HTC providers to add VC back ends.

In 2014 we sent a proposal to NSF; it got good reviews but was rejected. We revised and resubmitted the next year, and in 2016 we were given 1 year of funding and encouraged to re-apply. We did, and recently learned that our latest proposal was funded for 3 years, starting this month. Yippee!

The proposal text is here, and the 1-page summary is here.

We didn't get all the money we asked for, which is par for the course. We got enough to pay my salary, and 50% salary for my collaborators at Purdue and TACC. I had hoped to be able to hire a web designer here at UCB; maybe I can find other sources of money to do this.

Relationship to BOINC

In 2016 BOINC became a community-run project; I don't control it. The new project, TBD, will be separate from BOINC. I hope that the BOINC community likes and supports TBD, but some people might not, and I don't want to step on their toes. Of course, I'm interested in hearing comments and criticisms about TBD, and in discussing it.

I've been mostly MIA from BOINC for the last couple of years, because I've been working full-time on other projects. I apologize for this. With this new funding, I'll be able to devote a good chunk of my time to managing and contributing to BOINC, e.g. setting up a functional release management process.

I suspect that relatively few current volunteers will use TBD; it's more for new users with wider demographics. So current projects won't lose computing power, and they should get additional power from TBD. Long term, I think something like TBD is our only hope for going from a few hundred thousand volunteers to millions or tens of millions. And such a rising tide will float all of our ships.

To implement TBD, I'll need to add some features to BOINC, e.g.:

  • The client will pass credit estimate information to account managers.
  • Account managers can send clients opaque data to be passed in scheduler requests (preference keywords in this case).
  • The scheduler will have a "keyword matching" option that takes user and job keywords into account. E.g. it will preferentially send biomed jobs to volunteers who want to support biomed.
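The last two features could fit together roughly as follows. This Python sketch assumes the account manager encodes a volunteer's keywords in a hypothetical `yes:...;no:...` string, which the client forwards untouched as opaque data; the scheduler decodes it, drops jobs tagged with an excluded keyword, and sends the best-matching jobs first. The string format and the `Job`, `parse_prefs` and `order_jobs` names are illustrative assumptions, not BOINC's actual scheduler interface.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    keywords: frozenset[str]  # hypothetical per-job keyword tags


def parse_prefs(opaque: str) -> tuple[set[str], set[str]]:
    """Decode a hypothetical 'yes:a,b;no:c' preference string.

    The client would pass this string through unchanged; only the
    account manager and the scheduler need to understand it.
    """
    yes: set[str] = set()
    no: set[str] = set()
    for part in opaque.split(";"):
        kind, _, words = part.partition(":")
        target = yes if kind == "yes" else no if kind == "no" else None
        if target is not None:
            target.update(w for w in words.split(",") if w)
    return yes, no


def order_jobs(jobs: list[Job], yes: set[str], no: set[str]) -> list[Job]:
    """Drop jobs with an excluded keyword; send best matches first."""
    allowed = [j for j in jobs if not (j.keywords & no)]
    # sorted() is stable: jobs with equal scores keep their queue order.
    return sorted(allowed, key=lambda j: -len(j.keywords & yes))


if __name__ == "__main__":
    yes, no = parse_prefs("yes:biomed,climate;no:physics")
    queue = [
        Job("phys1", frozenset({"physics"})),
        Job("gen1", frozenset()),
        Job("bio1", frozenset({"biomed"})),
        Job("bio2", frozenset({"biomed", "climate"})),
    ]
    print([j.name for j in order_jobs(queue, yes, no)])  # ['bio2', 'bio1', 'gen1']
```

Because matching only reorders the queue (apart from explicit exclusions), untagged jobs from existing projects remain available to every volunteer, which is consistent with these features having no impact on existing projects.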

These features will have no impact on existing projects.

The BOINC web site will link to TBD as well as BAM! and GridRepublic.

The TBD source code will be released under LGPLv3, and will be stored on GitHub. We'll welcome code contributions.

Names

I've been through a few names for TBD. The latest proposal calls it "Science United". This is OK, but it's a bit long and uninteresting. Also it conjures the ill-fated "United Devices", an early attempt at commercializing VC.

I thought about names starting with "Sci" and came up with:

"Sciborg": volunteers are assimilated into a collective intelligence. Too ominous.

"Sciphon": like we're siphoning off computing power. Has connotations of stealing gasoline.

"SciOn": where the "O" is the power-button icon. Power up Science! I like this one, though Scion is also a former car brand.

The bottom line: computer nerds shouldn't invent brand names. Hopefully I can get help from marketing/branding experts from the business world. The UCB business school teaches classes in this sort of thing.


Copyright © 2017 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.


This could be a good opportunity to work directly with David Anderson to involve GridCoin in this venture. I think that providing cryptocurrency (especially as a separate, optional incentive) has proven to be pretty big in terms of encouraging/incentivizing volunteer computing.

It wouldn't need to be anything explicitly endorsed by the project, as it never has been with BOINC. It would just be helpful to at least get some official support for making sure the devs at GRC can have access to the data they need to incorporate this into the reward mechanism.

I think that once we remove the mandatory team requirement, such a collaboration would be entirely possible!

Absolutely. We need only access to some sort of reliable tracking mechanism and then nothing stops us from incorporating this right into the GridCoin platform :-)

Great post, seems like a great project. But I didn't know what some of the stuff meant :P sorry. Upvoted and followed. I'll stay tuned for the next post! :D

Great post, thanks man for sharing.
