Distributed Data Processing - Freeing the Power of Information

Hello team! I'd like to share with you a short snippet from a project I am working on for BOINC/GRC. This is an early draft and should be treated as such.
I welcome all input, criticism, and corrections to grammar, content, and spelling, and above all, open and ego-free discussion on the subjects of:
data collection and processing
distributed processing
BOINC
If you would like to chat privately please feel free to PM me on slack.
Why Data?
An increased quality of living in a society often coincides with an increase in that society’s ability to gather and process data. One of the earliest examples of data collection and processing is likely related to farming, the technology which allowed humans to settle. Seeds were strewn about, crops grew, people collected data regarding the quantity and quality of the harvest under varying conditions, and people processed that data to inform their future actions. In order to interact with data, humans use the technologies which sparked humanity’s data revolution: Language and Calculation.
As a result of this propensity for gathering and processing data, communities were able to form. Then towns, cities, empires, and nations – society continued to collect magnitudes upon magnitudes of data. Which way do the winds blow? How do the seasons cycle? How do groups of people act, react? How do the gods act, react? Even the movement of the stars was analyzed and processed over thousands of years. Tools, from intricate machinery to the most detailed epics, were developed that improved humanity's ability to communicate and calculate. These tools were refined, and over time the human capacity for data collection and processing has only increased. The song, the story, the epic poem. Paper, the printing press, the internet. Addition, division, the abacus, the number 0. This human proficiency is what has given us our current understanding of our universe, our world, and ourselves. The quest to build upon this proficiency has driven countless major technological and sociological advancements.
The Z1
Humans relied on Language and Calculation to collect and process data for millennia. Automated binary processing changed everything.
Binary is a language that can be understood by humans. However, the human condition (mind, body, soul if you believe in that sort of thing) makes binary far less efficient for inter-human interaction than our current language structures; it’s impractical to use binary for communication between humans. The opposite is true when binary is input to a machine instead of a human. It makes sense: why would instruments of logic use human language, a tool rife with context, metaphor, and idiosyncrasies? Languages and calculations were developed in ways that help human minds process data which human bodies collect. Computers do the work of the human body and the mind in ways we can’t even fathom and at rates we’ve only just begun to explore.
In the 80 years since the Z1, computers have been collecting exponentially more data. Soil pH, particle density, climate trends, political will, social response, medical success rates, molecular results, particle composition, astronomical calculations, genomic structures, everything – it could be said that the post-binary world is an infinite cacophony of data points touching every conceivable subject.
It could also be said that, for the most part, humans and our extraordinary minds have tried and failed to process this data. When binary was developed, we had the communication tool but lacked the resources for calculation. Humans had the calculations but lacked the language to deal with such large datasets.
Centralized supercomputers, built, operated, and owned by individuals or organizations, have been the primary processing tools for binary calculation. There have been some insane supercomputers in our time, but access to their utility is hard to come by even for respected institutions, let alone the average scientist, to say nothing of a kid working on a project. In other words, access to a tool critical to improving civilization is controlled by a limited number of well-connected and wealthy individuals or organizations. This is not a bad thing, but simply how technology develops. Only the Egyptian elite kept records of society’s food supply. Only monks knew how to read and write. Only the government and universities had access to the early internet.
Consider: Language and Calculation were not, and still are not, taught to all people, but where they are, the standard of living of a society is, in general, greatly improved.
Distributed Processing
Distributed Processing takes a large dataset and splits it into manageable units. These units are distributed to multiple processors which are all connected by a unified network. The units are processed by the nodes of this network, and their results are returned to the network and delivered to the original data host (or wherever the protocol of the network dictates). The host (or network) uses these results to formulate an answer to the best of its ability. Essentially, individuals or organizations offer their available processing power to individuals or organizations which have parsable data. This network of individuals and organizations creates an entity analogous to an enormous supercomputer, and it can be centralized or decentralized. If structured to prioritize volunteer-oriented infrastructure which inhibits the use of task-specific processors, the network would dissolve the resource barrier inherent to rent-based supercomputing. Under a decentralized system, everyone from a first-year computer science student to top-level researchers would have access to the world’s most powerful supercomputer.
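To make that cycle concrete, here is a toy Python sketch of the split/distribute/aggregate loop described above. The "nodes" are just local worker processes and the calculation (a sum of squares) is a stand-in; this is a minimal illustration of the idea, not BOINC's or any real network's protocol.

```python
# Toy model of distributed processing: a host splits a large dataset
# into work units, "nodes" (here, local worker processes) process the
# units independently, and the host aggregates the returned results.
from concurrent.futures import ProcessPoolExecutor

def split_into_units(dataset, unit_size):
    """Host side: cut the large dataset into manageable work units."""
    return [dataset[i:i + unit_size] for i in range(0, len(dataset), unit_size)]

def process_unit(unit):
    """Node side: each node independently processes one unit. The
    'science' here is just a sum of squares, standing in for whatever
    calculation the data host actually needs."""
    return sum(x * x for x in unit)

if __name__ == "__main__":
    dataset = list(range(1_000_000))           # the large dataset
    units = split_into_units(dataset, 10_000)  # manageable units

    # Distribute the units across the "network" of processors.
    with ProcessPoolExecutor() as network:
        results = list(network.map(process_unit, units))

    # The host aggregates the returned results into a final answer.
    print(f"{len(units)} units processed, result = {sum(results)}")
```

A real network layers scheduling, validation, and redundancy (sending the same unit to multiple nodes and comparing results) on top of this basic cycle.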
Consider: by some estimates, the Bitcoin network's raw processing power is 50,000 times that of the top 500 supercomputers in the world combined, and it is only growing.
BOINC (The Berkeley Open Infrastructure for Network Computing)
BOINC is an open-source, volunteer-based distributed computing platform which provides scientists and enthusiasts with a means to host data that needs to be processed. BOINC has been operating since 2002 and has processed, and continues to process, data that helps map the Milky Way, detect asteroids, find prime numbers, fold proteins, test chemical and molecular combinations, search for extraterrestrial intelligence (SETI), and more. BOINC projects are created at no cost by anyone who has generated or gathered parsable data and has formatted that data for the BOINC network. This data can be anything – scientific, mathematical, social, political – anything. It can be processed, at only the cost of electricity, by anyone with a device connected to the BOINC network, from a cell phone to a server.
This structure of volunteer hosting and processing distributes the power of information among the proletariat instead of solely among those who possess the resources to process large volumes of data. It is similar to creating a public school system which prioritizes teaching reading, writing, and mathematics.
Further, BOINC is designed in a way which limits the advantage of task-specific processors (ASICs and GPU farms). Instead, it encourages the utilization of idle processors, such as a personal computer while its user is asleep or at work. This model has the potential to scale limitlessly as the Internet of Things seeps further into digital cultures.
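As a rough illustration of that idle-processor model, a volunteer worker might only request work when the machine's CPU is otherwise quiet. This is not the real BOINC client logic (the actual client has far more sophisticated scheduling preferences), and the IDLE_THRESHOLD value and fetch_work_unit placeholder below are assumptions made for the sketch.

```python
# Sketch of an idle-only volunteer worker loop. This is NOT the real
# BOINC client logic; it just illustrates the idea of pulling work
# only when the machine's CPU is otherwise quiet.
import time

import psutil  # third-party: pip install psutil

IDLE_THRESHOLD = 10.0  # percent CPU use below which we call the machine "idle"

def fetch_work_unit():
    """Hypothetical placeholder: a real client would download a work
    unit from the project server here."""
    return list(range(10_000))

def process(unit):
    """Placeholder computation standing in for the project's science app."""
    return sum(x * x for x in unit)

while True:
    # Sample system-wide CPU use over one second.
    if psutil.cpu_percent(interval=1) < IDLE_THRESHOLD:
        result = process(fetch_work_unit())
        print(f"processed a unit while idle, result = {result}")
    else:
        time.sleep(60)  # machine is busy; check again later
```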
Hi @jringo, nice wrap-up of the past from a very specific angle. I think it strikes a chord. We just need to see that we do not use all our compute power to calculate quite useless hashes :)
Second paragraph, towns probably shouldn't be capitalized.
A couple of sentences after that should probably read "How do groups..."
If you want a fun example of early data gathering related to science you should check out this article.
Interesting read though!
Oooo yes! Excellent example of human data gathering and processing which informed future actions. They ended up raising entire cities, such as Chicago, to put in sanitation systems after John Snow!
(Who says Jon Snow knows nothing... eh? xD)
And thank you for catching those! I have corrected the errors. = )
Nice article, but what is the direct purpose? Public education about BOINC?
These three sections are meant to explain why data gathering and processing are important; what distributed computing is and how it can benefit society, particularly if structured through distributed or decentralized governance; and how BOINC fulfills the goals of volunteer-based distributed computing.
Since they are part of a larger piece, it's hard to say the direct purpose, but education is a major aspect.
I posted them to try to spark a conversation = )
I don't really agree with the last paragraph. BOINC was not designed as a counter to coin mining, nor as an alternative to other mining, as I read between the lines. BOINC existed well before coin mining.
It's the flexibility which BOINC offers that makes it hard to optimize, and it is this same flexibility that will make it hard for BOINC to enter IoT devices, except maybe the bigger ARM-powered ones.
We agree that BOINC was not made to be a coin mining service.
We also agree that the lack of BOINC project uniformity is an issue to overcome in the future, but I think the benefits of BOINC's flexibility outweigh its drawbacks, particularly because I think BOINC can be built as an ASIC-resistant platform without losing too much of its flexibility.
I think we mostly agree but that paragraph just isn't very clear = )