Ten Rules For Development of Biological Databases

Posted in #science, 7 years ago

Introduction

Today I read the article "Ten Simple Rules for Developing Public Biological Databases" in the open-access journal PLOS Computational Biology. The article is by Mohamed Helmy, Alexander Crits-Christoph, and Gary D. Bader, and it was published on November 10, 2016.

According to the authors, there are many public biological databases, and their quality varies widely. Some are sophisticated, user-friendly, stable, and professional. Others are difficult to use, outdated, and full of unreliable data. To improve the quality of data available to biologists, the authors propose ten rules, which are summarized below.


Ten Rules:

1. Don’t reinvent the wheel.

Begin with a comprehensive literature review to make sure that your database is consistent with, rather than a duplicate of, related resources in the field.

2. The three most important things in database development are data quality, data quality, and data quality.

Develop standard operating procedures and quality standards to ensure that your database contains the highest-quality data possible.
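
To make Rule 2 a little more concrete, here is a minimal Python sketch of an automated quality check that could run before records are loaded. The record fields, the accession format, and the function names are hypothetical assumptions for illustration, not anything the authors prescribe.

```python
import re
from dataclasses import dataclass
from typing import List


# Hypothetical record layout, for illustration only.
@dataclass
class GeneRecord:
    accession: str
    organism: str
    sequence: str


# Assumed accession format: two uppercase letters followed by six digits.
ACCESSION_PATTERN = re.compile(r"[A-Z]{2}[0-9]{6}")
# Allowed nucleotide codes (N = unknown base).
DNA_ALPHABET = set("ACGTN")


def validate(record: GeneRecord) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    if not ACCESSION_PATTERN.fullmatch(record.accession):
        problems.append(f"malformed accession: {record.accession!r}")
    if not record.organism.strip():
        problems.append("missing organism")
    if not record.sequence:
        problems.append("empty sequence")
    else:
        bad = set(record.sequence.upper()) - DNA_ALPHABET
        if bad:
            problems.append("invalid sequence characters: " + "".join(sorted(bad)))
    return problems


def find_duplicates(records: List[GeneRecord]) -> List[str]:
    """Return accessions that appear more than once, in sorted order."""
    seen, dups = set(), set()
    for record in records:
        if record.accession in seen:
            dups.add(record.accession)
        seen.add(record.accession)
    return sorted(dups)


if __name__ == "__main__":
    batch = [
        GeneRecord("AB123456", "Homo sapiens", "ACGTACGT"),
        GeneRecord("AB12", "", "ACGXT"),
        GeneRecord("AB123456", "Mus musculus", "GGCC"),
    ]
    for rec in batch:
        print(rec.accession, validate(rec))
    print("duplicates:", find_duplicates(batch))
```

The idea is that every rejected record comes with a readable reason, so curators can fix the source data instead of silently dropping it.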

3. Know your audience

Is your audience technically sophisticated enough to compose their own queries, or do they need a web interface? What will they do with the data? What tools and APIs will they need?

4. Use modern technology

Technologies such as HTML5, CSS3, and JavaScript are recommended. So are reusable tools such as Twitter Bootstrap, JavaScript libraries, Shiny for R, and Django for Python, along with freely available databases, both relational (such as MySQL) and NoSQL. MongoDB and Apache Lucene are recommended for distributing databases across multiple servers.

5. Put yourself in your user’s shoes

I can't say it better than the authors:

The process of graphical user interface design should be heavily influenced by principles of consistent and appealing graphical design, information visualization, and user-specific needs (see Rule 4).

6. Keep search simple and organized

Search should be quick and easy. The output should be well organized.
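
As a rough illustration of "simple and organized," here is a minimal sketch of a single-box search using Python's built-in sqlite3 module. The table, the sample rows, and the ranking rule (exact symbol matches first, then alphabetical) are my own assumptions, not the authors' design.

```python
import sqlite3
from typing import Dict, List

COLUMNS = ("symbol", "name", "organism")


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory gene table (sample rows for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (symbol TEXT PRIMARY KEY, name TEXT, organism TEXT)")
    conn.executemany(
        "INSERT INTO genes VALUES (?, ?, ?)",
        [
            ("TP53", "tumor protein p53", "Homo sapiens"),
            ("TP53BP1", "tumor protein p53 binding protein 1", "Homo sapiens"),
            ("TP63", "tumor protein p63", "Homo sapiens"),
            ("BRCA1", "BRCA1 DNA repair associated", "Homo sapiens"),
        ],
    )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, term: str, limit: int = 20) -> List[Dict[str, str]]:
    """One search box: match the term anywhere in the symbol or name.

    Exact symbol matches are listed first, then the rest alphabetically.
    LIKE is case-insensitive for ASCII text in SQLite by default.
    """
    pattern = f"%{term}%"
    rows = conn.execute(
        """
        SELECT symbol, name, organism
        FROM genes
        WHERE symbol LIKE ? OR name LIKE ?
        ORDER BY (symbol = ? COLLATE NOCASE) DESC, symbol
        LIMIT ?
        """,
        (pattern, pattern, term, limit),
    ).fetchall()
    # Return labeled rows so a web page or API can render them consistently.
    return [dict(zip(COLUMNS, row)) for row in rows]
```

For example, `search(build_demo_db(), "tp53")` returns TP53 ahead of TP53BP1. This sketch does not escape `%` or `_` typed by users, and a substring `LIKE` scan will not scale to very large tables; that is where a full-text index such as SQLite FTS5 or Apache Lucene would come in.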

7. Give users data where they need it

Some users want to access the data online interactively; others want it offline in a spreadsheet or some other tool. The database should be designed to deliver the data wherever users need it.
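
Here is a small sketch of what Rule 7 might look like in code: the same query results serialized as CSV for people who want a spreadsheet, and as JSON for people who want to load the data into their own programs. The field names and the idea of choosing the format from a `?format=` query parameter are hypothetical.

```python
import csv
import io
import json
from typing import Dict, List

Record = Dict[str, str]


def to_csv(records: List[Record], fields: List[str]) -> str:
    """Serialize records as CSV text with a header row (spreadsheet-friendly)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records: List[Record]) -> str:
    """Serialize records as JSON text for scripts and other programs."""
    return json.dumps(records, indent=2)


def export(records: List[Record], fmt: str) -> str:
    """Pick an output format by name, e.g. from a hypothetical ?format= parameter."""
    if fmt == "csv":
        fields = list(records[0].keys()) if records else []
        return to_csv(records, fields)
    if fmt == "json":
        return to_json(records)
    raise ValueError(f"unsupported format: {fmt!r}")
```

Keeping every format behind one function means a new format only has to be added in one place.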

8. Support open science

Publish your data model in a journal and your source code on GitHub.

9. Tell the world

"If you build it, they will come" is usually wrong. A database needs to be widely promoted in order to attract users. The authors suggest these steps:

  1. Publish an article describing the database.
  2. Index your website in search engines.
  3. Register your database in specialized online directories.
  4. Promote your database at scientific conferences and meetings.
  5. Monitor online user groups and inform them as appropriate.
  6. Actively use social media to attract users and keep them up to date.
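
Step 2 is one of the few promotion steps that can be automated. One common approach is to publish a sitemap so search engines can find every record page. Below is a minimal sketch that follows the sitemaps.org format; the base URL and the `/record/<id>` page pattern are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable
from urllib.parse import quote

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit set by the sitemap protocol


def build_sitemap(base_url: str, record_ids: Iterable[str]) -> str:
    """Return sitemap XML listing one page per database record.

    The /record/<id> URL pattern is a hypothetical example.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    count = 0
    for record_id in record_ids:
        count += 1
        if count > MAX_URLS_PER_SITEMAP:
            raise ValueError("too many URLs; split into several sitemaps")
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        # Percent-encode IDs so unusual characters do not break the URL.
        loc.text = f"{base_url.rstrip('/')}/record/{quote(record_id)}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The resulting file would typically be served at the site root and submitted through each search engine's webmaster tools.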

10. Maintain, update, or retire

The authors provide a set of guidelines to describe what this means:

  1. Use professionally managed servers.
  2. Use virtualization technology.
  3. Make the database available for download/mirroring.
  4. Make regular backups.
  5. Make your URL institution-independent.
  6. Automate monitoring and testing of the system availability and functions.
  7. Provide a mechanism for bug reports.
  8. Choose free and popular development technologies.
  9. If it's outdated and can't be maintained, shut it down and create a public archive.
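
Items 4 and 6 (regular backups and automated testing) are the easiest to script. Here is a minimal sketch assuming the database happens to be SQLite; that choice, the file-naming scheme, and the function names are my assumptions, and a real deployment would run something like this on a schedule (for example, from cron).

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def backup_database(src: sqlite3.Connection, backup_dir: str) -> Path:
    """Copy an open SQLite database into a timestamped backup file.

    Uses sqlite3's online backup API, so the database does not need to be
    shut down. Note: two backups within the same second share a file name.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = Path(backup_dir) / f"db-{stamp}.sqlite3"
    target.parent.mkdir(parents=True, exist_ok=True)
    dest = sqlite3.connect(str(target))
    try:
        src.backup(dest)
    finally:
        dest.close()
    return target


def count_rows(path: Path, table: str) -> int:
    """Automated smoke test: reopen the backup and count rows in a table.

    `table` must be a trusted name; it is interpolated into the SQL.
    """
    conn = sqlite3.connect(str(path))
    try:
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    finally:
        conn.close()
    return count
```

Checking that a backup can actually be reopened and queried matters as much as making it; an unreadable backup is discovered at the worst possible time.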

Conclusion

The rules are intended for biological databases, or "online libraries that contain structured information about living organisms," but they are really good guidelines for many types of public databases. For some reason, I still remember the software development life cycle from a college Systems Analysis textbook, circa 1989: "Survey, study, define, select, acquire, design, construct, deliver, maintain." Sadly, I have no idea which textbook it was, so I can't cite it (but I found the cycle in a diagram here, on the last page). It's nice to see that the authors' ten rules fit loosely into that model.


For more information, please read "Ten Simple Rules for Developing Public Biological Databases" by Mohamed Helmy, Alexander Crits-Christoph, and Gary D. Bader.


About the Author: @remlaps is an Information Technology professional with three decades of business experience working with telecommunications and computing technologies. He has a bachelor's degree in mathematics, a master's degree in computer science, and is currently completing a doctoral degree in information technology.


Thank you for taking the time to break this article down for us all and providing us with the opportunity to learn a bit more about how databases can be used in the hard sciences.

As a bonus, in addition to resteeming for exposure, we are awarding you a small 5 Steem Power deposit as a thank-you for creating quality STEM-related posts on Steemit. We hope you will continue to educate us all!

https://steemit.chat/channel/steemSTEM

What a pleasant surprise. You're welcome, and thank you too!

I find the same kinds of problems with all sorts of academic software: build it, publish it, and move on to the next grant application. That's why open software development as part of a larger community is so, so, so important.
