On Data, Code and Organization
So, I don't often post about the arguably drier side of my life. I have hopes of helping fellow Steemians who might be looking to start businesses. If this isn't you, there may still be some content you will find useful (written in layman's terms).
So outside of being an aspiring farmer, cryptocurrency enthusiast, and home brewer (I will post on this soon I promise), I am a nerd. It's true. I have been throwing lines of code down for over 20 years now and I have learned a ton along the way.
Oftentimes data is the heart of the system, whereas the code is sort of the nervous system, or the brain in some cases. There are countless non-optimal ways of doing data and development. Standards, structure and consistency are key to building exceptional systems that require less maintenance and redesign.
What I have here for your review are a few lessons I have learned along my path. They may seem simple, but often are not practiced by companies.
- Get yourself a good architect (or two); it will pay dividends. No one wants to hear this one; it is like suggesting someone find a good CPA or lawyer. Programming isn't that hard, you wrote something in school. Databases are easy to set up these days; you could have a stack built on a virtual machine in minutes and start building. True. None of these tasks are that difficult to do, but they are very difficult to master. Knowing industry-standard best practices and how to avoid issues that will surface years down the road is what you are looking for here. Like a CPA, you may not need them on staff, but consulting to do sanity checks or periodic reviews will pay off.
- K.I.S.S. - Not every issue you will encounter is worthy of an exception. Design modularly so you can reuse code, test each function, and adhere to standard naming conventions and design principles. If you want this system to last and scale, you cannot approach it like creating a spreadsheet. If what you are doing seems that complex, you might need to rethink, redefine, and refactor.
- Design considerations. These days you have hype about cloud (other people's computers), big data, NoSQL, IoT and a million other things. All are great solutions to specific issues, and I literally couldn't count the number of meetings I have been in where professionals were trying to apply the wrong tech to a simple problem.
For the sake of simplification, let's stay in the world of relational databases here. We have two basic high-level considerations:
a) Is this a transactional database where people will use applications to insert and read data? Or
b) Will this system be used for analysis and reporting?
Each of these has nuances, but the designs should look radically different. Transactional databases need to be structured in a specific manner that eliminates the storage of duplicate values by breaking repetitious and optional data into new tables to form a relational model.
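As a rough illustration of the transactional side, here is a minimal sketch using Python's built-in sqlite3 module. The `customer` and `customer_phone` tables, their columns, and the `phones_for` helper are all made up for this example. Phone numbers are optional and can repeat, so they get their own table rather than `phone1`, `phone2` columns on the customer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL
);
-- Optional, repeating data: a customer may have zero or many phone numbers.
CREATE TABLE customer_phone (
    phone_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    phone       TEXT NOT NULL
);
""")

conn.executemany(
    "INSERT INTO customer (customer_id, first_name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
conn.executemany(
    "INSERT INTO customer_phone (customer_id, phone) VALUES (?, ?)",
    [(1, "555-0100"), (1, "555-0101")],
)
conn.commit()


def phones_for(conn, customer_id):
    """Return every phone number on file for one customer (possibly none)."""
    rows = conn.execute(
        "SELECT phone FROM customer_phone WHERE customer_id = ? ORDER BY phone_id",
        (customer_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

A customer with no phone simply has no rows in `customer_phone`, so there are no empty placeholder columns to handle, and the foreign key stops a phone from pointing at a customer that does not exist.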
Reporting systems often need to Extract, Transform and Load (ETL) data from various sources into a data warehouse or storage. If done correctly, the relationships should resemble the spokes of a wheel (often called a star schema). One noble goal of such systems is to have all of the data considered disposable (meaning you can throw it all away on a Wednesday night and have it all recreated by Thursday morning).
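To make the "disposable" idea a bit more concrete, here is a small sketch, again with sqlite3. Everything in it is hypothetical: `SOURCE_SALES` stands in for rows extracted from a transactional system, and the `dim_date`, `dim_product` and `fact_sales` tables are one possible star layout, with the fact table in the middle and the dimensions as spokes. The point is that `rebuild_warehouse` drops everything and recreates it from the source.

```python
import sqlite3

# Hypothetical extracted rows: (day, product, quantity, unit price in cents).
SOURCE_SALES = [
    ("2024-01-03", "widget", 2, 999),
    ("2024-01-03", "gadget", 1, 2450),
    ("2024-01-04", "widget", 5, 999),
]


def rebuild_warehouse(conn, source_rows):
    """Throw the whole warehouse away and recreate it from source data."""
    conn.executescript("""
    DROP TABLE IF EXISTS fact_sales;
    DROP TABLE IF EXISTS dim_date;
    DROP TABLE IF EXISTS dim_product;
    CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, day  TEXT UNIQUE NOT NULL);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE fact_sales (
        date_id       INTEGER NOT NULL REFERENCES dim_date(date_id),
        product_id    INTEGER NOT NULL REFERENCES dim_product(product_id),
        quantity      INTEGER NOT NULL,
        revenue_cents INTEGER NOT NULL
    );
    """)
    for day, product, quantity, unit_cents in source_rows:
        # Transform: map natural keys to dimension keys, creating them if needed.
        conn.execute("INSERT OR IGNORE INTO dim_date (day) VALUES (?)", (day,))
        conn.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (product,))
        date_id = conn.execute(
            "SELECT date_id FROM dim_date WHERE day = ?", (day,)
        ).fetchone()[0]
        product_id = conn.execute(
            "SELECT product_id FROM dim_product WHERE name = ?", (product,)
        ).fetchone()[0]
        # Load: one fact row per source row.
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
            (date_id, product_id, quantity, quantity * unit_cents),
        )
    conn.commit()


def revenue_by_product(conn):
    """Typical reporting query: join the fact table to one dimension."""
    rows = conn.execute("""
        SELECT p.name, SUM(f.revenue_cents)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.name
        ORDER BY p.name
    """).fetchall()
    return dict(rows)
```

Running `rebuild_warehouse` twice against the same source gives the same report, which is the property you want before trusting the warehouse as disposable.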
- On that note, what is your disaster recovery plan? Failing to plan is planning to fail when it comes to data. Seriously, protect your investments. Prove out your plans and version control your code as well as database objects.
- Design with intent to scale and internationalize. Stop limiting your success! Eventually, you will want to add a new product, integration or something. Now is the time to plan to weed out rookie mistakes (time stamps without time zones, non-localized constraints, triggers updating a table in another database, variables named after snack cakes or other nightmares) and build consistency into the model. Do this early.
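The time zone mistake is an easy one to show. Below is a minimal sketch using only Python's standard library; the helper names `to_storage` and `for_display` are made up for this example. The assumption is that you store everything in UTC and convert only when showing it to a user.

```python
from datetime import datetime, timedelta, timezone


def to_storage(ts):
    """Convert an aware timestamp to a UTC ISO-8601 string for storage.

    Naive timestamps (no time zone) are rejected instead of guessed at.
    """
    if ts.tzinfo is None or ts.utcoffset() is None:
        raise ValueError("refusing to store a timestamp without a time zone")
    return ts.astimezone(timezone.utc).isoformat()


def for_display(stored, tz):
    """Parse a stored UTC string and convert it to the viewer's time zone."""
    return datetime.fromisoformat(stored).astimezone(tz)


# Fixed offsets keep the example self-contained; a real system would more
# likely use named zones so daylight saving rules are handled.
new_york_winter = timezone(timedelta(hours=-5))
tokyo = timezone(timedelta(hours=9))

order_placed = datetime(2024, 1, 15, 9, 30, tzinfo=new_york_winter)
stored = to_storage(order_placed)      # same instant, expressed in UTC
shown = for_display(stored, tokyo)     # same instant, expressed in Tokyo time
```

The stored value and the displayed value describe the same moment; only the presentation changes per user.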
- Consistency is key... Even if you are wrong consistently, it will be easier to fix. Naming conventions, code structure, data types, schema design... All of it.
For example, a given application has a database. Anything else that wants to read from or write to its database must use an API. The application is then updated to use the API as well for common functions to eliminate duplicate code and ensure consistency.
The theme of consistency should be carried to the API as well to standardize requests and replies. Pass this API call an integer (ID) and it returns a string (first name), etc. This makes clean code where you aren't wasting time handling unknown or missing values.
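Here is a tiny sketch of what that contract could look like in Python. The names `get_first_name`, `NotFoundError` and the in-memory `_USERS` store are all hypothetical stand-ins; a real API would sit in front of the application's database. The idea is that the call takes an integer and returns a string, and anything else fails loudly rather than leaking `None` or an empty value to the caller.

```python
# Hypothetical in-memory store standing in for the application's database.
_USERS = {1: "Ada", 2: "Grace"}


class NotFoundError(LookupError):
    """Raised when the requested ID does not exist."""


def get_first_name(user_id: int) -> str:
    """Integer in, string out. Never returns None or an empty placeholder."""
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(user_id, bool) or not isinstance(user_id, int):
        raise TypeError("user_id must be an integer")
    try:
        return _USERS[user_id]
    except KeyError:
        raise NotFoundError(f"no user with id {user_id}") from None
```

Callers can then write `name = get_first_name(1)` and use the result directly, with one well-defined error path for missing records instead of null checks scattered through the code.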
- Use the right tools for the job. Listen, there is no reason for most startups to have enterprise-level monitoring systems. Similarly, there is no excuse for an enterprise not to be aware of systems issues. Monitoring is important at all levels; however, there is an economy of scale here.
- Leverage the data you are collecting or have a plan to scrap it. Strive to make data actionable; empower users with the information they need to make informed decisions. That's what it is all about, isn't it?
A good approach is to ensure the success of your lowest-level users, and you will get better quality data for analytics. Just consider that feature/enhancement request from the people adding data as a requirement and add some icing on their cake. Seriously, I couldn't count the number of times this has made the difference in projects I was working on, and it is almost always low-hanging fruit.
- Garbage in, garbage out. I remember taking a class in Java after having already developed in it for years. I was marked down on an assignment for sanitizing my inputs. Long story short, I put the teacher on blast for perpetuating bad practices in my industry and had the school's dean review the grade (which was fixed). Data cleansing starts in the application. For example, an email address has a max length of 255 characters, must contain only one "@" symbol followed by characters, and must end in a "." followed by at least two letters.
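Those email rules translate into a few lines of code. This is a sketch of just the checks described above, not a full standards-compliant validator; the function name `is_plausible_email` and the constant are made up for this example, and the 255 limit is simply the number used in the text.

```python
import re

MAX_EMAIL_LENGTH = 255  # the limit used in the text; adjust to your own standard

# Exactly one "@", something on each side of it, and a final "." followed
# by at least two letters. Whitespace is not allowed anywhere.
_EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def is_plausible_email(value):
    """Return True if the value passes the basic shape checks."""
    if not isinstance(value, str):
        return False
    if len(value) > MAX_EMAIL_LENGTH:
        return False
    return _EMAIL_PATTERN.fullmatch(value) is not None
```

A check like this runs in the application before anything is written, so the database never has to store a value that downstream reports will need to clean up.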
- Security can no longer be an afterthought. We are WAY past that point. You need to be testing code in development as well as servers and other attack vectors. Know your weaknesses and have a plan to monitor, log and protect them. This includes social engineering vectors like free thumb drives lying in the parking lot.
My final thoughts here are that good developers are lazy (in a good way) and good DBAs are focused on ensuring the availability, security and quality of both data and data objects (schema). With a collaborative environment, this is a powerful combination of skills that is difficult for one person to pay equal attention to. Identify, address and remove toxic people from the team as soon as possible.
Hope this helps. If you have questions, please do feel free to leave a comment. I tried to write this in English; however, at times Tech does seem to come out.