I am an economist by training and (gladly) stumbled upon the field of Data Science and Artificial Intelligence. As you can guess, my coding experience and training during my undergrad days were minimal. I tried my hand at Java programming as a credit-bearing module and only barely passed, because I could never get the syntax right; in other words, the compiler could never understand me...
During the Java programming class, the lecturer was very focused on teaching the syntax and how Java behaves (i.e. data structures and their characteristics). As a result, I was never introduced to any coding best practices...sadly.
In this blog post, I will share the coding best practices that I know. Non-Computer Science folks can adopt them quickly to make coding work easier, maintenance simpler and frustration lower.
Readability
Let me define what "Readability" means. It means that people, from laymen and beginners all the way to advanced coders, can understand what you are doing by reading through your code. When you email your code to another person, that person should be able to understand the thought process behind it without you being around to answer questions.
Why is that important? Well...have you ever been immensely enjoying a nice holiday, only to have it interrupted by a handful of calls from the colleague who is covering for you? Or have you ever come back to a program a month after you last worked on it and forgotten why you coded it one way and not another?
- Commenting
The first step to making your code readable is to comment it. Tell the future you and your colleagues what you are doing with the code, and state why it is done one way rather than another.
Learning to comment well takes trial and error, so it is worth starting early in your data science career.
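Here is a minimal sketch of the kind of commenting I mean. The data, the function name average_nonzero and the "zero means the store was closed" rule are all made up for illustration; the point is that the comments explain why each step is there, not just what it does.

```python
# Hypothetical example: monthly sales figures in thousands of dollars.
monthly_sales = [120.0, 95.5, 130.2, 0.0, 110.8]


def average_nonzero(values):
    """Return the average of the values, ignoring zeros."""
    # Why: in this made-up dataset a zero means "store closed that month",
    # not "no sales", so including it would understate the average.
    open_months = [v for v in values if v != 0.0]

    # Why: avoid dividing by zero when every month was a closure.
    if not open_months:
        return 0.0

    return sum(open_months) / len(open_months)


average_sales = average_nonzero(monthly_sales)
print(average_sales)
```

A colleague who opens this file while you are on holiday can see the assumption about zeros without having to call you.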
- "Simple" code
Most open-source programming languages have shortcuts. For instance, in Python,
a += 1
For a number such as a counter, this is equivalent to,
a = a + 1
Laymen or beginners can only guess at what the former does, and that is exactly the kind of misunderstanding we want to avoid. The latter is more straightforward, and even a layman can understand what the code is doing.
Your question might be, "When do we use the former, then?" Well, when your organization is mature in using code for day-to-day work and has many experienced programmers and users, the former can be used with less risk of mistakes.
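The same trade-off shows up beyond a single line. As a rough sketch, with made-up visitor counts and an assumed threshold of 100 for a "busy" day, here are a compact and an explicit way to count the busy days. Both give the same answer; the difference is how much Python the reader needs to know.

```python
# Hypothetical data: daily visitor counts for one week.
daily_visitors = [120, 80, 150, 95, 200, 60, 175]
busy_day_threshold = 100  # assumed cut-off for a "busy" day

# Compact version: short, but a beginner has to know what a generator
# expression is and that True counts as 1 when summed.
busy_days_compact = sum(v > busy_day_threshold for v in daily_visitors)

# Explicit version: more lines, but each step can be read aloud.
busy_days_explicit = 0
for visitors in daily_visitors:
    if visitors > busy_day_threshold:
        busy_days_explicit = busy_days_explicit + 1

print(busy_days_compact, busy_days_explicit)
```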
- Environment & Parameters Setup
Most of the time in data science work, there is a need to set up the coding environment, for instance by loading certain packages/libraries. Or perhaps the same set of programs (analysis) needs to be run for different time periods, so parameters (to set the different dates) need to be set up.
These are important setup steps, so they should be stated at the top of any program that you write. This lets anyone see how the environment was set up; without it, the code may not work as expected.
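One possible layout, sketched in Python, is shown here. The dates, the transaction records and the helper amounts_in_period are hypothetical; what matters is the order: imports first, then the parameters someone would change to rerun the analysis, then the analysis itself.

```python
# --- Environment setup: every import the script needs, in one place ---
import statistics
from datetime import date

# --- Parameters: change these to rerun the analysis for another period ---
START_DATE = date(2020, 1, 1)   # hypothetical start of the analysis window
END_DATE = date(2020, 3, 31)    # hypothetical end of the analysis window

# --- Analysis code starts here ---
# Made-up records of (transaction date, amount).
transactions = [
    (date(2019, 12, 30), 50.0),
    (date(2020, 1, 15), 20.0),
    (date(2020, 2, 10), 40.0),
    (date(2020, 4, 2), 90.0),
]


def amounts_in_period(records, start, end):
    """Return the amounts whose date falls between start and end, inclusive."""
    return [amount for day, amount in records if start <= day <= end]


selected = amounts_in_period(transactions, START_DATE, END_DATE)
print(statistics.mean(selected))
```

To run the same analysis for the next quarter, only the two date parameters at the top need to change.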
Maintenance
In the open-source languages I have come across, namely R and Python, I have realized that there are "many roads to Rome". This means that there are many ways to build the solution that we need.
As such, we need to do some planning when we code. What we are aiming for is the following:
- The minimum amount of code to achieve the same outcome. Less code, less maintenance.
- Efficient use of back-end resources, for instance I/O, memory, cache and computing time.
Being able to code with maintenance in mind takes practice. The more you code, the more aware you become of writing code that is easy to maintain.
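As a small illustration of the two points above, here is a sketch with made-up survey scores. The helper mean_score is a hypothetical name; it stands in for any calculation you would otherwise copy and paste. The last part shows one way to be lighter on memory in Python, assuming the values only need to be summed once.

```python
# Hypothetical data: scores from three made-up survey groups.
group_a = [3, 4, 5]
group_b = [2, 2, 4]
group_c = [5, 5, 4]


def mean_score(scores):
    """One shared helper instead of the same formula pasted three times."""
    return sum(scores) / len(scores)


# If the formula ever changes, it is changed in one place only.
means = {}
for name, scores in [("a", group_a), ("b", group_b), ("c", group_c)]:
    means[name] = mean_score(scores)

# A generator expression feeds sum() one value at a time, so no
# intermediate list of a million squares is held in memory.
sum_of_squares = sum(n * n for n in range(1_000_000))

print(means)
print(sum_of_squares)
```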
Practice Makes Perfect
At the end of the day, the more code you write, the better you will get at taking readability and maintenance into account. That was my experience when I was coding a lot. So I strongly urge you to start coding, learn to program and, wherever possible, keep the points above in mind: make sure people at all levels can understand what you are doing with your code, and make your code run more efficiently.
If you have more coding best practices to share, feel free to connect with me on LinkedIn or Twitter (@PSkoo). Thanks in advance! :)