Most of us have already seen something like "Must be able to write Clean Code" in the list of requirements for some data science and/or software development opportunity, but what does it mean, and why is it important?
Robert C. Martin (Uncle Bob), one of the professionals behind the Agile Manifesto (2001), explored the principles of Clean Code extensively in his book Clean Code: A Handbook of Agile Software Craftsmanship, originally published in 2008.
In short, Clean Code is about being professional: writing maintainable and testable code so that programmers can be more productive. It comprises a set of techniques that allows developers to go fast in the long run.
This post summarizes some of the main topics covered in Uncle Bob's book, but reading the book in full is highly encouraged.
Why does Clean Code matter?
Unless a project is in its initial stage, most software development work involves maintaining and extending existing code, meaning that far more time is spent reading code than writing new lines.
Not following the principles of Clean Code can drastically increase the time needed to read and fully understand a piece of code, leading to bottlenecks and large losses. In extreme circumstances the development speed can slow down almost to a halt.
Clean Code is a great way to go fast and stay productive in the long run, and in the following section we explore how to write it.
How to write Clean Code?
To make this post lighter to read and easier to consult later, this section is divided into a few lists that roughly summarize what I believe are the most important and easiest-to-implement principles of Clean Code, leaving denser and more advanced topics, like S.O.L.I.D., for future posts.
Chapter 2 - Meaningful Names
Any software is filled with names, and choosing them wisely can take time, but doing so significantly improves code readability. When choosing names for variables and functions, as well as files and directories, keep the following in mind:
- Names should be relevant and explicit. No relevant information should be left implicit;
- Names should encode intent, and should reveal information regarding what something is and what it does without the need of comments;
- Names may be long if necessary;
- Names should be intuitive and searchable. Using pronounceable names may help with that;
- Prefer verbs for naming functions and methods, and nouns for classes and objects;
- Avoid using "magic numbers" (unexplained constants). Use named constants instead;
    # Bad Example
    vf = vi + 9.81 * t

    # Good Example
    gravitational_acceleration = 9.81
    final_velocity = initial_velocity + gravitational_acceleration * time_in_seconds
Chapter 3 - Functions
Many of the considerations that apply when writing clean functions also apply when writing clean classes, which are covered in Chapter 10 of the book.
I will explore that in more detail in a future post about S.O.L.I.D., but in short, some of the key factors to consider when writing functions are:
- Functions should be small. They should have one single responsibility and accomplish it in the simplest way possible;
- A function should be reusable in multiple places throughout the code to make maintenance easier in the long run;
- Don't Repeat Yourself (DRY):
  - If you find yourself repeating a piece of code, evaluate if it could become a function;
  - This ensures no ambiguity and applies to different aspects of the project's development, such as documentation, tests and databases;
- Use as few arguments as necessary. Too many arguments, especially boolean ones, make it harder to respect the single responsibility rule;
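As a rough illustration of the points above, here is a minimal sketch in Python. The function names (`parse_price`, `apply_discount`, `total_price`) and the price format are hypothetical, made up for this post rather than taken from the book. Each function does one thing, takes few arguments, and the parsing logic lives in a single place instead of being repeated.

```python
def parse_price(raw_price: str) -> float:
    """Convert a price string such as '$1,200.50' into a float."""
    return float(raw_price.replace("$", "").replace(",", ""))


def apply_discount(price: float, discount_rate: float) -> float:
    """Return the price after applying a discount rate between 0 and 1."""
    return price * (1 - discount_rate)


def total_price(raw_prices: list[str], discount_rate: float = 0.0) -> float:
    """Sum the parsed prices and apply a single discount to the total."""
    # The parsing rule is written once in parse_price and reused here (DRY).
    subtotal = sum(parse_price(raw_price) for raw_price in raw_prices)
    return apply_discount(subtotal, discount_rate)
```

If the price format ever changes, only `parse_price` needs to be updated, and each small function can be tested on its own.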
Chapter 4 - Comments
Comments do not make up for bad code and should often be avoided, but if you find yourself in a situation where comments are really necessary, consider the following:
- Do not explain the code. Well-written code should be self-explanatory, and there should be only one source of truth, which is the code itself;
- Comments lie:
  - As code is refactored, comments are not always updated and eventually stop representing the code they are attached to;
  - A comment that is false is worse than the absence of a comment;
- Instructions for other programmers and rationale behind decisions might be worthy of comments, but should be kept to a minimum, and should be reviewed and updated as the code evolves;
- Comments that generate documentation are good. The further removed your documentation is from the source code, the more likely it is to become outdated. Embedding the documentation directly into the code is sometimes a good strategy;
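To make the last two points more concrete, here is a small sketch in Python. The function `round_to_cents` and the business rule it mentions are assumptions invented for this example. The docstring is the kind of comment that documentation tools and `help()` can pick up, and the single inline comment records why a decision was made instead of restating what the code does.

```python
from decimal import ROUND_HALF_UP, Decimal


def round_to_cents(amount: Decimal) -> Decimal:
    """Round a monetary amount to two decimal places.

    Args:
        amount: The amount to round, as a Decimal.

    Returns:
        The amount rounded to the nearest cent, with ties rounded
        away from zero.
    """
    # Rationale: the built-in round() rounds ties to the nearest even
    # digit, so ROUND_HALF_UP is requested explicitly (assumed business rule).
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Calling `help(round_to_cents)` in a Python session prints the docstring, so the documentation lives right next to the code it describes.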
Chapter 5 - Formatting
- Formatting is about communication within a team, and good formatting improves code readability and maintainability;
- It can be done with indentation, vertical and horizontal alignment, spacing and other rules enforced by the IDE;
- There are some tools that automate part of the formatting job in some languages. Black is an interesting option if you are working with Python;
Chapter 7 - Error Handling
Things can go wrong, but the professional programmer makes sure that the code always does what it is supposed to do. When handling errors, consider the following:
- Use exceptions over error codes:
  - Exceptions are easier to debug;
  - When using exceptions, it is possible, but not necessary, to handle every case;
- Treat exceptions and try-catch blocks appropriately:
  - Your "catches" should always lead the program to a consistent state;
  - You should be able to determine the source of errors, so create informative messages to go along with any exception;
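The list above could look something like the following sketch in Python (where try-catch is spelled `try`/`except`). Everything here is hypothetical and made up for illustration: the `InvalidTemperatureError` class, the two functions and the fallback behavior. The exception carries an informative message, and the `except` block leaves the program in a consistent state: either every reading is valid or the fallback is used, never a half-parsed list.

```python
class InvalidTemperatureError(ValueError):
    """Raised when a temperature reading cannot be used."""


def parse_temperature(raw_value: str) -> float:
    """Parse a reading in degrees Celsius, raising on invalid input."""
    try:
        temperature = float(raw_value)
    except ValueError as error:
        # Keep the original error as the cause to make debugging easier.
        raise InvalidTemperatureError(
            f"Could not parse temperature from {raw_value!r}: expected a number"
        ) from error
    if temperature < -273.15:
        raise InvalidTemperatureError(
            f"Temperature {temperature} is below absolute zero (-273.15)"
        )
    return temperature


def load_readings(raw_values: list[str], fallback: list[float]) -> list[float]:
    """Return all parsed readings, or the fallback if any reading is invalid."""
    try:
        return [parse_temperature(raw_value) for raw_value in raw_values]
    except InvalidTemperatureError:
        # Consistent state: no partially parsed list ever escapes.
        return list(fallback)
```

Compare this with returning an error code such as `-1`: the caller could easily forget to check it and carry on with a value that looks like a valid temperature.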
Chapter 8 - Boundaries
It is very common to work on projects that depend on third-party software, packages and libraries. Sometimes they are bought, sometimes we rely on open source projects, and sometimes we use code developed by colleagues from our own organization.
Either way we must set the boundaries to integrate foreign code into ours. To keep your code clean, consider the following:
- Write tests for the third-party code:
  - It is a great opportunity to learn how to use it;
  - It enables us to detect behavioral differences when there are new releases of the third-party packages;
- Write wrapper APIs:
  - Your code remains unchanged when updating or migrating (only the wrappers change);
  - Testing updates and migrations becomes easier;
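Both ideas can be sketched in a few lines of Python. To keep the example self-contained, the standard library's `json` module stands in for a third-party package, which is an assumption made purely for illustration. The `ConfigSerializer` class and the test function name are also hypothetical.

```python
import json
from typing import Any


class ConfigSerializer:
    """Wrapper that hides the serialization library from the rest of the code."""

    def dumps(self, config: dict[str, Any]) -> str:
        # Only this class knows that `json` is used underneath.
        return json.dumps(config, sort_keys=True)

    def loads(self, text: str) -> dict[str, Any]:
        return json.loads(text)


def test_json_round_trip_preserves_nested_values() -> None:
    """Learning test: documents the library behavior our code relies on."""
    original = {"name": "model", "layers": [1, 2, 3], "debug": False}
    assert json.loads(json.dumps(original)) == original
```

If the library were swapped for another one, only `ConfigSerializer` would change, and the learning test would reveal whether the replacement (or a new release) behaves differently.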
Chapter 9 - Unit Tests
Code is only really clean once it is validated with tests, and test code should also follow the Clean Code principles already mentioned. When writing tests, also consider:
- One assert per test is ideal;
- Tests should be F.I.R.S.T.:
  - Fast. Tests should run quickly to enable frequent execution;
  - Independent. Tests should not depend on each other. You should be able to run only a small group of tests, and one failure should not cascade into others;
  - Repeatable. Tests should be repeatedly executable in different environments (Q.A., production, development, etc.);
  - Self-validating. Tests should report a clear pass or fail (True or False), so that results are not subjective and require no specific knowledge to interpret;
  - Timely. Tests should be written along with the code (ideally before it), so that the code never becomes too complicated to test;
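Here is a minimal sketch of what such tests might look like with Python's built-in `unittest` module. The `final_velocity` function is a hypothetical example that reuses the formula from the Meaningful Names section. Each test has a single assert, shares no state with the other, needs no external resources, and simply passes or fails.

```python
import unittest


def final_velocity(
    initial_velocity: float, acceleration: float, time_in_seconds: float
) -> float:
    """Velocity after accelerating uniformly for a given time."""
    return initial_velocity + acceleration * time_in_seconds


class FinalVelocityTest(unittest.TestCase):
    def test_zero_time_returns_initial_velocity(self):
        self.assertEqual(final_velocity(5.0, 9.81, 0.0), 5.0)

    def test_velocity_grows_linearly_with_time(self):
        # assertAlmostEqual avoids brittle exact comparisons of floats.
        self.assertAlmostEqual(final_velocity(0.0, 9.81, 2.0), 19.62)
```

Assuming the code is saved in a file such as `test_velocity.py` (a made-up name), running `python -m unittest test_velocity` executes both tests.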
Boy Scout rule
One last piece of advice: whenever you write code, try to follow the Boy Scout rule:
- Leave your code base cleaner than you found it:
  - If it is safe to do so;
  - If your code is already covered by tests;
  - Rename variables and functions, and perhaps break large functions down into smaller ones, but avoid large-scale refactoring so you don't waste time;
- Writing good code is not good enough. The code has to be kept clean over time, and we must play an active role in it;
What are the results of Clean Code?
If you consistently follow the principles of Clean Code, you will have:
- Readable, testable and maintainable code that is easy to change, validate and extend;
- Productive developers who can easily implement new features and are happy working on the code;
- Predictable behavior, with no surprises;
- A more scalable project;
Last but not least, according to Michael C. Feathers, also involved in the early Agile movement:
“Clean code always looks like it was written by someone who cares.”