How to write good quality research software
This is a guide to how to write what we consider to be good quality research software. It collects together the advice we give to researchers at the University. It is not exhaustive. Please get in touch if you think we are missing something or there is something you think we should change.
What is "good quality" research software?
There is no one right or wrong way to write research software. Indeed, there isn't even one good description of what research software is and is not. For example, an Excel spreadsheet with embedded code is a valid piece of research software and, if produced well, can certainly be considered "good quality". Short data analysis and visualisation scripts, or code to automate or control robots, collect sensor data etc. are just as valid and valuable pieces of research software as decades-old supercomputing codes.
A good indicator of "good quality" software is evidence that the author(s) have thought about the users or audience of their software. How easy is the software to install and use? How well described or documented is it? How trustworthy and verifiably correct is the software? How easily can the software be adapted to solve the research problems of its user community? How efficiently does the software use precious compute resources and energy when solving those problems? The better the answers to these questions, the better quality the software is likely to be.
How do I write good quality software?
How to write research software thus really depends on its scale, the intended audience of people who will use the software, and how long the software is expected to be "alive" (used, developed and maintained).
In our opinion, a good piece of research software has:
- some idea of who the expected audience of the software will be,
- some idea of how long the software is expected to be useful and supported,
- some documentation explaining what the software does,
- some verification checking that the software correctly does what it says it does,
- and a license that gives permission for others to copy, use and, if appropriate, further develop or build on the software.
Why do I need to decide on an audience?
It is important to know, or at least have a fair guess at, who you expect will use your software. This will help you pitch the documentation and user interface for your software at the right level for that audience. For example, if you only expect people in your specific research field to use your software, then you can assume a base level of domain-specific knowledge.
The size of your expected audience is also useful to know (or guess). If you expect your software will have a small audience (maybe just your future self and immediate colleagues?) then you won't need to write as much documentation as you would if you expect to have a large audience. Beyond size, your approach to documentation will also differ depending on whether the audience is local (your research group) or global (anyone who finds your software online). A local audience can just talk to you to get their questions answered, while for a global audience you probably want more explanatory documentation and a formal communication route (e.g. GitHub Issues).
Why do I need to decide how long the software is expected to be useful?
Software needs to be maintained and cared for to remain useful. New users will have questions that need to be answered, there will be bugs that need fixing, and there will be new operating systems, processors or language versions to which your software may need porting. In the best case, a community of users and developers will grow and your software will become the foundation of a large body of work. Keeping your software updated, and managing and growing your software community, will take a lot of (often unpaid) time and effort.
Because software takes effort to stay useful, it is important to know (or guess) how long that effort could be provided, and thus how long your software could "live". Some software is written only to be used on a single project and is not expected to be useful for more than a few months. Other software is written as part of a large endeavour and will be expected to be useful and maintainable for decades. The amount of thought that has to go into documentation, verification, community management, fund-raising etc. is much greater for software that is expected to live for decades rather than months. Equally, when you start a new piece of software, it can be helpful to be realistic about how much "free" effort you have available to donate to maintenance and community management. If you have little time, or your software is only needed for a short time, then it may be better to write your software as part of a larger community project than to try to write something new and standalone. Contributing something small to an existing larger project is often a better use of your time than writing a whole new software framework from scratch.
Why do I need documentation?
Documentation is the most important feature of good quality software. Software you write for yourself tends to have little documentation. This is because you already know what the software does and how to use it. Documentation is needed when you want to communicate this information to other people (and, in particular, to other people who find your software online and who don't know you). Good documentation will:
- clearly describe the purpose of the software, and the kinds of problems that it can be used to solve,
- provide information about who wrote the software, and how to ask questions, raise issues or report bugs,
- provide installation instructions so that users can install the software on their own computers,
- provide at least a single example of use of the software, so that new users can see how the software is used and try to reproduce those outputs.
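As a minimal sketch of the points above, the snippet below shows one way to keep a description and an example of use right next to the code, using a Python docstring. The function `celsius_to_kelvin` and its module are hypothetical and exist only for illustration. The `>>>` lines are in the format understood by Python's standard `doctest` module, so the documented example can also be re-run by users to check that they get the same outputs.

```python
"""Hypothetical example module: temperature conversion helpers."""


def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin.

    Example of use:

    >>> celsius_to_kelvin(0.0)
    273.15
    >>> celsius_to_kelvin(-273.15)
    0.0
    """
    # Reject temperatures below absolute zero, which are not physical.
    if celsius < -273.15:
        raise ValueError("temperature is below absolute zero")
    return celsius + 273.15


if __name__ == "__main__":
    # Running this file directly re-checks the examples in the docstring.
    import doctest
    doctest.testmod()
```

For a small script aimed at a local audience, a docstring like this plus a short README may be all the documentation that is needed.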
Excellent documentation goes even further. Excellent documentation will:
- include tutorials that show how the software can be used to solve different types of problems,
- include detailed guides for developers to help them understand how the software is written and to begin making contributions of their own,
- include more information about how users can get help, perhaps even including "frequently asked questions",
- include a workplan or roadmap so that the community can see what features are planned for development in the software,
- include a changelog and release plan so that the community can have an idea of when they should expect new releases and how often they come out, and can see what has changed between releases.
Why do I need verification?
Verification is a way of demonstrating that your software correctly completes whatever task is promised in the documentation. Verification builds trust, as it helps potential users of your software validate that it is working, and that it "does what it says on the tin".
At its simplest, verification can mean providing some input and output files with your software, and then encouraging users to verify that the output files they produce with their copy of the software on their own computer are identical to the ones you supply. This lets users check that their copy of your software behaves in the same way on their computer as it does on yours.
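The file comparison described above could be sketched as follows. This is only an illustration: the function name `find_mismatches` and the idea of keeping reference outputs in one directory and freshly produced outputs in another are assumptions made for the example, not a prescribed layout.

```python
import filecmp
from pathlib import Path


def find_mismatches(reference_dir, output_dir):
    """Return the names of reference files that are missing from,
    or differ from, the files of the same name in output_dir."""
    mismatches = []
    for reference in sorted(Path(reference_dir).iterdir()):
        if not reference.is_file():
            continue  # only compare plain files, skip sub-directories
        produced = Path(output_dir) / reference.name
        # shallow=False compares file contents, not just size and timestamps
        if not produced.is_file() or not filecmp.cmp(reference, produced, shallow=False):
            mismatches.append(reference.name)
    return mismatches
```

A user would run the software on the supplied inputs and then call, for example, `find_mismatches("expected_outputs", "my_outputs")` (both directory names are hypothetical). An empty list means every supplied output was reproduced exactly. Note that this check is byte-for-byte, so outputs containing floating-point numbers or timestamps may need a more tolerant comparison.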
As your software grows, you will find that this simple method of verification becomes unwieldy. This is a good point to start adding testing to your software. Testing means adding small tests that can be run automatically on parts of your code, and that validate that the outputs match what you expect. These tests could be on large parts of the code (so-called "integration tests"), or they could be on individual functions or small units of your software (so-called "unit tests").
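To make the idea of a unit test concrete, here is a minimal sketch using Python's standard `unittest` module. The function `mean` is a hypothetical stand-in for a small unit of your own software; the tests check it against outputs that are known in advance.

```python
import unittest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if len(values) == 0:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    """Unit tests: each method checks one small, known behaviour."""

    def test_known_values(self):
        self.assertEqual(mean([1, 2, 3, 4]), 2.5)

    def test_single_value(self):
        self.assertEqual(mean([7]), 7)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            mean([])
```

If this were saved in a file such as `test_stats.py` (a hypothetical name), the tests could be run with `python -m unittest test_stats`. An integration test would look similar, but would exercise a larger part of the code, for example by running a whole analysis on a small input and checking the final result.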
Adding these tests will enable you to automate the verification of your code. You can use tools, such as GitHub Actions, to automatically run your tests whenever you make any changes to the software and commit them to Git. This automatic running of tests on commits is called "continuous integration". Continuous integration can be used to add badges to your Git repository that show that your code is tested regularly and that the current version passes all tests. This really helps build trust in your software, especially if you are aiming for a global audience of people who won't necessarily have met you or know who you are.
Our Best Practices in Software Engineering workshop provides more help on how to add unit tests to your software.
Why do I need to choose a license?
Every piece of software that is written is protected by copyright. A license is needed to give permission for everyone other than the copyright holder to share and, if permitted, build upon or modify the software. Our How to License Research Software guide explains why a license is needed and provides guidance on which license to choose for your software.