Code, Documentation, and Organization

Most programmers do not document their code well. That is a problem, because it makes code hard to read for others and even for its own author. It is also understandable: most programmers want to try new ideas quickly and get results. Many start a project by thinking through and structuring the code carefully before writing the first line. Then they need to make structural changes or add and remove features, things begin to get out of control, and the code eventually loses its structural organization.

A long time ago, Donald Knuth invented Literate Programming (LP) as a solution to these issues, especially the documentation one. The basic idea is that programmers start by organizing their thoughts into a document (often written in LaTeX or a structured text format), then write code fragments that follow that organization. The final product is a single file that contains both documentation and code, but its structure follows the logic of the documentation (the thoughts, the algorithm), with the code fragments embedded along the way. A (very) nice document can be produced from that file, and the program can be assembled from its code fragments. Quite a few tools support literate programming: cweb, noweb, etc.
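The "assemble the code from the fragments" step can be sketched in a few lines of Python. This is only an illustration, not how cweb or noweb are implemented: it assumes a simplified noweb-like syntax in which `<<name>>=` opens a named fragment, a line starting with `@` closes it, and `<<name>>` on its own line inside a fragment refers to another fragment. The names `parse_chunks`, `tangle`, and `DOCUMENT` are made up for this example.

```python
import re

# Assumed, simplified noweb-like markers (not a full implementation of any tool)
CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")        # "<<name>>=" opens a fragment
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")    # "<<name>>" refers to a fragment


def parse_chunks(source):
    """Collect the named code fragments; everything else is documentation."""
    chunks = {}
    current = None
    for line in source.splitlines():
        match = CHUNK_DEF.match(line)
        if match:
            current = match.group(1)
            chunks.setdefault(current, [])
        elif current is not None and line.startswith("@"):
            current = None  # back to documentation text
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name, indent=""):
    """Expand fragment `name` recursively, keeping the indentation of references."""
    lines = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            lines.extend(tangle(chunks, ref.group(2), indent + ref.group(1)))
        else:
            lines.append(indent + line if line else line)
    return lines


DOCUMENT = """\
We compute the mean of a list. The program has this overall shape:

<<*>>=
def mean(values):
    <<sum the values>>
    return total / len(values)
@

Summing is a simple loop, explained separately from the function shape:

<<sum the values>>=
total = 0.0
for v in values:
    total += v
@
"""

program = "\n".join(tangle(parse_chunks(DOCUMENT), "*"))
print(program)
```

The document is ordered the way the explanation flows (shape first, details later), while the tangled program comes out in the order the interpreter needs.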

However, literate programming never took off, and I don’t think it ever will. The major reasons, I believe, are the same reasons why programmers do not write good documentation and do not structure their code well. Still, LP can be very useful when you write code for a complex (mathematical) algorithm, such as those in optimization and machine learning. The code for these algorithms is often not long, perhaps a few dozen to a few hundred lines, but it must be well thought out; otherwise it may perform very badly or fail. For this type of code, the documentation may well be a scientific paper. LP ensures that you have good documentation of your code, which can be examined from the mathematical perspective without looking at the code itself. The resulting code has a better chance of being correct by design.

To help programmers write good documentation without much effort, many languages and tools allow richly formatted documentation to be embedded in code as comments (or with special syntax). Python, Ruby, Java, Matlab, and many other popular languages have this feature. A drawback of this approach is that the generated documentation must follow the logic of the code (the machine), not the logic of the thoughts (the human). It is therefore very useful for creating help files or manuals, but much less so for explaining the underlying algorithms.
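In Python, for instance, this kind of embedded documentation takes the form of docstrings. The short sketch below uses a made-up function, `step_size`, purely for illustration. It shows both the convenience and the limitation: the text is attached to one function and can be extracted automatically, but it is organized around that function rather than around the reasoning behind the algorithm.

```python
import inspect


def step_size(gradient_norm, base_rate=0.1):
    """Return a step size that shrinks as the gradient grows.

    The rule is ``base_rate / (1 + gradient_norm)``.
    """
    # Hypothetical rule, chosen only to have something to document
    return base_rate / (1.0 + gradient_norm)


# Documentation tools read the docstring back from the function object
doc = inspect.getdoc(step_size)
summary = doc.splitlines()[0]
print(summary)
```

Calling `help(step_size)` in an interactive session displays the same text.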

To solve the code structuring issue, we need a tool that makes it easy to organize the structure and hierarchy of our code. Note that code browsers that display code trees (classes, methods, attributes, etc.) do not count, because they only let you jump quickly to certain positions in your linear code. They are certainly helpful for coding, but less so for designing code structure, since they force you to think the machine’s way. A very good tool for creating well-structured code is the Leo editor. It is hard to explain how Leo works in a short blog post, so I suggest that you read its tutorial and slides, then try it. I believe it is worth your time to learn this tool. I have discovered that Leo is also good for writing presentation slides in LaTeX/Beamer.

Researchers have a different need regarding code and documentation. They want to write scientific documents (usually articles, but also technical reports and slides) with embedded plots or results from their experiment code (in Matlab, Python, R, etc.). The usual workflow is to run the code, export the plots or results to files, then include these files in the document. If you need 10 plots from the same code with 10 different sets of parameters, you repeat that process 10 times. Whenever you change the code, you repeat the whole process again. One way to make this easier is to embed executable code or commands in the document and have a tool run them automatically, then include the outputs in the document. Sweave, sagetex, and org-babel are a few of these tools (org-babel, included in org-mode, does more than that and is very powerful). In most cases, the documents are written in LaTeX.
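The idea behind these tools can be sketched in a few lines of Python. The sketch is not the implementation of Sweave, sagetex, or org-babel: the `\py{...}` marker, the `render` function, and the `run_experiment` stand-in are all hypothetical names invented for this example. The point is only that the document holds the expressions, and the numbers are filled in by running the code.

```python
import re

# Hypothetical inline marker: \py{expression}
INLINE = re.compile(r"\\py\{([^{}]*)\}")


def render(document, namespace):
    """Replace each inline marker with the value of its expression."""
    def evaluate(match):
        # eval is acceptable here only because the document is our own trusted text
        return str(eval(match.group(1), {}, namespace))
    return INLINE.sub(evaluate, document)


def run_experiment(rate):
    # Stand-in for real experiment code; returns a made-up score
    return round(100 * rate / (1 + rate), 1)


TEMPLATE = r"With rate 0.5 the score is \py{run_experiment(0.5)}."
rendered = render(TEMPLATE, {"run_experiment": run_experiment})
print(rendered)
```

When the experiment code or its parameters change, re-rendering the document updates every embedded result, with no manual export and include step.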

In summary:

  • If you write code that implements a complex mathematical algorithm, use Leo editor or Literate Programming. If you want a very nice document of your code, which can become a paper, do use LP.
  • If you write documents with embedded results/plots from your code, consider using a tool like Sweave, sagetex or org-mode.
  • Otherwise, use Leo editor if possible. Note that you can still use your favorite IDE with Leo. Leo is not just a code editor but a tool to manage complex coding projects and to organize code in a structured way. It is a very good tool and it may change your coding efficiency forever 🙂