Uses and definitions

· Allanderek's blog


Sometime in the early part of the century, I think about 2004, I saw a talk which analysed source code for defects. The authors had used online source code repositories for open source programs, which were fairly new at the time. They deemed a piece of code buggy if it was changed in a commit which fixed a bug. They then analysed the code (which was all written in Haskell) to see if they could determine differences between 'buggy' and 'non-buggy' code. I wish I could remember the authors or the title, but I cannot.

One of their results was that the further a use of a variable was from the definition of that variable, the more likely that use was to be in error. Again, I cannot remember the source talk, nor how convincing the evidence or the argument for this was. But it must have been pretty convincing at the time, since it's the one thing I remember from that talk, and I still remember it the better part of two decades later.

I use this to think about re-ordering definitions in let-in expressions. In particular I often end up with a final container (say an Html.div) with more or less just names:

    let
        ...
        secondPart_A =
            ...

        secondPart_B =
            ...

        secondPart_C =
            ...

        secondPart =
            Html.div
                [ Attributes.id "second-part" ]
                [ secondPart_A
                , secondPart_B
                , secondPart_C
                ]
        ...
    in
    Html.section
        []
        [ heading
        , firstPart
        , secondPart
        , thirdPart
        , footer
        ]

Why? Well, suppose instead I put the body of secondPart inline within the final expression, like so:

    -    secondPart =
    -        Html.div
    -            [ Attributes.id "second-part" ]
    -            [ secondPart_A
    -            , secondPart_B
    -            , secondPart_C
    -            ]
     Html.section
         []
         [ heading
         , firstPart
    -    , secondPart
    +    , Html.div
    +        [ Attributes.id "second-part" ]
    +        [ secondPart_A
    +        , secondPart_B
    +        , secondPart_C
    +        ]
         , thirdPart
         , footer
         ]

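Now the uses of secondPart_A, secondPart_B, and secondPart_C are much further away from their definitions than they were before. I could move those definitions down so that they sit just above the final expression, but I can only do that for one of these parts: only one group of definitions can sit last in the let, so inlining the other parts as well would leave their definitions further from their uses.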
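To illustrate why that only works for one part, here is a minimal sketch of what happens if two parts are inlined. The names firstPart_A and firstPart_B, the "first-part" id, and the Html.text placeholders are all made up for this example; in real code each definition would be many lines long, which is what makes the distances matter.

```elm
module Main exposing (main)

import Html exposing (Html)
import Html.Attributes as Attributes


main : Html msg
main =
    let
        -- Hypothetical sub-parts of the first part. With both parts
        -- inlined below, these can no longer sit next to their uses.
        firstPart_A =
            Html.text "first A"

        firstPart_B =
            Html.text "first B"

        -- Moved down so that they sit just above the final expression.
        -- Only one group of definitions can occupy this spot.
        secondPart_A =
            Html.text "second A"

        secondPart_B =
            Html.text "second B"
    in
    Html.section
        []
        [ Html.div
            [ Attributes.id "first-part" ]
            [ firstPart_A
            , firstPart_B
            ]
        , Html.div
            [ Attributes.id "second-part" ]
            [ secondPart_A
            , secondPart_B
            ]
        ]
```

Whichever group goes last, the other group's uses are separated from their definitions by everything in between, whereas naming firstPart and secondPart inside the let keeps each group right next to its own container.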

Anyway, the point is that when I'm making code-tidying decisions like these, one of my metrics for deciding which version is better is which one keeps my variable uses closer to their definitions. In addition, where you have some large section like this, you unavoidably have some variables used quite far from their definitions. However, if you keep such uses trivial, they are far less likely to be in error. Notice that in my first version, heading is necessarily far away from its definition, but the use is just a bare name and highly unlikely to be in error.