What Makes For Good Functions?
Since functions are used for many different purposes, what makes any particular function good depends on context. Imagine trying to apply Command-Query Separation (CQS) to a main function. So I'm not talking about every function everywhere; I'm talking about what makes a function work well as a building block.
TLDR: matches problem complexity, helps clients (honest type signature), no side effects
Complexity
Complexity for our purposes is just how much we need to understand in a given scope.
Compare:
static void PrintItems(int[] items) {
    for (int i = 0; i < items.Length; i++) {
        Console.WriteLine(items[i]);
    }
}
and
static void PrintItems2(int[] items) {
    foreach (var item in items) {
        Console.WriteLine(item);
    }
}
The second is simpler because there is less going on. It directly describes iterating over a collection instead of an i variable, indexing a collection, and incrementing a variable. One concept vs three.
Consider:
public virtual bool IsBar {
    get { return !Index.HasValue && IsEnabled; }
}

public virtual bool IsFoo {
    get {
        if (!IsEnabled || (!Index.HasValue && !IsDetailed)) {
            return true;
        }
        else {
            return false;
        }
    }
}
How many tests do we need to run to exercise all of this? If we count branches, it's 6.
This is equivalent to:
public bool IsFoo {
    get {
        return !this.Index.HasValue && this.IsEnabled;
    }
}
How many tests for this? 3?
| Index.HasValue | IsEnabled | Result |
|---|---|---|
| true | * | false |
| false | true | true |
| false | false | false |
All problems have some amount of inherent complexity. A good function matches that complexity. This is similar to reducing a truth table.
Building Blocks
The point of a function is to combine several lower level things into one understandable chunk. A good function is:
- Understandable in isolation (which means possible to completely understand)
- Easy to test (which means we get automated testing, which acts as a spec and helps us to remember business decisions; see the sketch after this list)
- Combinable without nasty surprises, which means it can be used in a lot of places safely.
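To make the testability point concrete, here's a minimal sketch. It assumes the IsFoo logic from earlier is pulled out into a standalone static function; the names and the use of Debug.Assert are just for illustration, not from the original code.

static bool IsFoo(int? index, bool isEnabled) {
    return !index.HasValue && isEnabled;
}

client (one check per row of the truth table, doubling as a readable spec):

System.Diagnostics.Debug.Assert(IsFoo(index: 1, isEnabled: true) == false);     // row 1: a present Index wins
System.Diagnostics.Debug.Assert(IsFoo(index: null, isEnabled: true) == true);   // row 2
System.Diagnostics.Debug.Assert(IsFoo(index: null, isEnabled: false) == false); // row 3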
Totality and referential transparency are two properties of functions that make for good building blocks.
Partial vs Total Functions
Functions map from input to output. A function's signature declares the set of possible input values and the set of possible output values.
int Add1(int x) { return x + 1; }
The signature here indicates the function will take an integer (roughly -2.1 billion to 2.1 billion) and return another integer.
> int Add1(int x) { return x + 1; }
> Add1(1)
2
> Add1(Int32.MaxValue)
-2147483648
>
We can see that this function does indeed take any integer, and return an integer. This is a total function. The signature is honest and means what it says.
int Div(int x, int y) { return x / y; }
This signature says given 2 ints, it’ll return an int. Is this true?
> Div(5, 0)
Attempted to divide by zero.
+ Submission#50.Div(int, int)
No. This is a partial function. This function does not do what it says. If the signature were honest it would be:
int OR Exception Div(int x, int y)...
The partial aspects of functions are great places for bugs to hide. Total functions are much easier for clients to use correctly, for two reasons: they encapsulate the implementation more cleanly, so clients don't need to understand in detail how the function is written, and the compiler helps point out potential problems. This means they can be combined much more easily, and fewer problems show up only when functions are combined.
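As an illustration, one possible way (a sketch, not the only option) to make Div total in C# is to move the failure case into the return type with a nullable int:

int? Div(int x, int y) {
    // Integer division has two failure cases: dividing by zero, and
    // int.MinValue / -1, which overflows. Both become "no result" here
    // instead of an exception, so the signature tells the whole story.
    if (y == 0 || (x == int.MinValue && y == -1)) {
        return null;
    }
    return x / y;
}

A caller now has to deal with the "no result" case before it can treat the return value as an int, so the decision about division by zero is made at the call site instead of surfacing later as an exception.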
Referential Transparency (aka “purity”)
This just means that there are no side effects, and that it doesn’t depend on the environment. The technical definition is that you can replace the function with its value without observable differences.
int Square(int a) {
    return a * a;
}
client:
var x = Square(2); //4
var z = Square(2);
is the same as:
var x = Square(2);
var z = x;
Contrast with
int Foo(int x) {
    int y = 0;
    var rand = new Random();
    for (int i = 0; i < x; i++) {
        y = rand.Next();
    }
    return y;
}
client:
var x = Foo(2);
var y = Foo(2);
is different from:
var x = Foo(2);
var y = x;
Foo is NOT referentially transparent and, everything else being equal, is worse. Its inputs are the parameters plus something else out in the world somewhere. Prefer functions that don't have “something out in the world somewhere” as a hidden input, because referentially transparent functions are much nicer to use and understand.
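One way to remove the hidden input (a sketch, not the only option) is to pass the source of randomness in explicitly so it shows up in the signature:

int Foo(int x, Random rand) {
    // The Random instance is now an explicit parameter instead of a hidden input.
    int y = 0;
    for (int i = 0; i < x; i++) {
        y = rand.Next();
    }
    return y;
}

This still isn't strictly referentially transparent, since calling rand.Next() advances the Random's internal state, but the dependency is now visible to the caller and controllable, for example by passing new Random(seed) from a test.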
Moral?
For our programs to actually work, we need side effects, and sometimes totality isn't worth it. By solving the core problem with total, referentially transparent functions and pushing side effects to the boundaries, we can solve most problems while keeping the benefits of totality and referential transparency. In practice this typically means separating input and output from processing.
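For example, here is a minimal sketch of that separation (the names are made up for illustration): the calculation is a total, referentially transparent core, and the reading and writing stay at the edges.

// Pure core: no I/O and no hidden inputs, so it's easy to test in isolation.
static int SumOfSquares(int[] numbers) {
    int total = 0;
    foreach (var n in numbers) {
        total += n * n;
    }
    return total;
}

// Impure boundary: input and output live here; the real work is delegated
// to the pure core above.
static void Main() {
    var numbers = new[] { 1, 2, 3 };    // imagine these came from a file or the console
    Console.WriteLine(SumOfSquares(numbers));    // 14
}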
Complexity can be reduced by
- matching the complexity of the problem, and
- splitting it into pieces that can be understood in isolation. Total, referentially transparent functions make great building blocks because they’re easily understood in isolation and don’t have surprising behavior when combined with other components.
Next up: What makes a good class?