In computer software, less is more. If you're looking at programs with a certain set of capabilities, often the best will perform those functions with the fewest underlying ideas. Software of that type will be easier to learn because there's less to get your head around, and it will have the most potential because it can accomplish a lot by putting together just a few different kinds of things in interesting ways.
This observation is true for all of software engineering, from applications to APIs down to computer languages themselves. The best software designs will have fewer working parts, with each part having well-defined connections to a minimal subset of other parts. Such systems will generally be more powerful and more robust to changes than systems made from many different types of highly interrelated parts. Think of this preference for parsimony as the Occam's Razor of engineering: employ the minimum number of entities for a task, and no fewer.
Object-oriented design and computer languages are well-suited to parsimonious design. While there may be many specific classes for the many types of objects used in an application, those typically derive from a much smaller group of more abstract ancestors. These root types and their relationships describe the core of the system in increasingly general terms. For example a graphics program might have classes for boxes, circles and lines, but all of them would derive from a more abstract "shape" class, and the other parts of the system would interact with them all as shapes. The more that functionality can be pushed down into more abstract types (without pushing it too far), the more robust the whole system will be to adding new types or new capabilities.
Object-oriented computer languages take this idea down to the foundation, defining what types of entities are possible at all. Ultimately all objects derive from some base "object" type, with the core features and functionality that all objects share. While programmers can define new types there must also be a core set of types that are implemented by the language itself, the common currency allowing the language to run on a physical substrate. These built-in entities are called intrinsics. We can learn a lot about a language by its intrinsics.
In more "pure" object-oriented languages, the object is the only type of entity that exists at all. Everything, including numbers, is understood as an object. The expression "6 + 4" is interpreted as "the plus method on the object 6 is called with the object 4 as argument." While this may seem a little odd at first, the only other interpretation is that there is this one type of thing, the object, with its methods, and there's a different thing, the number, with its operators. In an abstract sense they're the same, and yet they are incompatible. We've multiplied entities when we didn't need to, and we've made the language more complex.
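To make the "plus method" reading concrete, here is a minimal sketch in C++. The IntObject type is made up for illustration; it is not a standard class, and it only wraps an int so that the two spellings of the same call can sit side by side.

```cpp
// Hypothetical wrapper type, invented for this example.
struct IntObject {
    int value;

    // The "plus method": invoked on the left operand,
    // with the right operand passed as its argument.
    IntObject operator+(const IntObject& other) const {
        return IntObject{value + other.value};
    }
};

// Infix spelling: reads like ordinary arithmetic.
int sum_with_operator() {
    return (IntObject{6} + IntObject{4}).value;
}

// Method-call spelling: the same call, written out explicitly.
int sum_with_method() {
    return IntObject{6}.operator+(IntObject{4}).value;
}
```

Both functions make exactly the same call; the operator syntax is just a nicer way to write the method invocation.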
C++ is actually much worse than just this. Not only does C++ treat numbers differently from objects, the many types of numbers are incompatible with each other in important ways.
The intrinsics in C++ are inherited directly from C. Numbers come in floating-point and integer types, each in several different sizes (total range or precision). Integers can also be signed or unsigned, and some compilers add their own integer types for machine-specific word sizes. This can be represented as something like a class hierarchy:
number
    integer
        signed:   char | short | int | long
        unsigned: char | short | int | long
    real
        float
        double
In many ways this appears reasonable. The operators that work on each specific type are derived from their supertypes. All numbers support arithmetic operators, integers support things like bit shift and modulo, and floating-point numbers support square root and trig functions. The C compiler will also convert between the different types, so any reasonable formula will translate into code regardless of the exact types of the variables and constants. The problem is that while the types are implicitly hierarchical, only the terminal types (the eight integer types, float, and double) actually exist. As a result, any function that operates on an integer must have an argument declared as one of the eight specific integer types. Conversion from the local type to the type required by the function is done at the site of the call.
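A rough sketch of what that call-site conversion looks like in practice. The function names here are made up for illustration, and the first result assumes the usual 8-bit unsigned char.

```cpp
// Hypothetical functions, each forced to pick one concrete type.
unsigned int widen(unsigned char b) { return b; }
int twice(int n) { return 2 * n; }

// The caller holds an int, so the compiler silently converts it
// to unsigned char at the call: 300 wraps modulo 256.
unsigned int call_with_int() {
    int big = 300;
    return widen(big);
}

// The caller holds a double, so the fractional part is silently
// dropped before twice() ever sees the value.
int call_with_double() {
    double x = 2.9;
    return twice(x);
}
```

Neither call is an error. With an 8-bit unsigned char the first returns 44 and the second returns 4, and neither function can tell that its argument was altered on the way in.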
In old-style C code this issue is handled mostly by convention. Integer arguments would be declared in the form most reasonable for the function, and any failures due to conversion would be blamed on the caller. Math functions would always be written for doubles, because that would prevent the potential loss of precision. (In fact, pre-ANSI C promoted all float arguments to double regardless of how they were declared.) Of course doubles are slower when you only need single precision, so APIs would often include a version of the same function taking a float argument, usually with a similar name but including an "f", as with sqrt and sqrtf. Again, callers were left with the burden of type conversion on their end.
Giving callers that responsibility is arguably the worst way to do it, so C++ needed a better answer. But like so many of C++'s "solutions" to the problems of C, it turned out to be half-baked.
In C++ functions can have the same name but with different arguments, and the compiler will call the function that best matches the arguments and their datatypes. This moves the burden from the clients of an API to its author, but it imposes a substantially larger burden on the author in the process. Providing float and double versions of math functions is a huge pain, usually involving writing a function template and then invoking the template from the different entry points.
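Here is a small sketch of that template-plus-entry-points pattern. The names mix and blend_impl are hypothetical; the point is only how much scaffolding one simple formula needs.

```cpp
namespace detail {

// The actual formula, written once as a template.
template <typename T>
T blend_impl(T a, T b, T t) {
    return a + (b - a) * t;
}

}  // namespace detail

// Public entry points: one overload per supported type,
// each forwarding to the shared template.
float mix(float a, float b, float t) {
    return detail::blend_impl<float>(a, b, t);
}

double mix(double a, double b, double t) {
    return detail::blend_impl<double>(a, b, t);
}
```

A call with floats picks the float overload and a call with doubles picks the double one. But the author still has to add an entry point for every type they want to support, and a call with all-int arguments such as mix(0, 10, 1) is ambiguous and won't compile at all.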
The correct solution, as indicated above, is to treat numbers as objects. Hard-core C programmers may blink in surprise at the thought, and wonder about how it could work internally. There are a couple of levels of support, and I have some ideas about implementation, but that's not the point of this post. The real point is to show how C++, by only dipping a toe into the object-oriented pool, has left everyone worse off.
Here are some things we could do with numbers as objects:
1) Write one function for all numbers. I could write a single function declared as "number my_func(number k, real r)", encode math and other operations over those arguments, and return a result. Clients could call it with any suitable value.
2) Subclass our own numbers. If I wanted to create an exotic type of number and call existing math functions on it, why couldn't I?
3) Add methods to numbers. I could create an "angle" subclass of real that could have additional operations appropriate for angles but not other numeric types.
4) Add new number classes. The canonical example is complex numbers. The standard library adds an implementation, std::complex, using templates, which is unsatisfying on many levels. Why can't complex numbers be treated the exact same way as real numbers?
5) Finally, all those exotic number types native to specific architectures, like huge ints or SIMD vector values, could also exist in the same object class hierarchy. Any function compiled for "number" would work on them as well.
- jack
C++ = bloatware ... nuff said.
Posted by: Bob | May 23, 2012 at 08:43 PM