Friday, May 29, 2009

C++ super simplified

Caution: This blog post is the result of a long and boring flight. Also, my "C++ age" is a little over a year, and the same is true for Python.

In the beginning there was C. C has a static type system, which the compiler uses to verify type safety at compile time. It is possible to evade type checking through casts and raw memory accesses, but it's rarely a good idea to do so. Compare this with Python, where type checking happens at runtime. Python is type safe, but you find out about violations only when the offending code actually runs.

A language's type system allows you to write code to an interface. Programming to an interface allows different components of a system to evolve separately: one can be changed without affecting the other. However, different languages (specifically C, C++, and Python) differ in how the interface is enforced.

In traditional C, all code is checked against a predefined interface. It is generally tough to swap implementations without affecting multiple components or breaking the interface. Function pointers offer one way of doing this, but the language lacks the features needed to expose their full potential to the programmer.
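As a rough illustration of the function-pointer approach, here is a minimal C-style sketch (written so that it also compiles as C++). The names Shape, Square, Rect, and area_of are hypothetical and exist only for this example.

```cpp
// A hand-rolled "interface": a function pointer plus an untyped data pointer.
struct Shape {
    double (*area)(const void* self);  // implementation selected at runtime
    const void* self;                  // the concrete object, with its type erased
};

struct Square { double side; };
struct Rect   { double w, h; };

double square_area(const void* p) {
    const Square* s = static_cast<const Square*>(p);
    return s->side * s->side;
}

double rect_area(const void* p) {
    const Rect* r = static_cast<const Rect*>(p);
    return r->w * r->h;
}

// Callers depend only on Shape, not on Square or Rect.
double area_of(const Shape& s) { return s.area(s.self); }
```

Given `Square sq{2.0};`, the pair `Shape{square_area, &sq}` behaves as a square. Nothing stops a caller from pairing square_area with a pointer to a Rect, though: the compiler cannot check that the function and the data belong together, which is one sense in which the language leaves the programmer on their own.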

C++ formalizes and streamlines function pointers through polymorphism (virtual functions). Polymorphism allows multiple implementations of the same interface. The implementation can be chosen at runtime, and every choice is type-safe because the compiler has already verified each implementation against the interface at compile time. Thus, C++ increases the life of an interface and improves isolation between components.
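A minimal sketch of the same idea with virtual functions, assuming a C++11 compiler (for override and = default). The class names Shape, Square, and Rect are hypothetical.

```cpp
// The explicit interface: every implementation must provide area().
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

struct Rect : Shape {
    double w, h;
    Rect(double w_, double h_) : w(w_), h(h_) {}
    double area() const override { return w * h; }
};

// Written once against the interface; the actual implementation is
// picked at runtime through the virtual call.
double area_of(const Shape& s) { return s.area(); }
```

Here area_of(Square(2.0)) and area_of(Rect(2.0, 3.0)) both compile and dispatch to the right implementation, while passing an unrelated type is rejected at compile time.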

This is in some ways similar to Python's dynamic typing. Python programs are written to an implicit interface, which provides maximum isolation between components and expands the set of possible implementations of an interface.

For example, consider the following piece of Python code:

def fn(arg1, arg2):
    if arg1.valid:
        return arg1.x
    else:
        return arg2.y

This function is programmed to an implicit interface of arg1 and arg2, viz., that arg1 has the fields 'valid' and 'x', and arg2 has the field 'y'. Beyond that, it doesn't care what else is inside arg1 and arg2. Furthermore, it can accept far more implementations of arg1 and arg2 than, say, similar C code.

This fashion of defining interfaces between components is definitely more generic than polymorphism in C++. In particular, it allows for a richer set of objects to be passed between components (and hence improves component isolation?).

Fortunately, C++ provides a similar framework, viz., templates. A template is essentially programmed to the implicit interface of its template parameters. Just like the Python interpreter, the C++ compiler does not care about properties of objects beyond those used in the template code. Still, the compiler verifies at compile time that the implicit interface used in the template is valid for each instantiation of the template. (Strictly speaking, it is possible to "control" when names in template code are looked up and checked using various tricks, e.g., by deferring the lookup to instantiation time with this->..., or with a using declaration. See Scott Meyers, Effective C++, Item 43 for details.)
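A minimal sketch of how the Python function above might look as a C++ template. The int return type and the structs Reading and Fallback are assumptions made for this example. Unlike the Python version, the function needs a single return type, so both arg1.x and arg2.y must be convertible to int.

```cpp
// Implicit interface: T1 must have members 'valid' and 'x',
// T2 must have member 'y'. Nothing else about T1 or T2 matters.
template <typename T1, typename T2>
int fn(const T1& arg1, const T2& arg2) {
    if (arg1.valid) {
        return arg1.x;
    } else {
        return arg2.y;
    }
}

// Hypothetical types that happen to satisfy the implicit interface.
struct Reading {
    bool valid;
    int x;
    const char* label;  // extra member, ignored by fn
};

struct Fallback {
    int y;
};
```

Calling fn with a Reading and a Fallback compiles, while swapping the two arguments fails at compile time because Fallback has no 'valid' member. That is the same violation Python would report, just caught earlier.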

So, C++ does come close to Python's implicit interfaces and dynamic typing with features such as templates and polymorphism, albeit with some syntactic weirdness. Well, that explains most of C++, except "overloading" of course. I assert that the sole purpose of overloading is to facilitate template programming. All other uses of overloading can be attributed to programmer laziness (aka convenience). For example, consider the following code snippet:

struct A {
    int x;
};

struct B {
    int y[10];
};

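template <typename A_or_B>
void fn(A_or_B& arg /*, others */) {
    // Both branches are compiled for every instantiation.
    if (/* found A */) {
        arg.x = 10;       // invalid when A_or_B is B
    } else if (/* found B */) {
        arg.y[0] = 10;    // invalid when A_or_B is A
    }
}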


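Will this compile? No: each instantiation compiles both branches, so with A the access to arg.y fails, and with B the access to arg.x fails. Is this something you may want to do? Maybe. The solution (to make it compile) is to use overloading.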

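// Type-specific code lives in overloads, which take references so that
// the caller's object is the one modified.
void do_fn(A& arg) { arg.x = 10; }

void do_fn(B& arg) { arg.y[0] = 10; }

// The template itself only requires that do_fn(arg) be callable.
template <typename A_or_B>
void fn(A_or_B& arg /*, others */) { do_fn(arg); }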

Since all code inside a template must adhere to the *same* implicit interface, an object whose implicit interface is a proper subset of the template's implicit interface may not compile even though the usage is logically correct. The solution is to make the template's implicit interface the intersection of the implicit interfaces of all possible type arguments and move type-specific code into overloaded functions/classes.
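To make the "intersection" idea concrete, here is a small sketch with two hypothetical types that share one member (id) and differ in the rest. TaggedA, TaggedB, reset, and reset_and_tag are names invented for this example, not anything from the snippets above.

```cpp
struct TaggedA {
    int id;
    int x;
};

struct TaggedB {
    int id;
    int y[10];
};

// Type-specific code lives in overloads.
void reset(TaggedA& a) { a.x = 0; }
void reset(TaggedB& b) { b.y[0] = 0; }

// The template touches only what both types share: the member 'id'
// and the ability to call reset(arg). That shared part is its
// implicit interface.
template <typename T>
int reset_and_tag(T& arg, int new_id) {
    reset(arg);       // overload resolution picks the matching version
    arg.id = new_id;  // common to both types
    return arg.id;
}
```

Adding support for a third type would then mean giving it an id member and a reset overload, without touching the template.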
