Sunday, June 15, 2014

Repetition, repetition

In my previous post, I started porting parts of itertools from Python to Swift. Everything went pretty smoothly, but not exactly fun, because there's a ton of boilerplate and no way that I can see to factor it out.

Weak types

Each itertools function requires creating both a struct and a function, a minimum of 17 lines, 14 of which are identical, the other 3 identical but for naming our struct. This is exactly the kind of thing you'd want to factor out. But can you?

First, every Sequence-transforming function actually requires a struct for the new Sequence, plus a function that constructs that struct. This takes a minimum of 17 lines of code, of which 14 are always identical, and 3 are identical but for the name of the struct. And there doesn't seem to be any way to remove that by inheriting from a base struct or mixin, or by writing a function that generates the struct and function and returns the function, or in any other way. So you have to write them every time. (You could, of course, write a code generator, but Swift isn't exactly Lisp, so that is of little help.)

Even in the simplest case, the generate method always looks like this:
    func generate() -> Cycle {
        return self
    }
There's no way to specify Cycle<S, T>, the type currently being defined. Although self does mean "the current type", you can't use it here. In C++, you'd make each subclass define a typedef this_type, then use that in the base/mixin, but that doesn't seem to work with Swift.

The lack of a way to write forwarding functions makes this even more painful. You can declare an associated type and where-bind it to the type of a function that you take as an initializer argument; you can even where-bind its argument type tuple and its result type separately. But that still doesn't help you define a new function of the same type that takes all of its parameters, forwards them to the other function, and returns its return value. ython makes this possible through duck typing (just accept *args, **kw and call the target function with *args, **kw) and dynamic attributes (copy the signature-related attributes from the target function for better introspection). C++ and Haskell make this possible through strong types and type inference. (The fact that generic forwarding wasn't perfect in C++ (because of the issue of reference vs. value parameters) is one of the major reasons C++11 was necessary.)

The struct's initializer always takes the arguments and stores them, except for storing sequence.generate() in the case where one or more of the arguments is a sequence (which could be moved later if it helped), so you'd think you could just use a default initializer. But no, you can't, because there are almost always additional ("private") members that shouldn't be initialized that way, but would be if you tried.

Let's take two simple functions, takewhile and filter (of course filter already exists, but pretend it didn't), whose difference is entirely in the next function, and yet which still require repeating 17 lines of code (replacing the generic name in 3 of those lines) just so you can write the 4 different lines:
    struct TakeWhile
            : Sequence, Generator {
        var gen: S.GeneratorType
        let pred: T->Bool
        init(sequence: S, predicate: T->Bool) {
            gen = sequence.generate()
            pred = predicate
        }
        func generate() -> TakeWhile {
            return self
        }
        mutating func next() -> T? {
            if let val: T = gen.next() {
                if pred(val) { return val }
            }
            return nil
        }
    }

    func takewhile
                  (sequence: S, predicate: T->Bool) 
                  -> TakeWhile {
        return TakeWhile(sequence: sequence, predicate: predicate)
    }

    struct Filter
            : Sequence, Generator {
        var gen: S.GeneratorType
        let pred: T->Bool
        init(sequence: S, predicate: T->Bool) {
            gen = sequence.generate()
            pred = predicate
        }
        func generate() -> DropWhile {
            return self
        }
        mutating func next() -> T? {
            while true {
                if let val: T = gen.next() {
                    if pred(val) { return val }
                } else { 
                    return nil
                }
            }
        }
    }

    func filter
                  (sequence: S, predicate: T->Bool) 
                  -> Filter {
        return Filter(sequence: sequence, predicate: predicate)
    }
In Python, of course, you'd write these as generator functions, eliminating almost all of the boilerplate. And, even if you didn't, other differences (e.g., not needing types with wrapper functions, exceptions vs. checking and explicitly propagating failures) would eliminate a lot of it. But even if you wanted to write them in the same style and verbosity that Swift requires, you could still factor out the boilerplate after the fact. For example:
    class PredicateIterator:
        __slots__ = (gen, pred)
        def __init__(self, sequence, predicate):
            self.gen = iter(sequence)
            self.pred = predicate
        def __iter__(self):
            return self

    class TakeWhile(PredicateIterator):
        def __next__(self):
            try:
                val = next(self.gen)
            except StopIteration:
                raise
            if pred(val):
                return val
            raise StopIteration
    def takewhile(sequence, predicate):
        return TakeWhile(sequence, predicate)

    class Filter(PredicateIterator):
        def __next__(self):
            while True:
                try:
                    val = next(self.gen)
                except StopIteration:
                    raise
                if self.pred(val): return val
Or:
    def make_predicate_iterator(nextfn):
        class PredicateIterator:
            __slots__ = (gen, pred)
            def __init__(self, sequence, predicate):
                self.gen = iter(sequence)
                self.pred = predicate
            def __iter__(self):
                return self
            __next__ = nextfn
        def predicate_iterator(sequence, predicate):
            return PredicateIterator(sequence, predicate)
        return predicate_iterator

    def _takewhile_next(self):
        try:
            val = next(self.gen)
        except StopIteration:
            raise
        if pred(val):
            return val
        raise StopIteration
    takewhile = make_predicate_iterator(_takewhile_next)

    def _filter_next(self):
        while True:
            try:
                val = next(self.gen)
            except StopIteration:
                raise
            if self.pred(val): return val
    filter = make_predicate_iterator(_filter_next)
And so on. Any way you can think of to refactor things—using a base class or mixin or metaclass, or a function that creates a class or closure and returns it, etc.—will work. Not so in Swift.

Defaults and types

Consider a standard fold/reduce/sum function. The fully general version takes a sequence, a combining function, a start value, and a transforming function, like this:
    func reduce(sequence, combine, start, transform) {
        for value in sequence {
            start = combine(start, transform(value))
        }
        return start
    }
But often you don't want to specify all of those values. For example, it's very convenient to sum an array without specifying 0 as the starting value. Or to reduce a list without transforming any of its elements. And so on.

In a language without neither overloads nor default values, your only choice is to figure out all the useful combinations, come up with a name for each, and write a whole slew of separate functions:
  • sum1(sequence)
  • sum(sequence, start)
  • sum1t(sequence, transform)
  • sumt(sequence, start, transform)
  • fold1(sequence, combine)
  • fold(sequence, combine, start)
  • fold1t(sequence, combine, transform)
  • foldt(sequence, combine, start, transform)
But Swift has both overloads and C++-style default values, right? Well, no, that's for methods; for functions, there are no overloads, and default values work like Python instead. But that's fine. Could you write this all as one function with default values in Python? Sure:
    _sentinel = object()
    def fold(sequence, combine=lambda x, y: x+y, start=sentinel, transform=lambda x: x):
        sequence = iter(sequence)
        if start is _sentinel:
            start = transform(next(sequence))
        for value in sequence:
            start = combine(start, transform(value))
        return start
You might want to split off the sum and reduce cases anyway, just so sum([]) can be sensible (and zero—by specifying a default start of 0 instead of the first element), as Python's stdlib in fact does, but the point is, you have the choice to split things up however you want. In Swift, you don't. Let's try:
    func fold
            (sequence: S, combine: (T, T)->T = { $0+$1 }, 
             start: T? = nil, transform: T->T = { $0 }) -> T {
        var gen = sequence.generate()
        var current = (start == nil) ? transform(gen.next()!) : start!
        while let value = gen.next() {
            current = combine(start!, transform(value))
        }
        return current
    }
Notice a few other changes I had to make, from the good through the bad to the worse:
  • Swift's optional values mean I didn't need to define a custom sentinel (which is necessary in Python whenever the usual sentinel, None, could be a valid value in your domain).
  • Swift can't iterate over a generator, only a sequence, if you want to pop a value off before the loop, you have to replace the for loop with a while-let-next loop. (The average Python programmer doesn't know how to map a for statement to a call to iter and a loop over next; in Swift, it's pretty much necessary information.)
  • Swift won't let you assign to an in parameter. Parameters might be reference types, in which case you'd be changing the caller's variable, or they might be value types, in which case you'd just be changing a local—and this is pretty much invisible at the point of function definition, because it's up to the parameter type. This seems like an unavoidable consequence of other design decisions in Swift.
  • We can test that start is nil, and we can test that the generator is empty… but what happens if start is nil and the generator starts out empty?
In the code above (if it worked at all, that is), calling fold with no start and an empty generator would be a fatal error. That can't be right. Unexpected data shouldn't be a fatal error. In Haskell, Python, C++, C#, etc., it would be an exception. Clearly, then, the intention is that we're supposed to be using optional values a lot more in Swift than we'd use Maybe in Haskell, None in Python, nil in ObjC, etc. But look at the line of code. We can't use an if-let to try to bind current to start, because if that fails we have no way to bind it to gen.next(). Whatever we use, it can't be an expression (unless we cant to make current a T? just so we can check it immediately after binding it and return nil if it's nil and otherwise ! it everywhere). The most concise possibility I've been able to come up with is this hideous mess:
    var current: T
    if start == nil {
        if let firstval = gen.next() { 
            current = firstval
        } else {
            return nil
        }
    } else {
        current = start!
    }
This may be a sign of my lack of understanding of the language, but this, together with the while-let-next loop, seems to point to the fact that Swift is not actually a language designed to be concise, like Python or Haskell; it's a verbose language like Java or C++, with bits of syntactic sugar that paper over that verbosity in some cases, which means you're frequently going to be forced to write—and, worse, read—the long versions anyway, and that even a novice has to understand how to map it pretty early on.

But let's ignore all that. Does our function work? Nope: "could not find an overload for '+' that accepts the supplied arguments". The only way to fix that is to restrict it (via more where clauses for associated types) to only work on numbers. You can't write a fold function that has a default function for numbers, but not for other types. You have to write two separate functions, with separate names. In Python, of course, you'd just duck type it, and anyone who tried to fold a non-addable type without specifying a combine function would get a TypeError exception at runtime. In C++, you'd have many options—partial specialization, overloads and SFINAE, or a trait class template—so that the function would be perfectly usable with non-addable types (as long as you specify a combine function, directly or indirectly) and with addition as a default combine function (as long as the type is addable). Other strongly-typed languages like Haskell can do effectively the same as in C++, while most other modern static languages that don't have strong enough types fake it by raising an exception dynamically, just as Python would (theoretically a terrible solution, but practically, better than nothing).

I think this is the biggest problem with Swift. The intent was obviously to build a strictly statically-typed language that doesn't rely on exceptions, but without all the complexity of C++ templates or Haskell abstract data types. But the reality is that Java-style generics, especially interacting with C++-style associated types, aren't strong enough to make that actually work. Again, maybe I just don't know the language well enough, but I'm not encouraged.

Anyway, how does this fit into my attempt to port itertools? Well, it just means accumulate has to be split into two functions, accumulate (which requires a reducing function), and cumulative_sum (which requires addable types).

No comments:

Post a Comment