Friday, June 13, 2014

First experience continued: itertools

In my previous post, I said, "I think you'd have to write the equivalent of the itertools functions (count and takewhile) yourself. As far as I can tell from a quick check, Swift has nothing like Python's yield-based generator functions, or generator expressions; the only way to do it is to create explicit classes that implement the Sequence and Generator Protocols."

Here is where I try to do that.

Sequence and Generator protocols

Swift has the same two basic protocols as Python, just with different names.

In Python, an Iterable is an object with an __iter__ method that returns an Iterator, which is an object with a __next__ method that generates successive values until it runs out, after which it raises StopIteration. An Iterator is itself an Iterable, which returns self for __iter__.

In Swift, a Sequence is an object with a generate method that returns a Generator, which is an object with a next method that generates successive optional values until it runs out, after which it returns nil. A Generator may be a Sequence, in which case it may return self for generate—which seems to be common in the stdlib—but this isn't required.

Creating iterables

There are four different ways to create an iterable in Python:
  1. Write a class with an __iter__ method that returns an instance of a class (possibly self) with a __next__ method. (Or do the equivalent from the C/Java/RPython/etc. layer.)
  2. Write and call a generator function. A generator function is a function with one or more yield (or yield from, but I won't explain that here) expressions. Calling a generator function returns a generator, which is a special kind of iterator. Each time the generator is asked for a value, the function runs until the next yield expression, then generates the value yielded.
  3. Use a generator expression, a type of comprehension that evaluates to a generator.
  4. Create an indexable sequence. (This is primarily a fallback, meant for old code.)
Swift only has the first of these. (Actually, it also has the fourth, but only for creating a Generator that isn't Sequence, as far as I can tell, which isn't all that useful.) This makes things very tedious.

Of course you can call someone else's code that does any of the above. Python's collection types are iterable (and, in some cases, have additional dependent iterables, like dict.items()). It has a couple of builtin functions, map and filter, that take one iterable and give you a different dependent iterator, and its stdlib has many other such functions (mostly in itertools). It also has a wide range of functions that create new iterables (from range to xml.etree.ElementTree.iterparse). And there are third-party libraries to give you even more options. Swift's two collection types give you iterables, and it has map, filter, and an equivalent to range… and that's about it. A large part of the work in Python 3.0 was pushing iteration throughout the language; it's probably the #2 reason (after Unicode) that Python 3 isn't just an incremental change to Python 2. A new language shouldn't have a problem doing this right from the start—as many new languages, like C# and Clojure, have proven. But for some reason, Swift has the kind of minimal iteration support that Python 2.3 had: the framework is there, but most of the functionality you'd want isn't.

But at least the framework is there, so someone could write his own implementation of everything in itertools, etc. And publish it as open source, so everyone else can use it as a CocoaPod or whatever, and then the problem is solved, right? So, let's see how much work it is. (It's worth nothing that some of Python's core developers thing itertools is not particularly important, because most of it can be written trivially with generator expressions or generator functions…)

Count

One of the simplest functions in Python's library is count. The docs show a trivial implementation in Python using a generator function, but even if you had to write a class, it would look as simple as this:
    class count:
        def __init__(self, start=0, step=1):
            self.start = start
            self.step = step
        def __iter__(self):
            return self
        def __next__(self):
            val = self.start
            self.start += self.step
            return val
In Swift, this is complicated by a few things.

The type inference works somewhat backward from the way you'd expect from C++; instead of telling it that you're creating a Generator of Int values, you tell it that you're creating a Generator, make it generate Int values, and it figures out how to fill in the associated types for you. For simple cases like this, that's actually kind of nice, but when you get to more complicated cases, as we'll see below, it's less so.

Also, Swift seems to discourage using classes as if they were functions, so instead of a count class, we'll have to build a Count class and a count function that constructs a Count. (In C++, you would do this for a reason—to allow function template specialization to help you with type inference. In Swift, it seems to be just boilerplate for its own sake.)

Meanwhile, Swift has some nifty features for reducing boilerplate—automatic constructors, optional parameters with default values instead of multiple overloads, etc. But unfortunately, you can only effectively use one of these at a time, because they don't work together nicely.

Finally, in Swift, struct and class have many differences, and it's not always clear which one you want. From the descriptions, a class really sounds like a better idea here—when you pass a Count around, you want the same counter, with shared state, like in Python, right? However, I can't see any way to get any of the more complicated Generator types to work as classes; I either get baffling compiler errors that don't seem to apply to the code at all, or compiler crashes. And most of the stdlib and example types seem to be structs, so… let's go with that. So:
    struct Count: Sequence, Generator {
        var start: Int
        var step: Int
        init(start: Int = 0, step: Int = 1) {
            self.start = start
            self.step = step
        }
        func generate() -> Count {
            return self
        }
        mutating func next() -> Int? {
            let val = start
            start += step
            return val
        }
    }

    func count(start: Int = 0, step: Int = 1) -> Count { 
        return Count(start: start, step: step)
    }

Takewhile

One of the simplest iterator-transforming functions is takewhile. This just generates elements from a source iterable until a predicate returns false. Here it is in Python, as a class (and switching the parameter order, to match what appears to be the opposite standard in Swift):
    class takewhile:
        def __init__(self, iterable, predicate):
            self.iterator = iter(iterable)
            self.predicate = predicate
        def __next__(self):
            val = next(self.iterator)
            if self.predicate(val):
                return val
            raise StopIteration
This one is a lot harder to write in Swift.

First we have to decide whether this takes a Sequence (Iterable) or a Generator (Iterable). Because not every Generator is a Sequence, this could be a potential problem. We could try to take both, but with static typing and no overloads, that's easier said than done. It looks like map and enumerate take a Sequence and return a Generator that's also a Sequence, so let's do that.

Part of this is because of Swift's strange system based on associated types for its protocols—there's no protocol schema Generator, there's instead a single protocol named Generator, which has a typealias named Element, so you need to parameterize on both Generator and T, with a where clause to link them up. This should be somewhat familiar to anyone who's done C++ templates, but very puzzling to anyone who's worked with C#, Java, etc.

The other problem is that, as mentioned earlier, Swift's type inference goes in the easy direction, forcing you to fill in the hard direction. We want to take a Sequence, and act as a Sequence of the same element type. How do we do that? First, we have to parameterize on both the Sequence type and the element type, with a where clause to link them up. Then, our generate function can't figure out its type just because we return self; we have to specify it. Fortunately, in the simple case we're dealing with here, our generator type is just S.GeneratorType; in any more complex case you have to parameterize on a third type that never changes just so you can where-match its associated types (which also means you end up matching on and then exposing a type that your users don't—and, in fact, can't—care about).

Also, if you get anything wrong here, the error messages are baffling, like "type 'TakeWhile<$T0, $T1>' does not conform to protocol 'T'" if you accidentally use Generator where you should have used S.GeneratorType. But at least you don't get 30 lines of template expansion in the error message like most C++ compilers (or, worse, get code that compiles until you try to instantiate it with one of the rare types whose GeneratorType isn't just Generator).
    struct TakeWhile
            : Sequence, Generator {
        var gen: S.GeneratorType
        let pred: T->Bool
        init(sequence: S, predicate: T->Bool) {
            gen = sequence.generate()
            pred = predicate
        }
        func generate() -> TakeWhile {
            return self
        }
        mutating func next() -> T? {
            if let val: T = gen.next() {
                if pred(val) { return val }
            }
            return nil
        }
    }

    func takewhile
                  (sequence: S, predicate: T->Bool) 
                  -> TakeWhile {
        return TakeWhile(sequence: sequence, predicate: predicate)
    }
I have no idea what the idiomatic way is to indent something like this. It's as verbose as Objective C and C++ put together, but all of the examples are so simple that they provide no guidance. But anyway, I think this is reasonably readable.

So, now we can use it. I'll ignore the argv mess, and just compare the Python (not idiomatic Python with generator expressions, of course, but the filter function) and Swift versions of the key part of the code:
    primes = filter(make_isprime(), count(2))
    limited = takewhile(lambda x: x <= limit, primes)
    print(' '.join(map(str, limited)))

    let primes = filter(count(start: 2), make_isprime())
    let limited = takewhile(primes, { $0 <= limit })
    let s = " ".join(map(limited, { $0.description }))
    println("\(s)")

Cleanup

If you exhaust a TakeWhile, it still has a reference to the generator it depends on. If you chain together a whole mess of iterator transformations, they're all going to stay alive as long as the last one does, even after they're consumed. Experience with Python shows that this can be a lot more of a problem with refcounting than with "real" GC, especially once you start using tools like chain.

Final thoughts

From here, it shouldn't be too hard to port the other iterator-transforming tools in itertools. Whether the combinatorial and other functions in itertools, the rest of the stdlib, and libraries like more-itertools will be as easy is another question.

3 comments: