Previous Entry Share Next Entry
Migrating to Ranges Part 2
anime
he_the_great

Previously I talked about wrapping arrays in base range types. When having coded for arrays there can be some functionality that must change to conform range behavior. That is, I'm going to talk of some things what happen once a proper limited range type is used.

Input Range

If you've chosen the input range to be the most functionality your algorithms will use, things are fairly simple as pretty much everything is avoided. These things just tend to come naturally as they are very idiomatic D.

Indexing: Accessing elements must be done through front(), accesses such as arr[0] are trivially converted. If you're doing access in other locations than you'll likely need to consider algorithms which make use of std.algorithm.find/until/filter or maybe a RandomAccessRange is needed instead.

Slicing: In this case there are some from std.algorithm until/startsWith/splitter/findSplit however std.range.take/chunks/stride/popFrontN could be of value here. Or maybe checking that the range hasSlicing should be preferred.

Saving: This is a nasty one because most ranges end up being a "Forward Range." Arrays and most ranges built with a struct will copy their state resulting in a "save" type action. Passing the range to a function results in an implicit save. If this is desired then simply add a call to save() possibly storing in a local variable as many range operations will expect an rvalue.

Length: If you're requesting the length then it is possible that previous changes have actually already removed this need, and if not maybe there are such changes which can do so. Else look into adding the requirement of needing a length property. If a performance hit is acceptable look at using a Forward Range instead.

Forward Range

Same as an Input Range. This would be chosen when the algorithm requires examining the range multiple times. Obviously reducing the times over each element is important, but it can't always be avoided.

Saving: This shows up again because you'll want to fix all those implicit "save by copy" and throw in proper save() calls.

Length: If a performance hit is acceptable then std.range.walkLength will provide the generic means to obtain a length (remember to call save if you don't want the range to be consumed).

Has Slicing

A slice of an array has two benefits, one is that it can be assigned to itself and the other is that the result is an array. Consider a range of ubyte where the range has slicing. Slicing the range results in a range of ubyte so don't expect it to work with something which takes an array.

Example

Let me go over a very simple example written by myself and is the reason for this post.

dlang
    // serialized BlobHeader message
    auto osmData = datastream[0..size];
    datastream = datastream[size..$];
    auto header = BlobHeader(osmData);

Not too complicated, grab some data in the front, move the range past that data, and then deserialize the data. But it fails horribly when trying to conform to range primitives.

Initially I didn't pay close attention to the definitions for the range types, so I updated my code according to my incorrect FileRange

dlang
    auto osmData = datastream[0..size];
    datastream.popFrontN(size);
    auto header = BlobHeader(osmData);

Here I've got the definition of a slice wrong. Taking a slice was returning a ubyte[] and not a FileRange (a range of ubyte). But now I'm actually wanting this code to conform to these range types so I correct the slice and have to update again.

dlang
    auto osmData = datastream[0..size].array;
    datastream.popFrontN(size);
    auto header = BlobHeader(osmData);

BlobHeader takes a reference to a ubyte[], I haven't gotten arround to updating ProtocolBuffer to work with ranges yet. For this reason I turn my slice into an array with the convenient std.array.array() function. But what about that popFrontN() I only made that change because the slice didn't return the same type as its originator. With that fixed I should be able to return to the old code.

Wrong. datastream[size..$] would fail because the range doesn't provide opDollar(). There was mention in the news groups that if length is implemented then the range should get opDollar for free. This is needed, because really any range that supports slicing should support opDollar (and hasSlicing already requires hasLength).

Now lets take another look at fixing up my code.

dlang
    auto osmData = datastream.take(size).array;
    datastream.popFrontN(size);
    auto header = BlobHeader(osmData);

Cool, I've just removed the need for slicing! However there is a bug. Following the call to take() I'm popping off whatever size I just took. The question is, should I do this or not? The answer... it depends. Is my datastream range a fake Forward Range (i.e. is it a value range) or is the range a reference. With a reference range my call to take().array will have already removed size items from my range, while a value based range (which I'm using) will have its copy loose the items leaving the original datastream in tacked.

Answer: Please fill in the blank _________.

Conclusion

Most of the time it is possible to use a range which only provides the Input Range primitives. Arrays provide many convenient abilities, but several Phobos functions can provide similar convenience. The value vs reference range types just make things annoying. Most Phobos functions take the range by reference making the distinction unimportant (maybe take should too).

Note that if take() were to take by reference consider this code:

dlang
    auto osmData = datastream.take(size);
    datastream.popFrontN(size);
    auto header = BlobHeader(osmData);

In this case popFrontN would have moved the start of osmData and the call to BlobHeader (which magically takes a range :D) will not be parsing the correct data. Since BlobHeader would evaluate the range eagerly the popFrontN should just be removed.


?

Log in