Parsing CSV Files that exceed Available Memory
It is not too unusual to need to read very large files, and D has a few options for doing so; one is std.stdio.File. But std.csv requires an input range of dchar, which (as far as I know) the standard library doesn't provide for files.

I had done some previous work on providing a file as a range in my OpenStreetMap project. I had to make a few modifications so that it would provide an input range of dchar. While I tried to make it able to read files encoded in UTF-8, UTF-16, or UTF-32, that plan was not completed, since FileRange would need to read multiple bytes per code unit for the encodings other than UTF-8.

If anyone is interested in taking this further, it should be possible to remove the cast on line 70 and add a conditional that builds a character of the desired size before returning it.
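To illustrate what "reading multiple bytes" per code unit could look like, here is a minimal sketch for UTF-16 built from Phobos ranges rather than from the FileRange in my project. The function name utf16FileAsDchars and the file name are made up for the example, and it assumes the file is in the machine's native byte order, has an even length, and carries no byte order mark.

```d
import std.algorithm : joiner, map;
import std.array : array;
import std.csv : csvReader;
import std.file : remove, write;
import std.stdio : File, writeln;
import std.utf : byCodeUnit, byDchar;

/// Hypothetical helper: lazily decode a native-endian UTF-16 file to dchar.
/// chunkSize must be even so that no two-byte code unit is split.
auto utf16FileAsDchars(string path, size_t chunkSize = 4096)
{
    assert(chunkSize % 2 == 0, "chunkSize must be a multiple of 2");
    return File(path, "rb")
        .byChunk(chunkSize)                                    // ubyte[] buffers
        .map!(buf => (cast(const(wchar)[]) buf).byCodeUnit)    // view bytes as wchar units
        .joiner                                                // one flat range of wchar
        .byDchar;                                              // decode, surrogates included
}

void main()
{
    enum path = "example_utf16.csv";            // hypothetical file name
    write(path, "name,city\nZoë,Köln\n"w);      // raw UTF-16 code units, no BOM
    scope (exit) remove(path);

    foreach (record; csvReader!string(utf16FileAsDchars(path)))
        writeln(record.array);
}
```

Decoding happens after the chunks are joined, so a surrogate pair that straddles two chunks should still decode as one code point. A real implementation would also need to detect the byte order mark and handle the opposite endianness, which this sketch does not attempt.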

File(name, "r").byChunk(4096).joiner.map!"cast(char)a".byDchar;

Untested and without consulting the docs; you may need a map!"cast(ubyte[])a" before joiner.

That works, though it suffers from the same issue of not supporting UTF-16 or UTF-32 text files.
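For anyone who wants to try it, here is a minimal sketch of that one-liner wired into csvReader. The helper name fileAsDchars and the file name are invented for the example, and the input is assumed to be UTF-8.

```d
import std.algorithm : joiner, map;
import std.array : array;
import std.csv : csvReader;
import std.file : remove, write;
import std.stdio : File, writeln;
import std.utf : byDchar;

/// Hypothetical helper: lazily decode a UTF-8 file into a range of dchar,
/// holding only one chunk of the file in memory at a time.
auto fileAsDchars(string path, size_t chunkSize = 4096)
{
    return File(path, "r")
        .byChunk(chunkSize)          // range of ubyte[] buffers
        .joiner                      // flatten into a range of ubyte
        .map!(b => cast(char) b)     // treat each byte as a UTF-8 code unit
        .byDchar;                    // decode code units into code points
}

void main()
{
    enum path = "example.csv";                 // hypothetical file name
    write(path, "name,city\nZoë,Köln\n");      // small sample so the sketch runs as-is
    scope (exit) remove(path);

    // csvReader pulls characters on demand, so the file is never loaded whole.
    foreach (record; csvReader!string(fileAsDchars(path)))
        writeln(record.array);
}
```

Because the bytes are flattened before decoding, a multi-byte UTF-8 sequence that is split across two chunks should still decode correctly.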
