Tuesday, May 1, 2012

Large Files

It's been a while. I'm trying to get a good understanding of space leaks in GHC Haskell. In general, things that should compile down to simple loops tend to pile up as unevaluated thunks until some value is finally needed. That's fine when laziness is what you want, but sometimes it's just bad. The compiler should be making smarter choices about optimizing, or at least offer an option saying you'd rather minimize memory usage; as it is, it seems to optimize for speed or code size and nothing else.

So you end up refactoring code in the hope of making it behave, and one-liners turn into massive libraries and weeks spent trying to understand how someone else built something they think works, when a lot of the time it's really experimental. There are dozens of libraries for dealing with long files, almost none of them are tested, and none of them seem easy to use. Reading values from a file and processing them should be easy: just work through it one chunk or line at a time, without forcing the whole thing into memory. This is the sort of thing that makes people say Haskell is a toy language.
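
For what it's worth, here's a minimal sketch of the kind of thing I mean: counting the lines of a file using a lazy ByteString and a strict fold, so the work happens a chunk at a time instead of building one giant thunk. The file name "big.log" and the line-counting task are just placeholders I made up; the point is the strict accumulator.

{-# LANGUAGE BangPatterns #-}
module Main where

import qualified Data.ByteString.Lazy.Char8 as L
import Data.List (foldl')

main :: IO ()
main = do
  contents <- L.readFile "big.log"              -- read lazily, chunk by chunk
  let count = foldl' step 0 (L.lines contents)  -- strict left fold over the lines
  print count

-- The bang pattern forces the running count on every line, so the fold
-- never builds up a chain of (+1) thunks.
step :: Int -> L.ByteString -> Int
step !acc _ = acc + 1

Even this relies on lazy I/O, which has its own caveats about when the file handle actually gets read and closed, which is presumably why all those streaming libraries exist in the first place.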