Strict vs Lazy ByteString
The Haskell ecosystem’s preferred way of representing binary data is the ByteString type. The bytestring package introduces its two variants like so:
- Strict ByteStrings keep the string as a single large array. This makes them convenient for passing data between C and Haskell.
- Lazy ByteStrings use a lazy list of strict chunks which makes it suitable for I/O streaming tasks.
Otherwise, it expresses little preference between strict ByteStrings and lazy ByteStrings.
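To make the two representations concrete, here is a minimal sketch using only the bytestring package. The names strictGreeting and lazyGreeting are made up for illustration, and the lazy value is assembled by hand from three chunks so that the chunk list is visible.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- A strict ByteString: one contiguous buffer.
strictGreeting :: BS.ByteString
strictGreeting = "hello world"

-- A lazy ByteString, built here from three strict chunks
-- (hypothetical data, chosen only to show the structure).
lazyGreeting :: BL.ByteString
lazyGreeting = BL.fromChunks ["hello", " ", "world"]

main :: IO ()
main = do
  print (BS.length strictGreeting)                    -- 11
  print (BL.length lazyGreeting)                      -- 11
  -- The lazy value is still a list of strict chunks underneath.
  print (BL.toChunks lazyGreeting)                    -- ["hello"," ","world"]
  -- Both hold the same bytes once the chunks are flattened.
  print (BL.toStrict lazyGreeting == strictGreeting)  -- True
```

Both values contain the same bytes; only the layout in memory differs.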
The broader library ecosystem often offers functions for working with both lazy and strict ByteStrings. For example, aeson exposes decode, which decodes a JSON value from a lazy ByteString, in addition to a strict variant, decodeStrict. From this, it’s reasonable to assume that lazy ByteString is the default, because aeson exposes it unqualified, while relegating strict to a section titled “Variants for strict bytestrings”.
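As a rough illustration of the two aeson entry points, the sketch below decodes the same JSON array from a lazy and from a strict ByteString. It assumes the aeson package is available; the input values and the names parsedLazy and parsedStrict are invented for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (decode, decodeStrict)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- Hypothetical inputs: the same JSON text in both representations.
lazyInput :: BL.ByteString
lazyInput = "[1,2,3]"

strictInput :: BS.ByteString
strictInput = "[1,2,3]"

-- decode takes a lazy ByteString.
parsedLazy :: Maybe [Int]
parsedLazy = decode lazyInput

-- decodeStrict takes a strict ByteString.
parsedStrict :: Maybe [Int]
parsedStrict = decodeStrict strictInput

main :: IO ()
main = do
  print parsedLazy    -- Just [1,2,3]
  print parsedStrict  -- Just [1,2,3]
```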
However, as a general rule, I recommend sticking to strict ByteStrings as a default:
- more memory efficient: lazy ByteStrings include extra bookkeeping overhead for maintaining their list of strict ByteString chunks.
- reads are faster: in a strict ByteString, reading any position in the string is just reading from an offset in memory. For a lazy ByteString, it’s necessary to follow pointers to reach values later in the bytestring.
- O(1) conversion to lazy ByteStrings: converting a lazy ByteString to a strict one is O(n), requiring copying the lazy ByteString’s chunks into a single contiguous block of memory. Going the other way, strict ByteStrings don’t need to be split into smaller chunks; the existing memory is reused.
- avoid lazy IO: while it often seems convenient, lazy IO leads to hard-to-diagnose bugs, like IOExceptions being thrown from pure code. You’re generally better off using a streaming framework like conduit, pipes or streamly together with strict ByteStrings.
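The conversion and indexing points can be sketched with the bytestring package alone. This is an illustration rather than a benchmark: the names payload, asLazy, chunked and flattened are hypothetical, and the costs mentioned in the comments are the ones described in the list above, not something the program measures.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

payload :: BS.ByteString
payload = "strict payload"

-- Strict to lazy: the existing buffer becomes the single chunk.
asLazy :: BL.ByteString
asLazy = BL.fromStrict payload

-- A lazy ByteString spread over several chunks (made-up data).
chunked :: BL.ByteString
chunked = BL.fromChunks ["strict", " ", "payload"]

-- Lazy to strict: the chunks are copied into one contiguous buffer.
flattened :: BS.ByteString
flattened = BL.toStrict chunked

main :: IO ()
main = do
  print (BL.toChunks asLazy)    -- ["strict payload"]
  print (flattened == payload)  -- True
  -- Strict indexing reads at an offset into one buffer.
  print (BS.index payload 7)    -- 112, the byte for 'p'
  -- Lazy indexing walks the chunk list to find the right chunk first.
  print (BL.index chunked 7)    -- 112
```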
Output-oriented APIs, like serialization, are the one scenario where lazy ByteStrings make a good amount of sense:
- builders produce lazy ByteStrings: Data.ByteString.Builder only exposes toLazyByteString for converting a Builder into an actual ByteString. It takes an O(n) conversion to turn the result into a strict ByteString.
- concatenation of lazy ByteStrings is more efficient: a lazy ByteString’s list of chunks allows appending new chunks without copying entire buffers of data. Where possible, though, use a builder instead of concatenating lazy ByteStrings directly, to avoid creating lots of small, inefficient chunks!
- APIs for encoding/serializing often produce lazy ByteStrings: because of the above, it’s common for serialization operations like aeson’s encode to only expose versions producing a lazy ByteString.
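The builder workflow described above might look like the following sketch. The key=value line format and the helpers renderPair and renderAll are hypothetical, invented only to have something to serialize; the Builder functions themselves come from Data.ByteString.Builder.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL

-- Hypothetical serializer: renders one "key=value" line.
renderPair :: String -> Int -> B.Builder
renderPair key value =
  B.string7 key <> B.char7 '=' <> B.intDec value <> B.char7 '\n'

-- Builders are monoids, so many small pieces combine cheaply.
renderAll :: [(String, Int)] -> B.Builder
renderAll = foldMap (uncurry renderPair)

-- Running the builder yields a lazy ByteString.
encoded :: BL.ByteString
encoded = B.toLazyByteString (renderAll [("a", 1), ("b", 22)])

-- Getting a strict ByteString takes the extra O(n) copy.
encodedStrict :: BS.ByteString
encodedStrict = BL.toStrict encoded

main :: IO ()
main = do
  BL.putStr encoded                -- writes the lines "a=1" and "b=22"
  print (BS.length encodedStrict)  -- 9
```

If the output is headed straight for a handle or socket, the lazy result can be written as-is and the final toStrict copy skipped.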