Foundation
A new hope. Foundation is a new library that tries to define a new modern Haskell framework. It is also trying to be more than a library: A common place for the community to improve things and define new things
It started as a thought experiment:
What would a modern Haskell base looks like if I could start from scratch ?
What would I need to complete my Haskell projects without falling into traditional pitfalls like inefficient String, all-in-one Num, un-productive packing and unpacking, etc.
One of the constraints, that was set early on, was not depending on any external packages, instead depending only on what GHC provides (itself, base, and libraries like ghc-prim). While it may sound surprising, especially considering the usually high quality and precision of libraries available on hackage, there are many reasons for not depending on anything; I’ll motivate the reason later in the article, so hang on.
A very interesting article from Stephen Diehl on production, that details well some of the pitfalls of Haskell, or the workaround for less than ideal situation/choice, outline pretty well some of the reasons for this effort.
Starting with the basic§
One of the few basic things that you'll find in any modern haskell project, is
ByteArray, Array and packed strings. Usually in the form of the bytestring
,
vector
and text
packages.
We decided to start here. One of the common problem of those types is
their lack of inter-operability. There's usually a way to convert one into
another, but it's either exposed in an Unsafe
or Internal
module, or has a
scary name like unsafeFromForeignPtr
.
Then, if you're unlucky you will see some issues with unpinned and pinned
(probably in production settings to maximise fun); the common ByteString
using
the pinned memory, ShortByteString
and Text
using unpinned memory, and
Vector
, well, it's complicated (there's 4 different kind of Vectors).
Note: pinned and unpinned represent whether the memory is allowed to move by the GC. Unpinned usually is better as it allows the memory system to reduce fragmentation, but pinned memory is crucial for dealing with Input/Output with the real world, large data (and some other uses).
Unboxed Array§
Our corner stone is the unboxed array. The unboxed array is a native Haskell
ByteArray (represented by the ByteArray#
primitive type),
and it is allowed to be unpinned or pinned (at allocation time). To also support
further interesting stuff, we supplement it with another constructor to make it
able to support natively a chunk of memory referenced by a pointer.
In simplified terms it looks like:
data UArray ty = UArrayBA (Offset ty) (Size ty) PinnedStatus ByteArray#
| UArrayAddr (Offset ty) (Size ty) (Ptr ty)
With this capability, we have the equivalent of ByteString
,
ShortByteString
, Unboxed Vector
and (Some) Storable Vector
, implemented
in one user friendly type. This is a really big win for users, as suddenly all
those types play better together; they are all the same thing working
the same way.
Instead of differentiating ByteString
and Vector
, now ByteString
disappears
completely in favor of just being a UArray Word8
. This has been tried
before with the current ecosystem with vector-bytestring.
String§
String is a big pain point. Base represents it as a list of Char [Char]
, which
as you can imagine is not efficient for most purpose. Text
from the popular text
package implements a packed version of this, using UTF-16 and unpinned native ByteArray#
.
While text
is almost a standard in haskell, it’s very likely you’ll need
to pack and unpack this representation to interact with base functions, or
switch representation often to interact with some libraries.
Note on Unicode: UTF-8 is an encoding format where unicode codepoints are encoded in sequence of 1 to 4 bytes (4 different cases). UTF-16 represent unicode sequences with either 2 or 4 bytes (2 different cases).
Foundation’s String are packed UTF-8 data backed by an unboxed vector of bytes.
This means we can offer a lightweight type based on UArray Word8
:
newtype String = String (UArray Word8)
So by doing this, we inherit directly all the advantages of our vector types; namely
we have a String
type that is unpinned or pinned (depending on needs), and supports
native pointers. It is extremely lightweight to convert between the two: provided
UTF8 binary data, we only validate the data, without re-allocating anything.
There’s no perfect representation of unicode; each representation has it own advantages and disadvantages, and it really depends on what types of data you’re actually processing. One of the easy rules of thumb is that the more your representation has cases, the slower it will be to process the highest unicode sequences.
By extension, it means that choosing a unique representation leads to compromise.
In early benchmarks against text we are consistently outperforming Text
when
the data is predominantly ASCII (i.e. 1-byte encoding).
In other type of data, it really depends; sometimes we’re faster still,
sometimes slower, and sometimes par.
Caveat emptor: benchmarks are far from reliable, and only been run on 2 machines with similar characteristic so far.
Other Types§
We also support already:
- Boxed Array. This is an array to any other Haskell types. Think of it as array of pointers to another Haskell value
- Bitmap. 1 bit packed unboxed array
In the short term, we expect to add:
- tree like structure.
- hash based structure.
Unified Collection API§
Many types of collections support the same kind of operations.
For example, commonly you have very similar functions defined with different types:
take :: Int -> UArray a -> UArray a
take :: Int -> [a] -> [a]
take :: Int -> String -> String
head :: UArray a -> a
head :: String -> Char
head :: [a] -> a
So we tried to avoid monomorphic versions of common Haskell functions and instead provide type family infused versions of those functions. In foundation we have:
take :: Int -> collection -> collection
head :: collection -> Element collection
Note: Element collection
is a type family. this allow from a type "collection"
to define another type. for example, the Element of a [a]
is a
, and the Element of
String
is Char
.
Note2: head
is not exactly defined this way in foundation: This was the simplest example
that show Type families in action and the overloading. foundation's head
is not partial and defined:
head :: NonEmpty collection -> Element collection
The consequence is that the same head
or take
(or other generic functions) works
the same way for many different collection types, even when they are monomorphic (e.g. String).
For another good example of this approach being taken, have a look at the mono-traversable package
For other operations that are specific to a data structure, and hard to generalize, we still expose dedicated operations.
The question of dependencies§
If you’re not convinced by how we provide a better foundation to the standard Haskell types, then it raises the question: why not depend on those high quality libraries doing the exact same thing ?
Consistency. I think it's easier to set a common direction, and have a consistent approach when working in a central place together, than having N maintainers working on M packages independently.
Common place. An example speaks better than words sometime: I have this X thing, that depends on the A package, and the B package. Should I add it to A, to B, or create a new C package ?
Atomic development. We don't have to jump through hoops to improve our types or functions against other part of foundation. Having more things defined in a same place, means we can be more aggressive about improving things faster, while retaining an overall package that make sense.
Versions, and Releases. Far easier to depends on a small set of library than depends on hundreds of different versions. Particularly in an industrial settings, I will be much more confident tracking 1 package, watch 1 issue tracker and deal with a set of known people, than having to deal with N packages, N issues trackers (possibly in different places), and N maintainers.
Some final notes§
A fast iterative release schedule. Planning to release early, release often. and with a predictable release schedule.
We're still in the early stage. While we're at an exciting place, don't expect a finish product right now.
You don't need to be an expert to help. anyone can help us shape foundation.
Join us. If you want to get involved: all Foundation works take place in the open, on the haskell-foundation organisation with code, proposals, issues and voting, questions.