I’m investigating a way to speed up persistent
as well as make it more powerful, and one of the potential solutions involves persisting some global state across module boundaries.
I decided to investigate whether the “Global IORef Trick” would work for this.
Unfortunately, it doesn’t.
On reflection, it seems obvious: the interpreter for Template Haskell is a GHCi-like process that is loaded for each module.
Loading an interpreter for each module is part of why Template Haskell imposes a compile-time penalty - in my measurements, it’s something like ~100ms.
Not huge, but noticeable on large projects.
(I still generally find that DeriveGeneric
and the related Generic
code to be slower, but it’s a complex issue).
Anyway, let’s review the trick and obseve the behavior.
This trick allows you to have an IORef
(or MVar
) that serves as a global reference.
You almost certainly do not need to do this, but it can be a convenient way to hide state and make your program deeply mysterious.
Here’s the trick:
module Lib where
import Data.IORef
import System.IO.Unsafe
globalRef :: IORef [String]
globalRef = unsafePerformIO $ newIORef []
{-# NOINLINE globalRef #-}
There are two important things to note:
{-# NOINLINE globalRef #-}
pragma.Let’s say we give globalRef
a more general type:
globalRef :: IORef [a]
This means that we woudl be allowed to write and read whatever we want from this reference.
That’s bad!
We could do something like writeIORef globalRef [1,2,3]
, and then readIORef globalRef :: IO [String]
.
Boom, your program explodes.
Unless you want a dynamically typed reference for some reason - and even then, you’d better use Dynamic
.
If you omit the NOINLINE
pragma, then you’ll just get a fresh reference each time you use it.
GHC will see that any reference to globalRef
can be inlined to unsafePerformIO (newIORef [])
, and it’ll happily perform that optimization.
But that means you won’t be sharing state through the reference.
This is a bad idea, don’t use it. I hesitate to even explain it.
But, well, sometimes you try things out to see if they work. In this case, they don’t, so it’s useful to document that.
We’re going to write a function trackString
that remembers the strings that are passed previously, and defines a value that returns those.
trackString "hello"
-- hello = []
trackString "goodbye"
-- goodbye = ["hello"]
trackString "what"
-- what = ["goodbye", "hello"]
Here’s our full module:
{-# language QuasiQuotes #-}
{-# language TemplateHaskell #-}
module Lib where
import Data.IORef
import System.IO.Unsafe
import Language.Haskell.TH
import Language.Haskell.TH.Quote
import Language.Haskell.TH.Syntax
globalRef :: IORef [String]
globalRef = unsafePerformIO $ newIORef []
{-# NOINLINE globalRef #-}
trackStrings :: String -> Q [Dec]
trackStrings input = do
strs <- runIO $ readIORef globalRef
_ <- runIO $ atomicModifyIORef globalRef (\i -> (input : i, ()))
ty <- [t| [String] |]
pure
[ SigD (mkName input) ty
, ValD (VarP (mkName input)) (NormalB (ListE $ map (LitE . stringL) $ strs)) []
]
This works in a single module just fine.
{-# language TemplateHaskell #-}
module Test where
import Lib
trackStrings "what"
trackStrings "good"
trackStrings "nothx"
test :: IO ()
test = do
print what
print good
print nothx
If we evaluate test
, we get the following output:
[]
["what"]
["good","what"]
This is exactly what we want.
Unfortunately, this is only module-local state.
Given this Main
module, we get some disappointing output:
{-# language TemplateHaskell #-}
module Main where
import Lib
import Test
trackStrings "hello"
trackStrings "world"
trackStrings "goodbye"
main :: IO ()
main = do
test
print hello
print world
print goodbye
[]
["what"]
["good","what"]
[]
["hello"]
["world","hello"]
To solve my problem, main
would have needed to output:
[]
["what"]
["good","what"]
["nothx","good","what"]
["hello","nothx","good","what"]
["world","hello","nothx","good","what"]
Fortunately, we don’t even need to do anything awful like this.
The Q
monad offers two methods, getQ
and putQ
that allow module-local state.
-- | Get state from the Q monad. Note that the state is
-- local to the Haskell module in which the Template
-- Haskell expression is executed.
getQ :: Typeable a => Q (Maybe a)
-- | Replace the state in the Q monad. Note that the
-- state is local to the Haskell module in which the
-- Template Haskell expression is executed.
putQ :: Typeable a => a -> Q ()
These use a Typeable
dictionary, so you can store many kinds of state - one for each type!
This is a neat way to avoid the “polymorphic reference” problem I described above.
If y’all dare me enough I might write a follow-up where I investigate using a compact region to persist state across modules, but I’m terrified of the potential complexity at play there.
I imagine it’d work fine for a single threaded compile, but there’d probably be contention on the file with parallel builds.
Hey, maybe I just need to spin up a redis server to manage the file locks…
Perhaps I can install nix
at compile-time and call out to a nix-shell
that installs Redis and runs the server.