A few weeks ago I gave a talk at London Haskell that was basically an advert for a data migrating library I haven’t even finished yet. However, I managed on the whole to hide this fact and some people even came up afterwards and asked me how the library was performing in production to which of course I said “very well indeed, because of course i am very good at computers” and quickly changed the subject.
What follows is basically the crap from my slides, occasionally turned from bullet lists into sentences where I realised I had overused that particular presentation device a little much.
Sitting comfortable? Then we shall begin…
In The Beginning…
…we had server side applications.
If the code agrees with the DB schema…
Deploy!
Great!
Then Came Javascript…
Suddenly all our data was spread around the place
Things didn’t necessarily agree with one another
There was sometimes JQuery.
And at some point we land at
The Traditional Backend / Frontend Monolith
When the back end changes…
Change the front end too.
Deploy everything together
(did you forget to update the DB schema?)
DO THAT QUICKLY
Forget about the past
YOLO
So…
What could possibly go wrong with this?
Problem One
Company A
have found that DB schema changes make changing the application more difficult than they would like. Therefore they choose to use event sourcing. As the application changes, the DB schemas keep up, but they are soon left with an event table full of various historical versions of JSON data.
- What is this data? Do they still understand it?
Problem Two
Company B
decide their code is so good that they are going to create a public API. Other companies decide to use this API, and annoyingly want it not to change at random. Therefore the API owners make promises not to change their API (even though they bloody love changing their API because they are 10x hackers who just can’t stop delivering business value).
- How can they make changes to this without breaking everything?
Problem Three
Company C
have noticed that large monoliths take ages to deploy. They would also like to decouple teams to maximise parallelisation of work. They adopt a microservice architecture. Suddenly, services that talk to one another aren’t guaranteed to have versions or interfaces that match, so a old service can be receiving requests from a very old service that hasn’t been updated.
- How will they cope with communicating with any number of historical deployments?
A concrete example
Here’s a data type that we use in our business critical application. It is called OldUser
, which we never really questioned at the time.
data OldUser
= OldUser
firstName :: String
{ surname :: String
, pet :: String
, age :: Int
, }
Business is going pretty well, I can’t imagine my meeting this afternoon will go badly…
Oh no!
Pivot immediately
Apparently we can increase profitability by 30% by using newtypes properly.
newtype Name
= Name { getName :: Text.Text }
And replacing String
selections of pet
types with a more restrictive sum type.
data OldPet
= OldDog
| OldCat
| NoPet
These changes took all night, but you really pulled through there.
data NewUser
= NewUser
newFirstName :: Name
{ newSurname :: Name
, newPet :: OldPet
, newAge :: Int
, }
Great job.
Hold On Though
Business is obviously booming now, but what are we going to do about:
Third parties that will insist on using
OldUser
in their API calls for the next 18 monthsStored JSON data with the old data shape
What options do we have?
- As well as developing new code, we keep old code for dealing with the old data
- This seems fine…
- …till you’re fixing bugs in the old system too
- Shit! More fixes! Now you’re got an new-old system as well as the new-new one.
- More fixes again! Now we have a new-old-new-old system to maintain as well as your main one, which is a new-new-new system by now? It may have been superceded too. Oh dear.
- Or migrate the old datatypes to the new datatypes and keep one set of logic.
- Logic stays in new code
- Bug fixes in business logic happen once
- Logic of migration separated from business logic
Decision time
If you choose Option 1
, you are on your own. However, if you have chosen Option 2
, read on…
Functions we will need: migrate
The first thing we’ll need is a function to convert our old terrible datatype into our new incredible exciting datatype, in this case OldUser
to NewUser
.
migrate :: OldUser -> NewUser
Actually. Let’s be realistic about this, and account for the idea that this operation could fail, as life is a bin.
migrate :: OldUser -> Maybe NewUser
Functions we will need: parse
We’ll also need some functions for decoding JSON, as I have made the somewhat brazen assumption this is a REST API that only receives JSON payloads. We’ll use functions from the Aeson
library because it is reasonably ubiquitous.
This function will attempt converting some JSON
into an OldUser
parseOldUser :: JSON -> Maybe OldUser
parseOldUser json= parseMaybe (parseJSON json)
And this very similar function will try and convert from JSON
into a NewUser
.
parseNewUser :: JSON -> Maybe NewUser
parseNewUser= parseMaybe . parseJSON
Putting it together
We can then make a function that takes some JSON
, and then tries to decode it into a NewUser
. If it can’t decode it into a NewUser
, it tries to parse it into an OldUser
, and if that succeeds, it uses some sort of migrate
function to turn OldUser
into a NewUser
.
Said function looks something like this:
parseSomeKindOfUser :: JSON -> Maybe NewUser
parseSomeKindOfUser json= parseNewUser json
<|> (parseOldUser json >>= migrate)
(The <|>
operator comes from the Alternative
typeclass, and works sort of like the ||
function. The intuition is try this OR try this
)
Does it scale though?
It seems to do the job, with a couple of datatypes, however it’s easy to see how it could get out of hand…
= parse a
thing <|> parse b >>= migrate
<|> parse c >>= migrate >>= migrate
<|> parse d >>= migrate >>= migrate >>= migrate
<|> parse e >>= migrate >>= migrate >>= migrate >>= migrate
<|> parse f >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate
<|> parse g >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate
<|> parse h >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate
<|> parse i >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate
<|> parse j >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate
<|> parse k >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate
<|> parse l >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate
<|> parse m >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate >>= migrate
So this method is a no?
I don’t think so. But writing all of that is a bit much. Hopefully right now you are asking “Why can’t I get the compiler to do this for me?”.
Good Point.
OK! Let’s give it a smash. We start by making a typeclass and using it to version tag our datatype.
class Versioned (label :: Symbol) (num :: Nat) where
type num `VersionOf` label :: Type
(This VersionOf
type here is an associated type family
- a type level function that is scoped to only work inside the typeclass it is defined in.)
The typeclass defines a function we can use to find a datatype from the label
and the num
. For example, we can make a label called "User"
, and make OldUser
version 1
of it:
instance Versioned "User" 1 where
type 1 `VersionOf` "User" = OldUser
We’ll also define an instance for NewUser
, and make that version 2
of it.
instance Versioned "User" 2 where
type 2 `VersionOf` "User" = NewUser
Linking versions together
Next we make a typeclass for migrations that uses our VersionOf
type function:
class Migratable (label :: Symbol) (num :: Nat) where
fromPrevious :: (num - 1) `VersionOf` label
-> Maybe (num `VersionOf` label)
It lets us define a function from the previous version of a datatype to the current one, so let’s use it to migrate OldUser
to NewUser
.
instance Migratable "User" 2 where
fromPrevious :: OldUser -> Maybe NewUser
fromPrevious older= Just $ NewUser
= Name (Text.pack (firstName older))
{ newFirstName = Name (Text.pack (surname older))
, newSurname = readPet (pet older)
, newPet = age older
, newAge
}where
readPet s| s == "dog" = OldDog
| s == "cat" = OldCat
| otherwise = NoPet
Problem: Migrating from old JSON versions
Once we’ve defined instances of the Versioned
and Migratable
typeclasses for our data, the Data.Migratable
library functions start doing helpful things. It provides a version of migrate
function we defined concretely earlier, but with a much more exciting (confusing) type signature.
migrate :: earliest `VersionOf` label
-> Maybe (target `VersionOf` label)
It means “I can take the earliest
version of label
and Maybe
return you the target
version of label
”. We use type applications to pass versions
and a label
to convert an old datatype to a new one.
oldToNew :: OldUser -> Maybe NewUser
= migrate @1 @2 @"User" oldToNew
Going from version 1
to 2
means we’ve just done a shitload of work for a single conversion, but the same function will recursively convert through as many versions of the datatype as you like:
veryOldToVeryNew :: OldUser -> Maybe VeryNewUser
= migrate @1 @100 @"User" veryOldToVeryNew
Solution: parseJSONVia
The Data.Migratable
library also provides us parseJSONVia
, with the following exciting type signature:
parseJSONVia :: JSON
-> Maybe (target `VersionOf` label)
We could use to try converting from both OldUser
and NewUser
like this:
parseSomeKindOfUser :: JSON
-> Maybe NewUser
parseSomeKindOfUser = parseJSONVia @"User" @1 @2
Underneath, this is doing our parse
and migrate
pattern under thge good
= parse @NewUser
thing <|> parse @OldUser >>= migrate
A note on Type Applications
To understand how to pass the types to the parseJSONVia
function, we need to look at the Schema
typeclass that provides this functionality:
class Schema (label :: Symbol) (earliest :: Nat) (target :: Nat) where
parseJSONVia :: JSON.Value -> JSON.Parser (target `VersionOf` label)
When we use it, we are passing it types in the order they appear in the class declaration.
@"User" @1 @2 parseJSONVia
Therefore, we are passing the type-level symbol "User"
as the first argument label
, then a type-level natural number 1
as the starting version earliest
, and finally another type-level natural 2
as the target version target
.
Problem: Uniqueness checking
Uniqueness checking is another feature of Data.Migratable
. Let’s say that we have this data type Info
:
data Info
= Info
: Pounds } { amount
Then, after another hard pivot, we change the units:
data NewInfo
= NewInfo
amount :: Pennies } {
In our static typed ivory tower, we are fine, but our clients keep sending us this:
{ "amount": 100 }
What is it? 100 pennies? 100 pounds? How can we stop this confusion?
Solution: Checking for duplicates with QuickCheck
Data.Migratable
provides us with the matchAll
function:
matchAll :: IO (Either [MatchError] [Integer])
It uses
QuickCheck
and it’sArbitrary
instances to generate randomJSON
values for each datatypeThen tries to load each generated value as each version of the datatype
And tells us how many version of a datatype each generated instance is able to decode
If it’s one each - we’re going to have a good time:
"Uses Arbitrary to generate said tests" $ do
describe "Checks if our datatypes will get confused" $ do
it <- matchAll @1 @4 @"User"
found `shouldBe` Right [1,2,3,4] found
- But if our
JSON
representations are non-unique, we’ll know:
"Our Pennies and Pounds schema" $ do
describe "Spots our problematic matching schema" $ do
it <- matchAll @1 @2 @"Same"
found `shouldBe` Left [Duplicates 1 [2,1], Duplicates 2 [2,1]] found
- And we can fix our data types to ensure uniqueness.
data Info
= Info
amountPounds :: Pounds }
{
data NewInfo
= NewInfo
amountPennies :: Pennies } {
- Good job.
Problem: getting a FromJSON instance
The Aeson
library works by making datatypes define instances of the FromJSON
typeclass, and packages like Servant
allow us to automagically create web servers that use these types. Can we still use all this good stuff?
Solution: FromJSON instance around a newtype wrapper
newtype APIUser
= APIUser { getAPIUser :: WhateverTheNewestUserTypeIsTheseDays }
…and using parseJSONVia
to create a FromJSON
instance for that datatype…
instance JSON.FromJSON APIUser where
parseJSON a= APIUser <$> parseJSONVia @"User" @1 @4 a
…we can make a Servant
server that can read any of our historical datatypes.
type ExcellentApi =
"user" :> Get '[JSON] [APIUser]
Great job!
Problem: Mistakes in our datatypes
What if we make “mistakes” in our types - like removing a piece of data we later decide we need? Here is a first version of some data.
data Dog
= Dog { name :: String
age :: Int
,
}
instance Versioned "Dog" 1 where
type 1 `VersionOf` "Dog" = Dog
This is the second version of the datatype, where we remove age
.
data AgelessDog
= AgelessDog
name :: String }
{
instance Versioned "Dog" 2 where
type 2 `VersionOf` "Dog" = AgelessDog
Oops. We needed that. It’s back in version 3
.
data WithAgeDog
= WithAgeDog
name :: String
{ age :: Int
, tail :: Bool
,
}
instance Versioned "Dog" 3 where
type 3 `VersionOf` "Dog" = WithAgeDog
However, this means any version 1
piece of data will convert through version 2
and lose everything on the way.
Solution: multiple import paths
Let’s change our declarations…
data Dog
= Dog { name :: String
age :: Int
,
}
instance Versioned "Dog" 1 where
type 1 `VersionOf` "Dog" = Dog
instance Versioned "AgeDog" 1 where
type 1 `VersionOf` "AgeDog" = Dog
(We’ve ignored the middle one for now - it is the same)
data WithAgeDog
= WithAgeDog
name :: String
{ age :: Int
, tail :: Bool
,
}
instance Versioned "Dog" 3 where
type 3 `VersionOf` "Dog" = WithAgeDog
instance Versioned "AgeDog" 2 where
type 2 `VersionOf` "AgeDog" = WithAgeDog
Then our parsing function becomes (something like)
parseSomeKindOfDog :: JSON -> Maybe WithAgeDog
parseSomeKindOfDog json= parseJSONVia @"AgeDog" @1 @2 json
<|> parseJSONVia @"Dog" @1 @3 json
How does it work?
First, we try the lossless path
Failing that, we try the lossy path to pick up any
AgeLessDog
values.
OK. Sum up what you’ve said and stop selling me your crappy non-existent library
So hopefully, this technique should let you:
Define migrations outside the main logic code, throw those into a file and forget about them forever until the next migration.
Use simple ADTs for my types if one feels like it.
Avoid historical code making new code more complicated.
But does it work?
Who knows? See the code at https://github.com/danieljharvey/migratable and decide for yourself. I mean, the tests pass, but what does that really tell us.
Addendum
After the talk, somebody suggested that it’s all very well accepting old JSON data version in an API, but really you’d need to provide the response in the old format too. This was a really annoyingly good point, so this is the next feature I am working on for the library - the plan so far is to create a kind of opposite of the Migratable
typeclass for responses, that goes from newer to older versions instead.