.. post:: 2014-12-13
:tags: OCaml
:author: Rudi Grinberg
Type Safe Routing - Baby Steps
==============================
Type safe routing means different things to different people. So let's
start by clarifying the intended meaning in this post. There are 2
widely used definitions:
- The first refers to the case where the type system is used to
maintain hyper links in your web app so that there are no dangling
links. This static checking also extends to the parameters accepted
by the hyper links.
- The second definition is processing URL's in a way that assigns types
to the values extracted.
The first definition is much stronger and is mostly useful for complete
web apps. The second definition pertains to API's exposed over HTTP and
will be the subject of this blog post. I am not an expert on this
subject and this post only documents my first steps as to how to solve
this problem. The end solution that I propose is not quite satisfying
but is elegant in its own right. Let's begin.
The Problem
-----------
Let's take a sample opium handler:
.. code-block:: ocaml
let h =
put "/hello/:x/from/:y" begin fun req ->
let (x,y) = (param req "x", param req "y") in
let msg = sprintf "Hello %s! from %s." x y in
`String msg |> respond |> return
end
There is something very unsatisfying in the code above. We are attaching
a function to a URL that contains 2 parameters ``x`` and ``y``. We know
in the body of the function that the parameters are always there
(otherwise our handler would not be executed) and yet we don't express
this deduction statically.
Can we do better? Yes, and I'll show you how. Let's start by pretending
that a solution exists:
.. code-block:: ocaml
let x =
put (s "/hello" > str > s "from" > str) begin fun x y ->
let msg = sprintf "Hello %s! from %s." x y in
`String msg |> respond |> return
end
This is much better. The function that we are binding to the route is
now typed. The combinators used to specify the routes should be
familiar, but a brief explanation is in order. ``s x`` will consume a
portion of the url defined by ``x``. ``>`` concatenates 2 routes
sequentially. ``str`` specifies a string parameter that ends with a
``"/"``.
The advantages of this approach include:
- We don't have to extract the parameters manually out of the request
or even name the parameters. Although, we can imagine adding support
for that if it proves useful.
- The compiler will warn us if we haven't used an argument.
- We can now specify the types of the parameters we accept. We can
imagine using ``int`` to specify a numerical parameter. In fact, we
can probably think of a host of combinators to add various
functionality.
The advantages are obvious, but how can we implement it?
Aspiring OCaml hackers might want give this a problem a crack
themselves. To specify the problem a little better, and perhaps give a
hint to the solution this is interface which we will be implementing:
.. code-block:: ocaml
type ('a, 'b) t
val s : string -> ('a, 'a) t
val (>) : ('b, 'c) t -> ('a, 'b) t -> ('a, 'c) t
val str : ('a, string -> 'a) t
val int : ('a, int -> 'a) t
val match_url : ('a, 'b) t -> string -> 'b -> 'a option
The Solution
------------
If you have a hard problem related to static typing, a good rule of
thumb is to check if Oleg has already solved it. Indeed, the problem
above falls under that category but under a different context. `Typed
Formatting `__ attempts to
define an EDSL for specifying a type safe printf/scanf. Our problem of
type safe routes is just a subset of that since we only need scanf.
Oleg's solution in Haskell involves GADT's, which have almost have worn
off the novelty for me in OCaml.
We will also need a generic parser type (``'a Parser.t`` is a parser
that yields a value of type ``'a``). A simple monadic parser will do.
Since it's not particularly interesting, I will link my own
implementation in the end. Here's the signature for the parser (Although
I don't think we will need all of these operations).
.. code-block:: ocaml
open Core.Std
(* A Substring.t is analgous to a string * int * int
The parser return None on failure. Some with the result and the rest
of the substring on success. *)
type 'a t = Substring.t -> ('a * Substring.t) option
val return : 'a -> 'a t
val (>>=) : 'a t -> ('a -> 'b t) -> 'b t
(* validate/map input with f. If f returns none then the parser fails.
If f return Some x then x is the result returned *)
val filter_map : 'a t -> f:('a -> 'b option) -> 'b t
(* run the parser against the string and return None if parsing fails. On
success Some (x, rest) where x is the result and rest is the remaining
string that needs to be parsed *)
val run : 'a t -> Substring.t -> ('a * Substring.t) option
(* little helpers *)
val drop_prefix : string -> unit t
val drop_while : (char -> bool) -> unit t
val take_while : (char -> bool) -> string t
Finally, here's the type representing our DSL:
.. code-block:: ocaml
type (_, _) t =
| Try_parse : unit Parser.t -> ('a, 'a) t
| Parse : 'b Parser.t -> ('a, 'b -> 'a) t
| Concat : ('b, 'c) t * ('a, 'b) t -> ('a, 'c) t
Enumerating the constructors:
- ``Try_parse`` represents a parser whose result we discard. Hence we
use ``()``. It will be useful to implement ``s``.
- ``Parse`` represents a parser whose result we retain. It will be
useful to implement typed parameters
- ``Concat`` will be used to sequence routes. It will be useful to
implement ``>``
The key to the whole solution is one simple function interpreting our
DSL:
.. code-block:: ocaml
let rec ints : type a b . (a, b) t -> b -> a Parser.t =
let open Option.Monad_infix in
fun t f inp ->
match t with
| Try_parse p -> Parser.run p inp >>| fun ((), inp') -> (f, inp')
| Parse p -> Parser.run p inp >>| fun (v, s) -> (f v, s)
| Concat (a, b) ->
ints a f inp >>= fun (vb, inp') ->
ints b vb inp'
This function takes a typed route and a function corresponding to a
handler and returns a parser that returns the result of our handler on
success.
We just need a bit more massaging to fit our proposed interface:
.. code-block:: ocaml
let match_url t s cb =
let s = Substring.of_string s in
match ints t cb s with
| None -> None
| Some (x, subs) ->
if subs |> Substring.to_string |> String.is_empty then
Some x
else (* we did not consume the whole string so no match *)
None
The rest is a small matter of programming the combinators:
.. code-block:: ocaml
(* a little helper to make sure our params are not empty *)
let non_empty = Parser.filter_map ~f:(fun x ->
if String.is_empty x
then None
else Some x)
let int = Parse (fun x ->
(Char.is_digit
|> Parser.take_while
|> non_empty
|> Parser.map ~f:Int.of_string) x)
let s x = Try_parse (Parser.drop_prefix x)
let (>) x1 x2 =
let lead_slash x = Concat (s "/", x) in
Concat (x1, lead_slash x2)
let str = Parse (fun x -> (Parser.take_while ((<>) '/') |> non_empty) x)
A little test using oUnit:
.. code-block:: ocaml
let it_works _ =
let test_path = "/user/johndoe/comment/123" in
let p = s "/user" > str > s "comment" > int in
match match_url p test_path (fun user id ->
assert_equal ~printer "johndoe" user;
assert_equal 123 id
) with
| None -> assert_failure "Failed to match url"
| Some _ -> print_endling "tada"
Conclusion
----------
I have a prototype of Opium using these routes but it's not for public
consumption. I'd like to totally revamp routes from a scalability
standpoint first before I improve the API for routing. There are also
many other approaches to this problem (mostly from the land of Haskell).
Here's some great resources:
- `Happstack `__
- The "pioneer" of type safe routing
- The most recent and promising
`servant `__.
However, it covers a whole lot more than just routing.
- `Spock `__ is a tiny Haskell web
framework that is most similar to my approach, although it uses type
families under the hood.
At the end, I'm not sure which approach is the best for OCaml so I'm not
ready to commit to anything in Opium yet. However, one thing is for
certain: I definitely intend to the have the current untyped approach as
an option.
Here's a
`gist `__ of a
working version of my library. You're welcome to fork and experiment,
please let me know if you find any bugs.
One last thing. My implementation above can be called an "initial
encoding" of the EDSL. You can actually implement the EDSL as a "final
encoding" without GADT's. I don't see the advantage in either approach
so I went with the initial encoding. I reached for it first and it gives
me an excuse to play with GADT's. A solution that uses a final encoding
will be left as an exercise to the reader. (Who else has fond memories
of this phrase from back in college?)