RethinkDB and Elixir - Part 2: Queries

This is the second part of an ongoing series of posts (Part 1) about my experiences building a RethinkDB driver for Elixir. The driver can be found at https://github.com/hamiltop/rethinkdb-elixir.

RethinkDB.Query

RethinkDB does not use a text-based query language like SQL or the query language used in MongoDB. Instead, it has an S-expression syntax, similar to Lisp.

The guidance from the RethinkDB team is that drivers should capture native language constructs and idioms whenever possible. The official drivers are in JavaScript, Ruby, and Python and make use of method chaining. Consider the following example:

r.table("people").filter({name: "Peter"})  

The equivalent expression in Elixir:

filter(table("people"), %{name: "Peter"})  

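That's a bit hard to read, since the nesting reads inside-out. We can't do method chaining because Elixir doesn't have the object-oriented style of programming available in JavaScript, but luckily we have the beloved pipe operator (|>), which passes the value on its left as the first argument of the call on its right. Rewriting the previous example, we get: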

table("people") |> filter(%{name: "Peter"})  

That's much better.
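
Because the pipe is expanded at compile time, the nested and piped forms build exactly the same term. A minimal sketch, assuming hypothetical MiniTerms.table/1 and MiniTerms.filter/2 builders (not the driver's API) that return [term_type, args] arrays using the TABLE (15) and FILTER (39) term ids from RethinkDB's ql2.proto:

```elixir
defmodule MiniTerms do
  # Hypothetical builders, not the driver's API: each returns a
  # [term_type, args] array in the shape ReQL uses on the wire.
  @table 15
  @filter 39

  def table(name), do: [@table, [name]]
  def filter(sequence, predicate), do: [@filter, [sequence, predicate]]
end

import MiniTerms

# Nested call style: reads inside-out
nested = filter(table("people"), %{name: "Peter"})

# Pipe style: `x |> f(y)` compiles to `f(x, y)`
piped = table("people") |> filter(%{name: "Peter"})

# Both are [39, [[15, ["people"]], %{name: "Peter"}]]
IO.inspect(nested == piped)
```

The comparison prints true: the pipe only changes how the code reads, not what it builds.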

RethinkDB.Lambda

A major part of using RethinkDB is querying with anonymous functions. An example from JavaScript:

r.table("people").filter(function (person) {  
  return person("age").ge(21);
});

The above query will return all people who are age 21 or greater. The anonymous function, however, doesn't read like idiomatic JavaScript: instead of a plain comparison such as person.age >= 21, the body has to be built out of ReQL calls. We can follow a similar approach in Elixir:

table("people") |> filter(fn (person) ->  
  person |> bracket("age") |> ge(21)
end)  

This suffers from the same problem; it doesn't capture the native semantics of Elixir. Luckily, we have macros. RethinkDB.Lambda.lambda is a macro that converts Elixir code into ReQL. Rewriting the previous example with this macro, we get:

import RethinkDB.Lambda

table("people") |> filter(lambda fn (person) ->  
  person[:age] >= 21
end)  

Implementation

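For the lambda macro, we merely use Macro.prewalk/2 to rewrite the quoted Elixir code into the corresponding ReQL calls, for example turning person[:age] >= 21 into a ge call on a bracket call. For details, see the RethinkDB.Lambda module in the driver's source.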
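
A minimal sketch of the idea, not the driver's actual macro: the hypothetical MiniLambda.lambda/1 walks the quoted function with Macro.prewalk/2 and rewrites two node shapes, a >= b and a[key] (which Elixir quotes as a call to Access.get/2), into calls to hypothetical MiniQuery.ge/2 and MiniQuery.bracket/2 builders that return tagged tuples. A complete version would need to cover many more operators.

```elixir
defmodule MiniQuery do
  # Hypothetical stand-ins for ReQL term builders
  def ge(left, right), do: {:ge, left, right}
  def bracket(object, key), do: {:bracket, object, key}
end

defmodule MiniLambda do
  # Rewrites supported operators in the quoted code before it is compiled
  defmacro lambda(block) do
    Macro.prewalk(block, fn
      # `left >= right` becomes MiniQuery.ge(left, right)
      {:>=, _meta, [left, right]} ->
        quote do: MiniQuery.ge(unquote(left), unquote(right))

      # `object[key]` is quoted as Access.get(object, key)
      {{:., _, [Access, :get]}, _meta, [object, key]} ->
        quote do: MiniQuery.bracket(unquote(object), unquote(key))

      # Everything else is left untouched
      node ->
        node
    end)
  end
end

import MiniLambda

predicate = lambda fn person -> person[:age] >= 21 end

# The body now builds a term instead of comparing values:
# {:ge, {:bracket, :row, :age}, 21}
IO.inspect(predicate.(:row))
```

Prewalk visits a node before its children, so after the >= node is replaced, the walk continues into the new call's arguments and still rewrites the person[:age] inside it.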

Serializing an Elixir function into the ReQL AST is a bit more complicated. Following the guide at https://rethinkdb.com/docs/writing-drivers/, the Elixir driver does the following when it encounters an anonymous function in a query:

  1. Check how many arguments the function expects via :erlang.fun_info/2
  2. Call make_ref once per argument.
  3. Build a var query with each reference.
  4. Apply the anonymous function with the list of var queries.
  5. Wrap the result in a func query, which pairs the list of references with the function body.
  6. When the query is run, walk the entire AST and replace each ref with an integer.
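
Steps 1 through 5 fit in a few lines. MiniReQL.func/1 below is a hypothetical stand-in for the driver's function serialization, sketched under the assumption that terms are plain [term_type, args] lists using the MAKE_ARRAY (2), VAR (10), FUNC (69) and GE (22) ids from ql2.proto. It leaves the refs in place for step 6.

```elixir
defmodule MiniReQL do
  # ReQL term type ids from ql2.proto
  @make_array 2
  @var 10
  @func 69

  # A VAR term refers to a function argument by its id
  def var(id), do: [@var, [id]]

  def func(f) when is_function(f) do
    # Step 1: how many arguments does the function expect?
    {:arity, arity} = :erlang.fun_info(f, :arity)

    # Step 2: one fresh reference per argument
    refs = Stream.repeatedly(&make_ref/0) |> Enum.take(arity)

    # Step 3: wrap each reference in a VAR term
    vars = Enum.map(refs, &var/1)

    # Step 4: call the function; its return value is the ReQL body
    body = apply(f, vars)

    # Step 5: FUNC pairs the argument ids (as an array) with the body
    [@func, [[@make_array, refs], body]]
  end
end

# GE (22) compares the argument's VAR term against 21
query = MiniReQL.func(fn person -> [22, [person, 21]] end)

# Shape: [69, [[2, [ref]], [22, [[10, [ref]], 21]]]], with the same ref twice
IO.inspect(query)
```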

Some of this is low-level detail, but two things happening here are important.

First, we use make_ref to give each var query an identifier while building the query. Only when the query is run do we replace each ref with a unique integer. This allows us to compose and reuse queries without collisions: because the integers are assigned at run time, we can make them unique without relying on global state.
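
A minimal sketch of that run-time pass, assuming the query is a nested structure of plain lists and maps with Erlang references where the VAR ids go. The hypothetical MiniPrepare.prepare/1 threads a ref-to-integer map through the walk with Enum.map_reduce/3, so each distinct ref gets the next integer and a repeated ref gets the same one, with no global state involved.

```elixir
defmodule MiniPrepare do
  # Replace every reference in a nested query with a small integer.
  # Numbering starts at 1 and follows the order refs are first seen.
  def prepare(query) do
    {prepared, _state} = walk(query, {%{}, 1})
    prepared
  end

  # Lists: walk each element, carrying the state along
  defp walk(list, state) when is_list(list) do
    Enum.map_reduce(list, state, &walk/2)
  end

  # References: reuse an existing id or assign the next one
  defp walk(ref, {ids, next} = state) when is_reference(ref) do
    case Map.fetch(ids, ref) do
      {:ok, id} -> {id, state}
      :error -> {next, {Map.put(ids, ref, next), next + 1}}
    end
  end

  # Plain maps (e.g. object literals): walk the values only
  defp walk(map, state) when is_map(map) do
    {pairs, state} =
      Enum.map_reduce(map, state, fn {key, value}, acc ->
        {value, acc} = walk(value, acc)
        {{key, value}, acc}
      end)

    {Map.new(pairs), state}
  end

  # Strings, numbers, atoms, etc. pass through unchanged
  defp walk(other, state), do: {other, state}
end

# A FUNC whose body contains a second FUNC, each built with its own ref
outer = make_ref()
inner = make_ref()
query = [69, [[2, [outer]], [69, [[2, [inner]], [10, [outer]]]]]]

# => [69, [[2, [1]], [69, [[2, [2]], [10, [1]]]]]]
prepared = MiniPrepare.prepare(query)
```

Because the refs come from make_ref, two functions built independently can never share an id, and the integers are only fixed once the whole query is known.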

Second, we run the anonymous function. Consider this:

table("people") |> filter(fn (person) ->  
  :timer.sleep(1000)
  5 == 10
end)  

Just creating that query will cause the process to sleep for 1000 ms and then compare 5 and 10. When you actually run it, the query sent to the server is equivalent to:

table("people") |> filter(fn (person) ->  
  false
end)  
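
One way to observe this, using a hypothetical, stripped-down BuildTime.func/1 that performs step 4 from the list above with a fixed placeholder id instead of a fresh ref: the function body runs once, on the client, while the query is built, and only its return value ends up in the query.

```elixir
defmodule BuildTime do
  # Hypothetical builder: calls f once with a VAR placeholder (id 1)
  # and stores whatever it returns as the FUNC body.
  def func(f) when is_function(f, 1) do
    [69, [[2, [1]], f.([10, [1]])]]
  end
end

query =
  BuildTime.func(fn _person ->
    # Side effect: runs now, during query construction
    send(self(), :body_ran)
    5 == 10
  end)

# query is now [69, [[2, [1]], false]]

receive do
  :body_ran -> IO.puts("body ran while building the query")
after
  0 -> IO.puts("body did not run")
end
```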

This is actually really powerful. It allows you to use the full extent of the language for anything that can be computed client-side. Consider this example:

def get_if_related_to(id, relationship) do  
  table("people") |> filter(lambda fn (person) ->
    case relationship do
      r when r in [:brother, :sister] ->
        person[:siblings] |> contains(id)
      r when r in [:mother, :father] ->
        person[:children] |> contains(id)
      r when r in [:son, :daughter] ->
        table("people")
          |> get(id)
          |> get_field("children")
          |> contains(person[:id])
      r when r in [:aunt, :uncle] ->
        person[:siblings]
          |> flat_map(fn sibling ->
            # siblings holds ids, so look up each sibling's document first
            table("people") |> get(sibling) |> get_field("children")
          end) |> contains(id)
    end
  end)
end

Functionally, this is equivalent to:

def get_if_related_to(id, relationship) do
  filter_fun = case relationship do
    r when r in [:brother, :sister] ->
      lambda fn (person) ->
        person[:siblings] |> contains(id)
      end
    r when r in [:mother, :father] ->
      lambda fn (person) ->
        person[:children] |> contains(id)
      end
    r when r in [:son, :daughter] ->
      lambda fn (person) ->
        table("people")
          |> get(id)
          |> get_field("children")
          |> contains(person[:id])
      end
    r when r in [:aunt, :uncle] ->
      lambda fn (person) ->
        person[:siblings]
          |> flat_map(fn sibling ->
            # siblings holds ids, so look up each sibling's document first
            table("people") |> get(sibling) |> get_field("children")
          end) |> contains(id)
      end
  end
  table("people") |> filter(filter_fun)
end  

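Both versions produce identical queries over the wire: the case expression is evaluated on the client while the query is being built, so only the selected branch ends up in the ReQL. Being able to use full Elixir semantics in an anonymous function is a powerful way to keep code clean.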
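
The same equivalence at a smaller scale, sketched with hypothetical tuple-returning builders and a fixed VAR placeholder instead of the driver's real terms and refs: deciding inside the function body or before building the function yields the same term, because the case runs on the client in both versions.

```elixir
defmodule Relations do
  # Hypothetical term builders that return tuples instead of real ReQL terms
  def bracket(object, key), do: {:bracket, object, key}
  def contains(sequence, value), do: {:contains, sequence, value}

  # Hypothetical builder: applies f to a fixed VAR placeholder
  def func(f), do: {:func, [1], f.({:var, 1})}

  # Version 1: the case runs inside the function body
  def inside(id, relationship) do
    func(fn person ->
      case relationship do
        r when r in [:brother, :sister] ->
          person |> bracket(:siblings) |> contains(id)

        r when r in [:mother, :father] ->
          person |> bracket(:children) |> contains(id)
      end
    end)
  end

  # Version 2: the case picks a function, which is then built
  def outside(id, relationship) do
    filter_fun =
      case relationship do
        r when r in [:brother, :sister] ->
          fn person -> person |> bracket(:siblings) |> contains(id) end

        r when r in [:mother, :father] ->
          fn person -> person |> bracket(:children) |> contains(id) end
      end

    func(filter_fun)
  end
end

# prints true
IO.inspect(Relations.inside("p1", :sister) == Relations.outside("p1", :sister))
```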

Conclusion

If you want to dive in deeper, hit me up on Slack or GitHub. I'm @hamiltop on both the RethinkDB and Elixir Slack teams. I also recommend reading http://www.rethinkdb.com/blog/lambda-functions/ to understand how the official drivers handle anonymous functions.
