Query evaluation
Eclair is a high-level declarative logic query language (a Datalog variant). At first glance, it might not be immediately obvious how these kinds of programs are executed. This page aims to explain step-by-step how each part of an Eclair program is evaluated using some examples.
Top-level facts
Top-level, hard-coded facts are the easiest to evaluate. When a fact is encountered in a program, Eclair will add the fact to the corresponding relation.
A fact will only be added to a relation if it contains new information; if a fact with the same arguments was already present, the redundant information will be discarded.
The facts that are hard-coded into the Eclair program will be combined with the facts that are dynamically added via the API.
When evaluating the following snippet, two facts would be added to the
example
relation.
@def example(u32, u32).
example(1, 2).
example(2, 3).
Rules
For rules we make a distinction between non-recursive rules and recursive rules, since they are evaluated in completely different ways.
Non-recursive rules
Non-recursive rules are rules where none of the clauses in the rule body refer to the relation in the rule head itself.
The simplest example of a non-recursive rule is a rule with a single clause:
@def a(u32) output.
@def b(u32, u32) input.
a(x) :-
b(x, _).
To evaluate the rule in the above example, Eclair will first search for all
facts of relation b
. After that, it copies over the first column of each fact
in relation b
to a new fact in relation a
.
Things get slightly more complicated when a rule has multiple clauses. Take the following code snippet:
@def a(u32, u32) output.
@def b(u32, u32) input.
@def c(u32, u32) input.
a(x, z) :-
b(x, y),
c(y, z).
In order to find all facts for relation a
, Eclair first has to find all facts
in relation b
. Then, it has to find all matching facts in relation c
such
that the first column of c
is equal to the second column of b
(variables
with the same name add an implicit equality constraint!). Each time Eclair
finds a set of data that matches this pattern, it will add the result to the
relation a
.
Recursive rules
Now that we understand non-recursive rules, recursive rules are next. These are rules where one of the clauses in the rule body refers to the same relation in the rule head.
The main difference here is that this effectively means that Eclair will have to keep applying the same rule over and over again in a loop until it no longer can find any new information.
A good example of a recursive rule is the recursive case of the path example used in a few other places on this website. Here’s the relevant part of that code:
@def edge(u32, u32).
@def reachable(u32, u32).
reachable(x, z) :-
edge(x, y),
reachable(y, z).
In order to evaluate this rule, Eclair will have to first have to find all
edge
facts, and then look for matching reachable
facts. If it finds new
facts, it will need to apply the same rule again.
Important: Eclair only takes into account facts derived from the last iteration to avoid a lot of unnecessary computation. This technique is called semi-naive evaluation.
You might notice that a recursive rule by itself can’t result in any facts. (There would be no facts to start the recursion from!) This is why a recursive rule should always be paired with one or more non-recursive rules or top-level facts.
Stratification
So far, we’ve looked at the evaluation order of different situations one by one. In a real program however, all these situations can occur simultaneously.
Eclair uses a stratification algorithm in order to evaluate a program in the most optimal way. It does this by performing a topological sort of all declarations, based on the dependencies between relations. Mutually recursive relations are grouped together and evaluated in a single unit.
Stratification guarantees that rules are always executed in the same deterministic order, using bottom-up evaluation. If a program can’t be stratified, the compiler will generate an error and abort compilation.
In the snippet below, the code would be evaluated in the following order:
- First, the top-level facts are evaluated (facts have no data dependencies). The hard-coded facts are combined with facts added via the API.
- Next, the non-recursive rule is evaluated, since it only depends on the
edge
relation. - Finally, the recursive rule is repeatedly evaluated until no new results can
be found. (It has a dependency both on
edge
andreachable
.)
@def edge(u32, u32) input.
@def reachable(u32, u32) output.
// Step 3
reachable(x, z) :-
edge(x, y),
reachable(y, z).
// Step 2
reachable(x, y) :-
edge(x, y).
// Step 1
edge(1, 2).
edge(2, 3).
edge(3, 4).
Wrapping up
It would take too long to explain each possible situation, but the previous
examples should have given you some intuition how programs are evaluated. You
can use the intermediate RA
format (short for relational algebra) generated
by the compiler, if you want to know how eclair exactly evaluates your code.
You can generate RA
output using the following command:
$ eclair compile --emit ra program.eclair
This will print out the intermediary format as pseudo code to the screen and will explain how Eclair code is evaluated using set semantics. (This is similar to how evaluation was explained on this page.)
If we generate the RA
pseudo-code for the last snippet from the previous
section, we get the output below. You can try comparing it with the previous
explanation:
project (3, 4) into edge
project (2, 3) into edge
project (1, 2) into edge
search edge as edge0 do
project (edge0[0], edge0[1]) into reachable
merge reachable delta_reachable
loop do
purge new_reachable
search edge as edge0 do
search delta_reachable as delta_reachable1 do
if (edge0[0], delta_reachable1[1]) ∉ reachable do
if edge0[1] = delta_reachable1[0] do
project (edge0[0], delta_reachable1[1]) into new_reachable
exit if counttuples(new_reachable) = 0
merge new_reachable reachable
swap new_reachable delta_reachable
The RA
format should be seen as an advanced tool and is definitely not needed
all the time, but it can give you a greater understanding of what’s exactly
happening under the hood.
For more information on the idea behind the RA
format, check out this
paper by the Soufflé Datalog authors.