Saturday, April 2, 2011

Topological Sort in Datalog

As a neat trick, I've been spending the weekend rewriting our hundrends (thousands?) of lines of attribute grammar visit scheduling from Java to maybe 100 lines of Datalog. This isn't so much of a surprise: there's been a bunch of work showing how to declaratively describe program analyses with datalog (and optimally optimize it with BDDs, though those results are, unfortunately, more contentious).

Anyways, the core of an attribute grammar scheduler is a topological sort of dependencies. I found this difficult to write. Originally, I tried using a successor relation, where each step was the set of edges (or nodes) that are reachable from the previous. No go, and searching on the web, I didn't see anything else. So, here's another tact:

Given the data:


edge('aa', 'b').

edge('a', 'b').
edge('b', 'c').
edge('c', 'd').

edge('d', 'y').
edge('d', 'z').

edge('a', 'm').
edge('m', 'n').
edge('n', 'o').

edge('o', 'y').
edge('o', 'z').

node(?n) :- edge(?n, ?m).
node(?n) :- edge(?m, ?n).


We calculate the flows-to relation, and then, for each edge, propagate 1 + the label of the previous edge. Finally, we take the maximal incoming label for each node, and voila!



flows(?n, ?m) :- edge(?n, ?m).
flows(?n, ?m) :- flows(?n, ?x), flows(?x, ?m).
whenAll(?n, 0) :- node(?n), !edge(?m, ?n).
whenAll(?n, ?c) :- whenAll(?m, ?d), flows(?m, ?n), ?d + 1 = ?c.
when(?n, ?c) :- whenAll(?n, ?c), ?c + 1 = ?d, !whenAll(?n, ?d).


Testing the result:


?- when(?n, ?c).
Query: ?- when(?n, ?c). ==>> 10 rows in 0ms
Variables: ?n, ?c
('aa', 0)
('a', 0)
('b', 1)
('m', 1)
('c', 2)
('n', 2)
('d', 3)
('o', 3)
('y', 4)
('z', 4)

No comments: