Woorpje is a string solver for bounded word equations, i.e., the length of the strings substituting each variable is upper bounded by a given integer. Using the existence of these bounds, and the fact that the alphabet of the strings is finite, allows us to reformulate the question is this equation solvable into a reachability check in a finite automaton. Preliminary experiments with a simplistic reachability search revealed, however, that the search space quickly becomes intractable. Thus, Woorpje translates the Bounded Word Equation problem further into a satisfiability problem of propositional logic, and uses the award-winning SAT-solver Glucose as its SAT-Solving backend.
Using Woorpje:
Woorpje is invoked from the command line with ./woorpje equation.eq
where equation.eq
is the configuration file for Woorpje.
Simple Example
Using Woorpje to solve the equation aX = Yb
is as simple as providing the below configuration file to Woorpje.
Variables {XY} Terminals {ab} Equation: aX = Yb SatGlucose(10)
The lines Variables
Terminals
and Equation
must be written in
that order, and they have the obvious semantics. The last line
SatGlucose(10)
invokes the solver and pass the number 10 as
parameter to the solver. The 10
informs the solver to try increase
the bound exponentially ten times. If the parameter is 0
, Woorpje will guess a suitable value.
Tip
|
It is possible to specify a system of equations, where all equations must be true, by specifying several Equation lines
|
As can be seen from the output below, Woorpje finds a solution to the equation where it sets X = b
and Y = a
.
Woorpje also provides a proof that the substitution is correct by outputting the left hand side and right hand side
with substituted variables.
woorpje 0.1 (Mar 4 2019) Revision: 7621932 woorpje is a word equation solver based on Glucose. Dependable Systems Group, Kiel Univeristy Mitja Kulczynski, Danny Boegsted Poulsen and Thorsten Ehlers - {mku,dbp,the}@informatik.uni-kiel.de SatSolver Ready c last restart ## conflicts : 0 2 Substitution: X: b Y: a Equations after substition ab == ab Timing info Encoding : 0 milliseconds Solving : 0 milliseconds Overall Solving Time : 0 milliseconds Found a solution
Linear Constraints
Woorpje supports constraining the set of potential solutions by specifying constraints between linear arithmetic expressions
over the length of the variables. As an example, consider that want solution to aX = Yb
where X
is longer than 5,
we can add the constraint LinConstraint: [ >= +1|X|, +5]
to the configuration file.
Linear constraints must be specified after the word equations. Giving us the configuration file
Variables {XY} Terminals {ab} Equation: aX = Yb LinConstraint: [ >= +1|X|, +5] SatGlucose(10)
Tip
|
Linear Constraints are specified in prefix form where the comparison operator (>=,⇐, >, < ) is first specified and
after that the left hand side and then the right hand side of comparison.
|
The output from Woorpje
woorpje 0.1 (Mar 4 2019) Revision: 7621932 woorpje is a word equation solver based on Glucose. Dependable Systems Group, Kiel Univeristy Mitja Kulczynski, Danny Boegsted Poulsen and Thorsten Ehlers - {mku,dbp,the}@informatik.uni-kiel.de SatSolver Ready c last restart ## conflicts : 0 12 Substitution: X: baaaaaaab Y: abaaaaaaa Equations after substition abaaaaaaab == abaaaaaaab Timing info Encoding : 0 milliseconds Encoding : 0 milliseconds Encoding : 0 milliseconds Solving : 0 milliseconds Overall Solving Time : 0 milliseconds Found a solution
reveals that there is indeed a solution. It also shows, that Woorpje encoded the word equation problem into SAT three times.
Benchmarks
For evaluating the performance of Woorpje, we generated five collections of word equations benchmarks that exhibit a special structure. The equations are solved with Woorpje and state of the art string solvers (CVC4, Z3str, Z3Seq and Norn, Sloth). The benchmarks are available in our Availability section.
For each word equation we ran the solvers and noted if they gave the correct result, if they timed out and we measured the total time for solving all equations. The solvers were allowed to run for 30 seconds.
First Track
The first track (I) is based on completely random generated string, where randomly picked string of arbitrary length were replaced by variables. This guarantees the existence of a solution.The generated word equations have an upper bound of 15 variables, 10 constants, and length 300.
Program |
Satis |
NSatis |
Error |
Unknown |
Timeout |
Total |
Time |
TimeWO |
woorpje |
200 |
0 |
0 |
0 |
0 |
200 |
8.10 |
8.10 |
cvc4 |
182 |
0 |
0 |
0 |
18 |
200 |
543.32 |
3.32 |
z3str3 |
197 |
0 |
0 |
1 |
2 |
200 |
105.07 |
45.07 |
z3seq |
183 |
0 |
0 |
0 |
17 |
200 |
545.24 |
35.24 |
norn |
176 |
4 |
4 |
0 |
20 |
200 |
1037.63 |
437.63 |
sloth |
101 |
0 |
0 |
0 |
99 |
200 |
3658.34 |
688.34 |
Second Track
The second track (II) is based on the idea in Proposition 1 of MFCS2017, where the equation
\(X_naX_nbX_{n-1}bX_{n-2}\cdots bX_1 = aX_nX_{n-1}X_{n-1}bX_{n-2}X_{n-2}b \cdots b X_{1}X_{1}baa\) is shown to have a minimal solution of exponential length w.r.t. the length of the equation.
Program |
Satis |
NSatis |
Error |
Unknown |
Timeout |
Total |
Time |
TimeWO |
woorpje |
5 |
0 |
0 |
0 |
4 |
9 |
123.85 |
3.85 |
cvc4 |
1 |
0 |
0 |
0 |
8 |
9 |
240.03 |
0.03 |
z3str3 |
0 |
0 |
0 |
9 |
0 |
9 |
0.33 |
0.33 |
z3seq |
9 |
0 |
0 |
0 |
0 |
9 |
1.81 |
1.81 |
norn |
0 |
0 |
0 |
0 |
9 |
9 |
270.00 |
0.00 |
sloth |
7 |
0 |
0 |
0 |
2 |
9 |
124.56 |
64.56 |
Third Track
The third track (III) is based on the second track, but each letter $b$ is replaced by the left hand side or the right hand side of some randomly generated word equation (e.g., with the methods from track (I))
Program |
Satis |
NSatis |
Error |
Unknown |
Timeout |
Total |
Time |
TimeWO |
woorpje |
147 |
42 |
0 |
0 |
11 |
200 |
341.74 |
11.74 |
cvc4 |
121 |
44 |
0 |
0 |
35 |
200 |
1055.24 |
5.24 |
z3str3 |
51 |
48 |
6 |
43 |
58 |
200 |
2089.92 |
349.92 |
z3seq |
82 |
45 |
1 |
0 |
73 |
200 |
2199.99 |
9.99 |
norn |
60 |
12 |
1 |
0 |
128 |
200 |
4038.83 |
198.83 |
sloth |
126 |
7 |
12 |
0 |
67 |
200 |
2808.48 |
798.48 |
Fourth Track
The fourth track (IV) consists of a system of 100 small random word equations with at most 6 constant letters, 10 variables and length 60. The hard aspect of this track is solving multiple equations at the same time.
Program |
Satis |
NSatis |
Error |
Unknown |
Timeout |
Total |
Time |
TimeWO |
woorpje |
104 |
92 |
0 |
2 |
2 |
200 |
136.20 |
76.20 |
cvc4 |
76 |
96 |
0 |
0 |
28 |
200 |
925.13 |
85.13 |
z3str3 |
79 |
99 |
3 |
10 |
12 |
200 |
490.23 |
130.23 |
z3seq |
97 |
97 |
1 |
0 |
6 |
200 |
200.09 |
20.09 |
norn |
16 |
112 |
68 |
0 |
72 |
200 |
3216.95 |
1056.95 |
sloth |
15 |
1 |
0 |
0 |
184 |
200 |
5615.81 |
95.81 |
Fifth Track
The fifth track (V) enriches a system of 30 word equations by suitable linear constraints, as presented in this paper. This track is influenced by the Kaluza benchmark set.
Program |
Satis |
NSatis |
Error |
Unknown |
Timeout |
Total |
Time |
TimeWO |
woorpje |
164 |
14 |
0 |
9 |
13 |
200 |
399.22 |
9.22 |
cvc4 |
156 |
23 |
0 |
0 |
21 |
200 |
635.97 |
5.97 |
z3str3 |
175 |
23 |
0 |
1 |
1 |
200 |
41.52 |
11.52 |
z3seq |
170 |
23 |
0 |
0 |
7 |
200 |
217.35 |
7.35 |
norn |
88 |
103 |
79 |
0 |
9 |
200 |
742.34 |
472.34 |
sloth |
11 |
0 |
2 |
2 |
187 |
200 |
5750.79 |
140.79 |