# A Constraint Solver and its Application to Machine Code Test Generation 

Trevor Alexander Hansen

## Submitted in total fulfilment of the requirements of the degree of Doctor of Philosophy

October 2012

## Department of Computing and Information Systems The University of Melbourne

Australia

This thesis is printed on acid-free paper.


#### Abstract

Software defects are a curse: they are so difficult to find that most software is declared finished, only later to have defects discovered. Ideally, software tools would find most, or all, of those defects for us. Bit-vector and array reasoning is important to the software testing and verification tools that aim to find those defects. The bulk of this dissertation investigates how to build a faster bit-vector and array solver.

The usefulness of a bit-vector and array solver depends chiefly on it being correct and efficient. Our work is practical: mostly, we evaluate different simplifications that make problems easier to solve. In particular, we perform a bit-vector simplification phase that we call "theory-level bit-propagation", which propagates information throughout the problem. We describe how we tested parts of this simplification to show it is correct.

We compare three approaches to solving array problems. Surprisingly, on the problems we chose, we show that the simplest approach, a reduction from arrays and bit-vectors to bit-vectors, has the best performance.

In the second part of this dissertation we study the symbolic execution of compiled software (binaries). We use the solver that we have built to perform symbolic execution of binaries. Symbolic execution is a program analysis technique that builds up expressions that describe all the possible states of a program in terms of its inputs. Symbolic execution suffers from the "path explosion", where the number of paths through a program grows tremendously large, making analysis impractical. We show an effective approach for addressing this problem.


## Declaration

This is to certify that
(i) the thesis comprises only my original work towards the PhD except where indicated in the Preface,
(ii) due acknowledgement has been made in the text to all other material used,
(iii) the thesis is less than 100,000 words in length, exclusive of tables, maps, bibliographies and appendices.

## Preface

The work discussing symbolic execution has been previously published as:
T. Hansen, P. Schachte, and H. Søndergaard [HSS09], "State Joining and Splitting for the Symbolic Execution of Binaries", Runtime Verification 2009.

## Acknowledgements

Thanks first to my supervisors, Peter Schachte and Harald Søndergaard, for being patient, spending literally hundreds of hours working with me, and for cheerfully improving my initially appalling writing and work. Thanks to Nicholas Nethercote, who supervised me during part of this work.

Thanks to my office mates: Ben Horsfall, for being good company and explaining logical formalisms; Thibaut Feydy, for knowing about constraint programming; Marco S. Netto, for showing me how to be productive; Matt Davis, for being enthusiastic; and to the always friendly Khalid Al-Jasser, Kathryn Francis, and Jad Abi-Samra.

Thanks to Robert Brummayer and Armin Biere for making their SMT fuzzing and delta debugging tools available. We used them constantly during the development of our solver.

Thanks to Morgan Deters and to the other organisers of SMT-COMP 2010, 2011, and 2012. The competition has advanced the practice of building SMT solvers, and given us the chance to measure our progress against others.

Thanks to David Dill and Vijay Ganesh for making the STP solver, on which our bit-vector and array solver is based, available.

Thanks to Zoe for allowing me to forgo large amounts of income, large amounts of leisure time, and large amounts of domestic duties, so that I could tap away at the computer.

I am grateful to Peter Stuckey who inspired our approach to solving array problems, and to Graeme Gange who suggested how to simply implement the approach in Minisat.

Thanks to the Australian Government for supporting me with a post-graduate scholarship.

## Contents

1 Introduction
2 Preliminaries
2.1 SAT Solving
2.2 SMT
2.3 Bit-Vectors
2.4 Arrays
2.5 Term Rewriting
2.6 Structural Hashing
2.7 Sharing-Aware Transformations
2.8 Bit-Blasting
2.9 And-Inverter Graphs
2.10 Propagators and Propagation Solvers
3 Building a Better Bit-Vector Solver
3.1 STP 0.1 Overview
3.2 STP2 Overview
3.3 Simplifications when Creating Expressions
3.4 Variable Elimination
3.5 Partial Solver
3.6 Speculative Transformations
3.7 Clause Count Estimation and Comparison
3.8 Bit-Blasting
3.9 Multiplication with Sorting Networks
3.10 CNF through Technology Mapping
3.11 Discovering Rewrite Rules
3.11.1 Finding Equivalences
3.11.2 Automatically Building a Rewrite System
3.11.3 Summary
3.12 AIG Rewriting
3.13 Boolean Abstraction AIG Rewrite
3.14 ITE Transformations
3.15 Unconstrained-Variable Simplification
3.16 Pure Literal Elimination
3.17 Interval Analysis
3.18 Parameter Optimisation
3.19 Evaluation
3.20 Relative Significance of Simplifications
3.21 A Comparison of the Tseitin and TM Encodings
3.22 A Simple Fast Solver (4Simp)
3.23 Related Work
3.23.1 Spear
3.23.2 MathSAT
3.23.3 UCLID
3.23.4 Boolector
3.23.5 Z3
3.23.6 Beaver
3.23.7 Other Approaches
3.23.8 Bit-Width Reduction
3.23.9 Peephole Optimisation
3.24 Conclusion
4 Theory-Level Bit Propagation
4.1 Introduction
4.2 Preliminaries
4.3 A "Bit-Vector And" Propagator
4.4 Some Useful Propagators
4.4.1 An Addition Propagator
4.4.2 Multiplication Propagators
4.4.3 An Unsigned Division Propagator
4.5 A Propagation Solver
4.6 Using the Results
4.7 Evaluation of Theory-Level Bit Propagation
4.8 Testing that Propagators Are Optimal
4.9 An Optimal 6-Bit Multiplication Propagator
4.10 Propagator Evaluation
4.11 Related Work
4.12 Conclusion
5 Building a Better Array Solver
5.1 Simplifying
5.2 Removing Store and Array-ITEs
5.3 Eliminating selects: Ackermannization
5.4 Eliminating selects: Abstraction-Refinement
5.5 Eliminating selects: Delayed Congruence Instantiation
5.6 Evaluation
5.6.1 A Comparison of Two $\mathcal{A}ck$ Implementations
5.6.2 A Comparison to Other Solvers
5.6.3 A Problem Requiring Many $\mathcal{F}CC$ Instances
5.6.4 Quadratic Blow-Up of Select-over-Store Elimination
5.6.5 A Comparison with STP 0.1
5.7 Related Work
5.7.1 STP 0.1
5.7.2 Boolector
5.7.3 BAT
5.7.4 Other Solvers
5.8 Conclusion
6 Symbolic Execution for Automated Test Generation
6.1 Background
6.2 Why Binary Analysis?
6.3 State Joining: A More Detailed Example
6.4 The Algorithm
6.4.1 Preparing to Join
6.4.2 Joining and Splitting
6.5 Simplifications and Approximations
6.5.1 Path Constraint Simplification
6.5.2 Value Analysis for Pointers
6.6 MinkeyRink's Implementation
6.7 Results
6.8 MinkeyRink with STP2
6.9 Complications
6.10 Related Work
6.10.1 Tools
6.10.2 Symbolic Memory Accesses
6.10.3 Handling the Path Explosion
6.10.4 Other
6.11 Conclusion
7 Conclusion
Bibliography

## List of Figures

2.1 A grammar for QF_BV
2.2 AIGs corresponding to the propositional ITE
3.1 Phases of STP2 when solving a QF_BV problem
3.2 Sharing-aware simplifications
3.3 A 4-bit table of partial products
3.4 A comparator and equations that describe it
3.5 Batcher's odd-even sorting network (for 16 input bits)
3.6 Expression instantiation template
3.7 Terms used when automatically deriving rewrite rules
3.8 Twenty randomly selected rewrite rules
3.9 Failures running 615 SMT-LIB2 problems
4.1 The ternary domain (3)
4.2 Truth tables in a three valued logic
4.3 A 4-bit multiplication's table of partial products
4.4 Finding the optimal propagator
6.1 A C-language program that sometimes fails.
6.2 A C-language program that may divide by zero
6.3 Pop count, with the path explosion.
6.4 Pop count, without the path explosion.
6.5 Using shift-and-add to multiply positive integers $x_{0}$ and $y_{0}$
6.6 CFG for multiplication code
6.7 The state graph for the example of Figure 6.6
6.8 CFG fragments
6.9 Two path constraints
6.10 Number: Error on input 12345678
6.11 Wegner: Counting 1-bits
6.12 A complication for state joining
6.13 Unrolling a loop with an unwinding assertion

## List of Tables

2.1 Logical operations of QF_BV \& QF_ABV
2.2 The bit-vector operations of QF_BV
2.3 Array operations of QF_ABV
3.1 Unconstrained variable simplification
3.2 QF_BV problems solved with various configurations
3.3 Bruttomesso families benchmarks solved by configurations
3.4 STP2 with simplifications enabled/disabled
3.5 Extra problems solved by other STP2 configurations
3.6 Extra problems solved by other STP2 configurations - Bruttomesso
3.7 Problems solved by various configurations
3.8 Tseitin vs TM
4.1 Logical-and propagation rules
4.2 STP2 performance with and without bit propagation
4.3a Comparison of unit propagation and bit-blasting at $1 \%$
4.3b Comparison of unit propagation and bit-blasting at $5 \%$
4.3c Comparison of unit propagation and bit-blasting at $50 \%$
4.3d Comparison of unit propagation and bit-blasting at 95\%
4.4a Comparison of the best propagator to our propagators
4.4b Calculating the effect of the best propagator
4.4c Calculating the effect of the best propagator
4.5 Propagation strength of propagators versus CNF
5.1 A comparison of $\mathcal{A}ck$ approaches
5.2 Comparison of Solvers times on QF_ABV problems
5.3 Time for various solvers for countbitstableoffbyone0128
5.4 Solvers' performance for QF_ABV problems
6.1 Results of applying state joining
6.2 Results of applying state joining

## List of Algorithms

2.1 A simple SAT solving algorithm
3.1 Generating equations implied by a term
3.2 Generating equivalences: terms
3.3 Generating equivalences: propositions
3.4 Find all equivalent terms from a list of terms
3.5 ITE simplification algorithm
3.6 Pure literal elimination
3.7 Unsigned interval analysis algorithm
4.1 Addition propagator for 3
4.2 Enforcing trailing zeroes on multiplications
4.3 Column bounds propagator for multiplication
4.4 Multiplication column bounds propagation
4.5 Column bounds propagation for multiplication
4.6 Generating clauses for $n$-bit multiplication
5.1 $\mathcal{A}ck_{\text{cnf}}$ algorithm
5.2 Generating $\mathcal{F}CC$ instances in CNF
5.3 $\mathcal{A}ck_{\text{ite}}$ algorithm
5.4 STP2's algorithm for $\mathcal{A}bsref$
5.5 Precursor phase to applying $\mathcal{DCI}$
5.6 A simple DCI implementation
5.7 The DCI algorithm implemented with a SAT solver
5.8 Precursor steps for an improved DCI algorithm
5.9 Steps performed after cancel for an improved $\mathcal{DCI}$ algorithm
5.10 Steps after unit propagation for an improved $\mathcal{DCI}$ algorithm
5.11 STP 0.1 store abstraction-refinement algorithm
6.1 Applying state joining to a program

## 1 Introduction

Millions of computer programmers spend a substantial part of their days finding and fixing defects in their programs. Tools that make it easier to find defects in software are of enormous practical significance. This thesis contributes components that improve some of those defect-finding tools.

The bulk of the dissertation investigates approaches to efficiently solving bit-vector and array problems. The bit-vector theory introduces low-level operations such as multiplication and addition, which model the basic operations provided by a computer. These basic operations make it easy for software verification and testing tools to pose questions to such solvers about the effect of sequences of instructions.

The improvements to bit-vector and array solvers have made many tools possible. Amongst many uses, bit-vector and array solvers are used to automatically generate exploits for vulnerable software [ACHB11], to implement string solvers [GKA$^{+}$11], to discharge theorems in theorem provers [BFSW11], and to check the equivalence of software [Smi11].

As an instance of a problem posed to a bit-vector solver, consider asking whether there exists a 64-bit value that, when multiplied by 4, equals 12, but which is not 3. Because the arithmetic that computers perform can overflow, such a value exists.

The problem may be expressed in the SMT-LIB2 format as:

```
; This is a comment
(set-logic QF_BV); This says it's a bit-vector problem
(declare-fun x () (_ BitVec 64)); Makes a 64-bit variable
(assert (=
    (_ bv12 64) ; The constant 12 (in 64 bits)
        (bvmul (_ bv4 64) x ) ; 4*x
    )
) ; 12 = 4*x (remember: not equivalent to 3 = x)
(check-sat)
; Prints x=3 (The first solution it happened to find)
(assert (not (= x (_ bv3 64) ))) ; x != 3
(check-sat)
; Prints x= 0x4000000000000003
```

Commands to the bit-vector solver are given between brackets. Anything to the right of a semi-colon is a comment. The example creates a bit-vector $x$ of 64 bits, then asserts to the bit-vector solver that $12=(4 \times x)$. The (check-sat) command tells the bit-vector solver to look for a satisfying assignment to the problem. When we run this problem on our bit-vector solver, STP2, it reports that $x=3$ is such an assignment. However, it could have reported any of the possible assignments. Next, we assert to the bit-vector solver that $(x \neq 3)$, and ask for another satisfying assignment. This time it returns an assignment to $x$ with the second-most significant bit set. This value, when multiplied by 4, produces 12 as the result, because the product overflows 64 bits.
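The overflow behind this example can be checked with ordinary modular arithmetic. The following Python sketch models 64-bit multiplication as multiplication modulo $2^{64}$; the names `WIDTH`, `MASK` and `bvmul` are ours, chosen for illustration, and are not part of STP2 or SMT-LIB2. Since $4x \equiv 12 \pmod{2^{64}}$ holds exactly when $x \equiv 3 \pmod{2^{62}}$, the problem has four 64-bit solutions.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1  # keeps the low 64 bits, i.e. reduces modulo 2**64


def bvmul(a: int, b: int) -> int:
    """Multiply two 64-bit values, discarding bits at position 64 and above."""
    return (a * b) & MASK


# 4*x = 12 (mod 2**64) holds exactly when x = 3 (mod 2**62).
solutions = [3 + k * (1 << 62) for k in range(4)]

for x in solutions:
    print(hex(x), bvmul(4, x))
```

The second value in `solutions` is the one with only the second-most significant bit set above the low bits, matching the assignment discussed in the text.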

Bit-vector and array solvers take problems, usually from a software verification or testing tool, and decide whether satisfying assignments exist to those problems.

Our bit-vector solver STP2 is efficient; it won the QF_BV division at the annual SMT-COMP 2010. It placed second at the 2011 contest. STP2 and another of our bit-vector solvers placed second and third at the 2012 contest. Since 2007, bit-vector solvers have gone through a dramatic performance improvement. Comparing winners on the SMT-COMP 2007 benchmark set, the 2007 winner Spear v1.9 takes 3260 seconds, the 2008 winner Boolector takes 1029 seconds, and the 2009 winner MathSAT 4.3 takes 355 seconds. Our solver, STP2 r1659, solves the problems in 210 seconds.

STP2 is based on STP, an open-source solver that was equal winner at the 2006 contest. Modern bit-vector solvers contain hundreds of simplification and optimisation rules. We have made more than a thousand, sometimes small, changes to STP that cumulatively have the effect of making STP2 amongst the best available solvers.

Towards the end of this dissertation, we investigate an application of STP2 to the symbolic execution of machine code programs. Symbolic execution builds formulae that describe the value of program variables as functions of the program's inputs.

This thesis makes several significant contributions. In chapter 3, we identify simplifications that when combined give a bit-vector solver that is extremely efficient. In chapter 4, we describe a particular simplification which utilises bit-propagation to speed up bit-vector solving significantly. In chapter 5, we describe a novel decision procedure for solving problems in the combined theory of arrays and bit-vectors. Finally, in chapter 6 we describe an approach to improving test generation of binary programs.

## 2 Preliminaries

This chapter gives a quick review of the basic concepts that we rely on. The material we present here is not detailed enough to learn the concepts. Instead, this chapter is intended to refresh the reader's knowledge of the material and to fix the notation we use. In particular, we introduce the theories of bit-vectors and arrays. The solver that we spend the majority of this dissertation discussing solves satisfiability problems in these theories.

### 2.1 SAT Solving

A propositional formula is: true (1), false (0), a Boolean variable, the negation of a propositional formula, or the conjunction of two propositional formulae. For convenience, we add redundant propositional operations (Table 2.1). A classical truth assignment $\mu$ is a mapping from each of the Boolean variables in a propositional formula to either 1 or 0. Here 0 denotes falsehood and 1 denotes truth. So we write the fact that $\mu$ makes a Boolean variable $b$ true by $\mu(b)=1$. The domain $\operatorname{dom}(\mu)$ of $\mu$ is the set of variables that are mapped, by $\mu$, to 0 or 1. We denote the set of variables in a propositional formula $p$ by $\operatorname{vars}(p)$.

| Logical Operation | Description |
| :--- | :--- |
| $\neg p$ | logical not (evaluates to 1 iff $(p \Longleftrightarrow 0)$) |
| $p_{0} \wedge p_{1}$ | logical and (evaluates to 1 iff $(p_{0} \Longleftrightarrow 1)$ and $(p_{1} \Longleftrightarrow 1)$) |
| $p_{0} \vee p_{1}$ | logical or $\neg(\neg p_{0} \wedge \neg p_{1})$ |
| $p_{0} \oplus p_{1}$ | logical exclusive-or $((p_{0} \wedge \neg p_{1}) \vee (\neg p_{0} \wedge p_{1}))$ |
| $p_{0} \Longleftrightarrow p_{1}$ | logical if-and-only-if $(p_{0} \oplus \neg p_{1})$ |
| $p_{0} \Longrightarrow p_{1}$ | logical implication $(\neg p_{0} \vee p_{1})$ |
| $\operatorname{ITE}(p_{0}, p_{1}, p_{2})$ | logical if-then-else (ITE) $(p_{0} \Longrightarrow p_{1}) \wedge (\neg p_{0} \Longrightarrow p_{2})$ |

Table 2.1: Logical operations of QF_BV \& QF_ABV

When a propositional formula $p$ is evaluated subject to an assignment $\mu$, it evaluates to either 1 or 0, provided $\operatorname{vars}(p) \subseteq \operatorname{dom}(\mu)$. We sometimes apply a truth assignment $\mu$ to a whole formula $p$, writing $\mu(p)$, $\mu$ being defined by natural extension. A satisfying assignment to a propositional formula is one for which the formula evaluates to 1.

The propositional satisfiability problem (SAT) is to decide if there exists a satisfying assignment to a propositional formula. A SAT solver is a decision procedure for the SAT problem. Currently, most SAT solvers accept conjunctive normal form (CNF) as input. The basic building blocks of CNF are literals, a literal being either a Boolean variable, or the negation of a variable. (We sometimes talk of literals being assigned a value, meaning the underlying variable is assigned a value that causes that particular literal to be either 0 or 1.) A clause is a disjunction of literals. A formula is in CNF if it is a conjunction of clauses. The Cook-Levin theorem [Coo71] states that the SAT problem for CNF is $\mathcal{N} \mathcal{P}$-complete, so finding an answer is not necessarily easy. Propositional formulae $p$ and $p^{\prime}$ are equisatisfiable if, loosely, $p$ is satisfiable iff $p^{\prime}$ is satisfiable. In most contexts where $p$ and $p^{\prime}$ share variables, it is understood that, for $p$ and $p^{\prime}$ to be considered equisatisfiable, satisfying truth assignments for the two must agree on the shared variables. More precisely:

$$
\begin{aligned}
& \forall \mu(\mu(p)=0) \Longleftrightarrow \forall \mu\left(\mu\left(p^{\prime}\right)=0\right) \wedge \\
& \forall \mu \exists \mu^{\prime}\left(\mu(p)=\mu^{\prime}\left(p^{\prime}\right) \wedge\left(v \in \operatorname{vars}(p) \cap \operatorname{vars}\left(p^{\prime}\right) \Longrightarrow \mu(v)=\mu^{\prime}(v)\right)\right)
\end{aligned}
$$

The widely-used Tseitin transformation [Tse83] converts an arbitrary propositional formula into an equisatisfiable CNF formula in linear time by adding a linear number of fresh variables. Note the encoding size is linear only as long as the logical operations have a linear size encoding, as in our case. The Plaisted and Greenbaum translation [PG86] often achieves a more compact CNF encoding.

## Example 2.1

Consider the propositional formula $\left(b_{0} \wedge\left(b_{1} \oplus b_{2}\right)\right)$, where $b_{0}, b_{1}$ and $b_{2}$ are propositional variables. A satisfying assignment is $[b_{0} \mapsto 1, b_{1} \mapsto 0, b_{2} \mapsto 1]$. The formula can be converted into CNF via the Tseitin transformation with the aid of a fresh variable $b_{3}$. As CNF it becomes: $b_{0} \wedge b_{3} \wedge\left(\neg b_{1} \vee b_{2} \vee b_{3}\right) \wedge\left(b_{1} \vee \neg b_{2} \vee b_{3}\right) \wedge\left(b_{1} \vee b_{2} \vee \neg b_{3}\right) \wedge\left(\neg b_{1} \vee \neg b_{2} \vee \neg b_{3}\right)$. Because the exclusive-or expression, in the context of the rest of the formula, must be 1, an approach like Plaisted and Greenbaum's omits some clauses and still produces an equisatisfiable result: $b_{0} \wedge b_{3} \wedge\left(b_{1} \vee b_{2} \vee \neg b_{3}\right) \wedge\left(\neg b_{1} \vee \neg b_{2} \vee \neg b_{3}\right)$.
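Because the example has only four variables, both encodings can be checked by exhaustive enumeration. The sketch below is our own illustration (the function names are hypothetical). It tests that, for every assignment to $b_{0}, b_{1}, b_{2}$, the original formula is true exactly when some value of the fresh variable $b_{3}$ satisfies the CNF, which is the agreement on shared variables described above.

```python
from itertools import product


def original(b0, b1, b2):
    """b0 and (b1 xor b2)."""
    return b0 and (b1 != b2)


def tseitin(b0, b1, b2, b3):
    """The six-clause Tseitin encoding from Example 2.1."""
    return (b0 and b3
            and ((not b1) or b2 or b3)
            and (b1 or (not b2) or b3)
            and (b1 or b2 or (not b3))
            and ((not b1) or (not b2) or (not b3)))


def plaisted_greenbaum(b0, b1, b2, b3):
    """The four-clause encoding that keeps only the direction b3 implies xor."""
    return (b0 and b3
            and (b1 or b2 or (not b3))
            and ((not b1) or (not b2) or (not b3)))


def agrees_on_shared_variables(cnf):
    """True if the original holds iff some value of b3 satisfies the CNF."""
    return all(
        original(b0, b1, b2) == any(cnf(b0, b1, b2, b3) for b3 in (False, True))
        for b0, b1, b2 in product((False, True), repeat=3)
    )


print(agrees_on_shared_variables(tseitin))
print(agrees_on_shared_variables(plaisted_greenbaum))
```

Both checks print `True`. Enumeration is only feasible for tiny formulae, which is why a SAT solver is used in practice.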

We introduce some SAT solving concepts because we apply an approach to solving array problems (section 5.5) that is built into a SAT solver.

The SAT solvers we use are based on the DPLL algorithm [DLL62]. However, modern SAT solvers have tremendously improved upon the DPLL algorithm. The DPLL algorithm alternates between propagating information and performing search, which selects variable assignments heuristically. The DPLL algorithm takes a formula in CNF as input. It lifts from Boolean logic to ternary logic where, along with 1 and 0, variables may be unassigned, meaning their value is unknown. Unit propagation assigns 1 to an unassigned literal in some clause if all the other literals in that clause have been assigned 0. If all the literals in a clause are assigned 0, or if a clause is empty, then a conflict has occurred because the assignment does not satisfy the formula. Modern SAT solvers use the partial assignment to perform "conflict driven clause learning". When a conflict occurs, they generate a conflict clause that summarises which assignments cannot occur together. The conflict clause is conjoined with the other clauses to prevent that combination of assignments from occurring again.
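Unit propagation as described here can be sketched in a few lines of Python. This is a naive illustration under our own assumptions, not the implementation used by any particular solver: literals are non-zero integers in the DIMACS style (`v` for a variable, `-v` for its negation), clauses are assumed to contain no repeated literals, and a variable missing from the `assignment` dictionary is unassigned.

```python
def unit_propagate(clauses, assignment):
    """Apply unit propagation until nothing changes.

    clauses: list of lists of non-zero ints (DIMACS-style literals).
    assignment: dict from variable to True/False, extended in place.
    Returns False on a conflict (a clause whose literals are all 0,
    or an empty clause), otherwise True.
    """
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var = abs(lit)
                if var not in assignment:
                    unassigned.append(lit)
                elif assignment[var] == (lit > 0):
                    satisfied = True  # this literal is 1
                    break
            if satisfied:
                continue
            if not unassigned:
                return False  # every literal is 0: conflict
            if len(unassigned) == 1:
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0  # force the literal to 1
                changed = True
    return True


# (b1) and (not b1 or b2) and (not b2 or not b3), with variables 1, 2, 3.
model = {}
ok = unit_propagate([[1], [-1, 2], [-2, -3]], model)
print(ok, model)
```

Each pass rescans every clause, so the cost is quadratic in the worst case; practical solvers avoid this rescanning.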

A simple SAT solving algorithm is given in Algorithm 2.1.
When unit propagation has stabilised, search is performed. The decision level is the number of variables assigned via search. A trail is a list of the variables that are assigned, both by search and unit propagation, together with the decision levels at which they were assigned. Using the trail, a cancel undoes the work performed beyond a given decision level.
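The relationship between the trail, decision levels, and cancel can be made concrete with a small data structure. The class below is a sketch for illustration only; the names `Trail`, `decide`, `imply` and `cancel` are our own and do not describe the interface of Minisat or of any other solver.

```python
class Trail:
    """Records assignments in order, each tagged with its decision level."""

    def __init__(self):
        self.entries = []        # list of (variable, value, decision level)
        self.assignment = {}     # variable -> value, mirrors the trail
        self.decision_level = 0

    def decide(self, var, value):
        """Assign by search: this opens a new decision level."""
        self.decision_level += 1
        self._push(var, value)

    def imply(self, var, value):
        """Assign by unit propagation: stays at the current level."""
        self._push(var, value)

    def _push(self, var, value):
        self.entries.append((var, value, self.decision_level))
        self.assignment[var] = value

    def cancel(self, level):
        """Undo every assignment made beyond the given decision level."""
        while self.entries and self.entries[-1][2] > level:
            var, _, _ = self.entries.pop()
            del self.assignment[var]
        self.decision_level = min(self.decision_level, level)


trail = Trail()
trail.imply(1, True)    # level 0, found by unit propagation
trail.decide(2, False)  # level 1, chosen by search
trail.imply(3, True)    # level 1, implied by the decision
trail.decide(4, True)   # level 2
trail.cancel(1)         # discards only the work done at level 2
print(trail.assignment, trail.decision_level)
```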

The two watched literals technique [MMZ$^{+}$01] speeds up unit propagation. Two literals in each clause are watched. When one of the two watched literals is assigned 0, a new unassigned literal in the clause is searched for. If one does not exist and the clause does not contain a 1, then the remaining watched literal must be 1. The technique is particularly useful because it is independent of backtracking: it does not break sophisticated backtracking techniques.

```
Algorithm 2.1 A simple SAT solving algorithm
Require: p, a propositional formula in CNF
    Create dl ← 0, the decision level
    Create μ: assignments from variables to {0, 1}
    while true do
        Perform unit propagation
        if a conflict occurred during unit propagation then
            if dl = 0 then
                return unsatisfiable
            end if
            Create a conflict clause c
            p ← c ∧ p
            Cancel assignments until c is not in conflict
            Update dl
        else
            if some variable is not in μ then
                Set a variable not in μ to either 1 or 0
                Increment dl
            else
                return satisfiable
            end if
        end if
    end while
    return satisfiable
```
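For readers who prefer running code, the following Python sketch implements the propagate-then-search loop in its simplest recursive form. It is deliberately weaker than Algorithm 2.1: it backtracks chronologically and learns no conflict clauses. The literal encoding (non-zero integers, `-v` for the negation of `v`) and the function names are our own assumptions.

```python
def simplify(clauses, lit):
    """Assume lit is 1: drop satisfied clauses, delete -lit from the rest."""
    result = []
    for clause in clauses:
        if lit in clause:
            continue
        result.append([other for other in clause if other != -lit])
    return result


def dpll(clauses, assignment=None):
    """Return a satisfying assignment (dict var -> bool), or None."""
    assignment = dict(assignment or {})
    # Propagation: keep assigning literals forced by one-literal clauses.
    while True:
        if any(len(clause) == 0 for clause in clauses):
            return None  # conflict: an empty clause cannot be satisfied
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0
        clauses = simplify(clauses, unit)
    if not clauses:
        return assignment  # every clause is satisfied
    # Search: pick a variable (simply the first one seen), try 1, then 0.
    var = abs(clauses[0][0])
    for lit in (var, -var):
        model = dpll(simplify(clauses, lit), {**assignment, var: lit > 0})
        if model is not None:
            return model
    return None


# The Tseitin CNF of Example 2.1, with b0..b3 numbered 1..4.
cnf = [[1], [4], [-2, 3, 4], [2, -3, 4], [2, 3, -4], [-2, -3, -4]]
print(dpll(cnf))
```

Variables that disappear from the formula before being assigned are left out of the returned dictionary; they may take either value.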

The SAT solvers we use are incremental. This means that after solving some CNF formula $p_{0}$, the work performed speeds up solving $p_{0} \wedge p_{1}$, where $p_{1}$ is another CNF formula.

For a thorough treatment of SAT solving history, design, and practice, see Biere et al. [BHvMW09].

### 2.2 SMT

The satisfiability modulo theories (SMT) problem is to find a satisfying assignment to a first-order logic formula where some functions are interpreted in one or more theories. The bit-vector and array theories we consider are decidable. We do not allow quantifiers, so we solve for a propositional combination of functions. SMT solvers combine the efficiency of propositional satisfiability (SAT) solvers with the ability to reason at a higher theory-level.

The bit-vector and array theories that we consider in this dissertation are just two of the theories defined as part of the SMT-LIB [BST10] initiative.

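$$
\begin{array}{rcl}
t^{[n]} & ::= & t^{[n]} \; o_{t} \; t^{[n]} \\
& \mid & -t^{[n]} \mid (\text{bvnot } t^{[n]}) \mid \operatorname{ite}(p, t^{[n]}, t^{[n]}) \text{ (term if-then-else)} \\
& \mid & (t^{[m]} :: t^{[n-m]}) \text{ (concatenation)} \\
& \mid & t^{[m]}[i, j], \text{ where } j \geq 0,\ i<m,\ i-j+1=n \text{ (extraction)} \\
& \mid & ([0 \mid 1]^{n})_{2} \mid v \mid \operatorname{sext}^{[n]}(t^{[m]}),\ m<n \text{ (sign extend)} \\
o_{t} & ::= & (\text{bvand}) \mid (\text{bvor}) \mid (\text{bvxor}) \mid (+) \mid (-) \mid (\%_{u}) \text{ (unsigned remainder)} \\
& \mid & (\%_{s}) \text{ (signed remainder)} \mid (\bmod_{s}) \text{ (signed modulus)} \\
& \mid & (\times) \text{ (multiplication)} \mid (\ll) \text{ (left shift)} \mid (\gg_{l}) \text{ (right shift)} \\
& \mid & (\gg_{a}) \text{ (arithmetic shift)} \mid (\div_{u}) \text{ (unsigned division)} \mid (\div_{s}) \text{ (signed division)} \\
p & ::= & t^{[n]} \; o_{p} \; t^{[n]} \\
& \mid & p \oplus p \mid p \vee p \mid p \wedge p \mid p \Longleftrightarrow p \mid \neg p \mid \operatorname{ite}(p, p, p) \mid 0 \mid 1 \mid b \\
o_{p} & ::= & (=) \mid (<_{s}) \mid (\leq_{s}) \mid (>_{s}) \mid (\geq_{s}) \mid (<_{u}) \mid (\leq_{u}) \mid (>_{u}) \mid (\geq_{u})
\end{array}
$$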

Figure 2.1: A grammar for QF_BV. $b$ ranges over a countably infinite set of propositional variables, and $v$ ranges over a countably infinite set of (fixed bit-width) bit-vector variables. An ' $s$ ' subscript means the operation interprets bit-vectors as signed integers, a ' $u$ ' subscript means bit-vectors are interpreted as unsigned integers.

### 2.3 Bit-Vectors

The QF_BV language is the first-order quantifier-free theory of fixed-width bit-vectors. A fixed-width bit-vector $\left(t^{[n]}\right)$ is a vector of $n$ bits. The bits of a bit-vector are indexed from 0 to $n-1$, and are written with the zeroth bit on the right. We indicate extraction of a single bit from a bit-vector as $t^{[n]}[i]$, where $i$ is a natural number between 0 and $n-1$ inclusive. Figure 2.1 gives a grammar for the QF_BV language; the names corresponding to the symbols are given in Table 2.2. Bit-vector terms are signedness agnostic, that is, neither signed nor unsigned. The semantics of some operations treat terms as signed integers; others treat them as unsigned. Unsigned operations interpret the bit-vector $t^{[n]}$ as the natural number $\sum_{i=0}^{n-1}\left(2^{i} \times t^{[n]}[i]\right)$, where $t^{[n]}[i]$ yields the integer value 0 if the $i^{\text{th}}$ bit of $t^{[n]}$ is zero, and otherwise yields 1. Signed operations use two's complement to interpret bit-vectors as the integer $\left(-2^{n-1} \times t^{[n]}[n-1]\right)+\sum_{i=0}^{n-2}\left(2^{i} \times t^{[n]}[i]\right)$.
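The two interpretations can be computed directly from the sums above. In this Python sketch a bit-vector is written as a string with the zeroth bit on the right, as in the text; the function names `to_unsigned` and `to_signed` are ours, introduced only for illustration.

```python
def to_unsigned(bits: str) -> int:
    """Interpret a bit string (bit 0 rightmost) as a natural number."""
    n = len(bits)
    # bits[n - 1 - i] is bit i, because the zeroth bit is written on the right.
    return sum((1 << i) * int(bits[n - 1 - i]) for i in range(n))


def to_signed(bits: str) -> int:
    """Interpret the same bit string in two's complement."""
    n = len(bits)
    top = int(bits[0])  # bit n-1, the most significant bit
    low = sum((1 << i) * int(bits[n - 1 - i]) for i in range(n - 1))
    return -(1 << (n - 1)) * top + low


for bits in ("011", "111", "100"):
    print(bits, to_unsigned(bits), to_signed(bits))
```

The same three bits therefore denote different integers depending on the operation applied: `111` is 7 when read as unsigned and -1 when read as signed.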

We indicate binary literals in brackets with a subscript of 2. For example, $(10)_{2}$ corresponds to a 2-bit bit-vector that denotes the decimal constant 2.

Because the bit-vector operations can overflow, some bit-vector arithmetic operations return different results to their integer counterparts. For instance, both multiplication and addition are performed modulo $2^{n}$, where $n$ is the bit-width. Because the bit-width of the result is the same as the bit-width of the operands, bits in the result at position $n$ or above are discarded. If the bit-width of the result of multiplication was twice the width of the operands, as in some formulations, then there would be no overflow and the result would be the same as for integer multiplication.

## Example 2.2

Consider the multiplication $\left(3^{[2]} \times 3^{[2]}\right)=\left((11)_{2} \times(11)_{2}\right)=(01)_{2}=1^{[2]}$. There are two bits in $1^{[2]}$, the least significant is 1 (that is, $1^{[2]}[0]=1$ ), and the most significant is 0 (that is, $1^{[2]}[1]=0$ ).

The semantics of bit-vectors is similar to that of integers, but differs in some important cases.

## Example 2.3

Some instances of the bit-vector arithmetic producing perhaps unexpected results are:

- $\frac{3}{3}=1, \frac{4}{3}=1, \frac{5}{3}=1$ (truncating division).
- $86^{[8]} \times 3^{[8]}=2^{[8]}$ (overflow).
- $\left((011)_{2}>_{s}(111)_{2}\right) \not \equiv\left((011)_{2}+(001)_{2}>_{s}(111)_{2}+(001)_{2}\right)$ (overflow).
- $(2 x=2 y) \not \equiv(x=y)$. The equivalence fails to hold because there is no unique multiplicative inverse for an even number modulo $2^{n}$.
- $(3 x=3 y) \equiv(x=y)$. There is a unique multiplicative inverse for odd numbers.
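Several of these observations are easy to confirm by enumeration at a small bit-width. The Python sketch below is our own illustration (the helper names are hypothetical). It checks the overflowing multiplication and the signed comparison, then counts the pairs $x \neq y$ for which $kx = ky$ holds modulo $2^{8}$.

```python
def wrap(x: int, n: int) -> int:
    """Reduce an integer to an n-bit pattern (arithmetic modulo 2**n)."""
    return x % (1 << n)


def signed(x: int, n: int) -> int:
    """Two's-complement value of the n-bit pattern x."""
    return x - (1 << n) if x >= 1 << (n - 1) else x


def counterexamples(k: int, n: int) -> int:
    """Count pairs x != y of n-bit values with k*x = k*y (mod 2**n)."""
    size = 1 << n
    return sum(
        1
        for x in range(size)
        for y in range(size)
        if x != y and wrap(k * x, n) == wrap(k * y, n)
    )


# 86 * 3 = 258, which wraps to 2 in 8 bits.
print(wrap(86 * 3, 8))

# Adding 1 to both sides of a signed comparison can flip it (3-bit values).
before = signed(0b011, 3) > signed(0b111, 3)
after = signed(wrap(0b011 + 1, 3), 3) > signed(wrap(0b111 + 1, 3), 3)
print(before, after)

# Multiplying by 2 loses information; multiplying by 3 does not.
print(counterexamples(2, 8), counterexamples(3, 8))

# The inverse of 3 modulo 2**8, found by search.
print(next(i for i in range(1 << 8) if wrap(3 * i, 8) == 1))
```

Multiplication by 2 maps $x$ and $x + 2^{7}$ to the same result, which is where the counterexamples come from.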

Comprehensive descriptions of the QF_BV and QF_ABV languages are downloadable from the SMT-LIB website [BRST08]. As of 2011, the SMT-LIB2 format has replaced the older SMT-LIB format. In this thesis, we conform with SMT-LIB2 unless indicated.

| Bit-vector Operation | Description |
| :---: | :---: |
| $\left(t^{[n]} \ll t^{[n]}\right)$ | Left shift |
| $\left(t^{[n]} \gg_{a} t^{[n]}\right)$ | Arithmetic right shift |
| $\left(t^{[n]} \gg_{l} t^{[n]}\right)$ | Logical right shift |
| $\left(t^{[n]}:: t^{[m]}\right)$ | Concatenation |
| $t^{[n]}[i, j]$ | Extract |
| (bvnot $t^{[n]}$) | Bitwise negation |
| $-t^{[n]}$ | Unary minus |
| ($t^{[n]}$ bvand $t^{[n]}$) | Bitwise and |
| ($t^{[n]}$ bvxor $t^{[n]}$) | Bitwise exclusive-or |
| ($t^{[n]}$ bvor $t^{[n]}$) | Bitwise or |
| $\left(t^{[n]}+t^{[n]}\right)$ | Modulo addition |
| $\left(t^{[n]} \times t^{[n]}\right)$ | Modulo multiplication |
| $\left(t^{[n]}-t^{[n]}\right)$ | Modulo subtraction |
| $\left(t^{[n]} \div_{u} t^{[n]}\right)$ | Unsigned division |
| $\left(t^{[n]} \div_{s} t^{[n]}\right)$ | Signed division |
| $\left(t^{[n]} \%_{u} t^{[n]}\right)$ | Unsigned remainder |
| $\left(t^{[n]} \%_{s} t^{[n]}\right)$ | Signed remainder |
| $\left(t^{[n]} \bmod _{s} t^{[n]}\right)$ | Signed modulus |
| $\left(t^{[n]}<_{s} t^{[n]}\right)$ | Signed less than |
| $\left(t^{[n]} \leq_{s} t^{[n]}\right)$ | Signed less than equals |
| $\left(t^{[n]}>_{s} t^{[n]}\right)$ | Signed greater than |
| $\left(t^{[n]} \geq_{s} t^{[n]}\right)$ | Signed greater than equals |
| $\left(t^{[n]}<_{u} t^{[n]}\right)$ | Unsigned less than |
| $\left(t^{[n]} \leq_{u} t^{[n]}\right)$ | Unsigned less than equals |
| $\left(t^{[n]}>_{u} t^{[n]}\right)$ | Unsigned greater than |
| $\left(t^{[n]} \geq_{u} t^{[n]}\right)$ | Unsigned greater than equals |
| $\operatorname{ite}\left(p, t^{[n]}, t^{[n]}\right)$ | Term if-then-else |

Table 2.2: The bit-vector operations of QF_BV

Some notes about the symbols given in Table 2.2:

- The $i$ and $j$ used by the extract operation are natural numbers including zero. In the fixed-width formulation of bit-vectors, which we use, it is not possible to use arbitrary terms as $i$ or $j$.
- The arithmetic right shift operation ($\ggg_{a}$) copies the most significant bit of the first operand as the value is shifted right. The logical right shift operation shifts zeroes in from the left.
- There is a single multiplication operation $(\times)$. When the bit-width of the operands and results is the same, as in the QF_BV formulation, signed and unsigned multiplication are equivalent.
- Unsigned division ($\div_{u}$) rounds towards zero. It never overflows; that is, it returns the same result as integer division with rounding towards zero. Signed division ($\div_{s}$) also rounds towards zero, but it overflows when the most negative value is divided by minus one.
- Unsigned remainder ($\%_{u}$) gives the remainder of division rounding towards zero. Signed remainder ($\%_{s}$) gives the remainder of division with rounding towards zero. The signed modulus ($\bmod_{s}$), which is rarely used, gives the remainder of division rounding towards negative infinity.
- All of the bit-vector operations are total, so division by zero is acceptable. For convenience, we define: $(t \%_{s} 0)=t$, $(t \bmod_{s} 0)=t$, $(t \%_{u} 0)=t$, and $(t \div_{u} 0)=1$. For $(t<_{s} 0)$ we define $(t \div_{s} 0)=-1$, and for $(t \geq_{s} 0)$, $(t \div_{s} 0)=1$. This avoids the more complicated SMT-LIB2 semantics for division by zero. Although division by zero is defined, it is treated specially and is not introduced when solving problems that do not already contain it. So, contradictions will not be proved if division by zero is not initially present in the problem.
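
The division and remainder conventions listed above can be summarised in a small reference sketch. The function names and the representation of a bit-vector as a non-negative Python integer paired with a width `n` are our assumptions for illustration; this is not STP2's code.

```python
def to_signed(x: int, n: int) -> int:
    """Interpret the n-bit pattern x as a two's complement integer."""
    return x - (1 << n) if x >> (n - 1) else x


def to_bits(v: int, n: int) -> int:
    """Wrap an integer to its n-bit pattern."""
    return v & ((1 << n) - 1)


def udiv(a: int, b: int, n: int) -> int:
    return 1 if b == 0 else a // b          # (t div_u 0) = 1


def urem(a: int, b: int, n: int) -> int:
    return a if b == 0 else a % b           # (t %_u 0) = t


def sdiv(a: int, b: int, n: int) -> int:
    sa, sb = to_signed(a, n), to_signed(b, n)
    if sb == 0:                             # -1 for negative t, else 1
        return to_bits(-1 if sa < 0 else 1, n)
    q = abs(sa) // abs(sb)                  # round towards zero
    return to_bits(-q if (sa < 0) != (sb < 0) else q, n)


def srem(a: int, b: int, n: int) -> int:
    sa, sb = to_signed(a, n), to_signed(b, n)
    if sb == 0:
        return a                            # (t %_s 0) = t
    r = abs(sa) % abs(sb)
    return to_bits(-r if sa < 0 else r, n)  # sign follows the dividend


def smod(a: int, b: int, n: int) -> int:
    sa, sb = to_signed(a, n), to_signed(b, n)
    if sb == 0:
        return a                            # (t mod_s 0) = t
    return to_bits(sa % sb, n)              # Python's % rounds towards -infinity
```

For example, at width 4 the call `sdiv(8, 15, 4)` divides the most negative value by minus one and wraps back to the pattern 8, which is the overflow case described above.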

The complexity of the decision problem for QF_BV was recently shown by Kovásznai et al. [KFB12] to be non-deterministic exponential-time complete.

### 2.4 Arrays

An array is a map from bit-vectors of width $n$ to bit-vectors of width $n^{\prime}$. Alternatively we can think of an array as a list of $2^{n}$ values, indexed from 0 to $2^{n}-1$. We sometimes annotate arrays with "type" information. For example, if $a$ maps from bit-vectors of bit-width 2, to bit-vectors of bit-width 3, we write it as $a^{[2: 3]}$. The select function is used to return the contents of a particular location. The store function creates a new array. The array operations are shown in Table 2.3.

The QF_ABV language extends QF_BV with single-dimensional arrays that are manipulated using the select and store functions. QF_ABV is the extensional theory of arrays, so it allows equality between arrays. Our solver STP2 does not handle extensionality, so it only solves a fragment of QF_ABV.

| Array Function | Description |
| :--- | :--- |
| ite$(p, a_{0}^{[n: m]}, a_{1}^{[n: m]})$ | array ITE |
| select$(a_{0}^{[n: m]}, t^{[n]})$ | array read |
| store$(a_{0}^{[n: m]}, t_{0}^{[n]}, t_{1}^{[m]})$ | array write |

Table 2.3: Array operations of QF_ABV

While arrays are single-dimensional, multi-dimensional arrays can be simulated by concatenating multiple indices together.

The term store$(a^{[n: m]}, t^{[n]}, t^{[m]})$ returns a new array that is the same as $a^{[n: m]}$, except that at index $t^{[n]}$ the value is $t^{[m]}$. Arrays only contain, and are indexed by, bit-vectors. So, the number of values that an array contains will always be a power of two. Arrays are total, so there are no out-of-bounds indices.
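
A minimal functional model of select and store is sketched below. The class name `BVArray`, the use of a default value for locations that have never been written, and the masking of indices to the index width are our assumptions for illustration; in the theory itself the contents of an unwritten location are simply unconstrained.

```python
class BVArray:
    """Total array from n-bit indices to m-bit values, written a^[n:m]."""

    def __init__(self, index_bits, value_bits, default=0, writes=None):
        self.index_bits = index_bits
        self.value_bits = value_bits
        self.default = default            # assumed content of unwritten cells
        self.writes = dict(writes or {})


def select(a: BVArray, index: int) -> int:
    """Array read: every n-bit index is valid, so no bounds check is needed."""
    index &= (1 << a.index_bits) - 1
    return a.writes.get(index, a.default)


def store(a: BVArray, index: int, value: int) -> BVArray:
    """Array write: returns a new array and leaves the old one unchanged."""
    index &= (1 << a.index_bits) - 1
    value &= (1 << a.value_bits) - 1
    updated = dict(a.writes)
    updated[index] = value
    return BVArray(a.index_bits, a.value_bits, a.default, updated)


# A 2x2 "matrix" simulated by concatenating a 1-bit row and a 1-bit column.
a = BVArray(index_bits=2, value_bits=3)
row, col = 1, 0
b = store(a, (row << 1) | col, 5)
print(select(b, 0b10), select(a, 0b10))
```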

### 2.5 Term Rewriting

Term rewriting is widely used by bit-vector solvers to simplify expressions. Rewrite rules can apply theorems that the SAT solver might struggle to determine. Babić [Bab08] gives the example that the best SAT solvers cannot prove that $(a^{[12]} \times b^{[12]})=(b^{[12]} \times a^{[12]})$ in reasonable time. For this reason, Babić ([Bab08] page 89) reports that the Spear solver has approximately 160 rewrite rules. Franzén ([Fra10] page 42) reports that MathSAT contains close to 300 rewrite rules.

A rewrite rule contains term variables, which match arbitrary expressions. A subterm of a term $t$ is $t$ itself or, if $t$ is composite, a subterm of one of $t$'s children. A rewrite rule transforms a term of some arbitrary type to an equivalent (usually simpler) term of the same type. An equality can be transformed into a rewrite rule by treating its variables as term variables, and by orienting the equality somehow.

Given two terms $t_{0}$ and $t_{1}$, a rewrite rule $t_{0} \triangleright t_{1}$ has the property that $t_{0}$ and $t_{1}$ are equal for any possible assignment, and $t_{0}>_{r} t_{1}$, where $>_{r}$ is some pre-specified partial ordering on terms. Matching $t_{0}$ to $t_{1}$ is finding a substitution $(\sigma)$ for the term variables in $t_{0}$ that makes $t_{0}[\sigma]$ syntactically identical to $t_{1}$.
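
Matching and rule application can be sketched in a few lines. In this sketch, which is ours and not taken from any solver, terms are nested tuples of the form `(operator, child, ...)`, term variables are strings beginning with `?`, and the names `match`, `instantiate` and `rewrite_root` are hypothetical.

```python
def match(pattern, term, subst=None):
    """Return a substitution making pattern identical to term, or None."""
    subst = dict(subst or {})
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in subst:                      # repeated term variable
            return subst if subst[pattern] == term else None
        subst[pattern] = term
        return subst
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term) or pattern[0] != term[0]:
            return None
        for p, t in zip(pattern[1:], term[1:]):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == term else None


def instantiate(term, subst):
    """Apply a substitution to the right-hand side of a rule."""
    if isinstance(term, str) and term.startswith("?"):
        return subst[term]
    if isinstance(term, tuple):
        return (term[0],) + tuple(instantiate(c, subst) for c in term[1:])
    return term


def rewrite_root(rule, term):
    """Apply the rule (lhs, rhs) at the root of term if the lhs matches."""
    lhs, rhs = rule
    subst = match(lhs, term)
    return term if subst is None else instantiate(rhs, subst)


# The rule (t - t) |> 0, with ?t a term variable.
sub_self = (("sub", "?t", "?t"), 0)
print(rewrite_root(sub_self, ("sub", ("mul", 2, "a"), ("mul", 2, "a"))))
```

Because `?t` occurs twice on the left-hand side, the rule only fires when both children are syntactically identical.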

### 2.6 Structural Hashing

STP2, like most other solvers, avoids the creation of duplicate sub-expressions. A single sub-expression is created in the case that two or more expressions refer to the same sub-expression. Because an expression may have identical children, for instance $(x+x)$, structural hashing produces an acyclic directed multi-graph. However, in our work the distinction between multi-graphs and graphs is largely irrelevant. Instead, we consider an expression to be a directed acyclic graph (DAG) with labels on edges ordering the sub-expressions. Our expression DAGs have a single root expression, oriented so the leaves of the DAG are constants and variables. Structural hashing goes by various names and is used in many contexts, including as the hash-consing of Lisp.

Smith [Smi11] reports that a bit-vector and array theory representation of the Blowfish cryptographic algorithm has $6.9 \times 10^{5186}$ nodes in the tree representation, and 220,639 in the DAG representation. Applying structural hashing to some expressions is clearly essential.
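
The gap between the tree and DAG sizes can be reproduced on a toy expression. The sketch below is a minimal hash-consing table under our own naming (`ExprTable`, `mk`, `tree_size`); it is not STP2's data structure.

```python
class ExprTable:
    """Hash-consing table: structurally equal expressions share one node."""

    def __init__(self):
        self._ids = {}    # (operator, child ids) -> node id
        self.nodes = []   # node id -> (operator, child ids)

    def mk(self, op, *children):
        key = (op, children)
        if key not in self._ids:          # only create a node if it is new
            self._ids[key] = len(self.nodes)
            self.nodes.append(key)
        return self._ids[key]


def tree_size(table, node):
    """Number of nodes if the expression were stored as a tree."""
    _, children = table.nodes[node]
    return 1 + sum(tree_size(table, c) for c in children)


table = ExprTable()
x = table.mk("x")                 # a leaf has no children
e = x
for _ in range(10):
    e = table.mk("+", e, e)       # identical children, as in (x + x)

print(len(table.nodes), tree_size(table, e))
```

Ten nested additions give 11 DAG nodes but 2047 tree nodes; each further level roughly doubles the tree while adding a single node to the DAG.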

### 2.7 Sharing-Aware Transformations

In a shared expression, that is, one that has been structurally hashed (section 2.6) so that sub-expressions are shared, a change that decreases the number of expressions when viewed locally may, because of the sharing, actually increase the total number of expressions. Transformations that take such sharing into account are variously called DAG aware, graph aware, sharing aware, size preserving, or size reducing. We call such transformations sharing aware. We call a transformation speculative if it may increase the total number of expressions because it ignores its effect on shared expressions.

## Example 2.4

Consider replacing the expression $-\left(v_{0}+v_{1}\right)$ with $\left(-v_{0}-v_{1}\right)$, where $v_{0}$ and $v_{1}$ are bit-vector variables. This pushes unary minus through bit-vector addition. Initially, there were four expressions: $v_{0}, v_{1},\left(v_{0}+v_{1}\right)$, and $-\left(v_{0}+v_{1}\right)$. After the transformation, there are still four expressions: $v_{0},-v_{0}, v_{1}$, and $\left(-v_{0}-v_{1}\right)$. However, if the term $\left(v_{0}+v_{1}\right)$ is shared, that is, it is the child of another term, and if $\left(-v_{0}-v_{1}\right)$ is not shared, then the transformation removes a unary minus expression, but creates a unary minus and a binary minus term. In that context the transformation has caused a total increase of one binary minus.

### 2.8 Bit-Blasting

Bit-blasting reduces a problem, expressed in some theory, to propositional logic. For instance, in the QF_BV theory a multiplication between two 64-bit terms is expressed as one term. However, when it is bit-blasted to CNF, thousands of clauses are produced that contain many fresh propositional variables.

## Example 2.5

An algorithm to bit-blast an $n$-bit addition is shown below. The algorithm assumes that the operands have already been converted to propositional formulae. It returns a formula which faithfully mimics bit-vector addition. This translates a single bit-vector term to propositional logic. The translation does not introduce any fresh variables; however, fresh variables will most likely be introduced during conversion to CNF.

$$
\begin{array}{l}
\textbf{Require: } p_{0}^{[n]}, p_{1}^{[n]} \text{: arrays of formulae to add} \\
\text{Create: } r \text{, an empty array of formulae} \\
\text{Create: } \mathit{carry} \text{, a variable of type formula} \\
\mathit{carry} \leftarrow 0 \\
\textbf{for } i \in 0 \ldots(n-1) \textbf{ do} \\
\quad r[i] \leftarrow p_{0}[i] \oplus p_{1}[i] \oplus \mathit{carry} \\
\quad \mathit{carry} \leftarrow\left(\mathit{carry} \wedge p_{0}[i]\right) \vee\left(\mathit{carry} \wedge p_{1}[i]\right) \vee\left(p_{0}[i] \wedge p_{1}[i]\right) \\
\textbf{end for} \\
\textbf{return } r
\end{array}
$$
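
The same ripple-carry scheme can be run directly if the propositional formulae are replaced by concrete Boolean values. The sketch below makes that substitution, which is our simplification: a real bit-blaster would build formula objects instead of evaluating them. The helper names are hypothetical.

```python
def blast_add(p0, p1):
    """Ripple-carry addition of two equal-length bit lists, least significant bit first."""
    assert len(p0) == len(p1)
    r = []
    carry = False
    for a, b in zip(p0, p1):
        r.append(a ^ b ^ carry)                               # sum bit
        carry = (carry and a) or (carry and b) or (a and b)   # majority of the three
    return r                                                  # final carry is dropped (modulo 2^n)


def to_bits(value, n):
    return [bool((value >> i) & 1) for i in range(n)]


def from_bits(bits):
    return sum(1 << i for i, bit in enumerate(bits) if bit)


print(from_bits(blast_add(to_bits(13, 4), to_bits(6, 4))))
```

Dropping the final carry is what makes the result agree with modulo addition: 13 + 6 at width 4 yields 3.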


Figure 2.2: Two different AIGs corresponding to the propositional ITE. Hollow circles negate the value travelling along an edge.

### 2.9 And-Inverter Graphs

The bit-blasted propositional formulae that we create are stored as and-inverter graphs (AIGs). AIGs can store arbitrary propositional formulae in a non-canonical form, so there may be many possible distinct AIGs for a given propositional function. Figure 2.2 shows two AIGs for the propositional ITE. AIGs are useful to us because they give us structural hashing at the propositional level, and there are advanced approaches for manipulating and transforming them [BB04].

An AIG is a DAG where nodes correspond to logic gates, and the directed edges to wires that connect the logic gates. There are four types of nodes: the unique 1 node which has no incoming edges, input nodes which also have no incoming edges, output nodes which have at most one inward edge, and 2-input AND nodes. The edges may be inverted, complementing the result of the source node. As an AIG is built, structural hashing is performed so that there are no duplicate nodes. The 1 node may only be connected to output nodes.

Sharing aware simplifications [BB06] are applied at node creation time. The simple AIG structure makes it easy to apply local rewriting rules.
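
To make the structure concrete, the sketch below builds one possible AIG for the propositional ITE using only AND nodes and inverted edges. The encoding of an edge as an integer literal (twice the node number, plus one if inverted) is a common convention that we assume here; the class and function names are ours, and the constant node and output nodes are omitted for brevity.

```python
from itertools import product


class AIG:
    def __init__(self):
        self.num_vars = 0
        self.fanin = {}    # AND node number -> (literal, literal)
        self._hash = {}    # (literal, literal) -> AND literal

    def new_input(self):
        self.num_vars += 1
        return 2 * self.num_vars

    def and_(self, a, b):
        key = (min(a, b), max(a, b))      # AND is commutative
        if key not in self._hash:         # structural hashing
            self.num_vars += 1
            self.fanin[self.num_vars] = key
            self._hash[key] = 2 * self.num_vars
        return self._hash[key]


def neg(literal):
    return literal ^ 1                    # flip the inversion bit of the edge


def ite(g, c, t, e):
    # (c and t) or (not c and e), expressed with ANDs and inverters only.
    return neg(g.and_(neg(g.and_(c, t)), neg(g.and_(neg(c), e))))


def evaluate(g, literal, inputs):
    var, inverted = literal >> 1, bool(literal & 1)
    if var in g.fanin:
        a, b = g.fanin[var]
        value = evaluate(g, a, inputs) and evaluate(g, b, inputs)
    else:
        value = inputs[var]               # inputs maps node number -> bool
    return value != inverted


g = AIG()
c, t, e = g.new_input(), g.new_input(), g.new_input()
out = ite(g, c, t, e)
table = {
    (cv, tv, ev): evaluate(g, out, {1: cv, 2: tv, 3: ev})
    for cv, tv, ev in product([False, True], repeat=3)
}
```

This ITE uses three AND nodes. Requesting `g.and_(t, c)` afterwards returns the existing node rather than creating a fourth.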

### 2.10 Propagators and Propagation Solvers

In chapter 4 we simplify bit-vector problems using an approach which is similar to a propagator-based solver.

Finite-domain constraint problems have been studied in different fields of computer science, and different techniques have resulted. We shall make use of ideas developed in the field of artificial intelligence, namely Constraint Satisfaction Problems (CSPs) and propagation solvers.

Given a set of variables $x_{1}, \ldots x_{k}$, ranging over some set $X$ (say, the set of integers), a CSP is a constraint $C$ over the variables, together with a mapping $D$ which associates, with each variable, a finite subset of $X$. In CSP terminology, the set $D(x)$ is the domain associated with $x$. A constraint is assumed to be in conjunctive form, that is, $C$ is taken to be a conjunction of primitive constraints. The CSP $(C, D)$ then represents the conjunctive constraint $C \wedge \bigwedge_{i=1}^{k} x_{i} \in D\left(x_{i}\right)$. In our use, domains will be integer intervals.

As an example, assuming that primitive constraints allow the use of linear arithmetic and inequality, we may have $C=\left(2<x_{1}\right) \wedge\left(x_{1}+x_{1}<x_{2}\right)$, together with domains $D\left(x_{1}\right)=[1 \ldots 9], D\left(x_{2}\right)=[4 \ldots 8]$. The idea now is to use local reasoning rules to strengthen these constraints by narrowing the domains, without changing the set of possible solutions. For example, "node consistency" allows the use of the constraint $2<x_{1}$ to narrow $D\left(x_{1}\right)$ to [3...9]. "(Hyper-)arc consistency" can then use the constraint $x_{1}+x_{1}<x_{2}$ and simple interval arithmetic to narrow $D\left(x_{2}\right)$ to [7...8]. Slightly more sophisticated reasoning can make use of parity information to determine $x_{2}$ completely.
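
The narrowing steps of this example can be written as a tiny fixed-point loop over interval bounds. The function below is a sketch specialised to this one constraint; its name and the representation of a domain as a `(low, high)` pair are our assumptions, and it does not handle domains that become empty.

```python
def propagate(d1, d2):
    """Narrow intervals for C = (2 < x1) and (x1 + x1 < x2) to a fixed point."""
    while True:
        (lo1, hi1), (lo2, hi2) = d1, d2
        lo1 = max(lo1, 3)                  # 2 < x1
        lo2 = max(lo2, 2 * lo1 + 1)        # x1 + x1 < x2 bounds x2 from below
        hi1 = min(hi1, (hi2 - 1) // 2)     # x1 + x1 < x2 bounds x1 from above
        narrowed = ((lo1, hi1), (lo2, hi2))
        if narrowed == (d1, d2):           # no further improvement
            return narrowed
        d1, d2 = narrowed


result = propagate((1, 9), (4, 8))
print(result)
```

Besides narrowing $D(x_{2})$ to $[7 \ldots 8]$ as described above, the same interval reasoning applied in the other direction also narrows $D(x_{1})$, since $x_{1}+x_{1}<8$ forces $x_{1} \leq 3$.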

We call hyper-arc consistent propagators maximally precise propagators.

More generally, each primitive constraint (schema) $C$ has associated with it a set of propagators, $\operatorname{prop}(C)$. Each propagator may be able to narrow the domains of the constraint's variables. Formally, a propagator for $C$ is a monotone function $f$ operating on domains and satisfying $f(D) \sqsubseteq D$ (that is, $f$ is decreasing). The propagator must preserve the set of solutions to $C \wedge D$. If we define $\mu \vDash D$ ("$\mu$ agrees with $D$") to mean $\forall x \in \operatorname{dom}(\mu)(\mu(x) \in D(x))$ we can state solution preservation more precisely: $\{\mu \mid \mu \vDash D \wedge \mu \vDash C\}=\{\mu \mid \mu \vDash f(D) \wedge \mu \vDash C\}$.

There is no requirement that a propagator is idempotent, that is, that $f(f(D))=f(D)$. However, the idea behind propagation is that, given a CSP, we can apply a set of propagators, alternately and repeatedly, until no further domain improvement is possible. We follow Ohrimenko et al. [OSC09] and refer to the resulting idempotent function as a propagation solver.

These semi-formal definitions may leave the impression of a propagation solver as an unstructured "soup" of propagators. In practice we can make use of domain specific knowledge to impose restrictions on the order in which various propagators are employed, so as to use them most effectively. For example, we shall make use of nested solvers, that is, solvers that use propagators which are themselves fully-fledged propagation solvers.

## Example 2.6

Consider $\left(v_{0}^{[5]} \times v_{1}^{[5]}\right)=v_{2}^{[5]}$, where each variable is known to be in an unsigned interval, initially: $D\left(v_{0}\right)=[1 \ldots 2], D\left(v_{1}\right)=[1 \ldots 2]$, and $D\left(v_{2}\right)=[1 \ldots 6]$. There is no possible overflow, and a standard propagator for multiplication will shrink the domain of $v_{2}$ to $[1 \ldots 4]$. Note that, while there are assignments to $v_{0}$ and $v_{1}$ that allow $v_{2}$ to take values 1,2 , and 4 , there is no assignment consistent with $v_{2}=3$. The strength of a propagator has to be measured with respect to properties of the domains used. For example, the interval domain cannot express membership of a set such as $\{1,2,4\}$. So, in spite of the information loss, the multiplication propagator is still considered optimal, as it produces the best possible interval.
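
The forward direction of such an interval propagator for multiplication can be sketched as follows. The function name is hypothetical, and giving up whenever overflow is possible is a simplification we assume for brevity.

```python
def mul_forward(d0, d1, d2, width=5):
    """Narrow the unsigned interval of v2 in (v0 * v1) = v2."""
    low, high = d0[0] * d1[0], d0[1] * d1[1]
    if high >= 1 << width:
        return d2                          # possible overflow: learn nothing
    return (max(d2[0], low), min(d2[1], high))


d2 = mul_forward((1, 2), (1, 2), (1, 6))
products = {a * b for a in range(1, 3) for b in range(1, 3)}
print(d2, sorted(products))
```

The set of reachable products is $\{1,2,4\}$, yet the tightest interval containing it is $[1 \ldots 4]$, which is what the propagator returns.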

# Building a Better Bit-Vector Solver 

Since the annual SMT solver competitions (SMT-COMP) [BDdM$^{+}$12] began in 2005, there has been a dramatic improvement in the best bit-vector solvers' performance. In this chapter we describe the implementation of a high performance bit-vector solver called STP2. STP2 is open-source. The complete source code is available online from STP's source code repository.

Our solver is efficient; it won the QF_BV division at SMT-COMP 2010. However, few other solvers competed in 2010 because the input language's syntax had changed since the prior competition. STP2 placed second in the QF_BV division at SMT-COMP 2011. STP2 placed third in the QF_BV division at SMT-COMP 2012. In section 3.22 we describe 4Simp, which placed second in the QF_BV division at SMT-COMP 2012.

We show later (section 3.22) that (at least on the problems we have selected) the majority of STP2's success is due to just a few simplifications. We use the term simplification loosely to mean an equi-satisfiable transformation intended to make a problem faster to solve. In this chapter we describe about twenty different simplifications. However, we show that just a handful of those simplifications are really useful. This is an important result, highlighting where authors of bit-vector solvers should first focus their efforts. Of course, on different types of problems, the relative benefit of each simplification differs.

STP2 solves problems expressed in the QF_BV language described in section 2.3, a quantifier-free first order theory of fixed-width bit-vectors. This language is practically important because many software verification problems are expressible in it. We use the SMT-LIB2 bit-vector semantics, except that division by zero is defined differently.

In this chapter we focus on bit-vector problems. In chapter 4 we detail a particular bit-vector theory-level simplification. In chapter 5, we solve problems in a combined theory of arrays and bit-vectors.

### 3.1 STP 0.1 Overview

Vijay Ganesh and David Dill built the open-source STP 0.1 solver on which our STP2 solver is based. STP 0.1 is described in Vijay Ganesh's PhD thesis [Gan07], and a conference paper [GD07].

STP 0.1 converts bit-vector problems to CNF eagerly, and arrays lazily. If no array operations are used, then STP 0.1 acts as a compiler, converting bit-vector problems into CNF (section 2.1). The eager approach of bit-blasting problems is to create an equisatisfiable CNF encoding of the entire problem, which is then sent to a SAT solver. This reduces a bit-vector theory-level satisfiability problem to the propositional satisfiability problem (SAT). It contrasts with the lazy SMT approach [Seb07] which repeatedly switches between the SAT solver and a theory solver.

STP 0.1 has three main contributions. First, and most important, it showed that a well engineered bit-blasting bit-vector solver was competitive with, and often superior to, other approaches. Second, it solved array problems via counter-example guided abstraction-refinement (which we discuss in chapter 5). Third, it used a partial solver in the simplification phase to determine some variables' bits' values (section 3.5).

### 3.2 STP2 Overview

STP2, like STP 0.1, is a bit-blasting bit-vector solver. Primarily STP2 differs from STP 0.1 in that extra simplification phases, or pre-solving, are applied; the simplifications are sharing-aware (section 2.7); and-inverter graphs (AIGs) (section 2.9) are used to hold the bit-blasted representation; and a more sophisticated CNF encoding approach is used.

STP2 can parse bit-vector and array constraints in the CVC3, SMT-LIB1, and SMT-LIB2 formats. It then simplifies them and encodes them via AIGs to CNF.


Figure 3.1: Phases of STP2 when solving a QF_BV problem

STP2, like STP 0.1, encodes bit-vector constraints eagerly, and can encode array constraints lazily or eagerly.

STP2 preserves a copy of the input formula in memory, after structural hashing (section 2.6), constant folding and term normalisation. If the formula is satisfiable, as a check of its own correctness, STP2 substitutes the assignment found into the original formula. STP2 always maintains the book-keeping required to verify a satisfying assignment.

Broadly, STP2 simplifies constraints in two phases. In the first phase, simplifications that do not increase the number of expressions are applied. After they reach a fixed point, a copy of the formula and book-keeping is made. Next, speculative transformations are applied. These are transformations that could increase the total number of expressions, but which may simplify the problem drastically. Afterwards, if the resulting transformed problem seems more difficult than the result from the sharing-aware simplifications, then the expression is replaced with the previously saved expression (section 3.7).

In more detail, the phases of STP2, which are shown in Figure 3.1, are:

- Parsing: The input is parsed, and quick local transformations that simplify the problem are applied, as discussed in section 3.3.
- Sharing-aware simplifications: Simplifications that do not increase the number of expressions are applied.
- Speculative transformations: Simplifications that may increase the number of expressions, such as distributing multiplication over addition, are applied, as described in section 3.6.
- Clause count comparison: A quick estimate of the CNF size of the problem before and after speculative transformations is performed. The problem with the smaller estimated CNF size is chosen, as described in section 3.7.
- Bit-blasting: Convert to AIGs, as described in section 3.8.
- CNF conversion: Convert the AIGs into CNF.
- SAT solving: STP2 can use Cryptominisat [SNC09], Simplifying Minisat 2.2, or Minisat 2.2 [ES04] (the default) as a SAT solver. A DIMACS format CNF file can be output which most SAT solvers can parse.
- Sanity checking: If the SAT solver returns a model, as a check, evaluate the original formula with the model.

Figure 3.2: Sharing-aware simplifications performed by STP2. The sequence is repeated until they cause no change.


### 3.3 Simplifications when Creating Expressions

Before a new expression is created, creation-time simplifications transform the requested expression into an equivalent but potentially different expression. Creation-time simplifications make it impossible to create some expressions. For instance, it is impossible to create a term $t^{[n]}-t^{[n]}$. If such a term is requested, $0^{[n]}$ will be created instead. Many of the simplifications are highly specific, and will only apply occasionally. Their value is based on the fact that they are cheap to apply, and when applicable they may help tremendously. A principle of the creation-time simplifications is to produce few extra expressions. All creation-time simplifications create at worst a constant number of extra expressions, irrespective of the requested expression. STP2 applies more than 250 creation-time simplifications ${ }^{1}$.

## Example 3.1

The rule $\left(\left(t_{0} \%_{u} t_{1}\right) \gg_{l} t_{0}\right) \triangleright 0$ converts a term with an expensive remainder operation into a constant, resulting in a much smaller CNF encoding. If the arbitrarily complex terms $t_{1}$ or $t_{0}$ exist nowhere else, then they are eliminated from the problem. For the term $\left(t_{2} \times\left(\left(t_{0} \%_{u} t_{1}\right) \gg_{l} t_{0}\right)\right)$, applying the rewrite rule will simplify it to $\left(t_{2} \times 0\right)$, allowing the $t_{2}$ term to be potentially eliminated, too (if it is not referred to elsewhere).

The creation-time simplifications are idempotent; the simplifications are applied to the result before it is returned. That is, if any expression that is returned by the creation-time simplifications is requested, then the same expression will be returned. The following example will clarify this.

## Example 3.2

If the term $\left(\left(2 \times t^{[n]}\right)-\left(2 \times t^{[n]}\right)\right)$ is requested, then the term $0^{[n]}$ is created. The $\left(t^{[n]}-t^{[n]}\right)$ term, which is equivalent and simpler is not created. If $\left(t^{[n]}-t^{[n]}\right)$ is requested, $0^{[n]}$ is created. The creation-time simplifications are idempotent, so requesting $\left(\left(2 \times t^{[n]}\right)-\left(2 \times t^{[n]}\right)\right)$ will not create $\left(t^{[n]}-t^{[n]}\right)$.

[^1]Some rules have as their main purpose to normalise terms.

## Example 3.3

The equivalent terms $\left(t^{[n]}+t^{[n]}\right),\left(2 \times t^{[n]}\right)$ and $\left(t^{[n]} \ll 1\right)$ are all converted to $\left(2 \times t^{[n]}\right)$. This kind of normalisation increases the chance that occurrences of equivalent expressions will be identified, so that rewrite rules like $\left(t^{[n]}-t^{[n]}\right) \triangleright 0$, will apply more frequently.

We do not aim to achieve a complete normalisation of equivalent terms, just commonly occurring ones. For example, one of the infinitely many equivalent but different terms, which is not converted to $\left(2 \times t^{[n]}\right)$ is: $\left(t^{[n]}[n-2,0]:: 0^{[1]}\right)$.
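
The interplay of normalisation and rewriting at creation time can be sketched with a term constructor that applies its rules before building anything. The constructor `mk`, the tuple representation of terms, and the omission of bit-widths are our assumptions; only the three normalisations and the rule $(t-t) \triangleright 0$ named above are modelled.

```python
def mk(op, *args):
    """Create a term, applying creation-time rules first (terms are tuples)."""
    if op == "+" and args[0] == args[1]:
        return mk("*", 2, args[0])     # (t + t)  is normalised to (2 * t)
    if op == "<<" and args[1] == 1:
        return mk("*", 2, args[0])     # (t << 1) is normalised to (2 * t)
    if op == "-" and args[0] == args[1]:
        return 0                       # (t - t)  is rewritten to 0
    return (op, *args)


doubled_by_add = mk("+", "t", "t")
doubled_by_shift = mk("<<", "t", 1)
difference = mk("-", doubled_by_add, doubled_by_shift)
print(doubled_by_add, doubled_by_shift, difference)
```

Because both spellings of doubling are normalised to the same term, the subtraction rule fires even though the two operands were requested in different forms. Requesting the returned term `("*", 2, "t")` again through `mk` returns it unchanged, which is the idempotence property described earlier.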

Some speculative transformations (section 2.7) are not applied at creation-time, owing to their potential to dramatically increase the number of terms.

## Example 3.4

Consider $((x \text{ bvxor } y)[u, l]) \triangleright(x[u, l] \text{ bvxor } y[u, l])$, which can apply recursively. Applying this rule to the term $\left(\left(t_{0} \text{ bvxor } t_{1}\right) \text{ bvxor }\left(t_{2} \text{ bvxor } t_{3}\right)\right)[4,2]$ gives:

$$
\left(\left(t_{0}[4,2] \text { bvxor } t_{1}[4,2]\right) \text { bvxor }\left(t_{2}[4,2] \text { bvxor } t_{3}[4,2]\right)\right) .
$$

In the worst case, the request for a single term has created seven terms (assuming the natural numbers 4 and 2 are free). Therefore, such rules are not applied at expression creation-time.

## Example 3.5

When creating a bit-vector exclusive-or term, the following rules are applied, where $<$ is some fixed but arbitrary total order on terms:

$$
\begin{align*}
\left(c_{0} \text { bvxor } c_{1}\right) & \triangleright c_{2} \text {, where constants } c_{0} \text { and } c_{1} \text { evaluate to } c_{2}  \tag{3.1}\\
\left(t_{1} \text { bvxor } t_{0}\right) & \triangleright\left(t_{0} \text { bvxor } t_{1}\right), \text { where } t_{0}<t_{1}  \tag{3.2}\\
(t \text { bvxor } t) & \triangleright 0  \tag{3.3}\\
(0 \text { bvxor } t) & \triangleright t  \tag{3.4}\\
(-1 \text { bvxor } t) & \triangleright(\text { bvnot } t)  \tag{3.5}\\
\left(t_{0} \text { bvxor }\left(\text { bvnot } t_{1}\right)\right) & \triangleright\left(\text { bvnot }\left(t_{0} \text { bvxor } t_{1}\right)\right)  \tag{3.6}\\
\left(\left(\text { bvnot } t_{0}\right) \text { bvxor } t_{1}\right) & \triangleright\left(\text { bvnot }\left(t_{0} \text { bvxor } t_{1}\right)\right) \tag{3.7}
\end{align*}
$$

We now describe different categories of simplification.

Constant Folding. Expressions that have only constant children are evaluated, and a constant is returned. For instance, instead of creating the terms $\left(6^{[5]}+3^{[5]}\right)$ or $\left(-3^{[5]} \times-3^{[5]}\right)$, the term $9^{[5]}$ is created. Equation 3.1 gives the rule for bit-vector exclusive-or constant folding.

Replacing Operations. The QF_BV language contains several similar operations. For instance, the language contains: unsigned less than, unsigned greater than, unsigned less than equals, and unsigned greater than equals. Using all of these would require duplicate code in the solver. So instead, we convert all the unsigned inequalities to unsigned greater than. Likewise the four signed inequalities are all replaced by signed greater-than. Some other operations removed are: not-and, not-or, Boolean-equals, Boolean implies, bit-vector rotate, and unsigned extension. These operations are compactly replaced by other operations.

Commutative sorting. The children of commutative operations are sorted. The children are put in three groups: constants, variables and other expressions. Each group is then ordered (<) based on a unique number that is allocated to an expression when it is created. Children are then ordered, starting with constants, then variables, then other expressions. Equation 3.2 is an instance of this normalisation.

## Example 3.6

If the term $\left(\left(\text{bvnot } t_{1}\right) \text{ bvxor }\left(\text{bvnot } t_{0}\right)\right)$ is requested where $t_{0}<t_{1}<\left(\text{bvnot } t_{1}\right)<\left(\text{bvnot } t_{0}\right)$, assuming the creation-time rules are checked top to bottom, then:

1. The children are already sorted, so no change is made.
2. Equation 3.6 is applied. The term $\left(\text{bvnot }\left(\left(\text{bvnot } t_{1}\right) \text{ bvxor } t_{0}\right)\right)$ is requested.
3. The bit-vector exclusive-or's children are sorted: $\left(\text{bvnot }\left(t_{0} \text{ bvxor }\left(\text{bvnot } t_{1}\right)\right)\right)$.
4. Equation 3.6 is applied. The term $\left(\text{bvnot }\left(\text{bvnot }\left(t_{0} \text{ bvxor } t_{1}\right)\right)\right)$ is requested.
5. The simplified term $\left(t_{0} \text{ bvxor } t_{1}\right)$ is returned.
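
The trace above can be reproduced with a small constructor that checks Equations 3.1 to 3.7 from top to bottom. This is a sketch under our own assumptions: terms are tuples, variables are strings, constants are integers of a fixed width of 5 bits, the order $<$ follows the commutative sorting described earlier (constants, then variables, then other expressions, each ordered by a creation number), and a double-negation rule for bvnot is assumed because step 5 relies on it.

```python
WIDTH = 5
MASK = (1 << WIDTH) - 1           # the all-ones constant, i.e. -1
_created = {}                     # term -> unique creation number


def uid(term):
    return _created.setdefault(term, len(_created))


def order_key(term):
    group = 0 if isinstance(term, int) else 1 if isinstance(term, str) else 2
    return (group, uid(term))


def is_not(term):
    return isinstance(term, tuple) and term[0] == "bvnot"


def mk_not(t):
    if isinstance(t, int):
        return ~t & MASK
    if is_not(t):
        return t[1]                                   # assumed: bvnot(bvnot t) |> t
    term = ("bvnot", t)
    uid(term)
    return term


def mk_xor(a, b):
    if isinstance(a, int) and isinstance(b, int):
        return a ^ b                                  # (3.1) constant folding
    if order_key(b) < order_key(a):
        return mk_xor(b, a)                           # (3.2) sort the children
    if a == b:
        return 0                                      # (3.3)
    if a == 0:
        return b                                      # (3.4)
    if a == MASK:
        return mk_not(b)                              # (3.5)
    if is_not(b):
        return mk_not(mk_xor(a, b[1]))                # (3.6)
    if is_not(a):
        return mk_not(mk_xor(a[1], b))                # (3.7)
    term = ("bvxor", a, b)
    uid(term)
    return term


# Create terms so that t0 < t1 < (bvnot t1) < (bvnot t0), as in Example 3.6.
uid("t0")
uid("t1")
not_t1 = mk_not("t1")
not_t0 = mk_not("t0")
result = mk_xor(not_t1, not_t0)
print(result)
```

Each recursive call corresponds to a "requested" term in the example, so the simplifications are applied to every intermediate result before it is returned.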

Rewrite rules. Rewrite rules are applied at creation-time. They have three different purposes. First we have rules that necessarily produce fewer terms. Second are rules that potentially increase the number of bit-vector terms by some fixed amount, but improve sharing. Third are rules that potentially increase the number of bit-vector terms, but produce an expression with a smaller CNF encoding.

The sharing aware rewrite rules return a sub-expression of the requested expression. Returning a reference to an existing expression requires no new expressions to be created. A similar idea is applied during creation of AIGs by Brummayer and Biere [BB06].

## Example 3.7

Some instances of rewrite rules are:

- $\left(t^{[n]} \text{ bvxor } t^{[n]}\right) \triangleright 0^{[n]}$. One term is requested, and at most one term is created (the $0^{[n]}$ term). This rule potentially increases the number of bit-vector terms, but improves sharing. Also, the resulting term might have a smaller CNF encoding. Whether the result of the rewrite rule actually has a smaller CNF encoding depends on whether later simplifications would have simplified it to zero anyway.
- $\left(\left(\text{bvnot } t\right) \gg_{l}-t\right) \triangleright \operatorname{ite}(t=0,-1,0)$. One term is requested, and at most four terms are created. However, the resulting term has a smaller CNF encoding.
- $(-1 \times-t) \triangleright t$. One term is requested, but no extra terms are created because the rule returns a sub-expression of the requested expression.


## Example 3.8

If the term $\left(t_{1} \text{ bvxor }\left(\text{bvnot } t_{0}\right)\right)$ is requested, then both $\left(\text{bvnot } t_{0}\right)$ and $t_{1}$ already exist. So returning the requested term will increase the total number of terms by at most one. If the term already exists, there will be no increase.

After applying the creation-time simplifications, in particular Equation 3.6, if $t_{0}<t_{1}$ then $\left(\text{bvnot }\left(t_{0} \text{ bvxor } t_{1}\right)\right)$ is returned. Both $t_{0}$ and $t_{1}$ already exist, but the bvxor term, and the bvnot term, might not, so the total number of terms is increased by at most two.

We found it advantageous to discover useful rewrite rules semi-automatically. In section 3.11 we discuss an approach to do that.

After applying the simplifications, structural hashing is performed which returns a reference to an existing expression if it already exists.

The creation-time simplifications achieve three goals. First, they eliminate equivalent but different expressions, making it easier to implement other simplifications. Second, eliminating equivalent but different expressions prevents those expressions from being encoded separately to CNF, which would necessitate extra SAT solver work. Third, simplifications can replace an expression with one that has a smaller CNF encoding.

These same simplifications could be applied in a separate simplification phase. However, applying them at creation-time means that the other simplifications reach a fixed point faster, and are easier to implement because they operate on fewer operations.

### 3.4 Variable Elimination

A variable can be eliminated from a problem if it is semantically equivalent to a term, and the term does not contain the variable.

## Example 3.9

Consider the formula $\left(v_{0}^{[5]}=\left(v_{1}^{[5]}+6^{[5]}\right)\right)$. Because $v_{0}$ is equal to a term which does not contain $v_{0}, v_{0}$ can be eliminated from the problem by replacing it with $\left(v_{1}^{[5]}+6^{[5]}\right)$ throughout. Alternatively, $v_{1}^{[5]}$ could be eliminated by replacing it with $\left(v_{0}^{[5]}-6^{[5]}\right)$.

After variable elimination, the formula becomes trivially satisfiable.

Some expressions that are not equalities have the same effect as an equality, in that they equate an expression with a variable. So, variable elimination is applied to more than just equalities.

## Example 3.10

In the formula $\left(\left(v_{0}^{[5]} \text{ bvxor } 6^{[5]}\right)=7^{[5]}\right)$, $v_{0}$ can be eliminated from the problem by replacing it with $1^{[5]}$ throughout. After elimination the formula becomes trivially satisfiable.

## Example 3.11

Consider the formula $\left(\left(v_{0}=5\right) \vee\left(\left(v_{0}+v_{1}\right)=\left(2 \times v_{0}\right)\right)\right)$. Neither $v_{0}$ nor $v_{1}$ can be eliminated because they appear under a disjunction, so are not necessarily equivalent to a particular term.

Suppose we have that $v=t$, where $t$ is a term. The variable $v$ may be eliminated from the problem by replacing it throughout by the term it equals.

This occurs when an equation is conjoined at the top level, i.e. $((v=t) \wedge p)$, where the term $t$ does not contain the variable $v$. Then, all occurrences of $v$ in $p$ are replaced with $t$. For correctness, the variable is replaced throughout the problem before other variables are eliminated.

Replacing the variables with expressions does not create a blow-up because a single shared expression replaces the variable in each sub-expression.

Variable elimination has four advantages. First, it reduces the number of variables in the problem that are functionally related to each other. Second, eliminating variables gives smaller CNF encodings because fewer equalities occur. Third, further theory-level simplification might become applicable. Fourth, if there are no other occurrences of the variable, then the expression it equals might disappear from the problem.

We store away the equivalences of eliminated variables so that later, if required, a model can be built.

## Example 3.12

Consider the expression $\left(\left(v_{0}=-t\right) \wedge\left(v_{1}=-v_{0}\right)\right)$, where $t$ contains neither $v_{0}$ nor $v_{1}$. Without variable elimination the CNF encoding contains clauses for two unary minuses, two equalities, and the $t$ term. After variable elimination, the problem is simplified to 1, so the SAT solver is not called. If a model is needed, STP2 assigns zero to the variables in $t$, and evaluates $t$ with this assignment to calculate the assignments to $v_{0}$ and $v_{1}$.
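The model construction in Example 3.12 can be illustrated with a minimal sketch. It assumes an 8-bit width and a concrete term $t = a + 3$ over a hypothetical remaining variable `a`; the names `build_model`, `ELIMINATED` and `term_t` are illustrative and are not taken from STP2.

```python
N = 8                      # illustrative bit-width (an assumption)
MASK = (1 << N) - 1


def term_t(model):
    """A hypothetical term t = a + 3 that contains neither v0 nor v1."""
    return (model["a"] + 3) & MASK


# Stored equivalences, in the order the variables were eliminated.
ELIMINATED = [
    ("v0", lambda m: -term_t(m)),   # from (v0 = -t)
    ("v1", lambda m: -m["v0"]),     # from (v1 = -v0)
]


def build_model(free_vars, eliminated):
    """Assign zero to the remaining variables, then evaluate each stored term."""
    model = {name: 0 for name in free_vars}
    for name, definition in eliminated:
        model[name] = definition(model) & MASK
    return model


model = build_model(["a"], ELIMINATED)
```

The resulting assignment satisfies both original conjuncts, even though the SAT solver never ran.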

The variable elimination process is shown in Algorithm 3.1. It makes use of the procedures defined in Algorithm 3.2 and Algorithm 3.3. The algorithm attempts to isolate variables on the left-hand side of an equivalence, by iteratively moving expressions to the right-hand side. It generates some of the equivalences that are entailed by the original formula. Starting from the top-most expression, the algorithm finds candidates where a variable is equivalent to an expression. An elimination is allowed if the variable does not appear in the expression. Because of the normalisation that occurs when expressions are created, there is no need for the algorithm's pattern matching to be exhaustive. For instance, a bi-implication is not matched because it is converted to an exclusive-or at creation-time.

Replacing a variable by the term it equals will remove the equality expression. That is, given $(v=t)$, replacing $v$ with $t$ gives $(t=t)$, which is simplified to 1 by the creation-time simplifications. However, because other operations, such as bit-vector exclusive-or, do not reduce to 1 so readily, the variable elimination algorithm explicitly removes an expression which is used to eliminate a variable. This is safe because replacing a variable throughout by the expression it equals makes the expression from which the equality was derived redundant.

The variable elimination algorithm we give is not idempotent. For instance, given the expression $\left(\left(v_{1} \times v_{0}\right)=t\right) \wedge\left(v_{1}=1\right)$, no elimination is performed when traversing the left conjunct. On the right-hand side the elimination $v_{1}=1$ is generated. The result from variable elimination is: $v_{0}=t$, from which $v_{0}$ might be eliminated if variable elimination is run again. We run the variable elimination algorithm as part of a series of simplifications (Figure 3.2) that are run until a fixed point.

```
Algorithm 3.1 The procedure for generating equivalences. This procedure is initially called with the top-most node of a problem.
Require: \(e\)  // The current node
procedure SEARCHAND(\(e\))
    Create Boolean changed \(\leftarrow 0\)
    if \(e\) is a variable then
        Eliminate with \(e \Longleftrightarrow 1\)
        changed \(\leftarrow 1\)
    else if \(e\) matches \(\left(p_{0} \wedge p_{1}\right)\) then
        searchAnd(\(p_{0}\))
        searchAnd(\(p_{1}\))
    else if \(e\) matches \(\left(p_{0} \oplus p_{1}\right)\) then
        changed \(\leftarrow\) searchProp(\(e\), 1)
    else if \(e\) matches \(\left(\neg p_{0}\right)\) then
        changed \(\leftarrow\) searchProp(\(p_{0}\), 0)
    else if \(e\) matches \(\left(t_{0}=t_{1}\right)\) then
        // Does not evaluate the second disjunct if the first is true
        changed \(\leftarrow\) searchTerm(\(t_{0}\), \(t_{1}\)) \(\vee\) searchTerm(\(t_{1}\), \(t_{0}\))
    end if
    if changed then
        Replace \(e\) with 1
    end if
end procedure
```


## Example 3.13

Consider applying the variable elimination algorithm (Algorithm 3.1) to $\left(v_{0}=v_{1}\right) \wedge\left(\left(v_{0}+6\right)=\left(2 \times v_{2}\right)\right)$. The algorithm calls itself recursively with each conjunct. Eventually $\operatorname{searchTerm}\left(v_{0}, v_{1}\right)$ is called. Because $v_{0}$ is a variable not contained in the right-hand side, $v_{0}$ is replaced throughout by $v_{1}$. Since the formula $\left(v_{0}=v_{1}\right)$ was used to eliminate a variable, it is dropped from the problem. $\operatorname{searchTerm}\left(\left(v_{1}+6\right),\left(2 \times v_{2}\right)\right)$ is called soon after, which in turn calls both $\operatorname{searchTerm}\left(6,\left(\left(2 \times v_{2}\right)-v_{1}\right)\right)$ and $\operatorname{searchTerm}\left(v_{1},\left(\left(2 \times v_{2}\right)-6\right)\right)$. In the second case, $v_{1}$ has been isolated and so will be eliminated. The original expressions have been replaced by 1, and the equations between variables stored so that models can be constructed if needed.

In the worst case, the algorithm has exponential running time in the number of expressions. However, on the problems used in the evaluation (section 3.19) the runtime is always reasonable. Because the encoding to CNF guarantees completeness and since this is an anytime algorithm, to limit the worst case cost, an upper bound can simply be placed on the algorithm's number of iterations, or on running time.

```
Algorithm 3.2 Sub-procedure for generating equivalences: dealing with terms. The inverse function gives the multiplicative inverse of an odd constant.
procedure SEARCHTERM(lhs, rhs)
    Create Boolean changed \(\leftarrow 0\)
    if lhs is a variable and is not a sub-expression of rhs then
        Eliminate with lhs \(\leftarrow\) rhs
        changed \(\leftarrow 1\)
    else if lhs matches \(-t\) then
        changed \(\leftarrow\) searchTerm(\(t\), \(-\)rhs)
    else if lhs matches (bvnot \(t\)) then
        changed \(\leftarrow\) searchTerm(\(t\), (bvnot rhs))
    else if lhs matches (\(t_{0}\) bvxor \(t_{1}\)) then
        changed \(\leftarrow\) searchTerm(\(t_{0}\), (\(t_{1}\) bvxor rhs)) \(\vee\) searchTerm(\(t_{1}\), (\(t_{0}\) bvxor rhs))
    else if lhs matches \(\left(t_{0}+t_{1}\right)\) then
        changed \(\leftarrow\) searchTerm(\(t_{0}\), (rhs \(-\) \(t_{1}\))) \(\vee\) searchTerm(\(t_{1}\), (rhs \(-\) \(t_{0}\)))
    else if lhs matches (\(t_{0} \times t_{1}\)) and \(t_{0}\) is an odd constant then
        // Creation-time simplifications ensure \(t_{1}\) is not a constant
        changed \(\leftarrow\) searchTerm(\(t_{1}\), (inverse(\(t_{0}\)) \(\times\) rhs))
    end if
    return changed
end procedure
```

```
Algorithm 3.3 Sub-procedure for generating equivalences: dealing with propositions (no top-level).
procedure SEARCHPROP(lhs, rhs)
    Create Boolean changed \(\leftarrow 0\)
    if lhs is a variable and is not a sub-expression of rhs then
        Eliminate with lhs \(\Longleftrightarrow\) rhs
        changed \(\leftarrow 1\)
    else if lhs matches \(p_{0} \oplus p_{1}\) then
        changed \(\leftarrow\) searchProp(\(p_{0}\), (rhs \(\oplus\) \(p_{1}\))) \(\vee\) searchProp(\(p_{1}\), (rhs \(\oplus\) \(p_{0}\)))
    else if lhs matches \(\neg p_{0}\) then
        changed \(\leftarrow\) searchProp(\(p_{0}\), \(\neg\)rhs)
    else if lhs matches \(\left(t_{0}=t_{1}\right) \wedge \operatorname{bitwidth}\left(t_{0}\right)=1\) then
        changed \(\leftarrow\) searchTerm(\(t_{0}\), ite(rhs, \(t_{1}\), (bvnot \(t_{1}\))))
        if \(\neg\)changed then
            changed \(\leftarrow\) searchTerm(\(t_{1}\), ite(rhs, \(t_{0}\), (bvnot \(t_{0}\))))
        end if
    end if
    return changed
end procedure
```

## Example 3.14

Consider the expression with four variables: $v_{0} \ldots v_{3}$, where $S_{i, j}$ are syntactic/term variables, and $S_{r}$ is the formula we create. For some $k$, construct:

$$
\begin{aligned}
S_{0,0} & \leftarrow\left(v_{0}+v_{1}\right) \\
S_{0,1} & \leftarrow\left(v_{2}+v_{3}\right) \\
S_{0,2} & \leftarrow\left(S_{0,0}+S_{0,1}\right) \\
S_{i, 0} & \leftarrow\left(S_{i-1,0}+S_{i-1,2}\right), \text { for } 0<i \leq k \\
S_{i, 1} & \leftarrow\left(S_{i-1,1}+S_{i-1,2}\right), \text { for } 0<i \leq k \\
S_{i, 2} & \leftarrow\left(S_{i, 1}+S_{i, 0}\right), \text { for } 0<i \leq k \\
S_{r} & \leftarrow\left(S_{k, 2}=0\right)
\end{aligned}
$$

For $k>1$, variable elimination calls the searchTerm procedure $\left(8 \times 3^{k}\right)$ times. For $k=11$, $S_{r}$ has about 40 subterms and is equivalent to $177147 \times v_{0}+177147 \times v_{1}+177147 \times v_{2}+177147 \times v_{3}=0$. The searchTerm procedure is called about 1.4 million times.
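The call count in Example 3.14 can be reproduced with a small sketch. It models only the variable and addition cases of Algorithm 3.2, and it replaces the sub-expression check with a precomputed set of variable names. The names `Term`, `search_term` and `count_calls` are hypothetical and are not STP2's code.

```python
class Term:
    """A term node; `vars` caches the set of variable names below it."""

    def __init__(self, op, args=(), name=None):
        self.op, self.args, self.name = op, args, name
        if op == "var":
            self.vars = frozenset([name])
        else:
            self.vars = frozenset().union(*(a.vars for a in args))


def count_calls(k):
    """Build S_r for the given k; return (calls to search_term, eliminated?)."""
    calls = 0

    def search_term(lhs, rhs):
        nonlocal calls
        calls += 1
        if lhs.op == "var":
            # Eliminate only if the variable does not occur in rhs.
            return lhs.name not in rhs.vars
        if lhs.op == "+":
            t0, t1 = lhs.args
            # `or` short-circuits, like the disjunction in Algorithm 3.2.
            return (search_term(t0, Term("-", (rhs, t1)))
                    or search_term(t1, Term("-", (rhs, t0))))
        return False

    v = [Term("var", name=f"v{i}") for i in range(4)]
    s0, s1 = Term("+", (v[0], v[1])), Term("+", (v[2], v[3]))
    s2 = Term("+", (s0, s1))
    for _ in range(k):
        s0, s1 = Term("+", (s0, s2)), Term("+", (s1, s2))
        s2 = Term("+", (s1, s0))
    zero = Term("const")
    # Both orientations of the equality S_{k,2} = 0 are tried.
    eliminated = search_term(s2, zero) or search_term(zero, s2)
    return calls, eliminated
```

Every attempted isolation fails because the right-hand side always contains all four variables, so no call is cut short. The tree of calls therefore triples with each extra level of the construction.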

Checking that a variable is not contained on the right-hand side is a potentially expensive operation that variable elimination must perform. Some software verification benchmarks are initially thousands of levels deep. With each variable eliminated, the depth of the remaining terms might increase by as much as the depth of the term that replaces the eliminated variable. Our implementation performs caching to avoid repeated traversals of the same expression.

## Example 3.15

We list some eliminations that are performed when the given expression is conjoined at the top level. The variable $v$ is a bit-vector variable and $b$ is a propositional variable; $v$ and $b$ are not sub-expressions of the other expressions.

- $(-v=t)$. Replace $v$ by $-t$.
- $5-(3 \times v)=t$. Replace $v$ with $((1 / 3) \times-(t-5))$.
- $\neg(b \oplus p)$. Replace $b$ with the formula $p$.
- $(\neg b)$. Replace $b$ with 0 .
- $\left(p_{0} \oplus\left(p_{1} \oplus\left(b \oplus p_{2}\right)\right)\right)$. Replace $b$ with $\neg\left(\left(p_{0} \oplus p_{1}\right) \oplus p_{2}\right)$.

As an example of an equation that is left unprocessed, consider $(v=(6 \times(v+1)))$. Nothing is done in this case, although it would be correct to generate the equation $v=(-1 / 5) \times 6$. This shows that more powerful variable elimination algorithms exist than the one that we have given.
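The two arithmetic solutions above can be checked numerically. The sketch below assumes an 8-bit width, which the text does not fix, and uses Python's built-in modular inverse; the function names are illustrative only.

```python
N = 8                    # illustrative bit-width (an assumption)
MOD = 1 << N


def inverse(c):
    """Multiplicative inverse of an odd constant modulo 2**N (Python 3.8+)."""
    return pow(c, -1, MOD)


def solve_first(t):
    """v such that 5 - (3 * v) == t, that is v = (1/3) * -(t - 5)."""
    return (inverse(3) * -(t - 5)) % MOD


def solve_unprocessed():
    """v such that v == 6 * (v + 1), that is v = (-1/5) * 6."""
    return (-inverse(5) * 6) % MOD
```

Because 3 and 5 are odd, both inverses exist at every bit-width, so the same check goes through for other values of `N`.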

Unlike MathSAT [Fra10], we have not experimented with replacing extracts from variables with the expressions they equal. For instance, given $(v[u, l]=t) \wedge p$, MathSAT will eliminate part of $v$. The partial solver that we describe in the next section will eliminate part of the variable in the special case when $l=0$.

Variable elimination is widely applied by bit-vector solvers [Fra10, Gan07]. However, the algorithm that we have presented eliminates variables in situations that other algorithms do not. Our method is obvious enough that it has surely been applied in other contexts, although we believe its application to bit-vector solving is novel.

### 3.5 Partial Solver

Barrett et al. [BDL98] reduce the bit-width of variables and the number of variables in a problem with equalities of a certain form. In Section 3.2 of their paper, they describe a linear solver for equations of the form:

$$
c_{0} \times v_{0}+\ldots+c_{k} \times v_{k}+c=0
$$

Here the $c$'s are constants, and the equation is at the top level. By using basic rules of algebra, expressions of the form $c_{0} \times v_{0}$ are isolated on one side. If the constant $c_{0}$ is odd, both sides of the equality are multiplied by the unique multiplicative inverse of $c_{0}$, isolating $v_{0}$. The variable $v_{0}$ is then eliminated from the problem.

If the constant $c_{0}$ is even, then it can be written as $c_{0}=2^{l} \times c^{\prime}$, where $c^{\prime}$ is odd and $l>0$. The equality is then replaced by two equalities:

$$
0^{[l]}=\left(-c_{1} \times v_{1}-\ldots-c_{k} \times v_{k}\right)[l-1,0] \tag{3.8}
$$

$$
v_{0}[n-l-1,0]=\left(1 / c^{\prime}\right) \times\left(-c_{1} \times v_{1}-\ldots-c_{k} \times v_{k}\right)[n-1, l] \tag{3.9}
$$

Equations of the latter form are used in turn to eliminate part of $v_{0}$.

The "partial linear solver" [GD07] of STP 0.1 is more general; instead of restricting solving just to equations of variables, arbitrary terms can appear in equalities. For instance STP 0.1's partial solver can eliminate $v$, given $((7 \times v)+t)=0$, where $t$ is an arbitrary term not containing $v$. This necessitates a check that $v$ is not contained in $t$. The partial solver of STP 0.1 is claimed to perform at most $O\left(m^{2} k\right)$ multiplications, where $m$ is the number of equations, and $k$ is the number of variables. However, the algorithm's total run time also includes time to check whether terms contain a given variable, and to re-normalise equations after variable elimination. These steps, which are necessary to run the algorithm, may consume significant time.

STP2 preserves the partial solver of STP 0.1. As above, variable elimination (section 3.4) eliminates a variable which has an odd coefficient. The variable elimination algorithm additionally isolates variables contained in expressions with the logical exclusive-or, bit-vector exclusive-or, single bit equality, and bit-vector not.

The partial solver, but not the variable elimination algorithm, solves for variables in equations of the form $c_{0} \times v_{0}+\ldots+c_{k} \times v_{k}+c=0$, where all the $c$'s are even constants. Similarly to Barrett et al. [BDL98], the algorithm reduces this to two equisatisfiable equations, one of which has at least one odd coefficient. Also, the partial solver can eliminate parts of variables; for instance it uses $v[1,0]=t$ to remove the lowest two bits of $v$ throughout the problem.
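The split of an even-coefficient equality into equalities of the shape (3.8) and (3.9) can be checked exhaustively at a small bit-width. In this sketch `r` stands for the value of everything moved to the right-hand side. The inverse of the odd part is taken modulo $2^{n-l}$ and applied to the extracted bits, which is one reading of (3.9). All function names are hypothetical.

```python
def split_even(c0):
    """Write c0 = 2**l * c_odd with c_odd odd; assumes c0 > 0."""
    l = (c0 & -c0).bit_length() - 1     # number of trailing zero bits
    return l, c0 >> l


def holds_original(c0, v0, r, n):
    """c0 * v0 == r (mod 2**n)."""
    return (c0 * v0 - r) % (1 << n) == 0


def holds_split(c0, v0, r, n):
    """The two replacement equalities, for 0 < c0 < 2**n and 0 <= r < 2**n."""
    l, c_odd = split_even(c0)
    low_bits_zero = r % (1 << l) == 0                 # shape of (3.8)
    m = 1 << (n - l)
    solved = (pow(c_odd, -1, m) * (r >> l)) % m       # (1/c') * r[n-1, l]
    high_bits_match = v0 % m == solved                # shape of (3.9)
    return low_bits_zero and high_bits_match
```

For example, with `n = 4` and `c0 = 6` the split gives `l = 1` and an odd part of 3. Only the low three bits of `v0` are then determined, matching the observation that such equations eliminate part of a variable.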

### 3.6 Speculative Transformations

STP2 applies speculative transformations until they reach a fixed point. We call them speculative (section 2.7) because they might increase the total number of expressions. They do not necessarily reduce the difficulty of the problem as measured by the size of the CNF encoding.

We have never found a problem for which the speculative transformations do not reach a fixed point. However, such problems may exist.

A common motivating example that speculative transformations simplify is:

$$
\left(t_{0} \times\left(t_{1}+t_{2}\right)\right) \neq\left(\left(t_{0} \times t_{1}\right)+\left(t_{0} \times t_{2}\right)\right)
$$

which is false, but which in general is difficult for SAT solvers to establish.

STP2 uses approximately one hundred of STP 0.1's normalisation rules, which were substantially published when implemented in CVC Lite [GBD05]. The normalisations increase the number of expressions by at worst a polynomial amount. After applying the speculative transformations, to estimate whether the problem has become easier we apply clause count comparison (section 3.7), and then discard the changes if the problem seems harder.

Examples of the rules applied during the speculative transformations are:

- $\left(t_{0} \times t_{1}\right)[u, l] \triangleright\left(t_{0}[u, 0] \times t_{1}[u, 0]\right)[u, l]$
- $\left(t_{0} \times\left(t_{1}+t_{2}\right)\right) \triangleright\left(\left(t_{0} \times t_{1}\right)+\left(t_{0} \times t_{2}\right)\right)$

STP2 has fewer speculative rules than STP 0.1. Some of the problems in the SMTLIB benchmarks grow large when the rules of STP 0.1 are applied. In some cases, the growth exceeds the memory limit before the phase can be completed and the transformations undone. Only some of the simplifications that are applied during this phase give a worst case polynomial increase in node size. Others increase the number of expressions by a constant amount.

We describe only these simplifications as speculative. But there is no guarantee that the sharing-aware simplifications that we apply will make the problem easier to solve. In particular, it is possible for a simplification to produce a problem which is both smaller and harder for a SAT solver.

### 3.7 Clause Count Estimation and Comparison

The speculative transformations (section 3.6) sometimes make the problem larger. In the worst case, they can cause a polynomial size increase in the number of expressions. To estimate whether simplifications have made the problem easier, STP2 estimates the number of CNF clauses that bit-blasting a particular expression will produce. We call this clause count estimation. The number of clauses is estimated twice, once before and once after the speculative transformations. This allows STP2 to use whichever formula is estimated to produce the fewest CNF clauses. We use the estimated number of clauses as a proxy for the difficulty of solving a problem. So if, after speculative transformations, the estimated difficulty has increased, a copy of the problem saved prior to applying the speculative transformations is used instead. We call this clause count comparison. A similar, but more local, approach is taken by AIG Rewriting [BB04].

Some expressions are encoded into many more clauses than others. For instance, a 64-bit signed division operator introduces about 65,000 clauses (Table 3.8), but the bit-vector negation operation introduces none. STP2's clause estimator has a weighting for each operation that estimates the number of clauses generated for an operation of a particular bit-width.

The algorithm we employ is not sophisticated or particularly accurate. We have adjusted its parameters through trial and error. But as we show (section 3.20) it is successful.
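A minimal sketch of clause count estimation and comparison follows. The weights are invented for illustration and are not STP2's tuned parameters; only their relative sizes matter here. Expressions are nested tuples `(op, width, children...)`, which is also an assumption of the sketch.

```python
# Hypothetical weights: estimated clauses for one operation of a given width.
WEIGHTS = {
    "var": lambda w: 0,
    "bvnot": lambda w: 0,            # inverters are free in an AIG
    "bvadd": lambda w: 14 * w,
    "bvmul": lambda w: 10 * w * w,
}


def estimate_clauses(expr, seen=None):
    """Sum the weights of the distinct sub-expressions of `expr`."""
    seen = set() if seen is None else seen
    if expr in seen:                 # a shared sub-expression is encoded once
        return 0
    seen.add(expr)
    op, width, *children = expr
    total = WEIGHTS[op](width)
    for child in children:
        if isinstance(child, tuple): # skip variable names
            total += estimate_clauses(child, seen)
    return total


def choose(before, after):
    """Keep the transformed problem unless its estimate has increased."""
    if estimate_clauses(after) <= estimate_clauses(before):
        return after
    return before


t0, t1, t2 = (("var", 8, name) for name in ("t0", "t1", "t2"))
before = ("bvmul", 8, t0, ("bvadd", 8, t1, t2))
after = ("bvadd", 8, ("bvmul", 8, t0, t1), ("bvmul", 8, t0, t2))
kept = choose(before, after)
```

With these weights, distributing the multiplication doubles the number of multipliers, so the saved copy `before` is kept.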

This approach has the advantage that the problem can be transformed through a more difficult state, perhaps by distributing multiplication over addition, which later drastically simplifies. If only transitions which decreased the difficulty were allowed, then it would not be possible to transition through more difficult intermediate states.

A disadvantage of the approach is that if speculative transformations shrink one part of the problem, and complicate another part, then the change will be accepted or rejected in entirety, without considering the local effects.

A more precise means of measuring difficulty is to bit-blast the problem entirely before and after speculative transformations. We do this for small instances. However, some crafted problems, in particular, have CNF encodings that are expensive to generate. They require hundreds of millions of clauses before speculative transformations, and afterwards require far fewer. It is inefficient to bit-blast such large problems prematurely.

### 3.8 Bit-Blasting

Bit-blasting converts an expression into an equisatisfiable propositional formula. We use bit-blasting in three contexts.

- First, as a simplification step to identify theory-level expressions that are equivalent to each other or to constants. These equivalences are used to remove equivalent but different theory-level expressions.
- Second, as part of measuring whether other simplifications have made the problem easier or not (section 3.7).
- Third, and most importantly, as a precursor to CNF encoding.

We use the open-source ABC package [BM10] to build the AIGs that are created when bit-blasting. For each theory-level operation, we have built a procedure that requests AIGs that faithfully represent the semantics of each bit-vector and logical operation.

For bit-vector problems that are deemed easy by the clause estimator (section 3.7), we bit-blast the problem once during the simplification phase. Performing bit-blasting as part of the preprocessing has the advantage that it can be applied with other simplifications until a fixed point, rather than as a final stage.

Bit-blasting can discover that some expressions have a constant value. This allows the expression to be replaced with the respective constant. After requesting an AIG corresponding to an expression, we traverse the AIG vector checking whether each node is 1 or 0. If all the nodes are constants, then the theory-level expression can be replaced by the corresponding constant.

We also use the AIGs to find theory-level expressions that are different but equivalent. We call this bit-blasting equivalence checking. We do this by storing the relationship between vectors of AIG nodes and the theory-level expression that they represent. After bit-blasting, we iterate through the map looking for pairs of distinct expressions with the same AIG vector; such expressions are equivalent. When equivalent expressions are discovered, one of the expressions is substituted throughout for the other expression. Using AIGs like this avoids the necessity of explicitly applying some word-level rewrite rules.

## Example 3.16

Some bit-blasting simplifications identified by this stage are:

- $\left(t^{[5]}>_{s}\left(3^{[5]}\right.\right.$ bvor $\left.\left.t^{[5]}\right)\right)$ is replaced by 0 .
- $(00001)_{2}=\left(t^{[5]} \ll t^{[5]}\right)$ is replaced by 0 .
- $\neg\left(1^{[3]}>_{u}\left(0^{[2]}:: v^{[1]}\right)\right)$ is replaced by $\left(1^{[1]}=v^{[1]}\right)$ if that expression exists elsewhere.
- $\left(\left(v^{[2]} \times v^{[2]}\right)>_{s} 3^{[2]}\right)$ is replaced by 1.
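The four replacements in Example 3.16 can be confirmed by exhaustive enumeration at the stated bit-widths. The sketch below uses enumeration as a stand-in for AIG construction, which is an assumption made for illustration: STP2 finds these facts from the AIGs, not by enumeration. The helper names are hypothetical.

```python
def to_signed(x, n):
    """Interpret the n-bit value x as a two's-complement integer."""
    return x - (1 << n) if x >> (n - 1) else x


def always(pred, n):
    """True if pred holds for every n-bit value."""
    return all(pred(t) for t in range(1 << n))


# t >_s (3 bvor t) at 5 bits is never true.
first = always(lambda t: not to_signed(t, 5) > to_signed(3 | t, 5), 5)

# 00001 = (t << t) at 5 bits is never true.
second = always(lambda t: ((t << t) & 0b11111) != 1, 5)

# not (1 >_u (00 :: v)) agrees with (1 = v) for a 1-bit v.
third = always(lambda v: (not 1 > v) == (v == 1), 1)

# (v * v) >_s 3 at 2 bits is always true, because 3 is -1 when signed.
fourth = always(lambda v: to_signed((v * v) & 0b11, 2) > to_signed(3, 2), 2)
```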

However, the most important use of bit-blasting is to produce a propositional formula that, when converted to CNF, is fast to solve.

Using AIGs to store propositional formulae is good for two reasons. First, the AIGs simplify the formula as it is constructed; for instance, AIG creation-time simplifications apply $(p \wedge p) \triangleright p$, where $p$ is an arbitrary propositional formula. Second, because formulae are structurally hashed, sharing is performed between theory-level operations. For instance, the propositional formulae for the least significant bit of ($t_{0}$ bvand $t_{1}$) and ($t_{0} \times t_{1}$) are identical, so the AIG representing the bit is shared.

The particular encodings chosen for bit-blasting bit-vector operations considerably affect the solver's speed. However, fast encodings are often specific to the solver's implementation. Because of this, like other authors, we omit a detailed description of the particular encodings we chose ${ }^{2}$. One issue is that poor encodings can be "repaired" by later stages, which makes those stages (for instance, a CNF converter that can fix a poor encoding) look unreasonably good. As an indication of the care we took: for each bit-vector operation we implemented several encodings, for instance, 10 multiplication encodings and 8 division encodings. Then, we applied parameter optimisation (section 3.18) to select the best combination of those encodings.

### 3.9 Multiplication with Sorting Networks

In this section we describe an interesting approach to encoding the multiplication operation using sorting networks. Our results with the new approach are disappointing, as it is no faster than the standard approach of addition networks, but we present the idea in the hope that others might advance it.

Figure 3.3: A 4-bit table of partial products. Column zero is on the right.

We encode multiplication using a table of partial products (Figure 3.3). This figure shows the table for a multiplication of $x^{[4]}$ and $y^{[4]}$, with the result $r^{[4]}$.

The exclusive-or of each column is taken to produce the result. For instance, the formula for the second least significant bit is: $r[1]=(x[0] \wedge y[1]) \oplus(x[1] \wedge y[0])$. Note that no $x[3] \wedge y[1]$ term is used to calculate the value of $r$; it is ignored because it overflows.

A common approach to the summation of partial products is to capture the sum of a column by encoding it as some kind of addition network [ES06]. This corresponds to treating a column sum as a number in binary representation.

For 32- or 64-bit multiplication, there is a small upper bound on how large a column sum gets, even when carry-in is included. This makes it feasible to treat a column sum as a number in unary representation, which we had anticipated would lead to better constraint propagation. In this case summation becomes sorting and some kind of sorting network is called for in place of addition networks.

Given $u$ inputs to a sorting network, a sorting network produces an output where, given $\ell$ true inputs, the outputs $s[0]$ to $s[\ell-1]$ are 1 , and the remainder $s[\ell]$ to $s[u-1]$ are 0 . Note that we want a sorting network to produce a bit sequence in non-increasing order, so the basic building block is the comparator shown in Figure 3.4.

To capture a sorting network we encode the method underlying Batcher's odd-even mergesort [Sed98]. Figure 3.5 shows the corresponding sorting network for 16 input bits.

Figure 3.4: A comparator and equations that describe it

Figure 3.5: Batcher's odd-even sorting network (for 16 input bits)

Batcher's algorithm is non-adaptive: merging is expressed in terms of compare-exchange operations only. As a consequence, the same sequence of operations happens irrespective of input. This makes the algorithm suitable for the purpose of CNF generation.

As carries into a column we use each second sorted value from the prior column. Batcher's sorting network is based on merging sorted values, so we are able to avoid re-sorting the carries which are already sorted. That is, we merge the sorted sequence $\{s[1], s[3], \cdots, s[u-1]\}$ with the sorted sequence of partial products for the next highest column.

The (binary representation) result $r$ of the multiplication is now easily found: bit $i$ should be the parity of the number of bits set in column $i$. If an odd number of bits are set, then the result is 1; if an even number are set, the result is 0. So if $u$ is even, the equation for the resulting bit is $(((s[0] \wedge \neg s[1]) \vee(s[2] \wedge \neg s[3]) \vee(s[4] \wedge \neg s[5]) \vee \ldots \vee(s[u-2] \wedge \neg s[u-1])) \Leftrightarrow r[i])$.
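The column-by-column scheme can be simulated on concrete bits. The sketch below is not the encoding used in STP2. It swaps in a simple fixed bubble-style network of compare-exchange operations for Batcher's odd-even mergesort, and it re-sorts the carries together with the partial products instead of merging two sorted sequences. The comparator is assumed to output the "or" and the "and" of its inputs. All names are hypothetical.

```python
def compare_exchange(a, b):
    """On single bits, the larger value is a OR b and the smaller is a AND b."""
    return a | b, a & b


def sort_bits(bits):
    """Sort bits into non-increasing order with a fixed comparator sequence."""
    s = list(bits)
    for end in range(len(s) - 1, 0, -1):
        for i in range(end):
            s[i], s[i + 1] = compare_exchange(s[i], s[i + 1])
    return s


def multiply(x, y, n):
    """n-bit product of x and y using unary (sorted) column sums."""
    carries, result = [], 0
    for col in range(n):
        products = [((x >> i) & 1) & ((y >> (col - i)) & 1)
                    for i in range(col + 1)]
        s = sort_bits(carries + products)
        if len(s) % 2:
            s.append(0)              # pad so that u is even
        bit = 0
        for j in range(0, len(s), 2):
            bit |= s[j] & (1 - s[j + 1])   # true exactly when the count is j + 1
        result |= bit << col
        carries = s[1::2]            # each second sorted value is a carry
    return result
```

Taking every second sorted value yields one carry per pair of set bits, which is why the odd-indexed outputs serve as the carries into the next column.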

After encoding to CNF however, we did not find the sorting network encoding to be consistently faster than an addition network encoding. As we show later, in section 4.10, the unit propagation of the addition network encoding of multiplication is surprisingly strong.

### 3.10 CNF through Technology Mapping

In section 6 of their paper, Eén, Mishchenko, and Sörensson [EMS07] describe an encoding of AIGs to CNF, which they call CNF through technology mapping (TM). An implementation is available in the open-source ABC package [BM10].

STP2 uses ABC to manage AIGs and to convert those AIGs to CNF. ABC has two ways to convert AIGs to CNF: first, via a slightly improved [EMS07] Tseitin encoding, and second, via the TM approach. STP 0.1 did not use AIGs; it used an implementation that we show (section 3.20) to be much less efficient.

The basic idea of the TM algorithm is to break the AIG into subgraphs of no more than $k$ inputs, such that the sum of clauses needed to represent the subgraphs is minimised. Depending on their structure, subgraphs require differing numbers of clauses to represent them. For instance, an 8-input AIG denoting the "and" function can be represented with as few as 9 clauses, but many other functions require more. A partition of the AIG into subgraphs is chosen that heuristically minimises the number of clauses needed. The clauses that represent each possible subgraph are pre-generated via the Minato-Morreale algorithm [Min92].
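The figure of 9 clauses for an 8-input "and" matches the usual clausal encoding of a conjunction: one binary clause per input plus one long clause. The sketch below builds that encoding with signed integer literals and checks it exhaustively. Whether ABC emits exactly these clauses is not claimed, and the function names are illustrative.

```python
from itertools import product


def and_clauses(inputs, output):
    """CNF for output <-> AND(inputs), as lists of signed integer literals."""
    clauses = [[-output, x] for x in inputs]           # output implies each input
    clauses.append([output] + [-x for x in inputs])    # all inputs imply output
    return clauses


def satisfied(clauses, assignment):
    """True if every clause has a literal made true by the assignment."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)


clauses = and_clauses(list(range(1, 9)), 9)            # variables 1..8, output 9
correct = all(
    satisfied(clauses, dict(zip(range(1, 10), bits))) == (bits[8] == all(bits[:8]))
    for bits in product([False, True], repeat=9)
)
```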

Using ABC to manage AIGs, and to convert those AIGs to CNF, means that STP2 does not precisely control the clauses that are asserted. Some tools, for example Minisat+ [ES06], explicitly add extra clauses to the CNF to improve unit propagation. The extra clauses allow unit propagation to force entailments that are otherwise opaque and require search to discover. CNF representations which are maximally precise (section 2.10) under unit propagation have been created for many operations [Bac07]. We show later (Table 3.8) that ABC does a good job; it generates CNF that is maximally precise under unit propagation for some bitwise operations. However, the disadvantage is that we lose control over the CNF encoding: we cannot easily specify better encodings for particular operations.

### 3.11 Discovering Rewrite Rules

Initially, we discovered rewrite rules on an ad-hoc basis, implementing more than 100 rewrite rules. When STP2 solved a problem slowly, we looked through the problem for sub-expressions that could be simplified. However, to avoid missing important rules, we instead decided to automatically discover extra rewrite rules. In this section we refer to term rewriting concepts introduced in section 2.5.

We applied two approaches to generating rewrite rules. The first and simpler approach helped us to find useful rewrite rules. We generated all the equalities between a set of expressions and implemented as rewrite rules any that seemed reasonable. We automated just the process of discovering equivalences.

To discover equivalences, we automatically generate disequalities in a fragment of the QF_BV language. Then we send those disequalities to STP2. If STP2 reports that a disequality is unsatisfiable, then the corresponding equality holds at that bit-width. If the equality holds, we then check whether it also holds at some higher bit-widths. Of course, the equality might not hold at lower or higher bit-widths.

The second approach that we tried, but which unfortunately did not generate any useful rewrite rules, was more automated and generated rewrite rules rather than just equivalences. We give details of this in subsection 3.11.2. We were inspired to automatically generate rewrite rules by Bansal [Ban08], who generates rewrite rules to build a super-optimiser.

### 3.11.1 Finding Equivalences

In this section we give an approach to automatically discovering equivalent (but different) terms. We apply Algorithm 3.4 to discover equivalent expressions from a list of expressions. The algorithm is an optimised version of an all-pairs comparison. If the SAT solver discovers that two expressions' values differ for some assignment, then all the expressions in the list of expressions are evaluated with that assignment, splitting the list into at least two sub-lists.

Algorithm 3.4 has two procedures. The algorithm is started by calling the discover procedure. The discover procedure considers each pair of distinct expressions from the list. If the SAT solver discovers that there exists an assignment for which a pair of expressions evaluate to different values, then the algorithm calls the split procedure using that model. The algorithm widens the bit-width to check that the two expressions are equivalent on a range of bit-widths.

The split procedure splits the list into sub-lists where the terms in each list evaluate to the same value on an assignment. As the list is recursively split, the terms in each list have been shown to be equal at an increasing number of assignments.
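The splitting step can be sketched in Python as follows. This is only an illustration under stated assumptions, not STP2's implementation: expressions are modelled as hypothetical Python functions over integers, and the names `split` and `mask` are introduced just for this example.

```python
from collections import defaultdict

def mask(width):
    """Largest unsigned value representable in `width` bits."""
    return (1 << width) - 1

def split(exprs, assignment, width):
    """Partition `exprs` by the value each takes under `assignment`.

    `exprs` maps a printable name to a function f(env, width) -> int,
    a hypothetical stand-in for a bit-vector term. Expressions that
    land in different sub-lists cannot be equivalent.
    """
    new_lists = defaultdict(list)
    for name, fn in exprs.items():
        value = fn(assignment, width) & mask(width)
        new_lists[value].append(name)
    return dict(new_lists)

exprs = {
    "v + v":  lambda env, w: env["v"] + env["v"],
    "v << 1": lambda env, w: env["v"] << 1,
    "v * 2":  lambda env, w: env["v"] * 2,
    "v | 1":  lambda env, w: env["v"] | 1,
}
groups = split(exprs, {"v": 3}, width=6)
```

With the assignment `v = 3`, the first three expressions stay together and only `v | 1` is separated, so a single counterexample can rule out many candidate pairs at once.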

```
Algorithm 3.4 Given a list of expressions, output expressions that are equivalent
between the bit-widths of low and minimum. If time permits, check to a maximum
bit-width of 1024. Calling discover checks all pairs of expressions contained in the
list of expressions.

Require: list // a list of bit-vector terms of bit-width n
Require: assignment // a map from variables to integers
procedure Split(list, assignment)
    Create newLists, a map from integers to lists of expressions
    for e ∈ list do
        newLists[eval(increase(e), assignment)].insert(e)
        // Evaluate the expression e with the assignment. Place each expression
        // that evaluates to the same integer into the same list. It might be
        // necessary to increase the bit-width of e to match the bit-width of
        // the assignment.
    end for
    return newLists
end procedure

Require: start // the bit-width to start testing expressions
Require: minimum // the least bit-width to test to
procedure Discover(…)
    for i in 0 … do
        for j in 0 … do
            for k … do
                if …s_timed_out() then
                    …
                end if
                C… of list[i] and list[j] to k
                if (… list[j]) is satisfiable then
                    … be the model from the SAT solver
                    …t, assignment) do
                        …
                    …
                end if
            end for
            …y is unsatisfiable at all the bit-widths tested
            Outp… // Some duplicates output
        end for
    end for
end procedure
```

Note that Algorithm 3.4 splits using assignments at a range of bit-widths. If the expressions first differ at a bit-width of $k$, then the list will be split after temporarily increasing the bit-width of all the terms in the list to $k$ bits.
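The overall shape of the discovery loop can be illustrated with a small Python sketch. It is a plain all-pairs comparison, so it omits the list-splitting optimisation, and exhaustive enumeration stands in for the SAT query, which is only feasible at small bit-widths. The function names and the encoding of expressions as Python functions are assumptions of the example.

```python
from itertools import combinations, product

def find_difference(f, g, variables, width):
    """Return an assignment on which f and g differ, or None if there is none."""
    m = (1 << width) - 1
    for values in product(range(m + 1), repeat=len(variables)):
        env = dict(zip(variables, values))
        if (f(env, width) & m) != (g(env, width) & m):
            return env
    return None

def discover(exprs, variables, widths):
    """Yield pairs of expression names that agree at every tested bit-width."""
    for (name_f, f), (name_g, g) in combinations(exprs.items(), 2):
        if all(find_difference(f, g, variables, w) is None for w in widths):
            yield (name_f, name_g)

exprs = {
    "1 ^ t":     lambda env, w: 1 ^ env["t"],
    "~(-2 ^ t)": lambda env, w: ~(-2 ^ env["t"]),
    "t + 1":     lambda env, w: env["t"] + 1,
}
found = list(discover(exprs, ["t"], widths=range(2, 7)))
```

Testing over a range of bit-widths matters even in this toy setting: at a bit-width of 1 all three expressions agree, but from 2 bits upwards only the first two do.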

We generated all subgraphs of the expression template given in Figure 3.6. There are three possible unary expressions: no-operation, unary minus, or bit-vector not.


Figure 3.6: Expression instantiation template. We instantiate all possible subgraphs of this graph. There are 5 possible leaves, 3 possible unary operations, and 14 possible binary operations.

There are 14 different binary operations: addition, multiplication, bit-vector or, bit-vector and, bit-vector exclusive-or, left shift, logical right shift, arithmetic right shift, subtraction, unsigned remainder, signed remainder, unsigned division, signed division, and signed modulus. We allow only five possible leaves: $v_{0}^{[n]}, v_{1}^{[n]}, 0, 1$, and $-1$. We generate expressions at a bit-width of 6, which allows a reasonable but not excessive number of constants. After applying the node creation-time simplifications (section 3.3) and then removing duplicates, STP2 r1654 identifies 1570 unique expressions. Of these, about 200 are equivalent at the bit-widths we tested. The rewrite rules that are not applied at creation time are mostly omitted because they might increase the total number of terms. Many of these are performed implicitly by other simplification phases.

## Example 3.17

Two equivalences that are discovered, which are not amongst the creation-time simplifications are:

- $(1 \text{ bvxor } t) \equiv (\text{bvnot } (-2 \text{ bvxor } t))$
- $\left(t_{0}+\left(\text{bvnot } t_{1}\right)\right) \equiv \left(\text{bvnot}\left(t_{1}+-t_{0}\right)\right)$

Some equivalences are expensive to verify. It takes about a minute to test at each bit-width between 6 bits and 19 bits for:

$$
-\left(-1 \%_{u} v\right)=-\left((\text{bvnot } v) \%_{u} v\right)
$$
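The three equivalences above can be checked exhaustively at small bit-widths with a short Python script. This is an independent sanity check rather than the procedure used with STP2, and it assumes the SMT-LIB convention that an unsigned remainder by zero returns the dividend. The helper names are hypothetical.

```python
def bvnot(x, w):
    return ~x & ((1 << w) - 1)

def bvneg(x, w):
    return -x & ((1 << w) - 1)

def urem(x, y):
    # Assumption: remainder by zero returns the dividend (SMT-LIB convention).
    return x if y == 0 else x % y

def holds_at(w):
    """Check the three equivalences for every input at bit-width w."""
    m = (1 << w) - 1
    for t0 in range(m + 1):
        # (1 bvxor t) == bvnot(-2 bvxor t)
        if ((1 ^ t0) & m) != bvnot((-2 & m) ^ t0, w):
            return False
        # -(-1 %u v) == -((bvnot v) %u v), where -1 is the all-ones value m
        if bvneg(urem(m, t0), w) != bvneg(urem(bvnot(t0, w), t0), w):
            return False
        for t1 in range(m + 1):
            # t0 + (bvnot t1) == bvnot(t1 + -t0)
            lhs = (t0 + bvnot(t1, w)) & m
            rhs = bvnot((t1 + bvneg(t0, w)) & m, w)
            if lhs != rhs:
                return False
    return True

all_hold = all(holds_at(w) for w in range(6, 9))
```

Enumeration of this kind scales exponentially with the bit-width, which is one reason a SAT-based check is preferable beyond a handful of bits.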



Figure 3.7: Expression instantiation template. We instantiate all possible subgraphs of this graph. Again, there are 5 possible leaves, 3 possible unary operations, and 14 possible binary operations.

### 3.11.2 Automatically Building a Rewrite System

Given the success in automatically generating equivalences, we decided to extend the work to automatically convert equalities into rewrite rules. Informally, a rewrite rule is an equality that has been oriented so that it transforms a more complex expression into a simpler expression. We convert only some of the equalities into rewrite rules.

To reduce the number of redundant rewrite rules, we rewrite all the subterms of the left and right-hand side of each rule. A term $t$ is in normal form with respect to a set of rewrite rules if the left-hand side of no rewrite rule matches any subterm of $t$. A rule is irreducible if its left and right-hand sides are in normal form with respect to the set of rewrite rules (excluding itself). We produce only irreducible rules. For instance, given two rules $t_{0} \triangleright t_{1}$ and $t_{2} \triangleright t_{3}$, if $t_{2}[\sigma]$ equals $t_{0}$, then we replace $t_{1}$ with $t_{3}[\sigma]$.

The subgraphs of Figure 3.7 are input to the checking algorithm (Algorithm 3.4). We define $t_{0}>_{r} t_{1}$ to hold in two cases:

- If $t_{0}$ is not a constant, but $t_{1}$ is a constant.
- If $t_{1}$ is a proper subterm of $t_{0}$.
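The two cases of the ordering can be written down directly. The Python sketch below assumes a hypothetical encoding of terms as nested tuples, with integers for constants and strings for variables; it is an illustration only.

```python
def subterms(t):
    """Yield t and every term nested inside it."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)

def greater_r(t0, t1):
    """The ordering t0 >_r t1 on terms encoded as nested tuples.

    Encoding (an assumption of this sketch): constants are ints,
    variables are strings, and applications are (op, arg, ...).
    """
    if isinstance(t1, int) and not isinstance(t0, int):
        return True                      # a non-constant rewritten to a constant
    return t0 != t1 and any(s == t1 for s in subterms(t0))

rule_to_constant = greater_r(("+", "v", ("bvnot", "v")), -1)
rule_to_subterm = greater_r(("+", "v", 1), "v")
```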

If the ordering $>_{r}$ does not hold on the rewritten rule, the rule is removed. We use commutative matching to reduce the number of rewrite rules needed. With commutative matching, every ordering of the operands of a commutative operation is considered during matching. So for instance, the rule $((t+(\text{bvnot } t)) \triangleright -1)$ matches both $(v+(\text{bvnot } v))$ and $((\text{bvnot } v)+v)$.
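Commutative matching can be illustrated with a small matcher over terms encoded as nested tuples. The encoding, the `?` prefix for pattern variables, and the set of operators treated as commutative are assumptions made for this sketch; it is not the matcher used in our implementation.

```python
COMMUTATIVE = {"+", "*", "bvand", "bvor", "bvxor"}

def match(pattern, term, binding=None):
    """Return a binding of pattern variables making pattern equal term, or None.

    Pattern variables are strings starting with "?"; terms are ints
    (constants), strings (variables) or tuples (op, arg, ...).
    """
    binding = dict(binding or {})
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in binding:
            return binding if binding[pattern] == term else None
        binding[pattern] = term
        return binding
    if not isinstance(pattern, tuple) or not isinstance(term, tuple):
        return binding if pattern == term else None
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    orders = [term[1:]]
    if term[0] in COMMUTATIVE and len(term) == 3:
        orders.append((term[2], term[1]))    # also try the operands swapped
    for args in orders:
        b = binding
        for p, a in zip(pattern[1:], args):
            b = match(p, a, b)
            if b is None:
                break
        else:
            return b
    return None

lhs = ("+", "?t", ("bvnot", "?t"))
direct = match(lhs, ("+", "v", ("bvnot", "v")))
swapped = match(lhs, ("+", ("bvnot", "v"), "v"))
mismatch = match(lhs, ("+", "v", ("bvnot", "w")))
```

Both operand orders bind the pattern variable to `v`, while the last term does not match because the two occurrences of the pattern variable would need different values.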

We generate subgraphs of the template expression shown in Figure 3.7. After applying the node creation-time simplifications (section 3.3) and the AIG equivalence simplification (section 3.8), and then removing duplicates, there are about 30 million distinct expressions. We generate expressions at a fixed bit-width, currently 6 bits, and use sign-extension to increase the bit-width of constants.

It is not possible to apply an infinite sequence of rewrite rules to a term, because each application of a rule strictly decreases the number of sub-terms in the resulting term.

After generating the rewrite rules, we test each equivalence at bit-widths from 6 bits to 1024 bits with a total timeout of 30 seconds. For instance:

$$
\left(\left(\text{bvnot}\left(-2 \div_{s}(\text{bvnot } v)\right)\right) \gg_{l}(\text{bvnot}(v \ll -v))\right)=0
$$

holds for bit-widths from 6 to 63 bits, but it does not hold at a bit-width of 64 . We do not test expressions for equivalence below a lower bound, currently 6 bits. We remove rules that do not hold at some bit-width.

We generate expressions and check if they are equivalent at some bit-widths using SAT via STP2. These problems are instances of the combinatorial equivalence checking problem, for which specialist solvers exist [KJJP09]. An alternative approach would be to combine some axioms of the QF_BV system to produce new theorems; this approach is not bit-width dependent.

Using Algorithm 3.4, we discovered about 120,000 rewrite rules in about 60 days of computer time. We stopped the algorithm before it finished. We show a selection of the rules that were discovered in Figure 3.8. Note that the number of rewrite rules needed is reduced significantly by the node creation-time simplifications. The rules that are generated are what we call sharing-aware (section 2.7). Expressions such as if-then-else expressions occur in the rewrite rules because some creation time transformations produce these expressions rather than the requested expression.

When we ran the rewrite rules on a random sample of 200 problems from the QF_BV benchmark set, no rules matched. So the rewrite rules we discovered are not useful for solving those problems.

In order to check that we did not miss important rewrite rules, we automatically generated a rewrite system. However, the rules we generated did not match any terms of our test problems. This shows that the common approach, of building rewrite systems by hand, produces good rewrite systems.

### 3.11.3 Summary

We found automatically generating equivalences useful to inspire new rewrite rules. For instance, initial versions of our creation-time simplifications omitted Equation 3.6. The equivalences we generated contained terms we realised would be simplified by the rule, so we implemented it.

In the first approach we manually inspected the proposed rules, and convinced ourselves that they are correct. In our second approach we recorded the bit-width intervals at which we tested each equivalence, and only applied the rewrite rule to expressions of a bit-width in that interval.

We had less success with automatically generating rewrite rules. The rules we generated applied occasionally to randomly generated problems but not to the evaluation problems. As shown in Figure 3.8, a large proportion of the rewrite rules contain constant values on the left hand side. As later work, omitting rules with constants on the left hand side might generate a more applicable rewrite system.

### 3.12 AIG Rewriting

When STP2 converts from a bit-vector theory formula to propositional logic, it stores the propositional formula as an AIG (section 2.9).

An approach to simplifying AIGs is to use AIG rewriting [BB04]. AIG rewriting performs local sharing-aware rewrites to the AIG, so that each rewrite does not increase the total number of AIG nodes. Representations with an equal number of nodes may contribute differently to the total number of nodes if one representation contains a subgraph that is used elsewhere in the AIG. Mishchenko et al. [MCB06], similarly to Bjesse et al. [BB04], measure the change in node count as functionally equivalent nodes replace each 4-input subgraph. Their rewriting is a greedy algorithm that reduces the AIG by iteratively replacing AIG subgraphs with equivalent but smaller pre-computed subgraphs, then selecting those replacements that reduce the total number of nodes. AIG rewriting is local, but its scope is enhanced by applying rewriting iteratively. We use the implementation from the open-source ABC tool [BM10].

$$
\begin{aligned}
& \left(-\text{ite}\left((111111)_{2}=t_{1}^{[6]},(000001)_{2},(000000)_{2}\right) \gg_{l}\left(t_{0}^{[6]} \text{ bvand }\left(\text{bvnot } t_{1}^{[6]}\right)\right)\right) \\
& \quad \triangleright \quad -\text{ite}\left((111111)_{2}=t_{1}^{[6]},(000001)_{2},(000000)_{2}\right) \\
& \left(-\left((111111)_{2} \%_{s} t_{1}^{[6]}\right) \div \text{ite}\left((111110)_{2}=t_{1}^{[6]},(000000)_{2},(000001)_{2}\right)\right) \quad \triangleright \quad -\left((111111)_{2} \%_{s} t_{1}^{[6]}\right) \\
& \left(-\left(t_{0}^{[6]} \ll\left(\text{bvnot } t_{1}^{[6]}\right)\right) \%_{s}\left((000001)_{2} \div_{u} t_{1}^{[6]}\right)\right) \quad \triangleright \quad -\left(t_{0}^{[6]} \ll\left(\text{bvnot } t_{1}^{[6]}\right)\right) \\
& \left(-\text{ite}\left(\left((111110)_{2}>_{u} t_{1}^{[6]}\right),(000000)_{2},(000001)_{2}\right) \bmod_{s} -\left((111111)_{2} \div_{s}\left(\text{bvnot } t_{1}^{[6]}\right)\right)\right) \quad \triangleright \quad (000000)_{2} \\
& \left(-\left(t_{1}^{[6]} \ll t_{0}^{[6]}\right) \text{ bvand }\left(\text{bvnot}\left(\left(\text{bvnot } t_{0}^{[6]}\right) \div_{s} t_{0}^{[6]}\right)\right)\right) \quad \triangleright \quad (000000)_{2} \\
& \left(\left(\text{bvnot}\left(t_{0}^{[6]} \gg_{l} t_{1}^{[6]}\right)\right) \text{ bvand } -\left(\left(\text{bvnot } t_{1}^{[6]}\right) \ll t_{0}^{[6]}\right)\right) \quad \triangleright \quad -\left(\left(\text{bvnot } t_{1}^{[6]}\right) \ll t_{0}^{[6]}\right) \\
& \left(\left(\text{bvnot}\left(t_{0}^{[6]} \text{ bvor }\left(\text{bvnot } t_{1}^{[6]}\right)\right)\right) \gg_{l} t_{1}^{[6]}\right) \quad \triangleright \quad (000000)_{2} \\
& \left(-\text{ite}\left((000001)_{2}=t_{1}^{[6]},(000000)_{2},(000001)_{2}\right) \div_{u} -\left(-t_{1}^{[6]} \div_{u} t_{1}^{[6]}\right)\right) \\
& \quad \triangleright \quad \text{ite}\left((000001)_{2}=t_{1}^{[6]},(000000)_{2},(000001)_{2}\right) \\
& \left(\left(\text{bvnot}\left((111110)_{2} \gg_{l}\left(\text{bvnot } t_{1}^{[6]}\right)\right)\right) \text{ bvand } -\left(t_{0}^{[6]} \gg_{l} t_{1}^{[6]}\right)\right) \quad \triangleright \quad -\left(t_{0}^{[6]} \gg_{l} t_{1}^{[6]}\right) \\
& \left(\text{ite}\left((000000)_{2}=t_{1}^{[6]},(111111)_{2},(000000)_{2}\right) \bmod_{s}\left(\text{bvnot}\left((111110)_{2} \gg_{l} t_{1}^{[6]}\right)\right)\right) \quad \triangleright \quad (000000)_{2} \\
& \left(\left((000001)_{2} \text{ bvxor }\left((000001)_{2} \text{ bvor } t_{1}^{[6]}\right)\right) \bmod_{s} -\left((111110)_{2} \gg_{a} t_{0}^{[6]}\right)\right) \quad \triangleright \quad (000000)_{2} \\
& \left(-\left((000010)_{2} \times t_{1}^{[6]}\right) \bmod_{s}\left(\text{bvnot}\left((00000)_{2}:: t_{0}^{[6]}[0,0]\right)\right)\right) \quad \triangleright \quad (000000)_{2} \\
& \left(\text{ite}\left((000000)_{2}=t_{1}^{[6]},(111111)_{2},(000000)_{2}\right) \%_{s}\left(\left(\text{bvnot } t_{1}^{[6]}\right) \times\left(\text{bvnot } t_{1}^{[6]}\right)\right)\right) \quad \triangleright \quad (000000)_{2} \\
& \left(-\left(\left(\text{bvnot } t_{0}^{[6]}\right) \gg_{a} t_{0}^{[6]}\right) \gg_{l} -\left((111111)_{2} \div_{u} t_{1}^{[6]}\right)\right) \quad \triangleright \quad (000000)_{2} \\
& \left(-\left(t_{1}^{[6]} \gg_{l} t_{0}^{[6]}\right) \gg_{l}\left(\left(\text{bvnot } t_{0}^{[6]}\right) \gg_{a}(000001)_{2}\right)\right) \quad \triangleright \quad (000000)_{2} \\
& \left(\left((00000)_{2}:: -t_{1}^{[6]}[5,5]\right) \gg_{l} \text{ite}\left((111110)_{2}=t_{1}^{[6]},(000000)_{2},(111111)_{2}\right)\right) \quad \triangleright \quad (000000)_{2} \\
& \left(\left(t_{1}^{[6]} \div_{s}\left(\text{bvnot } t_{1}^{[6]}\right)\right) \ll -\text{ite}\left((000001)_{2}=t_{1}^{[6]},(000000)_{2},(000001)_{2}\right)\right) \quad \triangleright \quad (000000)_{2} \\
& \left(-\text{ite}\left((111111)_{2}=t_{1}^{[6]},(000001)_{2},(000000)_{2}\right) \%_{u}\left((111110)_{2}+-t_{1}^{[6]}\right)\right) \quad \triangleright \quad (000000)_{2} \\
& \left(\text{ite}\left(\left((111110)_{2}>_{u} t_{1}^{[6]}\right),(000000)_{2},(000001)_{2}\right) \ll\left(\text{bvnot}\left(\left(\text{bvnot } t_{1}^{[6]}\right) \div_{s}(111110)_{2}\right)\right)\right) \\
& \quad \triangleright \quad (000000)_{2} \\
& \left(\left(\text{bvnot}\left(t_{1}^{[6]} \div_{s}\left(\text{bvnot } t_{0}^{[6]}\right)\right)\right) \gg_{l}\left(\text{bvnot}\left(-t_{1}^{[6]} \gg_{l} -t_{0}^{[6]}\right)\right)\right) \quad \triangleright \quad (000000)_{2}
\end{aligned}
$$

Figure 3.8: Twenty randomly selected rewrite rules that were automatically generated by Algorithm 3.4. Note that rules are given at a bit-width of 6, but can be safely widened to any larger bit-width for which they have been tested.

Brummayer and Biere [BB06] give creation-time rules for AIGs that perhaps simplify, but never increase the total number of nodes. These rules are implemented in ABC, for instance, $b$ is created instead of $(b \wedge b)$.

### 3.13 Boolean Abstraction AIG Rewrite

Observing that it is potentially expensive to apply AIG rewriting to the entire bitblasted problem, we instead apply AIG rewriting just to the Boolean abstraction of a problem, sometimes called the propositional skeleton. We build an AIG that corresponds to the logical operations in the problem. Bit-vector theory predicates are replaced by fresh Boolean variables. Boolean variables map to themselves. This is the "Boolean abstraction of the input formula" used by DPLL(T) approaches ([Seb07] section 2.2).

## Example 3.18

Given the expression $\left(\left(v_{0}=0\right) \vee\left(\left(v_{0}=0\right) \wedge\left(v_{1}=1\right) \wedge\left(v_{2} \leq_{s} 2\right)\right)\right)$, each of the predicates is mapped to a fresh Boolean variable and substituted, giving $\left(b_{0} \vee\left(b_{0} \wedge b_{1} \wedge b_{2}\right)\right)$. AIG rewriting simplifies this to $b_{0}$, which after substitution back is $\left(v_{0}=0\right)$.

After abstraction, we apply AIG rewriting to the resulting AIG. Then, the AIG is converted back to a bit-vector theory expression, and the introduced Boolean variables are replaced with the predicates they substituted for. The intention is for the AIG rewrite to simplify the Boolean abstraction, perhaps removing theory-level expressions.
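The abstraction step of Example 3.18 can be sketched in Python. The tuple encoding of formulae and the function names are assumptions of the sketch, and a truth-table comparison stands in for AIG rewriting, which is not reproduced here.

```python
from itertools import product

LOGICAL = {"and", "or", "not"}

def abstract(formula, atoms):
    """Replace each theory predicate by a Boolean variable name b0, b1, ...

    `atoms` maps predicates to names and is filled in as a side effect,
    so syntactically equal predicates share one variable.
    """
    if isinstance(formula, str):
        return formula                       # Boolean variables map to themselves
    if formula[0] in LOGICAL:
        return (formula[0],) + tuple(abstract(a, atoms) for a in formula[1:])
    if formula not in atoms:
        atoms[formula] = f"b{len(atoms)}"    # fresh name for a theory predicate
    return atoms[formula]

def evaluate(skeleton, env):
    """Evaluate a Boolean skeleton under an assignment to its variables."""
    if isinstance(skeleton, str):
        return env[skeleton]
    op, *args = skeleton
    vals = [evaluate(a, env) for a in args]
    if op == "and":
        return all(vals)
    if op == "or":
        return any(vals)
    return not vals[0]

p0, p1, p2 = ("=", "v0", 0), ("=", "v1", 1), ("<=s", "v2", 2)
atoms = {}
skeleton = abstract(("or", p0, ("and", p0, p1, p2)), atoms)
names = sorted(atoms.values())
equivalent_to_b0 = all(
    evaluate(skeleton, dict(zip(names, bits))) == bits[0]
    for bits in product([False, True], repeat=len(names)))
```

Inverting the `atoms` map gives the substitution that turns the simplified skeleton back into a bit-vector theory expression.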

The system description for Boolector submitted to SMT-COMP 2012 mentions a top-level Boolean-skeleton simplifier, which we understand is an independent implementation of the idea in this section.

### 3.14 ITE Transformations

Kim et al. [KSJ09] simplify if-then-elses (ITEs) before applying a linear arithmetic solver. Their idea is that a term which is reachable via the "true" branch of an ITE can have any subterm that it shares with the ITE's conditional replaced by 1. Shared subterms that are reachable via the "false" branch can likewise be replaced by 0. We implement a variant of their approach, shown here as Algorithm 3.5.

The algorithm keeps a context of the conditions that must be 1 or 0 at a particular expression. Initially, at the root node, the context is empty. When we encounter an expression $\text{ite}\left(p, t_{0}, t_{1}\right)$, execution is forked, with $p$ being added to the context before $t_{0}$ is visited, and $\neg p$ before $t_{1}$ is visited. If a formula that is in the context is encountered, it is replaced by 1 or 0, as appropriate.

```
Algorithm 3.5 ITE simplification algorithm. Initially the procedure is called with
the root node and an empty context.
    procedure REPLACE_KNOWN(e, context)
        if (e ∈ context) then
            return 1
        else if (¬e ∈ context) then
            return 0
        else if size(context) > maximum_size then
            return e    // Limit to prevent blowup
        else if e matches ite(p, t0, t1) then
            return ite(REPLACE_KNOWN(p, context),
                       REPLACE_KNOWN(t0, context ∪ p),
                       REPLACE_KNOWN(t1, context ∪ (¬p)))
        else
            Create simplified, a new expression
            Let the type of simplified be the same as e
            for each child c of e do    // Add another child to the new expression
                simplified.addChild(REPLACE_KNOWN(c, context))
            end for
            return simplified
        end if
    end procedure
```

## Example 3.19

Consider the expression $\left(\text{ite}\left(p_{0} \wedge p_{1}, \text{ite}\left(p_{0}, v, 3\right), 5\right)=5\right)$. The condition $\left(p_{0} \wedge p_{1}\right)$ is added to the context before the true branch of the outermost ITE is traversed, and $\neg\left(p_{0} \wedge p_{1}\right)$ is added before the false branch is traversed. The condition of the innermost ITE, $\left(p_{0}\right)$, is evaluated in the context $\left(p_{0} \wedge p_{1}\right)$ and evaluates to 1, so it is replaced by 1. The simplified expression is: $\left(\text{ite}\left(p_{0} \wedge p_{1}, v, 5\right)=5\right)$
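A Python sketch of the context-based replacement reproduces this example. It makes two assumptions that go beyond the pseudocode: each conjunct of a condition is added to the context along with the condition itself, and an ITE whose condition has become constant is folded to the selected branch. The tuple encoding of expressions is also an assumption.

```python
def facts(p):
    """p itself plus, as an assumption of this sketch, each conjunct of p."""
    yield p
    if isinstance(p, tuple) and p[0] == "and":
        for q in p[1:]:
            yield from facts(q)

def replace_known(e, context=frozenset(), maximum_size=32):
    """Replace formulae whose value is fixed by the enclosing ITE conditions."""
    if e in context:
        return True
    if ("not", e) in context:
        return False
    if len(context) > maximum_size or not isinstance(e, tuple):
        return e                             # size limit reached, or a leaf
    if e[0] == "ite":
        _, p, t0, t1 = e
        cond = replace_known(p, context, maximum_size)
        then = replace_known(t0, context | set(facts(p)), maximum_size)
        other = replace_known(t1, context | {("not", p)}, maximum_size)
        if cond is True:
            return then                      # assumed constant folding
        if cond is False:
            return other
        return ("ite", cond, then, other)
    return (e[0],) + tuple(replace_known(c, context, maximum_size) for c in e[1:])

expr = ("=", ("ite", ("and", "p0", "p1"), ("ite", "p0", "v", 3), 5), 5)
simplified = replace_known(expr)
```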

This transformation just replaces formulae with 1 and 0. It does not do more elaborate transformations. For example, it leaves the expression $\text{ite}\left(\left(t<_{u} 6\right),\left(t<_{u} 7\right), t=6\right)$ unchanged, although it is equivalent to $\left(t \leq_{u} 6\right)$.

## Example 3.20

As an example of the worst case behaviour, consider the following conjuncts, where $S_{0}$ is the root node, and $S_{1}$ to $S_{4}$ are syntactic variables:

$$
\begin{aligned}
& S_{0}=\text{ite}\left(p_{0},\left(\text{bvnot } S_{1}\right),-S_{1}\right) \\
& S_{1}=\text{ite}\left(p_{1},\left(\text{bvnot } S_{2}\right),-S_{2}\right) \\
& S_{2}=\text{ite}\left(p_{2},\left(\text{bvnot } S_{3}\right),-S_{3}\right) \\
& S_{3}=\text{ite}\left(p_{3},\left(\text{bvnot } S_{4}\right),-S_{4}\right) \\
& S_{4}=\text{ite}\left(p_{4}, v_{0}^{[n]}, v_{1}^{[n]}\right)
\end{aligned}
$$

Both $v_{0}$ and $v_{1}$ can be reached via the true and false branches of 4 ITEs. So there are 16 distinct contexts that can reach $v_{0}$, and 16 that can reach $v_{1}$.

The transformation is expensive because it considers the path through all the ITE nodes between an expression and the root node. That is, each extra ITE expression on the path from the root node doubles the number of node contexts. In the worst case, this creates $2^{i}$ contexts, where $i$ is the number of ITE nodes. If a depth-first traversal is performed then space proportional to the depth of the ITE expressions is needed.
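The count in Example 3.20 can be reproduced by enumerating contexts explicitly. In the Python sketch below the bit-vector not and unary minus wrappers are abstracted away, so each ITE is recorded only as a condition with its two children; the names are hypothetical.

```python
def contexts_reaching(target, node, defs, context=()):
    """Yield every context (a tuple of signed conditions) under which target is reached."""
    if node == target:
        yield context
        return
    if node not in defs:
        return                               # a leaf other than the target
    cond, then_child, else_child = defs[node]
    yield from contexts_reaching(target, then_child, defs, context + ((cond, True),))
    yield from contexts_reaching(target, else_child, defs, context + ((cond, False),))

# Example 3.20: both branches of S0..S3 refer to the next syntactic variable.
defs = {
    "S0": ("p0", "S1", "S1"),
    "S1": ("p1", "S2", "S2"),
    "S2": ("p2", "S3", "S3"),
    "S3": ("p3", "S4", "S4"),
    "S4": ("p4", "v0", "v1"),
}
to_v0 = set(contexts_reaching("v0", "S0", defs))
to_v1 = set(contexts_reaching("v1", "S0", defs))
```

Each additional ITE of this shape placed above $S_{0}$ would double both sets.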

Other algorithms can find more substitutions than Algorithm 3.5 deduces. For instance, ROBDDs [Bry86] allow entailments that are missed by our algorithm to be deduced. However, such data structures are more expensive to maintain during traversal of the expression. Also, they are not perfect; like our algorithm they operate on the expression's Boolean abstraction.

### 3.15 Unconstrained-Variable Simplification

Bruttomesso [Bru08] and Brummayer [Bru09] both provide rules to simplify expressions that contain unconstrained variables. Some expressions containing unconstrained variables are eliminated by replacing them with fresh variables.

An unconstrained variable is one which has a single edge from a parent expression in the DAG representation of the problem. If the parent of an unconstrained variable can take any possible value, then the expression can be replaced by a fresh variable. Because the newly introduced fresh variable might also be unconstrained, the parent of the recently introduced fresh variable can sometimes be replaced by a fresh variable too.

## Example 3.21

Consider $(v+1)$, and assume this is the only use of $v$. Then, the occurrence of $(v+1)$ can be replaced by a fresh variable.

The unconstrained variable simplification replaces an expression $t$ with a fresh variable $v$. It can be applied if two conditions are satisfied: first, $t$ can vary independently of the rest of the problem, and second, $t$ denotes a surjective function, that is, $t$ can yield any value in its range.

## Example 3.22

Consider a sub-expression $v=t$, where $v$ occurs nowhere else. This equation can be replaced by a fresh Boolean variable because the equality can evaluate to 1 or 0, that is, whatever is required to ensure the problem is satisfiable. If the equality must be 1, then $v$ can be assigned $t$'s value, otherwise it can be assigned something different from $t$, such as $(t+1)$.

## Example 3.23

Consider the expression $\left(v^{[1]}:: v^{[1]}\right)$, where these are the only occurrences of the variable $v^{[1]}$. There are some values this expression cannot produce, for instance $(10)_{2}$. Even though $v$ has only a single parent, it has two edges from that parent. So the expression cannot be replaced by a fresh variable.
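The surjectivity condition can be checked by brute force at small bit-widths. The Python sketch below is purely illustrative: it enumerates all inputs, which the simplification itself never does, and `is_surjective` is a hypothetical helper.

```python
from itertools import product

def is_surjective(fn, in_widths, out_width):
    """True iff fn produces every out_width-bit value for some inputs."""
    out_mask = (1 << out_width) - 1
    domains = [range(1 << w) for w in in_widths]
    image = {fn(*args) & out_mask for args in product(*domains)}
    return len(image) == (1 << out_width)

add_one = is_surjective(lambda v: v + 1, [4], 4)              # Example 3.21
concat_self = is_surjective(lambda v: (v << 1) | v, [1], 2)   # Example 3.23
bvand = is_surjective(lambda a, b: a & b, [4, 4], 4)
```

The concatenation of a one-bit variable with itself only ever produces $(00)_{2}$ and $(11)_{2}$, so it is not surjective, whereas the other two expressions reach every value.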

## Example 3.24

Brummayer [Bru09] considers the sub-expression $\left(\left(v_{0}+t\right)=\left(v_{1} \text{ bvand } v_{2}\right)\right)$. Assume these occurrences of $v_{0}, v_{1}$, and $v_{2}$ are the only ones, while $t$ is an otherwise arbitrary term. Because $\left(v_{1} \text{ bvand } v_{2}\right)$ can evaluate to any value, it can be replaced with a fresh variable $v_{3}$, giving $\left(v_{0}+t\right)=v_{3}$. Because $\left(v_{0}+t\right)$ can evaluate to any value, it too can be replaced by a fresh variable, giving $\left(v_{4}=v_{3}\right)$. This can evaluate to either 1 or 0, so can be replaced by a fresh propositional variable $b$. The sub-expression can be 1 or 0, depending on what is required to make the problem satisfiable.

Bruttomesso [Bru08] describes the simplification applied to bit-vector problems. Brummayer independently developed the simplification, and gives the rules for the array variant, which he describes in section 3.4 of his thesis [Bru09]. Franzén [Fra10] gives the rules to build a model for the original problem from a model to the simplified problem. Following Franzén, STP2 keeps mappings between variables so that a model to the original problem can be calculated.

Most of the rules are straightforward; we give the rules in Table 3.1. Inequalities are complicated because of the possibility that they are necessarily 1, such as $v \geq 0$, or necessarily 0, such as $\left(v^{[3]}>_{u} 111\right)$, where $v$ is unconstrained. Multiplication by a constant is complicated because only odd constants have a unique multiplicative inverse. Table 3.1 omits some of the operations in the language (section 2.3), because they are removed at creation-time.

In Table 3.1 the "Model" column gives the expressions that produce a model for the original problem given a model for the transformed problem. They produce $a$ model, not every model.
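As an illustration of how the "Model" column is used, the Python sketch below reconstructs a value for the unconstrained operand in three of the rules (addition, exclusive-or and equality) and checks by enumeration at a bit-width of 4 that the original expression then takes the required value. The function names are hypothetical.

```python
def model_add(v, v1, n):
    """Rule (v0 + v1) with unc(v0): choose v0 = v - v1 (mod 2^n)."""
    return (v - v1) % (1 << n)

def model_xor(v, v1):
    """Rule (v0 bvxor v1) with unc(v0): choose v0 = v1 bvxor v."""
    return v1 ^ v

def model_eq(b, v1, n):
    """Rule v0 = v1 with unc(v0): choose v0 = ite(b, v1, v1 + 1)."""
    return v1 if b else (v1 + 1) % (1 << n)

n = 4
size = 1 << n
ok = all(
    (model_add(v, v1, n) + v1) % size == v
    and (model_xor(v, v1) ^ v1) == v
    and all((model_eq(b, v1, n) == v1) == b for b in (False, True))
    for v in range(size) for v1 in range(size))
```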

STP2 also analyses the extracts from variables: if all of a variable's parents are extract expressions, and all of the extract expressions select different bits of the variable, then each extract expression is replaced by a fresh variable.

## Example 3.25

Suppose a problem contains just two references to $v^{[20]}$, namely $v^{[20]}[15,13]$ and $v^{[20]}[12,2]$. Because the extracts do not overlap, and there are no other references to $v$, each extract could be replaced by a fresh variable.

Because applying the unconstrained simplification is based on the syntactic appearance of terms, only some equivalent terms will be simplified. The rules of Table 3.1 will for instance replace $\text{ite}\left(p, b_{0}, b_{1}\right)$, where $p$ and $b_{1}$ are unconstrained, with a fresh variable. However, the different but equivalent expression $\left(\left(p \Longrightarrow b_{0}\right) \wedge\left((\neg p) \Longrightarrow b_{1}\right)\right)$ will be left unchanged.

| Expression | Condition | Replacement | Model |
| :---: | :---: | :---: | :---: |
| $\left(v_{0}^{[n]}:: v_{1}^{[m]}\right)$ | $u n c\left(v_{0}^{[n]}\right) \wedge u n c\left(v_{1}^{[m]}\right)$ | $v^{[n+m]}$ | $\begin{gathered} v_{0}^{[n]}=v[m+n-1, m] \\ \wedge v_{1}^{[m]}=v[m-1,0] \end{gathered}$ |
| $\left(\text{bvnot } v_{0}^{[n]}\right)$ | $\operatorname{unc}\left(v_{0}^{[n]}\right)$ | $v^{[n]}$ | $v_{0}^{[n]}=\left(\text{bvnot } v^{[n]}\right)$ |
| $-v_{0}^{[n]}$ | $\operatorname{unc}\left(v_{0}^{[n]}\right)$ | $v^{[n]}$ | $v_{0}^{[n]}=-v^{[n]}$ |
| $\neg b_{0}$ | $\operatorname{unc}\left(b_{0}\right)$ | $b$ | $b=\neg b_{0}$ |
| $\left(v_{0}^{[n]}>_{s} v_{1}^{[n]}\right)$ | $\begin{aligned} & u n c\left(v_{0}^{[n]}\right) \wedge u n c\left(v_{1}^{[n]}\right) \\ & \wedge n>1 \end{aligned}$ | $b$ | $\begin{aligned} & v_{0}^{[n]}=\operatorname{ite}(b, 1,0) \\ & \wedge v_{1}^{[n]}=\operatorname{ite}(b, 0,1) \end{aligned}$ |
| $\left(v_{0}^{[n]}>_{u} v_{1}^{[n]}\right)$ | $\begin{aligned} & \operatorname{unc}\left(v_{0}^{[n]}\right) \wedge \operatorname{unc}\left(v_{1}^{[n]}\right) \\ & \quad \wedge n>1 \end{aligned}$ | $b$ | $\begin{aligned} & v_{0}^{[n]}=\operatorname{ite}(b, 1,0) \\ & \wedge v_{1}^{[n]}=\operatorname{ite}(b, 0,1) \end{aligned}$ |
| $\left(v_{0}^{[n]}>_{u} v_{1}^{[n]}\right)$ | $\begin{aligned} & \operatorname{unc}\left(v_{0}^{[n]}\right) \wedge \operatorname{constant}\left(v_{1}^{[n]}\right) \\ & \wedge v_{1}^{[n]} \neq \max ^{[n]} \end{aligned}$ | $b$ | $v_{0}^{[n]}=$ ite $\left(b, \min ^{[n]}, \max ^{[n]}\right)$ |
| $\left(v_{0}^{[n]}>_{s} v_{1}^{[n]}\right)$ | $\begin{aligned} & \operatorname{unc}\left(v_{0}^{[n]}\right) \wedge \operatorname{constant}\left(v_{1}^{[n]}\right) \\ & \wedge v_{1}^{[n]} \neq \max ^{[n]} \end{aligned}$ | $b$ | $v_{0}^{[n]}=$ ite $\left(b, \min ^{[n]}, \max ^{[n]}\right)$ |
| $\left(v_{0}^{[n]}>{ }_{u} v_{1}^{[n]}\right)$ | $\operatorname{unc}\left(v_{0}^{[n]}\right)$ | $b \wedge\left(v_{1}^{[n]} \neq \max ^{[n]}\right)$ | $v_{0}^{[n]}=$ ite $\left(b\right.$, max $^{[n]}$, min $\left.^{[n]}\right)$ |
| $\left(v_{0}^{[n]}>_{s} v_{1}^{[n]}\right)$ | $\operatorname{unc}\left(v_{0}^{[n]}\right)$ | $b \wedge\left(v_{1}^{[n]} \neq \max ^{[n]}\right)$ | $v_{0}^{[n]}=$ ite $\left(b\right.$, max $^{[n]}$, min $\left.^{[n]}\right)$ |
| $\left(v_{0}^{[n]}>_{u} v_{1}^{[n]}\right)$ | $\operatorname{unc}\left(v_{1}^{[n]}\right)$ | $b \wedge\left(v_{0}^{[n]} \neq \min ^{[n]}\right)$ | $v_{1}^{[n]}=$ ite $\left(b, \min ^{[n]}\right.$, max $\left.^{[n]}\right)$ |
| $\left(v_{0}^{[n]}>_{s} v_{1}^{[n]}\right)$ | $u n c\left(v_{1}^{[n]}\right)$ | $b \wedge\left(v_{0}^{[n]} \neq \min ^{[n]}\right)$ | $v_{1}^{[n]}=$ ite $\left(b\right.$, min $\left.^{[n]}, \max ^{[n]}\right)$ |
| $b_{0} \wedge b_{1}$ | $\operatorname{unc}\left(b_{0}\right) \wedge \operatorname{unc}\left(b_{1}\right)$ | $b$ | $b_{0}=b \wedge b_{1}=b$ |
| $b_{0} \vee b_{1}$ | $u n c\left(b_{0}\right) \wedge u n c\left(b_{1}\right)$ | $b$ | $b_{0}=b \wedge b_{1}=b$ |
| $\left(v_{0}^{[n]}\right.$ bvand $v_{1}^{[n]}$ ) | $u n c\left(v_{0}^{[n]}\right) \wedge u n c\left(v_{1}^{[n]}\right)$ | $v^{[n]}$ | $v_{0}^{[n]}=v^{[n]} \wedge v_{1}^{[n]}=v^{[n]}$ |
| $\left(v_{0}^{[n]}\right.$ bvor $\left.v_{1}^{[n]}\right)$ | $u n c\left(v_{0}^{[n]}\right) \wedge u n c\left(v_{1}^{[n]}\right)$ | $v^{[n]}$ | $v_{0}^{[n]}=v^{[n]} \wedge v_{1}^{[n]}=v^{[n]}$ |
| $b_{0} \oplus b_{1}$ | $u n c\left(b_{0}\right)$ | $b$ | $b_{0}=b_{1} \oplus b$ |
| $\left(v_{0}^{[n]}\right.$ bvxor $\left.v_{1}^{[n]}\right)$ | $u n c\left(v_{0}^{[n]}\right)$ | $v^{[n]}$ | $v_{0}^{[n]}=\left(v_{1}^{[n]}\right.$ bvxor $\left.v^{[n]}\right)$ |
| ite( $\left.b_{0}, v_{0}^{[n]}, v_{1}^{[n]}\right)$ | $u n c\left(b_{0}\right) \wedge u n c\left(v_{0}^{[n]}\right)$ | $v^{[n]}$ | $b_{0} \wedge\left(v_{0}^{[n]}=v^{[n]}\right)$ |
| ite ( $\left.b_{0}, v_{0}^{[n]}, v_{1}^{[n]}\right)$ | $u n c\left(b_{0}\right) \wedge u n c\left(v_{1}^{[n]}\right)$ | $v^{[n]}$ | $\neg b_{0} \wedge\left(v_{1}^{[n]}=v^{[n]}\right)$ |
| ite ( $\left.b_{0}, v_{0}^{[n]}, v_{1}^{[n]}\right)$ | $u n c\left(v_{0}^{[n]}\right) \wedge u n c\left(v_{1}^{[n]}\right)$ | $v^{[n]}$ | $\left(v_{0}^{[n]}=v^{[n]}\right) \wedge\left(v_{1}^{[n]}=v^{[n]}\right)$ |
| $\left(v_{0}^{[n]} \div{ }_{u} v_{1}^{[n]}\right)$ | $u n c\left(v_{0}^{[n]}\right) \wedge u n c\left(v_{1}^{[n]}\right)$ | $v^{[n]}$ | $v_{0}^{[n]}=v^{[n]} \wedge v_{1}^{[n]}=1$ |
| $\left(v_{0}^{[n]} \times v_{1}^{[n]}\right)$ | $u n c\left(v_{0}^{[n]}\right) \wedge u n c\left(v_{1}^{[n]}\right)$ | $v^{[n]}$ | $v_{0}^{[n]}=1 \wedge v_{1}^{[n]}=v^{[n]}$ |
| $v_{0}^{[n]}=v_{1}^{[n]}$ | $\operatorname{unc}\left(v_{0}^{[n]}\right)$ | $b$ | $v_{0}^{[n]}=\operatorname{ite}\left(b, v_{1}^{[n]}, v_{1}^{[n]}+1\right)$ |
| $\left(v_{0}^{[n]}+v_{1}^{[n]}\right)$ | $u n c\left(v_{0}^{[n]}\right)$ | $v^{[n]}$ | $v_{0}^{[n]}=v^{[n]}-v_{1}^{[n]}$ |
| $\left(v_{0}^{[n]} \gg_{l} v_{1}^{[n]}\right)$ | $u n c\left(v_{0}^{[n]}\right) \wedge u n c\left(v_{1}^{[n]}\right)$ | $v^{[n]}$ | $v_{0}^{[n]}=v^{[n]} \wedge v_{1}^{[n]}=0$ |
| $\left(v_{0}^{[n]} \gg{ }_{a} v_{1}^{[n]}\right)$ | $u n c\left(v_{0}^{[n]}\right) \wedge u n c\left(v_{1}^{[n]}\right)$ | $v^{[n]}$ | $v_{0}^{[n]}=v^{[n]} \wedge v_{1}^{[n]}=0$ |
| $\left(v_{0}^{[n]} \ll v_{1}^{[n]}\right)$ | $u n c\left(v_{0}^{[n]}\right) \wedge u n c\left(v_{1}^{[n]}\right)$ | $v^{[n]}$ | $v_{0}^{[n]}=v^{[n]} \wedge v_{1}^{[n]}=0$ |

Table 3.1: Rules for the unconstrained variable simplification. $\operatorname{unc}$ is a predicate returning 1 iff its operand is a variable and is unconstrained. $v$ is a fresh variable. The table lists one rule for some commutative operations (e.g. xor); for these operations the rules are applied with the operands reversed, too. Starting from the top, use the first rule that matches the expression. "min" and "max" are the signed or unsigned minimum and maximum values for the appropriate bit width, respectively. We give the rules only for term ITEs; the rules for propositional ITEs are similar.

There are cases which a more sophisticated algorithm could simplify. For instance, assume $v$ does not occur elsewhere and consider: $\operatorname{ite}(p,(v+t)=6, v=4)$, $((v \div_{u} 10)<_{u} 2)$, or $((v<10) \wedge(v>2))$. All of these expressions denote surjective functions, so each can be replaced by a fresh variable.
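The addition, xor and equality rules of Table 3.1 can be checked exhaustively at a small bit-width. The following is an illustrative Python sketch, not part of STP2: the function names are hypothetical and the 4-bit width is an assumption chosen only to keep the enumeration cheap. Each check confirms that, for every value of the fresh variable and of the other operand, the substitution from the last column makes the original expression evaluate to the fresh variable.

```python
from itertools import product

WIDTH = 4                      # assumed small width so exhaustive checking is cheap
MASK = (1 << WIDTH) - 1

def check_add_rule():
    # Rule: (v0 + v1) with unc(v0) becomes fresh v, with v0 = v - v1.
    return all(((((v - v1) & MASK) + v1) & MASK) == v
               for v, v1 in product(range(MASK + 1), repeat=2))

def check_xor_rule():
    # Rule: (v0 bvxor v1) with unc(v0) becomes fresh v, with v0 = v1 bvxor v.
    return all(((v1 ^ v) ^ v1) == v
               for v, v1 in product(range(MASK + 1), repeat=2))

def check_eq_rule():
    # Rule: (v0 = v1) with unc(v0) becomes fresh b, with v0 = ite(b, v1, v1 + 1).
    for b, v1 in product((False, True), range(MASK + 1)):
        v0 = v1 if b else (v1 + 1) & MASK
        if (v0 == v1) != b:
            return False
    return True
```

All three checks return `True`, which is what it means for the replaced expression to be surjective in the unconstrained variable.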

### 3.16 Pure Literal Elimination

We perform pure literal elimination to identify Boolean variables that can be set to a constant. Initially, this approach was called the affirmative-negative rule [DP60]. Pure literal elimination over graphs has also been called monotone input reduction [JBH10]. Algorithm 3.6 calculates the polarities for each expression in a problem. This algorithm replaces any Boolean variable that has a polarity of TRUE with the 1 expression, and any Boolean variable with a FALSE polarity by the 0 expression.
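The polarity calculation of Algorithm 3.6 can be sketched in Python. This is a minimal illustration, not STP2's code: the tuple encoding of expressions (`("var", name)`, `("and", a, b)`, `("or", a, b)`, `("not", a)`, and any other operator tag for non-monotone nodes) and the function names are assumptions made for the example.

```python
TRUE, FALSE, BOTH = "TRUE", "FALSE", "BOTH"

def calculate_polarity(e, current, pol):
    # Merge the polarity seen now with any polarity recorded earlier.
    seen = pol.get(e)
    if seen is None:
        pol[e] = current
    elif seen != current:
        pol[e] = BOTH
    op = e[0]
    if op in ("and", "or"):
        for child in e[1:]:
            calculate_polarity(child, current, pol)
    elif op == "not":
        flipped = {TRUE: FALSE, FALSE: TRUE, BOTH: BOTH}[current]
        calculate_polarity(e[1], flipped, pol)
    elif op != "var":
        # Any other operator (xor, ite, ...) exposes its children in both polarities.
        for child in e[1:]:
            calculate_polarity(child, BOTH, pol)

def pure_literals(root):
    """Return {variable name: constant} for variables with a single polarity."""
    pol = {}
    calculate_polarity(root, TRUE, pol)
    return {e[1]: (p == TRUE) for e, p in pol.items()
            if e[0] == "var" and p != BOTH}
```

For example, with `a`, `b`, `c` as variables, `pure_literals(("and", ("or", a, ("not", b)), ("or", a, c)))` maps `a` and `c` to `True` and `b` to `False`; a variable that appears under both polarities, or beneath an operator such as xor, is left out.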

### 3.17 Interval Analysis

STP2 performs a bottom-up unsigned interval analysis. We use standard rules to derive the bounds of operations. The bounds for logical operations are given by Warren [War02]; the bounds for integer arithmetic operations are standard and can be found in constraint programming textbooks, e.g., Marriott and Stuckey [MS98]. During the analysis, any expression that has the same lower and upper bound is replaced by the corresponding constant expression.

In chapter 4 we investigate a more sophisticated variant of this.
The interval analysis is fast and imprecise. We do not take care to ensure that the bounds produced are as tight as possible. The outline of the algorithm that we use, and the implementation for some of the operations is given as Algorithm 3.7.
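The bottom-up analysis of Algorithm 3.7 can be sketched in Python for constants, bvnot, addition, ITE and equality. This is an illustrative sketch, not STP2's implementation: the tuple encoding of expressions and the names `width` and `interval` are assumptions made for the example, and the step that replaces an expression by a constant is omitted.

```python
def width(e):
    op = e[0]
    if op in ("const", "var"):
        return e[1]                 # ("const", width, value) / ("var", width, name)
    if op == "eq":
        return 1                    # formulae are treated as 1-bit values
    if op == "ite":
        return width(e[2])
    return width(e[1])              # bvnot, add: width of the first operand

def interval(e, memo):
    """Return (lower, upper) unsigned bounds for e, visiting each node once."""
    if e in memo:
        return memo[e]
    op = e[0]
    lo, hi = 0, (1 << width(e)) - 1           # default: the full unsigned range
    if op == "const":
        lo = hi = e[2]
    elif op == "bvnot":
        t_lo, t_hi = interval(e[1], memo)
        lo, hi = hi - t_hi, hi - t_lo         # bvnot x = (2^w - 1) - x reverses order
    elif op == "add":
        a_lo, a_hi = interval(e[1], memo)
        b_lo, b_hi = interval(e[2], memo)
        if a_hi + b_hi <= hi:                 # only tighten when overflow is impossible
            lo, hi = a_lo + b_lo, a_hi + b_hi
    elif op == "ite":
        a_lo, a_hi = interval(e[2], memo)
        b_lo, b_hi = interval(e[3], memo)
        lo, hi = min(a_lo, b_lo), max(a_hi, b_hi)
    elif op == "eq":
        a_lo, a_hi = interval(e[1], memo)
        b_lo, b_hi = interval(e[2], memo)
        if b_lo > a_hi or a_lo > b_hi:        # disjoint ranges can never be equal
            lo = hi = 0
    memo[e] = (lo, hi)
    return memo[e]
```

With 8-bit operands, an ITE over the constants 3 and 5 gets the interval (3, 5), so comparing it for equality with the constant 200 yields (0, 0), i.e. the equality is known to be false without any search.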

### 3.18 Parameter Optimisation

SMT solvers, like other decision procedures, often have many configuration options. For instance, Z3 version 3.0 has 284 configuration options [dMP12]. STP2 has fewer: perhaps 30 that can be changed via the command line, plus dozens more in the source code.

Algorithm 3.6: Pure literal elimination. replace is called initially with the root expression. It replaces any Boolean variable with a TRUE polarity by 1, and any Boolean variable with a FALSE polarity by 0. The possible polarities are TRUE, FALSE, and BOTH.

```
Require: e                  // The current expression
Require: current            // The polarity of the current expression
Require: pol ← {}           // Maps from an expression to its polarity
procedure calculate_polarity(e, current, pol)
    if pol[e] = TRUE ∧ current ≠ TRUE then
        pol[e] ← BOTH
    else if pol[e] = FALSE ∧ current ≠ FALSE then
        pol[e] ← BOTH
    else if pol[e] ≠ BOTH then
        pol[e] ← current
    end if
    if e matches (p0 ∧ p1) then
        calculate_polarity(p0, current, pol)
        calculate_polarity(p1, current, pol)
    else if e matches (p0 ∨ p1) then
        calculate_polarity(p0, current, pol)
        calculate_polarity(p1, current, pol)
    else if e matches (¬p) then
        if current = BOTH then
            calculate_polarity(p, BOTH, pol)
        else if current = TRUE then
            calculate_polarity(p, FALSE, pol)
        else
            calculate_polarity(p, TRUE, pol)
        end if
    else
        for each child c of e do
            calculate_polarity(c, BOTH, pol)
        end for
    end if
end procedure
procedure replace(e)
    Create pol, a map from expressions to their polarity
    calculate_polarity(e, TRUE, pol)
    for all (b, polarity) ∈ pol do      // Iterate over all Boolean variables
        if polarity = TRUE then
            Eliminate from e, with b ⟺ 1
        else if polarity = FALSE then
            Eliminate from e, with b ⟺ 0
        end if
    end for
end procedure
```

Algorithm 3.7: Unsigned interval analysis with some operations omitted. Initially, the calculate_interval procedure is called with the root expression, and empty lower and upper maps.

```
Require: e          // The current expression
Require: lower      // Map from expressions to the lower bound
Require: upper      // Map from expressions to the upper bound
procedure calculate_interval(e, lower, upper)
    if e ∈ lower then
        return      // Expression has already been evaluated. Only visit once.
    end if
    for each child c of e do
        calculate_interval(c, lower, upper)
    end for
    Create integer u ← 2^bitwidth(e) − 1    // assign 1 for formulae
    Create integer l ← 0
    if e matches true then
        l ← u ← 1
    else if e matches (bvnot t) then
        l ← (bvnot upper(t))
        u ← (bvnot lower(t))
    else if e is a constant then
        u ← l ← toInteger(e)
    else if e matches (t0 = t1) then
        if (lower(t1) > upper(t0)) ∨ (lower(t0) > upper(t1)) then
            l ← u ← 0
        end if
    else if e matches (t0 + t1) then
        if (upper(t1) + upper(t0)) does not overflow then
            l ← lower(t0) + lower(t1)
            u ← upper(t0) + upper(t1)
        end if
    else if e matches ite(p, t0, t1) then
        l ← min(lower(t0), lower(t1))
        u ← max(upper(t0), upper(t1))
    end if
    if l = u then
        Replace e by constant l
    end if
    upper(e) ← u
    lower(e) ← l
end procedure
```

Parameter optimisation aims to find a good assignment to the configuration options of a decision procedure on some set of problems. To decide which simplifications to enable, we applied the parameter optimisation tool ParamILS [HBHH07, HHLBS09]. ParamILS performs a hill-climbing search with random restarts to select a good, but probably not optimal, combination of parameters for a solver. We ran ParamILS with STP2 on a selection of problems both from the SMT-LIB library and from STP2 users. Three of the simplifications that we have discussed so far are disabled by default in STP2: AIG rewriting (section 3.12), AIG rewriting of the Boolean abstraction (section 3.13), and ITE Simplifications (section 3.14).
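The idea of hill-climbing with random restarts over on/off simplification flags can be sketched as follows. This is a toy illustration only: it is not ParamILS's algorithm or interface, the flag names in the usage note are hypothetical, and the `cost` function stands in for running the solver on a training set and measuring failures or time.

```python
import random

def hill_climb(cost, flags, restarts=5, seed=0):
    """Search on/off settings of `flags` that minimise `cost(config)`."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(restarts):
        # Random restart: begin from a random configuration.
        config = {f: rng.random() < 0.5 for f in flags}
        current = cost(config)
        improved = True
        while improved:
            improved = False
            for f in flags:                      # try flipping each flag in turn
                neighbour = dict(config)
                neighbour[f] = not config[f]
                c = cost(neighbour)
                if c < current:                  # keep only strict improvements
                    config, current, improved = neighbour, c, True
        if current < best_cost:
            best, best_cost = config, current
    return best, best_cost
```

A call such as `hill_climb(cost, ["aig_rewrite", "ite_simp", "pure_literal"])` returns the best configuration found and its cost. Each restart stops at a local optimum, which is why the result is good but not guaranteed to be optimal.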

When we give the results in the next section (section 3.19), we show that enabling AIG rewriting solves extra problems. Because the test set that we use in our evaluation differs from the test set that we used to optimise the parameters, the best selection of simplifications to enable is different.

### 3.19 Evaluation

To build a test set, we took the SMT-LIB QF_BV benchmark set as of January 2012. Then we discarded the asp family of benchmarks, which is large (29GB) and contains encodings of problems we are uninterested in, for example: towers of Hanoi, travelling salesperson, and Sudoku problems. We discarded the mom family because it uses the define-fun syntax that STP2 cannot yet parse. We discarded the bruttomesso-core family because it contains no arithmetic. Next, we limited each family to 50 randomly chosen benchmarks that at least one of STP2 r1611 or Z3 3.2 [dMB08b] could not solve within 1 second. We were left with 715 benchmarks in 31 families. We then ran each problem using a memory limit of 3GB and a timeout of 500 seconds on a single core of an Intel E5507 Linux computer. This is the test set and configuration we use in the next chapter's evaluation, too (section 4.7).

There is a substantial variation in the solving time due to the bruttomesso families of hardware verification problems, so we report times for those families separately.

Table 3.2 compares STP2 r1654 with Z3 3.2, showing that STP2 is competitive with Z3. Table 3.3 shows the times for the Bruttomesso families; on these benchmarks, STP2 is not competitive with Z3. Overall STP2 performs well.
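The 'Time incl. penalty' rows of Tables 3.2 and 3.3 add a 501 second penalty per failure to the time spent on solved instances. As a worked check, the short Python sketch below recomputes that row from the 'Sum' row of Table 3.3; the function and variable names are chosen for this example only.

```python
PENALTY_SECONDS = 501   # per-failure penalty stated in the caption of Table 3.2

def time_including_penalty(solved_seconds, failures):
    return solved_seconds + PENALTY_SECONDS * failures

# 'Sum' row of Table 3.3: (time in seconds, failures) per configuration.
table_3_3 = {
    "STP2 with TM": (10372, 9),
    "STP2 r1654": (11010, 9),
    "4Simp": (3360, 5),
    "Z3 3.2": (2661, 3),
}
totals = {name: time_including_penalty(t, f) for name, (t, f) in table_3_3.items()}
```

The resulting totals are 14881, 15519, 5865 and 4164 seconds, matching the last row of Table 3.3.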

### 3.20 Relative Significance of Simplifications

To determine which of the simplifications we have presented are the most important, in this section we compare configurations of STP2 with individual simplifications disabled and enabled.

In the evaluation, we use three different CNF simplifications. All of the CNF simplifications we use read and write the DIMACS CNF format, so can easily be used before SAT solving. The ability to simply use CNF simplification tools is an advantage of the eager approach. Integrating CNF simplification in the lazy SMT approach is much more time consuming. SatELite [EB05] converts a CNF into a simpler CNF by eliminating variables and subsumed clauses. We use the original SatELite implementation, and the implementation in Precosat 570. We also use blocked clause elimination [JBH10] from Precosat 570.

|  |  | STP2 with TM |  | STP2 r1654 |  | STP2-4Simp |  | Z3 3.2 |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Family | \# | time | fail | time | fail | time | fail | time | fail |
| VS3 | 11 | 1 | 1/10 | 120 | 1/10 | 393 | 1/9 | 610 | 7 |
| brummayerbiere | 28 | 224 | 1/12 | 290 | 1/12 | 670 | 1/11 | 551 | 1/13 |
| brummayerbiere2 | 50 | 1941 | 13 | 1903 | 1/12 | 2897 | 9 | 2127 | 1/29 |
| brummayerbiere3 | 50 | 1524 | 24 | 1505 | 24 | 1319 | 24 | 1326 | 1/31 |
| calypto | 17 | 4 | 15 | 2 | 12 | 597 | 14 | 969 | 11 |
| galois | 3 | 0 | 3 | 0 | 3 | 0 | 3 | 0 | 3 |
| gulwani-pldi08 | 3 | 58 |  | 15 |  | 27 |  | 17 |  |
| pipe | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| rubik | 6 | 114 |  | 271 |  | 152 | 1 | 88 | 1 |
| sage:app1 | 50 | 52 |  | 238 |  | 69 |  | 216 |  |
| sage:app12 | 14 | 0 |  | 0 |  | 0 |  | 0 |  |
| sage:app2 | 1 | 0 |  | 0 |  | 0 |  | 9 |  |
| sage:app7 | 6 | 0 |  | 0 |  | 0 |  | 10 |  |
| sage:app8 | 50 | 36 |  | 19 |  | 43 |  | 63 |  |
| sage:app9 | 50 | 38 |  | 19 |  | 44 |  | 60 |  |
| spear:cvs_v1.11.22 | 28 | 42 |  | 44 |  | 50 |  | 134 |  |
| spear:inn_v2.4.3 | 50 | 81 |  | 46 |  | 113 |  | 294 |  |
| spear:openldap_v2.3.35 | 5 | 0 | 5 | 11 | 1 | 475 | 2 | 0 | 5 |
| spear:samba_v3.0.24 | 50 | 640 |  | 119 |  | 334 |  | 585 |  |
| spear:wget_v1.10.2 | 41 | 101 |  | 83 |  | 157 |  | 485 |  |
| spear:xinetd_v2.3.14 | 1 | 1 |  | 0 |  | 0 |  | 3 |  |
| spear:zebra_v0.95a | 5 | 3 |  | 4 |  | 8 |  | 16 |  |
| stp | 1 | 0 | 1/1 | 16 |  | 27 |  | 9 |  |
| stp_samples | 22 | 2 | 2 | 3 | 2 | 2 | 2 | 1 | 2 |
| tacas07 | 3 | 155 | 1 | 256 |  | 322 |  | 929 |  |
| uclid_contrib_smtcomp09 | 7 | 648 |  | 999 |  | 645 |  | 1598 |  |
| uclid:catchconv | 50 | 0 | 1/50 | 47 |  | 92 |  | 137 |  |
| uum | 7 | 35 | 6 | 31 | 6 | 17 | 6 | 11 | 6 |
| wienand-cav2008:Booth | 5 | 82 | 4 | 78 | 4 | 45 | 4 | 30 | 4 |
| Sum | 615 | 5791 | 147 | 6132 | 87 | 8511 | 86 | 10288 | 113 |
| Time incl. penalty |  | 79438s |  | 49719s |  | 51597s |  | 66901s |  |

Table 3.2: Problems solved by: STP2 with all simplifications disabled except for TM, the STP2 default configuration, 4Simp (a simple solver, section 3.22), and the current version of the SMT-COMP 2012 winner, Z3 3.2. '\#' is the number of problems in each family. 'time' is the time in seconds for successful instances. 'fail' is the number of failures; '1/19' means 19 failures in total, one of which exceeds the memory limit. 'Time incl. penalty' is the sum of the successful times plus a 501 second penalty for each failure. The bruttomesso benchmarks are given in Table 3.3.

|  |  | STP2 with TM |  | STP2 r1654 |  | 4Simp |  | Z3 3.2 |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Family | \# | time | fail | time | fail | time | fail | time | fail |
| bruttomesso:lfsr | 50 | 6861 | 5 | 7434 | 4 | 1760 | 1 | 895 |  |
| bruttomesso:simple_processor | 50 | 3511 | 4 | 3576 | 5 | 1600 | 4 | 1765 | 3 |
| Sum | 100 | 10372 | 9 | 11010 | 9 | 3360 | 5 | 2661 | 3 |
| Time incl. penalty |  | 14881s |  | 15519s |  | 5865s |  | 4164s |  |

Table 3.3: Bruttomesso families benchmarks solved by configurations. Headings are as per Table 3.2.

Figure 3.9: Number of times the memory limit of 3GB or the time limit of 500s was exceeded when running 615 SMT-LIB2 problems.

Table 3.4 gives the number of failures for STP2 configurations with single simplifications enabled or disabled. Not all of the simplifications that we have discussed are enabled by default in STP2. The Virtual Best Solver (VBS) gives the number of problems that no variant could solve. The VBS effectively runs each configuration in parallel and gives the time when the first configuration solves a problem (if any). Note that some simplifications solve more problems than the default STP2 configuration; for instance, using AIG rewrites solves four extra problems. Table 3.6 gives the results just for the Bruttomesso families. Figure 3.9 shows the relative number of failures on the 615 test problems.

Table 3.4 contains comparisons between three approaches for generating CNF. The STP2 original CNF encoding, which was inherited from an earlier version of STP, does not use AIGs; it uses a custom propositional-to-CNF encoding scheme. It solved 45 fewer problems than the AIG and TM approach. The Tseitin CNF encoding, implemented by the ABC tool (we use version abc70930), uses AIGs, and converts via the Tseitin transformation to CNF. TM has the highest impact of any simplification. Note though that STP2 has various ways to encode the basic operations. For example, by selecting different configuration options, there are about 80 distinct ways to encode multiplication. We have used parameter optimisation to select good propositional encodings for bit-vector operations with the TM CNF translator. If parameter optimisation were reapplied with either the original or the Tseitin CNF transformation, then the difference would probably be reduced.

With clause count comparison disabled, 10 fewer problems are solved. So, the speculative transformations have made these 10 problems harder to solve. If the speculative transformations are disabled, 4 fewer instances are solved. So without the clause count comparison approach, the speculative transformations are harmful. Note that disabling the speculative transformations also disables the partial solver, which relies on the speculative transformations for correctness.

The virtual best solver answers 19 more problems than the default configuration. This demonstrates a challenge in building efficient bit-vector solvers: simplifications speed up some problems, but slow others down.
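The virtual best solver is a bookkeeping construct over per-configuration results rather than a solver that is actually run. A minimal Python sketch of the calculation follows; the data layout (a mapping from configuration name to per-problem times, with `None` marking a failure) and the function name are assumptions made for this illustration.

```python
def virtual_best(results):
    """results maps configuration name -> {problem: seconds, or None on failure}."""
    problems = set().union(*(r.keys() for r in results.values()))
    best = {}
    for p in problems:
        times = [r[p] for r in results.values() if r.get(p) is not None]
        # The VBS time is that of the fastest configuration that solved p.
        best[p] = min(times) if times else None   # None: no configuration solved p
    return best

def vbs_failures(results):
    return sum(t is None for t in virtual_best(results).values())
```

A problem counts as a VBS failure only if every configuration failed on it, so the VBS failure count is a lower bound on the failures of any single configuration in the comparison.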

Table 3.5 shows the number of problems that were solved by particular configurations, that were not also solved by the default STP2 configuration. It shows that even though the original CNF encoding solved 45 fewer problems than the default STP2 configuration (Table 3.4), it solved one problem that the default STP2 configuration could not.

Table 3.5 also shows that disabling clause count comparison solves two problems that the default configuration does not, while failing on 12 problems that the default configuration solves. So for those two problems, clause count comparison reverts the changes made by the speculative transformations even though the changes made the problems easier to solve.

Table 3.6 gives the number of failures for different configurations when solving the Bruttomesso families. Unlike for the other benchmarks, where Glucose 2.0 solved the most problems, for the Bruttomesso families, Glucose 2.0 solves 25 fewer problems than Minisat 2.2.

| \# Failures | Configuration |
| :--- | :--- |
| 132 | Using STP original CNF encoding |
| 103 | Using ABC Tseitin CNF encoding |
| 97 | Disabling clause count comparison (section 3.7) |
| 92 | Enabling Precosat's Blocked clause elimination |
| 91 | Disabling speculative transformations and the partial solver |
| 90 | Disabling bit-blasting simplifications (section 3.8) |
| 90 | Disabling unconstrained simplification (section 3.15) |
| 90 | Disabling theory-level bit-propagation (chapter 4) |
| 89 | Disabling variable elimination (section 3.4) |
| 89 | Disabling creation-time simplifications (section 3.3) |
| 88 | Enabling AIG Boolean abstraction rewrite (section 3.13) |
| 88 | Disabling the pure literal rule (section 3.16) |
| 88 | Disabling the interval simplification (section 3.17) |
| 87 | Enabling the ITE simplifications (section 3.14) |
| 87 | Disabling the partial solver (section 3.5) |
| 87 | STP2 r1654 default configuration |
| 86 | Enabling Precosat's Satelite-style variable elimination |
| 83 | Enabling AIG rewriting (section 3.12) |
| 81 | Using SatELite CNF preprocessor and Glucose 2.0 |
| 68 | Virtual Best Solver |

Table 3.4: STP2 with simplifications enabled/disabled. The number of failures amongst 615 test problems excluding the Bruttomesso families is shown for each configuration. The fewer the failures the better.

| \# New problems solved | Configuration |
| ---: | :--- |
| 1 | Disabling speculative transformations and the partial solver |
| 1 | Enabling Precosat's Blocked clause elimination |
| 1 | Enabling AIG Boolean abstraction rewrite (section 3.13) |
| 1 | Disabling unconstrained simplification (section 3.15) |
| 1 | Using ABC Tseitin CNF encoding |
| 1 | Using STP original CNF encoding |
| 1 | Disabling the interval simplification (section 3.17) |
| 1 | Disabling theory-level bit-propagation (chapter 4) |
| 2 | Disabling clause count comparison (section 3.7) |
| 5 | Enabling Precosat's Satelite-style variable elimination |
| 8 | Enabling AIG rewriting (section 3.12) |
| 10 | Using SatELite CNF preprocessor and Glucose 2.0 |

Table 3.5: Problems solved by STP2 with specified configuration that were not also solved by the default STP2 configuration. This excludes the Bruttomesso families.

Table 3.7 gives the measurements for STP2 with Glucose as a solver, and for STP2 and AIG rewriting. These configurations correspond to the best two configurations that we considered. Note that for the spear families, enabling AIG rewrites considerably slows down solving. These instances generally take a few seconds to solve, so the AIG rewriting is not justified for these. However, the brummayerbiere3 family has far more problems solved with AIG rewriting enabled.

| Failures | Configuration |
| ---: | :--- |
| 45 | Using ABC Tseitin CNF encoding |
| 40 | Using STP original CNF encoding |
| 34 | Using SatELite CNF preprocessor and Glucose 2.0 |
| 22 | Enabling Precosat's Blocked clause elimination |
| 10 | Enabling AIG rewriting (section 3.12) |
| 10 | Disabling the pure literal rule (section 3.16) |
| 10 | Disabling the interval simplification (section 3.17) |
| 10 | Disabling theory-level bit-propagation (chapter 4) |
| 10 | Disabling the partial solver (section 3.5) |
| 9 | Disabling speculative transformations and the partial solver |
| 9 | Enabling AIG Boolean abstraction rewrite (section 3.13) |
| 9 | Disabling clause count comparison (section 3.7) |
| 9 | Disabling unconstrained simplification (section 3.15) |
| 9 | Enabling the ITE simplifications (section 3.14) |
| 9 | Disabling variable elimination (section 3.4) |
| 9 | Disabling creation-time simplifications (section 3.3) |
| 9 | STP2 r1654 default configuration |
| 7 | Disabling bit-blasting simplifications (section 3.8) |
| 4 | Enabling Precosat's Satelite-style variable elimination |
| 1 | Virtual Best Solver |

Table 3.6: Number of failures for the 100 Bruttomesso test problems with various configurations of STP2.

### 3.21 A Comparison of the Tseitin and TM Encodings

Table 3.4 showed that the TM encoding had the largest impact of any simplification we investigated: replacing it with either of the other CNF encodings caused the largest increase in failures. In this section, we compare the CNF encoding of bit-vector operations via the Tseitin and TM encodings.

For several operations we measure how long unit propagation takes, and how many assignments are derived by unit propagation for each encoding. We encode each operation in an equality expression, which allows us to measure how much unit propagation occurs. For instance, to measure the bit-vector exclusive-or's propagation, we encode $\left(v_{0}^{[64]}=\left(v_{1}^{[64]} \text{ bvxor } v_{2}^{[64]}\right)\right)$ to CNF. We randomly set all of the bits of $\left(v_{1}, v_{2}\right)$ to one or zero with uniform probability, then calculate $v_{0}$. Next we delete $50 \%$ of the assignments. We apply unit propagation, but not search, to 100,000 such instances. After unit propagation completed, we counted the number of extra input and output bits that were assigned.
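The shape of this experiment can be sketched in Python for the exclusive-or case. The sketch is illustrative only: it uses the direct four-clause encoding of each xor bit rather than either the TM or ABC's Tseitin encoding, the propagation loop is a naive one rather than a SAT solver's, and all names are assumptions made for the example.

```python
import random

def xor_equality_cnf(n):
    """Clauses for v0 = (v1 bvxor v2), bit by bit. Variable ids are 1-based:
    bit i of v0, v1, v2 is 3*i+1, 3*i+2, 3*i+3."""
    clauses = []
    for i in range(n):
        a, b, c = 3 * i + 1, 3 * i + 2, 3 * i + 3
        # Forbid the four assignments in which a differs from (b xor c).
        clauses += [[-a, b, c], [-a, -b, -c], [a, -b, c], [a, b, -c]]
    return clauses

def unit_propagate(clauses, assignment):
    """Extend assignment (dict var -> bool) by unit propagation; no search."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                      # clause already satisfied
            free = [l for l in clause if abs(l) not in assignment]
            if len(free) == 1:                # unit clause forces its last literal
                assignment[abs(free[0])] = free[0] > 0
                changed = True
    return assignment

def extra_assignments(n, rng):
    """One instance: random v1, v2; compute v0; delete 50% of the bits; propagate."""
    v1, v2 = rng.getrandbits(n), rng.getrandbits(n)
    v0 = v1 ^ v2
    full = {}
    for i in range(n):
        full[3 * i + 1] = bool(v0 >> i & 1)
        full[3 * i + 2] = bool(v1 >> i & 1)
        full[3 * i + 3] = bool(v2 >> i & 1)
    kept = dict(rng.sample(sorted(full.items()), k=len(full) // 2))
    return len(unit_propagate(xor_equality_cnf(n), kept)) - len(kept)
```

Calling `extra_assignments(64, random.Random(0))` gives the number of extra bits derived for one random instance; summing over many instances gives a figure comparable in kind to the 'extra' column of Table 3.8.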

|  |  | STP2 |  | STP2+AIG rewrites |  | STP2 + Glucose |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Family | \# | time | fail | time | fail | time | fail |
| VS3 | 11 | 120 | 1/10 | 0 | 1/11 | 186 | 1/9 |
| brummayerbiere | 28 | 290 | 1/12 | 293 | 1/12 | 380 | 1/11 |
| brummayerbiere2 | 50 | 1903 | 12 | 2113 | 3/12 | 1670 | 11 |
| brummayerbiere3 | 50 | 1505 | 24 | 1007 | 1/17 | 1569 | 23 |
| calypto | 17 | 2 | 12 | 441 | 11 | 855 | 9 |
| galois | 3 | 0 | 3 | 0 | 3 | 0 | 3 |
| gulwani-pldi08 | 3 | 15 |  | 18 |  | 6 |  |
| pipe | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| rubik | 6 | 271 |  | 92 |  | 64 |  |
| sage:app1 | 50 | 238 |  | 400 |  | 247 |  |
| sage:app12 | 14 | 0 |  | 0 |  | 0 |  |
| sage:app2 | 1 | 0 |  | 0 |  | 0 |  |
| sage:app7 | 6 | 0 |  | 1 |  | 0 |  |
| sage:app8 | 50 | 19 |  | 48 |  | 22 |  |
| sage:app9 | 50 | 19 |  | 51 |  | 22 |  |
| spear:cvs_v1.11.22 | 28 | 44 |  | 131 |  | 32 |  |
| spear:inn_v2.4.3 | 50 | 46 |  | 659 |  | 91 |  |
| spear:openldap_v2.3.35 | 5 | 11 | 1 | 119 | 3 | 1394 | 2 |
| spear:samba_v3.0.24 | 50 | 119 |  | 578 |  | 297 |  |
| spear:wget_v1.10.2 | 41 | 83 |  | 598 |  | 320 |  |
| spear:xinetd_v2.3.14 | 1 | 0 |  | 0 |  | 0 |  |
| spear:zebra_v0.95a | 5 | 4 |  | 28 |  | 6 |  |
| stp | 1 | 16 |  | 25 |  | 84 |  |
| stp_samples | 22 | 3 | 2 | 4 | 2 | 3 | 2 |
| tacas07 | 3 | 256 |  | 312 |  | 150 |  |
| uclid_contrib_smtcomp09 | 7 | 999 |  | 455 | 1 | 447 |  |
| uclid:catchconv | 50 | 47 |  | 132 |  | 114 |  |
| uum | 7 | 31 | 6 | 31 | 6 | 9 | 6 |
| wienand-cav2008:Booth | 5 | 78 | 4 | 90 | 4 | 21 | 4 |
| Sum | 615 | 6132 | 87 | 7639 | 83 | 8002 | 81 |
| Time incl. penalty |  | 49719s |  | 49222s |  | 48583s |  |

Table 3.7: Problems solved by various configurations. Columns are as per Table 3.2.

For some operations it is quick to compute the maximally precise (section 2.10) assignment. We discuss how we implement this in section 4.8. The results are shown in Table 3.8. A higher percentage means more of the possible information was determined. The percentage given for the arithmetic shift is deceptively high because shifting random assignments is easy (we discuss this in section 4.10). The Plaisted and Greenbaum translation [PG86] is a more modern encoding, so it would make a better comparison for TM than comparing it to the Tseitin transformation. However, when measuring the SAT solving time of SMT-LIB bit-vector problems, Järvisalo et al. [JBH11] found only a small difference using each translation.

|  | TM Encoding |  |  |  |  | Tseitin Encoding |  |  |  |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | :---: |
| operation | clauses | time | extra | $\%$ | clauses | time | extra | $\%$ |  |
| signed $\geq$ | 693 | 0.95 | 17296 | 79 | 1340 | 1.87 | 15932 | 73 |  |
| unsigned less than | 681 | 0.97 | 17337 | 80 | 1325 | 1.90 | 15903 | 73 |  |
| equal | 310 | 0.43 | 49994 | 100 | 767 | 0.82 | 49826 | 100 |  |
| bit-vector xor | 384 | 0.98 | 2398227 | 100 | 704 | 1.45 | 2400319 | 100 |  |
| bit-vector or | 320 | 0.87 | 2799457 | 100 | 320 | 0.91 | 2797177 | 100 |  |
| bit-vector and | 320 | 0.75 | 2800824 | 100 | 320 | 0.70 | 2799984 | 100 |  |
| arithmetic shift | 2114 | 1.44 | 3249159 | 100 | 4289 | 2.85 | 3249174 | 100 |  |
| addition | 1011 | 1.78 | 1136975 | 67 | 2204 | 3.28 | 1138400 | 67 |  |
| subtraction | 1011 | 1.76 | 1139723 | 67 | 2204 | 2.92 | 1140825 | 67 |  |
| multiplication | 34350 | 20.45 | 148453 | - | 71504 | 81.64 | 148273 | - |  |
| unsigned division | 63738 | 117.99 | 3038078 | - | 166091 | 447.68 | 3038667 | - |  |
| unsigned remainder | 64074 | 124.02 | 757930 | - | 167429 | 456.71 | 719392 | - |  |
| signed division | 65624 | 62.05 | 1163098 | - | 170048 | 248.40 | 1065681 | - |  |
| signed remainder | 65761 | 61.96 | 211400 | - | 171377 | 232.60 | 164152 | - |  |

Table 3.8: A comparison of operations encoded via ABC's Tseitin and TM transformations. 100,000 iterations were run on a single core of an Intel Q8400 Linux computer. $50 \%$ of variables have a known assignment initially. 'clauses' is how many clauses the encoding contains (including an extra equality). 'time' is the sum of the time in seconds to perform unit propagations. 'extra' is the total number of extra assignments that unit propagation determined. '$\%$' is the percentage of the maximum possible number of assignments that were discovered, that is, the percentage of assignments discovered versus the maximally precise propagator. The $\%$ is not shown for operators for which it is too expensive to calculate the maximally precise result.

In this section we do not measure the percentage of unsatisfiable assignments that are detected via unit propagation. This is another useful measure of an encoding's propagation strength.

The results show why the TM encoding is better than the Tseitin encoding. The TM encoding of the operations never has more clauses than the Tseitin encoding. Applying unit propagation to the TM encoding is sometimes much faster. For instance, unit propagation on the multiplication TM encoding is four times faster than applying it to the Tseitin encoding. The result of applying unit propagation to the TM encoding does not assign fewer variables. The TM encoding has fewer clauses, unit propagation completes faster on it, and more variables' assignments are deduced for the comparison operations.


### 3.22 A Simple Fast Solver (4Simp)

We have already turned off single simplifications and measured the effect of this (section 3.20). In this section, we complement the prior sections by measuring STP2 with just a few simplifications enabled. We answer the question: "Which simplifications make a simple, fast bit-vector solver?". We show that just a few simplifications are the most important.

We start with STP2 r1654 with all simplifications disabled. Based on the prior results, TM was the single most important simplification, so we enable just that. The workflow of STP2 in this configuration is simple. Problems are parsed, structurally hashed, bit-blasted to AIGs, then encoded via TM to CNF. The results are shown in Table 3.2; with this configuration 147 benchmarks failed.

By trial and error we determined that turning on variable elimination and creation-time simplifications solved many of the uclid:catchconv family, all 50 of which failed with just TM enabled. Because it is easy to run the SatELite-style simplification, and the prior sections showed it helped, we enabled that too. We call this solver 4Simp; it is STP2 with only creation-time simplifications, variable elimination, Precosat's SatELite-style simplification, and TM. It solves one more problem than the default STP2 configuration, and 27 more than Z3 3.2. Enabling some other simplifications gives even better performance, but our intention is to show that a few simplifications are enough to build a competitive bit-vector solver. Of course, on different tests, different simplifications might help.

STP2 performs worse than 4Simp on the problems we have evaluated with. This is because STP2 has been tuned to solve problems from a different test set: those provided over time by users of STP. For the problems we evaluated with in this section, 4Simp is a better choice of simplifications.

We compared 4Simp using the variable elimination algorithm that we described in section 3.4 versus 4Simp with a simpler variable elimination algorithm which only eliminates variables which are asserted to equal terms at the top level. For the SMT-COMP 2012 QF_BV division problems, the simpler variable elimination algorithm solved 189 problems, while the algorithm we describe solved 188.

### 3.23 Related Work

Bit-vector solvers have a rich history inspired initially by problems from electronics design automation. At SMT-COMP 2011, in the QF_BV division, the five top placed solvers were bit-blasting based solvers. Eager SAT based approaches are currently dominant at solving bit-vector problems derived from software. In this section, we focus mostly on differences between eager bit-vector solvers. In subsection 3.23.7, we discuss alternative approaches.

Tracy Larrabee [Lar90] bit-blasted circuits and used a SAT solver to discover test cases for hardware circuits. However, BDDs, popularised by Bryant [Bry86], dominated hardware verification problems until Biere et al. [BCCZ99] showed good results with a SAT approach.

Many variants of BDDs were tried, like \*BMDs [Ard96], which have linear rather than BDDs' exponential memory growth as the bit-width of multiplication grows. The size of a BDD, and so the memory used, depends on its variable ordering; if a good variable ordering can be found, then for some problems BDDs are still faster than SAT [SD11].

Many theorem provers, such as ACL2, have bit-vector libraries [Rus99] to allow bit-vector reasoning inside the prover. To increase automation (theorem provers often require human intervention to direct the search for a proof), libraries have been implemented which can discharge theorems by bit-blasting to SAT [Fox11]. Proofs of unsatisfiability which are generated by some bit-vector solvers, for instance Z3 [dMB08b], but not by STP2, can be automatically checked by the theorem prover to increase the confidence that the result is correct [BFSW11]. Böhme et al. [BFSW11] compare a bit-blasting algorithm [Fox11] for HOL4 versus that of Z3, and find that Z3 was substantially faster at solving QF_ABV problems.

In another context, Huang [Hua08] has used bit-blasting to solve finite domain constraint programming problems.

Equivalence checking problems ask whether, for the same inputs, the output of two circuits can differ. Bit-blasting has been used in part to solve equivalence checking problems in hardware [KJJP09], and software [Smi11].

Little work has focused on solving problems in the quantified theory of bit-vectors. John and Chakraborty [JC11] perform quantifier elimination before solving with STP. Wintersteiger et al. [WHdM10] apply a simplification phase to quantified problems before instantiating quantifiers.

Fragments of QF_BV are decidable in polynomial time. For instance, Cyrluk et al. [CMR97] give an algorithm for solving problems with bit-vector variables, bit-vector constants, concatenation, extraction and equality in polynomial time in the number of equalities.

Our results (section 3.19) showed that the virtual best solver solved 27 more problems than the default STP2 configuration. Adjusting the solver approach to fit with particular problems is clearly advantageous. De Moura and Passmore (unpublished [dMP12]) refer to the "strategy challenge" and advocate giving users the ability to exert control over the heuristics used to solve problems, that is, giving informed users the ability to specify the simplifications and solving techniques that are applied to a particular problem.

Portfolio solving runs different, or differently configured, solvers in parallel to solve a problem. The approach is successful for SAT solving [XHHLB08]. The difference between the virtual best solver and the default configuration of STP2 shows that a similar approach might be useful for bit-vector solving.

### 3.23.1 Spear

Domagoj Babić [Bab08] describes the Spear solver which won the SMT-COMP 2007 QF_BV division. Spear is an eager bit-vector solver. It rewrites the input expression, then encodes to CNF using encodings of operations that are optimised to have few logical operations. Spear then simplifies the CNF before SAT solving.

A major novelty of Spear is the automatic parameter tuning of its SAT solver. We also used the ParamILS tool as described in section 3.18. Automatically tuning the SAT parameters reduced the average runtime of Spear on some problems from 780 seconds to 1.5 seconds.

Other potentially significant differences to STP2 are the use of a Guild divisor (a type of division circuit) and the conversion of division by constants into multiplications.

### 3.23.2 MathSAT

Roberto Bruttomesso [Bru08] described a bit-vector solver that is incorporated into the lazy MathSAT solver. MathSAT calculates the boolean abstraction (introduced in section 3.13) of bit-vector problems, and then uses a layered approach to solving. Each potentially spurious model is checked by an equality with uninterpreted functions solver before bit-vector solving occurs.

MathSAT uses an uninterpreted function solver to look for obviously inconsistent sets of assignments. For example, $((1 \times 2)=4) \wedge((1 \times 2)=3)$ is inconsistent because of functional congruence, irrespective of the interpretation of the multiplication operation.

Bruttomesso gives rules for the unconstrained simplification (section 3.15) that we followed with STP2.

### 3.23.3 UCLID

UCLID as described in Bryant et al. [BKO$^{+}$07] uses under- and over-approximation of bit-vector problems. The problem is first under-approximated (i.e. replaced by one that entails it) and solved, and if the under-approximation is satisfiable, the procedure has completed. If it is unsatisfiable, the unsatisfiable core is examined and used to produce an over-approximation of the input.

To produce the under-approximation, the topmost bits of variables are constrained to all be equal. For instance, given a variable $\langle v[3], v[2], v[1], v[0]\rangle$, the bits $v[3]$ through $v[1]$ might be constrained to all have the same value. Each time an under-approximation is produced, the number of topmost bits thus constrained is reduced.

The over-approximation is produced by replacing each Boolean node with a fresh unconstrained variable, whenever the node does not appear in the under-approximation's unsatisfiable core.

During its abstraction phase, UCLID replaces hard operations with partial implementations. For instance, the multiplication of more than 4 bits is replaced by the partial implementation $\operatorname{ite}(x=0 \vee y=0,\ 0,\ \operatorname{ite}(x=1,\ y,\ \operatorname{ite}(y=1,\ x,\ \operatorname{mul}(x, y))))$, where $\operatorname{mul}$ is an uninterpreted function symbol.
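The partial implementation of multiplication can be sketched in Python as below. This is only an illustration of the nested ite expression, not UCLID's code: the name `partial_mul` and the callable `mul`, which stands in for the uninterpreted function symbol, are our own assumptions.

```python
def partial_mul(x, y, mul):
    """Partial implementation of x * y: exact on the easy cases, and
    deferring to the uninterpreted symbol `mul` on every other input."""
    if x == 0 or y == 0:
        return 0
    if x == 1:
        return y
    if y == 1:
        return x
    # Hard case: nothing is assumed about `mul` beyond it being a function.
    return mul(x, y)


def arbitrary_mul(x, y):
    """One arbitrary interpretation of the uninterpreted symbol."""
    return 7
```

Whatever interpretation is chosen for `mul`, the result agrees with true multiplication whenever either operand is 0 or 1, which is what makes the abstraction an over-approximation of the original operation.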

### 3.23.4 Boolector

Robert Brummayer [Bru09] describes the open-source Boolector eager bit-vector and array solver.

Brummayer and Biere [BB09] describe an under-approximation technique to speed up solving unsatisfiable formulae. This is a more sophisticated implementation of the idea of Bryant et al. [BKO$^{+}$07]. Extra constraints are encoded with the problem so that it is over-constrained. The bottom $n$ bits of a bit-vector have no additional constraints, while the topmost $m$ bits are constrained. The original formula is translated to CNF, with extra constraints.

For instance, to constrain the topmost two bits of $t^{[8]}$ to the sign extension, a new variable $e$ is added to the CNF. Then the additional clauses $(e \rightarrow(t[7]=t[5])) \wedge(e \rightarrow(t[6]=t[5]))$ are asserted to the SAT solver, and $e$ is assumed. If the SAT solver reports that the problem is satisfiable, then the process is finished. However, if it is unsatisfiable, the actual models might have been erroneously removed, so it is necessary to assert $\neg e$. Assumptions allow constraints to be removed from the SAT solver while keeping some of the learnt conflict clauses. To avoid too many refinement loops, the effective bit-width $n$ is usually doubled with each refinement. If the SAT solver yields "unsatisfiable", and none of the assumptions were used to derive the contradiction, then it is no longer necessary to search.

### 3.23.5 Z3

The Z3 solver [dMB11], which recently had its source code published, is the most widely used and capable SMT solver. Z3 supports the combination of many different theories, as well as features that STP2 does not implement, such as producing reasons for unsatisfiability and interpolants. There is limited published information about Z3's bit-vector implementation. In a mailing list, de Moura [dM11] describes that QF_BV solving in Z3 version 3 is based on preprocessing, then bit-blasting.

### 3.23.6 Beaver

Beaver [LS10] is an eager QF_BV solver. It performs rewrites followed by conversions to AIGs then via the ABC tool's TM to CNF. Beaver pre-computes AIG templates for expensive operations (multiplication, addition, division and remainder). When
these operations are bit-blasted to AIGs, these pre-simplified templates are instantiated. For instance, at design-time multiplication is encoded from Verilog into an AIG, then the ABC package's AIG optimisations are used to simplify the encoding. Performing this simplification at runtime would be too expensive. The technical report [LS10] compares the solving time using these optimised versus unoptimised templates, showing that the optimised templates are helpful. The optimised templates have two advantages: first, bit-blasting time is lowered because the templates are cheap to instantiate. Second, the templates can be carefully optimised off-line. We have not measured if this is also useful for STP2.

Another novelty of Beaver is that it replaces modulus, remainder and division operations by multiplication. Excluding consideration of division by zero, $(a \div_{u} b)$ is replaced by $q$, with the additional constraint $a=q b+r \wedge r<b$, constrained at the top level. The addition and multiplication that are introduced are specially constrained to avoid overflow. The authors justify the rewriting of division, modulus and remainder to multiplication as being useful because division generates larger circuits as compared to multiplication. In Table 3.8, we showed that the multiplication operation's encoding has about half the number of clauses compared to division. We experimented with converting unsigned division to multiplication, but did not get a speedup.
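To see why the overflow constraints matter, the following brute-force sketch enumerates every pair $(q, r)$ that satisfies $a = qb + r \wedge r < b$ at a small bit-width. It is an illustration only, not Beaver's encoding; the function name `division_witnesses` and its `forbid_overflow` flag are hypothetical.

```python
def division_witnesses(a, b, width, forbid_overflow=True):
    """All (q, r) with a = q*b + r and r < b, for b != 0.

    With forbid_overflow the equation is checked over the integers; since
    a < 2**width, equality then implies neither q*b nor q*b + r overflowed.
    Without it the equation is checked modulo 2**width, as plain
    bit-vector arithmetic would do.
    """
    mask = (1 << width) - 1
    found = []
    for q in range(1 << width):
        for r in range(b):  # enforces r < b
            exact = q * b + r
            value = exact if forbid_overflow else exact & mask
            if value == a:
                found.append((q, r))
    return found
```

With overflow forbidden, the only witness is the true quotient and remainder. Without that restriction, spurious witnesses appear: at 4 bits, $11 \times 3 = 33 \equiv 1 \pmod{16}$, so $q = 11, r = 0$ would wrongly be accepted for $1 \div_{u} 3$.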

When solving SMT-LIB problems, of the SAT solvers Limaye and Seshia experimented with, they found the non-clausal SAT Solver NFLSAT [JC09], which can read AIG input, the fastest.

Jha et al. [JLS09] also compare the ABC tool's implementation of TM versus the Tseitin encoding. The scatter plots they give show that TM is faster, but they do not quantify the difference. They found TM to significantly speed up the spear families. STP2 r1654 with TM solves 180 spear problems in 460 seconds, whereas with Tseitin it takes 310 seconds. We saw the largest difference in the Bruttomesso families, where the use of TM led to solving 36 more problems.

### 3.23.7 Other Approaches

We now discuss a few approaches other than the eager encoding to solve bit-vector problems.

Lazy SMT. The traditional SMT approach is the lazy approach [Seb07]. It follows the ideas of the Nelson-Oppen combination method, and abstracts the problem into a Boolean abstraction, on which the SAT solver produces candidate assignments that theory solvers check for consistency. The lazy SMT approach is good for combining theories, and for dealing with problems that have a large or infinite eager CNF encoding.

Integer Linear Programming. Zeng et al. [ZKC01] convert bit-vector problems to integer linear problems.

## Example 3.26

To linearise the logical and operation, where $(a=(b \wedge c))$, Zeng et al. [ZKC01] encode using integer variables that are either 0 or 1 , and then assert: $a_{i} \leq b_{i}, a_{i} \leq c_{i}$, and $a_{i} \geq b_{i}+c_{i}-1$.
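The three inequalities can be checked exhaustively for a single bit position. The sketch below is our own illustration (the function names are hypothetical): for each choice of $b_i$ and $c_i$ it lists the values of $a_i \in \{0, 1\}$ that the inequalities admit.

```python
def satisfies_linearisation(a, b, c):
    """The three linear inequalities asserted for one bit position."""
    return a <= b and a <= c and a >= b + c - 1


def linearised_and(b, c):
    """Values of a in {0, 1} admitted by the inequalities for given b, c."""
    return [a for a in (0, 1) if satisfies_linearisation(a, b, c)]
```

For every combination of inputs exactly one value of $a_i$ survives, and it is the conjunction of $b_i$ and $c_i$.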

Achterberg [Ach07] linearises bit-vector problems and solves them using a standard linear solver. He does not linearise all bit-vector operations, reporting that linearisation of the shift-left constraint on a 64-bit input requires 30944 inequalities, and 20929 new variables; too many to be practical. Bruttomesso [Bru08] also investigated linearisation for solving bit-vector problems.

Propagators. Bardin et al. [BHP10] built propagators for two domains for each bit-vector operation. One domain is for constant bits, the same domain we investigate in chapter 4. The other is sets of unsigned intervals. Information is kept updated between the two domains. Using a worklist, all the propagators are run until a fixed point is reached, and then search occurs. An advantage of their approach is that the encoding size does not increase quadratically with the bit-width of the expressions. Their solver slows down a little as the size of problems is changed from 64 to 512 bits, but less drastically than the bit-blasting solvers slow down. Their propagators are not able to explain, in the sense of lazy clause generation [OSC09], why a conflict occurred. So, unlike SAT based approaches, there is no conflict driven clause learning to prune the search space.

### 3.23.8 Bit-Width Reduction

Bit-width reduction produces an equisatisfiable problem where the expressions have fewer bits.

These reductions can be performed in the presence of bitwise operations like "and", "or", and exclusive-or which operate uniformly on all the bits. That is, each bit $i$ of the output is a function just of bit $i$ of each input. Care needs to be taken to ensure that the reduced bit-width is large enough that the equalities and disequalities between terms, including those that follow transitively, can still be satisfied. Johannsen and Drechsler [JD01] reduce the encoding of a hardware verification problem by $70 \%$ by applying the simplification upfront. This is similar to the decision procedure of Cyrluk et al. [CMR97].

## Example 3.27

Consider the expression $\left(y^{[32]}[15,0]=y^{[32]}[31,16]\right)$, where these are the only occurrences of $y$. An equisatisfiable expression with a fresh 2-bit variable is: $\left(v^{[2]}[1,1]=v^{[2]}[0,0]\right)$. To convert a model of $v^{[2]}$ into a model of $y^{[32]}$, let $y^{[32]}=\left(0^{[15]}:: v^{[2]}[0,0]:: 0^{[15]}:: v^{[2]}[1,1]\right)$.

We do not implement this simplification because the arithmetic operations commonly contained in software verification problems, such as addition and multiplication, do not operate uniformly on the bit-vector operands.

### 3.23.9 Peephole Optimisation

A peephole optimiser is a rewrite system in a compiler that replaces sequences of instructions with other equivalent, but better sequences of instructions. It is called a peephole optimisation because the replacement is made locally, while looking at a small piece of the program.

Sorav Bansal [Ban08] automatically derives peephole rules. The idea is to enumerate instruction sequences, then run those sequences on a few inputs. The result of the instructions is used to build a hash value, and the instruction sequence is stored in a hash table. When two instruction sequences with the same fingerprint are found, both sequences are run on extra inputs. If the output of each sequence is the same, then the sequences are encoded to SAT and checked for equivalence.
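The fingerprinting scheme can be sketched as follows. This is a toy illustration rather than Bansal's implementation: the candidate "instruction sequences" are small Python functions on 8-bit values, the probe inputs are chosen arbitrarily, and an exhaustive comparison stands in for the SAT-based equivalence check. All names here are hypothetical.

```python
from collections import defaultdict

WIDTH = 8
MASK = (1 << WIDTH) - 1

# Toy stand-ins for enumerated instruction sequences.
CANDIDATES = {
    "x + x": lambda x: (x + x) & MASK,
    "x << 1": lambda x: (x << 1) & MASK,
    "x * 2": lambda x: (x * 2) & MASK,
    "x & x": lambda x: x & x,
    "x | 0": lambda x: x | 0,
    "x ^ 1": lambda x: x ^ 1,
}


def fingerprint(f, probes=(0, 1, 0x55, 0xFF)):
    """Cheap hashable summary: the outputs on a few probe inputs."""
    return tuple(f(p) for p in probes)


def equivalent(f, g):
    """Exhaustive check, standing in for the SAT equivalence check."""
    return all(f(x) == g(x) for x in range(1 << WIDTH))


def equivalence_classes(candidates):
    """Bucket candidates by fingerprint, then confirm within each bucket."""
    buckets = defaultdict(list)
    for name, f in candidates.items():
        buckets[fingerprint(f)].append(name)
    classes = []
    for names in buckets.values():
        remaining = list(names)
        while remaining:
            rep = remaining.pop(0)
            same = [n for n in remaining
                    if equivalent(candidates[rep], candidates[n])]
            remaining = [n for n in remaining if n not in same]
            if same:
                classes.append([rep] + same)
    return classes
```

Only sequences that share a fingerprint are ever compared with the expensive check, which is what keeps the enumeration tractable.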

If they are equivalent, then a rewrite rule is produced which replaces the inferior sequence with the better one. We applied a similar idea when we generated equivalences (subsection 3.11.1).

The principal differences between finding rewrite rules for peephole optimisers and for bit-vectors are: the rules for bit-vectors should apply to all bit-widths; bit-vectors have a single output whereas machine instructions change processor flags, registers and memory; and instruction sequences have irrelevant instructions intermixed, that is, the data dependency is not clear.

### 3.24 Conclusion

Research into QF_BV solvers aims to produce correct and faster solvers. We have described the architecture and simplifications that we developed to make STP2 and 4Simp, competitive modern bit-vector solvers.

This chapter contains descriptions of some novel approaches. In particular:

- The variable elimination algorithm (section 3.4) is a principled approach to isolating variables.
- The bit-blasting equivalence checking (section 3.8) transfers information derived by the AIGs back to the bit-vector theory-level.
- The approach to discovering equivalences (subsection 3.11.1) gives a way for authors of bit-vector solvers to discover equivalences that might be useful.

We showed how to automatically generate bit-vector equivalences by comparing bit-vector terms on a range of bit-widths. We found it most useful to use the equivalences that were discovered to check whether there were extra rules that we could include into STP2 and 4Simp. We found that too few of the rules matched larger instances, making it impractical to apply them.

We have explored the effect of simplifications on solving bit-vector problems. Of the simplifications we discussed, on the benchmarks we examined, the TM approach ([EMS07]) made the most dramatic improvement. We showed that applying it, along with creation-time simplifications, variable elimination, and SatELite preprocessing was enough to achieve a simple and competitive bit-vector solver.

In the next chapter we examine another transformation.

## Chapter 4. Theory-Level Bit Propagation

### 4.1 Introduction

In this chapter we investigate whether it is useful for STP2 to have a simplification phase which calculates, at the theory-level, which bits must be true or false. STP2 was described in chapter 3.

We consider the case where argument or result values are partially known, that is, some bit values are known. Reasoning at this level has the potential to expose bit relationships that will be much harder to identify after the high level structure has been "lost in translation", that is, after a problem has been encoded in CNF. Our aim here is to explore whether the approach scales well enough to be useful for larger realistic examples.

It is important to understand that bit propagation, as we use the term, deals with multi-way information flow. In compiler theory, "constant propagation" is a "forwards" analysis, in which values of expressions may be deduced from the values of sub-expressions, and bit-vector solvers often incorporate this. With bit propagation we aim not only to deduce bit values of composite expressions from their sub-expressions' known bit values, but also, simultaneously, to deduce bit values of the sub-expressions from what is known about the composite expression's bit values.

Intuitively, for multiplication and allied operations, the relationships amongst result and argument bits are highly complex. However, for the most and least significant bit positions, important relationships can be extracted with relatively little effort, and this is often sufficient to enable constraint simplification or improved implicativity of generated clauses.

Our idea is as follows. Before encoding into CNF (as presented to a SAT solver), we apply inexpensive propagators that deduce some of the bit values that the input, output and intermediate values must take for the generated clauses to be satisfiable. We use the information that these propagators establish to simplify expressions before encoding, that is, to replace sub-expressions and variables with known values.

We aim to simplify problems expressed in the QF_BV language, a quantifier-free theory of fixed-width bit-vectors. Let us briefly recall (from section 2.3): we use $t^{[n]}$ to represent a bit-vector expression $t$ of bit-width $n$, where $n>0$. We use square brackets for the extract operation, which extracts a single bit or a sequence of bits; $t[0]$ is the least significant (or rightmost) bit. Unsigned arithmetic operations interpret the bit-vector as the integer $\sum_{i=0}^{n-1} 2^{i} \times t[i]$. We indicate bit-vector constants as strings of 0s and 1s. For example, the unsigned integer corresponding to $(110)_{2}$ is 6. Multiplication and addition are performed modulo $2^{n}$, so the result may overflow, which significantly complicates the analysis. Unsigned division performs truncating integer division, which never overflows. Signed remainder gives the remainder of signed division with rounding toward zero. Signed modulus gives the remainder of signed division with rounding toward negative infinity.
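The conventions recalled above can be made concrete with a small Python sketch. It is an illustration under our own assumptions: the helper names are hypothetical, bit-vectors are written as strings with the most significant bit first, `srem` and `smod` work on already-decoded signed integers, and division by zero is not handled.

```python
def unsigned(bits):
    """Unsigned value of a bit string written most significant bit first,
    so that bits[-1] is t[0]."""
    n = len(bits)
    return sum((1 << i) * int(bits[n - 1 - i]) for i in range(n))


def signed(value, width):
    """Two's complement reading of an unsigned value of the given width."""
    return value - (1 << width) if value >> (width - 1) else value


def mul(a, b, width):
    """Multiplication modulo 2**width; the exact product may overflow."""
    return (a * b) % (1 << width)


def srem(a, b):
    """Remainder of signed division rounding toward zero (b != 0)."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return a - b * q


def smod(a, b):
    """Remainder of signed division rounding toward negative infinity
    (b != 0); Python's // already rounds that way."""
    return a - b * (a // b)
```

The two remainders differ exactly when the operands have opposite signs and the division is inexact: for $-7$ and $2$, the signed remainder is $-1$ while the signed modulus is $1$.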

For each of the operations in the QF_BV language, we have built propagators that deduce bits' values from operations' inputs and output. We use these deduced bit values to simplify the problem before it is encoded to CNF, while the problem is still at the theory-level. This can identify some simplifications that are harder to find in a CNF encoding.

Another advantage of implementing propagators rather than bit-blasting to CNF is that propagators use less memory per operation. The CNF encoding of some operations, like signed division, is large. For instance, STP2 encodes a 64-bit signed division as about 65,000 clauses (Table 3.8); for each such operation STP2 uses 20 MB of memory, greatly limiting the number of operations that STP2 can handle. A 32-bit signed division is encoded as 17,500 clauses, and 128-bit signed division encoded as 262,000 clauses. The number of clauses and the memory used grow roughly quadratically with the bit-width. The memory use of the propagators we describe in this chapter grows more slowly.

Figure 4.1: The ternary domain $\mathbf{3}$

| $\wedge$ | 0 | 1 | $\star$ |
| :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | $\star$ |
| $\star$ | 0 | $\star$ | $\star$ |

| $\vee$ | 0 | 1 | $\star$ |
| :---: | :---: | :---: | :---: |
| 0 | 0 | 1 | $\star$ |
| 1 | 1 | 1 | 1 |
| $\star$ | $\star$ | 1 | $\star$ |

| $\oplus$ | 0 | 1 | $\star$ |
| :---: | :---: | :---: | :---: |
| 0 | 0 | 1 | $\star$ |
| 1 | 1 | 0 | $\star$ |
| $\star$ | $\star$ | $\star$ | $\star$ |

| $x$ | $\neg x$ |
| :---: | :---: |
| 0 | 1 |
| 1 | 0 |
| $\star$ | $\star$ |

Figure 4.2: Truth tables in a three valued logic for and, or, xor, and negation, as given by Kleene's strong three-valued logic $K_{3}$

To reason about the values that separate bits may take, it is useful to introduce three-valued logic. Let $\mathbf{2}=\{0,1\}$ be the set of classical truth values. Figure 4.1 shows a Hasse diagram for the set of ternary truth values $\mathbf{3}=\{0,1, \star\}$. As can be seen, the ordering $\leq$ of these values is defined by $v \leq v^{\prime}$ iff $\left(v=v^{\prime}\right) \vee\left(v^{\prime}=\star\right)$. That is, $\leq$ is an ordering on information content: 0 and 1 are incomparable, but equally informative elements, whereas $\star$ represents absence of information. We can give the semantics of elements of $\mathbf{3}$ with a function $\gamma:: \mathbf{3} \rightarrow \mathscr{P}(\mathbf{2})$ specifying the set of truth values each element of $\mathbf{3}$ corresponds to: $\gamma(0)=\{0\}, \gamma(1)=\{1\}, \gamma(\star)=\{0,1\}$. Propositional logic's strongest monotone extension to $\mathbf{3}$ is known as Kleene's (strong) 3-valued logic and is used extensively in the fields of program transformation and verification to reason about partial functions. Truth tables for Kleene's logic, often denoted $K_{3}$, are given in Figure 4.2.
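The connectives of $K_{3}$ and the information ordering are small enough to write out directly. The sketch below is ours, not code from STP2; it represents $\star$ by the string `"*"` and the classical values by the integers 0 and 1.

```python
STAR = "*"  # absence of information


def k_not(a):
    return STAR if a == STAR else 1 - a


def k_and(a, b):
    if a == 0 or b == 0:
        return 0          # a known 0 decides the result on its own
    if a == 1 and b == 1:
        return 1
    return STAR


def k_or(a, b):
    # De Morgan's law carries over to Kleene's strong logic.
    return k_not(k_and(k_not(a), k_not(b)))


def k_xor(a, b):
    if STAR in (a, b):
        return STAR       # xor is never decided by one operand alone
    return a ^ b


def leq(v, w):
    """Information ordering: v <= w iff v = w or w is star."""
    return v == w or w == STAR
```

Note the asymmetry between the connectives: a single known operand can decide conjunction and disjunction, but never exclusive-or.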

In the next section we define (ternary) truth assignments, but for now we take the liberty of using ternary bit vectors with the "obvious" meaning. For example, we use $\langle 00 \star 1\rangle$ to denote a set of (classical) bit-vectors of length 4 , containing two vectors $\left\{(0001)_{2},(0011)_{2}\right\}$.
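This "obvious" meaning can be spelled out with a short sketch, in which ternary vectors are written as strings over `0`, `1` and `*` (our own representation, with a hypothetical function name):

```python
from itertools import product


def gamma(tvec):
    """Set of classical bit strings denoted by a ternary bit string."""
    choices = [("0", "1") if c == "*" else (c,) for c in tvec]
    return {"".join(bits) for bits in product(*choices)}
```

A vector with $k$ unknown positions denotes $2^{k}$ classical vectors, so `gamma("00*1")` contains the two vectors listed above.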

The input to our analysis is a formula, a well typed QF_BV expression of propositional type. The satisfiability problem is to find assignments to the variables that make the expression true. If there are no possible assignments, the formula is unsatisfiable, or equivalent to false. Before analysis, syntactically identical subexpressions are shared (structurally hashed), giving a rooted DAG, with a propositional root node. When we perform an analysis using $\mathbf{3}$, we calculate the sets of possible values at every node. We consider the value of the expression to come out the top (the root), and variables and constants to be the leaves at the bottom.

We assume the presence of a map $M$ which aids bit-blasting. $M$ maps QF_BV syntactic expressions to tuples of variables (ranging over $\mathbf{3}$). $M$ maps a propositional expression to a single variable, and a bit-vector expression of bit-width $n$ to a vector of $n$ variables. A 3-valued assignment $\mu$ maps variables to elements of $\mathbf{3}$. Propagation works by updating this assignment $\mu$.

In our analysis we begin by associating the output of each QF_BV node in the input expression with a vector of fresh variables, then set each variable to $\star$. Logical operations are associated with a vector of size one, bit-vector operations with fresh vectors of the appropriate width (see example below).

Because equalities are constraints, they may evaluate to 1 or 0. When it is not clear from context we indicate that an equality must be true with a superscript, $=^{t}$, and similarly that it must be false with $=^{f}$.

Occasionally we shall silently assume the presence of $M$ and $\mu$. For example, for brevity we may write $\langle 10\rangle+\langle 01\rangle=^{t}\langle\star \star\rangle$ for the equation $t_{0}^{[2]}+t_{1}^{[2]}=t_{2}^{[2]}$, in the context of $M$ and $\mu$ defined as

$$
\begin{aligned}
& M\left[t_{0}\right]=\left\langle o_{0}, o_{1}\right\rangle \\
& M\left[t_{1}\right]=\left\langle o_{2}, o_{3}\right\rangle \\
& M\left[t_{0}+t_{1}\right]=\left\langle o_{4}, o_{5}\right\rangle \\
& M\left[t_{2}\right]=\left\langle o_{6}, o_{7}\right\rangle \\
& M\left[t_{0}+t_{1}=t_{2}\right]=\left\langle o_{8}\right\rangle \\
& \mu=\left\{o_{0} \mapsto 1, o_{1} \mapsto 0, o_{2} \mapsto 0, o_{3} \mapsto 1, o_{4} \mapsto 1, o_{5} \mapsto 1, o_{6} \mapsto \star, o_{7} \mapsto \star, o_{8} \mapsto 1\right\}
\end{aligned}
$$

As an example of performing bit propagation, consider the expression $\left(b_{0} \vee b_{1}\right) \wedge\left(v_{0}^{[3]}<_{u}\left(4^{[3]} \times v_{1}^{[3]}\right)\right)$.

We first create a partial assignment of each node to an appropriately sized fresh vector $\left\langle o_{i}, \ldots, o_{j}\right\rangle$ of variables. We map constants directly to vectors of 1 or 0 in $\mathbf{3}$:

$$
\begin{array}{r}
M\left(\left(b_{0} \vee b_{1}\right) \wedge\left(v_{0}^{[3]}<_{u}\left(4^{[3]} \times v_{1}^{[3]}\right)\right)\right)=\left\langle o_{0}\right\rangle \\
M\left(b_{0} \vee b_{1}\right)=\left\langle o_{1}\right\rangle \\
M\left(v_{0}^{[3]}<_{u}\left(4^{[3]} \times v_{1}^{[3]}\right)\right)=\left\langle o_{2}\right\rangle
\end{array}
$$

$$
\begin{aligned}
M\left(4^{[3]} \times v_{1}^{[3]}\right) & =\left\langle o_{5}, o_{4}, o_{3}\right\rangle \\
M\left(b_{1}\right) & =\left\langle o_{9}\right\rangle \\
M\left(b_{0}\right) & =\left\langle o_{10}\right\rangle \\
M\left(4^{[3]}\right) & =\langle 100\rangle \\
M\left(v_{1}^{[3]}\right) & =\left\langle o_{11}, o_{12}, o_{13}\right\rangle \\
M\left(v_{0}^{[3]}\right) & =\left\langle o_{14}, o_{15}, o_{16}\right\rangle
\end{aligned}
$$

Next we propagate bits in any expressions that have operands or results that are known bits.

- $\left(4^{[3]} \times v_{1}^{[3]}\right) \equiv\left\langle o_{5}, o_{4}, o_{3}\right\rangle$ sets $o_{3} \leftarrow 0$ and $o_{4} \leftarrow 0$.

Then, the value of the expression $o_{0}$ is set to 1 (true) in the partial assignment: $o_{0} \leftarrow 1$. Next propagators are applied until a global fixed point is reached:

- $o_{1} \wedge o_{2} \equiv o_{0}$, where $o_{0}=1$, sets $o_{1} \leftarrow 1$ and $o_{2} \leftarrow 1$.
- $\left(M\left(v_{0}^{[3]}\right)<_{u}\left\langle o_{5}, 0,0\right\rangle\right) \equiv\langle 1\rangle$ sets $o_{14} \leftarrow 0$ and $o_{5} \leftarrow 1$.
- $\left(\left(M\left(4^{[3]}\right) \times M\left(v_{1}^{[3]}\right)\right)\right) \equiv\langle 1,0,0\rangle$ sets $o_{13} \leftarrow 1$.

Note that constraint propagation has deduced the value for one bit in each of $v_{0}^{[3]}$ and $v_{1}^{[3]}$, as well as some intermediate values. We can now conjoin these values with the CNF clauses that are sent to the SAT solver. The analysis we perform is not complete, but that is of little concern, since the SAT solver provides completeness.
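The deduced bits can be cross-checked by enumerating every model of the example formula. The sketch below is such a brute-force check, not the propagation algorithm itself, and its function names are hypothetical. A bit is reported as fixed only when it takes the same value in all models.

```python
from itertools import product


def models():
    """All satisfying assignments (b0, b1, v0, v1) of
    (b0 or b1) and (v0 <u (4 * v1)), with 3-bit v0 and v1."""
    for b0, b1, v0, v1 in product((0, 1), (0, 1), range(8), range(8)):
        if (b0 or b1) and v0 < (4 * v1) % 8:
            yield b0, b1, v0, v1


def fixed_bit(values, i):
    """Value of bit i if it is the same in every value, else '*'."""
    seen = {(v >> i) & 1 for v in values}
    return seen.pop() if len(seen) == 1 else "*"


def summary(values, width=3):
    """Ternary view of a set of values, most significant bit first."""
    return [fixed_bit(values, i) for i in reversed(range(width))]
```

Applied to the models of the formula, `summary` gives $\langle 0 \star \star\rangle$ for $v_{0}$ and $\langle\star \star 1\rangle$ for $v_{1}$, which agrees with $o_{14} \leftarrow 0$ and $o_{13} \leftarrow 1$ above; for this example no further bits of the variables could have been fixed.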

Constant propagation is commonly implemented in bit-vector SMT solvers [LS10, BH08] and it has been studied in other contexts (see section 4.11). STP2, used in our evaluation later, performs constant propagation. There are cases where theory-level bit propagation, as introduced in this chapter, can surpass ordinary constant propagation. For example, bit propagation can determine that the formula $\left(1^{[1]}:: x^{[3]}\right)=\left(0^{[1]}:: y^{[3]}\right)$, where "$::$" is concatenation, must evaluate to 0, something constant value propagation cannot.

### 4.2 Preliminaries

We now introduce the set of partial truth assignments as an ordered structure, related to sets of classical truth assignments. Let $\mathcal{A}_{2}=\operatorname{Var} \rightarrow \mathbf{2}$ be the set of 2-valued (classical) truth assignments and let $\mathcal{A}_{3}=(\operatorname{Var} \rightarrow \mathbf{3}) \cup\{\perp\}$ be the set of 3-valued truth assignments, extended with a special element $\perp$. The ordering $\leq$ on $\mathcal{A}_{3}$ is defined as follows: $\mu \leq \mu^{\prime}$ iff $\mu=\perp \vee \mu(v) \leq \mu^{\prime}(v)$ for all $v \in \operatorname{Var}$.

We also define the set of concrete (2-valued) and abstract (3-valued) bit vectors, parameterised by bit-width, as

$$
\begin{aligned}
& \mathcal{V}_{2}^{[n]}=\mathbf{2}^{n} \\
& \mathcal{V}_{3}^{[n]}=\mathbf{3}^{n}
\end{aligned}
$$

where $\mathbf{2}^{n}$ and $\mathbf{3}^{n}$ denote the sets of $n$-tuples of elements of $\mathbf{2}$ and $\mathbf{3}$, respectively. We lift our semantic function for $\mathbf{3}$ to $\gamma:: \mathcal{V}_{3}^{[n]} \rightarrow \mathscr{P}\left(\mathcal{V}_{2}^{[n]}\right)$ in the obvious way:

$$
\gamma\left(\left(x_{1}, \ldots x_{n}\right)_{2}\right)=\gamma\left(x_{1}\right) \times \cdots \times \gamma\left(x_{n}\right)
$$

For example, $\gamma\left((\star 0 \star 1)_{2}\right)=\{0,1\} \times\{0\} \times\{0,1\} \times\{1\}=\left\{(0001)_{2},(0011)_{2},(1001)_{2},(1011)_{2}\right\}$.

We need to reason about relations over bit vectors (we are mostly interested in arithmetic functions, but since our propagators can also propagate information from outputs to inputs, it is most convenient to view $n$-ary functions as $(n+1)$-ary relations). An $n$-ary relation on $m$-bit vectors is a set of $n$-tuples of $m$-bit vectors listing the valid combinations of inputs and outputs. For example, 1-bit addition (as well as exclusive or) is captured by the relation

$$
\left\{\left\langle(0)_{2},(0)_{2},(0)_{2}\right\rangle,\left\langle(0)_{2},(1)_{2},(1)_{2}\right\rangle,\left\langle(1)_{2},(0)_{2},(1)_{2}\right\rangle,\left\langle(1)_{2},(1)_{2},(0)_{2}\right\rangle\right\}
$$

Thus it is convenient to define the sets of concrete and abstract $n$-tuples of $m$-bit vectors, ordered component-wise:

$$
\begin{aligned}
& \mathcal{T}_{2}^{[m]\langle n\rangle}=\left(\mathcal{V}_{2}^{[m]}\right)^{n} \\
& \mathcal{T}_{3}^{[m]\langle n\rangle}=\{\perp\} \cup\left(\mathcal{V}_{3}^{[m]}\right)^{n}
\end{aligned}
$$

Then a concrete relation is a set of tuples:

$$
\mathcal{R}_{2}^{[m]\langle n\rangle}=\mathscr{P}\left(\mathcal{T}_{2}^{[m]\langle n\rangle}\right)
$$

Note that we introduce a bottom element $\perp$ to our set of abstract values to serve as the abstraction for an empty set of concrete tuples. Also note that $\mathcal{V}_{3}^{[m]}$ (and hence $\left(\mathcal{V}_{3}^{[m]}\right)^{n}$) is a join-semilattice, which suffices to give $\mathcal{T}_{3}^{[m]\langle n\rangle}$ the structure of a lattice.

We define the semantics of an abstract tuple by lifting the $\gamma$ function to tuples much as we lifted it to bit vectors. We also define an abstraction function $\alpha$ to give the best abstraction for a set of concrete tuples.

$$
\begin{aligned}
\gamma & :: \mathcal{T}_{3}^{[m]\langle n\rangle} \rightarrow \mathscr{P}\left(\mathcal{T}_{2}^{[m]\langle n\rangle}\right) \\
\alpha & :: \mathscr{P}\left(\mathcal{T}_{2}^{[m]\langle n\rangle}\right) \rightarrow \mathcal{T}_{3}^{[m]\langle n\rangle} \\
\gamma(t) & = \begin{cases}\varnothing & \text { if } t=\perp \\
\gamma\left(x_{1}\right) \times \cdots \times \gamma\left(x_{n}\right) & \text { if } t=\left\langle x_{1}, \ldots x_{n}\right\rangle\end{cases} \\
\alpha(S) & =\bigsqcup S
\end{aligned}
$$

Here $\bigsqcup$ is the least upper bound operator on $\mathcal{T}_{3}^{[m]\langle n\rangle}$. Note that $\mathbf{2} \subseteq \mathbf{3}$, so if $\left\langle x_{1}, \ldots, x_{n}\right\rangle \in \mathcal{T}_{2}^{[m]\langle n\rangle}$ then $\left\langle x_{1}, \ldots, x_{n}\right\rangle \in \mathcal{T}_{3}^{[m]\langle n\rangle}$.

A propagator uses what we know about the behaviour of a function to derive extra information about the function's inputs and outputs from the information supplied. The input to a propagator is a single tuple of abstract values, and the output is the same as the input tuple, but perhaps strengthened. Thus a propagator for an $(n-1)$-ary function on $m$-bit integers has type

$$
\mathcal{P}_{3}^{[m]\langle n\rangle}=\mathcal{T}_{3}^{[m]\langle n\rangle} \rightarrow \mathcal{T}_{3}^{[m]\langle n\rangle}
$$

and is ordered point-wise. Note that a propagator can only ever strengthen its input, so it must be reductive (section 2.10). It also makes no sense for a propagator ever to produce a weaker output from a stronger input, so it is required to be monotone.

Given a concrete relation $R$, we can define the optimal propagator $P_{R}$ formally, like so:

$$
P_{R}(t)=\alpha(R \cap \gamma(t))
$$

Informally, this says that $P_{R}$ is maximally precise. A propagator $p$ for an $n$-ary relation $R$ is sound iff $P_{R} \leq p$.
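The definition $P_{R}(t)=\alpha(R \cap \gamma(t))$ can be executed literally at small bit-widths. The following sketch does so by enumeration; it is a reference model useful for testing, not an efficient propagator, and it takes time exponential in the number of unknown bits. The representation (ternary vectors as strings, the relation as a Python predicate over integers, `None` for $\perp$) and all names are our own assumptions.

```python
from itertools import product


def gamma(tvec):
    """Classical bit strings denoted by one ternary bit string."""
    choices = [("0", "1") if c == "*" else (c,) for c in tvec]
    return {"".join(bits) for bits in product(*choices)}


def join(vectors):
    """Least upper bound of a non-empty collection of equal-length bit
    strings: a position stays fixed only if all vectors agree on it."""
    return "".join(col[0] if len(set(col)) == 1 else "*"
                   for col in zip(*vectors))


def optimal_propagator(relation, t):
    """alpha(R intersect gamma(t)) for a tuple t of ternary bit strings.
    Returns the strengthened tuple, or None for bottom."""
    survivors = [c for c in product(*(sorted(gamma(x)) for x in t))
                 if relation(*(int(bits, 2) for bits in c))]
    if not survivors:
        return None
    return tuple(join(column) for column in zip(*survivors))
```

For instance, with the relation for 2-bit addition, the input $\langle 10\rangle+\langle 01\rangle=\langle\star \star\rangle$ is strengthened to $\langle 10\rangle+\langle 01\rangle=\langle 11\rangle$. A hand-written propagator $p$ can be tested for soundness by checking that it never fixes a bit that this reference model leaves as $\star$.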

All the propagators that we build are sound. Apart from the cases of division, remainder and multiplication, we have not found instances where they are not optimal, although for most operations we have not proved optimality formally. An efficient optimal propagator for multiplication would solve cryptographically important factorisation problems, so the likelihood of discovering such a propagator is low. In section 4.8 we describe how we have tested our propagators.

We have implemented propagation for all QF_BV operations; in this chapter we focus on the more interesting propagators, namely those for bit-wise and, addition, multiplication, and unsigned division.

The propagators we implement allow the result of an expression to partially determine the inputs. Consider $\left(\left(t_{0}^{[2]}\right.\right.$ bvand $\left.\left.t_{1}^{[2]}\right)={ }^{t} t_{2}^{[2]}\right)$, where $\mu=\left\{t_{0}=\langle\star \star\rangle\right.$, $\left.t_{1}=\langle\star 0\rangle, t_{2}=\langle 1 \star\rangle\right\}$. Our propagators use the inputs and outputs to refine the other values, giving additional information about inputs $t_{0}$ and $t_{1}: \mu=\left\{t_{0}=\langle 1 \star\rangle\right.$, $\left.t_{1}=\langle 10\rangle, t_{2}=\langle 10\rangle\right\}$.
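The definition $P_{R}(t)=\alpha(R \cap \gamma(t))$ can be executed directly for small bit-widths by enumerating every concrete tuple. The following is a minimal sketch under that assumption, not the implementation described in this chapter; the names `gamma`, `join`, `optimal_propagate` and `bvand_relation` are hypothetical, and bit-vectors are written most significant bit first, as in the text.

```python
from itertools import product

STAR = "*"  # the unknown value of the three-valued domain


def gamma(vec):
    """All concrete bit tuples described by one abstract bit-vector."""
    return product(*[(0, 1) if b == STAR else (b,) for b in vec])


def join(concrete):
    """Least upper bound of a non-empty list of equal-width bit tuples."""
    return tuple(col[0] if len(set(col)) == 1 else STAR
                 for col in zip(*concrete))


def optimal_propagate(relation, abstract):
    """alpha(R ∩ gamma(t)) by brute force; None stands for bottom."""
    candidates = product(*[list(gamma(v)) for v in abstract])
    survivors = [vals for vals in candidates if relation(*vals)]
    if not survivors:
        return None
    return tuple(join([s[k] for s in survivors])
                 for k in range(len(abstract)))


def bvand_relation(a, b, r):
    """The concrete relation (a bvand b) = r on equal-width bit tuples."""
    return all((p & q) == s for p, q, s in zip(a, b, r))


# The bvand example from the text: t0 = <**>, t1 = <*0>, t2 = <1*>.
refined = optimal_propagate(bvand_relation,
                            [(STAR, STAR), (STAR, 0), (1, STAR)])
```

Enumeration is exponential in the number of unknown bits, so this is only useful as a reference against which a hand-written propagator can be compared on small widths.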

As an example of a propagator consider the equality operation. When propagating from operands to result, the rule is: if both operands' bits are known and pairwise the same, the result is true. If any of the bits are different, the result is false. When propagating from the result to operands, there are two new rules: (1) If the result is fixed to 1 , then any fixed bits of one operand should be the same as the corresponding bits of the other. (2) If the result is fixed to 0 , and there is a single $\star$ value, and all the other bits are fixed to the same values, then that $\star$ value should be fixed to the negation of the value in the same position of the other operand; for example, given $\left(\langle 0 \star 0\rangle={ }^{f}\langle 000\rangle\right)$, the $\star$ value should be fixed to 1 .
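The equality rules above can be sketched as follows. This is an illustrative reading of the rules rather than the code used in our solver; `propagate_eq` is a hypothetical name, operands are lists of `0`, `1` or `"*"`, and `None` stands for $\perp$.

```python
STAR = "*"


def propagate_eq(a, b, res):
    """Refine (a = b) with result bit res; returns (a, b, res) or None."""
    a, b = list(a), list(b)
    known = [p != STAR and q != STAR for p, q in zip(a, b)]
    differ = any(k and p != q for k, p, q in zip(known, a, b))

    # Operands to result: a known difference forces 0, full agreement forces 1.
    if differ or all(known):
        forced = 0 if differ else 1
        if res != STAR and res != forced:
            return None
        res = forced

    if res == 1:
        # Rule (1): copy each fixed bit to the other operand.
        for k in range(len(a)):
            if a[k] == STAR:
                a[k] = b[k]
            elif b[k] == STAR:
                b[k] = a[k]
    elif res == 0 and not differ:
        # Rule (2): exactly one undecided position, all other positions agree.
        open_positions = [k for k, ok in enumerate(known) if not ok]
        if len(open_positions) == 1:
            k = open_positions[0]
            if a[k] == STAR and b[k] != STAR:
                a[k] = 1 - b[k]
            elif b[k] == STAR and a[k] != STAR:
                b[k] = 1 - a[k]
    return a, b, res
```

For the example in the text, `propagate_eq([0, "*", 0], [0, 0, 0], 0)` fixes the $\star$ to 1.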

If a propagator discovers an inconsistent assignment, it will set the partial assignment to empty $(\mu=\perp)$ which halts propagation. For instance, the partial assignment will be set to $\perp$ when processing the sub-expression: $\langle 0\rangle+\langle 1\rangle={ }^{t}\langle 0\rangle$, which means that the entire expression, not just that sub-expression, is unsatisfiable.

For convenience, in the rest of this chapter we use a shorter notation for the extract operation, using $x_{i}$ to mean $x[i]$.

Later we will discuss in detail propagators for the addition and multiplication operations. However, we start by describing in detail the "bit-vector and" propagator.

| initial | relations | result |
| :---: | :---: | :---: |
| $0 \wedge \star=\star$ | $0 \wedge 1=0,0 \wedge 0=0$ | $0 \wedge \star=0$ |
| $1 \wedge 1=\star$ | $1 \wedge 1=1$ | $1 \wedge 1=1$ |
| $0 \wedge 1=\star$ | $0 \wedge 1=0$ | $0 \wedge 1=0$ |
| $0 \wedge 0=\star$ | $0 \wedge 0=0$ | $0 \wedge 0=0$ |
| $\star \wedge \star=1$ | $1 \wedge 1=1$ | $1 \wedge 1=1$ |
| $1 \wedge \star=1$ | $1 \wedge 1=1$ | $1 \wedge 1=1$ |
| $0 \wedge \star=1$ | $\{\}$ | $\perp$ |
| $0 \wedge 1=1$ | $\{\}$ | $\perp$ |
| $0 \wedge 0=1$ | $\{\}$ | $\perp$ |
| $1 \wedge \star=0$ | $1 \wedge 0=0$ | $1 \wedge 0=0$ |
| $1 \wedge 1=0$ | $\{\}$ | $\perp$ |

Table 4.1: Given the input on the left hand side, the result is the rightmost column. We give only the rules that cause a change. Only some of the 27 possible permutations are shown.

### 4.3 A "Bit-Vector And" Propagator

In this section we take the simple "bit-vector and" (bvand) operation and define its propagator formally. Later in the chapter we focus on more difficult operations which we present less formally.

A bvand of bit-width $n$ takes the bit-wise logical 'and' of two $n$-bit operands $\left(v_{0}^{[n]}, v_{1}^{[n]}\right)$ and gives one result $\left(v_{2}^{[n]}\right)$. Let $D$ be a mapping from each variable's bits to a value in $\mathbf{3}$.

The bvand operation is bit-wise: the value at position $k$ of one bit-vector depends only on the values at position $k$ of the other bit-vectors. So it is enough to show the properties for just a single bit.

The operation is commutative, so we give only some permutations in Table 4.1. The 'initial' column contains the initial assignments to variables, and the 'relations' column gives the valid assignments that, when joined, produce the 'result'.

The initial state summarises a set of relations. We show some of the relations that are summarised in the 'relations' column. Taking the least upper bound (join) of the relations gives the result. For every $\star$ value in the result, there exist at least two relations in the set that have a 1 and a 0, respectively, in the same position as the $\star$.
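For a single bit, the join of the surviving relations can also be written as a handful of direct rules. The sketch below is one such formulation, derived here from Table 4.1 rather than taken from our implementation; `and_bit` is a hypothetical name and `None` stands for $\perp$.

```python
STAR = "*"


def and_bit(a, b, r):
    """Refine one column of (a AND b) = r; returns (a, b, r) or None."""
    # Operands to result: a 0 operand forces 0, two 1 operands force 1.
    if a == 0 or b == 0:
        forced = 0
    elif a == 1 and b == 1:
        forced = 1
    else:
        forced = STAR
    if forced != STAR:
        if r != STAR and r != forced:
            return None  # no relation is consistent with the initial state
        r = forced

    # Result to operands.
    if r == 1:
        a, b = 1, 1
    elif r == 0:
        if a == 1 and b == STAR:
            b = 0
        elif b == 1 and a == STAR:
            a = 0
    return a, b, r
```

Each row of Table 4.1 can be checked by calling `and_bit` with the 'initial' values and comparing against the 'result' column.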

### 4.4 Some Useful Propagators

For convenience, we implement the right shift propagator by reversing the first operand and the result, then using the left shift propagator, then reversing the first operand and the result again. This requires a simple optimal reverse propagator. This reverse propagator swaps each bit at position $i$ with the bit at position $n-1-i$, where $n$ is the bit-width and positions are numbered from 0. Because of the extra reversing steps involved, our right shift propagator runs slightly slower than our left shift propagator.

If operands to a propagator are the same, then extra information may be determined. Consider the expression $(t+t)$. A propagator that recognises that the two operands are the same can determine that the result must be even. Our propagators do not consider such aliasing. However, in most cases, instances that would benefit from aliasing have already been removed during normalisation, for example, replacing $(t+t)$ by $(2 \times t)$.

### 4.4.1 An Addition Propagator

Our addition propagator performs an interval analysis to bound, for each column, the minimum and maximum number of addends that may be 1. One of our multiplication propagators (presented in section 4.4.2) also uses this same approach. We use intervals because they allow simple reasoning about the full adder's majority and parity functions.

Let $x_{i}, y_{i}$, and $c_{i}$ be the single bit addends of column $i$. The $c$ variables are created by the addition propagator, and are not used outside it. We call $c_{i}$ the carry-in to the column and $c_{i+1}$ the carry-out. $c_{0}$ is set to $\langle 0\rangle$ whereas $c_{n}$, where $n$ is the bit-width, is ignored. The result bit is $r_{i}=x_{i} \oplus y_{i} \oplus c_{i}$. The carry-out is the majority function: $c_{i+1}=\left(x_{i} \wedge c_{i}\right) \vee\left(y_{i} \wedge c_{i}\right) \vee\left(x_{i} \wedge y_{i}\right)$.

The details of the operation are shown in Algorithm 4.1. For each column $i$ we calculate a lower bound $l$ and an upper bound $u$ on the number of elements of $\left\{x_{i}, y_{i}, c_{i}\right\}$ that are 1. The propagator works one column at a time, generally moving from less significant bits towards more significant bits. However, if some carry-in is updated in the process, then propagation moves back to the prior column (if it exists).

We do not have a formal proof that the addition propagator is maximally precise. However, we show in Table 4.5 that it is maximally precise for bit-widths from 1 to 5. The full-adder at bit $i$ depends just on the carry-in from bit $i-1$, the carry-out to bit $i+1$, one bit of each operand, and the resulting bit. Because each full-adder is local and interacts with its immediate neighbours only, it should be possible to use this property to construct an inductive proof of precision. We leave this, however, for future work.

## Example 4.1

Consider applying Algorithm 4.1 to a column $i$ in the context $\mu=\left\{x_{i} \mapsto 1, y_{i} \mapsto\right.$ $\left.\star, c_{i} \mapsto \star, r_{i} \mapsto 1, c_{i+1} \mapsto 1\right\}$.

Initially $l=1$ and $u=3$; next, because $c_{i+1}$ is 1, $l$ is set to 2; and next, because $r_{i}$ is 1 and $l$ is even, $l$ is incremented to 3. Now $l=u$, so in line 25, $\mu\left(y_{i}\right) \leftarrow 1$ and $\mu\left(c_{i}\right) \leftarrow 1$. Finally, because the $c_{i}$ value has changed, propagation moves back to the prior column.

## Example 4.2

For a more complex example, involving both left and right sweeps across the bit-vectors involved, consider $x=\langle 000001 \star \star\rangle, y=\langle 00000 \star 1 \star\rangle$, and $r=\langle\star \star \star \star 111 \star\rangle$. For $i=0,1,2$, the body of Algorithm 4.1's while loop does nothing. For $i=3$, however, the carry-in gets determined. More specifically, we find that $l=u=1$, and consequently $c_{3}$ is determined to be 1. This causes the algorithm to revisit the previous column $(i=2)$, this time finding a larger lower bound $l=3$. But this means all bits in that column are 1, that is, $y_{2}=c_{2}=1$. Similarly, now that the carry-in $c_{2}$ has been determined, attention shifts to column 1, where it is determined that $x_{1}=c_{1}=1$. For column 0, propagation is particularly effective: from the outset we had no knowledge of any of $x_{0}, y_{0}$, and $r_{0}$. The propagation method finds $l=u=2$, which means we must have $x_{0}=y_{0}=1$ and $r_{0}=0$.

After this backwards sweep, the algorithm again proceeds right-to-left, picking up at column 3 , where $c_{4}$ is determined as being 0 . For columns 4 to 7 , the remaining unknown bits are then easily determined: clearly $r_{4}=r_{5}=r_{6}=r_{7}=0$. At this point the algorithm stops, having determined all bits.

```
Algorithm 4.1 Propagating information about the addition relation for an addition
\(x^{[n]}+y^{[n]}=r^{[n]}\).
Require: \(x^{[n]}, y^{[n]}, r^{[n]}\), lists of ternary variables
Require: ones \((i)\) yields number of \(x_{i}, y_{i}, c_{i}\) that are 1
Require: nonzeros \((i)\) yields number of \(x_{i}, y_{i}, c_{i}\) that are 1 or \(\star\)
    Create \(c^{[n+1]}\), a list of ternary variables
    Initialise all \(c\) variables to \(\star\)
    \(c_{0} \leftarrow 0\)
    Create integers \(i, l, u\)
    \(i \leftarrow 0\)
    while \(i<n\) do
        \(l \leftarrow\) ones \((i)\)
        \(u \leftarrow\) nonzeros \((i)\)
        if \(c_{i+1}=1\) then \(l \leftarrow \max (2, l)\) end if
        if \(c_{i+1}=0\) then \(u \leftarrow \min (1, u)\) end if
        if \(r_{i}=1\) and \(l\) is even then increment \(l\) end if
        if \(r_{i}=1\) and \(u\) is even then decrement \(u\) end if
        if \(r_{i}=0\) and \(l\) is odd then increment \(l\) end if
        if \(r_{i}=0\) and \(u\) is odd then decrement \(u\) end if
        if \(l \geq 2\) then \(c_{i+1} \leftarrow 1\) end if
        if \(u \leq 1\) then \(c_{i+1} \leftarrow 0\) end if
        if \(u<l\) then return \(\perp\) end if
        \(c^{\prime} \leftarrow c_{i}\)
        if \(l=u\) then
            \(r_{i} \leftarrow\) the parity of \(l\)
            if ones \((i)=l\) then
                set each addend that is \(\star\) to 0
            end if
            if nonzeros \((i)=l\) then
                set each addend that is \(\star\) to 1
            end if
        end if
        if \(c^{\prime} \neq c_{i} \wedge i>0\) then
            decrement \(i\)
        else
            increment \(i\)
        end if
    end while
```

An equivalent way to reason about the equations is using the truth tables for $\mathbf{3}$ (Figure 4.2). For instance, using the result bit equation, if there is only a single $\star$ value, re-arranging the equation to isolate the $\star$ value variable gives the value it should take.

The algorithm is linear in $n$ : once some column $i$ has been revisited owing to $c_{i+1}$ having been set, it will not be revisited again (except in the course of the overall right-to-left sweep).

The soundness of Algorithm 4.1 follows from each step in the algorithm being correct. Termination is somewhat less clear, because $i$ may grow or shrink in the body of the while loop. However, for successive iteration steps, consider the pairs $(s, n-i)$ of natural numbers, where $s$ is the number of bits in $x, y, r$, and $c$ that are undetermined ($\star$). It is easy to see that each iteration strictly decreases $(s, n-i)$ ordered lexicographically. That is, either some bits are determined (that is, $s$ decreases), or else $s$ remains unchanged, and as a result of the lack of change, $i$ is incremented. Termination follows, since $\mathbb{N}^{2}$, ordered lexicographically, is well-founded.
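Algorithm 4.1 translates almost line for line into executable code. The following sketch is an illustration only, not the implementation evaluated in this thesis. The helper names `parse`, `show` and `propagate_add` are hypothetical, bit-vectors are stored with bit 0 first, and `None` stands for $\perp$. One liberty is taken: the inconsistency test $u<l$ is performed before the carry-out is written, which gives the same outcome because the whole call is abandoned in that case.

```python
STAR = "*"


def parse(bits):
    """'01*' (most significant bit first) -> list indexed from bit 0."""
    return [STAR if ch == STAR else int(ch) for ch in reversed(bits)]


def show(vec):
    """Inverse of parse."""
    return "".join(str(b) for b in reversed(vec))


def propagate_add(x, y, r):
    """Refine x + y = r column by column; returns (x, y, r) or None."""
    n = len(x)
    x, y, r = list(x), list(y), list(r)
    c = [STAR] * (n + 1)  # c[i] is the carry-in of column i
    c[0] = 0
    i = 0
    while i < n:
        addends = (x[i], y[i], c[i])
        ones = sum(1 for a in addends if a == 1)
        nonzeros = sum(1 for a in addends if a != 0)
        l, u = ones, nonzeros
        if c[i + 1] == 1:
            l = max(2, l)
        if c[i + 1] == 0:
            u = min(1, u)
        if r[i] != STAR:  # the column sum must have the parity of r[i]
            if l % 2 != r[i]:
                l += 1
            if u % 2 != r[i]:
                u -= 1
        if u < l:
            return None
        if l >= 2:
            c[i + 1] = 1
        if u <= 1:
            c[i + 1] = 0
        old_carry_in = c[i]
        if l == u:
            r[i] = l % 2
            fill = 0 if ones == l else 1 if nonzeros == l else STAR
            if fill != STAR:
                if x[i] == STAR:
                    x[i] = fill
                if y[i] == STAR:
                    y[i] = fill
                if c[i] == STAR:
                    c[i] = fill
        # Revisit the previous column when this column fixed its carry-in.
        if c[i] != old_carry_in and i > 0:
            i -= 1
        else:
            i += 1
    return x, y, r
```

Running it on Example 4.2, `propagate_add(parse("000001**"), parse("00000*1*"), parse("****111*"))`, determines every bit, in agreement with the trace given above.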

### 4.4.2 Multiplication Propagators

In this section we describe two propagation methods that we combine to produce our multiplication propagation solver. The combination is an efficient, albeit not optimal, propagator. The two complement each other: for each method there are instances where it can fix bits that the other cannot.

The first propagator enforces consistency over the number of trailing zeroes of the operands. The second propagator is a more general version of the addition propagator already described. Instead of performing an exclusive-or over just three addends, we apply it over an arbitrary number of addends.

## Consistency of the Number of Trailing Zeroes

This propagator exploits the fact that, given $x \times y=r$, the sum of trailing zeroes of $x$ and $y$ equals the number of trailing zeroes in $r$. If the sum is greater than, or equal to, the bit-width, then the result will be zero.

As a proof, consider that a non-zero $x$ can be written as $v \times 2^{l}$, and a non-zero $y$ can be written as $w \times 2^{m}$, where $v$ and $w$ are odd, and where $l$ and $m$ are the number of trailing zeroes in the respective operands. Zero can be written as $\left(1 \times 2^{n}\right)$. The result of multiplying the values is $v \times w \times 2^{l+m}$. Because both $v$ and $w$ are odd, the bit in position $m+l$ of the result will be 1 (provided $m+l<n$), and all less significant bits in the result are 0.
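The trailing-zeroes property is easy to confirm exhaustively for small bit-widths. The sketch below is only a sanity check, not part of the propagator; the function names are hypothetical, and zero is treated as having $n$ trailing zeroes.

```python
def trailing_zeroes(value, n):
    """Number of trailing zero bits of an n-bit value (n for zero)."""
    if value == 0:
        return n
    return (value & -value).bit_length() - 1


def trailing_zeroes_add_up(n):
    """Check tz(x * y mod 2^n) == min(n, tz(x) + tz(y)) for all n-bit x, y."""
    mask = (1 << n) - 1
    return all(
        trailing_zeroes((x * y) & mask, n)
        == min(n, trailing_zeroes(x, n) + trailing_zeroes(y, n))
        for x in range(1 << n)
        for y in range(1 << n)
    )
```

For instance, `trailing_zeroes_add_up(4)` enumerates all 256 pairs of 4-bit operands.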

```
Algorithm 4.2 Enforcing a consistent number of trailing zeroes on a multiplication's
\(x\) operand where \(x \times y=r\) and each variable is a vector of variables. For each position
in \(x\) the algorithm checks if that position can be the rightmost 1 value, and if not,
sets it to 0 . We call it twice, once for \(x, y, r\), and once for \(y, x, r\).
Require: \(x^{[n]}, y^{[n]}, r^{[n]}\), lists of ternary variables
    \(t y=\) index of the least significant bit of \(y\) that is known to be 1 (that is,
    \(y_{t y}=1\)). If no bit of \(y\) is known to be 1, then \(t y=n\).
    \(t r=\) index of the least significant bit of \(r\) that is known to be 1. If there is none, then \(t r=n\).
    \(\min =\min (t y, t r, n)\).
    for \(i \in 0 . .(n-1)\) do
        if \(x_{i}=1\) then return
        else if \(x_{i} \neq 0\) then
            for \(j \in 0 . . \min\) do
                if \((j+i \geq n) \vee\left(y_{j} \neq 0 \wedge r_{i+j} \neq 0\right)\) then return
                    end if
            end for
            \(x_{i} \leftarrow 0\)
        end if
    end for
```

Our propagator starts from the least significant bit of an operand and checks if the bit can be 1 ; if it can, we stop. Otherwise, it sets the variable to 0 and continues checking. The method is shown in Algorithm 4.2. We shall show an example of running this propagator shortly.

Setting bits in the result $r$ is performed separately. The number of trailing zeroes in both $x$ and $y$ is summed, and that many trailing bits in $r$ are set to zero.

Let us give some examples of reasoning about the trailing-zeroes, where we assume the constraints are asserted at the top level.

## Example 4.3

Consider the initial constraint $(\langle 111 \star\rangle \times\langle 11 \star \star\rangle=\langle 1000\rangle)$. This gets strengthened to $(\langle 1110\rangle \times\langle 1100\rangle=\langle 1000\rangle)$. The last bit of the first operand cannot be 1, because then the result would have at most two trailing zeroes, so it is set to 0 . Similar reasoning holds for the second operand. Since this is the first example, let us trace in detail how Algorithm 4.2 proceeds:

Consider $x=\langle 111 \star\rangle, y=\langle 11 \star \star\rangle, r=\langle 1000\rangle$. Initially $\min =\min (2,3,4)=2$, and soon we have $i=0$, and $j=0$. The first time line 8 is reached, the test is $(0+0 \geq n) \vee\left(y_{0} \neq 0 \wedge r_{0+0} \neq 0\right)$, which fails. The second time, when $j=1$, the test is $(1+0 \geq n) \vee\left(y_{1} \neq 0 \wedge r_{0+1} \neq 0\right)$, which also fails. Again, the third time, when $j$ has reached min, $(2+0 \geq n) \vee\left(y_{2} \neq 0 \wedge r_{0+2} \neq 0\right)$ fails. Hence $x_{0}$ is set to 0, and attention turns to $x_{1}$. Because $x_{1}=1$, we return (line 5). Note that $x$ is now fully determined.

Next $x$ and $y$ are swapped, and the algorithm is called again. So let $x=\langle 11 \star \star\rangle$, $y=\langle 1110\rangle, r=\langle 1000\rangle$. Initially $\min =1$, and soon we have $i=0, j=0$. The first time line 8 is reached, the test is $(0+0 \geq n) \vee\left(y_{0} \neq 0 \wedge r_{0+0} \neq 0\right)$, which fails (as $\left.y_{0}=0\right)$. The second time, when $j=1$, the test is $(1+0 \geq n) \vee\left(y_{1} \neq 0 \wedge r_{0+1} \neq 0\right)$, and again this fails (as $r_{1}=0$ ). Hence $x_{0}$ is set to 0 , and attention turns to $x_{1}$. We have $x_{1}=\star$, and again the test at line 8 fails repeatedly, first because $y_{0}=0$, and next because $r_{2}=0$. So $x_{1}$ is also set to 0 , and so all bits have been determined.

## Example 4.4

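Consider $(\langle 1 \star\rangle \times\langle\star \star\rangle=\langle 00\rangle)$. Algorithm 4.2 is called twice. In the first call no bit of the second operand is known to be 1 and the result contains no 1, so $ty$ and $tr$ are both set to two, and the first operand is left unchanged. In the second call the operands are swapped, so $ty$ is set to one and $tr$ to two, and the constraint is strengthened to $(\langle 1 \star\rangle \times\langle\star 0\rangle=\langle 00\rangle)$. This makes sense: if the last bit of the second operand were 1, there would be at most one trailing 0 in the result, so the bit must be 0.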

## Example 4.5

As another example of Algorithm 4.2, consider $(\langle\star \star \star\rangle \times\langle 0 \star 0\rangle=\langle 10 \star\rangle)$. First, because the second operand has one trailing 0 , the result must have at least one trailing 0 , which yields $(\langle\star \star \star\rangle \times\langle 0 \star 0\rangle=\langle 100\rangle)$. Second, if the last bit of the first operand is 1 , then it is not possible to get exactly two trailing zeroes in the result. So it must be 0 , yielding $(\langle\star \star 0\rangle \times\langle 0 \star 0\rangle=\langle 100\rangle)$. Note that the second bit of both operands must be 1 . The algorithm described in the next section sets these bits.
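The examples above can be replayed with a direct transcription of Algorithm 4.2. This is a sketch under one stated assumption: $ty$ and $tr$ are read as the index of the lowest bit that is known to be 1 (or $n$ if there is none), which is the reading that agrees with the value $\min(2,3,4)$ in Example 4.3. The names `parse`, `show`, `lowest_known_one` and `trailing_zero_step` are hypothetical; bit-vectors are stored with bit 0 first.

```python
STAR = "*"


def parse(bits):
    """'01*' (most significant bit first) -> list indexed from bit 0."""
    return [STAR if ch == STAR else int(ch) for ch in reversed(bits)]


def show(vec):
    """Inverse of parse."""
    return "".join(str(b) for b in reversed(vec))


def lowest_known_one(vec):
    """Index of the lowest bit fixed to 1, or len(vec) if there is none."""
    for k, bit in enumerate(vec):
        if bit == 1:
            return k
    return len(vec)


def trailing_zero_step(x, y, r):
    """One call of Algorithm 4.2: clears low bits of x (in place) that
    cannot be the rightmost 1 of x, given x * y = r."""
    n = len(x)
    bound = min(lowest_known_one(y), lowest_known_one(r), n)
    for i in range(n):
        if x[i] == 1:
            return
        if x[i] == STAR:
            for j in range(bound + 1):
                # x[i] may be the rightmost 1 if the product overflows, or
                # if y[j] and r[i + j] can both be 1.
                if j + i >= n or (y[j] != 0 and r[i + j] != 0):
                    return
            x[i] = 0


# Example 4.3: the step is run once per operand.
x, y, r = parse("111*"), parse("11**"), parse("1000")
trailing_zero_step(x, y, r)
trailing_zero_step(y, x, r)
```

After the two calls, `show(x)` and `show(y)` give the strengthened operands of Example 4.3. Setting bits of $r$ is not covered here, since the text treats it as a separate step.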

## Bounds Consistency over Partial Products

The second multiplication propagator we implement is a generalised version of the addition propagator that we described in subsection 4.4.1. The implementation is complicated because it adds an arbitrarily large number of partial products.

Our bounds consistency propagator operates on a table of partial products (see Figure 4.3). The table of partial products contains a column for each bit of the output. In each column, one value is taken from each operand, where the sum of their indices equals the index of the column. The two values are conjoined. For example, column 2 contains $\left\{x_{0} \wedge y_{2}, x_{1} \wedge y_{1}, x_{2} \wedge y_{0}\right\}$. The exclusive-or of these values, when combined with the carry-in, gives the resulting bit. However, the formula for the carry-in quickly gets complicated, so instead of summing using exclusive-or, we use normal integer addition, and take the parity of the result. Since information is partial, rather than working with integers, we deal with integer intervals. For this reason we refer to the technique as column bounds propagation.

Figure 4.3: A 4-bit multiplication's table of partial products. Column zero is on the right.

For each column in the table of partial products, without considering that some partial products contain the same variables, the propagator establishes an integer interval (bounds) on both the partial product count (ppc) and the sum (which includes the carry).

Given three vectors of abstract variables $x, y$ and $r$ of bit-width $n$, where $x^{[n]} \times y^{[n]}=r^{[n]}$, we create a set of products $P_{c}$ for each column number $c$ in $\{0 \ldots(n-1)\}$. Let $P_{c}=\{(i, j) \mid i+j=c\}$. The partial product count is the number of partial products known to evaluate to 1, that is: $ppc_{c}=\left|\left\{(i, j) \in P_{c} \mid x_{i}=y_{j}=1\right\}\right|$. We define $ppc_{c}^{\downarrow}$ to be the lower bound of the partial product count, that is, $ppc_{c}$ evaluated with all $\star$ assignments ($\mu(v)=\star$) replaced by 0 ($\mu(v) \leftarrow 0$). The upper bound $ppc_{c}^{\uparrow}$ is $ppc_{c}$ evaluated with all $\star$ values in the partial assignment replaced by 1. The sum is defined recursively: $\operatorname{sum}_{0}=ppc_{0}$, and $\operatorname{sum}_{c}=ppc_{c}+\left\lfloor\frac{\operatorname{sum}_{c-1}}{2}\right\rfloor$, for $c$ in $\{1 \ldots(n-1)\}$. That is, the sum is the sum of the partial products in column $c$, together with all the carries that spill into that column. For this we likewise use lower and upper bounds $\operatorname{sum}_{c}^{\downarrow}$ and $\operatorname{sum}_{c}^{\uparrow}$.

```
Algorithm 4.3 Column bounds propagator for multiplication \(x \times y=r\)
Require: \(x^{[n]}, y^{[n]}, r^{[n]}\), in the context of \(x \times y=r\).
    \(P_{c}=\{(i, j) \mid i+j=c\}\)
    for \(c \in\{0 . . n-1\}\) do
        \(ppc_{c}^{\downarrow} \leftarrow\left|\left\{(i, j) \in P_{c} \mid x_{i}=y_{j}=1\right\}\right|\)
        \(ppc_{c}^{\uparrow} \leftarrow\left|\left\{(i, j) \in P_{c} \mid x_{i} \neq 0 \wedge y_{j} \neq 0\right\}\right|\)
    end for
    \(\operatorname{sum}_{0}^{\downarrow} \leftarrow ppc_{0}^{\downarrow}\)
    \(\operatorname{sum}_{0}^{\uparrow} \leftarrow ppc_{0}^{\uparrow}\)
    for \(c \in\{1 . . n-1\}\) do
        \(\operatorname{sum}_{c}^{\downarrow} \leftarrow ppc_{c}^{\downarrow}+\left\lfloor\frac{\operatorname{sum}_{c-1}^{\downarrow}}{2}\right\rfloor\)
        \(\operatorname{sum}_{c}^{\uparrow} \leftarrow ppc_{c}^{\uparrow}+\left\lfloor\frac{\operatorname{sum}_{c-1}^{\uparrow}}{2}\right\rfloor\)
    end for
    repeat
        for \(c \in\{0 . . n-1\}\) do
            if \(r_{c} \neq \star \wedge \operatorname{parity}\left(\operatorname{sum}_{c}^{\downarrow}\right) \neq r_{c}\) then
                increment \(\operatorname{sum}_{c}^{\downarrow}\)
            end if
            if \(r_{c} \neq \star \wedge \operatorname{parity}\left(\operatorname{sum}_{c}^{\uparrow}\right) \neq r_{c}\) then
                decrement \(\operatorname{sum}_{c}^{\uparrow}\)
            end if
        end for
        Perform integer bounds propagation on \(\operatorname{sum}^{\downarrow}, \operatorname{sum}^{\uparrow}, ppc^{\downarrow}\) and \(ppc^{\uparrow}\) variables (Algorithm 4.4)
        if for some column \(c\), \(ppc_{c}^{\downarrow}>ppc_{c}^{\uparrow}\) or \(\operatorname{sum}_{c}^{\downarrow}>\operatorname{sum}_{c}^{\uparrow}\) then
            set \(\mu=\perp\) and return
        end if
        Do singleton interval propagation on \(x, y\), and \(r\) variables (Algorithm 4.5)
    until all \(x, y\) and \(r\) bit values are stable
```

When the lower and upper bounds of a sum coincide and $\operatorname{sum}_{c}$ is odd, then the result of that column is 1 (that is, $\text{result}_{c}=1$). If it is even then $\text{result}_{c}=0$.

We perform propagation on the intervals until they reach a fixed point. The detailed method is shown in Algorithm 4.3. Lines 1-5 initialise the $ppc$ variables by counting the minimum and maximum number of partial products in each column. Lines 6-11 initialise the lower and upper bounds of the sum for each column (as usual the division rounds towards zero). Lines 12-26 are the workhorse of the algorithm, which repeatedly tightens lower and upper bounds of the sum and $ppc$ variables. First (lines 13-20), if the result bit of a column is known, then we enforce that the lower and upper bounds of the sum have the same parity as that bit (the function parity is defined by $\operatorname{parity}(k)=k \bmod 2$). Second (line 21), bounds propagation is applied. We describe this shortly, and Algorithm 4.4 gives details. These last two steps are repeated until all lower and upper bounds for sum and $ppc$ are stable.

```
Algorithm 4.4 Column bounds propagator: Integer bounds propagation
    Apply propagation using the following propagators:
        \(\operatorname{sum}_{0}^{\downarrow} \leftarrow \max \left(\operatorname{sum}_{0}^{\downarrow}, ppc_{0}^{\downarrow}\right)\)
        \(\operatorname{sum}_{0}^{\uparrow} \leftarrow \min \left(\operatorname{sum}_{0}^{\uparrow}, ppc_{0}^{\uparrow}\right)\)
        \(ppc_{0}^{\downarrow} \leftarrow \max \left(\operatorname{sum}_{0}^{\downarrow}, ppc_{0}^{\downarrow}\right)\)
        \(ppc_{0}^{\uparrow} \leftarrow \min \left(\operatorname{sum}_{0}^{\uparrow}, ppc_{0}^{\uparrow}\right)\)
        \(\operatorname{sum}_{c}^{\downarrow} \leftarrow \max \left(ppc_{c}^{\downarrow}+\left\lfloor\frac{\operatorname{sum}_{c-1}^{\downarrow}}{2}\right\rfloor, \operatorname{sum}_{c}^{\downarrow}\right)\)
        \(\operatorname{sum}_{c}^{\uparrow} \leftarrow \min \left(ppc_{c}^{\uparrow}+\left\lfloor\frac{\operatorname{sum}_{c-1}^{\uparrow}}{2}\right\rfloor, \operatorname{sum}_{c}^{\uparrow}\right)\)
        \(ppc_{c}^{\downarrow} \leftarrow \max \left(\operatorname{sum}_{c}^{\downarrow}-\left\lfloor\frac{\operatorname{sum}_{c-1}^{\uparrow}}{2}\right\rfloor, ppc_{c}^{\downarrow}\right)\)
        \(ppc_{c}^{\uparrow} \leftarrow \min \left(\operatorname{sum}_{c}^{\uparrow}-\left\lfloor\frac{\operatorname{sum}_{c-1}^{\downarrow}}{2}\right\rfloor, ppc_{c}^{\uparrow}\right)\)
        \(\operatorname{sum}_{c-1}^{\downarrow} \leftarrow \max \left(2 \times\left(\operatorname{sum}_{c}^{\downarrow}-ppc_{c}^{\uparrow}\right), \operatorname{sum}_{c-1}^{\downarrow}\right)\)
        \(\operatorname{sum}_{c-1}^{\uparrow} \leftarrow \min \left(2 \times\left(\operatorname{sum}_{c}^{\uparrow}-ppc_{c}^{\downarrow}\right)+1, \operatorname{sum}_{c-1}^{\uparrow}\right)\)
```

Next (lines 22-24) possible inconsistency is detected, and finally (line 25) bit values are extracted in cases where lower and upper bounds of intervals coincide. The details of this are provided as Algorithm 4.5. There are three steps involved. First, if the lower and upper bound of a column's sum coincide then the result bit for that column is determined (Algorithm 4.5's lines 2-4). Second, if the lower and upper bounds of a column's $p p c$ are the same and there are already enough ones in the column, then any partial product of form $\star \times 1$ in that column must in fact be $0 \times 1$, and similarly for a partial product of form $1 \times \star$ (lines 5-14). And third, dual to the last case, if the lower and upper bounds coincide and every partial product which could yield 1 in fact must yield 1 , then we can change the $\star$ values to 1 (lines 15-24).

```
Algorithm 4.5 Column bounds propagator: Fixing bits by singleton interval propagation
    for \(c \in\{0 . . n-1\}\) do
        if \(\operatorname{sum}_{c}^{\downarrow}=\operatorname{sum}_{c}^{\uparrow}\) then
            \(r_{c} \leftarrow \operatorname{parity}\left(\operatorname{sum}_{c}^{\downarrow}\right)\)
        end if
        if \(ppc_{c}^{\downarrow}=ppc_{c}^{\uparrow}=\left|\left\{(i, j) \in P_{c} \mid x_{i}=1 \wedge y_{j}=1\right\}\right|\) then
            for \((i, j) \in P_{c}\) do
                if \(\mu\left(x_{i}\right)=\star \wedge \mu\left(y_{j}\right)=1\) then
                    \(\mu\left(x_{i}\right) \leftarrow 0\)
                end if
                if \(\mu\left(x_{i}\right)=1 \wedge \mu\left(y_{j}\right)=\star\) then
                    \(\mu\left(y_{j}\right) \leftarrow 0\)
                end if
            end for
        end if
        if \(ppc_{c}^{\downarrow}=ppc_{c}^{\uparrow}=\left|\left\{(i, j) \in P_{c} \mid x_{i} \neq 0 \wedge y_{j} \neq 0\right\}\right|\) then
            for \((i, j) \in P_{c}\) do
                if \(\mu\left(x_{i}\right)=\star \wedge \mu\left(y_{j}\right) \neq 0\) then
                    \(\mu\left(x_{i}\right) \leftarrow 1\)
                end if
                if \(\mu\left(x_{i}\right) \neq 0 \wedge \mu\left(y_{j}\right)=\star\) then
                    \(\mu\left(y_{j}\right) \leftarrow 1\)
                end if
            end for
        end if
    end for
```

The steps of Algorithm 4.3 just described may set bits of $x, y$ and/or $r$. Hence the whole process is repeated, until no new bit values are deduced. Again, the outermost repeat loop is guaranteed to terminate, as the only changes to bit values involved replace $\star$ values by 0 or 1. The integer bounds propagation implemented by Algorithm 4.4 was inspired by similar bounds propagation methods in constraint programming [MS98] and CSP techniques. It propagates lower and upper integer bounds for column sums, both from less significant columns to more significant columns, and vice versa. Note that this part of the process must terminate because the steps involved can only tighten, never relax, bounds. The following example shows how the propagation works.

## Example 4.6

Consider the situation where $\operatorname{sum}_{4}=[2,5]$, $ppc_{5}=[1,1]$, $\operatorname{sum}_{5}=[1,2]$, and $r_{5}=\star$. Here $[a, b]$ means that $a$ is the current lower bound, and $b$ the upper bound. As the result bit is unknown, lines 13-20 of Algorithm 4.3 will not change the sum.

Next the propagators of Algorithm 4.4 are applied to the equation $[1,2]=[1,1]+\left\lfloor\frac{[2,5]}{2}\right\rfloor=[1,1]+[1,2]=[2,3]$. Note that $\operatorname{sum}_{i}^{\downarrow}=\max \left(ppc_{i}^{\downarrow}+\left\lfloor\frac{\operatorname{sum}_{i-1}^{\downarrow}}{2}\right\rfloor, \operatorname{sum}_{i}^{\downarrow}\right)$ when instantiated reads $\operatorname{sum}_{5}^{\downarrow}=\max (1+2 / 2,1)$, so the lower bound of $\operatorname{sum}_{5}$ tightens from 1 to 2. Also note that $\operatorname{sum}_{i-1}^{\uparrow}=\min \left(2 \times\left(\operatorname{sum}_{i}^{\uparrow}-ppc_{i}^{\downarrow}\right)+1, \operatorname{sum}_{i-1}^{\uparrow}\right)$ when instantiated reads $\operatorname{sum}_{4}^{\uparrow}=\min (2 \times(2-1)+1,5)$, so the upper bound of $\operatorname{sum}_{4}$ tightens from 5 to 3.

Substituted into the definition of $\operatorname{sum}_{5}$, the final bounds are $[2,2]=[1,1]+\left\lfloor\frac{[2,3]}{2}\right\rfloor=[1,1]+[1,1]=[2,2]$. Since the lower and upper bound of the sum are both 2, Algorithm 4.5 (line 3) will set $r_{5}=0$.
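The arithmetic of Example 4.6 can be reproduced by applying the last six propagators of Algorithm 4.4 to one pair of adjacent columns until nothing changes. The function below is a minimal sketch restricted to such a pair; `tighten_pair` is a hypothetical name, intervals are `(lower, upper)` tuples, and an empty interval is reported as `None`, mirroring the inconsistency check in lines 22-24 of Algorithm 4.3.

```python
def tighten_pair(prev_sum, ppc, cur_sum):
    """Tighten sum_{c-1}, ppc_c and sum_c to a fixed point.

    Returns the three tightened intervals, or None if one becomes empty.
    """
    while True:
        (pl, pu), (kl, ku), (sl, su) = prev_sum, ppc, cur_sum
        sl = max(kl + pl // 2, sl)       # sum_c lower bound
        su = min(ku + pu // 2, su)       # sum_c upper bound
        kl = max(sl - pu // 2, kl)       # ppc_c lower bound
        ku = min(su - pl // 2, ku)       # ppc_c upper bound
        pl = max(2 * (sl - ku), pl)      # sum_{c-1} lower bound
        pu = min(2 * (su - kl) + 1, pu)  # sum_{c-1} upper bound
        if pl > pu or kl > ku or sl > su:
            return None
        tightened = ((pl, pu), (kl, ku), (sl, su))
        if tightened == (prev_sum, ppc, cur_sum):
            return tightened
        prev_sum, ppc, cur_sum = tightened
```

Calling `tighten_pair((2, 5), (1, 1), (1, 2))` yields the bounds derived in the example: $\operatorname{sum}_{4}=[2,3]$, $ppc_{5}=[1,1]$ and $\operatorname{sum}_{5}=[2,2]$.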

Our column bounds propagator is powerful; it subsumes each of the following three natural multiplication propagators.

First, it subsumes the propagator that sets some of the most significant bits of the result to zero when the multiplication of $x$ and $y$ cannot cause overflow. That is, in all positions $i$ where $2^{i}>x^{\uparrow} \times y^{\uparrow}$, set $r_{i}$ to 0 . It is not hard to see that our column bounds propagator subsumes this. When $x$ and $y$ take their maximum values, the sum of each column equals $s u m_{i}^{\uparrow}$. Eventually the upper bound of the sum will go to zero, and when it does, the upper and lower bound of the sum will both be zero, hence the result bit will be set to zero.

## Example 4.7

Applying column bounds propagation (Algorithm 4.3) to $(\langle 000 \star \star\rangle \times\langle 000 \star \star\rangle)=r^{[5]}$, at line 25 we have $\operatorname{sum}_{0}=[0,1]$, $\operatorname{sum}_{1}=[0,2]$, $\operatorname{sum}_{2}=[0,2]$, $\operatorname{sum}_{3}=[0,1]$, $\operatorname{sum}_{4}=[0,0]$. So $r_{4}$ is set to 0. Note that when the operands take their maximum possible values, that is, $(00011)_{2} \times(00011)_{2}$, the number of true partial products in each column, including carries, equals the upper bound of that column's sum.

Second, our propagator subsumes the propagation principle for multiplication that, if the least significant $j$ bits of both the operands are known, then the least significant $j$ bits of the result are uniquely determined. For our column bounds propagator, at column $j$, when all the bits of the operands in positions less than or equal to $j$ are fixed, there will be no $\star$ values in partial products of those columns. Without $\star$ values, the ppc in each of those columns is exact, so the sum in each of those columns is known, and by taking the parity of the sum, the result bit is calculated.

## Example 4.8

Applying column bounds propagation to $\left(\langle\star 01\rangle \times\langle\star 10\rangle=r^{[3]}\right)$, at line 25 we have $\operatorname{sum}_{0}=[0,0]$, $\operatorname{sum}_{1}=[1,1]$, $\operatorname{sum}_{2}=[0,1]$. So $r_{0}$ is set to 0, and $r_{1}$ is set to 1.
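In Examples 4.7 and 4.8 the result is entirely unknown, so the quoted sums appear to coincide with the initial bounds computed by lines 1-11 of Algorithm 4.3. The sketch below computes just those initial bounds; `parse` and `initial_bounds` are hypothetical names, and bit-vectors are stored with bit 0 first.

```python
STAR = "*"


def parse(bits):
    """'01*' (most significant bit first) -> list indexed from bit 0."""
    return [STAR if ch == STAR else int(ch) for ch in reversed(bits)]


def initial_bounds(x, y):
    """Lines 1-11 of Algorithm 4.3: per-column (lower, upper) bounds on
    the partial product count and on the column sum."""
    n = len(x)
    ppc = []
    for c in range(n):
        pairs = [(i, c - i) for i in range(c + 1)]  # P_c
        lower = sum(1 for i, j in pairs if x[i] == 1 and y[j] == 1)
        upper = sum(1 for i, j in pairs if x[i] != 0 and y[j] != 0)
        ppc.append((lower, upper))
    sums = [ppc[0]]
    for c in range(1, n):
        carry_lower = sums[c - 1][0] // 2
        carry_upper = sums[c - 1][1] // 2
        sums.append((ppc[c][0] + carry_lower, ppc[c][1] + carry_upper))
    return ppc, sums


_, sums_47 = initial_bounds(parse("000**"), parse("000**"))  # Example 4.7
_, sums_48 = initial_bounds(parse("*01"), parse("*10"))      # Example 4.8
```

A column whose sum interval is a single value, such as $\operatorname{sum}_{4}$ in Example 4.7, is exactly where Algorithm 4.5 can fix the result bit.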

Third, our propagator subsumes the propagator that, when some of the least significant bits of the result and one operand are known, the multiplicative inverse (when it exists) of the partially known operand can derive extra known bits of the other operand.

## Example 4.9

Consider $\langle\star \star 11\rangle \times\langle\star \star \star \star\rangle=\langle\star 110\rangle$. We can restrict our attention to the right-most sections where one of the operands and the result are entirely fixed: $\langle 11\rangle \times\langle\star \star\rangle=\langle 10\rangle$. The multiplicative inverse of 3 modulo $2^{2}$ is 3, that is, $(3 \times 3) \bmod 4=1$. Multiplying the result by this inverse gives 2, so we can update the second operand to $\langle\star \star 10\rangle$.

It is well known that $x$ has a multiplicative inverse modulo $2^{n}$ if and only if $x$ is odd.
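The inverse-based rule of Example 4.9 can be checked with a few lines of Python. This is only an illustration of the arithmetic, assuming Python 3.8 or later for the three-argument `pow` with a negative exponent; the helper name `low_bits_of_other_operand` is hypothetical.

```python
def low_bits_of_other_operand(x_low: int, r_low: int, w: int) -> int:
    """Given the low w bits of x and of r = x * y, return the low w bits of y."""
    modulus = 1 << w
    if x_low % 2 == 0:
        raise ValueError("x must be odd to be invertible modulo 2**w")
    inverse = pow(x_low, -1, modulus)    # multiplicative inverse modulo 2**w
    return (inverse * r_low) % modulus


# Example 4.9: x ends in <11>, r ends in <10>, so y must end in <10>.
print(format(low_bits_of_other_operand(0b11, 0b10, 2), "02b"))
```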

Again, column bounds propagation subsumes this. If the least significant $w$ bits of both $r^{[n]}$ and $x^{[n]}$ are fixed, and $x$ is odd, then bits $(w-1) \ldots 0$ of $y^{[n]}$ will be fixed, as can be seen by a simple proof by cumulative induction.

- For the base case, in column 0 we have a single addend, $x_{0} y_{0}$, and by assumption $x_{0}=1$ ($x$ is odd). Hence the sum will be $[0,1]$. Because $r_{0}$ is known, the interval will be tightened. If $r_{0}$ is 1, it is tightened to $[1,1]$, and $y_{0}$ will be set by Algorithm 4.5 at line 21. Otherwise, first the sum and then $\mathit{ppc}$ will be tightened to $[0,0]$, and $y_{0}$ will be fixed by Algorithm 4.5 at line 11. Hence $y_{0}$ is determined, in fact equal to $r_{0}$.
- Now assume that $y_{j}$ is determined for all $j<k$. We show that $y_{k}$ must be determined. The addends of column $k$ are $x_{k} y_{0}, x_{k-1} y_{1}, \ldots, x_{1} y_{k-1}, x_{0} y_{k}$. But since $y_{0}, \ldots, y_{k-1}$ as well as $x_{0}=1$ are determined, each of these addends is 0 or 1, except the last, which is $y_{k}$. That is, the upper and lower bound of the sum differ by at most 1. Because the result bit is known, the bounds will be tightened by Algorithm 4.5. That is, the equation for $\mathit{sum}_{k}$ boils down to $c+y_{k}=r_{k}$ for some integer constant $c$. Hence the propagation algorithm will determine $y_{k}$, setting $y_{k}$ to $r_{k}$'s value (if $c$ is even) or to its complement (if $c$ is odd).

From this it follows that the column bounds propagator will fix at least as many bits as the rule that exploits multiplicative inverses.

## Example 4.10

Consider $x \times y=r$, with $x=\langle 011\rangle$, $y=\langle\star \star 1\rangle$ and $r=\langle 001\rangle$. When we process the $0^{\text{th}}$ column, we begin with $x_{0} \wedge y_{0}=r_{0}$, which after substituting in known values gives $1=1$.

When we process the $1^{\text{st}}$ column, we begin with $x_{0} y_{1} \oplus x_{1} y_{0}=r_{1}$, which after substituting in known values gives $y_{1} \oplus 1=0$, which sets $y_{1}$ to 1.

When we process the $2^{\text{nd}}$ column, we begin with $x_{0} y_{2} \oplus x_{1} y_{1} \oplus x_{2} y_{0} \oplus c_{1}=r_{2}$, where $c_{1}$ is the carry out of column 1. Both addends of column 1 are 1, so $c_{1}=1$, and substituting gives $y_{2} \oplus 1 \oplus 0 \oplus 1=0$, which sets $y_{2}$ to 0. So $y^{[3]}$ is set to $\langle 011\rangle$, which is indeed the multiplicative inverse of 3 modulo 8, that is, $(3 \times 3) \equiv_{8} 1$.

## Example 4.11

A simple example for which the propagator is not optimal is $x^{[2]} \times y^{[2]}=r^{[2]}$ with $\mu=\{x=\langle 1 \star\rangle, y=\langle 1 \star\rangle, r=\langle 1 \star\rangle\}$. Substituting into the Boolean formula definition of multiplication gives $r_{0}=\left(x_{0} \wedge y_{0}\right)$ and $1=x_{0} \oplus y_{0}$. Since $x_{0}$ and $y_{0}$ cannot both be 1, $r_{0}$ must be 0. Our propagator does not deduce that the result must be even, because it conservatively treats the variables in the partial products of each column as being distinct.

The column bounds propagator generally subsumes interval propagation, but not quite. The reason it may fail is that multiplication is signedness-agnostic. For example, $\langle 111 \star\rangle \times\langle 111 \star\rangle=\langle\star \star \star \star\rangle$, interpreted as signed intervals, is $([-2,-1] \times[-2,-1])=[-8,7]$, which can be strengthened to $([-2,-1] \times[-2,-1])=[1,4]$. Hence the most significant bit of the result must be 0. However, the bounds analysis does not determine this.

### 4.4.3 An Unsigned Division Propagator

We propagate unsigned division by using a truncating integer division propagator that operates on unsigned integer bounds.

We begin by converting the operands and results from $\mathbf{3}$ to the integer bounds domain by calculating, for each ternary variable, the maximum and minimum values that it contains. Next we enforce bounds consistency over those integer domains. Then we convert back to $\mathbf{3}$. We repeat these three steps until a fixed point is reached and the representation in $\mathbf{3}$ is stable. This terminates because we do not allow known values to become $\star$.

Like the multiplication propagator, this propagator is not optimal. As a simple example of non-optimality, the propagator is not able to deduce that the numerator must be odd in this case: $\left(\langle\star \star\rangle \div_{u}\langle\star 1\rangle\right)=\langle\star 1\rangle$. Interpreted as unsigned intervals this says $([0,3] \div_{u}[1,3])=[1,3]$. The numerator cannot be 0, so its interval can be tightened to $[1,3]$. Each interval can then take both of its extreme values, so no further propagation is possible. Converting the intervals back to the $\mathbf{3}$ domain leaves the ternary variables unchanged. However, if the numerator takes the value 2, the denominator can be 1 or 3, so the result must be either 2 or 0, neither of which $\langle\star 1\rangle$ can express. Hence the numerator cannot be 2; it must be either 1 or 3, so its least significant bit must be 1.
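The non-optimality claim can be confirmed by brute force at this small width. The sketch below enumerates every concrete numerator and denominator and keeps the numerators for which some quotient fits the result pattern. The helper names are our own, and a zero denominator is simply skipped, which is an assumption that does not matter here because the denominator is odd.

```python
from itertools import product


def concretise(t: str):
    """All integers described by a ternary string (MSB first, '*' = unknown)."""
    choices = [("0", "1") if c == "*" else (c,) for c in t]
    return {int("".join(bits), 2) for bits in product(*choices)}


def feasible_numerators(a: str, b: str, q: str):
    """Numerators a0 for which some non-zero b0 gives a0 // b0 matching q."""
    quotients = concretise(q)
    return sorted(
        a0 for a0 in concretise(a)
        if any(b0 != 0 and a0 // b0 in quotients for b0 in concretise(b))
    )


# <**> divided by <*1> equals <*1>: only odd numerators survive.
print(feasible_numerators("**", "*1", "*1"))
```

The surviving numerators have the same minimum and maximum as the interval $[1,3]$, which is why a bounds-based propagator cannot exclude the value 2.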

Note that it is not straightforward to utilise the multiplication propagator for integer division. Turning $\left(a \div_{u} b\right)=q$ into $a=b q+r \wedge(b \neq 0 \Rightarrow r<b)$ is unattractive, and it is more profitable to utilise the fact that unsigned division cannot overflow, whereas multiplication can. The Beaver bit-vector solver (subsection 3.23.6) performs this same transformation in another context. As we show in section 4.7, because unsigned division does not overflow, our division propagator is quite effective and able to determine many of the available bits.

Analysing multiplication using this interval approach is not practical because of the prevalence of overflow.

### 4.5 A Propagation Solver

The previous section described a number of propagators for various bit-vector operations. A propagation solver runs propagators until a global fixed point is reached. When each propagator is at a fixed point, the global fixed point has been reached. Initially the partial assignment of all the nodes' bits is $\star$, then the partial assignment of constants is updated to be the respective value. A propagator is only run when the partial assignment to its operands or output changes.

In an attempt to reduce the number of times that expensive propagators are run, we use a fast and a slow worklist. Propagators that are fast, such as the bit-vector exclusive-or propagator, are run before expensive propagators such as the signed remainder propagator. We show later, for instance in Table 4.3d, that the exclusive-or propagator can be hundreds of times faster than the signed remainder propagator.

We run the propagation engine twice to a global fixed point. The first time we run propagation using only bit-vector constants or $1/0$ as sources of fixed bits. In this phase, information never flows downwards from results to operands. Initially, all the propagators that depend on the constant values are added to the worklist. Propagators are taken from the worklist one by one and run. If a propagator changes any ternary assignment, then all the propagators that depend on that assignment are added into the worklist. The propagation engine continues until the worklist is empty. When the worklist is empty, a global fixed point has been reached.

The second time that the propagation engine is run, the root node is set to true, and again we propagate until a global fixed point.
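The worklist scheme described above can be sketched as follows. This is a minimal illustration under our own assumptions, not STP2's implementation: nodes hold a single ternary bit, propagators are plain dictionaries with hypothetical keys `nodes`, `fast` and `run`, and the and-propagator is labelled slow purely to exercise the second worklist.

```python
from collections import deque


def fix(asg, name, value, changed):
    """Set a ternary bit; '*' may become '0' or '1', never the reverse."""
    if asg[name] == "*":
        asg[name] = value
        changed.add(name)
    elif asg[name] != value:
        raise ValueError(f"conflict at {name}")


def xor_prop(a, b, r):                      # cheap propagator for r = a xor b
    def run(asg):
        changed = set()
        known = [n for n in (a, b, r) if asg[n] != "*"]
        if len(known) == 2:
            (missing,) = [n for n in (a, b, r) if asg[n] == "*"]
            value = int(asg[known[0]]) ^ int(asg[known[1]])
            fix(asg, missing, str(value), changed)
        return changed
    return {"nodes": {a, b, r}, "fast": True, "run": run}


def and_prop(a, b, r):                      # r = a and b, queued on the slow list
    def run(asg):
        changed = set()
        if "0" in (asg[a], asg[b]):
            fix(asg, r, "0", changed)
        if asg[a] == asg[b] == "1":
            fix(asg, r, "1", changed)
        if asg[r] == "1":
            fix(asg, a, "1", changed)
            fix(asg, b, "1", changed)
        return changed
    return {"nodes": {a, b, r}, "fast": False, "run": run}


def propagate(asg, propagators, seeds):
    """Run propagators until both worklists are empty (a global fixed point)."""
    fast, slow, queued = deque(), deque(), set()

    def wake(name):
        for p in propagators:
            if name in p["nodes"] and id(p) not in queued:
                queued.add(id(p))
                (fast if p["fast"] else slow).append(p)

    for name in seeds:
        wake(name)
    while fast or slow:
        p = fast.popleft() if fast else slow.popleft()   # fast list first
        queued.discard(id(p))
        for name in p["run"](asg):          # names whose assignment changed
            wake(name)
    return asg


asg = {"a": "1", "b": "*", "c": "*", "t": "*", "root": "1"}
props = [xor_prop("a", "b", "t"), and_prop("t", "c", "root")]
print(propagate(asg, props, seeds=["a", "root"]))
```

In this sketch the two phases of the engine would correspond to two calls to `propagate`: first seeded with the constant nodes only, then again after the root node has been set to true.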

Once propagation reaches a global fixed point, how the results of the analysis can be used depends on what was assumed before propagation started. We discuss using the results of the analysis in the next section.

### 4.6 Using the Results

After the fixed point is reached, simplifications to the expression are applied, hopefully saving time overall. After bit propagation we use the partial assignment in three different ways. First, some expressions are replaced by the values discovered. Second, some values are replaced and an equality conjoined at the top level. Third, individual bits are conjoined with the CNF.

In the first case, before the root node is set to true, if an expression is found to have a particular value, then the node is replaced by that value. For instance, given the expression $\left(\left(0^{[3]}:: x^{[2]}\right)=7^{[5]}\right)$, bit propagation will discover that the equality expression is necessarily 0, so the formula can be replaced by 0. With the root node not set to 1, bit propagation gives the values that nodes always take. There can never be a conflict, and variables' bits will never be fixed (because information only flows upwards).

When bit propagation is performed with the root node set to 1, expressions are replaced by constants and the expression conjoined to the top. Consider the sub-expression "$\left(t_{0}=^{t} t_{1}\right) \wedge^{t} p$". It is unsound to simply replace this with 1, because the condition that $t_{0}$ must equal $t_{1}$ is lost. Instead, this sub-expression is replaced by 1, and $\left(t_{0}=t_{1}\right) \wedge p$ is conjoined with the root node.

Nodes that are partially set are stored and conjoined with the CNF expression of the formula just before sending it to the SAT solver. As an example, say the analysis reveals that $t=\langle 1 \star 1 \star 0\rangle$. After bit-blasting, there is a Boolean formula produced for each element of this bit-vector that produces the value of the node. When the CNF encoding is run, each of these Boolean formulae is made equivalent to some fresh variable, say $t=\left\langle b_{4}, b_{3}, b_{2}, b_{1}, b_{0}\right\rangle$. For each value that we know must be set to either 1 or 0, we assert the appropriate literal. So in our example, we add three unit clauses to the CNF: $\left\{\neg b_{0}\right\}$, $\left\{b_{2}\right\}$ and $\left\{b_{4}\right\}$. These clauses fix the value of the SAT solver's variables, simplifying other clauses that contain the same variables.
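As a small illustration of this third use, the following sketch turns a ternary value into the unit clauses that would be asserted. The variable naming (`b4` down to `b0`, with a leading `-` for negation) is our own convention for the example and is not the encoding used inside STP2.

```python
def unit_clauses(ternary: str, prefix: str = "b"):
    """One unit clause per fixed bit of a ternary string (MSB first)."""
    n = len(ternary)
    clauses = []
    for pos, value in enumerate(ternary):
        index = n - 1 - pos                 # bit index, 0 = least significant
        if value == "1":
            clauses.append([f"{prefix}{index}"])
        elif value == "0":
            clauses.append([f"-{prefix}{index}"])
    return clauses


print(unit_clauses("1*1*0"))
```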

### 4.7 Evaluation of Theory-Level Bit Propagation

We compare STP2 r1611 with and without theory-level bit propagation, and for reference compare against the current version of the SMT-COMP 2011 (QF_BV) winner Z3 3.2 [dMB08b] with and without bit propagation.

To isolate the effect of bit propagation, we created a standalone executable that reads SMT-LIB2 format, applies bit propagation, then outputs the simplified result. We used this to pre-process input to Z3. The pre-processor applies two of the three techniques for simplifying expressions; it does not conjoin information about partially specified sub-expressions. In STP2, partial information about sub-expressions is used to add extra information.

We perform the evaluation with the same experimental configuration as in section 3.19. We took the SMT-LIB QF_BV benchmark set as of January 2012. We discarded the asp family of benchmarks, which is large (29GB) and contains encodings of problems we are uninterested in, for example: towers of Hanoi, travelling salesperson, and Sudoku problems. We discarded the mcm family because it uses syntax that STP2 cannot yet parse. We discarded the bruttomesso:core family as it contains no arithmetic, a key part of the software verification benchmarks we are interested in. We limited each family to 50 randomly chosen benchmarks and only chose a benchmark if some solver required more than 1 second to solve it. We were left with 715 benchmarks in 31 families. Finally, we used a memory limit of 3GB and a time limit of 500 seconds on a single core of an Intel E5507 Linux computer to run the benchmarks.

When multiple solvers returned a result for a benchmark, they always agreed about the result. Moreover, all the results agreed with the expected status as given by annotations in the benchmarks.

The results are shown in Table 4.2. For each family and solver, the number of failures and the number of those failures due to exceeding the memory limit is given. For each benchmark family, the best result (fewest failures) is highlighted with boldface type.

Z3 3.2 with bit propagation has the fewest failures of the solvers we compare, 10 fewer than with bit propagation disabled. Of the solvers we compare, STP2 with bit propagation is the best on the most families: 18 .

Compared to no bit propagation, bit propagation enables 10 extra benchmarks to be solved. This is true for both STP2 and Z3, although the gains for the two are on different problems.

For STP2 to perform bit propagation on the 715 problems takes 65 seconds. When preprocessing Z3's input, bit propagation takes 210 seconds. The difference arises because, when bit propagation is run as a pre-processor, many simplifications have not yet been applied, so it operates on a larger expression. Z3 3.2 with bit propagation has the lowest overall time and the fewest failures.

| Family | \# | STP2 r1611 time | STP2 r1611 fail | STP2+bp time | STP2+bp fail | Z3 3.2 time | Z3 3.2 fail | Z3+bp time | Z3+bp fail |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| VS3 | 11 | 0 | 1/11 | 0 | 1/11 | 548 | 7 | 847 | 7 |
| brummayerbiere | 28 | 231 | 1/12 | 271 | 1/12 | 583 | 2/13 | 707 | 2/13 |
| brummayerbiere2 | 50 | 2593 | 17 | 1812 | 14 | 2626 | 1/28 | 1388 | 1/18 |
| brummayerbiere3 | 50 | 1078 | 31 | 1832 | 29 | 1698 | 30 | 1476 | 30 |
| bruttomesso:lfsr | 50 | 6735 | 4 | 7353 | 5 | 862 |  | 838 |  |
| bruttomesso:simple_proc | 50 | 2866 | 6 | 3508 | 5 | 1742 | 3 | 2993 | 2 |
| calypto | 17 | 10 | 12 | 9 | 12 | 947 | 11 | 31 | 13 |
| galois | 3 | 0 | 3 | 0 | 3 | 0 | 3 | 0 | 3 |
| gulwani-pldi08 | 3 | 26 |  | 25 |  | 21 |  | 9 |  |
| pipe | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| rubik | 6 | 872 | 1 | 838 | 1 | 89 | 1 | 301 |  |
| sage:app1 | 50 | 269 |  | 281 |  | 220 |  | 187 |  |
| sage:app12 | 14 | 20 |  | 0 |  | 0 |  | 0 |  |
| sage:app2 | 1 | 0 |  | 0 |  | 11 |  | 0 |  |
| sage:app7 | 6 | 0 |  | 0 |  | 9 |  | 10 |  |
| sage:app8 | 50 | 461 |  | 20 |  | 66 |  | 56 |  |
| sage:app9 | 50 | 541 |  | 19 |  | 60 |  | 47 |  |
| spear:cvs_v1.11.22 | 28 | 65 |  | 59 |  | 130 |  | 139 |  |
| spear:inn_v2.4.3 | 50 | 55 |  | 46 |  | 297 |  | 277 |  |
| spear:openldap_v2.3.35 | 5 | 7 | 3 | 513 |  | 0 | 5 | 0 | 5 |
| spear:samba_v3.0.24 | 50 | 128 |  | 133 |  | 589 |  | 528 |  |
| spear:wget_v1.10.2 | 41 | 91 |  | 85 |  | 488 |  | 334 |  |
| spear:xinetd_v2.3.14 | 1 | 0 |  | 0 |  | 2 |  | 1 |  |
| spear:zebra_v0.95a | 5 | 3 |  | 3 |  | 14 |  | 15 |  |
| stp | 1 | 20 |  | 23 |  | 11 |  | 23 |  |
| stp_samples | 22 | 4 | 2 | 3 | 2 | 1 | 2 | 4 | 2 |
| tacas07 | 3 | 89 | 1 | 331 |  | 709 |  | 964 |  |
| uclid_contrib_smtcomp09 | 7 | 703 | 1 | 929 |  | 1893 |  | 1659 |  |
| uclid:catchconv | 50 | 65 |  | 49 |  | 142 |  | 166 |  |
| uum | 7 | 45 | 6 | 30 | 6 | 11 | 6 | 11 | 6 |
| wienand-cav2008:Booth | 5 | 82 | 4 | 82 | 4 | 35 | 4 | 42 | 4 |
| Sum | 715 | 17071 | 115 | 18267 | 105 | 13820 | 114 | 13065 | 104 |
| Time incl. penalty |  | 74686 |  | 70872 |  | 70934 |  | 65169 |  |

Table 4.2: STP2 and Z3 performance with and without bit propagation. "STP2" is STP2 with bit propagation disabled. The number of benchmarks that failed is given for each family. The number of times the memory limit was reached is indicated, for example, $1 / 11$ means 11 failures, one of which was a memory out. All times are in seconds. The times given are the sum of the times for the successful instances only. Limits of 500 seconds and 3GB are used. The bottom row gives the times with a penalty of 501 seconds counted for each failed problem. Times are measured on a single core of an Intel E5507 Linux computer.

### 4.8 Testing that Propagators Are Optimal

In this section we describe how we generated the evidence that (most of) our propagators are optimal. We use the details in this section to later address the question: Do the results from applying bit propagation improve if more precise propagators are used?


We test that our implemented propagators are sound by comparing their effects against the optimal propagator's effects. The propagators we implement should produce a superset of the effect of the optimal propagator. That is, if the optimal propagator fixes a bit, then all propagators should produce that same value for the bit, or $\star$. For propagators we expect to be optimal, the sets from both should be the same.

We used two techniques to generate (the effect of) the optimal propagator. First, at small bit-widths, we generate the effect by exhaustively generating operands, then applying an operation to those operands, and then storing the operands and the result in a set. To determine the effect of the optimal propagator for abstract variables, an algorithm searches through the stored tuples and finds any calculations that are contained in that set. It applies the abstraction function to each matching tuple, then applies the join operation to those, giving the result of the optimal propagator. If no matching concrete values are found, it returns $\perp$.

More formally, to calculate the effect of the optimal propagator on the function $f\left(x^{[n]}, y^{[n]}\right)=r^{[n]}$, where $x^{[n]}, y^{[n]}, r^{[n]}$ are lists of ternary variables:

- Apply the function $f$ to all possible concrete operands, and store the tuple. For $i$ and $j \in\left(0 \ldots 2^{n}-1\right)$, add $(i, j, f(i, j))$ into a set $S$.
- Search through $S$ for concrete values that match the abstract variables. Collect elements $\left\langle s_{0}, s_{1}, s_{2}\right\rangle$, where $\left(s_{0} \in \mu(x) \wedge s_{1} \in \mu(y) \wedge s_{2} \in \mu(r)\right)$.
- Apply the abstraction function $\alpha$ to each matching element and join the results. If no element matches, the result is $\perp$.

For example, $(\langle 10 \star 0\rangle \ll\langle\star \star \star \star\rangle)=\langle 1 \star \star \star\rangle$ matches the following tuples: $\left\langle(1000)_{2},(0000)_{2},(1000)_{2}\right\rangle$, $\left\langle(1010)_{2},(0000)_{2},(1010)_{2}\right\rangle$, $\left\langle(1010)_{2},(0010)_{2},(1000)_{2}\right\rangle$. Applying the abstraction function gives: $(\langle 10 \star 0\rangle \ll\langle 00 \star 0\rangle)=\langle 10 \star 0\rangle$, which is the result of the optimal propagator.
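The exhaustive procedure above can be sketched directly. The code below is an illustration under our own naming (`optimal`, `matches`, `shl`), with `None` standing in for $\perp$ and ternary values written as strings, most significant bit first. It enumerates all concrete tuples, keeps those that match the abstract variables, and joins them bit by bit.

```python
def matches(value: int, ternary: str) -> bool:
    """True if the concrete value is described by the ternary string."""
    bits = format(value, f"0{len(ternary)}b")
    return all(t in ("*", b) for t, b in zip(ternary, bits))


def optimal(f, x: str, y: str, r: str):
    """Effect of the optimal propagator for f(x, y) = r, or None for bottom."""
    n = len(x)
    # Step 1: apply f to all concrete operands.
    tuples = [(i, j, f(i, j, n)) for i in range(2 ** n) for j in range(2 ** n)]
    # Step 2: keep the tuples described by the abstract variables.
    hits = [t for t in tuples
            if matches(t[0], x) and matches(t[1], y) and matches(t[2], r)]
    if not hits:
        return None

    # Step 3: abstract each hit and join: a bit stays fixed only if all agree.
    def joined(k):
        columns = zip(*(format(t[k], f"0{n}b") for t in hits))
        return "".join(c[0] if len(set(c)) == 1 else "*" for c in columns)

    return joined(0), joined(1), joined(2)


def shl(i: int, j: int, n: int) -> int:
    """Left shift at width n, using the whole shift amount."""
    return (i << j) % (1 << n)


print(optimal(shl, "10*0", "****", "1***"))
```

At 4 bits this enumerates only 256 operand pairs; the cost grows as $2^{2n}$ per query, which is why the text restricts the exhaustive approach to small bit-widths.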

For each two-input propagator we exhaustively generated all combinations for 1 bit through to 6 bits. At 6 bits we checked all $3^{18}$ distinct combinations. For propagators we believed were optimal, we checked that the propagator is idempotent, that the propagator and the optimal propagator return $\perp$ at exactly the same time, and that the resulting partial assignments are the same. This identified many, but not all, of the defects we found in our implementations of the propagators.

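$$
\begin{aligned}
\mathit{approx}(\varphi) &= \mathit{app}(\varphi, \perp) \\
\mathit{app}(\varphi, r) &= \text{if } \mathit{unsat}(\varphi) \text{ then } r \\
&\quad\ \text{else let } s = r \sqcup \alpha\{\mathit{model}(\varphi)\} \\
&\quad\ \text{in } \mathit{app}(\varphi \wedge \neg \gamma(s), s)
\end{aligned}
$$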

Figure 4.4: Finding the optimal propagator: the approach of Reps, Sagiv and Yorsh [RSY04].

Both techniques we use to generate the effect of the optimal propagator generalise from concrete values to abstract values by moving up the lattice.

At larger bit-widths, exhaustively generating values is impractical, so instead we compared our propagators against the optimal propagator, produced using the approach proposed by Reps, Sagiv and Yorsh [RSY04] (Figure 4.4). They show how to produce the result of the optimal propagator for domains that satisfy the ascending chain condition, which $\mathbf{3}$ does. Their algorithm uses a decision procedure which produces a model. We refer to their algorithm as RSY.

The approx method of Figure 4.4 is called with $\varphi$, where $\varphi$ describes the tuples that satisfy a relation. If the set of tuples is empty, then $\perp$ is returned; otherwise a search occurs to find tuples that are not contained in $r$. When no such tuple exists, $r$ is the best result possible.

Intuitively, rather than taking the union of all the tuples in the set described by the abstract variables, like the exhaustive approach does, RSY searches for new tuples that cause the abstract variables to change. If there are $k \star$ values initially, then the algorithm will perform the abstraction function at most $k+1$ times.

RSY first searches for a model, then for models which cause each of the bits to take the opposite value, that is, if they have been 1 in all prior models, to take a 0 .

## Example 4.12

Consider applying RSY to $\langle\star\rangle \times\langle\star\rangle=\langle 0\rangle$.

- Before calling the approx function, $\star$ values are replaced by fresh variables, setting $\varphi$ to $\left\langle v_{0}\right\rangle \times\left\langle v_{1}\right\rangle=\langle 0\rangle$.
- This is encoded as CNF, and its satisfiability is checked. In this case it is satisfiable, so the partial assignment is set to, say, $s\left(v_{0}\right) \leftarrow 0, s\left(v_{1}\right) \leftarrow 1$.
- Next the algorithm searches for a model where $v_{0}$ or $v_{1}$ take a different value, that is for a solution to $\left(v_{0} \times v_{1}=0\right) \wedge\left(v_{0} \neq 0 \vee v_{1} \neq 1\right)$.
- This is satisfiable. The returned model might be $v_{0}=0, v_{1}=0$. Now the abstraction function is applied to the model, and then the join taken with the current partial assignment. That is: $s\left(v_{0}\right)=s\left(v_{0}\right) \sqcup \alpha(0), s\left(v_{1}\right)=s\left(v_{1}\right) \sqcup \alpha(0)$, giving $s=\left\{v_{0}=0, v_{1}=\star\right\}$.
- Next the algorithm asks for a model where $v_{0}$ does not equal 0, that is: $\left(v_{0} \times v_{1}=0\right) \wedge\left(v_{0} \neq 0\right)$, which returns satisfiable, updating the partial assignment to: $s=\left\{v_{0}=\star, v_{1}=\star\right\}$.
- All of the variables are $\star$, so the next call to the SAT solver returns unsatisfiable, and the algorithm returns $v_{0}=\star, v_{1}=\star$.

It is concluded that even the optimal propagator will not determine any bits for the example.
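The iteration of Example 4.12 can be mimicked with a short sketch. To keep it self-contained, the decision procedure is replaced by brute-force enumeration over the fresh variables, which is our own simplification and not how RSY is run at large bit-widths. The function name `rsy` and the use of `None` for $\perp$ are illustrative assumptions.

```python
from itertools import product


def rsy(phi, num_vars):
    """Join of all models of phi over num_vars Boolean variables.

    phi takes a tuple of 0/1 values and returns True or False.
    Returns a list over '0', '1', '*', or None (bottom) if phi is unsatisfiable.
    """
    s = None                                 # bottom: no model seen so far

    def in_gamma(model):
        # True if the model is already described by the current abstraction s.
        return s is not None and all(a in ("*", str(b)) for a, b in zip(s, model))

    while True:
        # Stand-in for the solver call on phi and not gamma(s).
        model = next((m for m in product((0, 1), repeat=num_vars)
                      if phi(m) and not in_gamma(m)), None)
        if model is None:                    # unsatisfiable: s is the best result
            return s
        if s is None:
            s = [str(b) for b in model]      # alpha of the first model
        else:                                # join with alpha of the new model
            s = [a if a == str(b) else "*" for a, b in zip(s, model)]


# Example 4.12: v0 * v1 = 0 at one bit; no bit of v0 or v1 is determined.
print(rsy(lambda m: (m[0] * m[1]) % 2 == 0, 2))
```

Each iteration either terminates or turns at least one fixed position of `s` into `*`, which reflects the bound on the number of abstraction steps noted above.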

To further test propagators, we generated random tuples at bit-widths between 7 and 256 and tested that the result of our propagators is the same or a superset of the result from RSY.

Using this approach, one defect that we encountered at higher bit-widths was that our implementation of left and right shift relied on the 64-bit machine's semantics (using just the bottom 8 bits of the second argument) rather than the SMT-LIB semantics (using all the bits). So given a 64-bit value left-shifted by a large number with many trailing zeroes, our defective implementation returned the input unchanged, rather than zero as the QF_BV semantics dictates.
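The difference between the two shift semantics can be illustrated as follows. This is an emulation in Python of the behaviour described in the text, not the defective code itself; the function names are hypothetical, and the 8-bit mask simply follows the description above.

```python
MASK64 = (1 << 64) - 1


def shl_smtlib(x: int, s: int) -> int:
    """Left shift at 64 bits where the whole shift amount is used."""
    return (x << s) & MASK64 if s < 64 else 0


def shl_masked(x: int, s: int) -> int:
    """Emulates the defect: only the bottom 8 bits of the shift amount are used."""
    return shl_smtlib(x, s & 0xFF)


x, s = 0x1234, 1 << 8                       # a large shift with trailing zeroes
print(hex(shl_smtlib(x, s)), hex(shl_masked(x, s)))
```

For small shift amounts the two functions agree, which is consistent with the defect surfacing only under random testing at higher bit-widths.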

### 4.9 An Optimal 6-Bit Multiplication Propagator

The multiplication propagators that we have discussed so far are not optimal. In this section we describe a multiplication propagator that is optimal for the least significant $n$ bits (we use $n=6$ ). The idea we present is generally applicable. However, in practice it will often be too slow to be useful.

We start by exhaustively generating assignments. We then apply the multiplication propagators described in subsection 4.4.2 and compare the results with the optimal propagator. Clauses are constructed to assign values that were missed by the multiplication propagators, and added to a SAT solver.

## Example 4.13

Given $(\langle 1 \star\rangle \times\langle 1 \star\rangle=\langle 1 \star\rangle)$, the propagators we have introduced will fail to determine that the least significant bit of the result must be 0. Hence, we remedy this by generating the clause $\left(x_{1} \wedge y_{1} \wedge r_{1}\right) \rightarrow \neg r_{0}$. Note that the left-hand side expresses the literals fixed prior to calling the multiplication propagator.
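The construction of such a clause can be sketched at small bit-widths. The code below is a simplified illustration, not Algorithm 4.6: it assumes the existing propagators fix nothing beyond the input assignment, computes the optimal result by enumeration, and emits one clause per newly fixed bit. Literal names such as `x1` and `-r0` and the helper names are our own.

```python
from itertools import product


def concretise(t: str):
    """All integers described by a ternary string (MSB first)."""
    choices = [("0", "1") if c == "*" else (c,) for c in t]
    return [int("".join(b), 2) for b in product(*choices)]


def optimal_mul(x: str, y: str, r: str):
    """Optimal result for x * y = r by enumeration, or None on conflict."""
    n = len(x)
    allowed = set(concretise(r))
    hits = [(a, b, (a * b) % (1 << n))
            for a in concretise(x) for b in concretise(y)
            if (a * b) % (1 << n) in allowed]
    if not hits:
        return None

    def joined(k):
        columns = zip(*(format(h[k], f"0{n}b") for h in hits))
        return "".join(c[0] if len(set(c)) == 1 else "*" for c in columns)

    return joined(0), joined(1), joined(2)


def fixed_bits(name: str, ternary: str):
    n = len(ternary)
    return {f"{name}{n - 1 - p}": v for p, v in enumerate(ternary) if v != "*"}


def explain(x: str, y: str, r: str):
    """Clauses stating: the bits fixed on entry imply each newly fixed bit."""
    best = optimal_mul(x, y, r)
    if best is None:
        return []                            # conflicts are not handled here
    before = {**fixed_bits("x", x), **fixed_bits("y", y), **fixed_bits("r", r)}
    after = {**fixed_bits("x", best[0]), **fixed_bits("y", best[1]),
             **fixed_bits("r", best[2])}
    # Negate the premise literals to turn the implication into a clause.
    premise = [("-" if v == "1" else "") + lit for lit, v in sorted(before.items())]
    return [premise + [("" if after[lit] == "1" else "-") + lit]
            for lit in sorted(after) if lit not in before]


print(explain("1*", "1*", "1*"))
```

For the assignment of Example 4.13 this yields a single clause of four literals that forces $r_{0}$ to 0 once $x_{1}$, $y_{1}$ and $r_{1}$ are all 1.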

The algorithm we use is shown in Algorithm 4.6. Because implementations of two watched literals [MMZ$^{+}$01] and unit propagation are so efficient, we precalculate clauses that give the effect of the optimal propagator when combined with the other propagators. We use the SAT solver, with just unit propagation and without search, as a multiplication propagator.

The algorithm compares the effect of the optimal propagator with that of another propagator. Whenever the effects differ, clauses are generated that explain the difference. Immediately after generating the clauses that explain the difference, they are conjoined with the clauses that have already been discovered. The algorithm traverses from low to high bit-widths. At bit-width $i$ where $i \neq 1$, the clauses for an optimal multiplication propagator at bit-width $i-1$ have already been generated.

This approach is ideal for multiplication because the same clauses can be used to propagate on the least significant bits irrespective of the bit-width. The clauses that we calculate for the least significant 6 bits can be applied to multiplications of lesser and greater bit-width.

Running Algorithm 4.6, for $n=6$, produces 56,943 clauses. However, we did not remove all the redundant clauses, so a smaller number of clauses could have the same effect.

The clauses vary in length from 4 to 13 literals, so the probability of any given clause setting values in a random assignment to a multiplication, or detecting a conflict, varies from about 10% to 0.0005%. Each clause requires work to be performed at runtime, but might avoid time spent performing conflict analysis. How useful a clause is depends on how often it is applied, and on how much work it saves minus the cost of applying unit propagation to it.

```
Algorithm 4.6 Generating clauses for \(n\)-bit multiplication
Require: \(n\) the bitwidth
    Create \(\varphi\), a set of CNF clauses
    \(\varphi \leftarrow \emptyset\)
    for \(i \in 1 \ldots n\) do
        Create \(x^{[i]}, y^{[i]}, r^{[i]}\), lists of ternary variables
        for all distinct assignments to: \(\left\langle x^{[i]}, y^{[i]}, r^{[i]}\right\rangle\) do
            repeat
                perform trailing zero propagation
                perform partial product bounds consistency
                perform unit propagation of \(\langle x, y, r\rangle\) on \(\varphi\)
                perform unit propagation of \(\langle y, x, r\rangle\) on \(\varphi\)
            until at a fixed point
            \(\left\langle x_{p}, y_{p}, r_{p}\right\rangle \leftarrow \operatorname{maxPrecise}(x, y, r)\)
            if \(\left\langle x_{p}, y_{p}, r_{p}\right\rangle \neq\langle x, y, r\rangle\) then
                Set difference to the bits fixed in \(\left\langle x_{p}, y_{p}, r_{p}\right\rangle\), but not in \(\langle x, y, r\rangle\)
                for all \(d \in\) difference do
                    Add to \(\varphi\) the implication that the fixed bits of \(\langle x, y, r\rangle\) imply \(d\)
                end for
            end if
        end for
    end for
    Output \(\varphi\)
```


### 4.10 Propagator Evaluation

To measure whether optimal propagators improve upon the results we have already presented (Table 4.2), we applied RSY after applying our imprecise propagators (multiplication, division, and remainder). None of the benchmarks contain signed modulus. We placed a limit of 500 seconds on each call to RSY; 594 instances finished. Of the problems that finished, extra bits were only fixed in one case; however, these extra assignments did not reduce the time taken.

We compare the precision of our propagators versus the precision of STP2's CNF encoding with unit propagation in Table 4.3a-Table 4.3d. Both begin with exactly the same initial assignment. We compare the effect of propagation for varying levels of information content. We compare on 100,000 64-bit values, with a fraction of bits in the operands and result set to 0 or 1. For each run, we generated two random 64-bit operands, with assignments that are random and uniform over 0 and 1. Then we applied the operation to those, saving the result. At the $1 \%$ level, we next set $99 \%$ of the bits of the operands and the result to $\star$, and likewise at the other levels. For example, with $5 \%$ information content, $2.5 \%$ of bits can be expected to be set to 0, $2.5 \%$ to 1, and $95 \%$ to $\star$. The time we give excludes the time to create the random assignments. 'UP time' is the time to run unit propagation on a pre-generated CNF, 'prop. time' is the time to run the propagators, 'UP bits' is the number of extra bits fixed after calling unit propagation. 'prop. bits' is the number of extra bits fixed by the propagators. '\%' is the percentage of the bits fixed by the propagators that were fixed by unit propagation over the bit-blasted representation. Times are given in seconds and were measured on a single core of an Intel Q8400 Linux computer.

| Operation | UP time | prop. time | UP bits | prop. bits | % |
| :--- | ---: | ---: | ---: | ---: | ---: |
| signed greater than equals | 0.06 s | 0.01 s | 4 | 8 | 50% |
| unsigned less than | 0.04 s | 0.06 s | 6 | 9 | 66% |
| equals | 0.09 s | 0.01 s | 333 | 333 | 100% |
| bit-vector xor | 0.06 s | 0.08 s | 1880 | 1880 | 100% |
| bit-vector or | 0.04 s | 0.07 s | 95567 | 95567 | 100% |
| bit-vector and | 0.06 s | 0.08 s | 95245 | 95245 | 100% |
| right shift | 0.13 s | 2.32 s | 1604489 | 1604711 | 99% |
| left shift | 0.15 s | 2.27 s | 1615256 | 1615505 | 99% |
| arithmetic shift | 0.13 s | 3.97 s | 762885 | 767828 | 99% |
| addition | 0.04 s | 0.09 s | 41 | 42 | 97% |
| multiplication | 0.36 s | 0.21 s | 1473 | 1473 | 100% |
| multiplication (max $n=6$) | 0.71 s | 4.34 s | 1437 | 1437 | 100% |
| unsigned division | 3.71 s | 0.99 s | 898513 | 899446 | 99% |
| unsigned remainder | 4.01 s | 9.94 s | 22 | 856 | 2% |
| signed division | 0.17 s | 3.38 s | 642 | 26850 | 2% |
| signed remainder | 0.26 s | 30.68 s | 8 | 286 | 2% |

Table 4.3a: Comparison of unit propagation and bit-blasting at $1 \%$. 100000 iterations at 64 bits.
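The generation of the random partial assignments described in this section can be sketched as follows. This is an illustrative reconstruction from the description, not the harness used for the measurements: the function name `random_partial`, its parameters and the string encoding of ternary values are assumptions, and each bit is kept independently with probability equal to the information level.

```python
import random


def random_partial(op, width=64, information=0.05, rng=random):
    """Return ternary strings (x, y, r) with r = op(x, y) at the given width.

    Each bit of the operands and result is kept with probability
    `information` and replaced by '*' otherwise.
    """
    mask = (1 << width) - 1
    x, y = rng.getrandbits(width), rng.getrandbits(width)
    r = op(x, y) & mask                      # concrete result, truncated to width

    def hide(value):
        bits = format(value, f"0{width}b")
        return "".join(b if rng.random() < information else "*" for b in bits)

    return hide(x), hide(y), hide(r)


x, y, r = random_partial(lambda a, b: a * b, information=0.05,
                         rng=random.Random(2012))
print(x, y, r, sep="\n")
```

Because the concrete operands are uniform, roughly half of the retained bits are 0 and half are 1, matching the $2.5 \%$ / $2.5 \%$ / $95 \%$ split given for the $5 \%$ level.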

In Table 4.3a-Table 4.3d, the "multiplication ( $\max n=6$ )" entry corresponds to the propagator described in section 4.9.

For some operations, namely equals, bit-vector exclusive-or, bit-vector or, and bit-vector and, the CNF encoding assigned all possible values. That is, no initial assignments were found for which the CNF encoding of these operations was not optimal under unit propagation.

The results show that in general the CNF encoding with unit propagation propagates well. It determines $>80 \%$ of the assignments compared to our propagators.

Using unit propagation to obtain a 6-bit optimal propagator for multiplication ran 180 times slower than without (Table 4.3c), but discovered $30 \%$ more assignments. However, at $95 \%$ there was a large cost, taking 30 times longer than without for little gain. More work needs to be done to understand the trade-offs. Note that the "UP time" for both multiplication variants differs, even though the work is the same, because the "multiplication with UP" takes more of the processor's data cache.

| Operation | UP time | prop. time | UP bits | prop. bits | % |
| :--- | ---: | ---: | ---: | ---: | ---: |
| signed greater than equals | 0.10 s | 0.07 s | 202 | 260 | 77% |
| unsigned less than | 0.14 s | 0.06 s | 172 | 243 | 70% |
| equals | 0.06 s | 0.03 s | 7310 | 7310 | 100% |
| bit-vector xor | 0.11 s | 0.11 s | 45700 | 45700 | 100% |
| bit-vector or | 0.14 s | 0.14 s | 464102 | 464102 | 100% |
| bit-vector and | 0.20 s | 0.11 s | 463171 | 463171 | 100% |
| right shift | 0.41 s | 0.98 s | 4713026 | 4715109 | 99% |
| left shift | 0.47 s | 0.78 s | 4716925 | 4718901 | 99% |
| arithmetic shift | 0.49 s | 1.68 s | 4566641 | 4591888 | 99% |
| addition | 0.16 s | 0.12 s | 875 | 1029 | 85% |
| multiplication | 1.49 s | 0.33 s | 7805 | 7816 | 99% |
| multiplication (max $n=6$) | 3.90 s | 20.54 s | 7794 | 7815 | 99% |
| unsigned division | 15.70 s | 3.63 s | 3034619 | 3038246 | 99% |
| unsigned remainder | 15.16 s | 28.05 s | 1071 | 9213 | 11% |
| signed division | 1.27 s | 18.55 s | 34820 | 1264498 | 2% |
| signed remainder | 1.71 s | 121.96 s | 276 | 7407 | 3% |

Table 4.3b: Comparison of unit propagation and bit-blasting at $5 \%$. 100000 iterations at 64 bits.

As the percentage of bits that are assigned increases, unit propagation becomes more time consuming. At the $1 \%$ level, unit propagation for some operations is comparable in speed to the propagators; however, at $95 \%$ the propagators are substantially faster.

The random assignments we produced, for instance in Table 4.3c, are atypical for the shift operations. The second operand of the random assignments at 64 bits has a high probability of being greater than 64, making the result 0. Shifting random assignments over-estimates how well the shift propagators will work on real problems.

The shift operations run faster as the number of bits fixed increases (Tables 4.3a to 4.3d), because the more bits that are assigned, the greater the probability that a bit is fixed that sets the result to zero. Setting the result to zero is fast to detect and perform.

| Operation | $50 \%$ |  |  |  |  |
| :--- | ---: | ---: | ---: | ---: | ---: |
|  | UP time | prop. time | UP bits | prop. bits | $\%$ |
| signed greater than equals | 1.05 s | 0.30 s | 17551 | 22026 | $79 \%$ |
| unsigned less than | 1.05 s | 0.23 s | 17577 | 22045 | $79 \%$ |
| equals | 0.56 s | 0.04 s | 50017 | 50017 | $100 \%$ |
| bit-vector xor | 1.09 s | 0.30 s | 2397869 | 2397869 | $100 \%$ |
| bit-vector or | 0.92 s | 0.29 s | 2798115 | 2798115 | $100 \%$ |
| bit-vector and | 0.78 s | 0.26 s | 2798778 | 2798778 | $100 \%$ |
| right shift | 1.53 s | 0.67 s | 3199611 | 3199611 | $100 \%$ |
| left shift | 1.58 s | 0.42 s | 3200933 | 3200933 | $100 \%$ |
| arithmetic shift | 1.66 s | 1.26 s | 3249594 | 3249594 | $100 \%$ |
| addition | 1.82 s | 0.29 s | 1137672 | 1689706 | $67 \%$ |
| multiplication | 20.45 s | 1.47 s | 148531 | 162680 | $91 \%$ |
| multiplication (max $n=6)$ | 47.03 s | 240.19 s | 148350 | 197310 | $75 \%$ |
| unsigned division | 111.80 s | 7.04 s | 3041509 | 3057485 | $99 \%$ |
| unsigned remainder | 120.01 s | 40.49 s | 763045 | 1122934 | $67 \%$ |
| signed division | 59.78 s | 29.82 s | 1198292 | 3053684 | $39 \%$ |
| signed remainder | 58.61 s | 136.00 s | 210381 | 841675 | $24 \%$ |

Table 4.3c: Comparison of unit propagation and bit-blasting at $50 \%$. 100000 iterations at 64 bits.

The unsigned division propagator fixed $99 \%$ of the possible bits. The signed modulus, remainder and division operations are the slowest; our implementation of these propagators is the least refined.

The clause encoding that the SAT solver uses is incremental, in that if some assignments change, only part of the work is redone. This contrasts with our propagators, which begin again whenever an assignment changes. From Table 4.3d, at $95 \%$ known values our unsigned division propagator reaches a fixed point roughly 35 times faster than unit propagation does. So each unsigned division operation needs to be evaluated at least 35 times with various assignments before the advantage of being incremental begins to outweigh the cost of having many clauses to propagate over. Because of this, propagator-based approaches will be superior to CNF-based approaches on easy problems which do not require the operation to be re-evaluated often.

To measure what percentage of the possible assignments our propagators derived, we ran the RSY algorithm on 1000 instances with various levels of information known. The results are in Tables 4.4a to 4.4c. 'Time' is the time to call both the propagator and RSY on the initial assignment. 'Initial' is the number of bits initially randomly set in the instances; this varies with the percentage of values initially assigned. 'Prop' is the final (not the additional) number of values assigned after running our propagator. 'Max' is the final number of values assigned after running RSY. 'Found' is the percentage of possible additional bits fixed by our propagator, that is, (Prop $-$ Initial) as a percentage of (Max $-$ Initial).

| Operation | $95 \%$ |  |  |  |  |
| :--- | ---: | ---: | ---: | ---: | ---: |
|  | UP time | prop. time | UP bits | prop. bits | $\%$ |
| signed greater than equals | 1.74 s | 0.32 s | 12597 | 13081 | $96 \%$ |
| unsigned less than | 1.73 s | 0.26 s | 12554 | 13057 | $96 \%$ |
| equals | 1.19 s | 0.03 s | 5067 | 5067 | $100 \%$ |
| bit-vector xor | 1.34 s | 0.15 s | 866127 | 866127 | $100 \%$ |
| bit-vector or | 1.10 s | 0.24 s | 599980 | 599980 | $100 \%$ |
| bit-vector and | 1.17 s | 0.23 s | 599369 | 599369 | $100 \%$ |
| right shift | 4.23 s | 0.55 s | 319839 | 319839 | $100 \%$ |
| left shift | 4.14 s | 0.62 s | 320507 | 320507 | $100 \%$ |
| arithmetic shift | 4.81 s | 1.00 s | 324578 | 324578 | $100 \%$ |
| addition | 3.20 s | 0.26 s | 898103 | 907318 | $98 \%$ |
| multiplication | 115.68 s | 10.46 s | 735701 | 929134 | $79 \%$ |
| multiplication (max $n=6)$ | 150.77 s | 359.27 s | 736446 | 929917 | $79 \%$ |
| unsigned division | 196.44 s | 5.83 s | 339424 | 340891 | $99 \%$ |
| unsigned remainder | 205.60 s | 45.01 s | 694296 | 758022 | $91 \%$ |
| signed division | 212.79 s | 14.28 s | 331093 | 346955 | $95 \%$ |
| signed remainder | 218.18 s | 95.40 s | 644806 | 759474 | $84 \%$ |

Table 4.3d: Comparison of unit propagation and bit-blasting at $95 \%$. 100000 iterations at 64 bits.

These tables show that we found no initial assignment for which the propagators we believe to be optimal and the RSY algorithm yield different assignments.

The tables also show why custom implementations of the propagators are necessary, rather than using the RSY algorithm. For instance, 1,000 32-bit bit-vector exclusive-or propagations at $5 \%$ information took 2.92 seconds (Table 4.4b), about 2.9 ms each. In contrast, the bit-vector exclusive-or propagator took 0.11 seconds (Table 4.3b) to propagate 100,000 64-bit assignments, about 1.1 µs each, on operands twice as wide. That is, per bit, the custom implementation was approximately 5,000 times faster.

As the amount of known information increased, the RSY algorithm took less time. As the information increases, there are fewer possible assignments that the unassigned variables can take. This reduces the number of times the RSY algorithm needs to call the SAT solver.

To measure what percentage of the possible assignments our propagators derived at low bit-widths, we compare the results of the CNF encoding and of the propagators against the exhaustive approach (section 4.8). The results are shown in Table 4.5. Each entry gives the percentage of initial assignments for which the propagators, or unit propagation over the CNF, did not produce the same answer as the exhaustive approach. Unlike the prior tables, this does not count the extra assignments; if the propagators fixed 10 of 11 possible assignments then we count this as a failure. Also unlike the prior tables, we generate conflicting assignments, for instance $(1+0)=0$; if unit propagation or the propagator does not report the conflict, this is again considered a failure.

| Operation | $1 \%$ |  |  |  |  |
| :--- | ---: | ---: | ---: | ---: | ---: |
|  | Time | Initial | Prop. | Max | Found |
| signed greater than equals | 2.48 s | 627 | 627 | 627 | $100 \%$ |
| unsigned less than | 2.81 s | 639 | 639 | 639 | $100 \%$ |
| equals | 2.05 s | 658 | 661 | 661 | $100 \%$ |
| bit-vector xor | 3.12 s | 967 | 971 | 971 | $100 \%$ |
| bit-vector or | 2.51 s | 1189 | 1495 | 1495 | $100 \%$ |
| bit-vector and | 2.75 s | 1111 | 1424 | 1424 | $100 \%$ |
| right shift | 7.18 s | 983 | 4687 | 4687 | $100 \%$ |
| left shift | 7.08 s | 906 | 4648 | 4648 | $100 \%$ |
| arithmetic shift | 7.37 s | 1027 | 1937 | 1937 | $100 \%$ |
| addition | 5.77 s | 956 | 956 | 956 | $100 \%$ |
| multiplication | 58.30 s | 911 | 922 | 922 | $100 \%$ |
| unsigned division | 315.37 s | 967 | 3280 | 3285 | $99 \%$ |
| unsigned remainder | 1092.83 s | 964 | 974 | 974 | $100 \%$ |
| signed division | 159.61 s | 947 | 954 | 955 | $87 \%$ |
| signed remainder | 313.00 s | 947 | 947 | 954 | $0 \%$ |

Table 4.4a: Calculating the effect of the best propagator on 1000 32-bit operands; $1 \%$ of bits provided at random. 'Time' is the time to call both the propagator and RSY. 'Initial' is the number of bits initially randomly set in the instances. 'Prop' is the number of assignments after the propagator finished. 'Max' is the number of assignments after RSY finished. 'Found' is the percentage of possible bits fixed by our propagator.

Again we see the propagators we expect to be optimal have a $0 \%$ failure rate.
Unit propagation on the CNF encoding of left and right shift is not optimal in $1.8 \%$ of cases for a bit-width of 5. This shows that the random assignments we produced for the prior tables over-estimated the power of unit propagation on the CNF.

Interestingly, Table 4.5 shows that at 5 bits the CNF with unit propagation has almost 4 times fewer missed assignments than our multiplication propagator. That is, it is considerably better on the lower order bits than the multiplication propagator that we have described. Given the smaller number of missed assignments, this might be a better basis for the approach we described (section 4.9) for building an optimal multiplication propagator.

| Operation | $5 \%$ |  |  |  |  |
| :--- | ---: | ---: | ---: | ---: | ---: |
|  | Time | Initial | Prop. | Max | Found |
| signed greater than equals | 2.51 s | 3231 | 3231 | 3231 | $100 \%$ |
| unsigned less than | 2.50 s | 3240 | 3240 | 3240 | $100 \%$ |
| equals | 1.90 s | 3219 | 3257 | 3257 | $100 \%$ |
| bit-vector xor | 2.92 s | 4927 | 5003 | 5003 | $100 \%$ |
| bit-vector or | 2.32 s | 5651 | 7147 | 7147 | $100 \%$ |
| bit-vector and | 2.58 s | 5662 | 7192 | 7192 | $100 \%$ |
| right shift | 5.57 s | 4797 | 19568 | 19568 | $100 \%$ |
| left shift | 5.48 s | 4769 | 20507 | 20507 | $100 \%$ |
| arithmetic shift | 5.91 s | 5223 | 18120 | 18120 | $100 \%$ |
| addition | 5.49 s | 4836 | 4843 | 4843 | $100 \%$ |
| multiplication | 67.55 s | 4782 | 4831 | 4831 | $100 \%$ |
| unsigned division | 235.35 s | 4735 | 14182 | 14208 | $99 \%$ |
| unsigned remainder | 908.75 s | 4685 | 4737 | 4759 | $70 \%$ |
| signed division | 249.20 s | 4751 | 6167 | 6967 | $63 \%$ |
| signed remainder | 576.09 s | 4895 | 4926 | 4971 | $40 \%$ |

Table 4.4b: Calculating the effect of the best propagator on 1000 32-bit operands; $5 \%$ of bits provided at random.

| Operation | $50 \%$ |  |  |  |  |
| :--- | ---: | ---: | ---: | ---: | ---: |
|  | Time | Initial | Prop. | Max | Found |
| signed greater than equals | 1.31 s | 32666 | 32773 | 32773 | $100 \%$ |
| unsigned less than | 1.29 s | 32890 | 32981 | 32981 | $100 \%$ |
| equals | 1.15 s | 32670 | 33152 | 33152 | $100 \%$ |
| bit-vector xor | 1.24 s | 55748 | 59673 | 59673 | $100 \%$ |
| bit-vector or | 1.19 s | 54143 | 62229 | 62229 | $100 \%$ |
| bit-vector and | 1.23 s | 53981 | 61879 | 61879 | $100 \%$ |
| right shift | 2.31 s | 47979 | 64072 | 64072 | $100 \%$ |
| left shift | 2.34 s | 47955 | 63870 | 63870 | $100 \%$ |
| arithmetic shift | 2.51 s | 48572 | 64493 | 64493 | $100 \%$ |
| addition | 2.00 s | 53523 | 56292 | 56292 | $100 \%$ |
| multiplication | 43.74 s | 48740 | 49471 | 50527 | $40 \%$ |
| unsigned division | 52.98 s | 48352 | 62780 | 62864 | $99 \%$ |
| unsigned remainder | 58.35 s | 50631 | 53064 | 56059 | $44 \%$ |
| signed division | 80.45 s | 48597 | 62480 | 62656 | $98 \%$ |
| signed remainder | 105.47 s | 50336 | 52402 | 55361 | $41 \%$ |

Table 4.4c: Calculating the effect of the best propagator on 1000 32-bit operands; $50 \%$ of bits provided at random.

Table 4.5 shows that unit propagation applied to STP2's 2-bit addition CNF is not maximally precise. After unit propagation, there are 16 assignments which are not maximally precise. These assignments, excluding those equivalent via commutativity, are:

$$
\begin{aligned}
& (\langle 0 \star\rangle+\langle 0 \star\rangle)=\langle 00\rangle \\
& (\langle 0 \star\rangle+\langle\star \star\rangle)=\langle 01\rangle \\
& (\langle 0 \star\rangle+\langle 0 \star\rangle)=\langle\star 1\rangle \\
& (\langle 0 \star\rangle+\langle\star \star\rangle)=\langle 11\rangle \\
& (\langle 0 \star\rangle+\langle 1 \star\rangle)=\langle\star 1\rangle \\
& (\langle 0 \star\rangle+\langle 1 \star\rangle)=\langle 10\rangle \\
& (\langle\star \star\rangle+\langle 1 \star\rangle)=\langle 01\rangle \\
& (\langle 1 \star\rangle+\langle 0 \star\rangle)=\langle\star 1\rangle \\
& (\langle 1 \star\rangle+\langle 0 \star\rangle)=\langle 10\rangle \\
& (\langle 1 \star\rangle+\langle 1 \star\rangle)=\langle 00\rangle \\
& (\langle\star \star\rangle+\langle 1 \star\rangle)=\langle 11\rangle \\
& (\langle 1 \star\rangle+\langle 1 \star\rangle)=\langle\star 1\rangle
\end{aligned}
$$

In each of the assignments above, the carry is known and can be used to deduce additional bits. Note, these assignments are particular to STP2. For instance, Minisat+ [ES06] creates a CNF for addition operations which is maximally precise under unit propagation.
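The first two assignments in the list can be checked by brute force at this bit-width. The following Python sketch enumerates every concrete value consistent with a partial assignment and joins the survivors, in the spirit of the exhaustive approach of section 4.8. It is only an illustration: the function names are hypothetical, and it assumes patterns are written most-significant bit first with `*` standing for $\star$, which is how the examples above appear to be written.

```python
from itertools import product


def concretisations(pattern):
    """All integers matching an MSB-first pattern over '0', '1' and '*'."""
    choices = [("0", "1") if c == "*" else (c,) for c in pattern]
    return [int("".join(bits), 2) for bits in product(*choices)]


def join(values, width):
    """Most precise pattern covering every value: keep a bit only if all agree."""
    out = []
    for pos in range(width - 1, -1, -1):
        seen = {(v >> pos) & 1 for v in values}
        out.append(str(seen.pop()) if len(seen) == 1 else "*")
    return "".join(out)


def exhaustive_add(x, y, z):
    """Maximally precise patterns for x + y = z (modulo 2^width), or None on conflict."""
    width = len(x)
    mask = (1 << width) - 1
    triples = [(a, b, c)
               for a in concretisations(x)
               for b in concretisations(y)
               for c in concretisations(z)
               if (a + b) & mask == c]
    if not triples:
        return None  # no concrete solution: the partial assignment is conflicting
    return tuple(join([t[k] for t in triples], width) for k in range(3))


# The low bits of both operands are forced to 0 in the first listed assignment.
print(exhaustive_add("0*", "0*", "00"))   # ('00', '00', '00')
# The top bit of the second operand is forced to 0 in the second one.
print(exhaustive_add("0*", "**", "01"))   # ('0*', '0*', '01')
```

The enumeration is exponential in the number of unassigned bits, so a check of this kind is only practical at small bit-widths.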

### 4.11 Related Work

Automatically generating propagators avoids the effort of building efficient propagators. It is practical to automatically generate propagators for simple operations, in particular the bit-wise operations (e.g. bit-vector exclusive-or, or bit-vector and). However, the propagators for more complex operations like multiplication and division that are automatically derived are currently too slow to be useful.

| Operation | 1 bit |  | 2 bits |  | 3 bits |  | 4 bits |  | 5 bits |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | Prop | BB | Prop | BB | Prop | BB | Prop | BB | Prop | BB |
| unsigned less than | 0.0 | 0.0 | 0.0 | 1.2 | 0.0 | 3.4 | 0.0 | 5.4 | 0.0 | 6.7 |
| equals | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| bit-vector xor | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| bit-vector or | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| bit-vector and | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| right shift | 0.0 | 0.0 | 0.0 | 1.1 | 0.0 | 2.0 | 0.0 | 2.1 | 0.0 | 1.8 |
| left shift | 0.0 | 0.0 | 0.0 | 1.1 | 0.0 | 2.0 | 0.0 | 2.1 | 0.0 | 1.8 |
| addition | 0.0 | 0.0 | 0.0 | 2.2 | 0.0 | 5.2 | 0.0 | 7.9 | 0.0 | 10.0 |
| subtraction | 0.0 | 0.0 | 0.0 | 2.2 | 0.0 | 5.2 | 0.0 | 7.9 | 0.0 | 10.0 |
| multiplication | 0.0 | 0.0 | 0.1 | 0.4 | 3.3 | 1.0 | 7.0 | 1.9 | 11.1 | 2.8 |
| unsigned division | 40.7 | 0.0 | 22.1 | 1.5 | 13.8 | 3.8 | 10.3 | 3.8 | 8.2 | 4.0 |
| unsigned remainder | 22.2 | 7.4 | 28.8 | 16.5 | 27.7 | 20.3 | 25.3 | 23.4 | 23.5 | 25.3 |

Table 4.5: Percentage of all the possible initial assignments, at different bit-widths, where our propagators ('Prop') or unit propagation over the bit-blasted CNF encoding ('BB') either missed at least one possible assignment or missed a conflicting assignment.

Regehr and Duongsaa [RD06] automatically derive propagators for bit-vector operations on $\mathbf{3}$. The propagators transfer information from operands to results and not vice-versa. Their automatically generated bit-vector xor propagator performs 875,000 32-bit operations per second. However, the multiplication propagator performs only 400 32-bit operations per second. We follow them in testing our propagators by exhaustively taking the join of concrete elements. Regehr and Reid [RR04] also automatically derived $\mathbf{3}$ propagators, but for small bit-widths.

Bardin et al. [BHP10] describe a bit-vector solver with propagators for $\mathbf{3}$ (which they call the BitList domain). If we combined the ability to search and backtrack with the $\mathbf{3}$ propagators that we describe, it would also be a decision procedure. For arithmetic constraints they focus on cheap and correct propagators, as distinct from our work which focuses on precise propagators.

Michel and Van Hentenryck [MV12] give maximally precise propagators for a domain equivalent to $\mathbf{3}$. They give algorithms for the bitwise operations, the comparisons, shifting and addition. They focus on bit-vectors which are shorter than the machine's bit-width, so that the propagators can be implemented efficiently using data-parallel machine instructions. They do not investigate multiplication and allied operations, or give experimental results.
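To illustrate why propagators of this kind map well onto machine words, the following is a minimal Python sketch of a bitwise-and propagator over the three-valued domain. It is neither Michel and Van Hentenryck's algorithm nor STP2's code. The two-mask representation (`ones` for bits known to be 1, `maybe` for bits that may still be 1) and every name in it are assumptions made for illustration.

```python
def parse(pattern):
    """MSB-first pattern over '0', '1', '*' -> (ones, maybe) bit masks."""
    ones = int(pattern.replace("*", "0"), 2)    # bits known to be 1
    maybe = int(pattern.replace("*", "1"), 2)   # bits not known to be 0
    return ones, maybe


def show(value, width):
    """Inverse of parse, for a value that is not in conflict."""
    ones, maybe = value
    chars = []
    for pos in range(width - 1, -1, -1):
        if (ones >> pos) & 1:
            chars.append("1")
        else:
            chars.append("*" if (maybe >> pos) & 1 else "0")
    return "".join(chars)


def propagate_and(x, y, z, width):
    """Propagate z = x & y in both directions; return None on conflict."""
    full = (1 << width) - 1
    (x1, xm), (y1, ym), (z1, zm) = x, y, z
    # Operands to result: z is 1 where both are 1, and 0 where either is 0.
    z1 |= x1 & y1
    zm &= xm & ym
    # Result to operands: z = 1 forces both operands to 1.
    x1 |= z1
    y1 |= z1
    # z = 0 together with one operand known to be 1 forces the other to 0.
    xm &= zm | (~y1 & full)
    ym &= zm | (~x1 & full)
    # A bit that is known to be 1 and known to be 0 is a conflict.
    if (x1 & ~xm) or (y1 & ~ym) or (z1 & ~zm):
        return None
    return (x1, xm), (y1, ym), (z1, zm)


result = propagate_and(parse("1*0*"), parse("***1"), parse("1**0"), 4)
print([show(v, 4) for v in result])   # ['1*00', '1**1', '1*00']
```

Every step is a handful of word-wide logical operations, independent of how many bits are unassigned, which is the property that makes the bitwise propagators cheap.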

Naveh et al. [NRJ$^{+}$07] combine several domains including $\mathbf{3}$, which they call "set of masks", in a constraint propagation solver. Their paper gives only high level information about the propagators used.

Achterberg [Ach07] uses exclusive-or normal form to propagate both ways (between the operands and result) of bit-wise multiplication. Because a representation in exclusive-or normal form can grow exponentially large, a limit is placed on the size of formulae. This is an alternative approach to our work in section 4.9 for setting the least significant $\star$ bits involved in a multiplication.

Budiu et al. [BSWG00, BG00] use bit propagation, propagating from operands to result, to reduce the size of circuits generated by a compiler for reconfigurable hardware. The technical report gives pseudo-code for the propagators, which propagate other information, but not bit propagation information, from result to operands.

Strided intervals limit the values between the upper and lower bound to also have a fixed number of trailing bits set to zero. Balakrishnan [Bal07] describes strided-interval propagators for some bit-vector operations. Like a bitwise analysis, strided intervals can determine that all the trailing bits are zero.

Jung et al. [JBKW08] convert a type of graph that subsumes BDDs to maximally precise CNF clauses. This is an alternative approach for achieving a maximally precise CNF encoding of operations like addition.

### 4.12 Conclusion

We described a theory-level analysis to determine the assignment some variables must take. We focused on building precise propagators; in particular, we described the implementation of the equality, addition, multiplication and unsigned division propagators.

Our results show that our propagators can propagate information that unit propagation over the CNF does not. Using the propagators resulted in about $10 \%$ fewer failures on the test problems we chose. This shows that useful information is "lost in translation" to CNF. Reasoning at the theory-level may avoid the need to encode some operations to CNF, which for large bit-widths may overwhelm the SAT solver.

Using RSY to produce the effect of propagators allowed us to measure whether bit propagation simplified problems before investing the effort to build efficient implementations of the propagators. We believe this is an under-appreciated advantage of RSY. By solving a problem with bit propagation performed by RSY, and then subtracting the time spent in RSY, we obtain the solving time that would result if the propagators took zero time. Comparing this with the time taken to solve the problem without bit propagation gives the maximum possible speedup from applying bit propagation.
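The measurement just described can be written as a single ratio. The symbols are introduced here only for illustration and do not appear elsewhere in this chapter: $T_{\text{none}}$ is the time to solve a problem without bit propagation, $T_{\text{RSY}}$ is the total time to solve it when RSY performs the bit propagation, and $t_{\text{RSY}}$ is the part of $T_{\text{RSY}}$ spent inside RSY.

$$
\text{speedup}_{\max} = \frac{T_{\text{none}}}{T_{\text{RSY}} - t_{\text{RSY}}}
$$

Any real propagator takes non-zero time, so a speedup measured with hand-built propagators should not exceed this bound, provided they fix the same bits that RSY does.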

The difficulty in applying RSY is that our implementation is slow for some operations. Tables 4.4a to 4.4c show the time taken to use RSY on 32-bit values, which can be compared to the results in Tables 4.3a to 4.3d. For instance, our implementation of RSY propagates arithmetic left shift more than 180 times slower than our hand-crafted shift propagator.

The C++ source code for our implementation is included in the publicly available STP source code repository.

After analysis, some rewrites may be possible. For example, given $(t_{0} \geq_{s} t_{1})$, where the top bit of both $t_{0}$ and $t_{1}$ is fixed, the signed comparison can be replaced by a cheaper unsigned comparison (or by a constant, if the two top bits differ). We do not perform such strength reductions.
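The soundness of this particular rewrite is easy to confirm exhaustively at a small bit-width. The Python sketch below is an illustration only, with hypothetical function names. It checks that when the two top bits agree the signed and unsigned comparisons coincide, and that when they differ the signed comparison is a constant determined by the top bit of $t_{0}$.

```python
def to_signed(value, width):
    """Interpret an unsigned width-bit value as two's complement."""
    return value - (1 << width) if value >> (width - 1) else value


def reduction_is_sound(width):
    """Check the signed-to-unsigned rewrite for every pair of width-bit values."""
    top = 1 << (width - 1)
    for t0 in range(1 << width):
        for t1 in range(1 << width):
            signed_ge = to_signed(t0, width) >= to_signed(t1, width)
            if (t0 & top) == (t1 & top):
                # Same sign bit: the unsigned comparison gives the same answer.
                if signed_ge != (t0 >= t1):
                    return False
            else:
                # Different sign bits: true exactly when t0 is non-negative.
                if signed_ge != ((t0 & top) == 0):
                    return False
    return True


print(reduction_is_sound(4))
```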

Adding redundant clauses to improve unit propagation on CNF encodings is possible; Eén and Sörensson [ES06] use redundant clauses to improve unit propagation of their bit-vector addition encoding. The CNF encoding of some operations could be improved by such clauses. The most promising candidate is addition, which has a simple, repeating structure. If extra redundant clauses were added, then unit propagation on the CNF would improve, reducing an advantage of the propagator-based approach. Unfortunately, adding such redundant clauses into the bit-blasted (section 3.8) AIG representation can be difficult, because the AIGs are good at eliminating such redundancy.

The preference for propagators built for constraint solving is for them to be cheap and correct, rather than precise. Because in bit propagation the propagators are only run twice to global fixed point, we knew that the extra runtime cost of precise propagators was not onerous. However, we believe it is not widely appreciated how straightforward it is to measure a propagator's precision for small domain sizes, as we have done. Unless optimal propagators are first built, then measured, it cannot be conclusively decided whether or not the extra time spent ensuring maximal precision is justified.

In this chapter we showed that:

- applying bit propagation as a pre-processor speeds up solving bit-vector problems;
- building optimal $\mathbf{3}$ propagators for many operations is practical; and
- measuring the precision of propagators on small domain sizes is a good way to test that they are precise.

Propagators are promising for bit-vector solving because they can express the same logic more compactly than is possible in CNF. In particular, improving the amount of propagation of CNF based representations might require impractically many redundant clauses to be added. Propagators consume less memory because their reasoning does not need to be entirely statically expressed in CNF.

## Building a Better Array Solver

In the previous chapters, we described how to build a better bit-vector solver. In this chapter we describe how to build a better array solver. STP2 is an open-source solver for a fragment (without array extensionality) of the QF_ABV language, a combined quantifier-free theory of bit-vectors and arrays with extensionality.

Efficiently solving array problems is important for software verification. STP2 did not compete in the QF_ABV division of SMT-COMP 2011 because it does not implement array extensionality, but it is competitive for QF_ABV problems without extensionality (see subsection 5.6.2).

In this chapter we compare approaches to enforcing the function congruence constraint $(\mathcal{F} C C)$. The $\mathcal{F} C C$ ensures that a relation is in fact a function, that is,

$$
\forall(i, j):(i=j) \Longrightarrow(f(i)=f(j))
$$

We consider the $\mathcal{F} C C$ just for unary functions. An instance of the $\mathcal{F} C C$ enforces the $\mathcal{F} C C$ for two particular applications of the same function. Because STP2 does not handle quantifiers, the $\mathcal{F} C C$ is instantiated for each pair of function applications. If there are $l$ applications of function $f$, then in the worst case the $\mathcal{F} C C$ is instantiated for each distinct pair of function applications, that is, $\frac{l(l-1)}{2}$ times. We refer to the exhaustive approach of asserting all $O\left(l^{2}\right)$ of these $\mathcal{F} C C$ instances as Ackermannization ( $\mathcal{A c k}$ ) ([Ack54] cited by [dMB08a]).
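As a small illustration of the quadratic growth, the following Python sketch lists one $\mathcal{F} C C$ instance for each unordered pair of applications of $f$. The textual form of the instances and the function name are assumptions made for the example; they are not STP2's internal representation.

```python
from itertools import combinations


def ackermann_instances(indices):
    """One FCC instance per unordered pair of applications f(i), f(j)."""
    return [f"({i} = {j}) => (f({i}) = f({j}))"
            for i, j in combinations(indices, 2)]


instances = ackermann_instances(["i0", "i1", "i2", "i3"])
for instance in instances:
    print(instance)
# Four applications give 4 * 3 / 2 = 6 instances.
print(len(instances))
```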

To avoid necessarily asserting quadratically many $\mathcal{F} C C$ instances via $\mathcal{A c k}$, we use the counter-example guided abstraction-refinement approach popularised by Clarke et al. [CGJ$^{+}$00]; we refer to the approach as $\mathcal{A}bsref$. $\mathcal{A}bsref$ omits the $\mathcal{F} C C$ when the problem is initially asserted to the SAT solver. If the SAT solver returns a candidate model, then, and only then, outside the SAT solver, is the $\mathcal{F} C C$ checked. If unsatisfied $\mathcal{F} C C$ instances exist, then they are asserted to the SAT solver, otherwise the problem is satisfiable and work stops. Iterating between checking the $\mathcal{F} C C$ and SAT solving continues until either the SAT solver generates a model that satisfies the $\mathcal{F} C C$ (even though all the $\mathcal{F} C C$ instances might not be asserted), or until the SAT solver establishes that the problem is unsatisfiable. Leaving out the $\mathcal{F} C C$ is abstraction; checking the $\mathcal{F} C C$ and iteratively adding in unsatisfied $\mathcal{F} C C$ instances is refinement. The problem is initially under-constrained, then iteratively refined until, in the worst case, it is equisatisfiable with the result of $\mathcal{A c k}$.
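The loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not STP's implementation: a brute-force search over small bit-vectors stands in for the SAT solver, each application $f(i)$ is abstracted by a fresh variable, and all names (`solve`, `fcc_instance`, `abstraction_refinement`) are hypothetical.

```python
from itertools import combinations, product


def solve(variables, width, constraints):
    """Brute-force stand-in for a SAT solver: first model found, or None."""
    for values in product(range(2 ** width), repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(holds(model) for holds in constraints):
            return model
    return None


def fcc_instance(app_a, app_b):
    """(i = j) => (f(i) = f(j)) for two abstracted applications."""
    (ia, ra), (ib, rb) = app_a, app_b
    return lambda m: m[ia] != m[ib] or m[ra] == m[rb]


def abstraction_refinement(variables, width, constraints, applications):
    """applications: (index variable, result variable) pairs, one per f(index)."""
    asserted = list(constraints)        # abstraction: no FCC instances yet
    rounds = 0
    while True:
        rounds += 1
        model = solve(variables, width, asserted)
        if model is None:
            return None, rounds         # unsatisfiable
        violated = [fcc_instance(a, b)
                    for a, b in combinations(applications, 2)
                    if not fcc_instance(a, b)(model)]
        if not violated:
            return model, rounds        # candidate model respects the FCC
        asserted.extend(violated)       # refinement


variables = ["i", "j", "f_i", "f_j"]
apps = [("i", "f_i"), ("j", "f_j")]
differ = lambda m: m["f_i"] != m["f_j"]
same_index = lambda m: m["i"] == m["j"]

# Satisfiable: the first candidate has i = j, so one instance is added.
print(abstraction_refinement(variables, 1, [differ], apps))
# Unsatisfiable once the indices are forced to be equal.
print(abstraction_refinement(variables, 1, [differ, same_index], apps))
```

Each round adds at least one instance that the previous candidate model violated, so the loop terminates after at most as many rounds as there are pairs of applications, plus one.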

Another approach, which we refer to as Delayed Congruence Instantiation (DCI), builds the ability to generate $\mathcal{F} C C$ instances into the SAT solver. This approach avoids the expense of the $\mathcal{A}bsref$ refinement loop, because enforcing the $\mathcal{F} C C$ is performed inside the SAT solver. It is tailored to assert instances of the $\mathcal{F} C C$ close to when they have an effect. There is no widely used name for this type of approach; we were introduced to it as "Lazy Clause Generation" [OSC09]. We give the details of our implementation in section 5.5. In short, whenever the SAT solver has assigned every bit of the operands of two function applications, and the two assignments are identical, an instance of the $\mathcal{F} C C$ for those two function applications is asserted without leaving the SAT solver.
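The trigger condition in the last sentence can be sketched as follows. This is not the implementation of section 5.5; it only assumes that the solver exposes the current value of each index bit as 0, 1 or unassigned (`None` here), and the function and variable names are hypothetical.

```python
from itertools import combinations


def newly_triggered(index_bits, already_asserted):
    """Pairs of applications whose index bits are all assigned and identical."""
    complete = {name: bits for name, bits in index_bits.items()
                if None not in bits}
    pairs = []
    for a, b in combinations(sorted(complete), 2):
        if complete[a] == complete[b] and (a, b) not in already_asserted:
            pairs.append((a, b))
    return pairs


partial_assignment = {
    "f(i)": (1, 0, 1),
    "f(j)": (1, 0, 1),
    "f(k)": (1, None, 1),   # one bit still unassigned: no instance yet
}
print(newly_triggered(partial_assignment, set()))   # [('f(i)', 'f(j)')]
```

Waiting until both indices are completely assigned means an instance is only generated once the current assignment makes its premise $(i=j)$ true.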

We operate on first-order quantifier-free formulae with single-dimensional bit-vector arrays without extensionality. Both the indices and the values stored in the arrays are fixed-width bit-vectors, that is, they have finite width. It is not possible to use either arrays or propositions as indices or values. An array is a total function from bit-vectors to bit-vectors of a potentially different bit-width.

Let $a$ be a variable ranging over arrays. We use the bit-widths of arrays as superscripts on the variable's name; for instance, $a_{\ell}^{[2: 3]}$ maps each of the four 2-bit vectors to a 3-bit vector. In general, we do not distinguish between array literals and array terms. When it is necessary, we mark array literals with a subscript $\ell$, for example $a_{\ell}$. The functions added for the array theory are:

- Select returns the bit-vector stored at an index; that is, select $\left(a^{[n: m]}, t^{[n]}\right)$ returns the value from $a^{[n: m]}$ at index $t^{[n]}$.
- Store non-destructively creates a new array that replaces the value at an index; store $\left(a^{[n: m]}, t_{0}^{[n]}, t_{1}^{[m]}\right)$ creates a new array the same as $a^{[n: m]}$ except that at index $t_{0}^{[n]}$ the value $t_{1}^{[m]}$ is stored.

More formally the axiom introduced is:

$$
\begin{equation*}
\forall(a, i, j, e)(\operatorname{select}(\operatorname{store}(a, i, e), j)=\operatorname{ite}(i=j, e, \operatorname{select}(a, j))) \tag{5.1}
\end{equation*}
$$

This is the "forwarding property" of hardware verification research: that a select returns the value most recently stored at an index.

- An array if-then-else (ITE) ite $\left(p, a_{0}^{[n: m]}, a_{1}^{[n: m]}\right)$ returns $a_{0}^{[n: m]}$ if $p$ is 1 , and $a_{1}^{[n: m]}$ otherwise.

Select is sometimes called "read", and store is sometimes called "write". Store and array-ITE can be used to create nested array terms.
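The semantics of select and store, and Equation 5.1, can be checked exhaustively on tiny arrays. The following Python sketch is an illustration under our own modelling assumption: an array $a^{[n: m]}$ is a tuple with one entry for each of the $2^{n}$ indices, which makes it a total function but is only feasible for very small $n$.

```python
from itertools import product


def select(array, index):
    """Return the value stored at index."""
    return array[index]


def store(array, index, value):
    """Non-destructively build a new array that differs only at index."""
    return array[:index] + (value,) + array[index + 1:]


def axiom_holds(n, m):
    """Check select(store(a, i, e), j) = ite(i = j, e, select(a, j)) for all a, i, j, e."""
    size = 2 ** n
    values = range(2 ** m)
    for array in product(values, repeat=size):
        for i, j, e in product(range(size), range(size), values):
            lhs = select(store(array, i, e), j)
            rhs = e if i == j else select(array, j)
            if lhs != rhs:
                return False
    return True


original = (0, 0, 0, 0)
updated = store(original, 2, 1)
print(original, updated)      # the original array is unchanged
print(axiom_holds(2, 1))
```

A solver cannot represent arrays this way when there are $2^{32}$ or $2^{64}$ indices, which is why the approaches in this chapter reason about select and store terms symbolically.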

The QF_ABV theory also includes the extensionality axiom, which states that two arrays are equal if they hold the same value at each index:

$$
\begin{equation*}
\left(a_{0}^{[n: m]}=a_{1}^{[n: m]}\right) \Leftrightarrow \forall(i)\left(\operatorname{select}\left(a_{0}^{[n: m]}, i\right)=\operatorname{select}\left(a_{1}^{[n: m]}, i\right)\right) \tag{5.2}
\end{equation*}
$$

However, we do not allow equality or disequality to be applied to terms of array type. Some array problems with extensionality can be rewritten to equisatisfiable array problems without extensionality by using:

$$
\left(a_{0}^{[n: m]} \neq a_{1}^{[n: m]}\right) \Longrightarrow \exists(i)\left(\operatorname{select}\left(a_{0}^{[n: m]}, i\right) \neq \operatorname{select}\left(a_{1}^{[n: m]}, i\right)\right)
$$

Software verification problems often model memory as a single array with $2^{32}$ or $2^{64}$ possible indices. So, to be practical, a solver must use memory or time which is sub-linear in the number of possible indices.

We use three phases to solve combined bit-vector and array problems. First, the array part of the problem is simplified by applying rewrite rules. Second, array-ITE and store expressions are removed. Third, the $\mathcal{F} C C$ instances corresponding to the remaining select expressions are instantiated by one of three approaches to be discussed below.

Of the three approaches, the $\mathcal{A c k}$ approach that we present is well known. The $\mathcal{A} b s r e f$ approach that we describe was implemented in STP 0.1; we provide a more complete description of the algorithm than has previously been published, but we did not develop the approach. The $\mathcal{D C I}$ approach that we present is novel.

### 5.1 Simplifying

STP2 applies the following rewrite rules to simplify expressions containing arrays. These simplifications are applied when expressions are created, at what we call creation-time (section 3.3). The variables below are typed term variables which match arbitrary expressions, so that distinct variable names can match distinct syntactic expressions. $p$ is an arbitrary expression of propositional type:

$$
\begin{aligned}
& \operatorname{ite}\left(1, a_{0}^{[n: m]}, a_{1}^{[n: m]}\right) \triangleright a_{0}^{[n: m]} \\
& \operatorname{ite}\left(0, a_{0}^{[n: m]}, a_{1}^{[n: m]}\right) \triangleright a_{1}^{[n: m]} \\
& \operatorname{ite}\left(p, a_{0}^{[n: m]}, a_{0}^{[n: m]}\right) \triangleright a_{0}^{[n: m]} \\
& \operatorname{ite}\left(p, a_{0}^{[n: m]}, \operatorname{ite}\left(p, a_{1}^{[n: m]}, a_{2}^{[n: m]}\right)\right) \triangleright \operatorname{ite}\left(p, a_{0}^{[n: m]}, a_{2}^{[n: m]}\right) \\
& \operatorname{ite}\left(p, \operatorname{ite}\left(p, a_{0}^{[n: m]}, a_{1}^{[n: m]}\right), a_{2}^{[n: m]}\right) \triangleright \operatorname{ite}\left(p, a_{0}^{[n: m]}, a_{2}^{[n: m]}\right) \\
& \operatorname{ite}\left(\operatorname{not}(p), a_{0}^{[n: m]}, a_{1}^{[n: m]}\right) \triangleright \operatorname{ite}\left(p, a_{1}^{[n: m]}, a_{0}^{[n: m]}\right) \\
& \operatorname{store}\left(\operatorname{store}\left(a_{0}^{[n: m]}, i^{[n]}, j^{[m]}\right), i^{[n]}, k^{[m]}\right) \triangleright \operatorname{store}\left(a_{0}^{[n: m]}, i^{[n]}, k^{[m]}\right) \\
& \operatorname{store}\left(a_{0}^{[n: m]}, t_{0}^{[n]}, \operatorname{select}\left(a_{0}^{[n: m]}, t_{0}^{[n]}\right)\right) \triangleright a_{0}^{[n: m]} \\
& \operatorname{select}\left(\operatorname{store}\left(a_{0}^{[n: m]}, t_{0}^{[n]}, t_{1}^{[m]}\right), t_{0}^{[n]}\right) \triangleright t_{1}^{[m]} \\
& \operatorname{select}\left(\operatorname{store}\left(a_{0}^{[n: m]}, t_{0}^{[n]}, t_{1}^{[m]}\right), t_{2}^{[n]}\right) \triangleright \operatorname{select}\left(a_{0}^{[n: m]}, t_{2}^{[n]}\right), \\
& \quad \text { when } t_{0}^{[n]}=t_{2}^{[n]} \text { is equivalent to } 0
\end{aligned}
$$

The last rule is the most complicated. It is a conditional rewrite that checks if the index of a select and the index of a store must be different; if so the store is discarded. This rule applies often, especially when the indices of the select and the store are different constants.

Even for an expression with sharing, these rules do not increase the total number of expressions. They are what we call sharing-aware (section 2.7). If there are already $l$ distinct expressions, after applying these rewrite rules there will be $l$ or fewer expressions.
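The store and select rules can be illustrated with a small Python sketch that applies them as terms are built. The tuple encoding of terms and the helper `definitely_different` are assumptions made only for this illustration; they are not STP2's data structures, and the check for unequal indices is reduced here to the case of distinct constants.

```python
# Terms are nested tuples: ("store", a, i, v) and ("select", a, i).
# Integers stand for constants and strings for variables.
# This encoding is hypothetical and serves only the illustration.

def definitely_different(i, j):
    """Conservative check: True only when both indices are distinct constants."""
    return isinstance(i, int) and isinstance(j, int) and i != j

def mk_store(a, i, v):
    # store(store(a, i, j), i, k) |> store(a, i, k)
    if isinstance(a, tuple) and a[0] == "store" and a[2] == i:
        return mk_store(a[1], i, v)
    # store(a, i, select(a, i)) |> a
    if v == ("select", a, i):
        return a
    return ("store", a, i, v)

def mk_select(a, i):
    while isinstance(a, tuple) and a[0] == "store":
        _, base, j, v = a
        if i == j:
            # Same syntactic index: the select reads the written value.
            return v
        if definitely_different(i, j):
            # The indices can never be equal: discard the store.
            a = base
            continue
        break
    return ("select", a, i)
```

For instance, `mk_select(mk_store("a", 3, "x"), 4)` skips the store and yields `("select", "a", 4)`, whereas a store at a variable index is kept because the indices might be equal.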

STP2 also replaces select expressions with both a constant index and a constant result throughout the problem. So if $\operatorname{select}\left(a_{\ell}, c_{0}\right)=c_{1}$ is conjoined with the root node, and $c_{0}$ and $c_{1}$ are both constants, then $\operatorname{select}\left(a_{\ell}, c_{0}\right)$ is replaced by $c_{1}$ throughout the problem.

### 5.2 Removing Store and Array-ITEs

Without extensionality, it is straightforward to remove store and array-ITE terms by repeatedly applying Equation 5.3 and Equation 5.4 below to a fixed point. Applying these rewrite rules generates equivalent expressions that contain the array theory axiom (Equation 5.1). After applying these rules the only array terms remaining are select terms. This considerably simplifies the later algorithms; the select terms require just the $\mathcal{F} C C$ to be enforced between them. Applying the equations gives a reduction from the theory of arrays to the theory of uninterpreted functions. Later, when converting to CNF, we further reduce from the theory of uninterpreted functions and bit-vectors to propositional logic.

ITE-lifting removes array-ITEs by converting them to term-ITEs:

$$
\begin{equation*}
\operatorname{select}\left(\operatorname{ite}\left(p, a_{0}^{[n: m]}, a_{1}^{[n: m]}\right), t^{[n]}\right) \triangleright \operatorname{ite}\left(p, \operatorname{select}\left(a_{0}^{[n: m]}, t^{[n]}\right), \operatorname{select}\left(a_{1}^{[n: m]}, t^{[n]}\right)\right) \tag{5.3}
\end{equation*}
$$

Select-over-store elimination, which is Equation 5.1 expressed as a rewrite rule, removes the store function:

$$
\begin{equation*}
\operatorname{select}\left(\operatorname{store}\left(a^{[n: m]}, t_{0}^{[n]}, t_{2}^{[m]}\right), t_{1}^{[n]}\right) \triangleright \operatorname{ite}\left(t_{0}^{[n]}=t_{1}^{[n]}, t_{2}^{[m]}, \operatorname{select}\left(a^{[n: m]}, t_{1}^{[n]}\right)\right) \tag{5.4}
\end{equation*}
$$

A disadvantage of this approach is that, in the worst case, it will introduce a quadratic number of extra expressions. Each application of Equation 5.4 may create a new, unique, term-ITE. Given an expression with sharing, several select expressions can reference either the same ITE or store term. So, when Equation 5.4 is applied it will create a distinct equality term between the read index of the select term and write index of the store term. Roughly, given $l$ select terms that share a reference to a chain of $m$ store terms, after applying Equation 5.4 to a fixed point, $l \times m$ ITE expressions are introduced. We investigate this quadratic blow-up in subsection 5.6.4.
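The $l \times m$ growth can be reproduced on a toy term representation. The following Python sketch is only an illustration under assumed names (`eliminate`, `collect_ites`, `blow_up` and the tuple encoding are hypothetical); it applies Equation 5.4 to $l$ selects with distinct read indices over one shared chain of $m$ stores and counts the distinct ITE terms produced.

```python
def eliminate(term):
    """Apply Equation 5.4 until no store remains directly under the select."""
    if term[0] == "select" and isinstance(term[1], tuple) and term[1][0] == "store":
        _, (_, base, write_index, value), read_index = term
        return ("ite", ("=", write_index, read_index), value,
                eliminate(("select", base, read_index)))
    return term

def collect_ites(term, seen):
    """Add every distinct ITE node reachable from term to the set seen."""
    if isinstance(term, tuple):
        if term[0] == "ite":
            seen.add(term)
        for child in term[1:]:
            collect_ites(child, seen)
    return seen

def blow_up(l, m):
    """Number of distinct ITEs created for l selects over a chain of m stores."""
    chain = "a"
    for k in range(m):
        chain = ("store", chain, f"w{k}", f"x{k}")
    ites = set()
    for r in range(l):
        collect_ites(eliminate(("select", chain, f"r{r}")), ites)
    return len(ites)
```

Because every ITE mentions its own read index, none of them can be shared between different selects, so `blow_up(3, 4)` counts twelve distinct ITEs.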

Applying Equation 5.3 to shared expressions has exponential time complexity if caching is not performed. The number of paths through array-ITEs can grow exponentially in the worst case.

## Example 5.1

As an example of the worst-case behaviour, consider the following conjuncts, where $S_{0}$ is the root node, and $S_{1}$ to $S_{4}$ are syntactic variables:

$$
\begin{aligned}
& S_{0}=\text { ite }\left(p_{0}, S_{1}, S_{2}\right) \\
& S_{1}=\text { ite }\left(p_{1}, S_{2}, S_{3}\right) \\
& S_{2}=\text { ite }\left(p_{2}, S_{3}, S_{4}\right) \\
& S_{3}=\operatorname{ite}\left(p_{3}, S_{4}, a_{0}^{[n: m]}\right) \\
& S_{4}=\operatorname{ite}\left(p_{4}, a_{0}^{[n: m]}, a_{1}^{[n: m]}\right)
\end{aligned}
$$

There are 8 distinct paths to reach $a_{0}^{[n: m]}$. If the same structure is extended to $S_{k}$, then the number of distinct paths to reach $a_{0}^{[n: m]}$ equals the $k^{\text {th }}$ Fibonacci number.

Our implementation performs caching to avoid repeated rewrites of the same expression. If an array-ITE has $l$ select expressions as ancestors, then the array-ITE will be rewritten by Equation 5.3 at most $l$ times.
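The effect of caching on Example 5.1 can be sketched in Python. The dictionary `CHILDREN` is an assumed encoding of the example's array-ITEs (the conditions $p_{i}$ are omitted because they do not affect the count); the first function walks every path, the second expands each node once.

```python
from functools import lru_cache

# Children of each array-ITE in Example 5.1; "a0" and "a1" are array variables.
CHILDREN = {
    "S0": ("S1", "S2"),
    "S1": ("S2", "S3"),
    "S2": ("S3", "S4"),
    "S3": ("S4", "a0"),
    "S4": ("a0", "a1"),
}

def paths_to(node, target="a0"):
    """Count root-to-target paths by visiting every path explicitly."""
    if node == target:
        return 1
    return sum(paths_to(child, target) for child in CHILDREN.get(node, ()))

@lru_cache(maxsize=None)
def paths_to_cached(node, target="a0"):
    """Same count, but each node is expanded only once."""
    if node == target:
        return 1
    return sum(paths_to_cached(child, target) for child in CHILDREN.get(node, ()))
```

Both functions return 8 for `"S0"`; only the uncached version does work proportional to the number of paths.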

In the worst case, the total number of applications of Equation 5.3 and Equation 5.4 is quadratic. It is bounded by the total number of select expressions multiplied by the total number of expressions.

## Example 5.2

Applying Equation 5.3 to the expression:

$$
\operatorname{select}\left(\text { ite }\left(b_{0}, \text { ite }\left(b_{1}, a_{0}, a_{1}\right), a_{1}\right), t\right)
$$

gives:

$$
\operatorname{ite}\left(b_{0}, \operatorname{ite}\left(b_{1}, \operatorname{select}\left(a_{0}, t\right), \operatorname{select}\left(a_{1}, t\right)\right), \operatorname{select}\left(a_{1}, t\right)\right)
$$

The initial expression has two references to the shared array-term $a_{1}$. Note, likewise, the rewritten expression has two references to the term $\operatorname{select}\left(a_{1}, t\right)$. The result of applying Equation 5.3 has similar sharing to the initial expression.
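A minimal Python sketch of Equation 5.3 with a cache reproduces this example. The function `lift` and the tuple encoding are assumptions for illustration; the cache is valid for a single index term only, and a real implementation would key it on node identity rather than on structural equality of tuples.

```python
def lift(array, index, cache=None):
    """Push select(., index) through array-ITEs (Equation 5.3), caching per node."""
    if cache is None:
        cache = {}
    if not (isinstance(array, tuple) and array[0] == "ite"):
        return ("select", array, index)
    if array not in cache:
        _, p, then_array, else_array = array
        cache[array] = ("ite", p,
                        lift(then_array, index, cache),
                        lift(else_array, index, cache))
    return cache[array]
```

Applied to the nested tuple for $\operatorname{ite}(b_{0}, \operatorname{ite}(b_{1}, a_{0}, a_{1}), a_{1})$ and index `"t"`, it returns the term-ITE shown above.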

### 5.3 Eliminating selects: Ackermannization

In this section we describe two implementations of $\mathcal{A c k}$. Before applying $\mathcal{A c k}$, the array-ITEs and stores are removed, as described in the previous section. $\mathcal{A c k}$ eliminates select terms by instantiating the $\mathcal{F} C C$; that is, it reduces to the theory of uninterpreted functions by writing the $\mathcal{F} C C$ into the problem.

If there are $l$ selects with syntactically distinct index terms $t_{0}^{[n]} \ldots t_{l-1}^{[n]}$ from array $a_{\ell}^{[n: m]}$, then $\mathcal{A c k}$ creates all $\frac{l(l-1)}{2} \mathcal{F} C C$ instances:

$$
\forall_{0 \leq j<k<l}\left(\left(t_{j}^{[n]}=t_{k}^{[n]}\right) \Longrightarrow\left(\operatorname{select}\left(a_{\ell}^{[n: m]}, t_{j}^{[n]}\right)=\operatorname{select}\left(a_{\ell}^{[n: m]}, t_{k}^{[n]}\right)\right)\right)
$$

## Example 5.3

If an expression contains 5 select expressions accessing array $a_{0}$, and 10 select expressions accessing array $a_{1}$, then as many as $\frac{5(5-1)}{2}+\frac{10(10-1)}{2}=55 \mathcal{F} C C$ instances are asserted.
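The count in this example can be checked by enumerating the pairs directly. The Python sketch below assumes a hypothetical dictionary from array names to lists of syntactically distinct index terms.

```python
from itertools import combinations

def fcc_instances(selects):
    """List one (array, index_j, index_k) triple per unordered pair of indices."""
    return [
        (array, j, k)
        for array, indices in selects.items()
        for j, k in combinations(indices, 2)
    ]

# Five selects from a0 and ten from a1, as in Example 5.3.
example = {
    "a0": [f"s{n}" for n in range(5)],
    "a1": [f"t{n}" for n in range(10)],
}
```

Here `len(fcc_instances(example))` is 55, matching $\frac{5(5-1)}{2}+\frac{10(10-1)}{2}$.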

The first implementation that we present, which we call $\mathcal{A c} k_{\text {cnf }}$, asserts the $\mathcal{F} C C$ directly to the SAT solver (Algorithm 5.1). $\mathcal{A c} k_{c n f}$ creates the $\mathcal{F} C C$ instances directly as CNF, bypassing the expensive bit-blasting and AIG to CNF encoding steps. Note that line 3 traverses the select terms after they have been topologically sorted, so the index of the select contains no select terms.

We use the $\mathcal{F} C C$ instance CNF representation introduced by Biere and Brummayer [BB08a]. The representation was described in more detail in Section 2.11.6 of Robert Brummayer's PhD thesis [Bru09]. Algorithm 5.2 shows how to generate the CNF. For two selects of index bit-width $n$ and result bit-width $m$, the algorithm adds $1+2 n+2 m$ clauses and creates $n+1$ fresh variables. Note that the clauses allow $e$ to be 1 when the indices are different. We have used the implication connective $(\Longrightarrow)$ for readability: note that all formulae output are disjunctions of literals.

```
Algorithm 5.1 \(\mathcal{A c} k_{\text {cnf }}\) removes select terms by asserting \(\mathcal{F} C C\) instances directly to
the SAT solver. The subterms method returns a list of \(e\)'s subterms topologically
sorted.
Require: \(e\), a formula
    Create selects, a set of tuples of an array literal, an index, and a result
    selects \(\leftarrow \{\}\)
    for all \(\operatorname{select}(a, i) \in \operatorname{subterms}(e)\) do // in topological order
        Replace \(\operatorname{select}(a, i)\) in \(e\) by a fresh variable \(v\)
        Add \(\langle a, i, v\rangle\) to selects
    end for
    Assert \(e\) to the SAT solver // All select terms have been removed from \(e\)
    for all distinct array literals \(a \in\) selects do
        Create pairs, a list of \(\langle\)index, result\(\rangle\) pairs
        pairs \(\leftarrow\) selects[\(a\)]
        for all \(j\) from \((1 \ldots(\operatorname{size}(\text{pairs})-1))\) do
            for all \(k\) from \((0 \ldots(j-1))\) do
                if (pairs[j].index \(=\) pairs[k].index) does not equal 0 after bit-vector theory-level simplifications then
                    FCC_instance(pairs[j], pairs[k]) // Algorithm 5.2
                end if
            end for
        end for
    end for
```

```
Algorithm 5.2 Creating \(\mathcal{F} C C\) instances in CNF, after Biere and Brummayer [BB08a,
Bru09].
Require: \(i_{0}^{[n]}, w_{0}^{[m]}\) // The index and result corresponding to a select
Require: \(i_{1}^{[n]}, w_{1}^{[m]}\) // The index and result corresponding to another select
    procedure FCC_Instance(\(\left(i_{0}, w_{0}\right)\), \(\left(i_{1}, w_{1}\right)\))
        Create the fresh variables \(v_{0} \ldots v_{n-1}, e\)
        for all \(i\) from \(0 \ldots(n-1)\) do
            Output \(\left(i_{0}[i] \wedge i_{1}[i]\right) \Longrightarrow v_{i}\)
            Output \(\left(\neg i_{0}[i] \wedge \neg i_{1}[i]\right) \Longrightarrow v_{i}\)
        end for
        Output \(\left(v_{0} \wedge \ldots \wedge v_{n-1}\right) \Longrightarrow e\) // Note: \(\mu(e)\) is 1 when \(\mu\left(i_{0}\right)=\mu\left(i_{1}\right)\)
        for all \(i\) from \(0 \ldots(m-1)\) do
            Output \(\left(e \wedge w_{0}[i]\right) \Longrightarrow w_{1}[i]\)
            Output \(\left(e \wedge w_{1}[i]\right) \Longrightarrow w_{0}[i]\)
        end for
    end procedure
```

## Example 5.4

Consider applying Algorithm 5.2 to assert an $\mathcal{F} C C$ instance for $\operatorname{select}\left(a^{[2: 3]}, t_{0}\right)$, and select $\left(a^{[2: 3]}, t_{1}\right)$, where the selects have been replaced by fresh variables $w_{0}, w_{1}$ respectively. Clauses are asserted to express that the indices' bits are pairwise equal. That is, these clauses are asserted:

$$
\begin{aligned}
\left(t_{0}[0] \wedge t_{1}[0]\right) & \Longrightarrow v_{0} \\
\left(\neg t_{0}[0] \wedge \neg t_{1}[0]\right) & \Longrightarrow v_{0} \\
\left(t_{0}[1] \wedge t_{1}[1]\right) & \Longrightarrow v_{1} \\
\left(\neg t_{0}[1] \wedge \neg t_{1}[1]\right) & \Longrightarrow v_{1}
\end{aligned}
$$

If the indices are pairwise equal then the clause $\left(v_{0} \wedge v_{1} \Longrightarrow e\right)$, forces $e$ to be 1 . When $e$ is 1 , the clauses:

$$
\begin{aligned}
& e \wedge w_{0}[0] \Longrightarrow w_{1}[0] \\
& e \wedge w_{1}[0] \Longrightarrow w_{0}[0] \\
& e \wedge w_{0}[1] \Longrightarrow w_{1}[1] \\
& e \wedge w_{1}[1] \Longrightarrow w_{0}[1] \\
& e \wedge w_{0}[2] \Longrightarrow w_{1}[2] \\
& e \wedge w_{1}[2] \Longrightarrow w_{0}[2]
\end{aligned}
$$

enforce that the bit-vectors $w_{0}^{[3]}$ and $w_{1}^{[3]}$ are the same bitwise.
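As a sketch of how Algorithm 5.2 might be written against a DIMACS-style clause list, the Python function below emits the same clauses as lists of signed integers. The function name, the argument order and the use of an iterator for fresh variables are assumptions for this illustration, not the interface of STP2 or of any SAT solver.

```python
from itertools import count

def fcc_instance(i0, w0, i1, w1, fresh):
    """Clauses for (i0 = i1) => (w0 = w1), after Algorithm 5.2.

    i0 and i1 are lists of n positive literals, w0 and w1 are lists of m
    positive literals, and fresh is an iterator over unused variable numbers.
    """
    clauses = []
    vs = [next(fresh) for _ in i0]     # v_0 ... v_{n-1}
    e = next(fresh)
    for a, b, v in zip(i0, i1, vs):
        clauses.append([-a, -b, v])    # (a and b) => v
        clauses.append([a, b, v])      # (not a and not b) => v
    clauses.append([-v for v in vs] + [e])   # (v_0 and ... and v_{n-1}) => e
    for a, b in zip(w0, w1):
        clauses.append([-e, -a, b])    # (e and a) => b
        clauses.append([-e, -b, a])    # (e and b) => a
    return clauses

# Example 5.4: a 2-bit index and a 3-bit result; variables 1..10 are inputs.
example_clauses = fcc_instance([1, 2], [5, 6, 7], [3, 4], [8, 9, 10], count(11))
```

For the sizes of Example 5.4 this produces $1+2 \cdot 2+2 \cdot 3=11$ clauses and uses three fresh variables.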

## Example 5.5

Consider some expression that contains three select sub-expressions: $\operatorname{select}(a, 5)$, $\operatorname{select}\left(b, t_{0}\right)$, and $\operatorname{select}\left(a, \operatorname{select}\left(b, t_{0}\right)\right)$. $\mathcal{A c} k_{\text {cnf }}$ (Algorithm 5.1) replaces $\operatorname{select}(a, 5)$ by $v_{0}$, $\operatorname{select}\left(b, t_{0}\right)$ by $v_{1}$, and $\operatorname{select}\left(a, v_{1}\right)$ by $v_{2}$. Because there is a single select from $b$, no $\mathcal{F} C C$ instances are needed for $b$. The $\mathcal{F} C C$ instance for $a$ is: $\left(5=v_{1}\right) \Longrightarrow\left(v_{0}=v_{2}\right)$, which is encoded via Algorithm 5.2.

The $\mathcal{A c} k_{\text {cnf }}$ implementation creates a bit-vector equality between index terms (Algorithm 5.1 line 13), then simplifies it. The simplifications are simple rewrite rules that may reduce the equality to 0. If two indices are definitely not equal (the equality simplifies to 0), then no $\mathcal{F} C C$ instance is asserted. This omits the $\mathcal{F} C C$ instance in cases where indices are obviously not equal, for instance: $(t+4=t)$, $(8=10)$, and $\left(\left(1^{[1]}:: t\right)=\left(0^{[1]}:: t\right)\right)$. We assume a DAG expression as input, so the $\mathcal{A c k}$ algorithms have no need to optimise for encountering syntactically identical select terms.

An alternative method is $\mathcal{A c} k_{\text {ite }}$. It encodes the $\mathcal{F} C C$ as bit-vector terms using ITE expressions. It is given in Algorithm 5.3, and is identical to the $\mathcal{A c k}$ implementation of STP 0.1. The same method was also presented by Manolios et al. [MSV06]. The effect of $\mathcal{A c} k_{\text {ite }}$ is to remove array terms entirely from the problem, reducing the problem to a bit-vector problem.

```
Algorithm 5.3 \(\mathcal{A c} k_{\text {ite }}\), which removes select terms by replacing them with term-ITEs.
Require: \(e\), a formula
    Create pairs, a map from array literals to lists of pairs of bit-vector indices and
    results
    pairs \(\leftarrow \{\}\)
    for all \(\operatorname{select}(a, i) \in \operatorname{subterms}(e)\) do // in topological order
        Create \(t\), an expression
        \(t \leftarrow\) a fresh variable \(v\)
        for all \(j\) from \((0 \ldots(\operatorname{size}(\text{pairs}[a])-1))\) do
            \(t \leftarrow \operatorname{ITE}(i=\text{pairs}[a][j].\text{index}, \text{pairs}[a][j].\text{result}, t)\)
        end for
        Replace \(\operatorname{select}(a, i)\) in \(e\) by \(t\)
        Add the pair \((i, v)\) to the front of the list pairs[\(a\)]
    end for
    Output \(e\)
```


## Example 5.6

Consider applying $\mathcal{A c} k_{\text {ite }}$ (Algorithm 5.3), given the selects: $\operatorname{select}(a, s)$, $\operatorname{select}(a, t)$, and $\operatorname{select}(a, u)$. The first select is replaced by a fresh variable $v_{0}$, the second select is replaced by $\operatorname{ite}\left(t=s, v_{0}, v_{1}\right)$, and the third select is replaced by $\operatorname{ite}\left(u=s, v_{0}, \operatorname{ite}\left(u=t, v_{1}, v_{2}\right)\right)$. Note that $\mathcal{A c} k_{\text {ite }}$ must generate the term-ITE in a particular order; replacing $\operatorname{select}(a, u)$ with $\operatorname{ite}\left(u=t, v_{1}, \operatorname{ite}\left(u=s, v_{0}, v_{2}\right)\right)$ would be incorrect. When $u=t=s$, the second select evaluates to $v_{0}$ while $v_{1}$ is left unconstrained, so the third select must also evaluate to $v_{0}$ rather than $v_{1}$.
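The replacement order can be made concrete with a short Python sketch of Algorithm 5.3. The function `ack_ite`, the tuple encoding of ITE terms and the variable names `v0`, `v1`, ... are assumptions for illustration only.

```python
def ack_ite(selects):
    """Replace each select(a, i) by a term-ITE, following Algorithm 5.3.

    selects is a list of (array, index) pairs in topological order.
    Returns the replacement term for each select, in the same order.
    """
    pairs = {}            # array -> list of (index, fresh variable), newest first
    replacements = []
    for n, (array, index) in enumerate(selects):
        fresh = f"v{n}"
        term = fresh
        # Later iterations wrap earlier ones, so the oldest select ends up outermost.
        for earlier_index, earlier_result in pairs.get(array, []):
            term = ("ite", ("=", index, earlier_index), earlier_result, term)
        replacements.append(term)
        pairs.setdefault(array, []).insert(0, (index, fresh))
    return replacements
```

Calling `ack_ite([("a", "s"), ("a", "t"), ("a", "u")])` yields the three replacements of this example, with the comparison against $s$ outermost in the third term.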

A major advantage of $\mathcal{A c k}$ is simplicity: it generates a single CNF representation of the problem. Because the SAT solver's programmatic interface is not needed, it is easy to change between SAT solvers. A consequence is that it is easy to upgrade to whichever is the best sequential or parallel SAT solver. Another advantage is that global AIG or CNF simplifications can be applied to the CNF. The next two approaches we investigate interface more closely with the SAT solver.

### 5.4 Eliminating selects: Abstraction-Refinement

Software verification problems may have a thousand or more selects from an array. With $\mathcal{A c k}$ this necessitates the inclusion of about five hundred thousand $\mathcal{F C C}$ instances. However, some $\mathcal{F} C C$ instances are unnecessary if:

- they constrain indices that because of other constraints can never be equal,
- they constrain results that because of other constraints can never be different, or,
- the satisfiability of the problem does not depend on the $\mathcal{F} C C$, for instance, if the bit-vector part of the problem alone is unsatisfiable.

$\mathcal{A} b s r e f$ overcomes the main problem of $\mathcal{A c k}$, that all the instances of the $\mathcal{F} C C$ are always sent to the SAT solver, by adding $\mathcal{F} C C$ instances as needed. In the worst case, all the $\mathcal{F} C C$ instances are asserted, so $\mathcal{A} b s r e f$ produces a formula equisatisfiable with that from $\mathcal{A c k}$. In the best case, when the satisfiability is determined just by the bit-vector part of the problem, no $\mathcal{F} C C$ instances are asserted.

$\mathcal{A} b s r e f$ asserts an over-approximation of the problem to the SAT solver, then uses the SAT solver's models to determine which $\mathcal{F} C C$ instances to assert. Initially $\mathcal{A} b s r e f$ omits the $\mathcal{F} C C$. It lets the SAT solver generate a candidate model, and then checks whether that model satisfies the $\mathcal{F} C C$. $\mathcal{F} C C$ instances that are violated are asserted to the SAT solver. If the resulting formula is unsatisfiable, then work has finished. However, if it is satisfiable, the omitted $\mathcal{F} C C$ instances are checked, unsatisfied instances are asserted, and the SAT solver is restarted. Because the SAT solver is solving an increasingly more constrained problem, an important practical consideration is that much of the SAT solver's state can be kept between invocations.

In the worst case, if a single $\mathcal{F} C C$ instance is asserted in each refinement iteration, there will be quadratically many iterations. Each refinement step calls the SAT solver, which has a startup cost. To reduce the worst case, $\mathcal{F} C C$ instances can be asserted even if they currently evaluate to 1. The $\mathcal{A} b s r e f$ algorithm (Algorithm 5.4) of STP has no more refinement iterations than there are distinct select expressions. The $\mathcal{A} b s r e f$ algorithm of STP2 differs from that of STP 0.1 in that, like $\mathcal{A c} k_{\text {cnf }}$, it asserts $\mathcal{F} C C$ instances as CNF using Algorithm 5.2. So the majority of the treatment in Vijay Ganesh's PhD thesis [Gan07] Section 4.5 is still accurate.

## Example 5.7

If one array literal appears in 30 selects, and the other in 100 selects, then there will be at most 130 refinement iterations. However, if instead $\mathcal{F} C C$ instances were added singly, there are $\frac{100 \times 99}{2}+\frac{30 \times 29}{2}=5385$ possible refinement iterations.

Implementations of $\mathcal{A} b s r e f$ make a trade-off between calling the SAT solver many times and asserting many unnecessary $\mathcal{F} C C$ instances, that is, between asserting unnecessary clauses like $\mathcal{A c k}$ does, and risking quadratically many refinement iterations if single unsatisfied $\mathcal{F} C C$ instances are asserted. Because STP2 asserts extra $\mathcal{F} C C$ instances, for some problems in our suite (see subsection 5.6.3) it is more than two thousand times faster than Boolector, another $\mathcal{A} b s r e f$ based solver. However, the risk is that the redundant $\mathcal{F} C C$ instances increase the memory required and slow down SAT solving.

It is common in software verification problems for the values at some indices to be specified. The CNF encoding of an equality between a constant and a variable is smaller than the encoding between two variables. The $\mathcal{A} b s r e f$ implementation sorts the list of selects so that constant indices are checked first. Sorting the indices does not matter to the $\mathcal{A c k}$ approach because the same number of $\mathcal{F} C C$ instances will be asserted regardless of sorting. By sorting the list of selects we hope for $\mathcal{A} b s r e f$ to assert fewer CNF clauses overall.

## Example 5.8

Consider applying Algorithm 5.4 to the expression $\left(\left(\operatorname{select}(a, 5)=\operatorname{select}\left(a, t_{0}\right)\right) \wedge\right.$ $\left.\left(\operatorname{select}\left(a, t_{1}\right)=t_{2}\right)\right)$.

```
Algorithm 5.4 STP2's algorithm for \(\mathcal{A} b s r e f\)
Require: \(e\), a formula
    Create original
    original \(\leftarrow e\)
    Create pairs, a map from array literals to lists of pairs of bit-vector indices and
    results
    pairs \(\leftarrow \{\}\)
    for all \(\operatorname{select}(a, i) \in \operatorname{subterms}(e)\) do // in topological order
        Replace \(\operatorname{select}(a, i)\) in \(e\) by a fresh variable \(v\)
        Add the pair \((i, v)\) to the list pairs[\(a\)]
    end for
    Assert \(e\) to the SAT solver
    if SAT_solve() is unsatisfiable then
        return unsatisfiable
    end if
    Create: next, later, lists of CNF clauses
    for all distinct array literals \(a\) encountered do
        Sort the indices in pairs[\(a\)], so that constant indices are first.
        for all \(j\) from \((0 \ldots(\operatorname{size}(\text{pairs}[a])-1))\) do
            for all \(k\) from \((j+1 \ldots(\operatorname{size}(\text{pairs}[a])-1))\) do
                if \((\mu(\text{pairs}[a][j].\text{index})=\mu(\text{pairs}[a][k].\text{index})) \wedge(\mu(\text{pairs}[a][j].\text{result}) \neq \mu(\text{pairs}[a][k].\text{result}))\) then
                    next.push(FCC_instance(pairs[a][j], pairs[a][k]))
                else
                    later.push(FCC_instance(pairs[a][j], pairs[a][k]))
                end if
            end for
            if \(\operatorname{size}(\text{next})>0\) then
                Assert next to the SAT solver
                Empty next
                if SAT_solve() is unsatisfiable then
                    return unsatisfiable
                else if \(\mu(\text{original})=1\) then
                    return satisfiable
                end if
            end if
        end for
    end for
    if \(\operatorname{size}(\text{later})>0\) then
        Assert later // Equisatisfiable to \(\mathcal{A c k}\).
        return SAT_solve()
    end if
```

Each select expression is replaced by a fresh variable, giving $\left(v_{0}=v_{1}\right) \wedge\left(v_{2}=t_{2}\right)$, which is asserted to the SAT solver. The original formula will be evaluated with the assignment from the SAT solver.

Consider a model: $\mu\left(v_{0}\right)=\mu\left(v_{1}\right)=5, \mu\left(v_{2}\right)=\mu\left(t_{2}\right)=6, \mu\left(t_{0}\right)=\mu\left(t_{1}\right)=2$. The original formula is evaluated with this assignment. To ensure that the $\mathcal{F} C C$ applies, the result at every index is kept when it is encountered. Because $v_{0}$ is 5 , select $(a, 5)=5$ is stored. Because $t_{0}$ is 2 , $\operatorname{select}(a, 2)=5$. Now $v_{2}=6$, and $t_{1}=2$, but $\operatorname{select}(a, 2)$ has already been set to 5 , so the prior value is used. The original formula evaluates to $(5=5) \wedge(5=6)$, which is 0 .

Refinement is now applied to each distinct pair of select indices. We have $\left(5 \neq \mu\left(t_{0}\right)\right)$, so the instance $\left(\left(5=t_{0}\right) \Longrightarrow\left(v_{0}=v_{1}\right)\right)$ is stored. Similarly, $\left(5 \neq \mu\left(t_{1}\right)\right)$, so the instance $\left(\left(5=t_{1}\right) \Longrightarrow\left(v_{0}=v_{2}\right)\right)$ is stored. Finally, $\left(\mu\left(t_{0}\right)=\mu\left(t_{1}\right)\right) \wedge\left(\mu\left(v_{1}\right) \neq \mu\left(v_{2}\right)\right)$ holds, so the $\mathcal{F} C C$ instance $\left(\left(t_{0}=t_{1}\right) \Longrightarrow\left(v_{1}=v_{2}\right)\right)$ is asserted.

The SAT solver is called, and the process of checking the model and asserting extra $\mathcal{F} C C$ instances iterates.
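The classification step of this example can be sketched in Python. The function `classify` is a simplification assumed for illustration: it makes a single pass over the pairs of one array and returns the two lists, whereas Algorithm 5.4 asserts `next` and calls the SAT solver after each outer iteration.

```python
def classify(pairs, model):
    """Split the FCC instances of one array into violated and not violated.

    pairs is a list of (index, result) terms. model maps variable names to
    integers; integer terms stand for themselves.
    """
    def value(term):
        return term if isinstance(term, int) else model[term]

    violated, later = [], []
    for j in range(len(pairs)):
        for k in range(j + 1, len(pairs)):
            (index_j, result_j), (index_k, result_k) = pairs[j], pairs[k]
            same_index = value(index_j) == value(index_k)
            same_result = value(result_j) == value(result_k)
            if same_index and not same_result:
                violated.append((pairs[j], pairs[k]))
            else:
                later.append((pairs[j], pairs[k]))
    return violated, later

# The model and the select abstraction of Example 5.8.
model = {"v0": 5, "v1": 5, "v2": 6, "t0": 2, "t1": 2, "t2": 6}
pairs = [(5, "v0"), ("t0", "v1"), ("t1", "v2")]
```

With this model only the pair for $t_{0}$ and $t_{1}$ is violated; the two pairs involving the constant index 5 are stored for later.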

The number of clauses that are asserted for an $\mathcal{F} C C$ instance depends on whether any values are known. If an index is a constant, then fewer clauses are asserted by variants of Algorithm 5.2.

## Example 5.9

The $\mathcal{F} C C$ instance $\left(2=i^{[n]}\right) \Longrightarrow\left(w^{[m]}=0\right)$ can be encoded as $(\neg i[n-1] \wedge \ldots \wedge i[1] \wedge \neg i[0]) \Longrightarrow e$, and $\forall_{k=0}^{m-1}(e \Longrightarrow \neg w[k])$. This encoding has $m+1$ clauses and 1 fresh variable, versus the $1+2 n+2 m$ clauses and $n+1$ fresh variables in the all-variables case.
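A Python sketch of this smaller encoding follows. The function `fcc_constant` and its argument conventions (positive integer literals, least significant bit first) are assumptions for illustration; the section does not show how STP2 implements its variants of Algorithm 5.2.

```python
def fcc_constant(constant, index_bits, result_bits, result_value, e):
    """Clauses for (constant = i) => (w = result_value), as in Example 5.9.

    index_bits and result_bits are lists of positive literals, least
    significant bit first. e is a fresh variable number.
    """
    # (i matches the constant bit by bit) => e, written as one clause:
    # each antecedent literal appears negated.
    guard = [(-lit if (constant >> k) & 1 else lit)
             for k, lit in enumerate(index_bits)]
    clauses = [guard + [e]]
    # e => each result bit takes its constant value.
    for k, lit in enumerate(result_bits):
        clauses.append([-e, lit if (result_value >> k) & 1 else -lit])
    return clauses
```

For a 3-bit index and a 2-bit result, `fcc_constant(2, [1, 2, 3], [4, 5], 0, 6)` gives three clauses, that is $m+1$ with $m=2$.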

If the result at each possible index is known, and there are other select expressions, then it is not necessary to create $\mathcal{F} C C$ instances for every distinct pair of selects. The next example shows an instance of this occurring. Of the approaches we discuss, the $\mathcal{D C I}$ approach is the best at avoiding generation of unnecessary $\mathcal{F} C C$ instances.

## Example 5.10

Consider a problem where: $\forall_{0 \leq i<16}\left(\operatorname{select}\left(a^{[4: 5]}, i\right)=i\right)$, $\operatorname{select}\left(a^{[4: 5]}, t_{0}^{[4]}\right)=t_{1}^{[5]}$, and $\operatorname{select}\left(a^{[4: 5]}, t_{2}^{[4]}\right)=t_{3}^{[5]}$. The constant indices specify the array's value entirely, so there is no need to enforce $\mathcal{F} C C$ instances between the selects at $t_{0}$ and $t_{2}$; it is enough to assert them between each of the constant indices and $t_{0}$ and $t_{2}$.

The $\mathcal{A} b s r e f$ implementation does not detect and share clauses between $\mathcal{F} C C$ instances that are completely or partially identical. Of our implementations, only $\mathcal{A c} k_{\text {ite }}$ will detect and share clauses between $\mathcal{F} C C$ instances.

## Example 5.11

If $\left(\operatorname{select}\left(a_{0}, t_{0}\right)=\operatorname{select}\left(a_{0}, t_{1}\right)\right)$ and $\left(\operatorname{select}\left(a_{1}, t_{0}\right)=\operatorname{select}\left(a_{1}, t_{1}\right)\right)$, then two $\mathcal{F} C C$ instances have the same left side. Assume $\operatorname{select}\left(a_{0}, t_{0}\right)$ is replaced by $v_{0}$, $\operatorname{select}\left(a_{0}, t_{1}\right)$ by $v_{1}$, $\operatorname{select}\left(a_{1}, t_{0}\right)$ by $v_{2}$, and $\operatorname{select}\left(a_{1}, t_{1}\right)$ by $v_{3}$. Then the $\mathcal{F} C C$ instances are $\left(\left(t_{0}=t_{1}\right) \Longrightarrow\left(v_{0}=v_{1}\right)\right)$, and $\left(\left(t_{0}=t_{1}\right) \Longrightarrow\left(v_{2}=v_{3}\right)\right)$. The CNF conversion algorithm (Algorithm 5.2) will create duplicate fresh variables to represent that the indices are bitwise equal.

$\mathcal{A} b s r e f$ as implemented by STP2 avoids adding all the $\mathcal{F} C C$ instances, but brings extra problems. The SAT solver may arbitrarily set variables that violate the $\mathcal{F} C C$ when they could have easily been set so that the $\mathcal{F} C C$ holds, requiring unnecessary refinement iterations.

## Example 5.12

Consider two selects, both of which have indices evaluating to 6 , with the corresponding results respectively: $\langle 110\rangle$ and $\langle 11 \star\rangle$. Because the indices are the same, the values must be the same. So the SAT solver must choose a 0 for the final $\star$ value. However, because some $\mathcal{F} C C$ instances have been omitted, it may be set to 1, requiring an extra refinement iteration.

In STP2's $\mathcal{A} b s r e f$ implementation, after finding a candidate SAT model, all the assignments to SAT variables are discarded. This is the standard behaviour of Minisat's programmatic interface. For instance, if there are 1 million clauses, after the refinement phase asserts extra clauses, the SAT solver must again find a satisfying assignment to these 1 million clauses. On a single core of an Intel Q8400 Linux computer, Minisat 2.2 takes about 150 ms to do so. So on such problems, STP 0.1 and STP2 are limited to about 7 refinement iterations per second.

Other than the cost of checking the $\mathcal{F} C C$ and making extra SAT solver attempts, $\mathcal{A} b s r e f$ risks that the over-approximated problem might be more expensive to solve than the original problem.

Consider a hard bit-vector problem conjoined with an easily unsatisfiable array problem, where the array part is: $\left(\operatorname{select}\left(a, t_{0}\right) \neq \operatorname{select}\left(a, t_{1}\right)\right)$ where $t_{0}$ and $t_{1}$, in some opaque manner, are equivalent but syntactically different. Because abstraction replaces the two select expressions by fresh variables $v_{0}$ and $v_{1}$, the array part becomes $v_{0} \neq v_{1}$, which is trivially satisfiable. So the SAT solver must solve the hard bit-vector problem before the problem can be refined. Because the array part of the problem has been abstracted away, and will not be refined until a satisfying model is produced for the bit-vector problem, the easily unsatisfiable array part is essentially ignored until after a satisfying assignment is found.

We could reduce the overhead per refinement iteration by allowing clauses to be asserted to the SAT solver during search, as we do in the $\mathcal{D C I}$ approach. However, preventing the SAT solver from being forced to solve a more difficult abstracted problem requires the SAT solver to have information about the $\mathcal{F} C C$. We discuss an approach that does this next.

### 5.5 Eliminating selects: Delayed Congruence Instantiation

The Delayed Congruence Instantiation ($\mathcal{D C I}$) approach, as we use the term, operates inside the SAT solver. $\mathcal{D C I}$ asserts $\mathcal{F} C C$ instances incrementally, as the indices progressively are assigned the same value. When asserting $\mathcal{F} C C$ instances, $\mathcal{D C I}$ preserves the SAT solver's partial assignment which our $\mathcal{A} b s r e f$ implementation discards. Similarly to the $\mathcal{A c k}$ approach, the $\mathcal{F} C C$ is always enforced, but without the requirement to add every $\mathcal{F} C C$ instance upfront. $\mathcal{D C I}$ thus attempts to overcome the problems of $\mathcal{A c k}$, which introduces $\mathcal{F} C C$ instances too early, and $\mathcal{A} b s r e f$, which introduces $\mathcal{F} C C$ instances too late. A disadvantage, however, is that the implementation of $\mathcal{D C I}$ is intimately tied with a particular SAT solver's implementation.

One way to think of this approach is that the SAT solver operates on an implicit CNF. $\mathcal{F} C C$ instances are instantiated only when they are likely to contribute to unit propagation. So instead of the SAT solver using memory to store clauses which do not participate in unit propagation, the clauses are stored compactly as lists of select indices and results.

```
Algorithm 5.5 Precursor phase to applying \(\mathcal{D C I}\)
Require: \(e\), a formula
    Create selects, a set of tuples of an array literal, an index, and a result
    selects \(\leftarrow \{\}\)
    for all \(\operatorname{select}(a, i) \in \operatorname{subterms}(e)\) do // in topological order
        Replace \(\operatorname{select}(a, i)\) in \(e\) by a fresh variable \(v\)
        Add \(\langle a, i, v\rangle\) to selects
    end for
    Assert \(e\) to the SAT solver
```

Another way to think of the approach is as a form of $\mathcal{A} b s r e f$ which is able to enforce that the $\mathcal{F} C C$ is satisfied on partial assignments. The initial CNF asserted to the SAT solver, and the final CNF, in the worst case are identical for $\mathcal{D C I}$ and $\mathcal{A} b s r e f$.

The precursor steps to applying $\mathcal{D C I}$ are given in Algorithm 5.5. First, fresh variables replace select terms. Second, after replacing all the select terms with fresh variables, the problem is asserted to the SAT solver. To enforce the $\mathcal{F} C C$, the $\mathcal{D C I}$ algorithm must know which of the SAT solver's variables, if any, correspond to particular indices and results. So, when the $\mathcal{D C I}$ algorithm is invoked, it is told which variables in the CNF correspond to the indices and results of select terms.
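The replacement of selects by fresh variables, which is also the first loop of Algorithm 5.1 and Algorithm 5.4, can be sketched in Python as follows. The function `abstract_selects`, the tuple encoding and the names `v0`, `v1`, ... are assumptions for illustration; visiting children before their parent plays the role of the topological order.

```python
def abstract_selects(term):
    """Replace each select by a fresh variable, innermost selects first.

    Returns the rewritten term and the recorded (array, index, variable) tuples.
    """
    recorded = {}                          # (array, index) -> fresh variable

    def walk(t):
        if not isinstance(t, tuple):
            return t
        head, *args = t
        args = [walk(arg) for arg in args]     # children first
        if head == "select":
            key = (args[0], args[1])
            if key not in recorded:
                recorded[key] = f"v{len(recorded)}"
            return recorded[key]
        return (head, *args)

    rewritten = walk(term)
    return rewritten, [(a, i, v) for (a, i), v in recorded.items()]
```

On a term containing the three selects of Example 5.5, the inner $\operatorname{select}(b, t_{0})$ is replaced before the select from $a$ that uses it as an index, so the recorded index of that outer select is already a fresh variable.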

Let us briefly review SAT solver terminology which was introduced in chapter 2. A SAT solver performs unit propagation, which assigns variables that are entailed by the current assignment. SAT solvers also perform search, which heuristically selects an unassigned variable, and assigns it a truth value. Search is performed only when unit propagation is at a fixed point. When assignments are made, the decision level is the number of assignments set via search. A conflict is when the assignment is inconsistent. A cancel undoes the work performed beyond a decision level. A trail is a list of pairs $(v, \ell)$, where $v$ is a variable that has been assigned a value, and $\ell$ is the decision level at which it happened.

The $\mathcal{D C I}$ algorithm (Algorithm 5.6) we first present is simplified to highlight its key features. A more efficient and complete implementation is given later (Algorithm 5.10). $\mathcal{D C I}$ is run alternately with unit propagation, until neither causes any changes. Then search is performed. The $\mathcal{D C I}$ algorithm does not change the assignments to variables; it simply asserts $\mathcal{F} C C$ instances. If the indices of two

```
Algorithm 5.6 A simple \(\mathcal{D C I}\) algorithm. Run after unit propagation inside the SAT solver.
Require: selects // A set of tuples of an array literal, an index, and a result
Require: knownIndices // A map from array literals to a map from integer indices
    to a list of \(\langle\) index, result \(\rangle\) pairs.
    procedure DCI(selects, knownIndices)
        for all \(\langle\) array, \(k\), index, result \(\rangle \in\) knownIndices do
            if \(\mu\)(index) contains a \(\star\) value then
                if knownIndices[array][k][0] = \(\langle\) index, result \(\rangle\) then
                    // Instantiate the \(\mathcal{F} C C\) between the new \(0^{\text{th}}\) and others
                    for all \(i \in 2 \ldots\) (size(knownIndices[array][k]) \(-1\)) do
                        FCC_Instance(knownIndices[array][k][1], knownIndices[array][k][i])
                    end for
                end if
                remove \(\langle\) index, result \(\rangle\) from knownIndices[array][k]
            end if
        end for
        for all \(\langle\) array, index, result \(\rangle \in\) selects do
            if \(\mu\)(index) contains no \(\star\) value then
                Create: integer \(k \leftarrow \mu\)(index)
                if \(\langle\) index, result \(\rangle \notin\) knownIndices[array][k] then
                    knownIndices[array][k].add(\(\langle\) index, result \(\rangle\))
                    if size(knownIndices[array][k]) \(>1\) then
                        // \(\mathcal{F} C C\) instance between \(0^{\text{th}}\) and newly assigned
                        FCC_Instance(knownIndices[array][k][0], \(\langle\) index, result \(\rangle\))
                    end if
                end if
            end if
        end for
    end procedure
```

selects have the same assignment, but the results have different assignments, then the $\mathcal{D C I}$ algorithm will generate a conflict. After asserting an $\mathcal{F C C}$ instance, unit propagation is applied, and the process repeats until neither causes a change. Then search is performed.

$\mathcal{D C I}$ introduces $\mathcal{F} C C$ instances between indices when they first have the same propositional assignment. For each array, it maintains a list of selects with completely specified indices, that is, indices that contain no $\star$ values. When the final bit of an index is assigned, a lookup is performed to find other indices with the same assignment. If the index is currently the only index with that assignment, no $\mathcal{F} C C$ instance is asserted. However, if other indices evaluate to the same integer with the assignment, then an $\mathcal{F} C C$ instance is asserted.

The knownIndices map of Algorithm 5.6 holds a list of pairs which have indices that are currently assigned to the same integer. If $l$ pairs, where $l>1$, have the same index assignment, then the algorithm instantiates $l-1$ $\mathcal{F} C C$ instances, between the zeroth pair and each other pair.

Each index that is completely assigned is stored in the knownIndices map. For each integer value, a list of the index/result pairs where the index evaluates to that same integer value is stored. Algorithm 5.6 maintains the invariant that for every $k$ there is an $\mathcal{F} C C$ instance asserted between knownIndices$[k][0]$ and knownIndices$[k][i]$, for every $i>0$.
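The bookkeeping described above can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the STP2 implementation: selects are represented by hypothetical `(index, result)` name pairs, and an asserted $\mathcal{F} C C$ instance is simply recorded in a list rather than sent to a SAT solver.

```python
from collections import defaultdict

def on_index_assigned(known, array, k, pair, fcc_out):
    """Record that `pair` = (index, result) now has the integer index value k."""
    bucket = known[array][k]
    if pair in bucket:
        return
    bucket.append(pair)
    if len(bucket) > 1:
        # Only one new instance: between the zeroth pair and the new pair.
        fcc_out.append((bucket[0], pair))

known = defaultdict(lambda: defaultdict(list))
fcc = []
for pair in [("i0", "v0"), ("i1", "v1"), ("i2", "v2")]:
    on_index_assigned(known, "a", 2, pair, fcc)

print(len(fcc))  # 2
```

With three pairs sharing the same index value, two instances are recorded, both anchored at the zeroth pair, which matches the $l-1$ count given above.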

A non-obvious property of Algorithm 5.6 is the following. Consider the indices stored in the list knownIndices[array][k]. The decision levels at which they became totally assigned are monotonically increasing. Hence, when removing indices from the knownIndices[array][k] list, it is not necessary for the procedure to consider the impossible case when the zeroth element of a list is removed, and other elements remain that require $\mathcal{F} C C$ instances to be asserted between them.

## Example 5.13

Consider three pairs $i_{0}, i_{1}, i_{2}$ stored at knownIndices[array][k]. Two $\mathcal{F} C C$ instances are asserted between these pairs when they are added to the list, that is, between $i_{0}$ and $i_{1}$, and between $i_{0}$ and $i_{2}$. If $i_{0}$ could be removed from the list, while $i_{1}$ and $i_{2}$ remained, then it would be necessary, when $i_{0}$ was removed, to assert an $\mathcal{F} C C$ instance between $i_{1}$ and $i_{2}$. However, because the decision level at which the indices were fully assigned is monotonically increasing, that is, $i_{0}$ has the lowest, or the equal lowest decision level, it is not possible for $i_{0}$ to be removed, while the others remain in the list.

The more straightforward approach is to instantiate the $\mathcal{F} C C$ between all the selects when an index first becomes known. By asserting the $\mathcal{F} C C$ instances just between the zeroth entry and the others, we hope to sometimes avoid asserting quadratically many $\mathcal{F} C C$ instances.

## Example 5.14

Suppose the array $a^{[2: 3]}$ appears in 3 selects $\left\{\left\langle i_{0}, v_{0}\right\rangle,\left\langle i_{1}, v_{1}\right\rangle,\left\langle i_{2}, v_{2}\right\rangle\right\}$. Here the $i$ are vectors of SAT variables corresponding to the selects' indices, and the $v$ s are the SAT variables corresponding to the fresh variables that replaced the selects.

Suppose the assignment to the indices is $\mu\left(i_{0}\right)=\langle 1 \star\rangle, \mu\left(i_{1}\right)=\langle 10\rangle, \mu\left(i_{2}\right)=\langle 1 \star\rangle$, and assume that the SAT solver replaces the $\star$ assignment of $i_{0}$ by 0 . As the assignments to the indices $i_{1}$ and $i_{0}$ are now equal, an $\mathcal{F} C C$ instance is asserted, namely $\left(i_{1}=i_{0}\right) \Longrightarrow\left(v_{1}=v_{0}\right)$. We cannot assert that $\left(v_{1}=v_{0}\right)$ because the assignments to $i_{1}$ and $i_{0}$ might be changed later.

If the $\star$ assignment of $i_{2}$ is replaced by 0 , then another $\mathcal{F} C C$ instance is asserted: $\left(i_{1}=i_{2}\right) \Longrightarrow\left(v_{1}=v_{2}\right)$. Note that in this case, 3 indices have the same assignment, and only $2 \mathcal{F} C C$ instances have been asserted.

We should say that our implementation of the $\mathcal{D C I}$ approach sometimes asserts $\mathcal{F} C C$ instances after they would first have been useful. This is because $\mathcal{F} C C$ instances are added only after an index is entirely known. As a result, the SAT solver may assign literals wrongly, even when bits are deducible from the values assigned to other indices and values. This causes the SAT solver to perform extra work. The next example clarifies this point.

## Example 5.15

Suppose the array $a^{[3: 2]}$ appears in three selects, $\left\{\left\langle i_{0}, v_{0}\right\rangle,\left\langle i_{1}, v_{1}\right\rangle,\left\langle i_{2}, v_{2}\right\rangle\right\}$, and suppose the assignments are $i_{0}=\langle 10 \star\rangle, v_{0}=\langle 0 \star\rangle, i_{1}=\langle 100\rangle, v_{1}=\langle 00\rangle, i_{2}=\langle 101\rangle$, and $v_{2}=\langle 00\rangle$. Then the $\star$ value of $v_{0}$ must be 0 , because if $i_{0}$ is $\langle 100\rangle$, it is forced to be 0 , and if $i_{0}$ is $\langle 101\rangle$ it is also forced to be 0 .
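The deduction in Example 5.15 can be checked by brute force. The Python sketch below is only an illustration of the reasoning, not something the solver does: it enumerates every completion of the partially assigned index and keeps the result bits that agree in all cases. Partial assignments are written as strings with `*` standing for $\star$; the function names are ours.

```python
from itertools import product

def completions(partial):
    """All full bit-strings matching a partial assignment such as '10*'."""
    stars = [p for p, c in enumerate(partial) if c == "*"]
    for bits in product("01", repeat=len(stars)):
        full = list(partial)
        for p, b in zip(stars, bits):
            full[p] = b
        yield "".join(full)

def forced_result(i0, v0, known):
    """Bits of v0 that agree across every consistent completion of i0.

    `known` maps fully assigned indices to fully assigned results.
    """
    candidates = []
    for idx in completions(i0):
        if idx in known:
            # The FCC forces v0 to equal the result stored for idx.
            r = known[idx]
            if all(c in ("*", b) for c, b in zip(v0, r)):
                candidates.append(r)
        else:
            # No other select constrains this index value.
            candidates.extend(completions(v0))
    return "".join(
        bits[0] if len(set(bits)) == 1 else "*" for bits in zip(*candidates)
    )

print(forced_result("10*", "0*", {"100": "00", "101": "00"}))  # 00
```

Both completions of $i_{0}$ collide with a known index whose result is $\langle 00\rangle$, so the unassigned bit of $v_{0}$ is forced to 0 before $i_{0}$ is fully assigned.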

Our $\mathcal{D C I}$ implementation may also assert some $\mathcal{F} C C$ instances before they are useful. As the next example shows, $\mathcal{F C C}$ instances can be asserted even if the values assigned to the results of two selects are identical.

## Example 5.16

Suppose the array $a^{[2: 2]}$ appears in two selects $\left\{\left\langle i_{0}, v_{0}\right\rangle,\left\langle i_{1}, v_{1}\right\rangle\right\}$. If both results are completely assigned, with $\mu\left(v_{0}\right)=\mu\left(v_{1}\right)$, and $\mu\left(i_{0}\right)$ becomes equal to $\mu\left(i_{1}\right)$, then the $\mathcal{F C C}$ instance $\left(\left(i_{0}=i_{1}\right) \Longrightarrow\left(v_{0}=v_{1}\right)\right)$ is asserted, even though it does not yet enable unit propagation.

The $\mathcal{D C I}$ algorithm that we have presented so far is inefficient. We now describe a version of the $\mathcal{D C I}$ algorithm which more closely matches our $\mathcal{D C I}$ implementation (Algorithms 5.8 to 5.10).

We give the $\mathcal{D C I}$ algorithm in three parts corresponding to procedures that we built into the SAT solver in three places. The precursor procedure (Algorithm 5.8) initialises the state that the other procedures use; it is run before SAT solving begins. The second runs after the SAT solver's cancel function, which deletes assignments to variables (Algorithm 5.9). A third procedure runs after unit propagation (Algorithm 5.10).

Algorithm 5.7 shows where the $\mathcal{D C I}$ procedures fit in the SAT solver.

An improvement to the simple $\mathcal{D C I}$ algorithm we presented (Algorithm 5.6) is to use a one-watched-literal scheme to determine when the final variable of an index is assigned. This corresponds to line 7 in Algorithm 5.10. The trail is recorded by unit propagation; it records which variables have been set. After unit propagation, the recently assigned variables on the trail are iterated through, which makes it easy to check if a watched variable has been assigned. If the watched literal has been assigned, then each of the variables of the index is checked in turn to see if it is unassigned. If some other index variable is unassigned, then the watchlist is updated to refer to that variable. However, if no unassigned index variables remain, then the index is entirely assigned, so an $\mathcal{F} C C$ instance is asserted.

Another improvement is to record if $\mathcal{F} C C$ instances have been asserted between a pair of selects already. Before outputting $\mathcal{F} C C$ instances, a check is performed so that duplicate $\mathcal{F} C C$ instances are not asserted.

```
Algorithm 5.7 The \(\mathcal{D C I}\) algorithm implemented with a SAT solver. Given for the
one array case.
Require: \(e\) a formula
    Create integer decision_level \(\leftarrow 0\) // the integer decision level
    Create \(\mu\) // assignments from variables to: \(\{0,1\}\)
    Create selects, a set of index/result pairs
    selects \(\leftarrow \{\}\)
    for all select\((a, i) \in \operatorname{subterms}(e)\) do // in topological order
        Replace select\((a, i)\) in \(e\) by a fresh variable \(v\)
        Add \(\langle i, v\rangle\) to the selects
    end for
    Convert \(e\) to CNF and assert to the SAT Solver
    PERFORM_PRECURSOR(selects) // Algorithm 5.8
    while some variable is not in \(\mu\) do
        while the size of the trail changes, and no conflict do
            Perform unit propagation
            assert_FCC( ) // Algorithm 5.10
        end while
        if a conflict occurred then
            Analyse the conflict
            Assert the conflict clause
            if decision_level \(=0\) then
                return unsatisfiable
            end if
            Undo assignments until \(\mu\) is not in conflict
            Update the decision_level
            delete_watched( ) // Algorithm 5.9
        else
            if some variable is not in \(\mu\) then
                Set a variable not in \(\mu\) to 1 or 0
                Increment the decision_level
            end if
        end if
    end while
    return satisfiable
```


## Example 5.17

Suppose the assignment to some index is $\langle 111 \star \star\rangle$, and that the zeroth literal is being watched. If, after unit propagation, the assignment becomes $\langle 111 \star 1\rangle$, then the watched literal is no longer unassigned, so an iteration through each of the variables is performed. In this case, because the first literal is $\star$, it will become the new watched literal.
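The watch update of Example 5.17 can be sketched as follows. This is a simplified Python illustration rather than the code built into the SAT solver: the function name `update_watch` and the variable names `b0` to `b4` are hypothetical, `None` stands for $\star$, and we assume that literal 0 is the rightmost bit of the written assignment.

```python
def update_watch(assignment, index_vars, watched):
    """Return the variable to watch next, or None if the index is complete.

    `assignment` maps each variable to 0, 1, or None (unassigned).
    """
    if assignment[watched] is None:
        return watched  # the watched variable is still unassigned
    for v in index_vars:
        if assignment[v] is None:
            return v  # move the watch to another unassigned variable
    return None  # every index variable is assigned

index_vars = ["b0", "b1", "b2", "b3", "b4"]
# The assignment <1 1 1 * 1>, written from b4 down to b0.
assignment = {"b4": 1, "b3": 1, "b2": 1, "b1": None, "b0": 1}
print(update_watch(assignment, index_vars, "b0"))  # b1
```

Only when the function returns `None` is the index entirely assigned, which is the point at which the lookup in knownIndices is performed.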

```
Algorithm 5.8 Precursor steps for an improved \(\mathcal{D C I}\) algorithm. Given for the one
array case.
Require: pairs, a list of pairs of bit-vector indices and results
    procedure PERFORM_PRECURSOR(pairs)
        Create integer checked \(\leftarrow 0\)
        Create dci_watchlist, a map from index variables to index/result pairs
        for all \(\langle\) index, result \(\rangle \in\) pairs do
            dci_watchlist[v].add(\(\langle\) index, result \(\rangle\)), where \(v \in\) index \(\wedge \mu(v)=\star\)
        end for
        Create dci_trail, a list of tuples of decision level, integer assignment, index,
            result variables
        Create knownIndices, a map from integer indices to lists of index/result pairs
        Create asserted, a set of \(\mathcal{F} C C\) instances that have already been asserted
        knownIndices \(\leftarrow\) asserted \(\leftarrow \{\}\)
    end procedure
```

```
Algorithm 5.9 Steps performed after cancel for an improved \(\mathcal{D C I}\) algorithm. Uses
the variables defined in Algorithm 5.8. Given for the one array case.
    procedure delete_watched // This runs after the cancel function
        while size(dci_trail) > 1 do
            \(\langle\) level, \(k\), index, result \(\rangle \leftarrow\) dci_trail[size(dci_trail) - 1]
            if level < decision_level then
                return
            end if
            // At least one index variable is now unassigned
            Delete the last element of the dci_trail list
            Add \(\langle\) index, result \(\rangle\) to the dci_watchlist
            Remove \(\langle\) index, result \(\rangle\) from knownIndices[ \(k]\)
        end while
    end procedure
```

To speed up removal from the knownIndices map, which needs to occur whenever a variable in an index becomes unassigned, the decision level at which an index became entirely assigned is stored. In the algorithms we present, this is the role of the dci_trail. The algorithm that runs during cancellation (backtracking), Algorithm 5.9, uses the dci_trail to efficiently remove entries from the knownIndices map.

An invariant of this improved version is that the decision levels at which the indices stored in knownIndices[$k$] became fully assigned are monotonically increasing. Because delete_watched is performed from higher to lower decision levels, the knownIndices[k][0] element will only be removed from the list when it is the only element in the list. Owing to this invariant, it is not necessary for the delete_watched

```
Algorithm 5.10 Steps after unit propagation for an improved \(\mathcal{D C I}\) algorithm. Uses
the variables defined in Algorithm 5.8. Given for the one array case.
    procedure assert_FCC // This runs after unit propagation
        for \(k=\) checked \(\ldots\) (size(trail) \(-1\)) do
            if dci_watchlist(trail[k]) then
                index \(\leftarrow\) dci_watchlist(trail[k]).index // the index variables
                result \(\leftarrow\) dci_watchlist(trail[k]).result // the result variables
                Delete trail[k] from the dci_watchlist
                if another index variable \(v_{j} \in\) index has \(\mu\left(v_{j}\right)=\star\) then
                    // Another variable is unassigned
                    dci_watchlist.add(\(v_{j}\), index, result)
                else
                    // No unassigned index variables
                    Create integer \(k \leftarrow \mu\)(index)
                    Add \(\langle\) index, result \(\rangle\) to knownIndices[\(k\)]
                    dci_trail.add(\(\langle\) decision_level, \(k\), index, result \(\rangle\))
                    if (size(knownIndices[\(k\)]) \(>1\)) \(\wedge\) ((knownIndices[\(k\)][0], select) \(\notin\) asserted) then
                        // Have not asserted the FCC already
                        asserted.add(knownIndices[\(k\)][0], select)
                        FCC_Instance(knownIndices[\(k\)][0], select)
                        if the SAT solver is now in conflict then
                            // The indices are the same, but the results different
                            return a conflict clause
                        end if
                    end if
                end if
            end if
        end for
        checked \(\leftarrow\) size(trail) // Track how much of the trail is checked
    end procedure
```

procedure to consider the case when the zeroth element of a list knownIndices$[k]$ is removed. The list will always be empty after that removal.

When assignments are cancelled, entries are removed from the knownIndices map until the decision level of the assignment equals the current decision level. When the decision level is less than the level at which the index became entirely assigned, then that index is no longer entirely assigned, so the entry must be removed from the knownIndices map.
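The cancellation step can be sketched in Python as follows. This follows our reading of the prose above (entries recorded at a level above the current decision level are undone) and may differ in detail from the loop bound and comparison of Algorithm 5.9. The watchlist is simplified to a plain list, whereas the algorithms use a map from index variables to pairs; all names are illustrative.

```python
def delete_watched(dci_trail, known_indices, watchlist, decision_level):
    """Undo index entries that were recorded above `decision_level`."""
    while dci_trail and dci_trail[-1][0] > decision_level:
        level, k, pair = dci_trail.pop()
        bucket = known_indices[k]
        # Levels increase within a bucket, so the pair to remove is the last one.
        assert bucket[-1] == pair
        bucket.pop()
        watchlist.append(pair)  # the index must be watched again

known_indices = {5: [("i0", "v0"), ("i1", "v1")], 9: [("i2", "v2")]}
dci_trail = [(1, 5, ("i0", "v0")), (2, 9, ("i2", "v2")), (3, 5, ("i1", "v1"))]
watchlist = []
delete_watched(dci_trail, known_indices, watchlist, decision_level=1)
print(known_indices)  # {5: [('i0', 'v0')], 9: []}
```

Because the trail is processed from its end, the zeroth entry of a bucket is only ever removed when it is the last entry left, which is the invariant discussed above.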

### 5.6 Evaluation

We base our evaluations on the SMT-LIB QF_ABV problems ${ }^{1}$ as at the $1^{\text {st }}$ of September 2011. We removed the BrummayerBiere3 family, because they are crafted problems that test an unconstrained array simplification which STP2 does not implement. Next, we removed problems containing array extensionality. Finally, we removed 22 problems that use index bit-widths greater than 64 bits, which is currently a limit of our $\mathcal{D C I}$ implementation. Our $\mathcal{D C I}$ implementation uses native machine integers rather than arbitrary precision integers to store the known indices (Algorithm 5.10). We were left with 9796 problems.

Our setup differs from that of SMT-COMP 2011 in four ways. First, we do not use a problem scrambler. The problem scrambler randomly applies simple transformations, like swapping the arguments to commutative operations, to make cheating via pattern matching harder in the competition. We leave the problems unscrambled because scrambling produces results that are harder to reproduce. Second, we use problems with an unknown satisfiability status. Third, we have excluded problems with array extensionality. Finally, we have not checked whether the satisfiability of problems depends on the semantics of division by zero. We compared the solvers' results for each problem. When multiple solvers answered the same problem, they always gave the same answer.

In general, the problems have few selects with indices that may be equal. The largest number of selects from a single array is 6096, but all those indices are constants, meaning that no $\mathcal{F} C C$ instances are generated. Of the problems that could be successfully converted to CNF, the most $\mathcal{F} C C$ instances added by $\mathcal{A}ck_{\text{ite}}$ was 9600. The two problems in the check family required more $\mathcal{F} C C$ instances, but both exceeded the memory limit on all solvers.

### 5.6.1 A Comparison of Two $\mathcal{A} c k$ Implementations

We compare the performance of $\mathcal{A}ck_{\text{ite}}$ and $\mathcal{A}ck_{\text{cnf}}$ combined with two SAT solvers. An advantage of $\mathcal{A}ck$ is that any SAT solver that reads DIMACS CNF format can easily be used. Minisat 2.2, the default SAT solver of STP2, placed sixteenth of the twenty-six solvers at the 2011 SAT Competition in the Application UNSAT+SAT

|  |  | $\mathcal{A}ck_{\text{cnf}}$ |  | $\mathcal{A}ck_{\text{cnf}}+\mathcal{G}$ |  | $\mathcal{A}ck_{\text{ite}}$ |  | $\mathcal{A}ck_{\text{ite}}+\mathcal{G}$ |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Family | \# | time | fail | time | fail | time | fail | time | fail |
| bench_ab | 119 | 0 |  | 2 |  | 1 |  | 4 |  |
| brummayerbiere | 56 | 912 | 27 | 766 | 25 | 1177 | 26 | 864 | 25 |
| brummayerbiere2 | 22 | 931 | 1 | 783 |  | 566 | 1 | 480 |  |
| calc2 | 16 | 639 |  | 298 |  | 614 |  | 287 |  |
| check | 2 | 0 | 2 | 0 | 2 | 0 | 2 | 0 | 2 |
| dwp_formulas | 1750 | 373 | 1 | 659 |  | 161 | 1 | 713 |  |
| egt | 7719 | 37 |  | 95 |  | 53 |  | 112 |  |
| platania | 20 | 217 |  | 208 |  | 230 |  | 714 |  |
| stp | 40 | 335 | 1 | 838 | 1 | 359 | 1 | 791 | 1 |
| stp_samples | 52 | 2 |  | 3 |  | 2 |  | 3 |  |
| Sum | 9796 | 3447 | 32 | 3652 | 28 | 3163 | 31 | 3969 | 28 |
| Time w/ penalty |  | 19479s |  | 17680s |  | 18694s |  | 17997s |  |

Table 5.1: $\mathcal{G}$ is Glucose. For each family and solver: 'time' is the time in seconds that solved problems took to complete; 'fail' is the number of problems that exceeded the time limit or memory limit. The 'Time w/ penalty' is the sum of 'time' plus 501 seconds per failed problem. ' $\#$ ' is the number of problems in a family.

division. We also evaluate with the first placed Glucose 2.0 [AS09] SAT solver. We use the same Glucose configuration as was submitted to the SAT Competition; it calls the SatELite [EB05] CNF simplifier before solving.

Tests were run using STP r1656 with a memory limit of 3GB and a time limit of 500 seconds on a single core of an Intel E5507 Linux computer.

STP2's default strategy, which we disabled, is to perform $\mathcal{A}ck_{\text{ite}}$ upfront for problems with few array expressions.

Table 5.1 shows the results for the four $\mathcal{A}ck$ configurations. The combinations with SatELite and the Glucose SAT solver answer 3 or 4 more problems than does Minisat 2.2. This demonstrates the advantage of being able to easily upgrade the SAT solver. Otherwise there are only small differences in performance.
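The 'Time w/ penalty' row can be recomputed from the 'Sum' row of Table 5.1. The short Python check below uses the 501 second penalty stated in the caption; the dictionary keys are just labels for the four configurations.

```python
PENALTY_SECONDS = 501  # charged per failed problem, as stated in the caption

def time_with_penalty(solved_seconds, failures):
    """Total time for solved problems plus a fixed penalty per failure."""
    return solved_seconds + PENALTY_SECONDS * failures

# (sum of 'time', sum of 'fail') taken from the 'Sum' row of Table 5.1
table_5_1 = {
    "Ack_cnf": (3447, 32),
    "Ack_cnf+G": (3652, 28),
    "Ack_ite": (3163, 31),
    "Ack_ite+G": (3969, 28),
}
for name, (seconds, fails) in table_5_1.items():
    print(name, time_with_penalty(seconds, fails))
# Ack_cnf 19479
# Ack_cnf+G 17680
# Ack_ite 18694
# Ack_ite+G 17997
```

Under this measure the four extra problems answered with Glucose outweigh the extra time it spends on the problems that both SAT solvers answer.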

### 5.6.2 A Comparison to Other Solvers

Next we ran Boolector 1.5.23, Sonolar r2483, Z3 3.2, and STP2 r1656 with both $\mathcal{A}bsref$ and $\mathcal{D C I}$ on the selected SMT-LIB problems. Boolector won the 2011 SMT-COMP QF_ABV division, Z3 was second, and Sonolar was third. We also include the fastest $\mathcal{A}ck$ variant. The results are shown in Table 5.2.

All variants of STP2 solve at least 10 more problems than Boolector 1.5.23, the nearest competitor.

|  | Boolector |  | Sonolar |  | $\mathcal{A}bsref$ |  | $\mathcal{A}ck_{\text{cnf}}+\mathcal{G}$ |  | $\mathcal{D C I}$ |  | Z3 |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Family | time | fail | time | fail | time | fail | time | fail | time | fail | time | fail |
| bench_ab | 6 | 1 | 5 |  | 0 |  | 2 |  | 0 |  | 2 |  |
| brummayerbiere | 996 | 27 | 1313 | 29 | 830 | 27 | 766 | 25 | 639 | 28 | 1093 | 25 |
| brummayerbiere2 | 799 | 6 | 790 | 7 | 604 | 1 | 783 |  | 737 | 1 | 957 | 14 |
| calc2 | 1283 |  | 493 | 4 | 620 |  | 298 |  | 611 |  | 614 | 4 |
| check | 0 | 2 | 0 | 2 | 0 | 2 | 0 | 2 | 0 | 2 | 0 | 2 |
| dwp_formulas | 677 | 4 | 247 | 4 | 144 |  | 659 |  | 139 |  | 990 | 8 |
| egt | 407 |  | 964 |  | 32 |  | 95 |  | 31 |  | 121 |  |
| platania | 94 |  | 1922 | 4 | 187 |  | 208 |  | 230 |  | 6 |  |
| stp | 1072 | 2 | 922 | 1 | 421 | 1 | 838 | 1 | 328 | 1 | 1516 | 15 |
| stp_samples | 4 |  | 3 |  | 2 |  | 3 |  | 2 |  | 3 |  |
| Sum | 5338 | 42 | 6658 | 51 | 2839 | 31 | 3652 | 28 | 2718 | 32 | 5303 | 68 |
| Time w/ penalty | 26380s |  | 32209s |  | 18370s |  | 17680s |  | 18750s |  | 39371s |  |

Table 5.2: $\mathcal{G}$ is Glucose. For each family and solver: 'time' is the time in seconds that solved problems took to complete; 'fail' is the number of problems that exceeded the time limit or memory limit. The 'Time w/ penalty' is the sum of 'time' plus 501 seconds per failed problem. See Table 5.1 for the number of problems in each family.

| Solver | Native | Pre-process \& Solve |
| :--- | ---: | :---: |
| $\mathcal{A} b s r e f$ | 0.5 s | 0.7 s |
| $\mathcal{D C I}$ | 0.5 s | 0.7 s |
| $\mathcal{A} c k_{\text {cnf }}+$ glucose | 3.2 s | 1.2 s |
| Boolector 1.5.23 | 33.3 s | 5.0 s |
| Sonolar 2483 | 36.5 s | 1.8 s |
| Z3 3.2 | $>1500 \mathrm{~s}$ | 5.1 s |

Table 5.3: Time in seconds to solve the countbitstableoffbyone0128 benchmark. The first column gives times using the specified solver. The second column gives times using STP2 with $\mathcal{A}ck_{\text{ite}}$ as a pre-processor, then running the specified solver.

In particular, the STP2 variants are consistently better than other solvers for the brummayerbiere2 family. Table 5.3 gives the solvers' times, on a single core of an Intel Q8400 Linux computer, for one of the instances in that family, the countbitstableoffbyone0128 benchmark. STP2 with $\mathcal{A}bsref$ is more than three thousand times faster than Z3. To try Z3 with $\mathcal{A}ck_{\text{ite}}$, we used STP2 as a pre-processor, with most simplification disabled, to parse the problem, structurally hash it, perform $\mathcal{A}ck_{\text{ite}}$, and write the result out in SMT-LIB2 format. $\mathcal{A}ck_{\text{ite}}$ is the only approach of the four that we investigated that reduces to a bit-vector problem, a format which other solvers can read. The times to perform this pre-processing step and to run each solver are given. Including the time for pre-processing, the other solvers are between six and three hundred times faster than when run natively. For this problem, the $\mathcal{A}ck_{\text{ite}}$ approach is clearly superior to the approaches the other solvers currently implement.

On the platania family, Z3 was by far the fastest solver, about 15 times faster than Boolector, and 31 times faster than STP2. The platania family is the only one for which STP2 is not competitive.

The ff.stp problem, from the STP set, exceeded the 3GB memory limit on all solvers. We reran it on a machine with 64GB of memory. STP2 with $\mathcal{D C I}$ and with $\mathcal{A}ck$ solved the problem in about 1000 seconds using 28GB of memory. The size of the CNF produced was about 95 million clauses in both cases. It is the large number of bit-vector constraints, rather than the array constraints, that makes this problem hard to solve.

Surprisingly, the approach used to enforce the $\mathcal{F} C C$ makes little difference to the overall result. The SMT-LIB problems we selected required no more than ten thousand $\mathcal{F} C C$ instances, which was not enough to contrast the different $\mathcal{F} C C$ instantiation approaches. In the next section we evaluate the $\mathcal{F} C C$ approaches on a more difficult array problem.

### 5.6.3 A Problem Requiring Many $\mathcal{F} C C$ Instances

It is easy to build a more difficult array problem than those we have encountered so far. Given a bit-width $n$, for $2^{n}$ fresh variables $\left(v_{0} \ldots v_{2^{n}-1}\right)$ we now assert: $\forall_{i \in 0 \ldots\left(2^{n}-1\right)}\left(\operatorname{select}\left(a^{[n: n]}, i\right)=i \wedge \operatorname{select}\left(a^{[n: n]}, v_{i}\right)=i\right)$.

## Example 5.18

With $n=1$, the constraints are

$$
\begin{aligned}
& \operatorname{select}\left(a^{[1: 1]}, 0^{[1]}\right)=0^{[1]} \\
& \operatorname{select}\left(a^{[1: 1]}, 1^{[1]}\right)=1^{[1]} \\
& \text { select }\left(a^{[1: 1]}, v_{0}^{[1]}\right)=0^{[1]} \\
& \text { select }\left(a^{[1: 1]}, v_{1}^{[1]}\right)=1^{[1]}
\end{aligned}
$$

This problem creates fresh variables and constrains them to equal the value stored at the same index. For instance, $v_{6}$ must equal 6. With $n=8$, there are 512 distinct select terms, but 256 of those have constant indices which are obviously not equal to each other. At worst, $\left(2^{2 n}+\frac{2^{n}\left(2^{n}-1\right)}{2}\right)$ $\mathcal{F C C}$ instances are instantiated by $\mathcal{A}bsref$ and $\mathcal{A}ck$.
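For readers who wish to reproduce the construction, the following Python sketch writes the problem for a given $n$ in SMT-LIB2 syntax. It is our own illustrative generator, not the script used for the experiments; the function name and the variable names `a` and `v0`, `v1`, ... are assumptions.

```python
def many_fcc_problem(n):
    """SMT-LIB2 text for the problem with index and value bit-width n."""
    bv = f"(_ BitVec {n})"
    lines = [
        "(set-logic QF_ABV)",
        f"(declare-fun a () (Array {bv} {bv}))",
    ]
    for i in range(2 ** n):
        const = f"(_ bv{i} {n})"
        lines.append(f"(declare-fun v{i} () {bv})")
        # The array stores i at the constant index i ...
        lines.append(f"(assert (= (select a {const}) {const}))")
        # ... and the fresh variable v_i must select that same value.
        lines.append(f"(assert (= (select a v{i}) {const}))")
    lines.append("(check-sat)")
    return "\n".join(lines)

print(many_fcc_problem(1))
```

With $n=1$ the output contains the four constraints of Example 5.18; with $n=8$ it contains the 512 select terms mentioned above.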

Because the result at every constant index is known initially, the $\mathcal{D C I}$ algorithm will only assert $\mathcal{F C C}$ instances between pairs of selects with one constant index and one variable index. For $\mathcal{D C I}$, the maximum number of $\mathcal{F} C C$ instances asserted for this problem is $2^{2 n}$.
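The two bounds are easy to evaluate for concrete bit-widths. The Python sketch below does so; the split of the first bound into constant/variable and variable/variable pairs is our reading of the formula, and the dictionary keys are just labels.

```python
def worst_case_fcc(n):
    """Upper bounds on FCC instances for the problem with bit-width n."""
    m = 2 ** n                  # number of constant (and of variable) indices
    const_var = m * m           # one constant index paired with one variable index
    var_var = m * (m - 1) // 2  # two distinct variable indices
    return {"Ack/Absref": const_var + var_var, "DCI": const_var}

print(worst_case_fcc(8))   # {'Ack/Absref': 98176, 'DCI': 65536}
print(worst_case_fcc(10))  # {'Ack/Absref': 1572352, 'DCI': 1048576}
```

These values agree with the limits quoted for the $n=8$ and $n=10$ experiments in this section.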

For $n=8$, $\mathcal{A}ck$ and $\mathcal{A}bsref$ will instantiate at most 98,176 instances, and $\mathcal{D C I}$ at most 65,536 instances. We ran the experiments in this section on a single core of an Intel Q8400 Linux computer, with a memory limit of 4GB. We observed these results:

- Z3 3.2: 0.5 seconds, 19MB
- STP2 r1659 with $\mathcal{D C I}$: 0.5 seconds, 32MB, asserting 31,547 $\mathcal{F} C C$ instances
- STP2 r1659 with $\mathcal{A}ck_{\text{cnf}}$: 3.4 seconds, 104MB, asserting 98,176 $\mathcal{F} C C$ instances
- STP2 r1659 with $\mathcal{A}bsref$: 17 seconds, 106MB, with 258 refinement iterations, asserting 98,176 $\mathcal{F} C C$ instances
- STP2 r1659 with $\mathcal{A}ck_{\text{ite}}$: 19 seconds, 718MB, asserting 98,176 $\mathcal{F} C C$ instances
- Boolector 1.5.23: 41,400 seconds, 109MB, 67,062 refinement iterations

With $\mathcal{A}bsref$, when the refinement limit is reached (line 35 of Algorithm 5.4), about 65,000 $\mathcal{F} C C$ instances are asserted at once to the SAT solver. $\mathcal{A}ck_{\text{ite}}$ uses about 7 times more memory than $\mathcal{A}bsref$, even though both versions send the same number of $\mathcal{F} C C$ instances to the SAT solver. This is because $\mathcal{A}ck_{\text{ite}}$ performs an expensive AIG to CNF conversion step.

For $n=10$, $\mathcal{A}ck$ and $\mathcal{A}bsref$ will instantiate at most 1,572,352 $\mathcal{F C C}$ instances, and $\mathcal{D C I}$ at most 1,048,576 instances. We observed these results:

- STP2 r1659 with $\mathcal{D C I}$: 12.1 seconds, 215MB, asserting 510,013 $\mathcal{F} C C$ instances
- Z3 3.2: 22.4 seconds, 56MB
- STP2 r1659 with $\mathcal{A}ck_{\text{cnf}}$: 641 seconds, 1929MB, asserting 1,572,352 $\mathcal{F C C}$ instances
- STP2 r1659 with $\mathcal{A}bsref$: 14,103 seconds, 2780MB, with 2050 iterations, asserting 1,572,352 $\mathcal{F} C C$ instances
- STP2 r1659 with $\mathcal{A}ck_{\text{ite}}$: exceeded the 4GB memory limit after 50 seconds.

For $n=10$, $\mathcal{A}bsref$ asserted 1,047,553 instances all at once after iterating through all select indices.

We did not run Boolector 1.5.23 at $n=10$, because it was the slowest on the $n=8$ instance.

On the $n=8$ instance, $\mathcal{A}ck_{\text{ite}}$ uses about 7 times more memory than $\mathcal{A}ck_{\text{cnf}}$ even though the number of $\mathcal{F C C}$ instances instantiated is the same. This is because $\mathcal{A}ck_{\text{ite}}$ bit-blasts to AIGs then undertakes an expensive AIG to CNF conversion phase.

Z3 3.2 uses the least memory of the solvers, and solves problems quickly.

These problems are well suited to $\mathcal{D C I}$ because one side of the $\mathcal{F} C C$ instances is always a constant. In the $n=10$ case, there are only 11 clauses and one fresh variable introduced per $\mathcal{F} C C$ instance.

Because most of the $\mathcal{F} C C$ instances are needed to make this problem satisfiable, the $\mathcal{A}bsref$ approach asserts all of the $\mathcal{F} C C$ instances. At $n=10$, $\mathcal{A}ck_{\text{cnf}}$, which asserts the same clausal form as $\mathcal{A}bsref$, was about 20 times faster. Generally $\mathcal{A}bsref$ performs best when most of the $\mathcal{F} C C$ instances are unnecessary.

As the number of clauses that are asserted to the SAT solver grows, $\mathcal{A}bsref$ performs fewer iterations per second. This is because the search is reset each time the SAT solver finds a model. If instead the $\mathcal{A}bsref$ implementation asserted clauses to the SAT solver during search (i.e. when the solver is not at decision level 0), like $\mathcal{D C I}$ does, then the number of refinement iterations performed per second would increase dramatically.

### 5.6.4 Quadratic Blow-Up of Select-over-Store Elimination

In this section we compare the performance of QF_ABV solvers when solving problems that are specially crafted to quadratically increase in size when select-over-store elimination (section 5.2) is applied. The problems we generate enforce that the index and the result are the same at least at one position of the array. In these problems we arbitrarily fix the bit-width of both the index and the result to 20.

The problems have a chain of store expressions, with the same number of select expressions reading from them. Note that there is no particular reason for the number of stores and selects to be the same. Given a natural number $k$, for $k$ fresh variables $\left(v_{0}, \ldots, v_{k-1}\right)$ assert:

$$
S=\operatorname{store}\left(\operatorname{store}\left(\ldots \operatorname{store}\left(a^{[20: 20]}, 0,0\right) \ldots, k-2, k-2\right), k-1, k-1\right)
$$

Because there is no array extensionality, here $S$ is a term variable. It allows us to use the same array term in each of the selects:

$$
\forall_{0 \leq i<k}\left(\operatorname{select}\left(S, v_{i}\right)=v_{i}\right)
$$

The problem we have defined is contrived to be difficult for solvers which apply Equation 5.4 to eagerly remove store expressions.

## Example 5.19

For $k=3$, the problem is:

$$
\begin{aligned}
S & =\operatorname{store}\left(\operatorname{store}\left(\operatorname{store}\left(a^{[20: 20]}, 0,0\right), 1,1\right), 2,2\right) \\
v_{0} & =\operatorname{select}\left(S, v_{0}\right) \\
v_{1} & =\operatorname{select}\left(S, v_{1}\right) \\
v_{2} & =\operatorname{select}\left(S, v_{2}\right)
\end{aligned}
$$

When select-over-store elimination is applied to the $k=3$ instance, the problem is transformed into the following constraints:

$$
\begin{aligned}
& v_{0}=\operatorname{ite}\left(v_{0}=2,2, \operatorname{ite}\left(v_{0}=1,1, \operatorname{ite}\left(v_{0}=0,0, \operatorname{select}\left(a, v_{0}\right)\right)\right)\right) \\
& v_{1}=\operatorname{ite}\left(v_{1}=2,2, \operatorname{ite}\left(v_{1}=1,1, \operatorname{ite}\left(v_{1}=0,0, \operatorname{select}\left(a, v_{1}\right)\right)\right)\right) \\
& v_{2}=\operatorname{ite}\left(v_{2}=2,2, \operatorname{ite}\left(v_{2}=1,1, \operatorname{ite}\left(v_{2}=0,0, \operatorname{select}\left(a, v_{2}\right)\right)\right)\right)
\end{aligned}
$$

Consider an instance where $k=300$. Initially there are about 600 array expressions. Applying select-over-store elimination increases this to about 90,000 ITE expressions.
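The growth can be checked with a minimal sketch, which assumes expressions are represented as nested tuples. The representation and the helper names `store_chain`, `eliminate` and `total_ites` are ours for illustration and are not taken from STP2. Each of the $k$ selects is rewritten into a chain of $k$ ITEs, giving $k^{2}$ ITEs in total.

```python
# Illustrative only: expressions are nested tuples, not STP2 data structures.

def store_chain(k):
    """Build store(...store(store(a, 0, 0), 1, 1)..., k-1, k-1)."""
    array = ("array", "a")
    for i in range(k):
        array = ("store", array, i, i)
    return array


def eliminate(array, index):
    """Rewrite select(array, index) so that no store remains."""
    pending = []
    while array[0] == "store":
        _, array, store_index, value = array
        pending.append((store_index, value))  # outermost store first
    result = ("select", array, index)
    # Wrap from the innermost store outwards, one ITE per store.
    for store_index, value in reversed(pending):
        result = ("ite", ("=", index, store_index), value, result)
    return result


def count_ites(expr):
    """Count ITE nodes, iteratively to avoid deep recursion."""
    count, stack = 0, [expr]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            count += node[0] == "ite"
            stack.extend(node[1:])
    return count


def total_ites(k):
    """ITEs produced when all k selects over the k-store chain are rewritten."""
    s = store_chain(k)
    return sum(count_ites(eliminate(s, f"v{i}")) for i in range(k))
```

For $k=3$ the rewritten select for $v_{0}$ has the same shape as the first constraint of Example 5.19, and `total_ites(300)` gives the 90,000 ITEs quoted above.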

To solve with $k=300$, Sonolar 2217 takes 0.07 seconds, Boolector 1.5.23 takes 1.7 seconds and uses 3.8 MB of memory, STP2 r1398 takes 11 seconds and uses 1 GB of memory, and Z3 3.1 takes 3900 seconds and uses 740 MB of memory.

To solve with $k=3000$, Sonolar 2217 takes 0.7 seconds, Boolector 1.5.23 takes 202 seconds and uses 36 MB, and STP2 r1398 exceeds the 4GB memory limit. We did not re-run Z3 3.1 because it was the slowest at $k=300$.

As can be seen, there is a large variation in the time and memory used by different solvers. At $k=300$, Sonolar 2217 is more than 55,000 times faster than Z3 3.1.

The quadratic blow-up due to select-over-store elimination badly affects STP2; it is not able to solve the $k=3000$ case.

### 5.6.5 A Comparison with STP 0.1

Ganesh and Dill [GD07] measure STP on 12 benchmarks. All of these benchmarks are contained in STP's public repository. The problems range in size from 8 MB to 442 MB; in total they are 1.5 GB. To quantify STP2's improvement we re-ran the measurements on a single core of an Intel Q8400 Linux computer with a memory limit of 5GB. STP 0.1 is the version of STP initially open-sourced; it is downloadable from STP's web site. In Table 5.4 we compare STP 0.1 with STP2 r1656 using $\mathcal{A}bsref$.

STP 0.1 exceeds the 5GB memory limit on one problem, and is faster than STP2 on two problems. Ignoring the memory-out, STP2 is about 7 times faster overall. Using STP2 with $\mathcal{DCI}$ is 10 seconds faster than using $\mathcal{A}bsref$. Using STP2 with $\mathcal{A}ck_{\text{ite}}$ is about 10 seconds slower than using $\mathcal{A}bsref$. So again, these problems are insensitive to how the $\mathcal{FCC}$ is enforced. In total, STP2 with $\mathcal{A}bsref$ spent about 10 seconds performing SAT solving; the rest of the time is spent parsing and simplifying the problem. So further improvements for these problems will likely come from speeding up the parsing and simplification phases.

| Problem | STP 0.1 | STP2 r1656 $\mathcal{A}bsref$ |
| :--- | ---: | ---: |
| 610dd9dc.T | MO | 11 s |
| grep0084 | 68 s | 4 s |
| grep0095 | 84 s | 4 s |
| grep0106 | 83 s | 4 s |
| grep0117 | 94 s | 4 s |
| grep0777 | 236 s | 25 s |
| testcase15 | 26 s | 15 s |
| testcase16 | 28 s | 17 s |
| testcase20 | 26 s | 42 s |
| thumbnailout-noarg.9872 | 593 s | 49 s |
| thumbnailout-spin1-2.11493 | 1121 s | 94 s |
| thumbnailout-spin1-concreteget | 44 s | 56 s |
| Total | 2403 s | 325 s |

Table 5.4: Time in seconds for STP 0.1 and STP2 r1656 to solve the problems given in Ganesh and Dill [GD07]. 'MO' is memory-out.
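The "about 7 times faster" figure can be reproduced from Table 5.4 with a few lines of arithmetic. The sketch below copies the times from the table and compares only the problems that STP 0.1 solved; the variable names are ours.

```python
# Times in seconds copied from Table 5.4; None marks STP 0.1's memory-out.
table = [
    ("610dd9dc.T", None, 11),
    ("grep0084", 68, 4),
    ("grep0095", 84, 4),
    ("grep0106", 83, 4),
    ("grep0117", 94, 4),
    ("grep0777", 236, 25),
    ("testcase15", 26, 15),
    ("testcase16", 28, 17),
    ("testcase20", 26, 42),
    ("thumbnailout-noarg.9872", 593, 49),
    ("thumbnailout-spin1-2.11493", 1121, 94),
    ("thumbnailout-spin1-concreteget", 44, 56),
]

stp01_total = sum(old for _, old, _ in table if old is not None)
stp2_total = sum(new for _, _, new in table)
# Ignore the memory-out: compare only problems that both versions solved.
stp2_common = sum(new for _, old, new in table if old is not None)
speedup = stp01_total / stp2_common
```

The totals match the last row of the table (2403 s and 325 s), and the ratio over the eleven commonly solved problems is 2403 / 314, a little over 7.6.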

### 5.7 Related Work

Because arrays model the behaviour of a computer with memory, and because they are a basic operation in many programming languages, their study has a long history. Ackermann [Ack54] realised that a theory with uninterpreted functions and equality can be reduced to a theory with equality by instantiating the $\mathcal{F} C C$.
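As an illustration of this reduction, the sketch below replaces each select with a fresh variable and lists the $\mathcal{FCC}$ instances that relate every pair of reads from the same array. The function name and the textual constraint format are assumptions made for this example; no solver is involved.

```python
from itertools import combinations


def ackermann_fcc(selects):
    """selects: list of (array_name, index_term) pairs.

    Returns the fresh variable chosen for each select and the list of
    instances "(i = j) => (v_i = v_j)" for reads of the same array.
    """
    fresh = {sel: f"v{n}" for n, sel in enumerate(selects)}
    instances = []
    for first, second in combinations(selects, 2):
        if first[0] == second[0]:  # only reads of the same array are related
            instances.append(
                f"({first[1]} = {second[1]}) => ({fresh[first]} = {fresh[second]})"
            )
    return fresh, instances
```

For $n$ selects from one array this produces $n(n-1)/2$ instances, which is the quadratic growth that the abstraction-refinement approaches try to avoid.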

### 5.7.1 STP 0.1

Ganesh and Dill [GD07] describe what they call the "array substitution" optimisation. Assume we are given $\operatorname{select}(a, c)=t$, where $a$ is an array literal, $c$ is a constant, and $t$ is a term not containing any array terms. The optimisation substitutes the select expression throughout the problem by $t$. As described in section 5.1, STP2 only performs the replacement if $t$ is a constant, which avoids both the expense of traversing the expression $t$ to find any array expressions, and the need to sometimes bit-blast $t$ when it is needed to assert an $\mathcal{FCC}$ instance. The $\mathcal{A}bsref$ implementation is not able to bit-blast extra expressions, only to generate $\mathcal{FCC}$ instances. Extra work is needed to measure the cost and benefits of both approaches on a range of benchmarks.

Ganesh and Dill [GD07] describe the select abstraction-refinement algorithm of STP 0.1, which is the same as Algorithm 5.4 except for how the $\mathcal{FCC}$ instances are generated. They also describe a type of abstraction-refinement for stores, which is currently disabled in STP2. As seen in subsection 5.6.4, and as described by Ganesh and Dill, select-over-store elimination can quadratically increase the number of expressions. In some cases their approach can avoid this blow-up, but it complicates the implementation of the other algorithms. The store-absref algorithm is not given in the paper, but an implementation is contained in the STP 0.1 source code. Algorithm 5.11 gives the algorithm.

```
Algorithm 5.11 STP 0.1 store abstraction-refinement algorithm
Require: \(e\), a formula
    original a term variable, original \(\leftarrow e\)
    Create now, later, sets of CNF clauses
    Create \(l\), a list of pairs of select expressions, and variables.
    for all select(store(...), \(i) \in e\) do // after a reverse topological sort
        Replace select(store(...), i) in \(e\) by a fresh variable \(v\)
        Add \(\langle v, \operatorname{select}(\operatorname{store}(\ldots), i)\rangle\) to \(l\)
    end for
    Apply \(\mathcal{A}bsref\) to \(e\), if it is unsatisfiable return unsatisfiable
    if \(\mu\) (original) \(=1\) then
        return satisfiable
    end if
    for all \((v, e) \in l\) do
        if \(\mu(v) \neq \mu(e)\) then
            Add the CNF for \(v=e\) to now
        else
            Add the CNF for \(v=e\) to later
        end if
    end for
    Assert now to the SAT solver
    If the SAT solver reports unsatisfiable, then return unsatisfiable
    Assert later to the SAT solver
    If the SAT solver reports unsatisfiable, then return unsatisfiable
```

The STP 0.1 approach to store-absref differs from other store abstraction approaches, for example Boolector's (described in the next section), in that store-absref results in at most two extra SAT solver calls. If the abstracted problem can be solved, then we are done; otherwise select-over-store elimination is applied and the problem is asserted to the SAT solver.

## Example 5.20

Consider:

$$
\operatorname{select}\left(\operatorname{store}\left(a, t_{0}, t_{1}\right), j\right)=\operatorname{select}\left(\operatorname{store}\left(a, t_{0}, t_{1}\right), k\right)
$$

If select-over-store elimination is applied, this becomes:

$$
\operatorname{ite}\left(t_{0}=j, t_{1}, \operatorname{select}(a, j)\right)=\operatorname{ite}\left(t_{0}=k, t_{1}, \operatorname{select}(a, k)\right)
$$

Instead store-absref (Algorithm 5.11) asserts the abstracted formula $v_{0}=v_{1}$ to the SAT solver. The original problem is evaluated with the SAT solver's model. If $\mu\left(v_{0}\right) \neq \mu\left(\operatorname{select}\left(\operatorname{store}\left(a, t_{0}, t_{1}\right), j\right)\right)$, then $v_{0}=\operatorname{ite}\left(t_{0}=j, t_{1}, \operatorname{select}(a, j)\right)$ is asserted to the SAT solver. This asserts Equation 5.1 which was previously omitted from the problem. Likewise, if $\mu\left(v_{1}\right) \neq \mu\left(\operatorname{select}\left(\operatorname{store}\left(a, t_{0}, t_{1}\right), k\right)\right)$, then $v_{1}=\operatorname{select}\left(\operatorname{store}\left(a, t_{0}, t_{1}\right), k\right)$ is asserted.
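The split into the sets now and later of Algorithm 5.11 can be sketched for this example as follows. This is a simplified illustration under our own assumptions: the model $\mu$ is a Python dictionary, the base array is a dictionary whose unwritten cells read as 0, and constraints are plain strings. It is not the STP 0.1 source code.

```python
def eval_select_store(mu, array, store_index, store_value, read_index):
    """Value of select(store(a, store_index, store_value), read_index) under mu."""
    if mu[read_index] == mu[store_index]:
        return mu[store_value]
    return array.get(mu[read_index], 0)  # assumption: unwritten cells read as 0


def partition(mu, array, abstractions):
    """Split the deferred definitions into those the model violates (now)
    and those it already satisfies (later).

    abstractions: list of (fresh_var, (store_index, store_value, read_index)).
    """
    now, later = [], []
    for var, (t0, t1, idx) in abstractions:
        constraint = f"{var} = ite({t0} = {idx}, {t1}, select(a, {idx}))"
        if mu[var] != eval_select_store(mu, array, t0, t1, idx):
            now.append(constraint)
        else:
            later.append(constraint)
    return now, later


# A hypothetical model for Example 5.20 in which v0 is wrong and v1 is right.
mu = {"t0": 5, "t1": 9, "j": 5, "k": 7, "v0": 1, "v1": 1}
base_array = {7: 1}
now, later = partition(
    mu, base_array, [("v0", ("t0", "t1", "j")), ("v1", ("t0", "t1", "k"))]
)
```

With this model only the definition of $v_{0}$ is placed in now, so the first extra SAT call is made with a single added constraint.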

STP2 has the capability (inherited from STP 0.1 but currently disabled) to sort stores where the indices can never be equal using a rule:

$$
\text{Given: } \operatorname{store}(\operatorname{store}(a, i, j), k, l),
$$

if $i \neq k$ and $i$ is less than $k$ in a specified total order, rewrite to: $\operatorname{store}(\operatorname{store}(a, k, l), i, j)$.

That is, if two store indices can never be equal, then order the stores according to some total order on the index expressions. This normalises terms, but potentially increases the number of terms.

## Example 5.21

Consider the formula $v_{0}=\operatorname{select}\left(S, v_{1}\right)$, where $S$ is a syntactic variable

$$
S=\operatorname{store}\left(\operatorname{store}\left(a,\left(t_{1}+2\right), t_{2}\right),\left(t_{1}+4\right), t_{3}\right)
$$

If another formula is created, say, $v_{1}=\operatorname{select}\left(\operatorname{store}\left(S, t_{1}, t_{4}\right), t_{5}\right)$, and if the indices are ordered $t_{1}<\left(t_{1}+2\right)<\left(t_{1}+4\right)$, then, if sorting of indices is performed, the formula will be sorted so that the $t_{1}$ index is the innermost. This requires the creation of three store expressions, rather than just one.

Because of the potential for blowing up the number of terms, we have disabled this feature in STP2.

### 5.7.2 Boolector

Brummayer and Biere [BB09] describe Boolector's abstraction-refinement algorithm for extensional arrays. Abstraction-refinement as implemented by Boolector [Bru09] is a more sophisticated implementation than STP2's $\mathcal{A}bsref$. In particular, their approach has three features that our $\mathcal{A}bsref$ lacks: it handles array extensionality, it does not perform upfront select-over-store elimination or remove array-ITEs, and it encodes array indices to CNF when they are first needed (like STP 0.1 does).

Earlier versions of Boolector implemented an unconstrained simplification for arrays. That is, if there is only a single occurrence of the array variable $a$, then $\operatorname{select}(a, t)$ is replaced by a fresh variable. In general STP2 does not implement unconstrained variable simplification for arrays, but for problems with few array terms, STP2 applies $\mathcal{A}ck_{\text{ite}}$ to convert array problems to bit-vector problems early on. This has the same effect as performing the unconstrained array simplification. The brummayerbiere3 family, which we omitted from our experiments, are crafted benchmarks that are easy if unconstrained elimination of arrays is implemented.

Rather than remove array-ITEs and stores upfront, like STP2 does, which may cause a quadratic blow-up, Boolector performs abstraction-refinement which asserts that the array theory axiom holds during the refinement phase. Unlike STP2, which asserts just $\mathcal{FCC}$ instances during refinement, Boolector also asserts instances of the array theory axiom (Equation 5.1) and the extensionality axiom (Equation 5.2). Boolector begins by replacing select terms with fresh variables, then an abstraction-refinement loop checks the $\mathcal{FCC}$, the array axiom, and the extensionality axiom.

Brummayer and Biere [BB09, Section 11.6] describe how $\mathcal{FCC}$ instances are encoded; STP2 uses the same CNF encoding, which we presented as Algorithm 5.2.

Boolector asserts the CNF corresponding to an index expression when it is first required in an axiom instance. For example, if the select expression $\operatorname{select}(a, t)$ is replaced by a fresh variable $v$, and $t$ appears nowhere else, then $t$ is initially omitted from the CNF. When $t$ is required for an axiom instance, only then is it encoded to CNF. If $t$ is complex, and is not required for an axiom instance, there will be a considerable saving. To simplify our $\mathcal{A}bsref$ algorithm, STP2 instead encodes all index expressions to CNF and asserts the resulting clauses to the SAT solver before beginning refinement.

## Example 5.22

Consider the formula

$$
\operatorname{select}\left(\operatorname{store}\left(\operatorname{store}\left(a^{[n: m]}, t_{0}, t_{1}\right), t_{2}, t_{3}\right), t_{4}\right)=\operatorname{select}\left(\operatorname{store}\left(a^{[n: m]}, t_{5}, t_{6}\right), t_{7}\right)
$$

Initially each select is replaced with a fresh variable, giving: $v_{0}=v_{1}$. In the following we use $\mu$ to give the integer value from the SAT Solver's model. If the model returned by the SAT solver is: $\mu\left(t_{4}\right)=6, \mu\left(t_{2}\right)=6, \mu\left(t_{3}\right)=1$ and $\mu\left(v_{0}\right)=2$, then Equation 5.1 is not satisfied, so Boolector asserts that: $\left(t_{4}=t_{2}\right) \Longrightarrow\left(t_{3}=v_{0}\right)$.

If another model is returned where none of the store indices equals a select index, that is, $\left(\left(\mu\left(t_{4}\right) \neq \mu\left(t_{2}\right)\right) \wedge\left(\mu\left(t_{4}\right) \neq \mu\left(t_{0}\right)\right) \wedge\left(\mu\left(t_{7}\right) \neq \mu\left(t_{5}\right)\right)\right)$, the two select indices are equal, i.e. $\left(\mu\left(t_{4}\right)=\mu\left(t_{7}\right)\right)$, and the result of the selects differs, i.e. $\left(\mu\left(v_{0}\right) \neq \mu\left(v_{1}\right)\right)$, then Boolector asserts: $\left(\left(t_{4} \neq t_{2}\right) \wedge\left(t_{4} \neq t_{0}\right) \wedge\left(t_{7} \neq t_{5}\right) \wedge\left(t_{4}=t_{7}\right)\right) \Longrightarrow\left(v_{0}=v_{1}\right)$.
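The first kind of lemma in this example, where a read hits a store, can be sketched as a walk down the store chain under the current model. This is our own simplified illustration, with the model as a dictionary and lemmas as strings; it is not Boolector's implementation, and it does not cover the case where the read falls through to the base array.

```python
def read_lemma(mu, fresh, read_index, stores):
    """Return a lemma violated by the model mu, or None.

    fresh:      the variable that replaced the select
    read_index: the index term of the select
    stores:     (index, value) term pairs, outermost store first
    """
    premises = []
    for index, value in stores:
        if mu[read_index] == mu[index]:
            # The read hits this store, so the select must equal its value.
            premises.append(f"({read_index} = {index})")
            if mu[fresh] == mu[value]:
                return None
            return " and ".join(premises) + f" => ({value} = {fresh})"
        premises.append(f"({read_index} != {index})")
    return None  # fall-through to the base array is not handled here


# The first model of Example 5.22; t0 and t1 are never consulted.
model = {"t4": 6, "t2": 6, "t3": 1, "v0": 2}
lemma = read_lemma(model, "v0", "t4", [("t2", "t3"), ("t0", "t1")])
```

For the model given in the example this yields `(t4 = t2) => (t3 = v0)`, the implication stated above.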

### 5.7.3 BAT

Manolios et al. [MSV06] describe the Bit-level Analysis Tool (BAT), an eager bit-vector and extensional array solver. Their insight is that it is practical to enforce array extensionality if the bit-width of indices is reduced, so that arrays can be compared at every possible array index. BAT calculates how many indices the arrays need to have to preserve satisfiability.

First, they apply the rewrite rules described in section 5.2. Because their problems may contain extensionality, this gives fewer rather than no stores and array-ITEs. Then, they count the number of array accesses for each array and for each array that it is transitively related to. Next, the count is increased to allow arrays to differ at some positions. The indices' bit-width is narrowed so that the number of representable indices is at least the count calculated. A quadratic number of constraints are asserted, to enforce that the same index maps to the same reduced index.

## Example 5.23

If there are only two selects from an array $a$: $\operatorname{select}\left(a, t_{0}^{[32]}\right)$ and $\operatorname{select}\left(a, t_{1}^{[32]}\right)$, then the indices are narrowed to 1 bit. Two fresh 1-bit variables are created, $v_{0}^{[1]}$ and $v_{1}^{[1]}$, and these constraints are asserted: $\left(t_{0}=t_{1}\right) \Longrightarrow\left(v_{0}=v_{1}\right)$ and $\left(v_{0}=v_{1}\right) \Longrightarrow\left(t_{0}=t_{1}\right)$.

## Example 5.24

Assume an expression contains only the following array expressions:

$$
\operatorname{select}\left(a_{0}^{[6: 6]}, t_{0}\right), \operatorname{select}\left(a_{0}^{[6: 6]}, t_{1}\right), \operatorname{select}\left(a_{1}^{[6: 6]}, t_{2}\right), \operatorname{select}\left(a_{1}^{[6: 6]}, t_{3}\right), a_{0}=a_{1}
$$

Then it is always possible to have $\left(a_{0} \neq a_{1}\right)$, because there are $2^{6}$ possible indices for each array, but constraints on only two of those indices per array.

BAT will calculate that this is equisatisfiable to the case that uses an index bit-width of 2 bits. Because the result is 6 bits wide, the equality $\left(a_{0}=a_{1}\right)$ being 1 implies that all 24 bits (4 locations of 6 bits each) of each array are identical. This is less expensive than asserting that all $2^{6}$ locations are identical.

Reducing the bit-width of indices is useful because BAT compares all the values stored in arrays to determine whether two arrays are equal. Reducing the number of possible indices makes this comparison practical.

### 5.7.4 Other Solvers

Biere and Brummayer [BB08a] describe a lazy solver for all-different constraints over bit-vectors. An all-different constraint enforces that the bit-vectors in a set are all assigned different values. We, like them, have a one-watched literal scheme. In the conclusion of their paper they propose $\mathcal{D C I}$.

Integrating clause propagation with specialised reasoning as we do for $\mathcal{DCI}$ has been done previously in other contexts. Chu Min Li [Li00] describes EqSatz, a SAT solver that handles equalities specially. Equalities (bi-implications) between literals are discovered in the CNF, and propagated by inference rules. One of the inference rules given is: $\left(l_{1} \Leftrightarrow l_{2} \Leftrightarrow l_{3}\right) \wedge\left(l_{1} \Leftrightarrow l_{2} \Leftrightarrow l_{4}\right)$ implies $l_{3} \Leftrightarrow l_{4}$. This equivalence reasoning is performed after unit propagation and before search, as we do. Likewise, the SAT solver Cryptominisat [SNC09] extracts exclusive-ors from its CNF input, and applies Gaussian elimination to remove variables.

We were introduced to the idea of generating CNF clauses inside the SAT solver by the "Lazy Clause Generation" approach of Ohrimenko et al. [OSC09].

Ganesh et al. [GOSL$^{+}$12] describe a SAT solver programmatic interface which allows the trail to be iterated over, and for clauses to be introduced during search. They call the approach of instantiating clauses as needed "online abstraction-refinement". If such interfaces became more expressive, and common across different SAT solvers, it would reduce the coupling between implementations of $\mathcal{DCI}$ and a particular SAT solver.

Bruttomesso et al. [BCF$^{+}$06] decide whether to include all the $\mathcal{FCC}$ instances upfront or to instead use interface variables to communicate between theory solvers. They determine which approach is better based on the number of extra equalities introduced.

Nelson and Oppen [NO80] give a congruence closure algorithm for solving problems in the theory of uninterpreted functions with equality. Their congruence closure algorithm is good at reasoning about the effect of nested function calls, for instance, that $f(f(f(a)))=a \wedge f(f(f(f(f(a)))))=a$ implies $f(a)=a$. Nieuwenhuis and Oliveras [NO05] give a congruence closure algorithm that can quickly explain which equalities imply another equality. We expect the nesting of selects to be too shallow in software verification problems to justify their approaches. That is, it is rare to have deeply nested selects like $\operatorname{select}\left(\operatorname{select}\left(\operatorname{select}\left(a, t_{0}\right), t_{1}\right), t_{2}\right)$.
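The nested-call example can be reproduced with a naive congruence closure over the terms $a, f(a), \ldots, f^{5}(a)$. The sketch below is a quadratic fixed-point loop over a union-find structure, written only to illustrate the inference; it is not the algorithm of Nelson and Oppen, and the encoding of $f^{n}(a)$ as the integer $n$ is our assumption.

```python
def congruence_closure(depth, equalities):
    """Terms are 0..depth, where n stands for f applied n times to a.

    equalities: pairs (m, n) asserting f^m(a) = f^n(a).
    Returns a function mapping each term to its class representative.
    """
    parent = list(range(depth + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for m, n in equalities:
        union(m, n)

    changed = True
    while changed:  # congruence rule: x = y implies f(x) = f(y)
        changed = False
        for m in range(depth):
            for n in range(m + 1, depth):
                if find(m) == find(n) and find(m + 1) != find(n + 1):
                    union(m + 1, n + 1)
                    changed = True
    return find


# f(f(f(a))) = a and f(f(f(f(f(a))))) = a
find = congruence_closure(5, [(3, 0), (5, 0)])
```

After closure, `find(1) == find(0)`, that is, $f(a)=a$. With only the first equality the classes stay apart, which shows that both premises are needed.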

Stump et al. [SBDL01] give a refutation procedure for the extensional theory of arrays based on congruence closure which does not use a SAT solver. Brummayer and Biere [BB08b] compare Stump et al.'s algorithm against Boolector's abstraction-refinement approach, finding their abstraction-refinement implementation to be hundreds of times faster. It would have been interesting to explore whether SAT based approaches like $\mathcal{A}ck$ perform much worse than $\mathcal{A}bsref$ approaches when solving extensional array problems. We leave this for later work.

### 5.8 Conclusion

We focused on approaches to enforcing the $\mathcal{FCC}$, in effect solving problems in the theory of bit-vectors and uninterpreted functions. We found the SMT-LIB benchmarks we selected to be insensitive to the approach chosen. We believe this is because few $\mathcal{FCC}$ instances are required. No SMT-LIB instance needed more than 10,000 $\mathcal{FCC}$ instances. So, even the result of $\mathcal{A}ck$ was a reasonably sized CNF. Reducing, via $\mathcal{A}ck$, to the theory of bit-vectors has the advantages of being able to use bit-vector theory simplifications, AIG simplifications, and CNF simplifications, and of being able to easily use, by some measures, the fastest available SAT solver (currently Glucose 2.0). Using Glucose gave about 10\% fewer failures than using Minisat 2.2.

Solvers implement abstraction-refinement of selects to avoid necessarily asserting quadratically many $\mathcal{F} C C$ instances. Solvers implement abstraction-refinement of stores, and array-ITEs to reduce the chance of a quadratic blowup in the number of expressions. On the SMT-LIB problems we examined, both approaches are unjustified. However, we demonstrated crafted problems showing the consequences of both blow-ups.

On a crafted benchmark (subsection 5.6.3), more $\mathcal{FCC}$ instances are required, so the differences between $\mathcal{FCC}$ instantiation approaches were apparent. In particular, STP2's $\mathcal{A}bsref$ implementation did not perform well when a high proportion of the $\mathcal{FCC}$ instances were required. Because most of the $\mathcal{FCC}$ instances are necessary, it is about 20 times faster just to assert all the $\mathcal{FCC}$ instances upfront, rather than to perform refinement phases that gradually approach the effect of $\mathcal{A}ck$. Boolector performed 250 times more refinement iterations than STP2's $\mathcal{A}bsref$ implementation. STP2 inherits from STP 0.1 an upper bound on the number of refinement iterations that are performed: $\mathcal{A}bsref$ does not perform more refinement iterations than there are distinct select expressions (Algorithm 5.4). Significant amounts of time can be saved by placing an upper limit on the number of refinement iterations that are performed. This demonstrates that asserting few clauses per iteration does not guarantee good performance.

We demonstrated an additional advantage of $\mathcal{A}ck$: the ability to easily change the SAT solver. STP with $\mathcal{A}ck_{\text{cnf}}$ and Glucose solved the most SMT-LIB problems. This is a reasonable comparison as we started work on our $\mathcal{DCI}$ implementation before Glucose won the competition. It shows an advantage of being able to easily follow improved SAT solver performance. Our $\mathcal{DCI}$ implementation is tied closely with the SAT solver. At present there is no commonly used "low level" programmatic interface to SAT solvers. Such an interface would make it easier to move between SAT solvers. The $\mathcal{A}ck$ approaches with Glucose 2.0 solved the most problems.

We showed crafted examples where $\mathcal{DCI}$ significantly outperformed $\mathcal{A}ck$ and $\mathcal{A}bsref$. The larger the number of redundant $\mathcal{FCC}$ instances, the better the relative performance of $\mathcal{DCI}$. On the SMT-LIB2 problems, the $\mathcal{A}bsref$ approach solved one more problem than $\mathcal{DCI}$ did. In general, we believe the $\mathcal{DCI}$ approach is superior to $\mathcal{A}bsref$, but at present lack the real-world problems to make a convincing case.

We showed that STP2 is a significant advance on STP 0.1, being about 7 times faster (subsection 5.6.5). Although the other solvers that we compared against allow the theory of extensional arrays, it is a quick syntactic check to identify whether a problem has extensionality. The approaches that we have described could be implemented as a special case for problems without extensionality by those solvers.

STP2's default strategy is to perform $\mathcal{A}ck_{\text{ite}}$ upfront for problems with a small number of array expressions. Currently the limit is ten. This conversion was disabled in all of the experiments in this chapter. This conversion occurs before the bulk of the bit-vector simplifications have occurred, converting the array part of the problem into a bit-vector problem. If array expressions remain, then $\mathcal{DCI}$ is performed. STP2's current strategy is not ideal for the SMT-LIB problems, but works well for the software verification problems that STP2 is commonly used to solve.

## 6 Symbolic Execution for Automated Test Generation

Automated software verification and testing increases the confidence that programs behave as intended. Research into the mechanisation of the task began more than 30 years ago, but waned as the difficulty of the task became clearer; in particular scalability proved to be elusive. But recent advances in constraint solving technology have rekindled optimism. At the same time, the amount of software that needs to be trusted, in particular binary code, is growing.

In this chapter we describe a tool which we call, for no particular reason, MinkeyRink. MinkeyRink performs automated test generation for binary programs via structural fuzzing of unmodified Linux x86 binaries. It was our experience in building MinkeyRink that motivated our work on the STP2 bit-vector and array solver; we described that work in the first part of this dissertation. As we shall show, MinkeyRink depends greatly on efficient and correct bit-vector and array solving.

MinkeyRink analyses machine code programs, and generates problems which bit-vector and array solvers, like STP2, are ideally suited to solving. In the evaluation (section 6.7) we show that the majority of the time taken by our automatic test generator is spent performing bit-vector and array solving.

```
int32_t x; // A 32-bit variable.
input(x); // Read a value into x.
int32_t y = 4 * x;
if (x != 3 && y == 12)
    print("fail");
else
    print("OK");
```

Figure 6.1: A C-language program that sometimes fails. There are two paths through the program. One path is taken for 3 possible assignments to $x$, the other path is taken otherwise.

### 6.1 Background

Fuzzing is commonly used to discover inputs that cause a program to fail. Fuzzers generate random instances of a language for use as inputs to a program. An oracle then checks if the input causes the program to fail. If it does, then the input is reported to a programmer for investigation.

Although useful, fuzzers are limited because they do not consider the internal structure of a program. If only a small proportion of inputs can reach a part of the program, a fuzzer is unlikely to randomly generate one of those inputs. For instance, in Figure 6.1, just three of the $2^{32}$ possible values for $x$ will cause the program to print "fail". Structural fuzzers overcome this limitation by analysing the program, and using its structure to generate inputs. Variations of the same approach go by many names: smart-fuzzers, glass-box fuzzers, white-box fuzzers, concolic testing, directed automated random testing, and dynamic symbolic execution.

We, like others (section 6.10), use symbolic execution (SE) as the basis for a structural fuzzer. Symbolic execution builds formulae that precisely describe the output state of a program in terms of its inputs. The terminology of symbolic execution (sometimes called symbolic simulation) varies, so let us fix our use of terms. SE makes use of a concept of the state of a computation, where a variable's value has been replaced by an expression that denotes a function of the program's input. We denote the instruction pointer register (which holds the address of the next instruction to execute) by IP. A path is a sequence of instructions. For a given input, a path can be constructed by running the program and listing the successive IP values. (We assume, without loss of generality, that a program has a single entry point, $IP_{\text{initial}}$, and a single exit point, $IP_{\text{final}}$.) But even simple programs can have trillions of paths, so a path-by-path analysis is impractical. When instructions are in loops, or called multiple times in procedures, those instructions will occur multiple times in the path. By a trace we mean a path and the input and output of the system calls that occur along that computation path. A (symbolic) state is a map from locations to expressions, together with the IP, where a location may be a register or a memory address. The expressions that are mapped to are QF_ABV expressions which may contain symbolic variables.

Symbolic execution may fork at some conditional branch instructions. Additionally, a state is equipped with a summary of control-flow history: a path constraint (PC) keeps track of the class of inputs that would have caused the same flow of control. A state is feasible if it can occur along some path. It is natural to associate a path constraint with a state, the PC being a constraint over the program input variables. The PC describes the inputs that would take the same path through the program, that is, the inputs for which the associated state is valid. States can be seen to form a state tree, with branching occurring whenever a conditional branch instruction depends on symbolic values. In the state tree, a parent's path constraint is equivalent to the disjunction of its children's path constraints.

For instance, consider again the program (Figure 6.1) which fails if the input is not equal to 3 and if the input multiplied by 4 is 12. When this program is symbolically executed, a special symbolic variable (as distinct from the program's variables) is associated with the values returned from the input procedure. These symbolic variables, and expressions that contain them, are contained in the symbolic state. The symbolic variable, which we call $s$, is assigned to $x$ in the symbolic state; later, when $x$ is multiplied by 4, the symbolic state of $y$ is updated to contain the expression $4 * s$. With symbolic execution the analysis is simplified to reasoning about just two possible paths, a tremendous speed-up.

Symbolic execution attempts to overcome the state space explosion. By maintaining a state that describes many possible inputs, it is able to tractably reason about many inputs simultaneously. In the example we just saw, billions of possible inputs were split across just two states.
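A minimal sketch of the symbolic state for Figure 6.1 is given below. It assumes expressions are nested tuples over a single symbolic variable `"s"` and evaluates them with 32-bit wrap-around. The representation and the names `evaluate`, `state` and `path` are ours for illustration and are not MinkeyRink's.

```python
MASK = 2**32 - 1  # arithmetic wraps around at 32 bits


def evaluate(expr, s):
    """Evaluate a nested-tuple expression for a concrete input s."""
    if isinstance(expr, str):  # the symbolic variable "s"
        return s & MASK
    if isinstance(expr, int):
        return expr & MASK
    op, left, right = expr
    a, b = evaluate(left, s), evaluate(right, s)
    if op == "*":
        return (a * b) & MASK
    if op == "!=":
        return int(a != b)
    if op == "==":
        return int(a == b)
    if op == "and":
        return int(bool(a) and bool(b))
    raise ValueError(f"unknown operator: {op}")


# Symbolic state after the two assignments of Figure 6.1.
state = {"x": "s", "y": ("*", 4, "s")}

# The branch condition, expressed over the symbolic state.
condition = ("and", ("!=", state["x"], 3), ("==", state["y"], 12))


def path(s):
    """Which of the two paths a concrete input follows."""
    return "fail" if evaluate(condition, s) else "OK"
```

The single expression `condition` separates every 32-bit input into one of the two paths; for example `path(3)` is `"OK"` while `path(2**30 + 3)` is `"fail"`, because the multiplication overflows.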

When the program of Figure 6.1 is run on some concrete input, at any control transfer instruction, such as the (x != 3 && y == 12) test, the path constraint is updated to track which branch was taken. In this example, assuming the condition is 0, the path constraint is updated, from being empty (that is, 1), to $\neg(s \neq 3 \wedge(4 \times s)=12)$.

```
int32_t x;
input(x);
print(12 / x);
```

Figure 6.2: A C-language program that may divide by zero.

If the path constraint is negated, then the constraint solver can calculate whether there exist inputs which would cause the program to take a different path. For example, given the negated path constraint $(s \neq 3 \wedge(4 \times s)=12)$ and a state of $(y=4 * s, x=s)$, the solver may give $\left(y=12, x=\left(2^{30}+3\right)\right)$. In our simple example there are just two paths through the source code of the program. One path is taken on three inputs ($x=2^{31}+2^{30}+3$, $x=2^{31}+3$, and $x=2^{30}+3$), and the other path on all other inputs. A fuzzer that did not consider the structure would run for a long time before discovering an input to cause a failure. A structural fuzzer, however, can stop after exploring two paths; it has explored all possible paths.
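The three failing inputs can be derived directly, assuming the multiplication wraps modulo $2^{32}$ as it does in the bit-vector semantics, so that multiplying by 4 discards the top two bits of $s$:

$$
\begin{aligned}
4 \times s \equiv 12 \pmod{2^{32}} & \iff 4 \times(s-3) \equiv 0 \pmod{2^{32}} \\
& \iff s \equiv 3 \pmod{2^{30}} \\
& \iff s \in\left\{3,\; 2^{30}+3,\; 2^{31}+3,\; 2^{31}+2^{30}+3\right\}
\end{aligned}
$$

The conjunct $s \neq 3$ removes the first of these four values, leaving exactly the three inputs listed above.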

If the multiplication in our example did not overflow, then taking the true branch of the conditional would be impossible. This shows that, in reasoning about such programs, considering overflow, as bit-vector solvers do, is important for ensuring correctness.
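The wraparound can be checked directly. The following is a minimal sketch, not the code of Figure 6.1: it assumes the program computes `y = 4 * x` on 32-bit values, and it uses `uint32_t` so that the wraparound is well defined in C. The function name `takes_failing_branch` is ours.

```c
#include <stdbool.h>
#include <stdint.h>

/* Branch condition of the running example, evaluated with 32-bit
   wraparound arithmetic: true iff x != 3 and 4 * x == 12 (mod 2^32).
   Assumption: unsigned arithmetic stands in for the compiled program. */
bool takes_failing_branch(uint32_t x)
{
    uint32_t y = 4u * x;   /* unsigned multiplication wraps modulo 2^32 */
    return x != 3u && y == 12u;
}
```

Calling it with `0x40000003u` ($2^{30}+3$), `0x80000003u` ($2^{31}+3$), or `0xC0000003u` ($2^{31}+2^{30}+3$) returns true; without wraparound no input could satisfy the condition.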

The C++ language [ISO12] does not define the semantics of signed overflow, so the semantics of the example program is undefined. That is, a C++ compiler is free to return any value for $4 \times 2^{30}$. So analysing the source code alone is not enough to define the semantics of the compiled program. Without knowing how the compiler translates this program to machine code, it is difficult to say with certainty what this program does. Analysing the binary, as we do, gives a more complete picture of what the program does, because less is hidden.

Structural fuzzers generate and run fewer inputs per second than traditional fuzzers. This is due to the considerable overhead involved in analysing the program and generating new inputs.

The oracle needs to be more sophisticated if the paths through the program are not apparent and depend on the data. A program with a division by zero, such as Figure 6.2, is an example. In this example, there is only one path through the program, but the program will behave differently depending on whether or not $x=0$. Because the division by zero does not execute a separate path from the other inputs, an oracle that tests whether a program may divide by zero needs to check the symbolic state at every division to see if the denominator could be zero. In this dissertation we do not consider the very real challenges of building oracles.

```
uint32_t x; // Unsigned 32-bit integer
input(x);
int count = 0;
for (int i = 0; i < 32; i++)
    if ((x >> i) & 1)
        count = count + 1;
print(count);
```

Figure 6.3: Pop count, with the path explosion.

Unfortunately, the path explosion problem afflicts symbolic execution. For instance, Figure 6.3 counts the number of 1-bits in $x$. This program has as many paths as inputs, since each input causes a different path to be taken. If we use normal symbolic execution on this program, then when we reach the print() procedure, each symbolic state will describe only one input. The advantage of symbolic execution has then been lost: we are reasoning about each of the inputs separately. This is the path explosion problem, which we address in this chapter.

We investigate a state joining approach to making symbolic execution more practical, and in this chapter we describe the challenges of applying state joining to the analysis of unmodified Linux x86 executables. The results so far are mixed, with good results for some code. We describe an algorithm which, in effect, analyses an equivalent program with only a single path. For example, the state joining algorithm will effectively transform the code in Figure 6.3 into that of Figure 6.4, allowing the code to be quickly analysed.

Although relevant to other uses of symbolic execution, we are interested in state joining in the context of the verification of binary executables. Dealing with the path explosion is the key challenge in making symbolic execution useful for software verification.

```
uint32_t x; // Unsigned 32-bit integer
input(x);
int count = 0;
for (int i = 0; i < 32; i++)
    count += ((x >> i) & 1);
print(count);
```

Figure 6.4: Pop count, without the path explosion.

By combining the results of symbolic execution along all the program's paths, the behaviour of the program is described. Although the number of paths through a deterministic program is no greater than the number of its input states, real programs still have intractably many possible paths. In traditional symbolic execution, states are split at control flow instructions, and not subsequently joined, even where the program's control flow merges. State joining addresses the path explosion problem by coalescing states with the same instruction pointer. In the presence of state joining, the state tree becomes a (directed, acyclic) state graph.

MinkeyRink dynamically disassembles a given executable. A natural approach is to use symbolic execution on semi-random input to explore the space of possible run-time states. The instructions executed on each branch are combined together to produce, over time, a disassembled version of the binary.

We use dynamic disassembly to incrementally build a partial Control Flow Graph (CFG). It is partial, because other blocks or transitions might have occurred if the results from system calls were different, or a different length of input was used. We use the partial CFG to determine when states can be joined.

Section 6.2 explains why we are interested in analysing binary code, rather than source code programs. Section 6.3 gives a more detailed example to clarify our approach. The example shows the advantages of state joining, disregarding the practical difficulties (these are discussed in Section 6.9). Section 6.4 contains a description of the overall algorithm. Without applying simplifications, the symbolic formulae quickly grow unmanageable. Section 6.5 describes how Boolean minimisation can be used to reduce a formula's size. Section 6.6 describes the specifics of MinkeyRink. In Section 6.7 and Section 6.8 we compare runs with and without state joining. Mostly state joining helps, but it does produce more cumbersome constraint expressions, which can slow down the analysis. We discuss additional obstacles in Section 6.9 and compare our approach to others that tackle symbolic execution's path explosion problem (Section 6.10).

### 6.2 Why Binary Analysis?

Analysing source code, rather than binaries, is the more traditional approach to bug finding. Each approach has its advantages and disadvantages.

Analysing binaries has the advantage that all the components that are used at runtime, irrespective of their source language, have been converted into a common language, namely machine code, which in most cases has precisely defined semantics. This avoids some of the challenges of analysing source code: of analysing different languages, of obtaining the source code that is used at runtime, of determining how the compiler translated the source code (for example, which compiler options were used), and of determining which compilers were used.

Source code has the advantage that the type and signedness information is present. From binaries it is not straightforward to determine, for example, which data elements are pointers and which are integers. Defects that are reported by a binary analysis tool are harder for programmers to fix, requiring them to trace back to the source code that produced the machine code.

Source code analysis is not specific to a given binary representation. A defect that is specific to a particular architecture will not be identified by binary analysis on another architecture, for example, if a defect occurs for only one possible memory layout of a procedure's parameters. A source code analysis can consider multiple semantics for the source code, rather than just a single specific semantics. If source code is distributed and users compile the software themselves, it is good to be able to analyse several of the possible binary forms simultaneously, rather than just a particular form.

However, the source code may not be available to analyse. For example, when using closed source components, compilers, libraries, device drivers, and operating systems, a source code analysis will encounter procedure calls without the corresponding source code. Binary analysis provides the ability to analyse not just up to the interface with library functions or system calls, but right through the operating system kernel to the interaction with hardware.

Large systems are often built using multiple languages. For example, C code may contain inline assembly code, and the interacting components may be difficult to check. Large programs combine components (some closed source) from diverse organisations, and use multiple high level languages. Binaries give a common representation.

```
uint64_t multiply(uint64_t x0, uint64_t y0) {
    uint64_t x = x0, y = y0, z = 0;
    while (x != 0) {
        if (!even(x)) /* (x&1) */
            z += y;
        x = x >> 1;
        y = y << 1;
    }
    return z;
}
```

Figure 6.5: Using shift-and-add to multiply positive integers $x_{0}$ and $y_{0}$

### 6.3 State Joining: A More Detailed Example

To provide the intuition of state joining, we will describe the analysis of the Multiply example in Figure 6.5. The program uses shift-and-add to multiply two nonnegative integers $x^{[64]}$ and $y^{[64]}$, leaving the result in $z^{[64]}$. We show C-style code for clarity—all analysis is performed on unmodified executables.

Figure 6.6 shows a control-flow graph for the function, and Figure 6.7 shows the state graph that is produced during joining. For each distinct $x^{[64]}$ value, a Multiply call takes a different path through the function body; these many paths make symbolic execution slow. For 64-bit integers, there are $2^{64}$ distinct paths through the function. To see this, consider the branching on the even condition (block $B_{3}$ ), which checks the rightmost bit of the $x^{[64]}$ variable. On each iteration, the bits of $x^{[64]}$ are shifted one to the right, giving a new rightmost bit. For a given $x_{0}^{[64]}$, the resulting symbolic expression for $z^{[64]}$ is an expression valid for every $y^{[64]}$ value. For example, given $\left(x_{0}^{[64]}=2\right)$, the symbolic expression produced for $z^{[64]}$ is $\left(z^{[64]}=\left(y_{0} \ll 1\right)\right)$, with a PC that simplifies to $\left(x_{0}^{[64]}=2\right)$. The resulting expression for $z^{[64]}$ describes the output state in terms of the input state. Substituting the input values into the formula for $z^{[64]}$ gives exactly the same result as if the function were run with the same inputs.

Symbolic execution addresses the state explosion problem by reasoning about all the states that take a particular path. In this case there are $2^{64}$ paths to symbolically execute, versus $2^{64} \times 2^{64}$ initial states. We now face the path explosion problem: this function still contains $2^{64}$ paths, too many to reason about one-by-one. With state joining, states with the same IP are joined. This makes good sense for the example, as two paths that have previously differed in whether one bit was one or zero could have identical subsequent paths.

Figure 6.6: CFG for multiplication code

Each time a control transfer instruction is reached, a constraint solver call is made to check which branches can be taken. For example, on reaching block $B_{2}$, in Figure 6.6, we call the solver to check which of the branches to points 2 or 5 can be taken. In this example both branches can be taken, so the state is split, with one branch conjoining the PC with $x=0$, and the other using $x \neq 0$. We may consider the state of the function to have five elements: $x^{[64]}, y^{[64]}, z^{[64]}$, the block that will next be executed (the IP), and the PC. Figure 6.7 shows how the states are split and joined. The topmost state in the figure corresponds to having just entered the function, with the IP at program point 1. The second row results from handling the $x^{[64]}=0$ condition. One state corresponds to $x^{[64]}=0$, the other to entering the loop. There are two joins, shown with bold borders. If the states being joined have the same expression, then the value is unchanged, but if they are different, an ITE constructor (for if-then-else) joins the different expressions. The result of a join holds all the information of the component states. No information is discarded.

Figure 6.7: The state graph for the example of Figure 6.6

Eventually the value of $x^{[64]}$ at the end of $B_{5}$ is $\left(x_{0}^{[64]} \gg_{l} 64\right)$, so that $x \neq 0$ is unsatisfiable. Then only the branch to point 5 will be taken. When this happens, the final states will be joined together at point 5, producing the complete expression for $z^{[64]}$ in terms of its inputs. The loops are unrolled automatically: no domain knowledge needs to be entered about the number of times to unroll loops. Note that this analysis is "anytime": at any point in time, the analysis can be paused, and information can be read out that correctly describes the program's runtime behaviour.
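The effect of joining on Multiply can be pictured as replacing the data-dependent loop by a single-path computation, in the same way that Figure 6.4 relates to Figure 6.3. The sketch below is our own illustration and not output of the tool: `multiply_joined` is a hypothetical rendering in which each of the 64 unrolled iterations contributes $\operatorname{ITE}(\text{bit } i \text{ of } x_0,\; y_0 \ll i,\; 0)$. The exact shape of the expression that the analysis builds depends on the simplifications applied.

```c
#include <stdint.h>

/* The loop of Figure 6.5 (with x & 1 in place of !even(x)):
   a different path for each distinct x0. */
uint64_t multiply_loop(uint64_t x0, uint64_t y0)
{
    uint64_t x = x0, y = y0, z = 0;
    while (x != 0) {
        if (x & 1)
            z += y;
        x >>= 1;
        y <<= 1;
    }
    return z;
}

/* Single-path form (hypothetical): every iteration is executed, and the
   branch is replaced by a guarded value, as an ITE would do. */
uint64_t multiply_joined(uint64_t x0, uint64_t y0)
{
    uint64_t z = 0;
    for (int i = 0; i < 64; i++) {
        uint64_t bit = (x0 >> i) & 1;   /* guard of the ITE */
        z += (0 - bit) & (y0 << i);     /* all-ones mask when bit is 1 */
    }
    return z;
}
```

Both functions agree on every input, including those where the 64-bit result wraps, because the iterations that the loop skips after $x$ becomes 0 contribute 0 in the single-path form.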

```
Algorithm 6.1 Applying state joining to a program
Require: A CFG and an initial symbolic state \(s_{0}\)
    Fringe \(\leftarrow\left\{s_{0}\right\}\)
    while some \(s \in\) Fringe has \(i p(s) \neq I P_{\text {final }}\) do
        for all \(\left\{s_{1}, s_{2}\right\} \subseteq\) Fringe such that \(i p\left(s_{1}\right)=i p\left(s_{2}\right)\) do
            Fringe \(\leftarrow\left\{s_{1} \sqcup s_{2}\right\} \cup\left(\right.\) Fringe \(\left.\backslash\left\{s_{1}, s_{2}\right\}\right)\)
        end for
        \(s \leftarrow\) choose(Fringe)
        Fringe \(\leftarrow(\) Fringe \(\backslash\{s\}) \cup \operatorname{execute}(s)\)
    end while
    return \(s\) where \(\{s\}=\) Fringe
```


### 6.4 The Algorithm

The algorithm (Algorithm 6.1) takes a program to analyse, in the form of a CFG, and the initial state comprising the PC $1$, the IP $IP_{\text{initial}}$, and symbolic variables for all inputs. The symbolic execution will then produce results covering all possible values of these symbolic variables. Note that, because of the state joining at Line 4, on reaching Line 6 at most one element of Fringe will correspond to $IP_{\text{final}}$. Therefore, at the termination of the algorithm, Fringe will be a singleton set whose element corresponds to $IP_{\text{final}}$. The algorithm presented is somewhat simplified, in that it does not show the decompilation of the program needed because we are analysing binaries. In practice, we interleave decompilation with the symbolic execution.

Algorithm 6.1 relies on three key functions not defined here: execute, choose, and $\sqcup$ (join on states). The execute operation extends the supplied state by executing the instruction at its IP. Note that this will produce more than one state for selection (conditional or computed branch) instructions. The next sections describe the choose and $\sqcup$ functions.
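The fringe loop of Algorithm 6.1 can be sketched on a toy model. Everything in the following block is an assumption made for illustration: the CFG is acyclic with nodes numbered in topological order and every path reaching the exit, a state is reduced to its IP plus a count of the concrete paths it summarises (a real state carries a PC and symbolic locations), joining adds the counts (standing in for $PC_1 \vee PC_2$), and `choose` simply picks the state earliest in topological order.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES  64
#define MAX_FRINGE 64

/* Toy CFG: node n-1 is the exit; succ[i][k] < 0 means "no such edge". */
typedef struct { int n; int succ[MAX_NODES][2]; } Cfg;

/* Stand-in for a symbolic state. */
typedef struct { int ip; uint64_t paths; } State;

typedef struct { State s[MAX_FRINGE]; size_t len; size_t peak; } Fringe;

/* Add a state, joining it with any state already at the same IP
   (the "for all" loop of Algorithm 6.1). */
static void add_joined(Fringe *f, State st)
{
    for (size_t i = 0; i < f->len; i++) {
        if (f->s[i].ip == st.ip) {
            f->s[i].paths += st.paths;
            return;
        }
    }
    f->s[f->len++] = st;
    if (f->len > f->peak)
        f->peak = f->len;
}

/* Run the fringe loop; returns the single state left at the exit and
   reports the largest fringe size seen. */
State run_joining(const Cfg *g, size_t *peak_width)
{
    Fringe f = { .len = 0, .peak = 0 };
    const int ip_final = g->n - 1;
    add_joined(&f, (State){ 0, 1 });

    for (;;) {
        /* choose: earliest state not yet at IP_final (assumed policy). */
        size_t pick = f.len;
        for (size_t i = 0; i < f.len; i++)
            if (f.s[i].ip != ip_final &&
                (pick == f.len || f.s[i].ip < f.s[pick].ip))
                pick = i;
        if (pick == f.len)
            break;                        /* all states are at IP_final */

        State s = f.s[pick];
        f.s[pick] = f.s[--f.len];         /* Fringe \ {s} */
        for (int k = 0; k < 2; k++)       /* execute(s): one state per edge */
            if (g->succ[s.ip][k] >= 0)
                add_joined(&f, (State){ g->succ[s.ip][k], s.paths });
    }
    if (peak_width)
        *peak_width = f.peak;
    return f.s[0];
}
```

On a chain of ten if-then-else diamonds this model ends with one state that summarises 1024 paths while the fringe never holds more than two states, which is the behaviour that joining is intended to give.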

### 6.4.1 Preparing to Join

We want to propagate states, and then stop them where they can be joined. Figure 6.8 shows two fragments of control-flow graphs. On the left, state $m$ 's instruction pointer is two blocks away from a join point, while state $n$ 's instruction pointer is one block away from the same point. We wish to execute $n$ for one block, and $m$ for two blocks. The two states could then be joined. If either state is run too far, past the same-IP point, that chance to join states is lost.


Figure 6.8: CFG fragments. State $m^{\prime}$ s IP is two blocks from a join point. State $n^{\prime}$ s IP is one block away.

For each state, we find all of the descendants of that state in the partial CFG. Because every path finishes at the same exit block, the paths from the states will intersect. For each state we find its earliest descendant that is common to another state; we call these descendants join points. We then find, for each state, the minimum number of edges that can be traversed before a join point is reached. For each state we now have the minimum distance to its next join point.

Next, we remove from consideration as the next state to run, any state that post-dominates another state. A node $o$ post-dominates another node $p$ if all paths from $p$ to the exit node must pass through $o$. Since the post-dominated node $p$ should pass through the dominator $o$, we do not wish to execute the post-dominating state.

The right of Figure 6.8 shows an example where the post-dominance check applies. Each node is zero distance from its earliest join point. That is, if each node is not advanced, another state may be joined with it. In the figure if state $o$ were advanced, it would be moved away from its join with states $q$ and $p$.

We then choose a state to run for as many blocks as it is from its nearest join point. This is not perfect; for instance, a control transfer instruction might visit a new block, causing us to transition through and miss a join point. We build the control flow graph from all the control flow transfers that have occurred so far in the simulator, as well as the static jumps, that is, jumps to a fixed location, where that location has already been disassembled. Runtime calculated jumps (such as returns from functions) that we have not yet seen are omitted. When control transfers to a new location, we perform a dynamic disassembly and incorporate the resulting blocks into the control flow graph. Note that, if the nearest join point is missed, this does not compromise correctness.

### 6.4.2 Joining and Splitting

States are joined, and sometimes split. A join occurs when two states will execute the same instruction next. A split occurs when the location of the next address to execute depends on a symbolic expression. We split in two contexts. Firstly, on reaching a conditional branch instruction whose condition $\varphi$ depends on symbolic values, the current PC is conjoined with $\varphi$, and separately with $\neg \varphi$. If both are satisfiable, the state is split and two states are created.

The second context involves states that have previously been joined. When a function call may return to multiple locations because the return address was joined from states that called the function at different locations, the state needs to be split at the return statement of the function to return the respective states back to their call sites. We create a new state for each distinct next instruction. For example, if the instruction pointer can be 2 or 4 , then we split the state, resulting in two states, one with a PC of $(P C \wedge I P=2)$ and another with a PC of $(P C \wedge I P=4)$.

To join states $s_{1}$ and $s_{2}$ where $s_{k}$ contains $PC_{k}$ and register and memory locations $loc_{k}[i]$, create a new state $s$ with $PC_{s}=PC_{1} \vee PC_{2}$, and for all $i$, $loc_{s}[i]=\operatorname{ITE}\left(PC_{1}, loc_{1}[i], loc_{2}[i]\right)$. That is, if $PC_{1}$ is 1, use the value from the first state, otherwise use the value from the second state. This is allowable because PCs are always disjoint.
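The join can be sketched over a small expression-tree type. This is an illustration under our own assumptions, not MinkeyRink's data structures: the `Expr` node kinds, the single symbolic input, the two-location state, and the use of pointer equality as a stand-in for structural hashing are all hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

typedef enum { E_CONST, E_INPUT, E_ADD, E_ODD, E_NOT, E_OR, E_ITE } Kind;

typedef struct Expr {
    Kind kind;
    uint64_t value;                /* used by E_CONST only */
    const struct Expr *a, *b, *c;  /* operands; unused ones are NULL */
} Expr;

/* Allocate a node; nodes are never freed in this sketch. */
const Expr *mk(Kind k, uint64_t v, const Expr *a, const Expr *b, const Expr *c)
{
    Expr *e = malloc(sizeof *e);
    if (!e)
        abort();
    e->kind = k; e->value = v; e->a = a; e->b = b; e->c = c;
    return e;
}

/* Evaluate an expression for one concrete value of the symbolic input. */
uint64_t eval(const Expr *e, uint64_t input)
{
    switch (e->kind) {
    case E_CONST: return e->value;
    case E_INPUT: return input;
    case E_ADD:   return eval(e->a, input) + eval(e->b, input);
    case E_ODD:   return eval(e->a, input) & 1;
    case E_NOT:   return !eval(e->a, input);
    case E_OR:    return eval(e->a, input) || eval(e->b, input);
    case E_ITE:   return eval(e->a, input) ? eval(e->b, input)
                                           : eval(e->c, input);
    }
    return 0;
}

#define NLOC 2
typedef struct { const Expr *pc; const Expr *loc[NLOC]; } State;

/* Join two states with the same IP: PC_s = PC_1 OR PC_2, and each
   location becomes ITE(PC_1, loc_1[i], loc_2[i]) unless both states
   already hold the same expression. */
State join(const State *s1, const State *s2)
{
    State s;
    s.pc = mk(E_OR, 0, s1->pc, s2->pc, NULL);
    for (int i = 0; i < NLOC; i++)
        s.loc[i] = (s1->loc[i] == s2->loc[i])
                 ? s1->loc[i]
                 : mk(E_ITE, 0, s1->pc, s1->loc[i], s2->loc[i]);
    return s;
}
```

For example, joining a state with PC "input is odd" and location 0 equal to input + 1 with a state with PC "input is not odd" and location 0 equal to input gives a location that evaluates to 6 on input 5 and to 4 on input 4, so no information from either state is lost.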

### 6.5 Simplifications and Approximations

Symbolic execution, even of small programs, can result in large symbolic expressions. This is especially so when analysing machine code. The symbolic expressions that are built are structurally hashed (section 2.6), but even so become large. Every machine instruction which is executed may create a new symbolic expression.

We also apply range and domain analysis of pointers to reduce the number of solver calls. The approximations we use over-approximate the encoding of the constraints, but those constraints describe precisely (without approximation) the behaviour of the program.

## CHAPTER 6. SYMBOLIC EXECUTION FOR AUTOMATED TEST GENERATION

### 6.5.1 Path Constraint Simplification

The Multiply example (Figure 6.5) has a simple branching structure, making its PC easy to simplify. However, programs with more complicated control flow, emanating, for example, from break statements, can benefit from PC simplification. Without simplification, the PC becomes large.

In this section we simplify the PC by abstracting its primitive constraints as propositional variables. We then apply Boolean simplifications to reduce the number of propositional variables in the PC, hopefully making the PC easier to handle for a theory solver. We use $a, b$, and $c$ to refer to propositional variables that describe individual QF_ABV constraints.

Continuing with the Multiply example, consider point 4 in Figure 6.6, where the $(x \neq 0 \wedge \operatorname{even}(x))$ state joins with the $(x \neq 0 \wedge \neg \operatorname{even}(x))$ state. Letting $a=(x \neq 0)$ and $b=\operatorname{even}(x)$, the joined PC becomes $(a \wedge b) \vee(a \wedge \neg b)$. Applying the obvious simplification reduces the joined state's PC to $(x \neq 0)$. Heuristic DNF minimisation tools that apply such rules are available; we use Espresso [Rud86].
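Written out as a derivation, the step is one application of distributivity followed by the law of the excluded middle. This is a worked form of the simplification, not a description of how Espresso proceeds internally:

$$
(a \wedge b) \vee(a \wedge \neg b) \;\equiv\; a \wedge(b \vee \neg b) \;\equiv\; a \wedge 1 \;\equiv\; a .
$$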

State splitting complicates this minimisation. Consider for example a return statement from a function that returns to one of three addresses, perhaps because calls from three different sites were joined. Let the potential new IP addresses be 4, 8 and 12, let the PC before the split be $PC_{0}$, and let $IP$ be the symbolic expression for the instruction pointer when the transfer occurs. Then after the split there will be three states: $\left(PC_{0} \wedge IP=4\right)$, $\left(PC_{0} \wedge IP=8\right)$ and $\left(PC_{0} \wedge IP=12\right)$. If all three states are later joined, the PC will become $PC_{0} \wedge(IP=4 \vee IP=8 \vee IP=12)$, which should be simplified to $PC_{0}$. A Boolean minimisation algorithm will not do so, because the disjunction over $IP$ is not obviously true once its three equalities are abstracted as independent propositional variables. We can, however, assist the minimisation algorithm by modifying the constraints. Let $a=(IP=4 \vee IP=8)$, and $b=(IP=4)$. Three equivalent PCs that can easily be minimised are: $\left(PC_{0} \wedge a \wedge b\right)$, $\left(PC_{0} \wedge a \wedge \neg b\right)$, and $\left(PC_{0} \wedge \neg a\right)$.

Removing tautologies from the PC helps simplification. For example, consider a PC that contains both $x=0$ and $\neg(x \neq 0)$. Label them $a$ and $b$. No state will have the PC $(a \wedge \neg b)$ that would allow the $b$ term to be simplified away. It is safe to remove constraints that are entailed by other conjoined constraints; such constraints add no information.


Figure 6.9: Two path constraints

State joining produces many ITE expressions, so we carefully simplify their guards. We calculate four possible guards and, as we describe, use the smallest one. Consider two states to be joined, with $PC_{1}=(\neg a \wedge \neg c)$ and $PC_{2}=((a \wedge \neg c) \vee(a \wedge b \wedge c))$. The PCs are shown on Figure 6.9's unit cube: hollow circles for $PC_{1}$ and filled circles for $PC_{2}$. One possible ITE to use for the joined locations is $loc_{\text{new}}[i]=\operatorname{ITE}\left((\neg a \wedge \neg c), loc_{1}[i], loc_{2}[i]\right)$. Another reasonable choice is to take $PC_{2}$ as the guard and swap the order of the remaining arguments. However, it may be possible to generate a smaller guard by considering that the ITE expression will only be evaluated if $PC_{1} \vee PC_{2}$ is 1. Inspection of the cube shows that the simpler guard $\neg a$ is equivalent to $\neg a \wedge \neg c$ wherever $PC_{1} \vee PC_{2}$ holds: $\neg a$ covers only vertices of $PC_{1}$ and those that we don't care about, and it covers none of $PC_{2}$'s vertices.

To minimise the guard, we mark the vertices of $P C_{1}$ as 1 , those of $P C_{2}$ as 0 , and the rest as don't care. Then we minimise using Espresso. Then we swap, marking $P C_{2}$ 's vertices $1, P C_{1}$ 's 0 , and we minimise again.

As guard we choose the expression with the smallest number of nodes in its QF_ABV representation. These are candidate guards: $P C_{1}, P C_{2}$, the restriction of $P C_{1}$ to $P C_{1} \vee P C_{2}$, and the restriction of $P C_{2}$ to $P C_{1} \vee P C_{2}$.
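Whether a candidate guard is acceptable can be checked by brute force over the eight vertices of the cube. The sketch below is ours and only illustrates the don't-care reasoning for the $PC_{1}$ and $PC_{2}$ of the example; it does not perform minimisation, which is Espresso's job, and the function names are hypothetical.

```c
#include <stdbool.h>

/* The two path constraints of the example, over propositions a, b, c. */
static bool pc1(bool a, bool b, bool c) { (void)b; return !a && !c; }
static bool pc2(bool a, bool b, bool c) { return (a && !c) || (a && b && c); }

typedef bool (*Guard)(bool a, bool b, bool c);

/* A guard g is acceptable for ITE(g, loc1, loc2) if, on every vertex
   where PC1 or PC2 holds, g is 1 exactly when PC1 holds. All other
   vertices are don't-cares. */
bool guard_is_valid(Guard g)
{
    for (int v = 0; v < 8; v++) {
        bool a = (v & 4) != 0, b = (v & 2) != 0, c = (v & 1) != 0;
        if (!pc1(a, b, c) && !pc2(a, b, c))
            continue;                    /* don't-care vertex */
        if (g(a, b, c) != pc1(a, b, c))
            return false;
    }
    return true;
}

/* Candidate guards. */
bool guard_pc1(bool a, bool b, bool c)   { return pc1(a, b, c); }
bool guard_not_a(bool a, bool b, bool c) { (void)b; (void)c; return !a; }
bool guard_a(bool a, bool b, bool c)     { (void)b; (void)c; return a; }
```

Here `guard_pc1` and the smaller `guard_not_a` both pass the check, while `guard_a` fails because it is 0 on the vertices of $PC_{1}$.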

During the construction of symbolic expressions we follow standard practice and apply rewriting rules to simplify expressions, for example turning $x+0$ into $x$.

### 6.5.2 Value Analysis for Pointers

In the atol() function used by a later example (Figure 6.10), each character is looked up in an array to determine if it is a digit. So there is a lookup: isDigit $[c]$, where isDigit is an array of 256 values indicating whether or not each $c$ is a digit. We analyse such pointer accesses using three techniques: first by analysing the domain of the expression; if that fails, by analysing the range of the expression; and if that fails, by solving for each memory address in turn.

The isDigit[c] expression will translate into a symbolic memory access such as (base $+(c \ll 1)$), that is, an access to the memory location given by the value of the character times two plus some fixed base (the location in memory of the zero index). As this expression contains only a single 8-bit variable, it can take on only 256 possible values. Because the domain is small, we build an expression using each of the possible concrete values, ignoring the PC.
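The domain analysis amounts to evaluating the address expression once for each value of its only variable. A minimal sketch, assuming the access has the form base $+(c \ll 1)$ and using a 64-bit address type of our choosing (the function name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Enumerate every concrete address that base + (c << 1) can take when c
   is an unconstrained 8-bit value. The PC is ignored, so the result may
   include addresses that the program cannot reach on the current path. */
size_t enumerate_addresses(uint64_t base, uint64_t out[256])
{
    size_t n = 0;
    for (unsigned c = 0; c < 256; c++)
        out[n++] = base + ((uint64_t)c << 1);
    return n;
}
```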

If the domain of an expression is large, we use a bit-vector interval analysis [War02] of the expression to calculate the interval of the range. We use a signed interval, where the lower bound is smaller than the upper bound. For example, a subtraction expression over two 8-bit values each with a range of $[0,1]$ has a resulting range of $[-1,1]$. We widen to the whole range when an overflow occurs. A signed range allows us to handle an expression such as $\left(2 \times t^{[8]}\right)-1$. If $t^{[8]}$ is in $[0,12]$, then the range of the expression is in the signed interval $[-1,23]$; to be correct, an unsigned interval would widen terribly to $[1,255]$. Because of the ubiquity of aligned memory accesses we use strides, allowing regular gaps in the interval; for example, a stride of 2 on the interval $[0,6]$ contains the values $\{0,2,4,6\}$. When performing the interval analysis, we likewise ignore the PC. Balakrishnan [Bal07] gives the functions to calculate the precise ranges for logical and arithmetic operations for strided intervals. Navas et al. [NSSS12] use intervals that gracefully handle overflow, maintaining interval precision even in the presence of overflow.
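The $\left(2 \times t^{[8]}\right)-1$ example can be traced with a very small strided-interval type. This is a sketch under our own assumptions, covering only multiplication and subtraction by a constant; it is not the operator set of [Bal07], and the names `Interval`, `mul_const` and `sub_const` are hypothetical.

```c
#include <stdint.h>

/* Signed strided interval for 8-bit values: { lo, lo + stride, ..., hi }.
   Bounds are held in int so that leaving the 8-bit range is detectable. */
typedef struct { int lo, hi, stride; } Interval;

/* Widen to the whole signed 8-bit range when a bound has overflowed. */
static Interval widen_on_overflow(Interval v)
{
    if (v.lo < INT8_MIN || v.hi > INT8_MAX) {
        Interval top = { INT8_MIN, INT8_MAX, 1 };
        return top;
    }
    return v;
}

/* v * k for a constant k > 0: bounds and stride all scale by k. */
Interval mul_const(Interval v, int k)
{
    Interval r = { v.lo * k, v.hi * k, v.stride * k };
    return widen_on_overflow(r);
}

/* v - k for a constant k: bounds shift, the stride is unchanged. */
Interval sub_const(Interval v, int k)
{
    Interval r = { v.lo - k, v.hi - k, v.stride };
    return widen_on_overflow(r);
}
```

Starting from $t \in[0,12]$ with stride 1, `sub_const(mul_const(t, 2), 1)` yields the interval $[-1,23]$ with stride 2, whereas starting from $[0,100]$ the multiplication leaves the 8-bit range and the result is widened to $[-128,127]$.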

If the domain and range analysis produce too many values, or if some of the values are not addressable, we (inefficiently) solve iteratively for the index subject to the PC. For example, given a symbolic expression index, which first solves to 2, we then solve for index such that index $\neq 2$, and so on. The solvers we use do not have the option to produce all the satisfying assignments to a formula, which we require in order to check all possible memory addresses. If there are many possible memory addresses, then this operation will be slow.
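The iterative scheme is a loop of solver calls in which each model found is excluded from the next query. In the sketch below the solver is replaced by a brute-force search over an 8-bit index, purely so that the example is self-contained; `solve_once`, `all_solutions` and `example_pc` are hypothetical names, and a real implementation would add the constraint index $\neq v$ to the solver query instead of marking an array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef bool (*Constraint)(uint8_t index);

/* Stand-in for one solver call: find a value that satisfies the PC and
   has not been excluded by an earlier "index != v" constraint. */
static bool solve_once(Constraint pc, const bool blocked[256], uint8_t *model)
{
    for (int v = 0; v < 256; v++) {
        if (!blocked[v] && pc((uint8_t)v)) {
            *model = (uint8_t)v;
            return true;
        }
    }
    return false;   /* unsatisfiable: every solution has been found */
}

/* Enumerate all satisfying values of index, one call at a time.
   Returns the number of values written to out. */
size_t all_solutions(Constraint pc, uint8_t out[256])
{
    bool blocked[256] = { false };
    size_t n = 0;
    uint8_t v;
    while (solve_once(pc, blocked, &v)) {
        out[n++] = v;
        blocked[v] = true;   /* the next query excludes v */
    }
    return n;
}

/* Example PC (made up for illustration): index is even and at most 6. */
bool example_pc(uint8_t index) { return index % 2 == 0 && index <= 6; }
```

The number of calls is one more than the number of solutions, which is why the operation is slow when many addresses are possible.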

### 6.6 MinkeyRink's Implementation

We analyse Linux x86 executables, both compiled binaries and interpreted scripts. Given an executable and an input length, we dynamically disassemble the executable as it runs on a random input of the specified length. The result of this dynamic disassembly is a trace through the program. MinkeyRink then symbolically executes using the control flow graph derived from the trace. When a control transfer instruction is reached which depends on a symbolic variable, the bit-vector/array solver is called and the state is split if necessary. When symbolic execution reaches an address that has not already been disassembled, a new dynamic disassembly is triggered. The resulting trace is merged with previous disassembled traces to build a more precise control flow graph. We follow Minato [Min96] and check the satisfiability of each guard. A loop is unrolled until its guard evaluates to 0.

We use the Valgrind framework [NS07] to disassemble. Valgrind is a widely used dynamic instrumentation framework for analysing x86 and PowerPC executables on the Linux and AIX operating systems. We use Valgrind to identify instructions, and also because it translates x86 instructions into a more manageable RISC language. There are three advantages of dynamic disassembly over static disassembly. First, during disassembly the results from system calls are collected; these are later played back to the simulated program. Second, we disassemble the dynamically linked libraries no differently from the program's instructions. Third, interpreted scripts, and their interpreters, can be analysed. Our dynamic disassembly is time consuming; others [NLLC06] have used an initial static disassembly to save time.

The PC and symbolic expressions are stored inside MinkeyRink as Valgrind expression graphs, and converted to the QF_ABV language for solving. We do not utilise the solver's custom interface. Instead we generate standard SMT-LIB format and communicate via temporary files. Using the standard interface requires the solver to redo work, but allows easy experimentation with different solvers.

We have built a simulator of Valgrind's instructions, which executes the instructions symbolically and concretely. It simulates many, but not all, of Valgrind's instructions. Currently, MinkeyRink does not handle multi-threaded programs, asynchronous signals, or floating point instructions.

```
int main(void) {
    char s[9];
    long number;
    printf("Enter a Number:");
    fgets(s, sizeof(s), stdin);
    number = atol(s);
    if (number == 12345678) {
        fail();
    }
    return 0;
}
```

Figure 6.10: Number: Error on input 12345678

```
uint64_t popCount (uint64_t y) {
    uint64_t c;
    for (c = 0; y; c++)
        y &= y-1;
    return c;
}
```

Figure 6.11: Wegner: Counting 1-bits

The program's state is modified by Linux system calls. We have symbolic versions of only some system calls; the remainder we replay from the traces captured during dynamic disassembly.

### 6.7 Results

To explore the usefulness of the state joining approach we analyse four programs. One is a simple C program that will fail if a particular integer is input (Figure 6.10). In this example, the function describing the result of atol() is surprisingly complicated: Given 9 characters of input, there are about $10^{20}$ different inputs that will evaluate to 1 , corresponding to white space followed by an optional plus sign, followed by 1 , followed by non-numeric characters. The second is a program that counts the number of 1-bits in a number (Figure 6.11). The third is the Multiply example of Figure 6.5. The fourth example is the gzip file compression program.

In the Number example, an input that causes the program to fail is derived. For the remaining three programs the input-output function is derived. For the Multiply example, this could be used to show the commutativity of multiplication. For the Wegner example [Weg60], this could be used to determine that the return value is always less than 65. For gzip, this could be used as the first part of establishing that gunzip composed with gzip is the identity function for some size inputs.

|  | Problem |  |  |  |
| :--- | :---: | :---: | :---: | :---: |
| Category | Number | Gzip | Wegner | Multiply |
| Bytes input | 9 | 1 | 8 | 16 |
| Dynamic Disassemblies | 9-16 | 1 | 1 | 1 |
| **With joining** |  |  |  |  |
| Total time | 126 s | > 30000 s | 33 s | 45 s |
| Solver time (STP r60) | 90 s | > 26973 s | 28 s | 11 s |
| Solver calls | 110 | > 684 | 67 | 131 |
| Boolean simplification | 4.5 s | > 103 s | 3 s | 5.5 s |
| Joins | 60 | > 510 | 64 | 127 |
| Maximum height | 94 | > 500 | 67 | 131 |
| Maximum width | 5 | 2 | 2 | 3 |
| **Without joining** |  |  |  |  |
| Total time | 732 s | 4006 s | 35 s | - |
| Solver time | 555 s | 3312 s | 25 s | - |
| Solver calls | 6929 | 34944 | 66 | - |
| Boolean simplification | 82 s | 52 s | 3 s | - |
| Paths for Symbolic Exec. | 1662 | 256 | 65 | $2^{64}$ |

Table 6.1: Results of applying state joining. Running with STP revision 60. Gzip with state joining timed out after 30,000 seconds.

Table 6.1 shows the results of running these four programs. Some of the rows are: Dynamic Disassemblies: the number of calls to the dynamic disassembler (the value varies depending on the initial random input chosen); Joins: the number of pairs of states that were joined; Maximum Width: the greatest number of active states at any time (the maximum width of the state graph); Maximum Height: the longest path through the state graph.

We made three runs of each analysis; the times shown in Table 6.1 are arithmetic averages. Times were measured on a single core of a Pentium D 3GHz, running Ubuntu 8.04. We use revision 60 of STP, which we found to be the fastest available solver at the time.

A program that generates all one byte files, then gzips them, then builds an input-output function takes 7 seconds to run. To produce the equivalent formula by symbolic execution takes 4006 seconds, and to produce the same formula by state joining timed out after 30,000 seconds.

Gzip, when symbolically executed, produces singleton states: each input follows a different path. So symbolic execution has no advantage over dynamic analysis. The PC that state joining produces is more difficult to solve, overwhelming the savings from merging.

For the Wegner example, the number of paths is equal to the bitwidth plus one; in our example, 65 paths. State joining on this example joins just before the exit, the same as symbolic execution. The Wegner example benefits neither from the Boolean simplifications nor from the pointer value analysis. The Boolean simplifications occur at the join point at the function's return, when the PC is no longer used. The pointer value analysis, as for the Multiply example, does not help because the analysis produces no symbolic pointers.

The Multiply example has a simple structure, with control transfer instructions that join back on each other, producing constraints that easily cancel out. With Boolean simplifications disabled, the example takes more than 50 times longer to run. It is well suited to state joining. Note that we pass the function symbolic variables, not characters turned into numbers. Parsing the numbers would require effort comparable to that in the Number example.

### 6.8 MinkeyRink with STP2

We evaluated MinkeyRink before we began work on STP2; at that time, STP r60 was the best available bit-vector and array solver for the problems we generated. This is a version of STP prior to the improvements described earlier in this dissertation. In this section, we re-run some of the test problems shown in Table 6.1 with STP2 r1654. The results in Table 6.2 show the time spent solving on a single core of a Pentium D 3GHz, running Ubuntu 9.04. The table gives just the time spent in STP2. Again, we did not attempt to run the Multiply example without state joining because of the very large number of paths.

We have not been able to get MinkeyRink analysing gzip reliably on Ubuntu 9.04. Since we did the initial work, vector instructions have been compiled into the standard libraries used by gzip, which MinkeyRink does not currently analyse.

| Problem | Solver calls | STP r60 | STP r1654 |
| :---: | :---: | :---: | :---: |
| With joining |  |  |  |
| Number | 179 | 101 s | 39 s |
| Wegner | 67 | 30 s | 14 s |
| Multiply | 131 | 12 s | 17 s |
| Gzip | - | - | - |
| Without joining |  |  |  |
| Number | 3697 | 375 s | 200 s |
| Wegner | 66 | 27 s | 13 s |
| Multiply | $2^{64}$ | - | - |
| Gzip | 34944 | 2031 s | 2878 s |

Table 6.2: Results of applying state joining, comparing STP r60 and STP r1654.

However, we stored the longest running QF_ABV problem we encountered during the evaluation (Table 6.1). STP r60 takes 2862 seconds to solve this problem. STP2 r1661 solves the same problem on the same computer in just 26 seconds.

The results in Table 6.2 show only a modest improvement for STP2 versus STP. Many of the problems are small and easy. The more advanced simplifications that STP2 performs have an overhead that is not justified for such easy problems.

### 6.9 Complications

There are some practical complications with state joining for executables. Linux has hundreds of system calls that can modify the program's state. MinkeyRink has symbolic versions of the semantics for only a few system calls; for the remainder we replay the system call's results, which Valgrind captured during the dynamic disassembly. Before replaying the result of a system call we check that the parameters are the same. Our assumption is that system calls that are called in the same order with the same input will produce the same results. This further limits the strength of the guarantee we extract. For example, if a program would behave differently on different dates, we would not discover this, as the result of the system call that returns the date is not made symbolic.

Because we replay traces of the system calls, if inputs cause the program to make different sequences of system calls, the analysis will not have the appropriate system call to replay. One solution may be to split the state whenever the sequence of system calls changes, but we do not do this yet.

```
if (a>0)
    p = malloc(100000);
else
    p = malloc(0);
*p = 0;
```

Figure 6.12: A complication for state joining

If two states have different system call traces then they might not be appropriate to join. At present we cannot analyse the program shown in Figure 6.12, as the memory assigned on each branch is different. If we allowed the memory mapping system calls to vary on branches then one state would have memory allocated that the other did not. States cannot be joined just when their next instructions are the same. The system calls that have been performed, such as allocating memory and opening files, need to be checked when joining states. We only join states that have performed system calls we know are safe. For instance, we do not join states that have different files open, or different memory allocated.

Using dynamic disassembly, we cannot visit locations that are not reached at runtime, so we cannot analyse error handling code unless the error occurs at runtime. For example, if we wish to insert error return values from system calls, such as a file read failing, then we need to disassemble the error handling code. Currently we cannot introduce failures when performing a disassembly, so we cannot explore the error handling code.

Over-zealous joining is detrimental. For small functions, joining is undesirable, as the cost of joining/splitting overwhelms the saving. If a function is called from different sites and contains few instructions, the joined states will run for a few instructions before splitting when the function returns to the different call sites. We need heuristics to decide when to join.

Another limitation is that symbolic execution generally operates on a fixed input width, say 20 bits. Depending on the program structure, greater input lengths may be required to cause a particular failure. Symbolic execution builds expressions that describe the program for some fixed length input. For some inputs this may be equivalent to checking each smaller length input, but usually not.

When choosing what to join, we do not consider the call site. Consider a function which is called at the start and end of a program. Our analysis does not take the call site into consideration, so it believes that calling the function could return to either the start or end of the program. Our analysis is context insensitive: it does not determine that the return address on the stack is the particular address that will be returned to.

### 6.10 Related Work

There has been a tremendous amount of software verification research. In this section we focus just on closely related work. Bounded model checking and abstract interpretation are the two most popular competing alternatives to symbolic execution.

The idea of symbolic execution (MinkeyRink's conceptual basis) is usually attributed to King [Kin76], but others such as Howden [How77] and Clarke [Cla76] were working on the approach at the same time.

From the early 1980s until about 2000, little research was conducted into the symbolic execution of software. By 2000, enough had changed to make symbolic execution of software practical. First, owing to the rise of the internet, we saw more security-critical programs; many more programs were exposed to untrusted inputs, for instance web browsers, document readers, and media players. Second, the size of the programs we wish to analyse has grown more slowly than the speed of computers. Now, we are interested primarily in the parts of the programs that read and verify the structure of untrusted input. Third, Boolean satisfiability solvers are faster than ever.

The most obvious change to automated test generation (ATG) systems over the last 30 years has been that the constraint solving engines available to ATG systems have become more powerful. SELECT [Kin76] solved problems using a linear and conjugate gradient solver. Clarke analysed FORTRAN using a linear solver, while EFFIGY used a polynomial solver. Systems like EXE [CGP$^{+}$06] use SMT bit-vector and array solvers.

SE tools need to mark some values as symbolic. This is typically done by marking input from the user, file or network as symbolic variables. However, it could be any values, even the intermediate values in computations. In our work we mark input as symbolic. Godefroid et al. [GKL08], instead of marking the input as symbolic, mark the result of the lexer as symbolic. Marking the lexer's result as symbolic avoids redundantly generating input that the lexer considers to be identical, for example multiple white space characters. Wherever in the program the symbolic variables are introduced, the tools and theory are the same.

### 6.10.1 Tools

Currie et al. [CHR00] use SE to show equivalence between optimised and unoptimised sections of assembly code for a digital signal processor.

Larson and Austin [LA03] track the intervals that variables may range over in a symbolic state. They use the analysis to find accesses to out of bounds memory. Their tool does not reason precisely about overflow. Cadar and Engler [CE05] describe EGT, a structural fuzzer which uses source-to-source transformation, and the CVCL SMT solver. Sen et al. [SMA05] describe CUTE, a tool which operates on the source code representation and uses linear constraints in the symbolic state.

EXE [CGP$^{+}$06] and its successor Klee [CDE08] start from an actual execution, then negate each constraint of the path constraint one-by-one. A new execution path is generated that takes the same path up until the inverted constraint, but afterwards it takes a different branch. Paths that will visit previously un-reached locations are prioritised. EXE, with its aim of visiting previously un-reached statements, has heuristics that produce statement coverage. These heuristics are much simpler than attempting to derive input that reaches a particular instruction. In practice they achieve good results.

SAGE [GLM08] likewise negates constraints one-by-one and solves them, but prioritises the PCs that visited new states. The SAGE system starts from the first constraint in the PC, and negates it. If the conjunction of the prior constraints with the negated constraint is satisfiable, then a new input has been found. The program is run dynamically with that new input. Inputs are scored based on the number of new blocks they reach. The PCs of inputs that discover more new blocks are scheduled before those that discover fewer.
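The negate-and-solve step can be sketched as follows. This is only an illustration under stated assumptions, not code from SAGE or EXE: the program under test (`branch_condition`) is hypothetical, it reads a single input byte, and an exhaustive search over all 256 inputs (`solve_negated_prefix`) stands in for the SMT solver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BRANCHES 3

/* Hypothetical program under test: three branch conditions over one byte. */
bool branch_condition(int branch, uint8_t input) {
    switch (branch) {
    case 0:  return input > 100;
    case 1:  return (input & 1u) == 0;
    default: return input == 200;
    }
}

/* Run concretely and record each branch outcome: the path constraint. */
void run_concrete(uint8_t input, bool outcome[NUM_BRANCHES]) {
    for (int i = 0; i < NUM_BRANCHES; i++)
        outcome[i] = branch_condition(i, input);
}

/* Stand-in for the solver: find an input that agrees with the recorded
 * outcomes of branches 0..k-1 and takes the opposite outcome at branch k.
 * Returns the input, or -1 if the negated prefix is unsatisfiable. */
int solve_negated_prefix(const bool outcome[NUM_BRANCHES], int k) {
    for (int candidate = 0; candidate < 256; candidate++) {
        bool prefix_holds = true;
        for (int i = 0; i < k && prefix_holds; i++)
            prefix_holds = branch_condition(i, (uint8_t)candidate) == outcome[i];
        if (prefix_holds && branch_condition(k, (uint8_t)candidate) != outcome[k])
            return candidate;
    }
    return -1;
}
```

Starting from the input 0, negating the first constraint yields the new input 101, negating the second yields 1, and negating the third is unsatisfiable because an input cannot be both at most 100 and equal to 200.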

EXE and SAGE are both defect-finding tools; they do not exhaustively check each path. Their heuristic of preferring paths that visit previously unvisited statements prevents the path search from unrolling the same loop indefinitely. Neither tool performs a directed search for problems. Instead, both tools happen across defects as they explore paths.

Other approaches have been developed to target particular parts of a program for execution, for instance Ma et al. [MPFH11]. Several other structural fuzzers for binaries have been developed, for instance OSMOSE [BH11] and Avalanche [IS10].

### 6.10.2 Symbolic Memory Accesses

Symbolic memory accesses are generally handled in one of three ways: by concretisation, exhaustion, or abstraction. Concretisation dramatically increases the number of paths through the program. Handling the accesses exhaustively may require large arrays to be included in the formula. For example, if a symbolic index could take on $2^{12}$ values then all $2^{12}$ values need to be encoded in the formula. Abstraction of arrays is often performed by first setting the result of the memory access to an unconstrained value, that is, any possible value. The precise expression is included only if needed.
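As a rough illustration of the exhaustive approach, the sketch below reads a small memory without ever indexing it by the (notionally symbolic) index. Each candidate location contributes one guarded term, mirroring a chain ITE(index = 0, mem[0], ITE(index = 1, mem[1], ...)), which is why the encoding grows with the number of values the index may take. The names `MEM_SIZE` and `read_exhaustive` are illustrative assumptions and are not taken from any of the tools discussed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_SIZE 16  /* illustrative: the index ranges over 16 locations */

/* Exhaustive encoding of mem[index]: one guarded term per candidate
 * location. The number of terms equals the number of values the index
 * can take. An index outside the range matches no guard and yields 0. */
uint8_t read_exhaustive(const uint8_t mem[MEM_SIZE], uint8_t index) {
    uint8_t result = 0;
    for (size_t i = 0; i < MEM_SIZE; i++) {
        /* guard is 0xFF when index == i, and 0x00 otherwise */
        uint8_t guard = (uint8_t)(0u - (unsigned)(index == i));
        result |= (uint8_t)(mem[i] & guard);
    }
    return result;
}
```

With an index that can take $2^{12}$ values, the same construction would need $2^{12}$ guarded terms, whereas concretisation would instead fix the index to one value per path.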

SAGE concretises symbolic index expressions. Coen-Porisini and De Paoli [CPdP93] employ the exhaustive approach, associating with each symbolic variable a set of symbolic values and the predicates for when the symbolic value applied. They provide a denotational semantics for the approach, but not a description of how to compress the constraint. King [Kin76] forks execution on each constraint value, in effect checking concretisations one by one, which we found to be too slow.

EXE divides memory into regions; this allows the solver to work on different distinct parts of memory separately. It determines which region the concrete pointer indexes into, and constrains the symbolic value to be inside this region. This is similar to solving the bounds to establish the range of the pointer.

The abstraction approach to handling symbolic index expressions is used in a different setting by Engler and Dunbar [ED07]. Under-constrained symbolic variables are variables that do not have all the appropriate constraints applied to them. If an under-constrained variable causes a failure, an appropriate constraint (such as that the denominator is not zero) is applied to it, and execution continues if the constraint system is satisfiable. If the constraint is unsatisfiable then the failure must occur. This is unsound, but useful in their setting, for testing subsystems.

### 6.10.3 Handling the Path Explosion

The path explosion problem of symbolic execution has been addressed by others. Kölbl and Pixley [KP05] investigate state joining of programs written in a subset of C++, and describe it well. The principal difference with our work is that we focus on analysing arbitrary binaries which can use dynamic memory and pointer arithmetic.

Boonstoppel et al. [BCE08] discard states that differ from other states only in locations that will not later be read. Consider a conditional output statement such as if (guard) \{printf("value"); \}. If both branches are taken and outputs are ignored, the states do not differ. One can be discarded, as the remainder of their paths will be the same. This approach requires the calculation of which locations will be read and written to in the remainder of the path. Deciding this statically for machine code is more difficult owing to the more complicated control flow transfers. Our approach is more general, allowing the joining of states that differ in variables, while not requiring the calculation of which variables may be read from or written to later. The calculation does allow discarding of symbolic expressions that will not be used later, which is a good way to conserve memory.

Kuznetsov et al. [KKBC12] estimate the effect of merging states based on how symbolic variables are later used. They show promising results.

### 6.10.4 Other

To automatically vectorise loops during compilation, Allen et al. [AKPW83] use Boolean simplification and if-conversion to remove if-then-else statements. A statement such as $\operatorname{if}(g)\{y=x\}$ else $\{y=z\}$ is converted into:
$\mathrm{y}=(\mathrm{x} \;\&\; \mathrm{guard}) \mid (\mathrm{z} \;\&\; \neg\mathrm{guard})$, where guard is the sign extension of $g$ to the bitwidth of $y$, where $g$ is considered to be a 1-bit variable. We suspect that compilers, like GCC, which implement if-conversion will produce binaries that MinkeyRink can analyse more efficiently, because fewer states are split.
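A minimal sketch of this transformation, assuming 32-bit values, is given below. The function names are illustrative; the point is that the if-converted form contains no branch, so a symbolic executor has nothing to split on.

```c
#include <assert.h>
#include <stdint.h>

/* Branching form: if (g) { y = x; } else { y = z; } */
uint32_t select_branching(uint32_t g, uint32_t x, uint32_t z) {
    if (g & 1u)
        return x;
    return z;
}

/* If-converted form. The guard is the 1-bit g sign-extended to 32 bits:
 * 0x00000000 when g is 0, and 0xFFFFFFFF when g is 1. */
uint32_t select_if_converted(uint32_t g, uint32_t x, uint32_t z) {
    uint32_t guard = (uint32_t)0 - (g & 1u);
    return (x & guard) | (z & ~guard);
}
```

Both functions return `x` when `g` is 1 and `z` when `g` is 0.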

Arons et al. [AEO$^{+}$08] fork execution at each branch, adding both paths to a list of paths to explore. If paths are stopped at the same location, they are merged. The merging is performed as we do, effectively creating an if-then-else where the guard is the respective path-constraint. The merging is sensitive to the order of path exploration; only paths currently stopped at that location are merged. The approach, while good, relies on users choosing good merge points. The approach that we develop is completely automatic.

The Calysto tool [BH08] also merges paths, but only unrolls loops once. Calysto symbolically executes the program's functions, so each function is only analysed once. Calysto then inlines each function at call sites. In our case, unrolling the loop only once causes an unacceptable loss in precision, probably because we analyse at a binary level where each iteration does less. Calysto, on the other hand, achieved good results with the once-only strategy.

Symbolic execution has also been used to show the equivalence of programs [Min96]. Minato's system [Min96] handles conditional branches (if-then-else) and data dependent loops (while-end) using BDDs. Minato demonstrates this on Euclid's GCD algorithm, producing a BDD that encodes the function for up to two 10-bit inputs. Conditional branches are transformed statically to the equivalent ITE functions. So if (a) then $\{b=c\}$ else $\{b=d\}$ becomes $b \leftarrow \operatorname{ITE}(a, c, d)$. Loops are handled similarly: new ITE expressions are introduced until the BDD that encodes the guard is unsatisfiable. So after two passes, the loop while(a) $\{b=c\}$ is transformed to $b \leftarrow \operatorname{ITE}\left(a_{2} \wedge a_{1}, c_{2}, \operatorname{ITE}\left(a_{1}, c_{1}, c_{0}\right)\right)$, where the subscripts refer to the value of the variable at that iteration.
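The two-pass form can be checked on a small concrete loop. The sketch below is an assumption-laden illustration rather than Minato's system: the loop (`sum_loop`) is hypothetical, values are computed concretely instead of as BDDs, and the initial value of `b` plays the role of the innermost else-value. For inputs `n0 < 4` the guard is false after two iterations, so two passes suffice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ITE(a, c, d): returns c when a holds, otherwise d. */
uint32_t ite(bool a, uint32_t c, uint32_t d) {
    return a ? c : d;
}

/* Reference: a data-dependent loop, while (n != 0) { b = b + n; n >>= 1; } */
uint32_t sum_loop(uint32_t n, uint32_t b) {
    while (n != 0) {
        b = b + n;
        n = n >> 1;
    }
    return b;
}

/* The same loop after two passes, written as nested ITEs.
 * a1 and a2 are the guard at iterations 1 and 2; c1 and c2 are the values
 * assigned to b at those iterations; b0 is the value of b before the loop. */
uint32_t sum_unrolled_twice(uint32_t n0, uint32_t b0) {
    bool     a1 = (n0 != 0);
    uint32_t c1 = b0 + n0;
    uint32_t n1 = n0 >> 1;
    bool     a2 = (n1 != 0);
    uint32_t c2 = c1 + n1;
    return ite(a2 && a1, c2, ite(a1, c1, b0));
}
```

For `n0 = 4` the loop runs three times, so the two-pass expression no longer agrees with it; another ITE layer would be needed, which corresponds to the guard still being satisfiable.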

Godefroid [God07] describes a compositional analysis. When a function is called, the changes that the function makes are recorded, and the PCs added by the function are recorded. Later if the function is called again, the PC is checked for satisfiability; if it is satisfiable, then the results from the prior function call are written into the state. If the PC is unsatisfiable, then the function is executed, and the PC and results stored. With enough calls, the disjunction of PCs from the executions will cover any possible input. This system differs from Calysto in that the results of functions are composed run-by-run. The stated advantage of this demand-driven approach is that extra unfeasible paths are not summarised. With machine code, this approach is more difficult to apply. There could be differences between the calling contexts of a function that are not obvious, especially when analysing machine code, where the operands of functions are not explicitly indicated.

Clarke et al. [EC03] unroll loops and install an unwinding assertion. Loops are unrolled, up until a limit, until the unwinding assertion is necessarily 0. They apply source transformations to reduce the control transfer instructions to just goto and if. For example, unrolling while(e) \{I();\} three times produces Figure 6.13.

```
if(e)
{
    I();
    if (e)
    {
        I();
        if (e)
        {
            I();
            assert(!e);
        }
    }
}
```

Figure 6.13: Example of unrolling 3 times. The loop is unrolled until the unwinding assertion, shown here as an "assert", is necessarily false, or until an unwinding limit is reached.

Xie and Aiken [XA05] unroll loops using BDDs to associate guards with each statement. They contribute a programming language semantics that details formally how to translate operations into updates of the guards and states. One state subsumes another if its state is a superset of the other state. For example, two paths through a program that differ in what they output could be joined if the output makes no difference to the state.

Boonstoppel et al. [BCE08] perform a liveness analysis on symbolic expressions, pruning those that are unreachable from the path constraint, and then merging paths. Their insight is that multiple paths often produce the same effect, either if there are no side effects from branches, or after the side effects have operated. For example, at the end of a block, one path may have the constraint that $\{c \neq 0\}$ and the other that $\{c=0\}$. If $c$ is dead, however, this distinction is immaterial. Note that the properties need to be checked before merging, in case the path difference triggered a fault.

### 6.11 Conclusion

State joining as we have implemented it has varying performance. The performance of the approach depends on the difficulty of solving the generated constraints. On the gzip example, the constraints became so expensive to solve that state joining was slower than both exhaustively executing the program and symbolically executing it. In particular, gzip produced many symbolic memory indexes which slowed down STP r60.

However, as SMT bit-vector and array solvers become more sophisticated, approaches like the one we have described will become more practical. Over time we hope that the underlying solvers will increase in sophistication, removing the need for tools to carefully manipulate the problems they need solved.

Incidental to deriving the program's input-output function, we extract an accurate (partial) CFG from the binary code. The approach generates a safe under-approximation of the CFG using a flow-sensitive analysis. A corresponding upper bound of the CFG can be produced by abstract interpretation [KVZ09].

State joining is useful if the following conditions apply: the paths call a similar sequence of system calls, the number of paths through the program is large, and memory is rarely written to at symbolic locations.

Three improvements to our implementation are apparent. First, it is common around loops for later constraints to imply earlier ones. It is not apparent to a propositional simplifier that (from the Multiply example): $\left(x_{0} \gg_{l} 2\right) \neq 0$ implies $\left(x_{0} \gg_{l} 1\right) \neq 0$. Removing earlier constraints when they are implied by later constraints is desirable because it reduces the redundancy. Second, and related, is that with our current simplification scheme based on propositional variables, the performance of the analysis is dependent on whether the constraints simplify during joining. If the joined PC can be simplified, as in the Multiply example, performance is good. However, slight syntactic changes to conditionals can dramatically increase running time. For instance, changing the Multiply example slightly so that the diamond shape is lost causes analysis to take more than 100 times longer. Using the solver's native interface to maintain the state's PC would reduce the amount of work the solver needs to perform. Third, the use of generalised memoization (a compositional analysis) would reduce the amount of re-work performed. Currently we reprocess functions repeatedly rather than reusing the prior work that was performed.
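The implication in the first point holds at the bit-vector level even though a propositional simplifier sees only two unrelated atoms. The sketch below confirms it by enumeration, assuming a 16-bit width for brevity (the width used in the Multiply example may differ); the function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* For every 16-bit x, check that (x >> 2) != 0 implies (x >> 1) != 0,
 * where >> is a logical (unsigned) right shift. */
bool later_implies_earlier(void) {
    for (uint32_t x = 0; x <= 0xFFFFu; x++) {
        bool later   = ((uint16_t)x >> 2) != 0;
        bool earlier = ((uint16_t)x >> 1) != 0;
        if (later && !earlier)
            return false;
    }
    return true;
}

/* The converse fails (for example at x = 2), so the earlier constraint is
 * strictly weaker and is the one that can be dropped. */
bool earlier_implies_later(void) {
    for (uint32_t x = 0; x <= 0xFFFFu; x++) {
        bool later   = ((uint16_t)x >> 2) != 0;
        bool earlier = ((uint16_t)x >> 1) != 0;
        if (earlier && !later)
            return false;
    }
    return true;
}
```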

Normal symbolic execution of binaries allows arbitrary properties about the input-output function of programs to be verified, but the technique works poorly on programs that have many paths through them. We have investigated how state joining may help. So far we have a number of promising results for analysing unmodified executables, as well as examples that do not benefit. However, this is an area full of opportunities for future improvement.

## 7

## Conclusion

This thesis has investigated building bit-vector and array solvers, and their application to analysing machine code programs. Bit-vector and array solvers are widely used to answer questions about the behaviour of software. Analysing machine code programs is a way to automatically discover low-level defects, like division by zero.

In our investigation of bit-vector solvers we largely focussed on simplifications: ways to make bit-vector problems easier to solve. We now summarise our contributions to simplification.

Our variable elimination approach (section 3.4) is a conceptually simple way of repeatedly isolating variables on one side of an equality. It is more general than other approaches because, for instance, it also eliminates variables that occur in bit-vector xors.

Bit-blasting equivalence checking (section 3.8) transfers equivalences detected by and-inverter graphs (AIGs) back to the bit-vector theory-level. The equivalences that are deduced during bit-blasting are used to further simplify the bit-vector problem.

A new approach to discovering equivalences (subsection 3.11.1) provides a way for authors of bit-vector solvers to discover new and interesting rewrite rules. Automatically discovering equivalences makes it less likely that useful rules will be missed. It helped us find rules that we would otherwise not have discovered.

Theory-level bit propagation (chapter 4) simplifies bit-vector problems by deducing the value of some bits at the bit-vector level, rather than at the propositional level. For each operation in QF_BV we built a propagator which transferred bit information between its operands and result. We found that Z3, when enhanced by this approach, answered $10 \%$ more of the test problems.

Comparing each propagator against the result of the corresponding optimal propagator (section 4.8) showed that the implementation of several of the propagators was optimal on all the assignments generated. Likewise, on all the assignments generated, unit propagation over the CNF of bit-vector xor, bit-vector or, bit-vector and, and equals were optimal. If building a combined SAT and propagation solver, it would make sense to encode those operations as CNF rather than using propagators for them because their CNF encodings are compact and propagate strongly. For multiplication we applied a novel technique which we called column bounds propagation, which subsumes other simpler propagators.

A common reason not to build optimal propagators is that the advantage of the extra precision is outweighed by the extra time it takes to obtain the precision. Measuring the running time of the propagators (section 4.10) showed that generally the implementations are efficient. However, the implementation of the 6-bit optimal multiplication propagator (section 4.9) was too slow to be useful. We found it useful to calculate the effect of the optimal propagator without undertaking the effort to build the propagators. We determined that theory level bit propagation would provide a benefit before having invested the effort to build the propagators.

The $\mathcal{D C I}$ array solver (chapter 5) includes a lazy approach to clause generation. Compared to an abstraction-refinement solver, it asserts clauses to the SAT solver sooner after they are required. The $\mathcal{D C I}$ solver works particularly well on problems that require many abstraction-refinement iterations. However, the disadvantage of the $\mathcal{D C I}$ approach is that the implementation is tied closely to a particular SAT solver, in our case Minisat 2.2. Implementing the approach in a more modern SAT solver is complicated because of the complex invariants that need to be maintained. For instance, Lingeling [Bie12] maintains several data structures relating to the clauses that need to be incrementally updated as extra clauses are asserted during the search. Other approaches (like $\mathcal{A c k}$ ) have better modularity, making it easy to use whichever SAT solver is the best or most appropriate at a given time.

The $\mathcal{D C I}$ array solver was slower on the evaluation problems than another much simpler array solver ( $\mathcal{A c k}$ ). On test problems STP was about 7 times faster than a prior STP version (subsection 5.6.5).

A goal of some abstraction-refinement array solvers is to instantiate a small number of function congruence constraints each iteration. In the worst case these solvers perform $O\left(k^{2}\right)$ iterations, where $k$ is the number of array selects. STP 0.1 has a lower upper limit of $k$, asserting all of the function congruence constraints when the limit is reached-this can save considerable time.
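The quadratic bound comes from the number of pairs of selects. As a generic sketch of the refinement check (not STP's implementation), the code below scans a candidate model for a pair of selects on the same array that violates function congruence, that is, equal indices with different values. The `select_term` representation and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One select term over a single array, with the values a candidate model
 * assigns to its index and to its result. Hypothetical representation. */
struct select_term {
    uint32_t index;
    uint32_t value;
};

/* Number of distinct pairs among k selects: k(k-1)/2, which is O(k^2). */
size_t congruence_pair_count(size_t k) {
    return k < 2 ? 0 : k * (k - 1) / 2;
}

/* Look for a pair that violates function congruence: two selects with
 * equal indices but different values. On success, store the pair in
 * *first and *second and return true. */
bool find_violated_congruence(const struct select_term *terms, size_t k,
                              size_t *first, size_t *second) {
    for (size_t i = 0; i < k; i++) {
        for (size_t j = i + 1; j < k; j++) {
            if (terms[i].index == terms[j].index &&
                terms[i].value != terms[j].value) {
                *first = i;
                *second = j;
                return true;
            }
        }
    }
    return false;
}
```

A solver that asserts one violated constraint per iteration may need a number of iterations proportional to `congruence_pair_count(k)`, which is the cost that a limit of $k$ iterations caps.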

MinkeyRink (chapter 6) is a tool for analysing binary programs. MinkeyRink joins and splits symbolic states to overcome the path-explosion problem of symbolic execution. Joining states means that an exponential blow-up in the number of states can sometimes be avoided. Our implementation was sensitive to the order in which states were joined. If states were joined so that the path constraint simplified nicely then it was effective. The tool could be improved by taking more care when choosing which states to run so that the path constraint simplified more often.

Some of the work we have presented, we believe, merits further research.

Technology mapping was the most effective of the solver's phases. However, the comparison we performed compared technology mapping to the Tseitin transformation. A more relevant comparison, which we leave for future work, is to compare technology mapping to more modern encodings like the Plaisted and Greenbaum translation. Further, we performed parameter optimisation to, amongst other things, select good bit-blasted encodings for technology mapping. To make the comparison fairer, all the parameters should be retuned.

The encoding of multiplication using sorting networks seems like it should be effective. Future work is required to understand exactly why it is not helpful.

The algorithm we gave for generating rewrite rules may generate more applicable rewrite rules if constants are not generated on the left-hand side. Further, it might be possible to improve the efficiency of the search for new rewrite rules by better caching calculations.

A recurring theme in this thesis has been that different problems solve faster with different simplifications enabled. The light-weight solver 4Simp (section 3.22) outperformed STP2 on our evaluation problems. However, because 4Simp omits standard simplifications it is not able to quickly answer many problems that STP2 finds easy. For instance, problems that require associative reasoning, such as $a \times (b \times c)=(a \times b) \times c$, are solved much faster by STP2.

For easy problems derived from our software verification tool (MinkeyRink), STP r60 was overall faster than STP2. STP2's cost of performing the extra simplifications is not justified for such easy problems.

Selecting a universally good set of parameters for a solver is difficult, as has been observed by others [dMP12]. For each simplification we have shown, it is easy to craft problems which are very slow with that simplification disabled. However, enabling simplifications has a cost which sometimes overwhelms the benefit, as we showed with our 4Simp solver.

In this thesis we have focussed primarily on solving isolated problems. In practice, tools like our MinkeyRink analysis tool produce a sequence of related QF_ABV problems. A promising avenue of future research is to adapt the simplifications to apply to sequences of problems dynamically. Such an approach might apply parameter optimisation to a sample from the sequence of QF_ABV problems it receives and adapt appropriately.

The improved performance of bit-vector and array solvers has enabled tools to reason more precisely and efficiently about the effect of programs. The faster solvers become, the more useful the tools are. There are many promising improvements to apply to solvers and tools; with hard work, these will help millions of programmers discover defects in their software.

## Bibliography

[Ach07] Tobias Achterberg. Constraint Integer Programming. PhD thesis, Technische Universität Berlin, 2007.
[ACHB11] Thanassis Avgerinos, Sang Kil Cha, Brent Lim Tze Hao, and David Brumley. AEG: Automatic exploit generation. In Network and Distributed System Security Symposium, February 2011.
[Ack54] W. Ackermann. Solvable Cases of the Decision Problem. North-Holland Publishing Company, 1954.
[AEO$^{+}$08] Tamarah Arons, Elad Elster, Shlomit Ozer, Jonathan Shalev, and Eli Singerman. Efficient symbolic simulation of low level software. In DATE '08: Proceedings of the Conference on Design, Automation and Test in Europe, pages 825-830, 3001 Leuven, Belgium, 2008. European Design and Automation Association.
[AKPW83] J. R. Allen, Ken Kennedy, Carrie Porterfield, and Joe Warren. Conversion of control dependence to data dependence. In POPL '83: Proceedings of the 10th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, pages 177-189, New York, NY, USA, 1983. ACM.
[Ard96] Laurent Arditi. BMDs can delay the use of theorem proving for verifying arithmetic assembly instructions. In Mandayam Srivas and Albert Camilleri, editors, Proceedings of the First International Conference on Formal Methods in Computer-Aided Design, volume 1166 of Lecture Notes in Computer Science, pages 34-48, London, UK, 1996. Springer.
[AS09] Gilles Audemard and Laurent Simon. Predicting learnt clauses quality in modern SAT solvers. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, IJCAI'09, pages 399-404, San Francisco, CA, USA, 2009. Morgan Kaufmann Publishers Inc.
[Bab08] Domagoj Babić. Exploiting Structure for Scalable Software Verification. PhD thesis, University of British Columbia, Vancouver, Canada, 2008.
[Bac07] Fahiem Bacchus. GAC via unit propagation. In Christian Bessière, editor, Proceedings of the 13th International Conference on Principles and Practice of Constraint Programming, volume 4741 of Lecture Notes in Computer Science, pages 133-147, Berlin, Heidelberg, 2007. Springer.
[Bal07] Gogul Balakrishnan. WYSINWYX: What You See Is Not What You eXecute. PhD thesis, Computer Sciences Department, University of Wisconsin, Madison, WI, 2007.
[Ban08] Sorav Bansal. Peephole Superoptimization. PhD thesis, Stanford University, 2008.
[BB04] P. Bjesse and A. Boralv. DAG-aware circuit compression for formal verification. In Proceedings of the 2004 IEEE/ACM International Conference on Computer-Aided Design, ICCAD '04, pages 42-49, Washington, DC, USA, 2004. IEEE Computer Society.
[BB06] Robert Brummayer and Armin Biere. Local two-level and-inverter graph minimization without blowup. In Proceedings of the 2nd Doctoral Workshop on Mathematical and Engineering Methods in Computer Science (MEMICS06), 2006.
[BB08a] Armin Biere and Robert Brummayer. Consistency checking of all different constraints over bit-vectors within a SAT solver. In Proceedings of the 2008 Conference on Formal Methods in Computer-Aided Design, pages 1-4. FMCAD Inc., 2008.
[BB08b] Robert Brummayer and Armin Biere. Lemmas on demand for the extensional theory of arrays. In Proceedings of the Joint Workshops of the 6th International Workshop on Satisfiability Modulo Theories and 1st International Workshop on Bit-Precise Reasoning, SMT '08/BPR '08, pages 6-11, New York, NY, USA, 2008. ACM.
[BB09] Robert Brummayer and Armin Biere. Lemmas on demand for the extensional theory of arrays. Journal on Satisfiability, Boolean Modeling and Computation, 6(1-3):165-201, 2009.
[BCCZ99] Armin Biere, Alessandro Cimatti, Edmund M. Clarke, and Yunshan Zhu. Symbolic model checking without BDDs. In W. Rance Cleaveland, editor, Proceedings of the 5th International Conference on Tools and Algorithms for Construction and Analysis of Systems, volume 1579 of Lecture Notes in Computer Science, pages 193-207, London, UK, 1999. Springer.
[BCE08] Peter Boonstoppel, Cristian Cadar, and Dawson Engler. RWset: Attacking path explosion in constraint-based test-generation. In C. R. Ramakrishnan and Jakob Rehof, editors, International Conference on Tools and Algorithms for the Constructions and Analysis of Systems, volume 4963 of Lecture Notes in Computer Science, pages 351-366. Springer, 2008.
[BCF$^{+}$06] Roberto Bruttomesso, Alessandro Cimatti, Anders Franzén, Alberto Griggio, Alessandro Santuari, and Roberto Sebastiani. To Ackermann-ize or not to Ackermann-ize? On efficiently handling uninterpreted function symbols in SMT (EUF $\cup T$). In Miki Hermann and Andrei Voronkov, editors, Logic for Programming, Artificial Intelligence, and Reasoning, volume 4246 of Lecture Notes in Computer Science, pages 557-571. Springer, 2006.
[BDdM$^{+}$12] Clark Barrett, Morgan Deters, Leonardo de Moura, Albert Oliveras, and Aaron Stump. 6 years of SMT-COMP. Journal of Automated Reasoning, pages 1-35, 2012.
[BDL98] Clark W. Barrett, David L. Dill, and Jeremy R. Levitt. A decision procedure for bit-vector arithmetic. In Proceedings of the 35th Annual Design Automation Conference, DAC '98, pages 522-527, New York, NY, USA, 1998. ACM.
[BFSW11] Sascha Böhme, Anthony C. J. Fox, Thomas Sewell, and Tjark Weber. Reconstruction of Z3's bit-vector proofs in HOL4 and Isabelle/HOL. In Proceedings of the First International Conference on Certified Programs and Proofs, CPP'11, pages 183-198, Berlin, Heidelberg, 2011. Springer.
[BG00] Mihai Budiu and Seth Copen Goldstein. BitValue inference: Detecting and exploiting narrow bitwidth computations. Technical Report CMU-CS-00-141, Carnegie Mellon University, June 2000.
[BH08] Domagoj Babić and Alan J. Hu. Calysto: Scalable and precise extended static checking. In ICSE '08: Proceedings of the 30th International Conference on Software Engineering, pages 211-220, New York, NY, USA, 2008. ACM.
[BH11] Sébastien Bardin and Philippe Herrmann. OSMOSE: Automatic structural testing of executables. Software Testing, Verification and Reliability, 21(1):29-54, 2011.
[BHP10] Sébastien Bardin, Philippe Herrmann, and Florian Perroud. An alternative to SAT-based approaches for bit-vectors. In J. Esparza and R. Majumdar, editors, TACAS, volume 6015 of Lecture Notes in Computer Science, pages 84-98. Springer, 2010.
[BHvMW09] Armin Biere, Marijn Heule, Hans van Maaren, and Toby Walsh, editors. Handbook of Satisfiability, volume 185 of Frontiers in Artificial Intelligence and Applications. IOS Press, Amsterdam, The Netherlands, 2009.
[Bie12] Armin Biere. Lingeling and friends entering the SAT Challenge 2012. In Adrian Balint, Anton Belov, Daniel Diepold, Simon Gerber, Matti Järvisalo, and Carsten Sinz, editors, Proceedings of SAT Challenge 2012: Solver and Benchmark Descriptions, volume B-2012-2 of Department of Computer Science Series of Publications B. University of Helsinki, 2012. ISBN 978-952-10-8106-4.
[BKO$^{+}$07] Randal E. Bryant, Daniel Kroening, Joel Ouaknine, Sanjit A. Seshia, Ofer Strichman, and Bryan Brady. Deciding bit-vector arithmetic with abstraction. In Proceedings of TACAS 2007, volume 4424 of Lecture Notes in Computer Science, pages 358-372. Springer, 2007.
[BM10] Robert K. Brayton and Alan Mishchenko. ABC: An academic industrial-strength verification tool. In Tayssir Touili, Byron Cook, and Paul Jackson, editors, Proceedings of the 22nd International Conference on Computer Aided Verification (CAV'2010), volume 6174 of Lecture Notes in Computer Science, pages 24-40, 2010.
[BRST08] Clark Barrett, Silvio Ranise, Aaron Stump, and Cesare Tinelli. The Satisfiability Modulo Theories Library (SMT-LIB). www.SMT-LIB.org, 2008.
[Bru08] Roberto Bruttomesso. RTL Verification: From SAT to SMT(BV). PhD thesis, University of Trento, 2008.
[Bru09] Robert Brummayer. Efficient SMT Solving for Bit-Vectors and the Extensional Theory of Arrays. PhD thesis, Johannes Kepler University, 2009.
[Bry86] Randal E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers, 35(8):677-691, August 1986.
[BST10] Clark Barrett, Aaron Stump, and Cesare Tinelli. The SMT-LIB standard - version 2.0. In Proceedings of the 8th International Workshop on Satisfiability Modulo Theories (SMT '10), July 2010. Edinburgh, Scotland.
[BSWG00] Mihai Budiu, Majd Sakr, Kip Walker, and Seth C. Goldstein. BitValue inference: Detecting and exploiting narrow bitwidth computations. In A. Bode, T. Ludwig, W. Karl, and R. Wismüller, editors, Proceedings of the EuroPar 2000 European Conference on Parallel Computing, volume 1900 of Lecture Notes in Computer Science, pages 969-979. Springer, 2000.
[CDE08] Cristian Cadar, Daniel Dunbar, and Dawson Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, pages 209-224, Berkeley, CA, USA, 2008. USENIX Association.
[CE05] Cristian Cadar and Dawson Engler. Execution generated test cases: How to make systems code crash itself. In Patrice Godefroid, editor, Proceedings of the 12th International Conference on Model Checking Software, volume 3639 of Lecture Notes in Computer Science, pages 2-23, Berlin, Heidelberg, 2005. Springer-Verlag.
[CGJ$^{+}$00] Edmund M. Clarke, Orna Grumberg, Somesh Jha, Yuan Lu, and Helmut Veith. Counterexample-guided abstraction refinement. In E. Allen Emerson and Aravinda Prasad Sistla, editors, Proceedings of the 12th International Conference on Computer Aided Verification (CAV'00), volume 1855 of Lecture Notes in Computer Science, pages 154-169, London, UK, 2000. Springer-Verlag.
[CGP$^{+}$06] Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler. EXE: Automatically generating inputs of death. In CCS '06: Proceedings of the 13th ACM Conference on Computer and Communications Security, pages 322-335, New York, NY, USA, 2006. ACM.
[CHR00] David W. Currie, Alan J. Hu, and Sreeranga P. Rajan. Automatic formal verification of DSP software. In Proceedings of the 37th Annual Design Automation Conference, pages 130-135. ACM, 2000.
[Cla76] Lori A. Clarke. A program testing system. In ACM 76: Proceedings of the Annual Conference, pages 488-491, New York, NY, USA, 1976. ACM.
[CMR97] David Cyrluk, M. Oliver Möller, and Harald Rueß. An efficient decision procedure for the theory of fixed-sized bit-vectors. In Orna Grumberg, editor, Proceedings of the 9th International Conference on Computer Aided Verification (CAV'97), volume 1254 of Lecture Notes in Computer Science, pages 60-71. Springer-Verlag, 1997.
[Coo71] Stephen A. Cook. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing (STOC'71), pages 151-158, New York, NY, USA, 1971. ACM.
[CPdP93] Alberto Coen-Porisini and Flavio de Paoli. Array representation in symbolic execution. Computer Languages, 18(3):197-216, 1993.
[DLL62] Martin Davis, George Logemann, and Donald Loveland. A machine program for theorem-proving. Communications of the ACM, 5(7):394-397, July 1962.
[dM11] Leonardo de Moura. What methods does Z3 use to solve quantifier-free bit-vector formulas (QF_BV)? http://www.stackoverflow.com/questions/7268221/, September 2011.
[dMB08a] Leonardo de Moura and Nikolaj Bjørner. Model-based theory combination. Electronic Notes in Theoretical Computer Science, 198(2):37-49, May 2008.
[dMB08b] Leonardo de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In Proceedings of the 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS'08), pages 337-340, Berlin, Heidelberg, 2008. Springer.
[dMB11] Leonardo de Moura and Nikolaj Bjørner. Satisfiability modulo theories: Introduction and applications. Communications of the ACM, 54(9):69-77, September 2011.
[dMP12] Leonardo de Moura and Grant Olney Passmore. The strategy challenge in SMT solving, 2012. unpublished.
[DP60] Martin Davis and Hilary Putnam. A computing procedure for quantification theory. Journal of the ACM, 7:201-215, 1960.
[EB05] Niklas Eén and Armin Biere. Effective preprocessing in SAT through variable and clause elimination. In Fahiem Bacchus and Toby Walsh, editors, Proceedings of the 8th International Conference on Theory and Applications of Satisfiability Testing (SAT'05), volume 3569 of Lecture Notes in Computer Science, pages 61-75, Berlin, Heidelberg, 2005. Springer.
[EC03] Edmund Clarke, Daniel Kroening, and Karen Yorav. Behavioral consistency of C and Verilog programs. Technical Report CMU-CS-03-126, Computer Science Department, School of Computer Science, Carnegie Mellon University, 2003.
[ED07] Dawson Engler and Daniel Dunbar. Under-constrained execution: Making automatic code destruction easy and scalable. In ISSTA '07: Proceedings of the 2007 International Symposium on Software Testing and Analysis, pages 1-4, New York, NY, USA, 2007. ACM.
[EMS07] Niklas Eén, Alan Mishchenko, and Niklas Sörensson. Applying logic synthesis for speeding up SAT. In João Marques-Silva and Karem A. Sakallah, editors, Proceedings of the 10th International Conference on Theory and Applications of Satisfiability Testing, volume 4501 of Lecture Notes in Computer Science, pages 272-286, Berlin, Heidelberg, 2007. Springer.
[ES04] Niklas Eén and Niklas Sörensson. An extensible SAT-solver. In Enrico Giunchiglia and Armando Tacchella, editors, Theory and Applications of Satisfiability Testing, volume 2919 of Lecture Notes in Computer Science, pages 333-336. Springer-Verlag, 2004.
[ES06] Niklas Eén and Niklas Sörensson. Translating pseudo-Boolean constraints into SAT. Journal on Satisfiability, Boolean Modeling and Computation, 2:1-26, 2006.
[Fox11] Anthony C. J. Fox. LCF-style bit-blasting in HOL4. In Marko van Eekelen et al., editor, Proceedings of the Second International Conference on Interactive Theorem Proving, volume 6896 of Lecture Notes in Computer Science, pages 357-362, Berlin, Heidelberg, 2011. Springer.
[Fra10] Anders Franzén. Efficient Solving of the Satisfiability Modulo Bit-Vectors Problem and Some Extensions to SMT. PhD thesis, University of Trento, 2010.
[Gan07] Vijay Ganesh. Decision Procedures for Bit-Vectors, Arrays and Integers. PhD thesis, Computer Science Department, Stanford University, 2007.
[GBD05] Vijay Ganesh, Sergey Berezin, and David L. Dill. A decision procedure for fixed-width bit-vectors. Technical report, Stanford University, 2005.
[GD07] Vijay Ganesh and David L. Dill. A decision procedure for bit-vectors and arrays. In Werner Damm and Holger Hermanns, editors, Proceedings of the 19th International Conference on Computer Aided Verification, volume 4590 of Lecture Notes in Computer Science, pages 519-531, Berlin, Heidelberg, 2007. Springer.
[GKA$^{+}$11] Vijay Ganesh, Adam Kieżun, Shay Artzi, Philip J. Guo, Pieter Hooimeijer, and Michael Ernst. HAMPI: A string solver for testing, analysis and vulnerability detection. In Ganesh Gopalakrishnan and Shaz Qadeer, editors, Proceedings of the 23rd International Conference on Computer Aided Verification (CAV 2011), volume 6806 of Lecture Notes in Computer Science, pages 1-19, Berlin, Heidelberg, 2011. Springer-Verlag.
[GKL08] Patrice Godefroid, Adam Kiezun, and Michael Y. Levin. Grammar-based whitebox fuzzing. In PLDI '08: Proceedings of the 2008 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 206-215, New York, NY, USA, 2008. ACM.
[GLM08] Patrice Godefroid, Michael Y. Levin, and David Molnar. Automated whitebox fuzz testing. In NDSS 2008: Proceedings of the 15th Annual Network & Distributed System Security Symposium, pages 151-166, 2008.
[God07] Patrice Godefroid. Compositional dynamic test generation. In POPL '07: Proceedings of the 34th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 47-54, New York, NY, USA, 2007. ACM.
[GOSL$^{+}$12] Vijay Ganesh, Charles W. O'Donnell, Armando Solar-Lezama, Srinivas Devadas, Mate Soos, and Martin Rinard. Lynx: A programmatic SAT solver for the RNA-folding problem. In Alessandro Cimatti and Roberto Sebastiani, editors, Proceedings of the 15th International Conference on Theory and Applications of Satisfiability Testing (SAT 2012), volume 7317 of Lecture Notes in Computer Science, pages 143-156. Springer-Verlag, 2012.
[HBHH07] Frank Hutter, Domagoj Babić, Holger H. Hoos, and Alan J. Hu. Boosting verification by automatic tuning of decision procedures. In Proceedings of the Formal Methods in Computer Aided Design, FMCAD '07, pages 27-34, Washington, DC, USA, 2007. IEEE Computer Society.
[HHLBS09] Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown, and Thomas Stützle. ParamILS: An automatic algorithm configuration framework. Journal of Artificial Intelligence Research, 36(1):267-306, September 2009.
[How77] W.E. Howden. Symbolic testing and the DISSECT symbolic evaluation system. IEEE Transactions on Software Engineering, 3(4):266-278, 1977.
[HSS09] T. Hansen, P. Schachte, and H. Søndergaard. State joining and splitting for the symbolic execution of binaries. In S. Bensalem and D. A. Peled, editors, Runtime Verification, volume 5779 of Lecture Notes in Computer Science, pages 76-92. Springer, 2009.
[Hua08] Jinbo Huang. Universal Booleanization of constraint models. In Peter Stuckey, editor, Proceedings of the 14th International Conference on Principles and Practice of Constraint Programming (CP'08), volume 5202 of Lecture Notes in Computer Science, pages 144-158, Berlin, Heidelberg, 2008. Springer.
[IS10] I. K. Isaev and D. V. Sidorov. The use of dynamic analysis for generation of input data that demonstrates critical bugs and vulnerabilities in programs. Programming and Computer Software, 36(4):225-236, July 2010.
[ISO12] ISO. ISO/IEC 14882:2011 Information Technology - Programming Languages - C++. ISO, February 2012.
[JBH10] Matti Järvisalo, Armin Biere, and Marijn Heule. Blocked clause elimination. In Javier Esparza and Rupak Majumdar, editors, Proceedings of the 16th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS'10), volume 6015 of Lecture Notes in Computer Science, pages 129-144, Berlin, Heidelberg, 2010. Springer-Verlag.
[JBH11] Matti Järvisalo, Armin Biere, and Marijn Heule. Simulating circuit-level simplifications on CNF. Journal of Automated Reasoning, pages 1-37, 2011.
[JBKW08] Jean Christoph Jung, Pedro Barahona, George Katsirelos, and Toby Walsh. Two encodings of DNNF theories. ECAI'08 Workshop on Inference Methods Based on Graphical Structures of Knowledge, July 2008.
[JC09] Himanshu Jain and Edmund M. Clarke. Efficient SAT solving for non-clausal formulas using DPLL, graphs, and watched cuts. In Proceedings of the 46th Annual Design Automation Conference (DAC'09), pages 563-568, New York, NY, USA, 2009. ACM.
[JC11] Ajith K. John and Supratik Chakraborty. A quantifier elimination algorithm for linear modular equations and disequations. In Ganesh Gopalakrishnan and Shaz Qadeer, editors, Proceedings of the 23rd International Conference on Computer Aided Verification, volume 6806 of Lecture Notes in Computer Science, pages 486-503, Berlin, Heidelberg, 2011. Springer.
[JD01] Peer Johannsen and Rolf Drechsler. Formal verification on the RT level computing one-to-one design abstractions by signal width reduction. In IFIP International Conference on Very Large Scale Integration (VLSI'01), Montpellier, 2001, pages 127-132, 2001.
[JLS09] Susmit Jha, Rhishikesh Limaye, and Sanjit Seshia. Beaver: Engineering an efficient SMT solver for bit-vector arithmetic. In Ahmed Bouajjani and Oded Maler, editors, Proceedings of the 21st International Conference on Computer Aided Verification (CAV'2009), volume 5643 of Lecture Notes in Computer Science, pages 668-674. Springer-Verlag, 2009.
[KFB12] Gergely Kovásznai, Andreas Fröhlich, and Armin Biere. On the complexity of fixed-size bit-vector logics with binary encoded bit-width. In Proceedings of the 10th International Workshop on Satisfiability Modulo Theories (SMT'12), pages 44-55, 2012.
[Kin76] James C. King. Symbolic execution and program testing. Communications of the ACM, 19(7):385-394, 1976.
[KJJP09] Alfred Koelbl, Reily Jacoby, Himanshu Jain, and Carl Pixley. Solver technology for system-level to RTL equivalence checking. In Proceedings of the Conference on Design, Automation and Test in Europe, DATE '09, pages 196-201, 3001 Leuven, Belgium, 2009. European Design and Automation Association.
[KKBC12] Volodymyr Kuznetsov, Johannes Kinder, Stefan Bucur, and George Candea. Efficient state merging in symbolic execution. In Proceedings of the 33rd Conference on Programming Language Design and Implementation (PLDI 2012), pages 193-204. ACM, 2012.
[KP05] Alfred Kölbl and Carl Pixley. Constructing efficient formal models from high-level descriptions using symbolic simulation. International Journal of Parallel Programming, 33(6):645-666, 2005.
[KSJ09] Hyondeuk Kim, Fabio Somenzi, and HoonSang Jin. Efficient term-ITE conversion for satisfiability modulo theories. In Oliver Kullmann, editor, Theory and Applications of Satisfiability Testing - SAT 2009, volume 5584 of Lecture Notes in Computer Science, pages 195-208. Springer, 2009.
[KVZ09] Johannes Kinder, Helmut Veith, and Florian Zuleger. An abstract interpretation-based framework for control flow reconstruction from binaries. In Proc. of the 10th Int. Conf. on Verification, Model Checking, and Abstract Interpretation (VMCAI 2009), pages 214-228. Springer, 2009.
[LA03] Eric Larson and Todd Austin. High coverage detection of input-related security faults. In Proceedings of the 12th USENIX Security Symposium, page 9, Berkeley, CA, USA, 2003. USENIX Association.
[Lar90] Tracy Larrabee. Efficient Generation of Test Patterns Using Boolean Satisfiability. PhD thesis, Stanford University, 1990.
[Li00] Chu Min Li. Integrating equivalency reasoning into Davis-Putnam procedure. In Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, pages 291-296. AAAI Press, 2000.
[LS10] Rhishikesh Shrikant Limaye and Sanjit A. Seshia. Beaver: An SMT solver for quantifier-free bit-vector logic. Technical Report UCB/EECS-2010-67, Electrical Engineering and Computer Sciences, University of California at Berkeley, 2010.
[MCB06] Alan Mishchenko, Satrajit Chatterjee, and Robert Brayton. DAG-aware AIG rewriting: A fresh look at combinational logic synthesis. In Proceedings of the 43rd Annual Design Automation Conference, DAC '06, pages 532-535, New York, NY, USA, 2006. ACM.
[Min92] Shin-ichi Minato. Fast generation of irredundant sum-of-products forms from binary decision diagrams. In Proceedings of the Synthesis and Simulation Meeting and International Interchange (SASIMI'92), 1992.
[Min96] Shin-ichi Minato. Generation of BDDs from hardware algorithm descriptions. In ICCAD '96: Proceedings of the 1996 IEEE/ACM International Conference on Computer-Aided Design, pages 644-649, Washington, DC, USA, 1996. IEEE Computer Society.
[MMZ$^{+}$01] Matthew W. Moskewicz, Conor F. Madigan, Ying Zhao, Lintao Zhang, and Sharad Malik. Chaff: Engineering an efficient SAT solver. In Proceedings of the 38th Annual Design Automation Conference, DAC '01, pages 530-535, New York, NY, USA, 2001. ACM.
[MPFH11] Kin-Keung Ma, Khoo Yit Phang, Jeffrey S. Foster, and Michael Hicks. Directed symbolic execution. In Eran Yahav, editor, Static Analysis: Proceedings of the 18th International Symposium, volume 6887 of Lecture Notes in Computer Science, pages 95-111, Berlin, Heidelberg, 2011. Springer-Verlag.
[MS98] Kim Marriott and Peter J. Stuckey. Programming with Constraints: An Introduction. The MIT Press, 1998.
[MSV06] Panagiotis Manolios, Sudarshan K. Srinivasan, and Daron Vroon. Automatic memory reductions for RTL model verification. In Proceedings of the 2006 IEEE/ACM International Conference on Computer-Aided Design, ICCAD '06, pages 786-793, New York, NY, USA, 2006. ACM.
[MV12] Laurent D. Michel and Pascal Van Hentenryck. Constraint satisfaction over bit-vectors. In M. Milano, editor, Constraint Programming: Proceedings of the 2012 Conference, volume 7514 of Lecture Notes in Computer Science, pages 527-543. Springer, 2012.
[NLLC06] Susanta Nanda, Wei Li, Lap-Chung Lam, and Tzi-cker Chiueh. BIRD: Binary interpretation using runtime disassembly. In CGO '06: Proceedings of the International Symposium on Code Generation and Optimization, pages 358-370, Washington, DC, USA, 2006. IEEE Computer Society.
[NO80] Greg Nelson and Derek C. Oppen. Fast decision procedures based on congruence closure. Journal of the ACM, 27(2):356-364, April 1980.
[NO05] Robert Nieuwenhuis and Albert Oliveras. Proof-producing congruence closure. In Jürgen Giesl, editor, Proceedings of the 16th International Conference on Rewriting Techniques and Applications, volume 3467 of Lecture Notes in Computer Science, pages 453-468. Springer, 2005.
[NRJ$^{+}$07] Yehuda Naveh, Michal Rimon, Itai Jaeger, Yoav Katz, Michael Vinov, Eitan Marcus, and Gil Shurek. Constraint-based random stimuli generation for hardware verification. AI Magazine, 28(3):13-30, 2007.
[NS07] Nicholas Nethercote and Julian Seward. Valgrind: A framework for heavyweight dynamic binary instrumentation. In Proceedings of ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI 2007), pages 89-100, New York, NY, USA, 2007. ACM.
[NSSS12] J. A. Navas, P. Schachte, H. Søndergaard, and P. J. Stuckey. Signedness-agnostic program analysis: Precise integer bounds for low-level code. In R. Jhala and A. Igarashi, editors, APLAS 2012: Proceedings of the 10th Asian Symposium on Programming Languages and Systems, volume 7705 of Lecture Notes in Computer Science, pages 115-130. Springer, 2012.
[OSC09] Olga Ohrimenko, Peter J. Stuckey, and Michael Codish. Propagation via lazy clause generation. Constraints, 14(3):357-391, September 2009.
[PG86] David A. Plaisted and Steven Greenbaum. A structure-preserving clause form translation. Journal of Symbolic Computation, 2(3):293-304, September 1986.
[RD06] John Regehr and Usit Duongsaa. Deriving abstract transfer functions for analyzing embedded software. In LCTES '06: Proceedings of the 2006 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tool Support for Embedded Systems, pages 34-43, New York, NY, USA, 2006. ACM.
[RR04] John Regehr and Alastair Reid. HOIST: A system for automatically deriving static analyzers for embedded systems. In ASPLOS-XI: Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 133-143, New York, NY, USA, 2004. ACM.
[RSY04] Thomas Reps, Mooly Sagiv, and Greta Yorsh. Symbolic implementation of the best transformer. In B. Steffen and G. Levi, editors, Verification, Model Checking and Abstract Interpretation, volume 2937 of Lecture Notes in Computer Science, pages 252-266. Springer, 2004.
[Rud86] Richard L. Rudell. Multiple-valued logic minimization for PLA synthesis. Technical Report UCB/ERL M86/65, EECS Department, University of California, Berkeley, 1986.
[Rus99] David M. Russinoff. A mechanically checked proof of correctness of the AMD K5 floating point square root microcode. Formal Methods in System Design, 14(1):75-125, January 1999.
[SBDL01] Aaron Stump, Clark W. Barrett, David L. Dill, and Jeremy Levitt. A decision procedure for an extensional theory of arrays. In Proceedings of the 16th Annual IEEE Symposium on Logic in Computer Science (LICS'01), pages 29-37, Washington, DC, USA, 2001. IEEE Computer Society.
[SD11] Sol Swords and Jared Davis. Bit-blasting ACL2 theorems. In David Hardin and Julien Schmaltz, editors, Proceedings 10th International Workshop on the ACL2 Theorem Prover and Its Applications, Austin, Texas, USA, November 3-4, 2011, volume 70 of Electronic Proceedings in Theoretical Computer Science, pages 84-102. Open Publishing Association, 2011.
[Seb07] Roberto Sebastiani. Lazy satisfiability modulo theories. Journal on Satisfiability, Boolean Modeling and Computation, 3(3-4):141-224, 2007.
[Sed98] Robert Sedgewick. Algorithms in C. Addison-Wesley, third edition, 1998.
[SMA05] Koushik Sen, Darko Marinov, and Gul Agha. CUTE: A concolic unit testing engine for C. In Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE-13, pages 263-272, New York, NY, USA, 2005. ACM.
[Smi11] Eric Whitman Smith. Axe, an automated formal equivalence checking tool for programs. PhD thesis, Stanford University, 2011.
[SNC09] Mate Soos, Karsten Nohl, and Claude Castelluccia. Extending SAT solvers to cryptographic problems. In Oliver Kullmann, editor, Proceedings of the 12th International Conference on Theory and Applications of Satisfiability Testing (SAT'09), volume 5584 of Lecture Notes in Computer Science, pages 244-257, Berlin, Heidelberg, 2009. Springer-Verlag.
[Tse83] G. S. Tseitin. On the complexity of derivation in propositional calculus. In J. Siekmann and G. Wrightson, editors, Automation of Reasoning, Vol. 2: Classical Papers on Computational Logic 1967-1970, pages 466-483. Springer, 1983. Originally published as "O slozhnosti vyvoda v ischislenii vyskazyvaniy", Zapiski Nauchnykh Seminarov LOMI 8:234-259, Steklov Inst. Math., Leningrad, 1968.
[War02] Henry S. Warren Jr. Hacker's Delight. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002.
[Weg60] Peter Wegner. A technique for counting ones in a binary computer. Communications of the ACM, 3(5):322, May 1960.
[WHdM10] Christoph M. Wintersteiger, Youssef Hamadi, and Leonardo de Moura. Efficiently solving quantified bit-vector formulas. In Proceedings of the 2010 Conference on Formal Methods in Computer-Aided Design, FMCAD '10, pages 239-246, Austin, TX, 2010. FMCAD Inc.
[XA05] Yichen Xie and Alex Aiken. Scalable error detection using Boolean satisfiability. In POPL '05: Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 351-363, New York, NY, USA, 2005. ACM.
[XHHLB08] Lin Xu, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. SATzilla: Portfolio-based algorithm selection for SAT. Journal of Artificial Intelligence Research, 32(1):565-606, 2008.
[ZKC01] Zhihong Zeng, Priyank Kalla, and Maciej Ciesielski. LPSAT: A unified approach to RTL satisfiability. In DATE 2001, pages 398-402. IEEE Press, 2001.


[^0]:    Trevor Alexander Hansen

[^1]:    The SimplifyingNodeFactory.cpp file in STP's source-code repository contains the implementation.

[^2]:    The source code file Bitblaster.cpp in STP2's source-code repository contains the encodings we chose.

[^3]:    These are contained in the QF_AUFBV category, which contains no problems with Uninterpreted Functions (UF).

