1 Introduction

Numerous recent publications with encodings of problems into SAT and QBF do not use SAT or QBF solvers directly [6, 16, 18]. Instead, SMT solvers are used, often the feature-rich and popular state-of-the-art solver Z3 [8], as this “makes implementation much simpler” [17], although no theory reasoning is involved. Z3’s programming API and the Common Lisp compatible SMT-LIB standard [3] are well documented and regarded by many as easy to use. While this ease of use leads to wide adoption and fast results, adapting encodings to less general solving backends that are potentially more efficient for the problem at hand remains hard, e.g., switching from SMT solving to a less general SAT solver. Researchers focus on optimizing their encodings against an SMT solver’s performance characteristics, instead of testing them against many different (also non-SMT) solvers. We consider the transformation of formulas into conjunctive normal form (CNF) required by SAT and QBF solvers to be non-trivial, especially for beginners. Seemingly bad intermediary results are discarded, as the effort required to re-encode the problem for SAT or QBF solvers is too large to be spent during prototyping, effectively forming a solver gap. We want to bridge this gap and reduce the friction involved in testing solvers outside the SMT world, without extensive modifications to encoding generators. In this work, we analyze what features are required to build this bridge and develop Booleguru, a polyglot for (quantified) Boolean logic. Our tool is available under the permissive MIT license at:

https://github.com/maximaximal/booleguru

1.1 Bridging the Solver Gap with Propositional Logic

Table 1. Formula formats that are optimized (✓), usable (\(\approx \)), or unusable (✗) for encoding the respective problem
Table 2. File formats and their capabilities.

In order to build a bridge over the solver gap described above, we first have to identify it. When encoding problems into logic, one also has to decide which format to encode into. The encoding itself is then typically accomplished with some encoder program, which is closely tied to the problem to be encoded. Changing the output format of an encoder involves considerable effort, as encoders are typically tailored to the output formats they were designed to support. As encoders grow in complexity, communities form around them, while still relying on the original output format. The decision made at the beginning of an encoder’s development thus carries over to the newly formed community. We therefore identify gaps in the solving landscape opening between the different input formats of different solvers for different problems. Table 1 lists a selection of problems with their associated dominant file formats. Table 2 lists the features offered by each format. All of them offer ways to encode propositional logic in CNF, with some extending it with quantifiers over variables (\(\forall \), \(\exists \)). More advanced formats also allow encoding formulas in non-CNF, i.e., formulas built from expressions and more complex logical operators. If a format supports quantifiers, they may always be added as a prefix to the formula. While every formula with embedded quantifiers can be prenexed into such a prenex form, some formats also allow encoding formulas in non-prenex form, extending the format’s expressiveness. Some formats also allow structure sharing, which lets problems reference sub-expressions multiple times without repeating them. The overlapping capabilities of the different formats suggest that a conversion tool has to be able to process all of them, and to serialize complex features into less complex formats where possible.

1.2 Related Work

Other communities have already gone through this bridge-building effort in order to reduce duplicated work and advance their fields. One of the biggest examples is the machine learning community, whose researchers commonly use libraries like PyTorch [15], SciPy [21], and Pandas [20]. This allows others to benefit from new innovations in these libraries, such as newly added learning algorithms or properties in PyTorch, or better storage formats in Pandas.

Multiple conversion tools already exist for QBF [9, 13, 19]. While all of these tools convert between specific formats, none tries to encompass multiple conversion or combination capabilities. Some SMT solvers are able to read multiple input formats [2, 7, 8], with all of them supporting SMT-LIB2, the format favored by the SMT Competition [22]. SMT solvers do not, however, offer seamless combination of multiple formulas. Booleguru fills this niche and provides such a convert-and-combine capability, while also enabling a unique development environment for creating new formulas. It enables previously tedious comparisons between different solvers solving similar problems, like SMT and QBF solvers, as shown in Fig. 2.

Fig. 1. Booleguru architecture: transformers may be arbitrarily combined.

2 Booleguru, the Propositional Multitool

After introducing the overlaps between file formats and solving communities, we now introduce our conversion tool: Booleguru. As shown in the architecture diagram in Fig. 1, it consists of readers, transformers, and serializers for propositional logic and extensions. Inputs in arbitrary formats may be read, modified, and then serialized in the same or a different file format. This section first describes how Booleguru stores propositional formulas in memory, so that all capabilities described in Table 2 can be provided. We then introduce features intended for working with formulas.

2.1 Representing Propositional Formulas in Memory

In order to be accepted as an efficient tool for working with propositional formulas, formulas have to be both fast to create and fast to traverse. While this holds for tools in any problem domain, the lower expressiveness of propositional logic compared to bit-vectors or more complex theories leads to large formulas with many nodes, which demands especially high throughput. Formulas are therefore stored in a directed acyclic graph (DAG), enabling structure sharing. Each node in the DAG is either a variable, a unary operation (negation), or a binary operation (and, or, etc.). A node is stored in a struct of 16 B, which (on most architectures) fits exactly into a single cache line. The remaining bits of the struct are occupied by structural information about the expression and by user-writable extra data stored within the DAG, which speeds up transformers that need to store temporary data on nodes. Besides the user-writable data, nodes stay constant over the whole execution.

References between nodes are stored as 32-bit unsigned integers, which are resolved relative to the node array of the whole formula. When creating a formula, a hash table is used to check whether a given node already exists; if it does not, a new node is appended at the back of the node array. References to previous nodes are immutable, which makes it cheap to append new expressions composed of others. Expressions may only reference other expressions with IDs smaller than their own; cycles imply a malformed formula. Traversing this DAG involves no hash lookups or pointer indirections, as every reference can be resolved directly through the child’s index in the array. Information about a sub-expression is collected during insertion of a new node, removing the need to scan the DAG for commonly required structural information. The 32-bit references make traversal very efficient, but limit formula sizes to \(2^{32}-1\) nodes. We will provide a compile-time switch to increase reference sizes in a future version.
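The hash-consed, append-only node store can be sketched as follows (a simplified Python model with illustrative names; Booleguru's actual implementation packs each node into a 16 B C++ struct):

```python
# Sketch of an append-only, hash-consed expression store (illustrative
# Python; names and structure are simplified for exposition).

class ExprManager:
    def __init__(self):
        self.nodes = []   # node array; a reference is an index into it
        self.index = {}   # hash table: node content -> existing reference

    def _intern(self, node):
        ref = self.index.get(node)
        if ref is None:              # new node: append at the back
            ref = len(self.nodes)
            self.nodes.append(node)
            self.index[node] = ref
        return ref                   # otherwise the existing node is shared

    def var(self, name):
        return self._intern(("var", name))

    def not_(self, a):
        return self._intern(("not", a))

    def and_(self, a, b):
        # children always have smaller IDs than the new node, so the
        # graph stays acyclic and traversal needs no hash lookups
        return self._intern(("and", a, b))

m = ExprManager()
a, b = m.var("a"), m.var("b")
f = m.and_(a, m.not_(b))
g = m.and_(a, m.not_(b))
assert f == g   # identical sub-expressions share a single node
```

Because equal nodes always receive the same ID, structure sharing falls out of the insertion path for free, and a reference comparison suffices to test sub-expression equality.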

2.2 Parsing Formulas

We already implemented several parsers:

figure a

The readers are mostly implemented using the ANTLR parser generator [14]. While ANTLR-generated parsers are slower than hand-written ones, the library allowed us to iterate faster during development and to add more input languages on a shared base. Some parsers are hand-written, and performance-critical ones are incrementally optimized into specialized implementations. Each parser produces a reference to an expression inside a shared expression manager. Multiple expressions can then be combined into new ones, regardless of their source format. The command line is also fully parsed by an ANTLR parser producing an expression, in order to provide a language for composing formulas from multiple files or scripts.

2.3 Transforming Formulas

Transformers is the umbrella term for functions that operate on one or more expressions passed to them. They may return new expressions built from their inputs and may be chained together. They are implemented either in native C++, in Lua [12], or in Fennel, with Lua and Fennel scripts possibly supplied at runtime by a user without re-compiling Booleguru. Several transformations are already implemented, with more being added in future releases. The list below uses the colon operator (:op) notation, which is transformed into Fennel function calls during CLI parsing. Each such transformation can be supplied to the Booleguru CLI, where it binds strongly (stronger than binary operators) to the expression preceding it. Generating transformers, which take no expressions as inputs, are written without a preceding expression, akin to variables.

figure b
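The transformer concept can be illustrated by a function that takes an expression and returns a rewritten one, here pushing negations inward to negation normal form (a simplified Python model over a tuple-based AST; Booleguru's transformers operate on its shared expression DAG instead):

```python
# A transformer as a pure function from expression to expression
# (simplified model; illustrative tuple-based AST).

def to_nnf(e):
    """Push negations down to the variables (negation normal form)."""
    op = e[0]
    if op == "var":
        return e
    if op == "not":
        inner = e[1]
        if inner[0] == "var":
            return e
        if inner[0] == "not":   # eliminate double negation
            return to_nnf(inner[1])
        if inner[0] == "and":   # De Morgan: ~(x /\ y) = ~x \/ ~y
            return ("or", to_nnf(("not", inner[1])), to_nnf(("not", inner[2])))
        if inner[0] == "or":    # De Morgan: ~(x \/ y) = ~x /\ ~y
            return ("and", to_nnf(("not", inner[1])), to_nnf(("not", inner[2])))
    return (op, *(to_nnf(c) for c in e[1:]))

formula = ("not", ("and", ("var", "a"), ("not", ("var", "b"))))
assert to_nnf(formula) == ("or", ("not", ("var", "a")), ("var", "b"))
```

Because each transformer maps expressions to expressions, arbitrary chains compose naturally, which is exactly what the colon-operator notation on the CLI exposes.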

2.4 Serializing Formulas

Fig. 2. Z3 and selected QBF solvers solving the QCIR track of QBFGallery 2023.

After reading, combining, and transforming input formulas, they can be printed in different output formats. For this, several serializers have been developed, which are listed below. The (Q)DIMACS serializer relies on the provided Tseitin encoding by default, but one may also use other methods that arrive at an expression tagged to be in CNF. This made Booleguru a helpful tool in the QBFGallery 2023.

figure c
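The default Tseitin-style CNF transformation behind the (Q)DIMACS output can be sketched as follows (a minimal Python model over a tuple-based AST; names are illustrative, and Booleguru additionally handles quantifier prefixes for QDIMACS):

```python
# Minimal Tseitin-style transformation: each gate gets a fresh variable
# plus clauses defining its equivalence with the gate's inputs.

def tseitin(e, clauses, fresh):
    """Return a literal equivalent to e; emit defining clauses for gates."""
    op = e[0]
    if op == "var":
        return e[1]                    # variables map to their own literal
    if op == "not":
        return -tseitin(e[1], clauses, fresh)
    x = fresh()                        # fresh Tseitin variable for this gate
    lits = [tseitin(c, clauses, fresh) for c in e[1:]]
    if op == "and":                    # x <-> (l1 /\ l2 /\ ...)
        for l in lits:
            clauses.append([-x, l])
        clauses.append([x] + [-l for l in lits])
    elif op == "or":                   # x <-> (l1 \/ l2 \/ ...)
        for l in lits:
            clauses.append([x, -l])
        clauses.append([-x] + lits)
    return x

counter = [2]                          # variables 1 and 2 occur in the input
def fresh():
    counter[0] += 1
    return counter[0]

clauses = []
root = tseitin(("or", ("and", ("var", 1), ("var", 2)), ("not", ("var", 1))),
               clauses, fresh)
clauses.append([root])                 # assert the formula's root literal
```

Writing the resulting clause list with a `p cnf` header line then yields a DIMACS file; for QDIMACS the quantifier prefix is printed before the clauses.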

3 Booleguru, the Programming Environment

During development of new problem encodings or crafted formulas, there is usually a step where an encoding tool or a formula generator is written [1, 4, 11]. In our experience, these tools often rely on similar primitives:

  • Create a new variable.

  • Compose an expression based on sub-expressions.

  • Read from an external source or draw random data.

  • Write the formula in the desired output format.

The Lua, Fennel, and Python APIs offered by Booleguru abstract over multiple output formats and over the concept of writing formulas into files. When writing a generator for a new formula, the Booleguru primitives let the user work directly with the formula’s AST, instead of generating syntax that describes the AST in the target language. This makes generators easier to adapt, as they are always composed of (nested) functions, each generating sub-expressions.
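The primitives listed above can be illustrated with a small stand-alone generator (plain Python over a tuple-based AST with illustrative names; Booleguru's actual APIs differ in detail):

```python
# A generator composed of nested functions, each producing sub-expressions,
# followed by serialization into an output syntax (illustrative model).

def var(name):
    return ("var", name)

def or_all(exprs):
    out = exprs[0]
    for e in exprs[1:]:
        out = ("or", out, e)
    return out

def at_least_one(names):
    """Create a variable per name and compose their disjunction."""
    return or_all([var(n) for n in names])

def to_infix(e):
    """Serialize the AST into Limboole-style infix syntax."""
    if e[0] == "var":
        return e[1]
    sep = " | " if e[0] == "or" else " & "
    return "(" + sep.join(to_infix(c) for c in e[1:]) + ")"

formula = at_least_one(["x1", "x2", "x3"])
assert to_infix(formula) == "((x1 | x2) | x3)"
```

The generator only manipulates the AST; swapping `to_infix` for a different serializer changes the output format without touching the generation logic, which is the separation Booleguru's APIs provide.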

In addition to using Booleguru as a first-class execution environment for formula generators, it may be used to reduce SMT encodings developed using Z3Py to a SAT or QBF solving problem. Booleguru optionally generates a Python module that emulates the popular interface of Z3.

Lua and Fennel Interface. The Lua and Fennel interfaces are both accessible through an embedded interpreter. Lua and Fennel scripts that are already distributed as a part of Booleguru are compiled ahead of time by LuaJIT. This makes both initialization and execution of scripts as efficient as possible in the Release build of Booleguru. User-supplied scripts are compiled at runtime using LuaJIT.

Python Interface. In addition to the specialized Lua and Fennel interfaces, Booleguru provides the pybooleguru interface, which is directly modelled after the widely used Z3Py Python library. We observed that when an encoding is developed using Z3Py, the jump from the SMT solver Z3 to more fundamental SAT or QBF solving becomes harder. This interface is intended as a drop-in replacement for Z3Py, enabling the conversion of a complex Z3-specific encoder written in Python into an encoder capable of producing additional outputs in different formats. Python scripts may be read as inputs, or a script may import pybooleguru instead of z3py.

C++ Interface. In addition to the scripting interfaces, the C++ interface itself may be used. Functions are provided to easily build expressions in C++, which can be useful when developing new tools in systems languages.

3.1 Command-Line Interactive Interface

The command-line interface of Booleguru is also considered a programming environment, as it seamlessly merges a grammar for propositional logic in infix notation with the Lisp-based Fennel, a programming language using prefix notation. Each Fennel expression has to return a logical expression, which it builds using the provided primitives. By combining transformations and working with expressions, new functionality can be implemented using the CLI alone. For example, for the formula \((a\wedge {}b)\vee {}(a\wedge {}\lnot {}b)\), both solutions can be extracted using the CLI:

figure d
figure e
figure f
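The two solutions of \((a\wedge {}b)\vee {}(a\wedge {}\lnot {}b)\) can be double-checked by brute-force enumeration in plain Python, independent of Booleguru:

```python
from itertools import product

# Enumerate all assignments of (a /\ b) \/ (a /\ ~b) to confirm that
# exactly two satisfying assignments exist, both with a = true.
models = [(a, b) for a, b in product([False, True], repeat=2)
          if (a and b) or (a and not b)]
assert models == [(True, False), (True, True)]
```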

3.2 Developing Booleguru

Our tool is implemented in C++, with some helpers provided to ease development. The build system is realized using CMake, which makes Booleguru easily embeddable into other projects. All modules have test suites to be run during development. They test basic features like the expression tree, but also transformers and serializers. Defining new tests is easy, and running all tests is quick and parallelizable. The build system offers a fast Release build, a slower Debug build without optimization options, and a Sanitized build with address and undefined-behavior sanitizers enabled. If available, LuaJIT is used for executing Lua and Fennel scripts; otherwise the regular Lua distribution serves as a fallback. Booleguru also natively supports being built as a WebAssembly executable, making it runnable in browsers, including its Lua and Fennel scripting support.

Embedded Fuzzer. Transformers that process operations may also be fuzzed using the LLVM libFuzzer integration. Booleguru has to be built in fuzzing mode, which creates the specialized booleguru-fuzzer binary. The command line has to be provided via the environment variable BOOLEGURU_ARGS, with a fuzz file serving as the injection point for fuzzed inputs. Arbitrary transformations may then be performed on the expressions, and they are called on every iteration of the fuzzer. Booleguru’s embedded mutator (inspired by the mutator of Google Protocol Buffers) randomly creates arbitrarily many input structures until unexpected stops are encountered. The fuzzing capability was used extensively during development of the transformers. The resulting inputs can be displayed in Limboole format using the booleguru-print-corpus tool.

4 Conclusion

We developed the propositional polyglot Booleguru, which can be used to convert between several widely used logic formats, to transform or combine formulas, and to develop new encodings more efficiently. We discussed the requirements our tool has to fulfill and introduced the implementation based on these requirements. Finally, we explained how Booleguru is used to generate new encodings, using the embedded Lua, Fennel, and Python scripting support. Booleguru already proved itself as a valuable tool during the QBFGallery 2023, for revisiting quantifier shifting in QBF [10], and other projects.