Back in this post I introduced Absolut, a Rust library to facilitate the implementation of bytewise SIMD lookup tables. Version 0.1.1 of Absolut heavily relied on the Z3 Theorem Prover to compute lookup tables from declarative map definitions. This resulted in a few issues:
-
The
z3
crate either uses a system-installed copy of Z3 or builds a locally bundled copy of it. Needless to say, this is too much software bloat for what Absolut essentially is: a set of procedural macros. -
An SMT solver is a relatively opaque program. When Absolut inevitably fails to generate a lookup table because Z3 deemed it unsatisfiable, it's hard to determine the reason for the failure.
-
The execution time of Z3 is unpredictable. While it's possible to set an upper limit to it, such a change would result in less chances of finding a solution.
Putting aside technical issues around the use of Z3, the project suffered from a
lack of documentation: except for the aforementioned post, I provided
no README
and no rustdoc
documentation. At the time, I was so focused on
getting a proof of concept off the ground that I neglected everything I didn't
believe to be not strictly essential.
Moving away from SMT
In version 0.1.1, Absolut formulated the problem of generating a lookup table (referred to from now on as "The Problem") in first order logic. This allowed for expressing The Problem in terms of Z3's supported theories (e.g. fixed-size bit-vectors) at a relatively high level. In other words, using an SMT solver provided greater expressive power to implement and maintain Absolut.
On the downside, I was locked into the ecosystem of SMT solvers, which are complex and hard to embed into a small Rust library. So I reformulated The Problem in terms of propositional logic to use an SAT solver instead. A SAT solver is relatively easier to implement and there even exist two notable CDCL-based implementations for Rust: Varisat and Splr.
My goal at this point was to implement an embedded SAT solver for Absolut. I started out by implementing Donald Knuth's SAT0 algorithm in Rust, but I wasn't satisfied with its performance (or rather lack thereof). It often failed to terminate after multiple hours when used to generate a lookup table.
Before putting enormous effort into implementing a more sophisticated SAT solver, I asked on StackOverflow if I would benefit from a specialized SAT solver in the first place. The answer was the following:
[...] That points to the main weakness of the SAT formulation: it’s highly symmetric, and basic branching strategies will struggle. Sophisticated SAT solvers can detect symmetry and prune redundant branches, but the logic is complicated, and you really don’t want to implement it yourself for one application.
— David Eisenstat, stackoverflow.com
At this point, my goal of implementing a specialized SAT solver specifically optimized for usage inside Absolut grew more daunting. So I set out to evaluate the existing Rust-based SAT solvers to see if they work well enough for my use case.
I proceeded to contact Jannis Harder aka jix, the author of Varisat, to inquire about the state of the project. They told me that they would write Varisat very differently had they started it today, but there are no outstanding issues with the code nonetheless. They added that Splr on the other hand is closer to the state of art than Varisat and is being actively developed.
I didn't put too much thought into choosing between Varisat and Splr. In any case, the SAT solver is well encapsulated in the code of Absolut and could be readily swapped if needed. Ultimately, I was more familiar with Varisat as I had played with it in the past, So I went with it as I didn't encounter any issues with it while developing Absolut 0.2.0.
Moving away from SAT
David Eisenstat also pointed out in the StackOverflow answer that I can do without a SAT solver in some particular cases. To understand why, we need a quick recap of what The Problem is. Consider the following lookup table, described using Absolut:
#[absolut::one_hot]
enum Brainfuck {
#[matches(b'<', b'>')]
IncDec,
#[matches(b'+', b'-')]
IncDecPtr,
#[matches(b',', b'.')]
ReadWrite,
// My syntax highlighter is struggling (;⌣̀_⌣́)
#[matches(b'[', b']')]
Jump,
#[wildcard]
Ignored,
}
The above Brainfuck
lookup table
maps <
and >
to Brainfuck::IncDec
,
maps +
and -
to Brainfuck::IncDecPtr
,
maps ,
and .
to Brainfuck::ReadWrite
,
maps [
and ]
to Brainfuck::Jump
and
maps everything else to Brainfuck::Ignored
.
Such a lookup table could be useful for quickly identifying Brainfuck commands
in source code, while quickly ignoring comments1. The following
lookup
function illustrates, using non-vectorized code for demonstration
purposes, how to use the Brainfuck
lookup table:
fn lookup(input: &[u8; 16], output: &mut [u8; 16]) {
for (index, byte) in input.iter().copied().enumerate() {
let (lo, hi) = (byte & 0b1111, byte >> 4);
output[index] = Brainfuck::TABLE_LOW_NIBBLES[lo as usize]
& Brainfuck::TABLE_HIGH_NIBBLES[hi as usize];
}
}
Absolut solves for the Brainfuck::TABLE_LOW_NIBBLES
and
Brainfuck::TABLE_HIGH_NIBBLES
tables, as well as all of the enum variants of
Brainfuck
. To do this, it suffices to encode the variants using the
One-hot binary encoding. Such that
every variant is an 8-bit integer with exactly one set bit, except for the
#[wildcard]
-annotated variant which is always 0
.
This is because the set of bytes matched by each variant can be expressed as a
Cartesian product, e.g. IncDec
matches {0x3} ⨯ {0xE, 0xC} == {(0x3, 0xE), (0x3, 0xC)}
where (0x3, 0xE)
encodes 0x3E == b'<'
and (0x3, 0xC)
encodes
0x3C == b'>'
. When this property is satisfied, one can simply set the exact
same bit position in Brainfuck::TABLE_LOW_NIBBLES[0x3]
,
Brainfuck::TABLE_HIGH_NIBBLES[0xE]
, Brainfuck::TABLE_LOW_NIBBLES[0xC]
and
Brainfuck::IncDec
.
Brainfuck::IncDec as u8 == Brainfuck::TABLE_LOW_NIBBLES[0x3]
& Brainfuck::TABLE_HIGH_NIBBLES[0xE];
Brainfuck::IncDec as u8 == Brainfuck::TABLE_LOW_NIBBLES[0x3]
& Brainfuck::TABLE_HIGH_NIBBLES[0xC];
This algorithm is implemented in the
absolut::one_hot
procedural macro. The downside of one_hot
is that it only supports up to 9
variants. On the upside, it's much simpler to implement and reason about.
Absolut 0.2.1 introduces yet another lookup table generation algorithm, called
absolut::composite
,
which allows for arbitrary byte-to-byte maps using NEON on AArch64. If we
change the above Brainfuck
lookup table to use #[absolut::composite]
instead
of #[absolut::one_hot]
, then the lookup
function becomes:
fn lookup(input: &[u8; 16], output: &mut [u8; 16]) {
for (index, byte) in input.iter().copied().enumerate() {
output[index] = match byte {
0..=63 => Brainfuck::TABLE_QUARTERS[0][byte as usize],
64..=127 => Brainfuck::TABLE_QUARTERS[1][byte as usize - 64],
128..=191 => Brainfuck::TABLE_QUARTERS[2][byte as usize - 128],
192..=255 => Brainfuck::TABLE_QUARTERS[3][byte as usize - 192],
};
}
}
The idea here is to use four distinct lookup tables, one for each quarter of the range of all bytes. I recommend reading Arbitrary byte-to-byte maps using ARM NEON? by Daniel Lemire for more information. I also came across an application of this idea in Beating the Fastest Lexer Generator in Rust by Adrian Alic.
rustdoc
shenanigans
When writing rustdoc
documentation for Absolut 0.2.1, I couldn't figure out
how to get the "Available on crate feature feature
only." notices. rustdoc
by default only builds documentation for default crate features. However, if
you enable a non-default feature in the rustdoc
build, rustdoc
doesn't mark
the featured-gated items as such. To achieve that, you need to enable the
doc_auto_cfg
unstable language feature.
But that doesn't work in Stable Rust! Thankfully, the docs.rs build uses Nightly Rust. So it's possible to enable it there only:
[package.metadata.docs.rs]
all-features = true
rustdoc-args = ["--cfg", "doc_auto_cfg"]
What took me the longest time to realize was how to test the build with
doc_auto_cfg
locally without building with Nightly Rust and adding
#![feature(doc_auto_cfg)]
in the crate root. For that something like
#![cfg_attr(absolut_docs, feature(doc_auto_cfg))]
is needed, which means that
if and only if the crate is configured with --cfg absolut_docs
then the
attribute should expand to #![feature(doc_auto_cfg)]
, otherwise no feature is
enabled.
If you manage to implement such a thing, please email me! Bonus points if it's faster than scalar code!