bitvector is a compact C++17 bit-vector prototype for high-throughput
packed-bit workloads. It targets code that wants the memory density of
std::vector<bool> while reducing overhead in common set, access, append, and
scan paths.
This repository is intentionally small: one core header, focused unit tests, Google Benchmark coverage, CMake configuration, and CI workflows. It is best read as a performance-engineering proof point rather than a finished production library.
Packed bitmaps show up anywhere large boolean state needs to move quickly: search and filtering, indexes, compression, graph traversal, allocation maps, schedulers, feature flags, and analytics engines. In these systems, memory traffic and per-bit branching can dominate runtime.
For an investor, this project demonstrates that focused low-level optimization can unlock measurable gains in infrastructure components where small per-bit improvements compound at scale. For an interviewer, it shows hands-on systems work: custom storage, proxy references, iterators, bounds-check tradeoffs, SIMD-aware operations, benchmark design, and CI validation.
- Header-first C++17 implementation in
bitvector.hpp. - Benchmarked against
std::vector<bool>with Google Benchmark. - Unit-tested with GoogleTest and exercised in GitHub Actions.
- Uses packed word storage, fast word indexing, BMI/TZCNT scanning, and SIMD-aware specialized setters.
- CMake detects native compiler/CPU flags where possible and can enable AVX2 or optional AVX-512 paths.
The table below is a CI snapshot recorded for 1 << 20 bits. It is useful as a
directional proof point, not a universal performance guarantee. Hardware,
compiler, flags, bit density, access pattern, and safety settings can all change
the result, so target workloads should rerun the benchmark locally.
| BitVector Benchmark | Time (ns) | std::vector<bool> Benchmark |
Time (ns) | Speedup |
|---|---|---|---|---|
| BM_Bowen_Set | 1826751 | BM_Std_Set | 2975545 | 1.63x |
| BM_Bowen_PushBack | 1998142 | BM_Std_PushBack | 2990558 | 1.50x |
| BM_Bowen_Access | 984200 | BM_Std_Access | 2257978 | 2.29x |
| BM_Bowen_SetBit | 2123333 | BM_Std_SetBit | 2740340 | 1.29x |
| BM_Bowen_SetBitTrueUnsafe | 1992765 | BM_Std_SetBitTrueUnsafe | 2558438 | 1.28x |
| BM_Bowen_SetBitTrue6 | 1545274 | BM_Std_SetBitTrue6 | 2601283 | 1.68x |
| BM_Bowen_QSetBitTrue6V2 | 2248553 | BM_Std_QSetBitTrue6 | 3133552 | 1.39x |
| BM_Bowen_IncrementUntilZero | 34849 | BM_Std_IncrementUntilZero | 1941798 | 55.72x |
The largest speedup comes from incrementUntilZero, a specialized scan that
skips whole machine words by counting trailing zeros in the inverted word.
Benchmark environment for the snapshot above:
Architecture: x86_64
CPU(s): 5
Model name: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
- Packed storage: bits are stored in
unsigned longwords, with word and bit offsets computed through shifts and masks. - Fast indexing:
WORD_SHIFTis computed at compile time and checked with astatic_assertso word indexing can use shifts instead of division. - Proxy references: mutable
operator[]returns a bit reference object, so callers can writebv[i] = truewhile the vector remains packed. - Specialized scan:
incrementUntilZero(size_t& pos)advances through runs of set bits and uses_tzcnt_u64to skip full words efficiently. - SIMD-aware setters: functions such as
set_bit_true_6andqset_bit_true_6_v2explore faster paths for structured write patterns. - Allocator flexibility: the implementation is allocator-templated and
includes an aligned allocator option backed by
_mm_malloc. - Build-time tuning: CMake enables BMI support, attempts native/AVX2 compiler flags, includes SIMDe, and leaves AVX-512 behind an explicit option.
This project is optimized for experiments and benchmarks, so it exposes both safe and unsafe paths:
- The current CMake default sets
BITVECTOR_NO_BOUND_CHECK=ON, which compiles out bounds checks for performance-oriented builds. - Set
-DBITVECTOR_NO_BOUND_CHECK=OFFwhen validating caller behavior or debugging out-of-range access. set_bit_true_unsafeand SIMD-style setters assume the caller has already proven valid positions and sizes.data()exposes raw storage for low-level integration, but that also means callers can bypass invariants.- The current implementation is x86/x64-oriented because it directly includes native intrinsic headers; ARM support is a clear portability follow-up.
- The repository does not currently include packaging metadata or a license file, so external adoption should start by resolving those basics.
Requirements:
- CMake 3.21 or newer.
- A C++17 compiler.
- An x86 or x64 target for the current prototype. The header includes native x86 intrinsic headers, so ARM builds need portability work before they can compile cleanly.
- Network access on first configure, because CMake fetches GoogleTest, SIMDe, and Google Benchmark.
If you are using CMake 4.x, the pinned GoogleTest release may require adding
-DCMAKE_POLICY_VERSION_MINIMUM=3.5 to the configure command until that
dependency is upgraded.
Build, test, and run benchmarks:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
ctest --test-dir build --output-on-failure
./build/bitvector_benchmarkRun a debug-oriented build with runtime bounds checks enabled:
cmake -S . -B build-safe -DBITVECTOR_NO_BOUND_CHECK=OFF -DCMAKE_BUILD_TYPE=Debug
cmake --build build-safe --config Debug
ctest --test-dir build-safe --output-on-failureRun the shorter benchmark configuration used by CI:
./build/bitvector_benchmark --benchmark_min_time=0.01s#include "bitvector.hpp"
#include <cstddef>
int main() {
bowen::BitVector<> bits(1024, false);
bits.set_bit(3, true);
bits.push_back(true);
bowen::BitVector<> dense(1024, true);
dense.set_bit(511, false);
std::size_t pos = 0;
dense.incrementUntilZero(pos);
// pos now holds 511, the first zero bit in the dense vector.
}Core bowen::BitVector operations:
BitVector()creates an empty vector.BitVector(size_t n, bool value = false)createsninitialized bits.operator[]reads or writes a bit by index.set_bit(size_t pos, bool value)sets a specific bit.set_bit_true_unsafe(size_t pos)sets a bit without bounds checking.set_bit_true_6(size_t pos, size_t stride)sets six strided bits.qset_bit_true_6_v2(size_t pos, size_t stride, size_t size)explores a SIMD-style path for repeated strided writes.incrementUntilZero(size_t& pos)advancesposto the next zero bit.push_back(bool value)appends one bit.reserve(size_t new_capacity)reserves capacity measured in bits.assign(size_t n, bool value)resizes and fills the vector.data()returns the underlying word storage.size()returns the number of logical bits.empty()reports whether the vector has no bits.begin()andend()provide iterator access for traversal.
The repository includes two GitHub Actions workflows:
unit_tests.ymlconfigures a release build, builds the project, and runs GoogleTest throughctest.performance.ymlbuilds the benchmark target, runs Google Benchmark, and dumps assembly for selected benchmark functions.
Local tests cover construction, push_back, copy and assignment, resizing,
iterator traversal, bounds-check behavior when enabled, and
incrementUntilZero.
bitvector.hppcontains the core implementation.bitvector_test.cppcontains GoogleTest unit coverage.bitvector_benchmark.cppcontains Google Benchmark comparisons againststd::vector<bool>.CMakeLists.txtdefines build options, dependencies, tests, and benchmarks..github/workflows/contains CI validation and benchmark workflows.scripts/dump_benchmark_asm.shhelps inspect generated assembly for selected benchmark functions.
- Add a license and packaging metadata before external distribution.
- Stabilize API naming and document safe versus unsafe entry points.
- Broaden portability and benchmarks across CPUs, compilers, operating systems, and real bitmap-heavy workloads.
- Expand STL compatibility, including const traversal and clearer iterator semantics.
- Add fuzz or property-based tests for randomized bit patterns and boundary conditions.