Crate backed_data

Cache data outside memory, loading in when referenced.

You may want a more standard data storage solution! See alternatives to make sure another approach doesn’t fit your case better.

This crate uses some unsafe code in certain non-default features. See unsafe_usage for the listing and explanations. Dependencies are listed and explained in deps, and features walks through choosing among the (many) features.

§Motivating Example

The examples module contains further examples that demonstrate more complex uses.

Assume that you have some LargeStruct that takes up significant storage and can be reduced to a smaller representation for searching. If it is stored in a collection where only a small number of elements are used, keeping them all loaded in memory wastes system resources. A vector store is one such structure. This example assumes a large, hashable type with only a few entries accessed.

#[cfg(all(feature = "bincode", feature = "array"))] {
    use std::{
        env::temp_dir,
        iter::from_fn,
    };

    use serde::{Serialize, Deserialize};

    use backed_data::{
        entry::{
            disks::Plainfile,
            formats::BincodeCoder,
        },
        array::VecBackedArray,
    };

    #[derive(Debug, PartialEq, Eq, Serialize, Deserialize)]
    struct LargeStruct {
        val: u8,
        ...
    }

    impl LargeStruct {
        fn new_random() -> Self {
            ...
        }
    }

    const NUM_BACKINGS: usize = 1_000;
    const ELEMENTS_PER_BACKING: usize = 1_000;
    // Hypothetical subdirectory name for illustration; any unique name works.
    const BACKING_PATH: &str = "backed_data_example";

    // This application only needs to find three elements.
    let query = [0, 10_000, 50_000];

    // Do not use a temporary directory in real code: it may be backed by
    // RAM (e.g. tmpfs). Use a location that actually moves data out of memory.
    let backing_dir = temp_dir().join(BACKING_PATH);
    std::fs::create_dir_all(&backing_dir).unwrap();

    // Define a backed array using Vec.
    let mut backing = VecBackedArray::<LargeStruct, Plainfile, BincodeCoder<_>>::new();

    // Build the indices and backing store in 1,000 item chunks.
    for _ in 0..NUM_BACKINGS {
        let chunk_data: Vec<_> = from_fn(|| Some(LargeStruct::new_random()))
            .take(ELEMENTS_PER_BACKING)
            .collect();

        // This is handled automatically by `DirectoryBackedArray` types.
        let target_file = backing_dir.join(uuid::Uuid::new_v4().to_string()).into();

        // Add a new bincode-encoded file that stores 1,000 elements.
        // After this operation, the elements are on disk only (chunk_data
        // is moved into `append`, so no copy remains in this scope).
        backing.append(chunk_data, target_file, BincodeCoder::default()).unwrap();
    }

    // Query for three elements. At most 3,000 elements are loaded, because
    // the data is split into 1,000 element chunks. Only 2,997 unneeded
    // elements are kept in memory, instead of 999,997.
    let results: Vec<_> = query
        .iter()
        .map(|q| backing.get(*q))
        .collect();
}
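The memory figures in the query comment come from simple chunk arithmetic. Index q falls in chunk q / ELEMENTS_PER_BACKING, and every touched chunk is decoded in full. This std-only sketch reproduces the calculation under one assumption: all chunks are the same size, as in the example above. The chunks_touched helper is hypothetical and does not call backed_data.

```rust
use std::collections::BTreeSet;

const NUM_BACKINGS: usize = 1_000;
const ELEMENTS_PER_BACKING: usize = 1_000;

/// Hypothetical helper: the set of chunk indices a query touches,
/// assuming every chunk holds exactly `per_chunk` elements.
fn chunks_touched(query: &[usize], per_chunk: usize) -> BTreeSet<usize> {
    query.iter().map(|&q| q / per_chunk).collect()
}

fn main() {
    let query = [0, 10_000, 50_000];
    let chunks = chunks_touched(&query, ELEMENTS_PER_BACKING);

    // Each touched chunk is decoded in full.
    let loaded = chunks.len() * ELEMENTS_PER_BACKING;
    let total = NUM_BACKINGS * ELEMENTS_PER_BACKING;

    println!("chunks touched: {:?}", chunks); // {0, 10, 50}
    println!("elements loaded: {loaded}"); // 3000
    println!("unneeded but loaded: {}", loaded - query.len()); // 2997
    println!("unneeded if all loaded: {}", total - query.len()); // 999997
}
```

The three query indices land in three distinct chunks, so the worst case for this query is three full chunks.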

§Usage

The core structure is entry::BackedEntry, which array and directory wrap. Point it at external data, which is loaded when first used. That data remains in memory until unloaded, so subsequent reads avoid the cost of decoding. Try to unload only in one of the following scenarios:

  • The data will not be read again.
  • The program’s heap footprint needs to shrink.
  • The external store was modified by another process.
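The load-once, keep-until-unloaded behavior described above can be illustrated without the crate. The following is a minimal std-only sketch and is not the backed_data API. LazyEntry, get, unload, and the decodes counter are all hypothetical names. The closure stands in for reading from a disk and decoding a format.

```rust
/// Hypothetical stand-in for a backed entry. `load` plays the role of
/// reading and decoding external data; this is NOT the backed_data API.
struct LazyEntry<T, F: Fn() -> T> {
    load: F,
    cached: Option<T>,
    decodes: usize,
}

impl<T, F: Fn() -> T> LazyEntry<T, F> {
    fn new(load: F) -> Self {
        Self { load, cached: None, decodes: 0 }
    }

    /// Loads on first access; later calls reuse the cached value.
    fn get(&mut self) -> &T {
        if self.cached.is_none() {
            self.decodes += 1;
            self.cached = Some((self.load)());
        }
        self.cached.as_ref().expect("value was just loaded")
    }

    /// Drops the in-memory copy; the next `get` loads it again.
    fn unload(&mut self) {
        self.cached = None;
    }
}

fn main() {
    let mut entry = LazyEntry::new(|| vec![1u8; 1024]);
    assert_eq!(entry.get().len(), 1024);
    assert_eq!(entry.get().len(), 1024);
    assert_eq!(entry.decodes, 1); // the second read reused the cache
    entry.unload();
    entry.get();
    assert_eq!(entry.decodes, 2); // reloaded after unload
}
```

Each call to unload forces another decode on the next read. That cost is why the list above restricts unloading to those cases.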

Each entry needs a format and one or more (potentially layered) disks. The array wrapper also needs a choice of container for its array of keys and for its array of backed entries.
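The following sketch pictures the two containers the array wrapper holds, under stated assumptions. ChunkedArray is a hypothetical type, not the crate's BackedArray. Its keys are assumed to be contiguous global index ranges, and plain Vec<T> chunks stand in for backed entries. A lookup binary-searches the keys, so only the one matching entry would need to be loaded.

```rust
use std::ops::Range;

/// Hypothetical sketch: one container of keys (global index ranges) and
/// a parallel container of entries. Plain Vec<T> chunks stand in for
/// backed entries; this is NOT the backed_data API.
struct ChunkedArray<T> {
    keys: Vec<Range<usize>>,
    entries: Vec<Vec<T>>,
}

impl<T> ChunkedArray<T> {
    fn new() -> Self {
        Self { keys: Vec::new(), entries: Vec::new() }
    }

    /// Appends a chunk and records the global index range it covers.
    fn append(&mut self, chunk: Vec<T>) {
        let start = self.keys.last().map_or(0, |r| r.end);
        self.keys.push(start..start + chunk.len());
        self.entries.push(chunk);
    }

    /// Binary-searches the keys for the chunk holding `idx`, then indexes
    /// into that chunk (the only entry that would need loading).
    fn get(&self, idx: usize) -> Option<&T> {
        let pos = self.keys.partition_point(|r| r.end <= idx);
        let range = self.keys.get(pos)?;
        self.entries[pos].get(idx - range.start)
    }
}

fn main() {
    let mut arr = ChunkedArray::new();
    arr.append(vec!['a', 'b', 'c']);
    arr.append(vec!['d', 'e']);
    assert_eq!(arr.get(0), Some(&'a'));
    assert_eq!(arr.get(3), Some(&'d'));
    assert_eq!(arr.get(5), None);
    println!("keys: {:?}", arr.keys); // [0..3, 3..5]
}
```

Keeping the keys separate means the search only reads the small key container and never touches entry data.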

§Licensing and Contributing

All code is licensed under MPL 2.0. See the FAQ for license questions. The license is a non-viral (file-level) copyleft and does not prevent this library from being used in closed-source codebases. If you are using this library for a commercial purpose, consider reaching out to dansecob.dev@gmail.com to make a financial contribution.

Contributions are welcome at https://github.com/Bennett-Petzold/backed_data. Please open an issue or PR if:

  • Some dependency is extraneous, unsafe, or has a versioning issue.
  • Any unsafe code is insufficiently explained or tested.
  • There is any other issue or missing feature.

Re-exports§

pub use entry::BackedEntryArr;
pub use entry::BackedEntryArrLock;
pub use entry::BackedEntryCell;
pub use entry::BackedEntryLock;
pub use entry::BackedEntryAsync; (requires crate feature async)
pub use array::VecBackedArray; (requires crate feature array)
pub use directory::StdDirBackedArray; (requires crate feature directory)
pub use directory::ZstdDirBackedArray; (requires crate features directory, runtime, and zstd)
pub use directory::AsyncZstdDirBackedArray; (requires crate features directory, runtime, and async_zstd)

Modules§

array (requires crate feature array)
Defines BackedArray and the Container/ResizingContainer traits it uses.
directory (requires crate feature directory)
Defines DirectoryBackedArray.
entry
Defines BackedEntry, the core of this library.
examples
Example usage re-exports.
extra_docs
Additional description of the library.
test_utils (requires crate feature test)
Defines tools used ONLY for testing.
utils
Backbone traits and structs for the library implementation.

Macros§

cursor_vec (requires crate feature test)
Creates a default CursorVec for testing.