10/24/2022

Introducing Archivist

Author: Mikhail Swift

Archivista - your attestation librarian

The last two or three years have seen an explosion of exciting technology and ideas around software supply chain security. It's hard to keep track of everything going on in the space, almost as hard as it is to keep track of all of the attestations that tools like Witness, in-toto, Tekton Chains, and Syft can generate.

That's why we're excited to announce the public release of our new open source project, Archivist. Archivist is a graph database and data store for in-toto attestations. Our goal with Archivist is to make it easier to discover and use the attestations generated as part of your software supply chain, so you can make more informed decisions about the software you use.

Motivation

One of the big problems we've encountered while evaluating Witness policies against a software artifact is discovering a set of attestations that satisfies the policy. It's easy enough to find the attestations that describe the actual creation of the artifact, whether that is a compilation or an image build process, but what about everything that happened before the artifact existed?

Attestations describing the provenance of the commit the artifact was created from, test results, or security scans aren't easily discoverable, because the artifact may not have existed when these attestations were created. However, the attestations we can find may give us clues for discovering additional relevant attestations.

For example, if we have a binary, we can find the attestation describing the compilation of that binary, which may contain the git commit that the binary was created from. Then, using that commit ID, we can search again and find all of the testing and scanning attestations that were created about the same commit!

Figure: attestation graph describing four attestations connected by commit ID, committer ID, and GitLab project ID

This process of progressively using clues from data we know about to discover more data looks more and more like a graph traversal problem the harder you look at it. That's why we chose to expose a GraphQL API to interact with Archivist.
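
To make the traversal idea concrete, here is a minimal in-memory sketch of that discovery process as a breadth-first search. It is an illustration only, not Archivist's implementation or its GraphQL API: the `Attestation` class, the `discover` function, and the sample identifiers are all hypothetical, and each attestation is reduced to a name plus the set of identifiers it mentions.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    """Hypothetical, simplified attestation: a name and the identifiers it mentions."""
    name: str
    subjects: frozenset


def discover(attestations, start_subject):
    """Breadth-first search from one identifier to every reachable attestation."""
    # Index attestations by each identifier they mention.
    by_subject = {}
    for att in attestations:
        for subject in att.subjects:
            by_subject.setdefault(subject, []).append(att)

    found = []  # attestation names, in discovery order
    seen_attestations = set()
    seen_subjects = {start_subject}
    queue = deque([start_subject])
    while queue:
        subject = queue.popleft()
        for att in by_subject.get(subject, []):
            if att.name in seen_attestations:
                continue
            seen_attestations.add(att.name)
            found.append(att.name)
            # Every identifier in a newly found attestation is a new clue.
            for clue in sorted(att.subjects):
                if clue not in seen_subjects:
                    seen_subjects.add(clue)
                    queue.append(clue)
    return found


# Made-up sample data: the digests and commit IDs are placeholders.
ATTESTATIONS = [
    Attestation("build", frozenset({"sha256:binary", "commit:abc123"})),
    Attestation("test", frozenset({"commit:abc123"})),
    Attestation("scan", frozenset({"commit:abc123"})),
    Attestation("unrelated-build", frozenset({"sha256:other", "commit:def456"})),
]

if __name__ == "__main__":
    # Start from the binary's digest and follow the commit ID it leads to.
    print(discover(ATTESTATIONS, "sha256:binary"))
```

Starting from the binary's digest, the search reaches the build attestation first and then, through the shared commit ID, the test and scan attestations. A practical traversal would likely restrict which identifier types it follows or bound its depth, since broad identifiers such as a committer ID can connect a very large number of attestations.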

Archivist's GraphQL API helps with the kinds of queries described above, and it can also help with incident response and with discovering what software is affected by newly discovered vulnerabilities. Using Witness' tracing capabilities and environmental attestors, Archivist can index attestations and enable queries that find all artifacts built with specified files or on specified build hardware.
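
As a rough illustration of the incident-response case, the sketch below runs the lookup in the other direction: given a file digest or a build host, it lists the artifacts whose builds involved it. The `BuildRecord` class, its fields, and `affected_products` are hypothetical names chosen for this example; they are not Archivist's schema or Witness' attestation format, and the data is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildRecord:
    """Hypothetical summary of the attestations collected for one build."""
    product: str          # digest of the artifact the build produced
    materials: frozenset  # digests of files the build was observed to use
    hostname: str         # build machine, as an environment attestor might record it


def affected_products(records, *, file_digest=None, hostname=None):
    """Return products whose build used the given file or ran on the given host."""
    hits = set()
    for rec in records:
        used_file = file_digest is not None and file_digest in rec.materials
        on_host = hostname is not None and rec.hostname == hostname
        if used_file or on_host:
            hits.add(rec.product)
    return sorted(hits)


# Made-up sample data: digests and hostnames are placeholders.
RECORDS = [
    BuildRecord("sha256:app-a", frozenset({"sha256:libfoo", "sha256:main"}), "builder-1"),
    BuildRecord("sha256:app-b", frozenset({"sha256:libbar"}), "builder-2"),
    BuildRecord("sha256:app-c", frozenset({"sha256:libfoo"}), "builder-2"),
]

if __name__ == "__main__":
    # Which artifacts were built with the (hypothetically) vulnerable libfoo?
    print(affected_products(RECORDS, file_digest="sha256:libfoo"))
    # Which artifacts were built on a (hypothetically) compromised host?
    print(affected_products(RECORDS, hostname="builder-2"))
```

A linear scan is enough to show the shape of the question; an indexed store is what makes the same question practical across many builds.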

What's Next?

Archivist is available on GitHub, and a public instance is up for testing and development purposes. We want to expand the types of data Archivist can index and query, and we could use help doing so! We'd love feedback and to hear what you'd like to see from Archivist.