Getting started writing SPDX 3
(a.k.a. My First SPDX File)
This guide walks you through the concepts behind an SPDX document by writing one by hand. While it is possible to write all your SPDX documents by hand, we recommend looking at the various language bindings that are available for crafting more complex documents. Nevertheless, working through a hand-written example is instructive: it shows how SPDX documents work and clarifies the concepts at play, even when you use language bindings.
All of the fragments listed here are intended to form a complete, valid SPDX JSON document when concatenated together.
If you would like to construct the complete example from this Markdown file, use the following command:
cat getting-started.md | awk 'BEGIN{flag=0} /^```json/, $0=="```" { if (/^---$/){flag++} else if ($0 !~ /^```.*/ ) print $0 > "doc-" flag ".spdx.json"}'
Please note that all descriptions of properties, classes, etc. are non-normative; that is, they are intended to help you understand what is going on in simpler language, but are not necessarily complete. Links to the full official documentation are provided where possible.
The Preamble
All documents need to start somewhere, and SPDX documents are no exception.
The root of all SPDX documents will be a JSON object, so start with that:
{
Next, we need to identify that the document is an SPDX 3 JSON-LD document, which is done with:
"@context": "https://spdx.org/rdf/3.0.0/spdx-context.jsonld",
SPDX documents are designed to be a strict subset of JSON-LD, such that they can be parsed using either a full JSON-LD parser if you need the full power of linked documents or RDF, or a much simpler JSON parser if all you care about is extracting meaningful SPDX data from the document.
Because the document is valid JSON-LD, the @context must be provided to tell the JSON-LD parser how to expand the human-readable names in the document into full IRIs (don't worry if you don't know what that means, it's not really that important). You can think of this line as telling us "This is an SPDX document, and this provided URL tells us how to decode it". The SPDX JSON Schema will force you to put the correct value here when validating a document.
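As a rough illustration of the "much simpler JSON parser" route, the following Python sketch checks that a piece of text is a JSON object carrying the context URL used in this guide. The function name looks_like_spdx3 is hypothetical, and this is only a quick sanity check under those assumptions, not a substitute for validating against the SPDX JSON Schema.

```python
import json

# Assumption: this is the context URL used throughout this guide (SPDX 3.0.0).
SPDX_CONTEXT = "https://spdx.org/rdf/3.0.0/spdx-context.jsonld"


def looks_like_spdx3(text: str) -> bool:
    """Return True if text parses as a JSON object with the expected @context.

    Hypothetical helper for illustration; it does not validate the document.
    """
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        return False
    # The root of an SPDX JSON document is a JSON object, not a list or scalar.
    return isinstance(doc, dict) and doc.get("@context") == SPDX_CONTEXT
```

A plain JSON parser is enough here because the check never needs to expand names into IRIs.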
Now, we need to specify the list of objects that we want to create in this
document. JSON-LD has a special way of specifying this list using the @graph
property of the root object like so:
"@graph": [
Tell us about yourself
Our first SPDX object is going to be a Person that tells us who is writing this document (you!), so let's get started with it:
{
"type": "Person",
This is the basic format for any object in SPDX; all objects have one required property named type that tells us what this object actually is, so here we say this is a Person.
Next, we need to name our object:
"spdxId": "http://spdx.org/spdxdocs/Person/JoshuaWatt-141ec767-40f2-4aad-9658-ac2703f3a7d9",
Most objects can have some sort of "ID" property that gives them a name. In the case of Person, that property is called spdxId (inherited from Element). This property is the URI that should give this object a universally unique name. Although this property looks like an HTTP URL, it is in fact not. Technically speaking, a URL defines a Location, whereas a URI defines an Identifier (i.e. the name by which something is known). In all likelihood, a URI is not a resolvable location from which you can do an HTTP GET to retrieve data, but rather just a way of constructing a namespaced identifier. This identifier can be used within this document to refer to this object (more on that later), or it can be referenced from other documents to refer to this specific object (although in that case there needs to be additional information to describe how to find this document). URIs are considered to be universally unique, so any objects constructed with this URI are considered to be the same object, and any reference to this URI is a reference to this specific object we are creating.
If you work for a company, own a domain, etc., you are encouraged to use that (or some subdomain of it) in place of spdx.org/spdxdocs.
In practice, many spdxId values will have some sort of hash or random UUID-like string incorporated to make them unique.
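One way to build such identifiers is to append a random UUID to a namespace you control. The sketch below is only an illustration: the function name make_spdx_id and the example.com namespace are assumptions, and SPDX does not mandate this particular layout.

```python
import uuid


def make_spdx_id(namespace: str, kind: str) -> str:
    """Build a namespaced identifier such as <namespace>/<kind>-<random UUID>.

    Hypothetical helper: the layout mirrors the examples in this guide,
    but any universally unique URI would do.
    """
    # Avoid a doubled slash if the namespace already ends with one.
    namespace = namespace.rstrip("/")
    # uuid4() is random, so every call yields a different identifier.
    return f"{namespace}/{kind}-{uuid.uuid4()}"


# Example with a placeholder namespace; substitute a domain you own.
package_id = make_spdx_id("https://example.com/spdxdocs", "Package")
```

Because the identifier is random, generate it once per object and reuse that string everywhere the object is referenced.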
Moving on from this, we have:
"creationInfo": "_:creationinfo",
All SPDX objects derived from Element must specify how they were created by linking to a CreationInfo object. It is important to know the provenance of objects, that is, where they come from; but more on this later.
"name": "Joshua Watt",
The optional name property is inherited from the Element
class, and means "the common name for the thing", or in this case, your name.
As our last step, we want to indicate another way by which you are known to the world; specifically, your email address.
To do this we first need to use the (optional) externalIdentifier property which Person inherits from Element:
"externalIdentifier": [
This property is an array of ExternalIdentifier objects, so start by adding one to the array:
{
"type": "ExternalIdentifier",
Again, notice this uses the type property to identify what the object is. However, it should be noted that this is our first object that is not derived from Element, and therefore it does not need a spdxId property.
Next, let's add the relevant information about your email address:
"externalIdentifierType": "email",
"identifier": "JPEWhacker@gmail.com"
Two properties are used here. First, externalIdentifierType is used to indicate what type of external identifier this is. There are many choices, but in this case we are specifying your email address, so we choose the value email. The second property is the identifier property, which is the actual string identifier (in this case, your email address).
We are now done with our Person, so close it all out and prepare for the next object:
}
]
},
Where did all this stuff come from?
Our next object is going to be a CreationInfo object. It is required to provide one for every SPDX document, as all objects derived from Element must link to one in their creationInfo property to indicate where they came from.
Note that the CreationInfo describes where an SPDX Element itself came from (that is, who wrote the actual JSON). This is a distinct concept from describing where the thing an Element describes comes from, which is covered later.
Let's get started:
{
"type": "CreationInfo",
Hopefully this is making sense. We are saying this object is a CreationInfo.
"@id": "_:creationinfo",
This object also has an @id similar to the spdxId of our Person, but it is subtly different. First of all, this one is not a URI like that of our Person, but instead starts with _:. This type of identifier is known as a blank node. Blank nodes serve a similar purpose to the URI of the spdxId; however, they only have scope within this SPDX document. What this means is that it is impossible to reference this CreationInfo by name outside of this document. Inside the document, you can use this identifier to refer to this object. The string after the _: is arbitrary; you may choose any string that is unique within the document.
It should be noted that CreationInfo does not derive from the Element class (like our previous example of ExternalIdentifier), and as such the @id property is technically optional. However, since we will need to refer to this object at other places in the document, we must give it an identifier. This also means that this object does not have a mandatory creationInfo property (which makes sense, since it would be a circular reference). Finally, CreationInfo is only allowed to have a blank node identifier.
If you look back at the Person we just created, you'll notice
that its creationInfo property has the string value
that matches the @id
of this object; this is how objects are linked together
by reference in SPDX.
Next, we need to specify which version of the SPDX spec that elements linking to this CreationInfo are conforming to:
"specVersion": "3.0.0",
Now, we need to use the createdBy property to indicate who (or what) created the elements that are linked to this CreationInfo:
"createdBy": [
"http://spdx.org/spdxdocs/Person/JoshuaWatt-141ec767-40f2-4aad-9658-ac2703f3a7d9"
],
This property is a list of references to any class that derives from Agent. Since you are the person writing the document, put a single list item that is the spdxId of your Person element here to link them together. Note that even though this is using a full URI instead of a blank node, this is linking in the same way as creationInfo described above.
Also, it is worth noting that this does indeed create a circular reference between our Person.creationInfo property and CreationInfo.createdBy property. This is fine in SPDX, as objects are not required to form a Directed Acyclic Graph (DAG).
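To make linking by reference concrete, here is a minimal Python sketch that indexes the objects of a @graph by their identifier and then follows the two links discussed above. The build_index helper and the shortened example.com identifiers are illustrative assumptions, not part of any SPDX tooling.

```python
def build_index(graph):
    """Map each object's identifier (spdxId or @id) to the object itself."""
    index = {}
    for obj in graph:
        key = obj.get("spdxId") or obj.get("@id")
        if key is not None:
            index[key] = obj
    return index


# Shortened stand-ins for the Person and CreationInfo written in this guide.
graph = [
    {
        "type": "Person",
        "spdxId": "https://example.com/Person-1",
        "creationInfo": "_:creationinfo",
    },
    {
        "type": "CreationInfo",
        "@id": "_:creationinfo",
        "createdBy": ["https://example.com/Person-1"],
    },
]

index = build_index(graph)
person = index["https://example.com/Person-1"]
# A blank node reference and a full URI reference resolve the same way.
creation_info = index[person["creationInfo"]]
creator = index[creation_info["createdBy"][0]]
# Following both links leads back to the Person: the cycle is harmless
# because references are plain strings, not nested objects.
```

Since every link is just a string lookup, the circular reference never causes infinite nesting in the JSON itself.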
Finally, we need to specify the date that any objects linking to this CreationInfo were created using the created property and close out the object:
"created": "2024-03-06T00:00:00Z"
},
Use today's date and time in ISO 8601 format, following the pattern "%Y-%m-%dT%H:%M:%SZ". The timezone should always be UTC.
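If you would rather generate the timestamp than type it, the format string above can be passed directly to Python's strftime. This is a small sketch; the helper name spdx_timestamp is an assumption, and it expects a timezone-aware datetime so that the conversion to UTC is unambiguous.

```python
from datetime import datetime, timedelta, timezone

SPDX_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def spdx_timestamp(moment=None):
    """Format a timezone-aware datetime as a UTC timestamp for 'created'.

    Hypothetical helper; with no argument it uses the current time.
    """
    if moment is None:
        moment = datetime.now(timezone.utc)
    # Convert to UTC first so the trailing 'Z' is always truthful.
    return moment.astimezone(timezone.utc).strftime(SPDX_TIME_FORMAT)


# 02:00 at UTC+2 is midnight UTC on the same day.
local = datetime(2024, 3, 6, 2, 0, 0, tzinfo=timezone(timedelta(hours=2)))
example = spdx_timestamp(local)  # "2024-03-06T00:00:00Z"
```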
Describing the Document
SPDX requires that information about the document itself be provided. In order to do this, we must create a SpdxDocument object, so let's do that now:
{
"type": "SpdxDocument",
"spdxId": "https://spdx.org/spdxdocs/Document1-d078aed9-384d-4a64-87cb-99c79647c8c9",
"creationInfo": "_:creationinfo",
SpdxDocument derives from Element, so it has 3 required properties: type, spdxId and creationInfo. We've seen all of these properties before in Person, so hopefully this is getting more familiar. Note that we again link back to our previous CreationInfo object.
Next, we need to indicate which Profiles our document uses using the profileConformance property. This can be used by consumers of the document to quickly determine if the information they want is in the document (for example, if a user wants to find CVE data, but the security profile is not present, there is no reason to continue looking in this document).
"profileConformance": [
"core",
"software"
],
In this case, we are saying this document conforms to the core profile (all SPDX documents should include this), and the software profile, since we will be describing some software later.
The final property we need to define is rootElement. This property is a list of Element (or any subclass of Element) references. Add this now and close out our SpdxDocument:
"rootElement": [
"https://spdx.org/spdxdocs/BOM-e2e955f5-c50e-4a3a-8c69-db152f0f4615"
]
},
The purpose of this property is to indicate the "interesting" element(s) in the document. Since a document can contain a large number of elements, it might be difficult for a consumer of the document to know what the focus of the document is. This property clarifies that by suggesting which element(s) a user should look at to start navigating. While it is possible to have more than one root element, it is rare to need more than one.
Careful readers of the SpdxDocument documentation will note that we have omitted the element property (inherited from the ElementCollection parent class). Technically speaking, the SpdxDocument should link to all the elements that are in the document using this property. However, because this would be error-prone, it is implied that all Element objects present in the @graph (that is, all the objects we are writing) are implicitly added to the element property.
A Complete Document!
At this point, we have a completed SPDX document (albeit one that has an unresolved reference in SpdxDocument.rootElement). This is a fully valid document because it has the SPDX 3.0 preamble and the required SpdxDocument object, which in turn requires a valid CreationInfo, which we've provided. Finally, the CreationInfo requires an Agent to describe who or what created the Elements in the document, which we've provided by writing a Person object which describes you.
While this is the minimal example, it may feel long. However, as we continue in the document it should become more apparent how reuse of these 3 objects (particularly the CreationInfo) helps reduce total document size while still conveying precise information. In addition, there are other options to make a more compact document that are not covered yet, such as referring to an external Agent instead of encoding it in the document.
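The unresolved rootElement reference mentioned above is easy to detect mechanically. The following Python sketch lists every rootElement entry that does not match the spdxId of some object in the same @graph. The function name and the shortened identifiers are assumptions for illustration, and references to elements in other documents would need extra handling that is not shown.

```python
def unresolved_root_elements(graph):
    """Return rootElement references with no matching spdxId in the graph."""
    known = {obj["spdxId"] for obj in graph if "spdxId" in obj}
    missing = []
    for obj in graph:
        # rootElement is a list of references; objects without it are skipped.
        for ref in obj.get("rootElement", []):
            if ref not in known:
                missing.append(ref)
    return missing


# A stand-in for the document so far: the BOM it points at is not written yet.
graph = [
    {
        "type": "SpdxDocument",
        "spdxId": "https://example.com/Document-1",
        "rootElement": ["https://example.com/BOM-1"],
    },
]
before = unresolved_root_elements(graph)  # the BOM reference is missing

# Once the referenced object is added, nothing is left unresolved.
graph.append({"type": "software_Sbom", "spdxId": "https://example.com/BOM-1"})
after = unresolved_root_elements(graph)
```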
Let's Add Some Software!
Now that we have the basic valid document, it's time to start adding some interesting data to it. Let's start with a fictitious software package called amazing-widget, which we distribute as a tarball for users to download and run.
To start with, we need to define a software_Package object that defines how our software is distributed. In this case, the software_Package will be describing a tarball which someone can download, but it can be almost any unit of content that can be used to distribute software (either as binaries or source). See the documentation for more details.
Let's define our package:
{
"type": "software_Package",
"spdxId": "https://spdx.org/spdxdocs/Package-d1db6e61-aebe-4b13-ae73-d0f66018dbe0",
"creationInfo": "_:creationinfo",
This should be familiar by now. Note the reuse of our previous CreationInfo.
Also note that this is our first element that is outside of the Core profile in SPDX. In this specific case, the class is defined in the Software profile, and as such is prefixed with software_. Any classes and properties that are defined in a profile other than Core will be prefixed with the lower case profile name + _ to disambiguate them from classes and properties with the same name in other profiles.
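The prefix rule can be written down in a couple of lines. This Python sketch simply restates the rule as described in this guide; serialized_name is a hypothetical helper, and the official documentation remains the authority on how any given class or property is actually serialized.

```python
def serialized_name(profile: str, name: str) -> str:
    """Apply the prefix rule: Core names are bare, others get '<profile>_'."""
    if profile.lower() == "core":
        return name
    # Lower case profile name + "_" + the class or property name.
    return f"{profile.lower()}_{name}"


package_class = serialized_name("Software", "Package")    # "software_Package"
name_property = serialized_name("Core", "name")           # "name"
```

Note that the prefix follows the profile in which the class or property is defined, not the profile of the object that uses it.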
Again, we can use Element.name to give the common name for our package:
"name": "amazing-widget",
Importantly, even though this is a class defined in the Software
profile,
name is defined in core so it does not get prefixed. When
writing objects, pay attention to which profile the property is defined in,
as that sets the prefix (the documentation should make it clear what the
serialized name of a property is if you are unsure TODO: It does not yet).
Next, we will define the version of the amazing-widget package using software_packageVersion, and where the user can download this package using software_downloadLocation (both are optional):
"software_packageVersion": "1.0",
"software_downloadLocation": "http://dl.example.com/amazing-widget_1.0.0.tar",
These are our first two examples of properties not defined in the Core
profile, and as such they get the software_
prefix.
Now, we should define when this software was packaged using the (optional) builtTime property, so that downstream users can tell how old it is:
"builtTime": "2024-03-06T00:00:00Z",
Note that we are back in the Core profile properties here (specifically, builtTime is a property of Artifact in Core).
Next, we want to indicate who actually made the package we are describing. This is done using the (optional) originatedBy array property:
"originatedBy": [
"http://spdx.org/spdxdocs/Person/JoshuaWatt-141ec767-40f2-4aad-9658-ac2703f3a7d9"
],
In this example, you can put a single element that references your Person spdxId here to indicate that you actually made the package. Note that while we are using the same spdxId as we used in the CreationInfo, this is not required. originatedBy is the property used to describe who made the actual package being described by the software_Package, and not the JSON object itself.
Finally, we would like to inform consumers of our SPDX how they can validate the package to ensure its contents have not changed, or to check if a file that they have is the same one being described by this document. This is done using the verifiedUsing property, which is an array of IntegrityMethod objects (or subclasses).
"verifiedUsing": [
{
"type": "Hash",
"algorithm": "sha256",
"hashValue": "f3f60ce8615d1cfb3f6d7d149699ab53170ce0b8f24f841fb616faa50151082d"
}
]
},
Specifically, we are using the Hash subclass of IntegrityMethod to indicate that the SHA-256 checksum of the package file is f3f60ce8615d1cfb3f6d7d149699ab53170ce0b8f24f841fb616faa50151082d.
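The hashValue is the hexadecimal SHA-256 digest of the file's bytes, which Python's standard hashlib module can compute. The sketch below builds a Hash object in the shape used above; sha256_hash_object is a hypothetical helper, and the input here is in-memory bytes rather than the real tarball.

```python
import hashlib


def sha256_hash_object(data: bytes) -> dict:
    """Return a Hash object (as a dict) for the given bytes."""
    return {
        "type": "Hash",
        "algorithm": "sha256",
        # hexdigest() gives the lowercase hex string used in hashValue.
        "hashValue": hashlib.sha256(data).hexdigest(),
    }


# For a real package you would pass the file contents, e.g.
# sha256_hash_object(open(path, "rb").read()) for a file small enough
# to fit in memory.
example = sha256_hash_object(b"abc")
```

A consumer can run the same computation on the file they downloaded and compare the result with the hashValue in the document.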
What's in our Package?
Describing that we have a distributed package is a great start, but we are able to go further (although this is not mandatory!). Our next object is going to describe all the files contained in our software_Package by using software_File.
Let's get started with our first file, the program executable:
{
"type": "software_File",
"spdxId": "https://spdx.org/spdxdocs/File-8f79956e-4089-4166-9a71-457de77e4846",
"creationInfo": "_:creationinfo",
"name": "/usr/bin/amazing-widget",
"verifiedUsing": [
{
"type": "Hash",
"algorithm": "sha256",
"hashValue": "ee4f96ed470ea288be281407dacb380fd355886dbd52c8c684dfec3a90e78f45"
}
],
"builtTime": "2024-03-05T00:00:00Z",
"originatedBy": [
"http://spdx.org/spdxdocs/Person/JoshuaWatt-141ec767-40f2-4aad-9658-ac2703f3a7d9"
],
We've seen all this before, so hopefully it all makes sense.
While it's great to have a file, it's not easy to tell what purpose this file serves. We might be able to infer that it's an executable program from the name, but SPDX provides the ability for us to directly specify this using the (optional) software_primaryPurpose and software_additionalPurpose properties (derived from software_Artifact):
"software_primaryPurpose": "executable",
"software_additionalPurpose": [
"application"
],
A software_Artifact can have as many purposes as you want to describe, but there should always be a software_primaryPurpose property defined before any software_additionalPurpose values are added.
Finally, as one last bit of information, we'll say what the copyright text of the program is using the (optional) software_copyrightText property and close out our file:
"software_copyrightText": "Copyright 2024, Joshua Watt"
},
Let's add one more file for fun. This one will describe a config file for our program:
{
"type": "software_File",
"spdxId": "https://spdx.org/spdxdocs/File-77808a5c-7a1b-43d1-9fa9-410a309ca9f3",
"creationInfo": "_:creationinfo",
"name": "/etc/amazing-widget.cfg",
"verifiedUsing": [
{
"type": "Hash",
"algorithm": "sha256",
"hashValue": "89a2e80bc48c4dd10044c441af0fc6fdad5d31b2fa391cb2cf9c51dbf4200ed9"
}
],
"builtTime": "2024-03-05T00:00:00Z",
"originatedBy": [
"http://spdx.org/spdxdocs/Person/JoshuaWatt-141ec767-40f2-4aad-9658-ac2703f3a7d9"
],
"software_primaryPurpose": "configuration"
},
Linking things together with Relationships
Now we've described our software_Package, and two software_Files that should be contained in it, but we have one small problem: there is nothing that tells us that our files are actually contained by the package.
In order to do this, we must introduce the SPDX Relationship. Relationships are a very powerful concept in SPDX that allow linking Elements and describing how they are related.
Relationships themselves are also derived from SPDX Elements, so we need the required three properties to start a new one:
{
"type": "Relationship",
"spdxId": "https://spdx.org/spdxdocs/Relationship/contains-6b0b7ce4-a069-406d-9088-9e91f65b79f0",
"creationInfo": "_:creationinfo",
Next, we need to say what the relationship between our objects is going to be. We do this using the relationshipType property:
"relationshipType": "contains",
The full list of what a Relationship can describe is defined by the RelationshipType vocabulary (a fancy word for enumeration). There are a lot of possible options, and each one has a specific meaning and restrictions on what types it can relate, so read the documentation to find the specific one you need and how to use it. In our case, we are using contains, which is defined as "The from Element contains each to Element". Perfect.
Now, we need to describe what Elements are being connected. Relationships always have a directionality associated with them: you can think of them as an arrow pointing from their from property to their to properties. from is always required and must be a single object, whereas to is a list of zero or more objects. Let's write the JSON to express this:
"from": "https://spdx.org/spdxdocs/Package-d1db6e61-aebe-4b13-ae73-d0f66018dbe0",
"to": [
"https://spdx.org/spdxdocs/File-8f79956e-4089-4166-9a71-457de77e4846",
"https://spdx.org/spdxdocs/File-77808a5c-7a1b-43d1-9fa9-410a309ca9f3"
],
This is the minimum required to define a Relationship, but we want to add one more property to convey additional information and close out the object:
"completeness": "complete"
},
The completeness property is very useful, as it indicates whether or not this Relationship can be considered to describe everything there is to know about this type of relationship. For example, by stating that this relationship is complete, we are saying that our package contains those 2 files, and only those 2 files. We could have also stated that the relationship was incomplete, in which case we are stating that we know we didn't list all the files, and others are included. Alternatively, we could have stated that the relationship completeness was noAssertion, meaning we don't know if we captured all the files or not. If this property is omitted, it's assumed to be noAssertion.
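A consumer reading Relationships has to apply that default itself, since an omitted property simply does not appear in the JSON. The Python sketch below shows one way to do so; both function names are hypothetical, and the three values are the ones described in this section.

```python
def effective_completeness(relationship: dict) -> str:
    """Return the completeness value, treating an omitted one as noAssertion."""
    return relationship.get("completeness", "noAssertion")


def may_have_unlisted_targets(relationship: dict) -> bool:
    """True unless the relationship explicitly claims to be complete."""
    return effective_completeness(relationship) != "complete"


# Shortened stand-ins for the contains Relationship written above.
complete_rel = {"type": "Relationship", "relationshipType": "contains",
                "completeness": "complete"}
unstated_rel = {"type": "Relationship", "relationshipType": "contains"}
```

Only a relationship marked complete lets a consumer conclude that the listed files are the whole contents of the package.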
Wrapping it all up in a BOM
We've made great progress, and we are almost done. For our final step, we want to wrap up everything we know about the package into a "Software Bill of Materials".
This is done by creating a software_Sbom object:
{
"type": "software_Sbom",
"spdxId": "https://spdx.org/spdxdocs/BOM-e2e955f5-c50e-4a3a-8c69-db152f0f4615",
"creationInfo": "_:creationinfo",
Note that this is the object referenced by the rootElement of our SpdxDocument, since it is the primary subject of our entire document.
software_Sbom derives from ElementCollection just like SpdxDocument, so it has the same rootElement property. In this case, it is the subject of the SBOM, which is our software_Package:
"rootElement": [
"https://spdx.org/spdxdocs/Package-d1db6e61-aebe-4b13-ae73-d0f66018dbe0"
],
Unlike SpdxDocument, however, there is no implicit value for the element property. Instead, we need to list all the elements that are part of this SBOM (think of this as the line items in the SBOM). In our specific case, this is the software_Files that are part of our package, but if you had any other elements related to the package (e.g. licenses, security information, etc.) those would also be included:
"element": [
"https://spdx.org/spdxdocs/File-8f79956e-4089-4166-9a71-457de77e4846",
"https://spdx.org/spdxdocs/File-77808a5c-7a1b-43d1-9fa9-410a309ca9f3"
],
Finally, we need to specify what type(s) of BOM this is using the software_sbomType property:
"software_sbomType": [
"build"
]
}
This property is effectively indicating at what point in the software lifecycle
this SBOM was generated. Since we are describing an executable program, build
seems the most likely.
Closing it all up
Now that we are all done, we have a few things to clean up, namely that we need to close the @graph list and the root object, so let's do that now:
]
}
Congratulations! You just wrote your first SPDX document! Hopefully this walkthrough has been instructive and you are ready to get started with SPDX!