Inside the OpenSanctions database, information from different sources is stored in a statement-based data model which lets us trace the origin and temporal range of each attribute of any tracked entity.
The OpenSanctions database is designed to meet the following design objectives:
In order to meet these goals, the system uses a statement-based database design. To illustrate this, think of a claim like this one: the US sanctions list, as of the most recent update, claims that entity
ofac-12345 has the property
name set to the value
All information in the system is stored using this way, including links between different entites. When exports are generated, the system will do so by grouping all the statements that pertain to a particular entity and combining them into the desired format. The forms of the data we publish, including the JSON format returned by the API, are simplified/aggregated for easier use. If you're trying to use the data in a way that does not require granular provenance, use these more common export formats.
As a database schema, this results in a table with the following columns:
entity_id (source ID): the entity identifier as derived from the data source, e.g.
ofac-nnnn for entries on the US OFAC sanctions lists.
prop (property): the entity attribute that this statement relates to, e.g.
name (see data dictionary).
prop_type (property type): the data type of the given property, e.g.
value: the actual value of the property for the entity. If multiple values are indicated in the source data, each of them will result in a separate statement.
lang: the language (3 letter code) of the value, if it is known.
original_value: the property value before it was cleaned (e.g. country name vs. code, unparsed date).
dataset (source dataset): the source dataset identifier (same as the dataset URL slugs).
schema (schema): the type of the given entity. Specific statements can indicate more or less specific schemata, e.g.
Company (the resulting entity would be a
Company) - see data dictionary.
first_seen (first seen): the first date when the processing pipeline found this value linked to the given entity. Please note that this only records values after July 2021, when we started tracking the data - more realistic evidence of when an entity was added to the given sanctions list can be found in the
last_seen (last seen): the latest date when the processing pipeline found this value.
target (is target): a boolean to indicate if the entity to which this statement belong is a target - a designated sanctioned entity - in the data source. Some entities, like addresses, or family members, are included but not targets in their own right.
canonical_id (deduplicated ID): the entity identifier as resulting from our de-duplication process. If no de-duplication has been performed, this is the same as
entity_id. If the entity has been merged with others inside the same database, it will receive an ID starting with
NK- (short for nomenklatura). If the entity has been de-referenced against Wikidata, the Wikidata item ID (like
Q7747) will override any
On the web site: If you'd like to play with how this looks in practice, you can use the raw data explorer to browse and filter the statement data. You can add extra query parameters like
prop, etc. in your browser's URL bar.
Bulk data access: The statements table is exported to CSV format every night as part of the regular data pipeline runs. You can fetch the data here:
This file is subject to non-commercial licensing. You can import this CSV into a local install of the OpenSanctions codebase using the
opensanctions import-statements [file] command.