Using the bulk data

The datasets published by OpenSanctions are made available in multiple formats, suitable for different purposes.

Please also refer to the entity structure definition and the data dictionary. Advanced users may want to learn about the statement-based data model.

Data formats

Bulk data is made available in the following formats for each data source and collection we maintain:

If you would like to discuss a custom export format for OpenSanctions that is more suitable to your needs, please get in touch.

Updates and metadata

OpenSanctions generates bulk data exports at regular intervals (every 6-8 hours). The timestamp of the last export, along with detailed metadata about the included datasets, can be found in the metadata index:

https://data.opensanctions.org/datasets/latest/index.json

Polling this file at regular intervals (e.g. every 30 minutes) is the recommended way to find out whether updated data has been released.
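The polling approach described above could be sketched as follows. This is a minimal illustration, not an official client: the helper names (fetch_index, is_new_release, poll) are hypothetical, and it assumes that run_time can be treated as an opaque value that changes whenever a new export is published.

```python
import json
import time
import urllib.request

INDEX_URL = "https://data.opensanctions.org/datasets/latest/index.json"


def fetch_index(url=INDEX_URL):
    """Download and parse the metadata index."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def is_new_release(index, last_seen_run_time):
    """Return True if run_time differs from the value processed last.

    Assumption: run_time is compared as an opaque value, so this
    sketch does not depend on the exact timestamp format.
    """
    return index.get("run_time") != last_seen_run_time


def poll(on_update, interval_seconds=30 * 60, fetch=fetch_index,
         max_polls=None, sleep=time.sleep):
    """Fetch the index repeatedly and call on_update(index) on changes.

    The first poll always counts as a change, because nothing has
    been seen yet. max_polls=None means "run until interrupted".
    """
    last_seen = None
    polls = 0
    while max_polls is None or polls < max_polls:
        index = fetch()
        if is_new_release(index, last_seen):
            last_seen = index.get("run_time")
            on_update(index)
        polls += 1
        # Skip the final sleep when a poll limit has been reached.
        if max_polls is None or polls < max_polls:
            sleep(interval_seconds)
```

Calling poll with a callback that downloads the export files you need would check the index every 30 minutes by default. A production job would likely also persist the last seen run_time between restarts and handle network errors.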

The metadata index contains the following:

  • run_time: The timestamp when the index was last updated.
  • datasets: An index of available datasets. Each item contains
    • Basic descriptive information - name, title, summary, description
    • version: The export ID of the latest available build of the dataset.
    • last_export: Timestamp when the dataset was last exported.
    • last_change: Timestamp when some entity in the dataset last changed.
    • Statistical information
    • An array of resources, one for each export format available for the dataset (see above).
      • name: Use the resource name to select the entry of the resource format appropriate for your needs.
    • delta_url: An index of dataset versions that can be used to perform incremental dataset updates.
    • Some datasets are our representations of data sources (e.g. US OFAC SDN), while others are collections (see glossary).
    • You can fetch each dataset's metadata individually at URLs using the dataset name, e.g. https://data.opensanctions.org/datasets/latest/sanctions/index.json for the sanctions collection.
  • model: The data format definition. See the documentation for the entity data format.
    • schemata: The types that an entity may have.
    • types: The data types that the properties of entities may have.
  • schemata: The actual set of Schemata used in the data - this is a subset of the full data model.
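As a rough sketch of how the index structure above might be navigated, the helpers below look up a dataset by name and then pick one of its resources by name. The function names are hypothetical. The sketch also assumes that datasets is a list of objects and that the array of resources is stored under a key called resources; neither detail is stated explicitly above, so check them against the actual index.

```python
BASE_URL = "https://data.opensanctions.org/datasets/latest"


def dataset_index_url(dataset_name):
    """Build the URL of a single dataset's metadata file."""
    return f"{BASE_URL}/{dataset_name}/index.json"


def find_dataset(index, dataset_name):
    """Return the dataset entry with the given name, or None.

    Assumption: index["datasets"] is a list of dataset objects.
    """
    for dataset in index.get("datasets", []):
        if dataset.get("name") == dataset_name:
            return dataset
    return None


def find_resource(dataset, resource_name):
    """Return the resource (export format) with the given name, or None.

    Assumption: the resources array is stored under the key "resources".
    """
    for resource in dataset.get("resources", []):
        if resource.get("name") == resource_name:
            return resource
    return None


def needs_refresh(dataset, local_version):
    """Return True if the published version differs from the local copy."""
    return dataset.get("version") != local_version
```

For instance, dataset_index_url("sanctions") yields the per-dataset URL mentioned above for the sanctions collection. Comparing the version field against the version of your local copy is one way to decide whether a download is needed at all.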