Data Asset Schema
The standard for describing a Data Asset
Templates for Data Assets in ixo Documents use a structured data format based on open schemas (mainly schema.org) to describe any type of Data Asset existing within the Internet of Impact.
Types of data assets
A structured object, such as a Verifiable Claim, with a data model that can be processed using a specific tool or algorithm
An algorithm for processing or transforming data
A table or a CSV file with some data
An organised collection of tables
A search query
A collection of files which are related in a way that provides a meaningful dataset
Images capturing data
Files relating to machine learning, such as trained parameters or neural network structure definitions
Anything else that looks like a data asset!
The standard data model (schema) for data assets
The ixo standard for data assets is compatible with the Web 2.0 guidelines that dataset providers use to describe their data, so that search engines such as Google can better understand the content of pages. Data assets are easier to find and understand when they are described with metadata such as name, description, creator, and format.
The schema describing Data Assets within ixo Documents implements the schema.org Dataset structure.
Dataset example
For example, if the Data Asset is a Dataset, we use the schema.org/Dataset definition of Dataset, as described in the following tables. These include information about the publication of the dataset, such as the license, when it was published, and an identifier (DOI) or sameAs link pointing to a canonical version of this Dataset object in a different repository.
Add identifier, license, and sameAs for Datasets that provide provenance and license information.
Required properties | |
---|---|
description | A short summary describing a dataset. |
name | A descriptive name of a dataset. For example, "Snow depth in Northern Hemisphere". |
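As a minimal sketch, the two required properties (the schema.org name and description) could be assembled and checked in Python before a template is published. The helper missing_required and the description text are assumptions made for illustration; they are not part of the ixo standard.

```python
import json

# The two properties listed as required for a Dataset.
REQUIRED_PROPERTIES = ("description", "name")


def missing_required(dataset: dict) -> list:
    """Return the required properties that are absent or empty (hypothetical helper)."""
    return [key for key in REQUIRED_PROPERTIES if not dataset.get(key)]


# Illustrative JSON-LD document; the description is a placeholder.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Snow depth in Northern Hemisphere",
    "description": "Placeholder summary of what the dataset contains.",
}

if __name__ == "__main__":
    print(missing_required(dataset))      # properties still to be filled in
    print(json.dumps(dataset, indent=2))  # the JSON-LD to embed in a template
```

Checking for empty values as well as missing keys is a design choice in this sketch: an empty name or description is treated the same as an absent one.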
Recommended properties | |
---|---|
alternateName | Alternative names that have been used to refer to this dataset, such as aliases or abbreviations. |
citation | Identifies academic articles that the data provider recommends be cited in addition to the dataset itself. Provide the citation for the dataset itself with other properties. |
hasPart | If the dataset is a collection of smaller datasets, use the hasPart property. |
identifier | An identifier, such as a DOI or a Compact Identifier. If the dataset has more than one identifier, repeat the identifier property. |
keywords | Keywords summarizing the dataset. |
license | A license under which the dataset is distributed. |
sameAs | URL of a reference web page that unambiguously indicates the dataset's identity, usually in a different repository. |
spatialCoverage | Describes the spatial aspect of the dataset. Only include this property if the dataset has a spatial dimension. For example, a single point where all the measurements were collected, or the coordinates of a bounding box for an area. Use GeoShape to describe areas of different shapes, for example to specify a bounding box. The value can be a point, a shape, or a named location. |
temporalCoverage | The time interval that the data in the dataset covers. Only include this property if the dataset has a temporal dimension. Schema.org uses the ISO 8601 standard to describe time intervals and time points. Depending on the dataset interval, the value can be a single date, a time period, or an open-ended time period. Indicate open-ended intervals with two decimal points (..). |
variableMeasured | The variable that this dataset measures. For example, temperature or pressure. |
url | Location of a page describing the dataset. |
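The following sketch shows how several of the recommended properties might look together, including the three temporal forms (single date, time period, open-ended period) and a GeoShape bounding box. The helper temporal_coverage and every URL, identifier, and coordinate below are placeholders invented for illustration, not values defined by ixo.

```python
import json
from typing import Optional


def temporal_coverage(start: str, end: Optional[str] = None, open_ended: bool = False) -> str:
    """Format an ISO 8601 date or interval (hypothetical helper).

    - single date:        temporal_coverage("2008")
    - time period:        temporal_coverage("1950-01-01", "2013-12-18")
    - open-ended period:  temporal_coverage("2015-11", open_ended=True)
    """
    if open_ended:
        return f"{start}/.."  # two decimal points mark the open end
    if end is None:
        return start
    return f"{start}/{end}"


# Illustrative Dataset; all values are placeholders.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Snow depth in Northern Hemisphere",
    "description": "Placeholder summary of what the dataset contains.",
    "alternateName": ["Northern Hemisphere snow depth"],
    "keywords": ["snow depth", "climate"],
    "identifier": "https://doi.org/10.0000/placeholder",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "sameAs": "https://example.org/datasets/snow-depth",
    "url": "https://example.org/catalog/snow-depth",
    "temporalCoverage": temporal_coverage("1950-01-01", "2013-12-18"),
    # Bounding box: two corner points, each written as "latitude longitude".
    "spatialCoverage": {
        "@type": "Place",
        "geo": {"@type": "GeoShape", "box": "39.3280 120.1633 40.445 123.7878"},
    },
    "variableMeasured": "snow depth",
}

if __name__ == "__main__":
    print(json.dumps(dataset, indent=2))
```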
DataCatalog
The full definition of DataCatalog is available at schema.org/DataCatalog.
Datasets are often published in repositories that contain many other datasets. The same dataset can be included in more than one such repository. You can refer to a data catalog that this dataset belongs to by referencing it directly.
Recommended properties | |
---|---|
includedInDataCatalog | The catalog to which the dataset belongs. |
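A catalog reference can be sketched as a nested DataCatalog object on the Dataset, using the schema.org includedInDataCatalog property. The catalog name and the helper catalog_name are assumptions made for illustration.

```python
import json
from typing import Optional

# Illustrative Dataset that points at the catalog it belongs to.
# The catalog name is a placeholder, not a real ixo catalog.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Snow depth in Northern Hemisphere",
    "description": "Placeholder summary of what the dataset contains.",
    "includedInDataCatalog": {
        "@type": "DataCatalog",
        "name": "Example data catalog",
    },
}


def catalog_name(ds: dict) -> Optional[str]:
    """Return the name of the referenced catalog, if any (hypothetical helper)."""
    catalog = ds.get("includedInDataCatalog")
    return catalog.get("name") if isinstance(catalog, dict) else None


if __name__ == "__main__":
    print(catalog_name(dataset))
    print(json.dumps(dataset, indent=2))
```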
DataDownload
The full definition of DataDownload is available at schema.org/DataDownload. In addition to Dataset properties, add the following properties for datasets that provide download options.
The distribution property describes how to get the dataset itself, because the URL often points to a landing page describing the dataset rather than to the data. It states where to get the data and in what format. This property can have several values: for instance, a CSV version may be available at one URL and an Excel version at another.
Required properties | |
---|---|
distribution.contentUrl | The link for the download. |
Recommended properties | |
---|---|
distribution | The description of the location for download of the dataset and the file format for download. |
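As a sketch of a distribution with several values, the example below lists a CSV and an Excel download as two DataDownload objects, each with a contentUrl and an encodingFormat. The URLs and the helper download_links are placeholders invented for illustration.

```python
import json

# Illustrative Dataset offering the same data in two formats.
# Both download URLs are placeholders.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Snow depth in Northern Hemisphere",
    "description": "Placeholder summary of what the dataset contains.",
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": "https://example.org/data/snow-depth.csv",
        },
        {
            "@type": "DataDownload",
            "encodingFormat": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
            "contentUrl": "https://example.org/data/snow-depth.xlsx",
        },
    ],
}


def download_links(ds: dict) -> dict:
    """Map each encoding format to its download link (hypothetical helper).

    Raises ValueError if a distribution entry lacks the required contentUrl.
    """
    links = {}
    for entry in ds.get("distribution", []):
        if not entry.get("contentUrl"):
            raise ValueError("distribution entry is missing contentUrl")
        links[entry.get("encodingFormat", "unknown")] = entry["contentUrl"]
    return links


if __name__ == "__main__":
    print(json.dumps(download_links(dataset), indent=2))
```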
Tabular datasets
A tabular dataset is one organised primarily in terms of a grid of rows and columns. For pages that embed tabular datasets, you can also create more explicit markup, building on the basic approach described above.
Attribution and further resources
The structured data model for ixo data assets builds on schema.org and Google Developer guidelines. To build and test Data Asset templates, a great resource is Google's Structured Data Markup Helper.