The standard for describing a Data Asset
Templates for Data Assets in ixo Documents use a structured data format based on open schemas (mainly schema.org) to describe any type of Data Asset existing within the Internet of Impact.
Examples of what can qualify as a Data Asset:
- A structured object, such as a Verifiable Claim, with a data model that can be processed using a specific tool or algorithm
- An algorithm for processing or transforming data
- A table or a CSV file with some data
- An organised collection of tables
- A search query
- A collection of files which are related in a way that provides a meaningful dataset
- Images capturing data
- Files relating to machine learning, such as trained parameters or neural network structure definitions
- Anything else that looks like a data asset!
The ixo standard for data assets is compatible with the guidelines that Web 2.0 dataset providers follow to describe their data, which help search engines such as Google better understand the content of pages. Data assets are easier to find and understand when they are described with metadata such as name, description, creator, and format.
The schema describing Data Assets within ixo Documents implements the schema.org Dataset structure.
For example, if the Data Asset is a Dataset, we would use the schema.org/Dataset definition of Dataset, as described in the following table. This includes information about the publication of the dataset, such as the license, when it was published, and an identifier (DOI) or sameAs pointing to a canonical version of this Dataset object in a different repository.
Add identifier, license, and sameAs for Datasets that provide provenance and license information.
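As a minimal sketch of how these properties fit together, the following Python snippet builds a Dataset description and serialises it as JSON-LD. Every value (the name, the DOI, and the URLs) is a hypothetical placeholder, not a real ixo record.

```python
import json

# Hypothetical Data Asset description; every value below is a placeholder.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example household water-access survey",
    "description": (
        "Placeholder summary of an example dataset, long enough to satisfy "
        "the 50-character minimum for the description property."
    ),
    # Provenance and license information
    "identifier": "https://doi.org/10.1234/example",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "sameAs": "https://example.org/datasets/water-access",
}

# Serialise to a JSON-LD string that could be embedded in a document.
dataset_jsonld = json.dumps(dataset, indent=2)
print(dataset_jsonld)
```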
DataCatalog
The full definition of DataCatalog is available at schema.org/DataCatalog.
Datasets are often published in repositories that contain many other datasets. The same dataset can be included in more than one such repository. You can refer to a data catalog that this dataset belongs to by referencing it directly.
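A brief sketch, assuming a hypothetical catalog name and URL, of how a Dataset could reference the catalog it belongs to through the includedInDataCatalog property:

```python
import json

# Hypothetical dataset that points at the catalog it is published in.
dataset_in_catalog = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset",
    "includedInDataCatalog": {
        "@type": "DataCatalog",
        "name": "Example Data Catalog",        # placeholder name
        "url": "https://example.org/catalog",  # placeholder URL
    },
}

print(json.dumps(dataset_in_catalog, indent=2))
```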
DataDownload
The full definition of DataDownload is available at schema.org/DataDownload. In addition to Dataset properties, add the following properties for datasets that provide download options.
The distribution property describes how to get the dataset itself, because the URL often points to a landing page describing the dataset. It describes where to get the data and in what format. This property can have several values: for instance, a CSV version may be available at one URL and an Excel version at another.
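The CSV and Excel case could be sketched as follows. The helper name make_download and both URLs are assumptions made for illustration. The encodingFormat values are the standard MIME types for the two file formats.

```python
import json

def make_download(content_url: str, encoding_format: str) -> dict:
    """Hypothetical helper that builds one DataDownload entry."""
    return {
        "@type": "DataDownload",
        "contentUrl": content_url,
        "encodingFormat": encoding_format,
    }

# One dataset offered in two formats, each at its own (placeholder) URL.
distribution = [
    make_download("https://example.org/data/survey.csv", "text/csv"),
    make_download(
        "https://example.org/data/survey.xlsx",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ),
]

print(json.dumps({"distribution": distribution}, indent=2))
```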
A tabular dataset is one organised primarily in terms of a grid of rows and columns. For pages that embed tabular datasets, you can also create more explicit markup, building on the basic approach described above.
The structured data model for ixo data assets builds on schema.org and Google Developer guidelines. To build and test Data Asset templates, a great resource is Google's Structured Data Markup Helper.
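Type | Required properties | Recommended properties
---|---|---
Dataset | description, name | alternateName, creator, citation, hasPart or isPartOf, identifier, keywords, license, sameAs, spatialCoverage, temporalCoverage, variableMeasured, version, url
DataCatalog | | includedInDataCatalog
DataDownload | distribution.contentUrl | distribution, distribution.encodingFormat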
description
A short summary describing a dataset.
Guidelines
The summary must be between 50 and 5000 characters long.
The summary may include Markdown syntax. Embedded images need to use absolute path URLs (instead of relative paths).
When using the JSON-LD format, denote new lines with \n (two characters: backslash and lower case letter "n").
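The length guideline and the newline rule can be checked with a short sketch. The function name check_description and the sample text are hypothetical. Python's json.dumps escapes a line break as the two characters backslash and "n", as the guideline requires.

```python
import json

MIN_LEN, MAX_LEN = 50, 5000  # limits stated in the guidelines above

def check_description(text: str) -> bool:
    """Return True if the summary length is within the allowed range."""
    return MIN_LEN <= len(text) <= MAX_LEN

# Placeholder summary containing one real line break.
description = (
    "First paragraph of a placeholder dataset summary.\n"
    "Second paragraph with more detail."
)

# json.dumps writes the line break as the two-character sequence \n.
encoded = json.dumps({"description": description})
print(check_description(description))
print(encoded)
```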
name
A descriptive name of a dataset. For example, "Snow depth in Northern Hemisphere".
alternateName
Alternative names that have been used to refer to this dataset, such as aliases or abbreviations. Example (in JSON-LD format):
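A minimal sketch in Python, assuming hypothetical alias values, of how alternateName could be populated with several names and serialised to JSON-LD:

```python
import json

# The aliases are invented for illustration only.
naming_fragment = {
    "name": "Snow depth in Northern Hemisphere",
    "alternateName": ["Northern Hemisphere snow depth", "NH snow depth"],
}

print(json.dumps(naming_fragment, indent=2))
```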
creator
The creator or author of the dataset.
citation
Text or CreativeWork
Identifies academic articles that the data provider recommends be cited in addition to the dataset itself. Provide the citation for the dataset itself with other properties, such as name, identifier, creator, and publisher. For example, this property can uniquely identify a related academic publication such as a data descriptor, a data paper, or an article for which this dataset is supplementary material. Examples (in JSON-LD format):
Additional guidelines
Don’t use this property to provide citation information for the dataset itself. It is intended to identify related academic articles, not the dataset itself. To provide the information necessary to cite the dataset itself, use the name, identifier, creator, and publisher properties instead.
When populating the citation property with a citation snippet, provide the article identifier (such as a DOI) whenever possible.
Recommended: "Doe J (2014) Influence of X. Biomics 1(1). https://doi.org/10.1111/111"
Not recommended: "Doe J (2014) Influence of X. Biomics 1(1)."
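The difference between the two snippets can be expressed as a simple check. This is a sketch only: has_doi_link is a hypothetical helper, and it recognises only identifiers written as https://doi.org/ URLs.

```python
import json

# Citation snippets taken from the guideline above.
recommended = "Doe J (2014) Influence of X. Biomics 1(1). https://doi.org/10.1111/111"
not_recommended = "Doe J (2014) Influence of X. Biomics 1(1)."

def has_doi_link(citation: str) -> bool:
    """Hypothetical helper: True if the snippet carries a DOI URL."""
    return "https://doi.org/" in citation

# Only the snippet with an identifier is used for the citation property.
citation_fragment = {"citation": recommended}
print(json.dumps(citation_fragment, indent=2))
```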
hasPart or isPartOf
If the dataset is a collection of smaller datasets, use the hasPart property to denote that relationship. Conversely, if the dataset is part of a larger dataset, use isPartOf. Both properties can take the form of a URL or a Dataset instance. If a Dataset is used as the value, it has to include all of the properties required for a standalone Dataset. Examples:
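Both directions of the relationship could be sketched as follows, assuming placeholder names, descriptions, and URLs. The hypothetical helper part fills in name and description, which nested Dataset values need in order to stand alone.

```python
import json

def part(name: str, description: str) -> dict:
    """Build a nested Dataset with the properties required of a standalone one."""
    return {"@type": "Dataset", "name": name, "description": description}

# A collection that lists its smaller datasets as nested Dataset instances.
collection = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example collection of regional surveys",
    "description": "Placeholder description of a dataset that groups several smaller datasets.",
    "hasPart": [
        part("Region A survey", "Placeholder description of the first sub-dataset in the collection."),
        part("Region B survey", "Placeholder description of the second sub-dataset in the collection."),
    ],
}

# A member dataset that points back to its parent by URL.
member = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Region A survey",
    "description": "Placeholder description of the first sub-dataset in the collection.",
    "isPartOf": "https://example.org/datasets/regional-surveys",
}

print(json.dumps(collection, indent=2))
print(json.dumps(member, indent=2))
```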
identifier
URL, Text, or PropertyValue
An identifier, such as a DOI or a Compact Identifier. If the dataset has more than one identifier, repeat the identifier property. If using JSON-LD, this is represented using JSON list syntax.
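A short sketch of the list form, assuming two made-up identifiers: a DOI expressed as a URL and an internal identifier expressed as a PropertyValue.

```python
import json

# Both identifiers are placeholders invented for this example.
identifier_fragment = {
    "identifier": [
        "https://doi.org/10.1234/example",
        {"@type": "PropertyValue", "propertyID": "internal-id", "value": "asset-0001"},
    ]
}

# A repeated property becomes a JSON list when serialised.
print(json.dumps(identifier_fragment, indent=2))
```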
keywords
Keywords summarizing the dataset.
license
A license under which the dataset is distributed. For example:
Additional guidelines
Provide a URL that unambiguously identifies a specific version of the license used.
Recommended
Not recommended
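One way to illustrate the versioning guideline is a heuristic check on the URL path. The helper looks_versioned and its rule (the path ends in a version number) are assumptions made for this sketch, not part of the standard. The two Creative Commons URLs are illustrative, and licenses that do not carry a version in their URL would need a different check.

```python
import re

def looks_versioned(license_url: str) -> bool:
    """Heuristic (an assumption): the URL path ends in a version number such as 4.0."""
    return re.search(r"/\d+(\.\d+)*/?$", license_url) is not None

versioned_url = "https://creativecommons.org/licenses/by/4.0/"
unversioned_url = "https://creativecommons.org/licenses/by/"

# Use the URL that pins a specific version of the license.
license_fragment = {"license": versioned_url}
print(looks_versioned(versioned_url), looks_versioned(unversioned_url))
```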
sameAs
URL of a reference Web page that unambiguously indicates the dataset's identity, usually in a different repository.
spatialCoverage
You can provide a single point that describes the spatial aspect of the dataset. Only include this property if the dataset has a spatial dimension. For example, a single point where all the measurements were collected, or the coordinates of a bounding box for an area.
Points
Shapes
Use GeoShape to describe areas of different shapes, for example to specify a bounding box.
Points inside box, circle, line, or polygon properties must be expressed as a space-separated pair of two values corresponding to latitude and longitude (in that order).
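The point and bounding-box forms could be sketched as follows. The coordinates are arbitrary placeholders, and geo_box is a hypothetical helper that writes the two corner points in latitude-then-longitude order.

```python
import json

def geo_box(south: float, west: float, north: float, east: float) -> str:
    """Two corner points, each written as 'latitude longitude'."""
    return f"{south} {west} {north} {east}"

# A single point where all measurements were collected.
point_coverage = {
    "@type": "Place",
    "geo": {"@type": "GeoCoordinates", "latitude": 39.328, "longitude": 120.1633},
}

# A bounding box given by its lower and upper corners.
box_coverage = {
    "@type": "Place",
    "geo": {"@type": "GeoShape", "box": geo_box(39.328, 120.1633, 40.445, 123.7878)},
}

print(json.dumps({"spatialCoverage": box_coverage}, indent=2))
```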
Named locations
temporalCoverage
The data in the dataset covers a specific time interval. Only include this property if the dataset has a temporal dimension. Schema.org uses the ISO 8601 standard to describe time intervals and time points. You can describe dates differently depending upon the dataset interval. Indicate open-ended intervals with two decimal points (..).
Single date
Time period
Open-ended time period
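The three forms can be generated with a small helper. This is a sketch: the function name temporal_coverage and the dates are hypothetical, and only calendar dates are handled, not times of day.

```python
from datetime import date
from typing import Optional

def temporal_coverage(start: date, end: Optional[date] = None,
                      open_ended: bool = False) -> str:
    """Format an ISO 8601 date, closed interval, or open-ended interval."""
    if open_ended:
        return f"{start.isoformat()}/.."   # open end marked with two points
    if end is None:
        return start.isoformat()           # single date
    return f"{start.isoformat()}/{end.isoformat()}"  # closed time period

single_date = temporal_coverage(date(2008, 5, 11))
time_period = temporal_coverage(date(1950, 1, 1), date(2013, 12, 18))
open_period = temporal_coverage(date(2013, 12, 19), open_ended=True)
print(single_date, time_period, open_period)
```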
variableMeasured
The variable that this dataset measures. For example, temperature or pressure. The variableMeasured property is proposed and pending standardization at schema.org. We encourage publishers to share any feedback on this property with the schema.org community.
version
The version number for the dataset.
url
Location of a page describing the dataset.
includedInDataCatalog
The catalog to which the dataset belongs.
distribution.contentUrl
The link for the download.
distribution
The description of the location for download of the dataset and the file format for download.
distribution.encodingFormat
The file format of the distribution, typically expressed as a MIME type.