Query with SPARQL

Ontologies can provide the standardized vocabulary to populate both graph databases and the categorical fields of traditional tabular databases (e.g. anatomical parts or numeric units). For reuse in tabular databases, the easiest way to extract the hierarchical ontology terms is to convert them into a flat table listing each term's id, its parent, and associated information. This use case is detailed below, and culminates in the new foodon_synonyms.tsv file, which is updated regularly in the FoodOn GitHub repository.

Introduction

An OWL ontology is often provided in the Resource Description Framework (RDF) format, which can be queried as an RDF graph using the SPARQL query language. RDF is used to detail the hierarchy and attributes of ontology terms. Technically, this is achieved by describing each piece of information as a subject-predicate-object triple, where the subject is some entity, and the object is some other entity or a piece of information like text or a number. These triples enable descriptions of term hierarchies (e.g. “‘left lung’ subClassOf lung”) and of other relations that can exist between terms (e.g. “‘left lung’ ‘in_left_side_of’ some ‘pair of lungs’”).
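As a rough illustration of the triple idea, the sketch below stores a few statements as Python tuples and matches them against a pattern with wildcards, which is loosely what a SPARQL triple pattern does. The term names are plain-text placeholders rather than real ontology IRIs, and the match function is a hypothetical helper, not part of any RDF library.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# The identifiers below are illustrative labels, not real ontology IRIs.
triples = [
    ("left lung", "subClassOf", "lung"),
    ("right lung", "subClassOf", "lung"),
    ("lung", "label", "lung"),
]

def match(pattern, data):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in data
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]

# Which terms are direct subclasses of "lung"?
subclasses = [s for s, _, _ in match((None, "subClassOf", "lung"), triples)]
print(subclasses)
```

A SPARQL variable such as ?class plays the role of the None wildcard here, except that SPARQL also reports which value each variable was bound to.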

A note for those reusing ontology content in a “Property Graph Database” like Neo4j: these databases have a different and powerful, but relatively non-standardized, technical framework (in other words, different query languages) in comparison to the RDF standard. An RDF graph cannot attach information directly to the predicate part of a triple; it can only attach information to a triple as a whole. Property graph databases allow the property part to be referenced directly so that other properties can be attached to it. One can say “Alice ‘married to’ John”, and add an “‘as of’ Dec 17, 1994” property to the ‘married to’ component. An RDF graph can express the same thing, but it requires a more complex structure. Historically, the OWL ontology specification was built on RDF because logical validation of a vocabulary and its relations was viable to a great extent via RDF triples.
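To make the “more complex structure” concrete, here is a minimal sketch of one way to do it, standard RDF reification, written as Python tuples. The ex: identifiers (ex:Alice, ex:marriedTo, ex:marriage1, ex:asOf) are hypothetical names invented for this illustration; rdf:Statement, rdf:subject, rdf:predicate and rdf:object are terms from the RDF vocabulary.

```python
# Hypothetical identifiers (ex:...) used for illustration only.
plain = ("ex:Alice", "ex:marriedTo", "ex:John")

# Standard RDF reification: a new node stands for the statement itself,
# and further triples describe that node.
reified = [
    ("ex:marriage1", "rdf:type", "rdf:Statement"),
    ("ex:marriage1", "rdf:subject", plain[0]),
    ("ex:marriage1", "rdf:predicate", plain[1]),
    ("ex:marriage1", "rdf:object", plain[2]),
    ("ex:marriage1", "ex:asOf", "1994-12-17"),
]

# Look up the date attached to the marriage statement.
as_of = [o for s, p, o in reified if s == "ex:marriage1" and p == "ex:asOf"]
print(len(reified), as_of)
```

One statement plus one attached date costs five triples in this form, whereas a property graph would store the date directly on the ‘married to’ edge.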

Here is an example of a simple SPARQL query which selects the ontology id and label for every term which has a label. SPARQL variables begin with a question mark (?). The query places constraints on various parts of one or more connected triples. Every triple subject (here “?ontology_id”) and predicate (here “rdfs:label”) points to an entity by its URI. The PREFIX instruction allows shorter URIs by substituting a short namespace prefix (here “rdfs:”) for the full namespace. The “rdfs:label” predicate is an annotation one can add to any entity to provide a text string for its label (here captured in “?label”).

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?ontology_id ?label

WHERE {?ontology_id rdfs:label ?label}

We can use the command line “robot query” command to run this query (saved in a file called ‘foodon_labels.sparql’) against the foodon.owl file. The end result is a list of FoodOn term labels. ROBOT loads foodon.owl into memory (using the OWL API) as an RDF graph and queries it with SPARQL. To run the command, open a terminal window in the /src/ontology/ folder of a GitHub-synchronized (or “cloned”) FoodOn repository.

NOTE: if cutting and pasting examples in this section to a file or command line generates a robot error, try typing the example out instead, or at least retype the double dashes (pasted text may contain look-alike Unicode characters, so you may need to type each character).

> robot query --input foodon.owl --query foodon_labels.sparql temp.tsv --format TSV

This yields a tab-delimited text file of the labels of all entities in the foodon.owl file, in no particular order. Other output formats, including JSON, are possible. Note that this doesn’t actually include ALL FoodOn entities, because quite a number of those are held in other import files that foodon.owl references. The @en suffix on term labels indicates English. Some ontologies don’t indicate the language of their labels, in which case the suffix is absent.

?ontology_id	?label
<http://purl.obolibrary.org/obo/FOODON_03530283>	"white skin"@en	
<http://purl.obolibrary.org/obo/FOODON_03414220>	"galia melon plant"@en	
<http://purl.obolibrary.org/obo/FOODON_03413980>	"streaked seerfish"@en	
<http://purl.obolibrary.org/obo/FOODON_03411555>	"phaseolus vulgaris plant"@en	
<http://purl.obolibrary.org/obo/FOODON_03412740>	"saffron milkcap"@en	
<http://purl.obolibrary.org/obo/FOODON_00003570>	"pear tomato (whole)"@en	
<http://purl.obolibrary.org/obo/FOODON_03414413>	"calcium phosphates"@en	
...
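For reuse in a tabular database, the angle brackets, quotes and language tags in this output usually need to be stripped. The following is a minimal sketch of that cleanup using only the Python standard library. The helper names (clean_iri, clean_literal) and the PREFIX:ID short form are choices made for this example, and it assumes literals look exactly like the sample rows above (no escaped quotes or datatype suffixes).

```python
import csv
import io
import re

# Two rows copied from the sample output above (tab-delimited).
sample = (
    '?ontology_id\t?label\n'
    '<http://purl.obolibrary.org/obo/FOODON_03530283>\t"white skin"@en\t\n'
    '<http://purl.obolibrary.org/obo/FOODON_03414220>\t"galia melon plant"@en\t\n'
)

OBO = "http://purl.obolibrary.org/obo/"
LITERAL = re.compile(r'^"(.*)"(?:@([A-Za-z-]+))?$')

def clean_iri(cell):
    """Strip angle brackets and shorten OBO IRIs to PREFIX:ID form."""
    iri = cell.strip("<>")
    if iri.startswith(OBO):
        return iri[len(OBO):].replace("_", ":", 1)
    return iri

def clean_literal(cell):
    """Split a quoted literal into (text, language or None)."""
    m = LITERAL.match(cell)
    return (m.group(1), m.group(2)) if m else (cell, None)

# QUOTE_NONE keeps the double quotes so the literal pattern can see them.
reader = csv.reader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)
next(reader)  # skip the header row
rows = [(clean_iri(r[0]), *clean_literal(r[1])) for r in reader if r]
print(rows)
```

To process a real result file, the io.StringIO(sample) argument would be replaced by an open file handle for temp.tsv.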

Next we add an “alternative term” annotation to the query, an annotation provided by the Information Artifact Ontology (IAO). It is included using the OPTIONAL {…} construct since otherwise only terms that have a label and an alternative term are included – most terms don’t have an alternative term. Note that if a term had two alternative terms, these would be returned on two separate rows in results.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://purl.obolibrary.org/obo/>

SELECT DISTINCT ?ontology_id ?label ?alt_label

WHERE { ?ontology_id rdfs:label ?label 
  OPTIONAL {?ontology_id obo:IAO_0000118 ?alt_label.}
}
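The behaviour of OPTIONAL described above can be pictured as a left join. The sketch below mimics it with plain Python dictionaries: every labelled term is kept, and a term with several alternative terms produces several rows. The data is illustrative only (the second alternative term for NCBITaxon_37176 is made up to show the multi-row case), and optional_join is a hypothetical helper, not a SPARQL engine.

```python
# Illustrative data: labels for two terms, alternative terms for one of them.
labels = {
    "NCBITaxon_37176": ["Ovibos moschatus"],
    "HANCESTRO_0371": ["Latvian"],
}
alt_terms = {
    "NCBITaxon_37176": ["muskox", "musk ox"],  # second value is made up
}

def optional_join(required, optional):
    """Mimic `required OPTIONAL { optional }`: keep every required row,
    and emit one row per optional value (or None when there is none)."""
    rows = []
    for term, values in required.items():
        for value in values:
            for alt in optional.get(term) or [None]:
                rows.append((term, value, alt))
    return rows

rows = optional_join(labels, alt_terms)
for row in rows:
    print(row)
```

Without the OPTIONAL wrapper the join would behave like an inner join, and the “Latvian” row would disappear from the results.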

This time we’ll run it against FoodOn’s /src/ontology/foodon-merged.owl ontology file, which contains all import file terms as well. The foodon_labels_alts.sparql query is in that folder.

> robot query --input foodon-merged.owl --query foodon_labels_alts.sparql temp.tsv --format TSV

Results show that the Taxonomy ontology NCBITaxon often uses ‘alternative term’:

?ontology_id	?label	?alt_label
<http://purl.obolibrary.org/obo/HANCESTRO_0371>	"Latvian"@en	
...
<http://purl.obolibrary.org/obo/NCBITaxon_65351>	"Lepidium campestre"	"field cress plant"@en
<http://purl.obolibrary.org/obo/NCBITaxon_37176>	"Ovibos moschatus"	"muskox"@en
...
<http://purl.obolibrary.org/obo/GAZ_00003171>	"Commonwealth of Virginia"	"virginia"@en
...
<http://purl.obolibrary.org/obo/NCBITaxon_37052>	"Ardenna grisea"	"sooty shearwater"@en
...

The above type of query works well for returning tabular data for entities that have a single attribute of a given type: a single label (i.e. in only one language), a single alternative term, etc. But what about entities that have lists of synonyms? Here a different approach is needed, in which each row of the result returns an entity by id together with one particular feature (label, synonym, parent, etc.). We use the UNION query construct to return a list of FoodOn term labels and synonyms starting from a particular term / class.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo:  <http://purl.obolibrary.org/obo/>
PREFIX oboInOwl:  <http://www.geneontology.org/formats/oboInOwl#>

SELECT DISTINCT ?class ?parent ?type ?label 
WHERE {

	# Enter ontology term identifier here to start report with:
	# obo:BFO_0000001 entity includes every term in ontology
	# obo:BFO_0000040 material entity, including chemical food components and food products
	# obo:ENVO_00010483  environmental material terms, including food materials
	# obo:FOODON_00002403 food material terms, including all food products and additives
	BIND (obo:BFO_0000040 as ?search). 

	# Retrieve term and all subclass terms.
	?class rdfs:subClassOf* ?search.

	{
		# Retrieve one or more explicit parents of given entity
		{ 
			?class rdfs:subClassOf ?parent.
			# Ignore blank node parent (subclass axiom)
			FILTER (!isBlank(?parent)) 
		}

		# Retrieve entity's label(s). Might be multilingual.
		UNION { 
			?class rdfs:label ?label.
			BIND ('label' as ?type).
		}

		# Retrieve types of synonym
		UNION {
			?class oboInOwl:hasSynonym ?label.
			BIND ('synonym' as ?type).
		}
		UNION {
			?class oboInOwl:hasBroadSynonym ?label.
			BIND ('synonym (broad)' as ?type).
		}
		UNION {
			?class oboInOwl:hasExactSynonym ?label.
			BIND ('synonym (exact)' as ?type).
		}
		UNION {
			?class oboInOwl:hasNarrowSynonym ?label.
			BIND ('synonym (narrow)' as ?type).
		}
		UNION {
			?class obo:IAO_0000118 ?label.
			BIND ('label (alternative)' as ?type).
		}
	}
}
ORDER BY ?class ?type

BIND is used to set a variable (the “?search” variable above gets set to obo:BFO_0000040, i.e. BFO “material entity”). Then the “?class rdfs:subClassOf* ?search” triple pattern, whose * operator makes the path recursive, returns any ?class that is connected by a path of zero or more rdfs:subClassOf relations to the ?search entity (so ?search itself is included). For each ?class found, each UNION branch returns any related parent, label, or synonym on a separate row of the result. The ORDER BY clause ensures all the query results for each term are grouped together.
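The “zero or more rdfs:subClassOf relations” idea can be sketched as a breadth-first traversal of the hierarchy. The example below uses a toy child-to-parents mapping with placeholder names; descendants_or_self is a hypothetical helper written for this illustration and says nothing about how a SPARQL engine evaluates property paths internally.

```python
from collections import deque

# Toy hierarchy: child -> list of direct parents. Names are placeholders.
parents = {
    "cattle bull": ["cattle"],
    "cattle": ["material entity"],
    "tortoise": ["material entity"],
    "cutting process": ["process"],
}

def descendants_or_self(root, parents):
    """Return every class reachable from root by following
    subClassOf links downward zero or more times."""
    children = {}
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children.setdefault(parent, []).append(child)
    found, queue = {root}, deque([root])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found

result = descendants_or_self("material entity", parents)
print(sorted(result))
```

The root is part of the result because * allows a path of length zero; a path of one or more steps (the + operator) would correspond to removing the root from the returned set.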

> robot query --input foodon-merged.owl --query foodon_synonyms.sparql output.tsv --format TSV

Results:

?class	?parent	?type	?label
...
<http://purl.obolibrary.org/obo/FOODON_00000015>	<http://purl.obolibrary.org/obo/NCBITaxon_9913>		
<http://purl.obolibrary.org/obo/FOODON_00000015>		"label"	"cattle bull"@en
<http://purl.obolibrary.org/obo/FOODON_00000015>		"synonym"	"bull"@en
<http://purl.obolibrary.org/obo/FOODON_00000015>		"synonym (broad)"	"http://purl.obolibrary.org/obo/AGRO_00000119"@en
<http://purl.obolibrary.org/obo/FOODON_00000071>	<http://purl.obolibrary.org/obo/FOODON_03460130>		
<http://purl.obolibrary.org/obo/FOODON_00000071>		"label"	"food cutting process"@en
<http://purl.obolibrary.org/obo/FOODON_00000074>	<http://purl.obolibrary.org/obo/FOODON_03411625>		
<http://purl.obolibrary.org/obo/FOODON_00000074>		"label"	"tortoise"@en
...		
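Because each feature of a term arrives on its own row, a consumer will often want to regroup the rows into one record per term. The following is a minimal sketch of that step, assuming the rows have already been read and cleaned into (class, parent, type, label) tuples with empty strings for blank cells. The record layout and the collect helper are choices made for this example only.

```python
from collections import defaultdict

# Rows mirror the sample results above: (class, parent, type, label).
rows = [
    ("FOODON_00000015", "NCBITaxon_9913", "", ""),
    ("FOODON_00000015", "", "label", "cattle bull"),
    ("FOODON_00000015", "", "synonym", "bull"),
    ("FOODON_00000074", "FOODON_03411625", "", ""),
    ("FOODON_00000074", "", "label", "tortoise"),
]

def collect(rows):
    """Group long-format rows into one record per class."""
    terms = defaultdict(lambda: {"parents": [], "label": [], "synonym": []})
    for cls, parent, kind, text in rows:
        record = terms[cls]
        if parent:
            record["parents"].append(parent)
        elif kind == "label":
            record["label"].append(text)
        elif kind.startswith("synonym") or kind == "label (alternative)":
            record["synonym"].append(text)
    return dict(terms)

terms = collect(rows)
print(terms["FOODON_00000015"])
```

Here all synonym types and alternative labels are pooled into a single list; keeping them apart would only need one extra key per ?type value.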

An informal YouTube video of a curation team tutorial about SPARQL querying is available which covers most of the above material.

Other SPARQL examples

The “foodon_subdomains.sparql” example query returns the number of terms under each of the main branches (facets) of FoodOn:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX foodon: <http://purl.obolibrary.org/obo/FOODON_>
SELECT
	?search	
	(STR(?label) AS ?name) 
	(count(?class) as ?total) 
	WHERE {
	values ?search {
		foodon:03411041 # Chemical food component
		obo:OBI_0100026 # Organism (NCBI taxonomy)
		foodon:03411564 # Food product organismal source
		foodon:03420116 # Part of organism (anatomy)
		foodon:00002381 # Food product by organism (~single component food)
		foodon:00002501 # Multi-component food product
		foodon:00002451 # Food transformation process
		foodon:00003368 # Food contact material
		foodon:03400361 # Agency food product type
	}
	{?class rdfs:subClassOf+ ?search}
	OPTIONAL {?search rdfs:label ?label.}
} 
GROUP BY ?search ?label

Result as of January 18, 2021: