Curation Rules

The process of formulating curation rules about how a given food source, product or related information artifact should be described is ongoing. Generally, as an OBO Foundry member, we subscribe to its curation principles. Here are the more specific rules we have so far:

Singular term labels and Aristotelian definitions

  • Terms are generally labeled in the singular to fit the Aristotelian definition form “An X is a (member of class) Y that has feature/property/differentia Z”. This facilitates an axiomatic interpretation of the English definition. Plurals may appear in FoodOn as synonyms, mainly inherited from LanguaL, but will be phased out unless they are exceptions to the plural = root + “s” pattern. In the future, recognition of plural terms in text mining is expected to be handled mainly by external resources like LexMapr and Wiktionary.
    • The ISO 25964 standard for thesauri notes the practice, in some languages, of using plurals to indicate count nouns so that process terms can be described alongside them under different labels, as in the difference between “paintings” and “painting”. Ontologies don’t need this labeling approach if terms are positioned under material entities and processes, because a term’s upper-level class distinguishes the sense of the word. (Potentially, a “[term label] sensu [variation semantic]” label could be used to distinguish such terms.)
  • We do not require a term label to be text-matchable. A label is as long as necessary to differentiate the term from its siblings (see the OBO Foundry documentation on term labels). We use term synonyms to provide the words or phrases encountered in text matching.
  • More information on writing good Aristotelian definitions is here.
  • One may also find the general OBO Foundry ontology guidelines for textual definitions useful.
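The singular-label convention and the definition template above can be illustrated with a short Python sketch. This is not FoodOn tooling: the helper names `aristotelian_definition` and `phase_out_regular_plurals` are hypothetical, the article choice is a naive vowel test, and the plural check implements only the plural = root + “s” pattern, so irregular plurals are kept as synonyms.

```python
def indefinite_article(word: str) -> str:
    # Naive vowel test; words such as "hour" or "unit" would need special cases.
    return "an" if word[:1].lower() in "aeiou" else "a"


def aristotelian_definition(term: str, genus: str, differentia: str) -> str:
    """Render 'An X is a Y that <differentia>' for a singular term label."""
    return (f"{indefinite_article(term).capitalize()} {term} is "
            f"{indefinite_article(genus)} {genus} that {differentia}.")


def phase_out_regular_plurals(label: str, synonyms: list[str]) -> list[str]:
    """Drop synonyms that are just label + 's'; irregular plurals are kept."""
    regular_plural = label.lower() + "s"
    return [s for s in synonyms if s.lower() != regular_plural]


print(aristotelian_definition("hen", "chicken", "is female and adult"))
# A hen is a chicken that is female and adult.
print(phase_out_regular_plurals("apple", ["apples", "eating apple"]))
# ['eating apple']
print(phase_out_regular_plurals("goose", ["geese"]))
# ['geese']
```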

Term popularity

Terms like “cacao bean” and “cocoa bean”, which both refer to the cacao plant’s beans, can both show up frequently in usage. Which should be the default term label, and which the exact synonym? One of our goals in FoodOn is to map conveniently to Wikipedia pages for equivalent terms, so we can be swayed to use the same label that Wikipedia has, in this case “cocoa bean”.

For organisms as a whole, as Wikipedia recognizes, food sources can sometimes only be disambiguated by referencing the scientific name. We are switching to using the scientific name for whole plant, animal, fungus and alga references where possible, and including the organism’s common name in the synonym list.

As well, we can be swayed by resources like Google’s Ngram Viewer data on the popularity of term usage in books.
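The label-choice preferences in this section can be summarized as a small decision function. This is a hypothetical sketch, not a FoodOn procedure: `choose_label` prefers a scientific name for whole organisms, then a candidate matching the Wikipedia page title, then the candidate with the higher caller-supplied usage frequency (for instance, figures a curator reads off the Ngram Viewer). The remaining candidates become synonyms.

```python
from typing import Optional


def choose_label(candidates: list[str],
                 wikipedia_title: Optional[str] = None,
                 usage_frequency: Optional[dict[str, float]] = None,
                 scientific_name: Optional[str] = None) -> tuple[str, list[str]]:
    """Return (label, synonyms) following the preferences described above."""
    if scientific_name:
        # Whole organisms: the scientific name disambiguates; common names become synonyms.
        label = scientific_name
    elif wikipedia_title in candidates:
        # Match Wikipedia so that terms map conveniently onto Wikipedia pages.
        label = wikipedia_title
    elif usage_frequency:
        # Fall back to the candidate that is more common in usage data.
        label = max(candidates, key=lambda c: usage_frequency.get(c, 0.0))
    else:
        label = candidates[0]
    synonyms = [c for c in candidates if c != label]
    return label, synonyms


print(choose_label(["cacao bean", "cocoa bean"], wikipedia_title="cocoa bean"))
# ('cocoa bean', ['cacao bean'])
print(choose_label(["apple tree"], scientific_name="Malus domestica"))
# ('Malus domestica', ['apple tree'])
```

The ordering of the checks is itself an assumption; in practice a curator weighs these sources rather than applying them mechanically.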

Term deprecation

All deprecated FoodOn terms remain in FoodOn, with an ‘owl:deprecated=true’ annotation and a “‘term replaced by’=[new ID of term]” annotation. FoodOn plant and animal part terms inherited from LanguaL have mostly been replaced by UBERON animal anatomy and PO plant anatomy terms.
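A consumer of FoodOn data can follow these two annotations to map an obsolete identifier onto its current replacement. The sketch below assumes the annotations have already been loaded into a plain dictionary. The `TermRecord` class and the term IDs are hypothetical placeholders, not real FoodOn identifiers. Chained replacements are followed until a non-deprecated term is reached.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TermRecord:
    term_id: str
    deprecated: bool = False           # owl:deprecated
    replaced_by: Optional[str] = None  # 'term replaced by' (IAO:0100001)


def resolve_current_term(term_id: str, terms: dict[str, TermRecord]) -> str:
    """Follow 'term replaced by' links until reaching a non-deprecated term."""
    seen = set()
    current = term_id
    while terms[current].deprecated:
        if current in seen:
            raise ValueError(f"replacement cycle at {current}")
        seen.add(current)
        replacement = terms[current].replaced_by
        if replacement is None:
            raise ValueError(f"{current} is deprecated but has no replacement")
        current = replacement
    return current


# Hypothetical IDs: a LanguaL-derived part term replaced by an anatomy term.
terms = {
    "FOODON:old_part": TermRecord("FOODON:old_part", deprecated=True,
                                  replaced_by="UBERON:new_part"),
    "UBERON:new_part": TermRecord("UBERON:new_part"),
}
print(resolve_current_term("FOODON:old_part", terms))  # UBERON:new_part
```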

Facet curation details

The organismal source facet

This facet provides a simple, shallow hierarchy of plants, animals and fungi, generally using non-scientific names and grouped in ways humans find convenient for organizing food. It is essentially a shortcut menu into the NCBITaxon hierarchy, which can be 20, 30, 40 or more levels deep. The leaves of the organismal source hierarchy are generally NCBITaxon terms, so they transition directly into the species level at the bottom of the NCBITaxon hierarchy. This branch currently covers food for humans and domesticated animals. It excludes parts of organisms, organism products like milk, and generic terms like egg that stretch across species; see “organism material” below for these.
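The idea of a shallow, human-oriented menu whose leaves point into NCBITaxon can be sketched as a nested dictionary. The grouping names below are illustrative, not FoodOn’s actual branch structure. The leaf IDs are the NCBITaxon identifiers for Gallus gallus (9031), Bos taurus (9913) and Malus domestica (3750).

```python
# Illustrative grouping only; leaves are NCBITaxon CURIEs.
ORGANISMAL_SOURCE_MENU = {
    "animal": {
        "bird": {"chicken": "NCBITaxon:9031"},
        "mammal": {"cattle": "NCBITaxon:9913"},
    },
    "plant": {
        "fruit-bearing plant": {"apple tree": "NCBITaxon:3750"},
    },
}


def taxon_leaves(menu: dict, path: tuple = ()) -> list[tuple[tuple, str]]:
    """Flatten the menu into (menu path, NCBITaxon ID) pairs."""
    leaves = []
    for name, child in menu.items():
        if isinstance(child, dict):
            leaves.extend(taxon_leaves(child, path + (name,)))
        else:
            leaves.append((path + (name,), child))
    return leaves


for menu_path, taxon in taxon_leaves(ORGANISMAL_SOURCE_MENU):
    print(" > ".join(menu_path), "->", taxon)
```

Here three menu levels stand in for the many ranks between kingdom and species in NCBITaxon itself.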

An organismal source term’s equivalency statement may provide a disjunction of more than one NCBITaxon entity, and the term may also reference other taxonomies, such as ITIS. This approach addresses three challenges. First, organisms are continually being taxonomically reclassified. Second, ontology-driven taxonomies like NCBITaxon don’t actually have terms for all organisms (NCBITaxon is dedicated to covering organisms that have had some kind of sequencing done on them). Third, the use of a term can shift over time so that it references other organisms (e.g. due to availability issues). Referencing both past and present taxa can be done by adding to the “in taxon” axiom.
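One way to read a disjunctive equivalency is as a set of acceptable taxa: a record matches the organismal source term if its taxon is any member of the set. The sketch below assumes that reading. The `OrganismalSource` class and all taxon IDs are hypothetical placeholders. An ITIS-style identifier sits alongside an NCBITaxon-style one to reflect the mixed taxonomic references described above.

```python
from dataclasses import dataclass, field


@dataclass
class OrganismalSource:
    label: str
    # Union of taxa, i.e. "in taxon" some (A or B or ...); may mix NCBITaxon and ITIS IDs.
    in_taxon: frozenset[str] = field(default_factory=frozenset)

    def add_taxon(self, taxon_id: str) -> None:
        """Extend the disjunction, e.g. when usage shifts to a substitute organism."""
        self.in_taxon = self.in_taxon | {taxon_id}

    def matches(self, taxon_id: str) -> bool:
        """True if the record's taxon is any member of the disjunction."""
        return taxon_id in self.in_taxon


source = OrganismalSource("example fish", frozenset({"NCBITaxon:HYPOTHETICAL_1"}))
# Placeholder for a taxon that NCBITaxon lacks but ITIS covers.
source.add_taxon("ITIS:HYPOTHETICAL_2")
print(source.matches("ITIS:HYPOTHETICAL_2"))  # True
```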

  • References to an animal as an organism, including classes that differentiate animals by breed, sexual anatomy (including castration) and age, for example “hen: a female adult chicken”. In this way animals can be referenced without necessarily implying a food context, as when they are involved in veterinary processes.
  • References to plant organisms, usually with the word “plant” or “tree” at the end, so that in display (e.g. in search engine results) it is clear that the reference is to the whole organism (roots, bark, leaves, etc.) or any of its parts. If we only listed “apple” under organismal source, rather than “apple tree”, this would imply the fruit alone, preventing other parts of the organism from being involved in food products, e.g. apple blossom flowers in tea.
  • As well, a plant grown from a splice is technically composed of two taxonomic entities. The food source organism class allows this, while a direct taxonomic reference doesn’t.
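The bullets above describe organismal source classes built from a taxon plus qualifiers such as sex and age, or from more than one taxon in the case of a grafted plant. A hypothetical data model might look like the following. The field names are assumptions, not FoodOn’s axiom structure, and the rootstock and scion taxon IDs are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrganismClass:
    label: str
    organism: str                  # common name of the base organism
    taxa: tuple[str, ...]          # usually one taxon; two for a grafted plant
    breed: Optional[str] = None
    sex: Optional[str] = None      # e.g. "female", "castrated male"
    age: Optional[str] = None      # e.g. "adult"

    @property
    def is_graft(self) -> bool:
        """A plant grown from a graft is composed of more than one taxon."""
        return len(self.taxa) > 1

    def definition(self) -> str:
        """Short gloss in the style 'hen: a female adult chicken'."""
        words = [q for q in (self.sex, self.age, self.breed) if q] + [self.organism]
        return f"{self.label}: a {' '.join(words)}"


hen = OrganismClass("hen", "chicken", ("NCBITaxon:9031",), sex="female", age="adult")
grafted = OrganismClass("grafted citrus tree", "citrus tree",
                        ("TAXON:ROOTSTOCK_PLACEHOLDER", "TAXON:SCION_PLACEHOLDER"))
print(hen.definition())   # hen: a female adult chicken
print(grafted.is_graft)   # True
```

A direct reference to a single taxon could not express the grafted case, which is the point the last bullet makes.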

The organism material facet

Parts of organisms (e.g. limb), their material outputs (e.g. milk), and generic terms like “egg” that stretch across species are handled primarily in the anatomical “organism material” facet, which food product facet items can reference in conjunction with organisms to describe food products.

The food product facet

A class in this hierarchy states either that it is ‘derived from’ some food source organism, or that it ‘has ingredient’, ‘has defining ingredient’ or ‘has part’ some other food product. A food product invariably involves some amount of food processing, even if only basic harvesting, e.g. catching from the sea or plucking from a fruit tree. Note that foods a person may harvest from the wild, or grow or raise domestically for consumption without economic consideration, are also considered food products.
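The requirement that every food product class is anchored by at least one of these relations lends itself to a simple curation check. The sketch below is hypothetical: it treats each class’s logical axioms as a list of (relation, filler) pairs and flags classes that use none of the four relations named above. Relation labels are compared as plain strings rather than resolved to ontology IRIs, and the example classes are made up.

```python
ANCHORING_RELATIONS = {"derived from", "has ingredient",
                       "has defining ingredient", "has part"}


def missing_anchor(food_products: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Return labels of food product classes that lack an anchoring relation."""
    return sorted(label for label, axioms in food_products.items()
                  if not any(rel in ANCHORING_RELATIONS for rel, _ in axioms))


# Hypothetical classes and axioms for illustration.
food_products = {
    "chicken (raw)": [("derived from", "chicken")],
    "apple blossom tea": [("has ingredient", "apple blossom")],
    "mystery snack": [("has quality", "crunchy")],
}
print(missing_anchor(food_products))  # ['mystery snack']
```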

If a food source can be traded or given essentially as a whole, like a chicken, then it may also be listed directly under the food product category. If it is necessary to specify how intact or whole a food source organism is, that descriptor needs to be captured in an axiom that references the food source, as is done with terms like “chicken (raw)”.

Note that when we use singular terms like “chicken”, we state in the definition that we are referring to one whole chicken.

FoodOn’s position on term labels: the OBO Foundry has a policy of spelling out terms so that commas and bracketed expressions are not used, e.g. “whole raw chicken” rather than “chicken (raw)”. We find this problematic with long lists of similar subclasses of a term (for example, chicken meat food products), and especially when trying to identify terms of interest in search results. For this reason, our current approach is to have the rdfs:label begin with the main organism or varietal name and anatomical part/qualifier, followed by other state or process descriptors (e.g. cooked, frozen, raw). Textually, this leads to semantically similar terms being ordered together, with Aristotelian differentiae clearly visible in brackets. The alternative is that words like “frozen” and “raw” lead the labels and control the sectioning of search results.
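The effect of this convention on sorted results can be seen by ordering the same four terms under both labeling styles and sectioning them by first word. The labels are illustrative, and Python’s default string ordering is only a rough stand-in for how a search interface might sort and section results.

```python
from itertools import groupby

bracketed = ["chicken (raw)", "beef (frozen)", "chicken (frozen)", "beef (raw)"]
prefixed = ["raw chicken", "frozen beef", "frozen chicken", "raw beef"]


def sections(labels: list[str]) -> dict[str, list[str]]:
    """Sort labels and group them by first word, as a results page might."""
    ordered = sorted(labels)
    return {key: list(group)
            for key, group in groupby(ordered, key=lambda s: s.split()[0])}


print(sections(bracketed))
# {'beef': ['beef (frozen)', 'beef (raw)'], 'chicken': ['chicken (frozen)', 'chicken (raw)']}
print(sections(prefixed))
# {'frozen': ['frozen beef', 'frozen chicken'], 'raw': ['raw beef', 'raw chicken']}
```

With bracketed descriptors the organism leads each section; with prefixed descriptors the state words do.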

  • A food product label can include both an organism and an anatomical part in its name, e.g. “chicken back”. Where necessary, the word “whole” is parenthesized in the label to emphasize the intended wholeness of the organism or part. This favours “chicken (whole, …)” rather than “chicken (…)” when there are chicken part and piece siblings.
  • A food product that has had processes applied to it beyond those required to isolate an anatomical part should generally have those processes listed in brackets following the label, e.g. “chicken (frozen)”, “chicken (fried)”.
  • The order of listing processes should echo the order they were applied to the food, so “poultry (deboned, canned)” rather than “poultry (canned, deboned)”.
  • Use the past participle of a verb to describe the output of the process, i.e. a “freezing process” outputs “frozen” food. We are reducing variations on these terms:
    • “boned” and “boneless” should usually be normalized to “deboned”. However, sometimes “boneless” is used to state the condition of a food rather than implying a separate deboning process, so a judgement call may be required.
  • The food product’s axiomatization should reflect the conjunction of the parent class with the food process term(s) in its label, e.g. “‘chicken (frozen)’ equivalentTo ‘chicken meat food product’ and ‘meat (frozen)’”.
  • The food product suffix:
    • If it might be ambiguous whether a term is a food category having various closely related subclasses, or a “leaf” food item that has no subclasses, we include the suffix “food product” to mark the broader sense. e.g. “barley flour food product” vs “barley flour”.
    • A leaf food item that likely won’t have children in the future doesn’t need a “food product” suffix in its label, unless the suffix helps to disambiguate it from search results coming from other ontologies in a lookup service.
    • Items with the “food product” suffix should have an exact synonym that is the suffix-less phrase in order to facilitate text-mining.
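The label rules in this list can be combined into a few small helpers. This is a sketch under stated assumptions: `build_label`, `equivalence_axiom` and `suffix_synonym` are hypothetical names; processes are supplied in the order they were applied; only “boned” is normalized automatically to “deboned”, while “boneless” is kept and flagged for a curator because it can describe a condition rather than a separate process. The axiom is rendered in the informal quoted style used above, not in a formal OWL syntax.

```python
NORMALIZED_PARTICIPLES = {"boned": "deboned"}
NEEDS_JUDGEMENT = {"boneless"}


def build_label(base: str, processes: list[str]) -> tuple[str, list[str]]:
    """Return (label, warnings), listing processes in application order in brackets."""
    warnings = []
    normalized = []
    for process in processes:
        if process in NEEDS_JUDGEMENT:
            warnings.append(f"'{process}': decide whether a deboning process was applied")
        normalized.append(NORMALIZED_PARTICIPLES.get(process, process))
    label = f"{base} ({', '.join(normalized)})" if normalized else base
    return label, warnings


def equivalence_axiom(label: str, parent: str, process_classes: list[str]) -> str:
    """Render the parent-and-process conjunction in the informal quoted style."""
    conjuncts = " and ".join(f"'{c}'" for c in [parent, *process_classes])
    return f"'{label}' equivalentTo {conjuncts}"


def suffix_synonym(label: str, suffix: str = " food product") -> list[str]:
    """Labels ending in the 'food product' suffix get the bare phrase as exact synonym."""
    return [label[: -len(suffix)]] if label.endswith(suffix) else []


print(build_label("poultry", ["boned", "canned"]))
# ('poultry (deboned, canned)', [])
print(equivalence_axiom("chicken (frozen)", "chicken meat food product", ["meat (frozen)"]))
# 'chicken (frozen)' equivalentTo 'chicken meat food product' and 'meat (frozen)'
print(suffix_synonym("barley flour food product"))
# ['barley flour']
```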

Synonyms

For a given “primary” entity entry in an ontology,

  • a broad synonym is a word or phrase which refers to an entity/concept that would have the primary entity class as a subclass, but the synonym is out of scope as an entity within the primary entity’s ontology, or has not otherwise been defined within the ontology. It is used for searching or text matching purposes.
  • an exact synonym refers to the same entities as the primary entity does.
  • a narrow synonym refers to some more particular entity that is not represented as a subclass. It should be provided only where the primary entity doesn’t have a subclass that the synonym applies to, and no work is planned to create one.
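In OBO-style ontologies these scopes are commonly expressed with the oboInOwl annotation properties `hasExactSynonym`, `hasBroadSynonym` and `hasNarrowSynonym`. The sketch below maps each scope onto its property name and encodes the narrow-synonym rule as a guard. The `synonym_annotation` helper, the `has_matching_subclass` flag (which a curator would supply) and the term ID are all hypothetical.

```python
from enum import Enum


class SynonymScope(Enum):
    EXACT = "oboInOwl:hasExactSynonym"
    BROAD = "oboInOwl:hasBroadSynonym"
    NARROW = "oboInOwl:hasNarrowSynonym"


def synonym_annotation(term_id: str, text: str, scope: SynonymScope,
                       has_matching_subclass: bool = False) -> tuple[str, str, str]:
    """Return a (subject, property, value) triple for a synonym annotation."""
    if scope is SynonymScope.NARROW and has_matching_subclass:
        # A narrow synonym belongs on the subclass it describes, not on the parent.
        raise ValueError(f"attach '{text}' to the existing subclass instead")
    return (term_id, scope.value, text)


print(synonym_annotation("FOODON:example", "cacao bean", SynonymScope.EXACT))
# ('FOODON:example', 'oboInOwl:hasExactSynonym', 'cacao bean')
```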

What if a common name like “white ibis” is mentioned in a biosample, where it could reference either of two species: the Australian white ibis, Threskiornis molucca, or the American white ibis, Eudocimus albus? The biosample pertains to just one of those two, so logically we need a reference that covers either, to acknowledge the lack of certainty. In this case, that can be accomplished by attaching a “broad synonym” of “white ibis” to both taxa. The alternative, attaching “white ibis” as an exact synonym to either record, is misleading from a global language perspective, because only within a local speech community is the name understood as pointing to just one of the two taxa. This strategy suits general text mining of data records where no extra knowledge is available about what those records pertain to.
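For text mining, the practical consequence is that a broad synonym returns every candidate taxon, whereas an exact synonym would wrongly return a single one. The sketch below assumes a hypothetical in-memory synonym index keyed by lowercase text, with entries naming taxa by scientific name rather than by identifier.

```python
from collections import defaultdict

# Hypothetical index: synonym text -> list of (scope, taxon name).
synonym_index = defaultdict(list)
for taxon in ("Threskiornis molucca", "Eudocimus albus"):
    synonym_index["white ibis"].append(("broad", taxon))
synonym_index["australian white ibis"].append(("exact", "Threskiornis molucca"))
synonym_index["american white ibis"].append(("exact", "Eudocimus albus"))


def candidate_taxa(mention: str) -> list[str]:
    """Return every taxon a mention could denote; more than one signals uncertainty."""
    return sorted(taxon for _, taxon in synonym_index.get(mention.lower(), []))


print(candidate_taxa("white ibis"))           # ['Eudocimus albus', 'Threskiornis molucca']
print(candidate_taxa("American white ibis"))  # ['Eudocimus albus']
```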

In the future, FoodOn will be adopting the SSSOM mapping specification for synonym management.