FoodOn and LanguaL

Much of FoodOn’s core vocabulary comes from transforming LanguaL, a mature and popular food indexing thesaurus (see a presentation on the LanguaL paradigm, and technical info). The FoodOn paper summarizes LanguaL’s history:

“FoodOn has drawn many of its initial terms from LanguaL, a library science and ontology friendly food classification system consisting of 14 food product description facets including plant or animal food source, chemical additive, preservation or cooking process, packaging, and standard national and international upper-level product type schemes.4 LanguaL has evolved steadily from its origin at the Center for Food Safety and Applied Nutrition of the United States Food and Drug Administration (FDA) in the 1970’s. Provided online as a free resource by Danish Food Informatics (http://langual.org), it has been used to index numerous European Union and United States agency databases, including the USDA Nutrient Database for Standard Reference (SR), a food composition database of nutritional data for servings of common and branded food products, and 30 European Food Information Resource (EuroFIR) Food Classification system sanctioned databases.5,6 For example, see the Czech Food Composition Database entry for black raw currants (http://www.nutridatabaze.cz/en/food/?id=42#tab-2)”

FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration

FoodOn and LanguaL similarities

Like LanguaL, FoodOn distinguishes between food source plant and animal organisms, which are in LanguaL Facet B, and agency food categorization schemes, which are in LanguaL Facet A. Facet A has branches dedicated to EFSA, GS1, FDA CFR, and EuroFIR, and these are all copied term-for-term into FoodOn. A reference to a LanguaL Facet A category implies the semantics of an organism or derived product as a food commodity. All of Facet A is mirrored under FoodOn’s “food product type” branch, a new hierarchy that is a very rough copy of the FDA Code of Federal Regulations (CFR) food product branch, but re-organized to be the most ontologically “pure” of all the schemes. To reflect their origins, most “foodon food product” items are linked, directly or by inheritance, via a “member of” reference to a CFR category. Note that agency schemes have not yet been cross-referenced to each other in LanguaL or FoodOn. For example, one agency’s (Facet A) reference to broccoli isn’t linked to its equivalent in another agency scheme.

All of Facet B is mirrored under FoodOn’s “food source” branch. That’s where a reference to a broccoli plant as a whole is made, and this is often the reference one wants to make unless a more particular plant or animal food product aspect is involved, e.g. a reference to broccoli florets. A reference to an organism in Facet B does not imply that the organism bears a food role. Thus we can reference Cattle (LanguaL B1161; FOODON_03411161) in axioms devoted entirely to epidemiological investigations of hoof disease without implying anything about Cattle as food.

In LanguaL Facet B, the broccoli organism is just called “broccoli”; in FoodOn we have added the words “… plant (food source)” to the label. This was intended to ontologically distinguish references to the whole organism from references to the broccoli food product variants that are being added to FoodOn. However, we will probably drop the “food source” label component shortly, as we want to allow references to these organisms without implying a food role, in order to support One Health epidemiological modelling.

Other agency food product schemes don’t necessarily follow ontological rules, and have different areas of interest. Our aim over the next few years is to cross-reference their categories to the new “foodon food product” branch, which is where we focus on adding new food terms. In this way we aim to enable mapping between the contents of agency food databases. The other agency product branches will remain untouched in the future (except as their agencies update them), and exist primarily for cross-referencing from within FoodOn; they are provided with a FoodOn ontology URI to facilitate this.

LanguaL identifiers in FoodOn

Every LanguaL id is in FoodOn as a dbXref annotation. (A number of LanguaL terms show as deprecated terms in FoodOn, but are still available to reference.) In addition, because of the way the initial import script was written, LanguaL Facet B ids are to a large extent mapped directly to FoodOn ids: e.g. LanguaL Broccoli, code B1443, has the URL:

http://www.langual.org/langual_thesaurus.asp?srcid=9380&termid=B1443

and its id can be seen in the last digits of the FoodOn URI:

http://purl.obolibrary.org/obo/FOODON_03411443

In other words, append the digits of the Facet B id (1443) to the FOODON_0341 base code for Facet B to get the above link. The same pattern works for a number of other facets, but don’t rely on it in the future, as it was just done during the initial conversion (we do have a synchronization script that can pull in future LanguaL versions, though). New “food source” terms are being added to FoodOn that have no LanguaL equivalents, so the scheme above may only be suitable for mapping legacy LanguaL databases to FoodOn.
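As an illustration only, the Facet B mapping described above could be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that a Facet B code is the letter B followed by exactly four digits, as in the broccoli (B1443) and cattle (B1161) examples; it covers only legacy LanguaL Facet B ids and is not part of any FoodOn tooling.

```python
import re

# Base of FoodOn PURLs, as shown in the broccoli example above.
FOODON_PURL_BASE = "http://purl.obolibrary.org/obo/FOODON_"

# Prefix used for Facet B terms during the initial import (from the text).
FACET_B_PREFIX = "0341"


def langual_facet_b_to_foodon_uri(langual_code: str) -> str:
    """Map a legacy LanguaL Facet B code (e.g. "B1443") to a FoodOn URI.

    Assumption: Facet B codes are "B" plus exactly four digits. Terms added
    to FoodOn after the initial conversion do not follow this pattern.
    """
    match = re.fullmatch(r"B(\d{4})", langual_code.strip().upper())
    if match is None:
        raise ValueError(f"Not a LanguaL Facet B code: {langual_code!r}")
    return f"{FOODON_PURL_BASE}{FACET_B_PREFIX}{match.group(1)}"


if __name__ == "__main__":
    print(langual_facet_b_to_foodon_uri("B1443"))
```

Since the pattern is an artifact of the initial conversion, the dbXref annotations remain the more dependable route for anything beyond legacy Facet B data.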

FoodOn and LanguaL differences

FoodOn does diverge from LanguaL on a number of fronts, partly because of some different capabilities OWL has.

  • There are a number of cases where LanguaL has “[facet X] not used” and “[facet X term] not known” terms, e.g. “Packing medium not used”. In FoodOn we don’t mention a facet if it isn’t relevant, so we have deprecated all such “not used” and “not known” terms. If a facet term is relevant, we use it; if it isn’t known whether or not a food can be described by a particular facet term, but we know the parent category is pertinent, we just reference the parent category. So, when describing a food we wouldn’t say “Cut of meat not known”; instead we’d say “Cut of meat”. This enables a reasoner to know that the entity being described is a cut of meat, which allows that some subclass of cut of meat is pertinent (or some other cut of meat not listed).
  • LanguaL used Facet B “food source” to indicate the primary ingredient of any indexed food, and facet H “treatment applied > ingredient added” to indicate secondary ingredients. OWL ontologies have the luxury of using object properties to describe primary and secondary ingredients, and so we have added “has ingredient” and “has substance added” relations to do this. Consequently we have deprecated most of LanguaL’s “ingredient added” items (they exist in the langual_deprecated_import.owl mentioned below).
  • We have preserved a number of LanguaL’s regulation-oriented “content percentage” and “food added” items in FoodOn. However, a food product’s “has substance added” relations could potentially be annotated with a percentage or proportion, which brings composite foods closer to a recipe or product label level of description. Note that FoodOn, like LanguaL, does not yet tackle recipe description. This will require a larger food processing vocabulary.
  • An OWL include file called “langual_deprecated_import.owl” contains all of the deprecated LanguaL terms, which one might use to convert those terms to their new FoodOn equivalents.
  • Most of LanguaL’s food additives list items have been converted to their ChEBI equivalents.
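The handling of “not used” and “not known” terms in the first bullet above can be sketched roughly as follows. This is a hypothetical helper, not FoodOn code: it works on plain label strings and assumes that the parent category label is simply the LanguaL label with the “not known” suffix removed, which holds for the “Cut of meat” example but would in practice be resolved from the ontology hierarchy.

```python
from typing import Optional


def foodon_style_descriptor(langual_label: str) -> Optional[str]:
    """Translate a LanguaL facet label into what FoodOn would reference.

    - "... not used":  the facet is irrelevant, so nothing is stated (None).
    - "... not known": reference the parent category instead.
    - anything else:   use the term as given.

    Assumption: the parent label equals the label minus " not known".
    """
    label = langual_label.strip()
    lowered = label.lower()
    if lowered.endswith(" not used"):
        return None
    if lowered.endswith(" not known"):
        return label[: -len(" not known")]
    return label


if __name__ == "__main__":
    for term in ("Cut of meat not known", "Packing medium not used"):
        print(term, "->", foodon_style_descriptor(term))
```

Returning the parent category rather than a placeholder term keeps the description usable by a reasoner, which can still infer that some subclass of the parent applies.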