annotations package
This package makes it easy to handle and filter lists of items with annotations and modifiers of the kind:
| Name | Annotation |
|---|---|
| apple | "fruit[medium],red" |
| orange | "fruit[medium],orange" |
| carrot | "vegetable[long],orange" |
| cherry | "fruit[small],red[dark]" |
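To make the string format concrete, here is a minimal sketch of how such a string could be split into terms and their modifiers. It is not the package's parser: `parse_annotation_string` and `TERM_PATTERN` are hypothetical names, and the sketch assumes modifiers are comma-separated inside square brackets with no nesting.

```python
import re

# Hypothetical helper, not part of the annotations package.
# Assumed format: term[mod1,mod2],term2 with no nested brackets.
TERM_PATTERN = re.compile(r"\s*([^,\[\]]+?)\s*(?:\[([^\]]*)\])?\s*(?:,|$)")


def parse_annotation_string(text):
    """Return a dict mapping each term to its set of modifiers."""
    parsed = {}
    position = 0
    while position < len(text):
        match = TERM_PATTERN.match(text, position)
        if match is None:
            raise ValueError(f"cannot parse {text!r} at position {position}")
        term, raw_modifiers = match.group(1), match.group(2) or ""
        # Split the bracket contents on commas, ignoring empty pieces.
        parsed[term] = {m.strip() for m in raw_modifiers.split(",") if m.strip()}
        position = match.end()
    return parsed


print(parse_annotation_string("fruit[small],red[dark]"))
```

A term without brackets, such as `red` in `fruit[medium],red`, simply maps to an empty set of modifiers.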
Installation
Install using pip:
Simple usage
For illustration, we'll work from a simple list of protein IDs and their intracellular localisations, written in the form localisation1[modifier1,modifier2],localisation2[modifier3]. The same procedure applies to other kinds of annotations as well, including ones outside cell biology.
Here is the list of protein IDs and localisations (a protein can have more than one annotation, for example if it has been detected in more than one organelle):
raw_annotation_strings = {
    "protein1": "cytoplasm[points]",
    "protein2": "nucleoplasm,cytoplasm[weak]",
    "protein3": "lysosome,endocytic",
    "protein4": "nucleolus",
    "protein5": "cytoplasm,mitochondrion",
    "protein6": "Golgi apparatus",
    "protein7": "nucleoplasm",
    "protein8": "lysosome",
}
Creating collections of annotations
To turn these strings into AnnotationCollection objects, we wrap each one:
from annotations import AnnotationCollection

protein_annotations = {
    protein: AnnotationCollection(annotation_string)
    for protein, annotation_string in raw_annotation_strings.items()
}
Matching annotations
We'll use plain Python constructs to manipulate and match these annotations. For example, to find all proteins that localise to the cytoplasm, we use the AnnotationCollection.match method:
print([
    protein
    for protein, localisation in protein_annotations.items()
    if localisation.match("cytoplasm")
])
Excluding modifiers
When matching terms, entries can be excluded based on modifiers:
print([
    protein
    for protein, localisation in protein_annotations.items()
    if localisation.match("cytoplasm", exclude_modifiers={"weak"})
])
Requiring modifiers
Conversely, modifiers can also be required:
print([
    protein
    for protein, localisation in protein_annotations.items()
    if localisation.match("cytoplasm", require_modifiers={"points"})
])
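The interplay of required and excluded modifiers can be sketched in plain Python. This is an illustration under an assumed rule, not the package's implementation: a term matches if it is present, carries every required modifier, and carries none of the excluded ones. The `matches` and `matching_proteins` helpers and the dict-of-sets representation are hypothetical.

```python
# Pre-parsed form of three of the example proteins: term -> set of modifiers.
# This plain-dict representation is illustrative, not the package's internals.
parsed = {
    "protein1": {"cytoplasm": {"points"}},
    "protein2": {"nucleoplasm": set(), "cytoplasm": {"weak"}},
    "protein5": {"cytoplasm": set(), "mitochondrion": set()},
}


def matches(terms, name, require_modifiers=frozenset(), exclude_modifiers=frozenset()):
    """Assumed rule: term present, all required modifiers, no excluded ones."""
    if name not in terms:
        return False
    modifiers = terms[name]
    has_required = set(require_modifiers) <= modifiers
    has_excluded = bool(set(exclude_modifiers) & modifiers)
    return has_required and not has_excluded


def matching_proteins(name, **kwargs):
    """List the proteins whose annotations match under the assumed rule."""
    return [protein for protein, terms in parsed.items() if matches(terms, name, **kwargs)]


print(matching_proteins("cytoplasm"))
# ['protein1', 'protein2', 'protein5']
print(matching_proteins("cytoplasm", exclude_modifiers={"weak"}))
# ['protein1', 'protein5']
print(matching_proteins("cytoplasm", require_modifiers={"points"}))
# ['protein1']
```

The printed lists are the output of this sketch only; whether the package treats unmodified terms the same way when modifiers are required is an assumption here.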
Filtering annotations based on modifiers
We can also use the filter_by_modifiers method to create new AnnotationCollections for proteins:
non_weak_protein_annotations = {
    protein: localisation.filter_by_modifiers(
        exclude_modifiers={"weak"},
    )
    for protein, localisation in protein_annotations.items()
}
This removes the cytoplasm[weak] annotation from protein2. The same rules for required and excluded modifiers as above apply.
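As a rough illustration of what filtering means, the sketch below keeps only the terms that pass the modifier conditions and returns them as a new mapping. It does not use the package: `filter_by_modifiers_sketch` is a hypothetical stand-in, and the rule it applies (keep a term if it has all required and no excluded modifiers) is an assumption.

```python
def filter_by_modifiers_sketch(terms, require_modifiers=frozenset(), exclude_modifiers=frozenset()):
    """Return a new term -> modifiers dict with non-matching terms dropped."""
    return {
        name: modifiers
        for name, modifiers in terms.items()
        if set(require_modifiers) <= modifiers
        and not (set(exclude_modifiers) & modifiers)
    }


# protein2 from the example list, in an illustrative pre-parsed form.
protein2 = {"nucleoplasm": set(), "cytoplasm": {"weak"}}

filtered = filter_by_modifiers_sketch(protein2, exclude_modifiers={"weak"})
print(filtered)  # {'nucleoplasm': set()}
```

Returning a new mapping rather than editing in place leaves the original annotations available for other filters.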
Using annotation ontologies
Annotations often come organised into formalised ontologies, i.e. an annotation hierarchy. An example of this in the above localisations is that both the nucleolus and the nucleoplasm annotations are considered part of the cell nucleus. A common task is then to find all proteins that localise to the nucleus. This is made vastly easier by defining the following ontology:
raw_ontology = [
    {
        "name": "cytoplasm",
        "children": [
            {
                "name": "endocytic",
                "children": [
                    {
                        "name": "lysosome",
                        "children": [],
                    },
                ],
            },
            {
                "name": "Golgi apparatus",
                "children": [
                    {
                        "name": "lysosome",
                        "children": [],
                    },
                ],
            },
        ],
    },
    {
        "name": "mitochondrion",
        "children": [],
    },
    {
        "name": "nucleus",
        "children": [
            {
                "name": "nucleoplasm",
                "children": [],
            },
            {
                "name": "nucleolus",
                "children": [],
            },
        ],
    },
]
Creating the ontology
We can turn this into an ontology object by using the Ontology class and filling it with OntologyEntry objects:
from annotations import Ontology, OntologyEntry

ontology = Ontology()


def recurse_raw_ontology(entry):
    # Create the entry and register it in the ontology under its name.
    ontology_entry = OntologyEntry(name=entry["name"])
    ontology.entries[ontology_entry.name] = ontology_entry
    # Build each child recursively and link it to its parent in both directions.
    for child in entry["children"]:
        child = recurse_raw_ontology(child)
        child.set_parent(ontology_entry)
        ontology_entry.add_child(child)
    return ontology_entry


# Each top-level entry of the raw ontology becomes a root entry.
for root_entry in raw_ontology:
    ontology.root_entries.append(recurse_raw_ontology(root_entry))
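The same depth-first walk can be tried without the package. The sketch below flattens a nested name/children structure into a child-to-parents mapping; `collect_parents` and the plain-dict result are hypothetical and only illustrate the recursion. It also shows that an entry listed under two parents, such as lysosome, ends up with both of them.

```python
# Part of the example ontology, in the same name/children shape.
nested = [
    {
        "name": "cytoplasm",
        "children": [
            {"name": "endocytic", "children": [{"name": "lysosome", "children": []}]},
            {"name": "Golgi apparatus", "children": [{"name": "lysosome", "children": []}]},
        ],
    },
]


def collect_parents(entries, parent=None, parents=None):
    """Walk the entries depth-first and record the parents of each name."""
    if parents is None:
        parents = {}
    for entry in entries:
        # setdefault registers root entries too, with an empty parent set.
        known_parents = parents.setdefault(entry["name"], set())
        if parent is not None:
            known_parents.add(parent)
        collect_parents(entry["children"], entry["name"], parents)
    return parents


parent_map = collect_parents(nested)
print(sorted(parent_map["lysosome"]))  # ['Golgi apparatus', 'endocytic']
```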
Creating ontology annotations
We then use the ontology to create ontology-connected OntologyAnnotationCollection objects for our proteins:
from annotations import OntologyAnnotationCollection

protein_ontology_annotations = {
    protein: OntologyAnnotationCollection(annotation_string, ontology)
    for protein, annotation_string in raw_annotation_strings.items()
}
Matching based on ontology
As above, we filter our proteins based on a term, this time nucleus. This term does not occur in the localisation annotations of any of our proteins, but it is the parent of the nucleoplasm and nucleolus localisations in the ontology.
print([
    protein
    for protein, localisation in protein_ontology_annotations.items()
    if localisation.match("nucleus")
])
Again, as with regular annotations, the matching can be adjusted with require_modifiers and exclude_modifiers.
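The idea behind ontology-based matching can be sketched as follows: a query matches an annotation if it equals the annotated term or any of that term's ancestors. This is an assumed reading of the behaviour described above, not the package's code; the `parents` mapping, `with_ancestors` and `matches_in_ontology` are hypothetical.

```python
# Hypothetical child -> parents mapping for the example ontology.
parents = {
    "nucleoplasm": {"nucleus"},
    "nucleolus": {"nucleus"},
    "lysosome": {"endocytic", "Golgi apparatus"},
    "endocytic": {"cytoplasm"},
    "Golgi apparatus": {"cytoplasm"},
}


def with_ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def matches_in_ontology(terms, query):
    """True if the query is one of the terms or an ancestor of one."""
    return any(query in with_ancestors(term) for term in terms)


# Term names only; modifiers are left out to keep the sketch short.
annotations = {
    "protein2": ["nucleoplasm", "cytoplasm"],
    "protein4": ["nucleolus"],
    "protein6": ["Golgi apparatus"],
    "protein8": ["lysosome"],
}

print([p for p, terms in annotations.items() if matches_in_ontology(terms, "nucleus")])
# ['protein2', 'protein4']
```

Walking upwards from each annotated term keeps the lookup simple even when a term has more than one parent.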