GENO ontology

GENO is an OWL model of genotypes, their more fundamental sequence components, and links to related biological and experimental entities.

At present many parts of the model are exploratory and set to undergo refactoring. In addition, many classes and properties have GENO URIs but are placeholders for classes that will be imported from an external ontology (e.g. SO, ChEBI, OBI, etc.). Furthermore, ongoing work will implement a model of genotype-to-phenotype associations. This will support description of asserted and inferred relationships between genotypes, phenotypes, and environments, and the evidence/provenance behind these associations. Documentation is under development as well, and for now a slide deck is available at http://www.slideshare.net/mhb120/brush-icbo-2013

Open in the Ontology Lookup Service (OLS)


allele [GENO_0000512]

One of a set of sequence features known to exist at a particular genomic location. An allele is a sequence feature at a genomic location where variation occurs (i.e. where >1 different sequence is known to exist). An allele can span only the extent of sequence known to vary (e.g. a single base SNP, or short insertion), or it can span a larger extent that includes one or more variable features as proper parts (e.g. a ‘gene allele’ that spans the extent of an entire gene which contains several sequence alterations). Alleles can carry ‘reference’ or ‘variant’ sequence - depending on whether its ‘state’ matches that considered to be the reference at that location. Alleles whose state differs from the reference are called ‘variant alleles’, and those that match the reference are called ‘reference alleles’. What is considered the ‘reference’ state at a particular location may vary, depending on the context/goal of a particular analysis. A ‘sequence alteration’ is a ‘variant allele’ that varies along its entire extent (i.e. every position varies from that of some defined reference sequence).

allele origin [GENO_0000877]

A quality inhering in an allele that describes its genetic origin (how it came to be part of a cell’s genome), i.e. whether it occurred de novo through some spontaneous mutation event, or was inherited from a parent.

allele set [GENO_0000954]

A set of discrete alleles within a particular genome. ‘Sets’ are used to model entities that can be comprised of multiple discrete elements - but which can also contain zero or a single member. An ‘allele set’ represents any collection of 0 or more discrete alleles found within a particular genome. The alleles in such a set can be located at distant or close locations in the genome, and if on the same chromosome can be in trans, in cis, or even overlapping. When the members of such a set are found ‘in cis’ on the same chromosome, they may constitute a ‘haplotype’. When found ‘in trans’ at the same location on homologous chromosomes, they may constitute a ‘single locus complement’.

allelic cellular distribution [GENO_0000926]

A quality inhering in an allele reflecting whether it is found in all cells of an organism’s body, or just some clonal subset (e.g. in mosaicism).

allelic genotype [GENO_0000823]

A genotype that specifies the ‘allelic state’ at a particular location in the genome - i.e. the set of alleles present at this locus across all homologous chromosomes. An ‘allelic genotype’ describes the set of alleles present at a particular location in the genome. This use of the term ‘genotype’ reflects its use in clinical genetics where variation has historically been assessed at a specific locus, and a genotype describes the allelic state at that particular location. This contrasts with the use of the term ‘genotype’ in model organism communities, where it commonly describes the allelic state at all loci in a genome known to vary from an established reference or background.

allelic phase [GENO_0000886]

A quality inhering in a collection of discontinuous sequence features in a single genome in virtue of their relative position on the same or separate chromosomes.

allelic state [GENO_0000875]

A quality inhering in an ‘allelic complement’ (aka a ‘single locus complement’) that describes the allelic variability found at a particular locus in the genome of a single cell/organism.

allosomal inheritance [GENO_0000935]

An inheritance pattern wherein the trait is determined by alleles of a single causal gene on a sex chromosome.

amino acid residue [GENO_0000782]

[biological sequence unit; amino acid residue]

amino acid sequence [GENO_0000722]

[has_sequence_unit; amino acid sequence; amino acid residue; biological sequence]

ancestral polymorphic allele [GENO_0000500]

A polymorphic allele that is determined from the sequence of a recent ancestor in a phylogenetic tree.

aneusomic [GENO_0000513]

A sequence attribute of a chromosome or chromosomal region that has been abnormally duplicated or lost, as the result of a non-disjunction event or unbalanced translocation.

aneusomic chromosomal part [GENO_0000343]

A large deletion or terminal addition of part of some non-homologous chromosome, as the result of an unbalanced translocation. Aneusomic chromosomal parts are examples of “partial aneuploidy” as described in http://en.wikipedia.org/wiki/Aneuploidy: “The terms ‘partial monosomy’ and ‘partial trisomy’ are used to describe an imbalance of genetic material caused by loss or gain of part of a chromosome. In particular, these terms would be used in the situation of an unbalanced translocation, where an individual carries a derivative chromosome formed through the breakage and fusion of two different chromosomes. In this situation, the individual would have three copies of part of one chromosome (two normal copies and the portion that exists on the derivative chromosome) and only one copy of part of the other chromosome involved in the derivative chromosome.”

aneusomic chromosome [GENO_0000346]

A complete chromosome that has been abnormally duplicated, or the absence of a chromosome that has been lost, typically as the result of a non-disjunction event or unbalanced translocation. Large sequence features gained in a genome are considered to be sequence alterations (akin to insertions), including aneusomic chromosome segments gained through unbalanced translocation events, entire aneusomic chromosomes gained through a non-disjunction event during replication, or extrachromosomal replicons that become part of the heritable genome of a cell or organism. Similarly, large sequence features lost from a genome are akin to deletions and therefore also considered sequence alterations. This includes the loss of chromosomal segments through unbalanced translocation events, and the loss of entire chromosomes through a non-disjunction event during replication.

aneusomic zygosity [GENO_0000392]

[aneusomic zygosity]

autosomal dominant inheritance [GENO_0000147]

An inheritance pattern wherein a trait caused by alleles of an autosomal gene manifests in heterozygotes.

autosomal inheritance [GENO_0000934]

An inheritance pattern wherein the trait is determined by alleles of a single causal gene on a non-sex chromosome.

autosomal recessive inheritance [GENO_0000148]

An inheritance pattern wherein a trait caused by alleles of an autosomal gene manifests in homozygous but not heterozygous individuals.

background genome [GENO_0000010]

A reference genome that represents the sequence of a genome from which a variant genome is derived (through the introduction of sequence alterations).

biological process [GENO_0000351]

[biological process]

biological sequence [GENO_0000702]

A linear ordering of units representing monomers of a biological macromolecule (e.g. nucleotides in DNA and RNA, amino acids in polypeptides). ‘Sequences’ differ from ‘sequence features’ in that instances are distinguished only by their inherent ordering of units, and not by any positional aspect related to alignment with some reference sequence. Accordingly, the ‘ATG’ translational start codon of the human AKT gene is the same sequence as the ‘ATG’ start codon of the human SHH gene, but these represent two distinct sequence features in virtue of their different positions in the genome.
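The sequence versus sequence feature distinction above can be sketched in Python. This is only an illustration and not part of GENO: the class names `Sequence` and `SequenceFeature`, the reference labels, and the coordinates are all hypothetical placeholders rather than real genomic positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequence:
    """Identified only by its inherent ordering of units."""
    units: str


@dataclass(frozen=True)
class SequenceFeature:
    """Identified by its sequence plus its position on some reference."""
    sequence: Sequence
    reference: str  # hypothetical reference label
    start: int      # placeholder coordinate, not a real position
    end: int


atg = Sequence("ATG")

# Two start codons placed at different (made-up) positions.
akt_start = SequenceFeature(atg, "refA", 100, 103)
shh_start = SequenceFeature(atg, "refB", 500, 503)

# Same sequence, but distinct features because the positions differ.
same_sequence = akt_start.sequence == shh_start.sequence
same_feature = akt_start == shh_start
```

Under these assumptions, comparing the two start codons by sequence reports them as equal, while comparing them as features does not.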

biological sequence or set [GENO_0000921]

A biological sequence, or set of such sequences.

biological sequence set [GENO_0000922]

A set of biological sequences. ‘Sets’ are used to represent entities that are typically collections of more than one member. But we allow for sets that contain 0 members (an ‘empty’ set) or 1 member (a ‘singleton’ or ‘unit’ set), consistent with the concept of ‘mathematical sets’. A set may also include multiple copies of the same sequence. For example, in a ‘copy number complement’, members are all copies of the same biological sequence.

biological sequence unit [GENO_0000779]

[biological sequence unit]

biparental allele origin [GENO_0000976]

Describes an allele that is part of an allelic complement where one allele is maternally inherited and the other paternally inherited. Biparental inheritance of alleles is typical of normal Mendelian inheritance, where offspring inherit a maternal and a paternal copy of a given gene.

chromosomal band intensity [GENO_0000618]

[chromosomal band intensity; sequence feature attribute]

chromosomal deletion inheritance [GENO_0000970]

An inheritance pattern wherein the trait is determined by inheritance of missing sections of one or more chromosomes, encompassing either 0 or multiple genes, possibly together with environmental factors.

chromosomal duplication inheritance [GENO_0000971]

An inheritance pattern wherein the trait is determined by inheritance of duplicated sections of one or more chromosomes, encompassing either 0 or multiple genes, possibly together with environmental factors.

chromosomal inheritance [GENO_0000969]

An inheritance pattern wherein the trait is determined by inheritance of extra, missing, or re-arranged chromosomes, possibly together with environmental factors.

chromosomal rearrangement inheritance [GENO_0000972]

An inheritance pattern wherein the trait is determined by inheritance of translocation or inversion of sections of one or more chromosomes, possibly together with environmental factors.

chromosomal region [GENO_0000614]

An extended part of a chromosome representing a term of convenience in order to hierarchically organize morphologically defined chromosome features: chromosome > arm > region > band > sub-band.

chromosome sub-band [GENO_0000616]

[has_sequence_attribute; chromosomal band intensity; chromosome band; is part of; chromosome_part; chromosome sub-band]

clonal [GENO_0000928]

A cellular distribution in which an allele is found only in some clonal subset of cells in an organism, typically in virtue of its somatic origin.

co-dominant autosomal inheritance [GENO_0000143]

An autosomal dominant inheritance pattern wherein a heterozygous individual simultaneously expresses the distinct traits associated with each allele in the heterozygous locus.

co-dominant X-linked inheritance [GENO_0000939]

An X-linked dominant inheritance pattern wherein a heterozygous individual simultaneously expresses the distinct traits associated with each allele in the heterozygous locus.

co-dominant Z-linked inheritance [GENO_0000946]

A Z-linked dominant inheritance pattern wherein a heterozygous individual simultaneously expresses the distinct traits associated with each allele in the heterozygous locus.

complete autosomal dominant inheritance [GENO_0000144]

An autosomal dominant inheritance pattern wherein the trait associated with one allele completely masks the trait associated with a different allele found at that locus.

complete X-linked dominant inheritance [GENO_0000937]

An X-linked dominant inheritance pattern wherein the trait associated with one allele completely masks the trait associated with a different allele found at that locus.

complete Z-linked dominant inheritance [GENO_0000944]

A Z-linked dominant inheritance pattern wherein the trait associated with one allele completely masks the trait associated with a different allele found at that locus.

compound heterozygous [GENO_0000402]

A heterozygous quality inhering in a single locus complement comprised of two different variant alleles and no wild type locus (e.g. fgf8a/fgf8a).

constitutional [GENO_0000927]

A cellular distribution in which an allele is found in all cells of an organism’s body, typically in virtue of its germline origin.

copy number complement [GENO_0000961]

A set representing the complement of all copies of a particular biological sequence (typically at the scale of complete genes or larger) present in a particular genome. The notion of a ‘complement’ is useful as a special case of a set, where the members necessarily comprise an exhaustive collection of all objects that make up some well-defined set. Here, a ‘copy number complement’ represents the set of all copies of a specified sequence in a particular genome. Note that sequences can be duplicated in a set (i.e. a set can contain more than one member representing the same sequence). In the ‘copy number complement’ example, each set member is a copy of this same biological sequence. The count of how many copies of a particular sequence are found in a genome is the sequence’s ‘copy number’. In diploid organisms, the normal copy number for sequences at most locations is 2 (a notable exception being those on the X-chromosome in XY males, where normal copy number is 1). Variations in copy number occur if this count increases due to a duplication of the gene/region, or decreases due to a deletion of a gene/region. A driving use case for representing copy number is to support associations between variation in copy number of a particular sequence, and phenotypes or diseases that can result. A ‘complement’ refers to an exhaustive collection of all objects that make up some well-defined set. Such a set may contain 0, 1, or more than one member. The notion of a complement is useful for defining many biologically-relevant sets of sequence features, such as ‘copy number complements’ representing the set of all copies of a particular sequence in a genome. The fact that we are counting how many copies of the same sequence exist in a genome here, as opposed to how many of the same feature, is what sets sequence-level concepts like ‘copy number complement’ apart from feature-level concepts like ‘single locus complement’.
To illustrate the difference, consider a duplication event that creates a new copy of the human APOE gene on a different chromosome. This creates an entirely new sequence feature at a distinct locus from that of the original APOE gene. The ‘copy number complement’ for the sequence defined by the APOE gene locus would have a count of three, as this sequence is present three times in the genome. But the ‘single locus complement’ at the APOE gene locus would still have a count of two - because the duplicated copy is at a different location in the genome, and therefore does not represent a copy of the APOE locus.
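The APOE illustration can be restated as a small counting exercise in Python. This is a minimal sketch under stated assumptions: the tuple layout and the labels `APOE_locus` and `duplication_site` are hypothetical and do not come from GENO.

```python
from collections import Counter

# Each entry is (sequence label, locus label) for one feature in a
# hypothetical diploid genome after the APOE sequence was duplicated.
features = [
    ("APOE", "APOE_locus"),        # copy on one homolog
    ("APOE", "APOE_locus"),        # copy on the other homolog
    ("APOE", "duplication_site"),  # new copy on a different chromosome
]

# Copy number complement: count by sequence, ignoring location.
copy_number = Counter(seq for seq, _ in features)["APOE"]

# Single locus complement: count only features occupying the APOE locus.
single_locus_count = sum(1 for _, locus in features if locus == "APOE_locus")
```

Counting by sequence gives three, while counting by locus gives two, matching the distinction drawn in the text.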

danio rerio gene [GENO_0000047]

A gene that originates from the genome of Danio rerio.

danio rerio strain [GENO_0000119]

[danio rerio strain; strain or breed; has_member; Danio rerio]

de novo allele origin [GENO_0000880]

An attribute describing an allele that originated through a mutation event in a germ cell of one of the parents, or in the fertilized egg itself during early embryogenesis. We distinguish germline, somatic, and de novo allele origin based on a combination of two key criteria - whether the allele is inherited from a parent, and whether it is ‘heritable’ by offspring. De novo variants are heritable but not inherited - as they are not observed in either parent, but can be passed to offspring in virtue of their being present in the individual’s germ cells. By contrast, germline variants are both inherited (passed down from a parent) and heritable (passable down to offspring), and somatic variants are neither inherited nor heritable - having originated via a spontaneous mutation in a non-germ cell. De novo variants appear for the first time in one family member. They often explain genetic disorders in which an affected child has a mutation in every cell in the body but the parents do not, and there is no family history of the disorder.
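The two criteria (inherited, heritable) can be read as a small decision table. The following Python sketch encodes it; the function name `classify_allele_origin` and the `"unclassified"` fallback are assumptions made for illustration, since the text does not describe the inherited-but-not-heritable combination.

```python
def classify_allele_origin(inherited: bool, heritable: bool) -> str:
    """Map the two criteria from the text onto an allele origin label."""
    if inherited and heritable:
        return "germline"   # passed down from a parent, passable to offspring
    if not inherited and heritable:
        return "de novo"    # absent in both parents, present in germ cells
    if not inherited and not heritable:
        return "somatic"    # arose in a non-germ cell
    # Inherited but not heritable is not covered by the definition above.
    return "unclassified"


origins = {
    (inh, her): classify_allele_origin(inh, her)
    for inh in (True, False)
    for her in (True, False)
}
```

Here `origins` holds the label for each of the four combinations, which makes the one case left open by the definition explicit.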

digenic inheritance [GENO_0000930]

A multifactorial inheritance pattern that is determined by the simultaneous action of alleles in two genes.

diplotype [GENO_0000885]

An allelic genotype specifying the set of two alleles present at a particular location in a diploid genome (i.e., a diploid ‘single locus complement’). Alt: A sequence feature complement comprised of two haplotypes at a particular location on paired homologous chromosomes in a diploid genome. “Humans are diploid organisms; they have paired homologous chromosomes in their somatic cells, which contain two copies of each gene. An allele is one member of a pair of genes occupying a specific spot on a chromosome (called locus). Two alleles at the same locus on homologous chromosomes make up the individual’s genotype. A haplotype (a contraction of the term ‘haploid genotype’) is a combination of alleles at multiple loci that are transmitted together on the same chromosome. Haplotype may refer to as few as two loci or to an entire chromosome depending on the number of recombination events that have occurred between a given set of loci. Genewise haplotypes are established with markers within a gene; familywise haplotypes are established with markers within members of a gene family; and regionwise haplotypes are established within different genes in a region at the same chromosome. Finally, a diplotype is a matched pair of haplotypes on homologous chromosomes.” From https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4118015/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4118015/figure/sap-26-03-165-g002/

disomic zygosity [GENO_0000391]

[disomic zygosity]

DNA residue [GENO_0000780]

[DNA residue; biological sequence unit]

DNA sequence [GENO_0000720]

[DNA residue; has_sequence_unit; biological sequence; DNA sequence]

effective genotype [GENO_0000525]

A genotype that describes the total intrinsic and extrinsic variation across a genome at the time of a phenotypic assessment (where ‘intrinsic’ refers to variation in genomic sequence, as mediated by sequence alterations, and ‘extrinsic’ refers to variation in gene expression, as mediated through transient gene-specific interventions such as gene knockdown reagents or overexpression constructs). An effective genotype is meant to summarize all factors related to genes and their expression that influence an observed phenotype - including ‘intrinsic’ alterations in genomic sequence, and gene-specific ‘extrinsic’ alterations in expression transiently introduced at the time of the phenotypic assessment.

engineered genetic construct [GENO_0000856]

An engineered region that is used to transfer foreign genetic material into a host cell. Constructs can be engineered to carry inserts of DNA from external sources, for purposes of cloning and propagation or gene expression in host cells. Constructs are typically packaged as part of delivery systems such as plasmids or viral vectors.

expressed transgene region [GENO_0000638]

A transgene part whose sequence is expressed in a gene product through transcription and/or translation.

expression construct [GENO_0000495]

[engineered genetic construct; expression construct]

expression-qualified sequence feature [GENO_0000737]

A sequence feature whose identity is additionally dependent on factors specifically influencing its level of expression in the context of a biological system (e.g. being targeted by gene-knockdown reagents, or driven from an exogenous expression system such as a recombinant construct).

expression-variant gene [GENO_0000529]

A gene altered in its expression level relative to some baseline of normal expression in the system under investigation (e.g. a cell line or model organism). Expression-variant genes are altered in their expression level through some modification or intervention external to their sequence and position. These may include endogenous mechanisms (e.g. direct epigenetic modifications that impact expression level, or altered regulatory networks controlling gene expression), or experimental interventions (e.g. targeting by a gene-knockdown reagent, or being transiently expressed as part of a transgenic construct in a host cell or organism). The identity of a given instance of an expression-variant gene is dependent on how its level of expression is manipulated in a biological system (i.e. via targeting by gene-knockdown reagents, or being transiently overexpressed). So expression-variant genes have the additional identity criteria of a genetic context of their material bearer (external to its sequence and position) that impacts their level of expression in a biological system.

extra-chromosomal transgene [GENO_0000861]

A transgene that is not chromosomally integrated in the host genome, but instead exists as part of an extra-chromosomal construct.

extrachromosomal replicon [GENO_0000494]

A genetic feature that is not part of the chromosomal genome of a cell or virion, but rather a stable and heritable element that is replicated and passed on to progeny (e.g. a replicative plasmid or transposon).

extrinsic genotype [GENO_0000524]

A specification of the known state of gene expression across a genome, and how it varies from some baseline/reference state. An extrinsic genotype describes variation in the ‘expression level’ of genes in a cell or organism, as mediated by transient, gene-specific experimental interventions such as RNAi, morpholinos, TALENs, CRISPR, or construct overexpression. This concept is relevant primarily for model organisms and systems that are subjected to such interventions to determine how altered expression of specific genes may impact organismal or cellular phenotypes in the context of a particular experiment. The ‘extrinsic genotype’ concept is contrasted with the more familiar notion of an ‘intrinsic genotype’, describing variation in the inherent genomic sequence (i.e. ‘allelic state’). In G2P research, interventions affecting both genomic sequence and gene expression are commonly applied in order to assess the impact specific genomic features can have on phenotype and disease. It is in this context that we chose to model ‘extrinsic’ alterations in expression as genotypes - to support parallel conceptualization and representation of these different types of genetic variation that inform the discovery of G2P associations.

female intrinsic genotype [GENO_0000647]

A genomic genotype where the genomic background specifies a female sex chromosome complement.

functional copy complement [GENO_0000963]

A set representing the complement of all functional versions of a specified sequence (typically that of a gene) in a particular genome. A ‘complement’ refers to an exhaustive collection of all objects that make up some well-defined set. Such a set may contain 0, 1, or more than one member. The notion of a complement is useful for defining many biologically-relevant sets of sequence features, such as the set of all functional copies of a particular sequence in a genome. This is known as the ‘functional copy number’ or ‘genetic dosage’ of the sequence. ‘Functional copies’ of a sequence are those that exhibit normal activity and/or produce gene products that exhibit normal activity associated with the sequence. The count of functional copies of a gene is often referred to as its ‘dosage’. In diploid organisms, the normal ‘dosage’ is 2 for autosomal genes/regions. Dosage increases if there is a duplication of a functional gene/region. Dosage decreases if there is either a deletion of a gene/region, or an inactivating mutation that eliminates gene function. This sets it apart from the notion of a ‘copy number complement’, which reflects how many copies of a sequence exist in a genome, regardless of their functionality. Addition of a non-functional allele of a gene will increase its copy number, but not increase its dosage. As we saw for ‘copy number complement’, the defining sequence here is specified in terms of a location on a reference sequence - typically the location where a gene or set of genes resides. But the criteria for membership in a ‘functional’ copy number complement require only that the feature can perform the functions associated with the gene or genes at the defining location. A gene allele that varies by only one nucleotide from the wild-type gene may not qualify as functional if that alteration eliminates the activity of the allele.
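The difference between copy number and dosage can be sketched as two counts over the same collection. This Python example is illustrative only: the `GeneCopy` class, its `functional` flag, and the copy labels are hypothetical and not part of GENO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneCopy:
    label: str
    functional: bool  # assumed to be known from some upstream annotation


# A hypothetical genome with two working copies of a gene plus one
# added copy that carries an inactivating mutation.
copies = [
    GeneCopy("wild-type copy, homolog 1", True),
    GeneCopy("wild-type copy, homolog 2", True),
    GeneCopy("added copy with inactivating mutation", False),
]

# Copy number complement: every copy counts, functional or not.
copy_number = len(copies)

# Functional copy complement: only functional copies count (the dosage).
dosage = sum(1 for c in copies if c.functional)
```

In this made-up case the added non-functional copy raises the copy number to three while the dosage stays at two.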

gained aneusomic chromosomal segment [GENO_0000344]

A part of some non-homologous chromosome that has been gained as the result of an unbalanced translocation event. Such additions of translocated chromosomal parts confer a trisomic condition to the duplicated region of the chromosome, and are thus considered to be ‘variant single locus complements’ in virtue of an abnormal number of features at a particular genomic location, rather than abnormal sequence within the location.

gained aneusomic chromosome [GENO_0000338]

A complete chromosome that has been abnormally duplicated in a genome, typically as the result of a meiotic non-disjunction event or unbalanced translocation. This ‘gained’ chromosome is conceptually an ‘insertion’ in a genome that received two copies of a chromosome in a cell division following a non-disjunction event. As such, it qualifies as a type of sequence_alteration, and as an ‘extra’ chromosome.

gene allele [GENO_0000014]

A genomic feature that represents one of a set of versions of a gene (i.e. a haplotype whose extent is that of a gene). In SO, the concept of a ‘gene’ is functionally defined, in that a gene necessarily produces a functional product. By contrast, the concept of a ‘gene allele’ here is positionally defined - representing the sequence present at the location a gene resides in a reference genome (based on sequence alignment). An Shh gene allele, for example, may be a fully functional wild-type version of the gene, a non-functional version carrying a deleterious point mutation, a truncated version of the gene, or even a complete deletion. In all these cases, an ‘Shh gene allele’ exists at the position where the canonical gene resides in the reference genome - even if the extent of this allele is different from that of the wild-type, or even zero in the case of the complete deletion. A genomic feature being an allele_of a gene is based on its location in a host genome - not on its sequence. This means, for example, that the insertion of the human SMN2 gene into the genome of a mouse (see http://www.informatics.jax.org/allele/MGI:3056903) DOES NOT represent an allele_of the human SMN2 gene according to the GENO model - because it is located in a mouse genome, not a human one. Rather, this is a transgenic insertion that derives_sequence_from the human SMN2 gene. If this human SMN2 gene is inserted within the mouse SMN2 gene locus (e.g. used to replace mouse SMN2 gene), the feature it creates is an allele_of the mouse SMN2 gene (one that happens to match the sequence of the human ortholog of the gene). But again, it is not an allele_of the human SMN2 gene.

gene knockdown reagent [GENO_0000533]

[gene knockdown reagent; engineered_region]

gene part [GENO_0000666]

A genomic feature that is part of a gene, and delineated by some functional or structural role it serves (e.g. a promoter element, coding region, etc.).

gene product [GENO_0000907]

The molecular product resulting from transcription of a single gene (either a protein or RNA molecule).

gene trap insertion [GENO_0000092]

[Insertion; gene trap insertion]

genetic material [GENO_0000482]

A nucleic acid molecule that contains one or more sequences serving as a template for gene expression in a biological system (i.e. a cell or virion). This class is different from genomic material in that genomic material is necessarily heritable, while genetic material includes genomic material, as well as any additional nucleic acids that participate in gene expression resulting in a cellular or organismal phenotype. So things like transiently transfected expression constructs would qualify as ‘genetic material’ but not ‘genomic material’. Things like siRNAs and morpholinos affect gene expression indirectly (i.e. are not templates for gene expression), and therefore do not qualify as genetic material.

genomic background [GENO_0000611]

A genomic genotype that specifies the baseline sequence of a genome from which a variant genome is derived (through the introduction of sequence alterations).

genomic entity [GENO_0000897]

A generically dependent continuant that carries biological sequence that is part of or derived from a genome.

genomic feature [GENO_0000481]

A sequence feature (continuous extent of biological sequence) that is of genomic origin (i.e. carries sequence from the genome of a cell or organism). 1. A feature being ‘of genomic origin’ here means only that its sequence has been located to the genome of some organism by alignment with some reference genome. This is because the sequence was originally identified in, or artificially created to replicate, sequence from an organism’s genome. 2. The location of a genomic feature is defined by start and end coordinates based on alignment with a reference genome. Genomic features can span any size from a complete chromosome, to a chromosomal band or region, to a gene, to a single base pair or even junction between base pairs (this would be a sequence feature with an extent of zero). 3. As sequence features, instances of genomic features are identified by both their inherent sequence and their position in a genome - as determined by an alignment with some reference sequence. Accordingly, the ‘ATG’ start codon in the coding DNA sequence of the human AKT gene and the ‘ATG’ start codon in the human SHH gene represent two distinct genomic features despite having the same sequence, in virtue of their different positions in the genome.

genomic feature location [GENO_0000902]

The location of a sequence feature in a genome, defined by its start and end position on some reference genomic coordinate system. 1. A genomic location (aka locus) is defined by its begin and end coordinates on a reference genome, independent of a particular sequence that may reside there. In GENO, we say that a genomic location is occupied_by a ‘sequence feature’ - where the identity of this feature depends on both its sequence, and its location in the genome (i.e. the locus it occupies). For example, the ‘ATG’ sequence beginning the ORF of the human SHH gene shares the same sequence as the ‘ATG’ beginning the ORF of the human AKT gene. But these are distinct sequence features because they occupy different genomic locations. 2. A given genomic location (e.g. the human SHH gene locus) may be occupied by different alleles (e.g. different alleles of the SHH gene). Within the genome of a single diploid organism, there is potential for two alleles to exist at such a locus (i.e. two different versions of the SHH gene). And across genomes of all members of a species, many more alleles of the SHH gene may exist and occupy this same locus. 3. The notion of a genomic location in the realm of biological sequences is analogous to a BFO:spatiotemporal region in the realm of physical entities. A spatiotemporal region can be occupied_by physical objects, while a genomic location is occupied_by sequence features. Just as a spatiotemporal region is distinct from an object that occupies it, so too a genomic locus is distinct from a sequence feature that occupies it. As a more concrete example, consider the distinction between a street address and the building that occupies it as analogous to the relationship between a genomic location and the feature that resides there.

genomic feature set [GENO_0000660]

A set of genomic features (i.e. sequence features that are of genomic origin). A genomic feature is any located sequence feature in the genome, from a single nucleotide to a gene to an entire chromosome. ‘Sets’ are used to represent entities that are typically collections of more than one member - e.g. the set of chromosomes that make up the human genome. But we allow for sets that contain 0 members (an ‘empty’ set) or 1 member (a ‘singleton’ or ‘unit’ set), consistent with the concept of ‘mathematical sets’. For example, a ‘single locus complement’ at an X-linked locus in an XY male will consist of only one allele, as there is only one X-chromosome in the genome. Note also that sets may contain duplicates (i.e. more than one member representing the same feature). For example, a homozygous ‘single locus complement’ is a set comprised of two of the same feature. The notion of a ‘genomic feature set’ differs from that of a ‘genomic sequence set’ in that we are counting how many copies of the same sequence feature exist in a genome, as opposed to how many of the same sequence. ‘Genomic feature sets’ are useful for representing things like ‘single locus complements’, where members are sequence features whose identity is dependent on their location. By contrast, ‘genomic sequence sets’ are useful for describing things like ‘copy number complements’, which are concerned only with how many copies of a sequence exist in a genome, regardless of the location where these reside.
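
The difference between counting features and counting sequences can be illustrated with multisets. The sketch below assumes a feature can be reduced to a (sequence, locus) pair. The sequences and locus names are invented for the example.

```python
from collections import Counter

# Each feature is a (sequence, locus) pair; all values are illustrative.
features = [
    ("ACGT", "locus1"),  # first copy at locus1
    ("ACGT", "locus1"),  # second copy at locus1 (homozygous complement)
    ("ACGT", "locus2"),  # same sequence found at another locus
]

# A feature set counts copies of each located feature.
feature_set = Counter(features)

# A sequence set counts copies of each sequence, ignoring location.
sequence_set = Counter(sequence for sequence, _locus in features)

print(feature_set[("ACGT", "locus1")])  # copies of the feature at locus1
print(sequence_set["ACGT"])             # copies of the sequence genome-wide
```

A `Counter` is used rather than a plain `set` because the text allows duplicate members, as in a homozygous ‘single locus complement’.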

genomic genotype [GENO_0000899]

A genotype that describes the total variation in heritable genomic sequence of a cell or organism, typically in terms of alterations from some reference or background genotype. 1. A genomic genotype is a short-hand specification of a genome that uses a representational syntax comprised of information about a reference genome (‘genomic background’), and all specific variants from this reference (the ‘genomic variation complement’). Conceptually, this variant genome sequence can be resolved by substituting all sequences specified by the ‘genomic variation complement’ for the corresponding sequences in the reference ‘genomic background’ sequence. 2. ‘Heritable’ genomic sequence is that which is passed on to subsequent generations of cells/organisms, and includes all chromosomal sequences, the mitochondrial genome, and any transmissible extrachromosomal replicons.
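
The conceptual substitution described in note 1 can be sketched as follows. This is a simplified illustration under stated assumptions: the background is a single haploid string, each variant is a hypothetical (start, end, replacement) triple with 0-based, end-exclusive coordinates, and variants do not overlap. None of these conventions come from GENO itself.

```python
def resolve_genome(background, variation_complement):
    """Substitute each variant sequence into the background sequence.

    variation_complement: iterable of (start, end, replacement) triples,
    using 0-based, end-exclusive coordinates on the background.
    """
    pieces = []
    cursor = 0
    for start, end, replacement in sorted(variation_complement):
        if start < cursor:
            raise ValueError("variants must not overlap")
        pieces.append(background[cursor:start])  # unchanged background
        pieces.append(replacement)               # variant sequence
        cursor = end
    pieces.append(background[cursor:])           # remaining background
    return "".join(pieces)


background = "AAACCCGGGTTT"
variants = [(3, 6, "T"), (9, 10, "GG")]  # made-up alterations
resolved = resolve_genome(background, variants)
print(resolved)  # AAATGGGGGTT
```

An empty variation complement returns the background unchanged, which corresponds to a genotype with no known variation.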

genomic genotype (sex-agnostic) [GENO_0000000]

A genomic genotype that does not specify the sex determining chromosomal features of its bearer (i.e. does not indicate the background sex chromosome complement). In practice, most genotype instances are classified as sex-agnostic genotypes because they are not sex-specific. When a genotype is indicated to be that of a male or female, it implies a known sex chromosome complement in the genomic background. This requires us to distinguish separate ‘sex-qualified’ genotype instances for males and females that share a common ‘sex-agnostic’ genotype. For example, male and female mice that are of the same strain/background and contain the same set of genetic variations will have the same sex-agnostic intrinsic genotype, but different sex-qualified intrinsic genotypes (which take into account background sex chromosome sequence as identifying criteria for genotype instances).

genomic genotype (sex-qualified) [GENO_0000645]

A genomic genotype where the genomic background specifies a male or female sex chromosome complement. We distinguish the notion of a sex-agnostic intrinsic genotype, which does not specify whether the portion of the genome defining organismal sex is male or female, from the notion of a sex-qualified intrinsic genotype, which does. Male and female mice that contain the same background and genetic variation complement will have the same ‘sex-agnostic intrinsic genotype’, despite their genomes varying in their sex-chromosome complement. By contrast, these two mice would have different ‘sex-qualified intrinsic genotypes’, as this class takes background sex chromosome sequences into account in the identity criteria for its instances. Conceptually, a sex-qualified genotype represents a superset of sequence features relative to a sex-agnostic intrinsic genotype, in that it specifies the background sex-chromosome complement of the genome.
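
The identity criteria for the two kinds of genotype can be sketched as below. The class, field names, strain label, and allele labels are hypothetical and serve only to show how adding the sex chromosome complement changes equality.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class GenomicGenotype:
    background: str
    variation_complement: FrozenSet[str]
    # None means sex-agnostic; "XX" or "XY" makes the genotype sex-qualified.
    sex_chromosomes: Optional[str] = None

    def sex_agnostic(self):
        """Return the same genotype with the sex complement left unspecified."""
        return replace(self, sex_chromosomes=None)


variation = frozenset({"geneA<m1/+>"})  # made-up variation label
male = GenomicGenotype("ExampleStrain", variation, "XY")
female = GenomicGenotype("ExampleStrain", variation, "XX")

print(male == female)                                # sex-qualified: differ
print(male.sex_agnostic() == female.sex_agnostic())  # sex-agnostic: same
```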

genomic material [GENO_0000106]

A nucleic acid macromolecule that is part of a cell or virion and has been inherited from an ancestor cell or virion, and/or is capable of being replicated and inherited through successive generations of progeny. 1. Genomic material here is considered as a DNA or RNA molecule that is found in a cell or virus, and capable of being replicated and inherited by progeny cells or virus. As such, this nucleic acid is either chromosomal DNA, or some replicative epi-chromosomal plasmid or transposon. Genetic material is necessarily part of some ‘material genome’, and both are necessarily part of some cell or virion. So a genomic library is not considered a material genome/genetic material - rather, we could say that this genomic library is a ‘genomic material sample’ that bears the concretization of some genome. 2. Genomic material need not be inherited from an immediate ancestor cell or organism (e.g. a replicative plasmid or transposon acquired through some experimental modification), but such cases must be capable of being inherited by progeny cells or organisms.

genomic sequence [GENO_0000960]

A biological sequence that is of genomic origin (i.e. carries sequence from the genome of a cell or organism). A sequence being ‘of genomic origin’ here means only that it has been located to the genome of some organism by alignment with some reference genomic sequence. This is because the sequence was originally identified in, or artificially created to replicate, sequence from an organism’s genome.

genomic sequence set [GENO_0000872]

A set of genomic sequences (a biological sequence that is of genomic origin). A ‘genomic sequence set’ differs from a ‘genomic feature set’ in that we are counting how many copies of the same sequence exist in a genome, as opposed to how many of the same sequence feature. ‘Genomic sequence sets’ are useful for describing things like ‘copy number complements’, which are concerned only with how many copies of a sequence exist in a genome, regardless of the location where these reside. By contrast, ‘genomic feature sets’ are useful for representing things like ‘single locus complements’, where members are sequence features whose identity is dependent on their location.

genomic variation complement [GENO_0000009]

A genomic feature set representing all ‘variant single locus complements’ in a single genome, which together constitute the ‘variant’ component of a genomic genotype. A ‘complement’ refers to an exhaustive collection of all objects that make up some well-defined set. Such a complement may contain 0, 1, or more than one member. The notion of a complement is useful for defining many biologically-relevant sets of sequence features. Here, a ‘genomic variation complement’ is the set of all ‘single locus complements’ in a particular genome that harbor some known variation. In model organisms, the majority of genotypes describe variation at a single location in the genome (i.e. only one ‘single-locus variant complement’) that is variant relative to some reference background. For example, the genotype instance ‘fgf8a<t1282a/+>(AB)’ exhibits a mutation at only one locus. But some genotypes describe variation at more than one location (e.g. a double mutant that has alterations in the fgf8a gene and the shh gene).

genotype [GENO_0000536]

A specification of the genetic state of an organism, whether complete (defined over the whole genome) or incomplete (defined over a subset of the genome). Genotypes typically describe this genetic state as a diff between some variant component and a canonical reference.
1. Scope of ‘Genetic State’: ‘Genetic state’ is considered quite broadly in GENO to describe two general kinds of ‘states’. First is the traditional notion of ‘allelic state’ - defined as the complement of alleles present at a particular location or locations in a genome (i.e. across all homologous chromosomes containing this location). Here, a genotype can describe allelic state at a specific locus in a genome (an ‘allelic genotype’), or describe the allelic state across the entire genome (a ‘genomic genotype’). Second, this concept can also describe states of genomic features ‘extrinsic’ to their intrinsic sequence, such as the expression status of a gene as a result of being specifically targeted by experimental interventions such as RNAi, morpholinos, or CRISPRs.
2. Genotype Subtypes: In GENO, we use the term ‘intrinsic’ for genotypes describing variation in genomic sequence, and ‘extrinsic’ for genotypes describing variation in gene expression (e.g. resulting from the targeted experimental knock-down or over-expression of endogenous genes). We use the term ‘effective genotype’ to describe the total intrinsic and extrinsic variation in a cell or organism at the time a phenotypic assessment is performed. Two more precise concepts are subsumed by the notion of an ‘intrinsic genotype’: (1) ‘allelic genotypes’, which specify allelic state at a single genomic location; and (2) ‘genomic genotypes’, which specify allelic state across an entire genome. In both cases, allelic state is typically specified in terms of a differential between a reference and a set of 1 or more known variant features.
3. The Genotype Partonomy: ‘Genomic genotypes’ describing sequence variation across an entire genome are ‘decomposed’ in GENO into a partonomy of more granular levels of variation. These levels are defined to be meaningful to biologists in their attempts to relate genetic variation to phenotypic features. They include ‘genomic variation complement’ (GVC), ‘variant single locus complement’ (VSLC), ‘allele’, ‘haplotype’, ‘sequence alteration’, and ‘genomic background’ classes. For example, the components of the zebrafish genotype “fgf8a<ti282a/ti282a>; fgf3<t24149/+>[AB]”, described at zfin.org/ZDB-FISH-150901-9362, include the following elements:
- GVC: fgf8a<ti282a/ti282a>; fgf3<t24149/+> (total intrinsic variation in the genome)
- Genomic Background: AB (the reference against which the GVC is variant)
- VSLC1: fgf8a<ti282a/ti282a> (homozygous complement of gene alleles at one known variant locus)
- VSLC2: fgf3<t24149/+> (heterozygous complement of gene alleles at another known variant locus)
- Allele 1: fgf8a (variant version of the fgf8a gene, present in two copies)
- Allele 2: fgf3 (variant version of the fgf3 gene, present in one copy)
- Allele 3: fgf3<+> (wild-type version of the fgf3 gene, present in one copy)
- Sequence Alteration1: (the specific mutation within the fgf8a gene that makes it variant)
- Sequence Alteration2: (the specific mutation within the fgf3 gene that makes it variant)
A graphical representation of this decomposition that maps each element to a visual depiction of the portion of a genome it denotes can be found here: https://github.com/monarch-initiative/GENO-ontology/blob/develop/README.md
One reason that explicit representation of these levels is important is that it is at these levels that phenotypic features are annotated to genetic variations in different clinical and model organism databases. For example, ZFIN typically annotates phenotypes to effective genotypes, MGI to intrinsic genotypes, Wormbase to variant alleles, and ClinVar to haplotypes and sequence alterations. The ability to decompose a genotype into representations at these levels allows us to “propagate phenotypes” up or down the partonomy (e.g. infer associations of phenotypes annotated to a genotype to its more granular levels of variation and the gene(s) affected). This helps support integrated analysis of G2P data.
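
The decomposition of the zebrafish example into GVC, genomic background, and VSLCs can be sketched with a small parser. This is a rough illustration that assumes the label always follows the pattern `gene<allele/allele>; ...[background]` seen in the example above. The function name and the returned dictionary layout are hypothetical, and real genotype labels may not follow this syntax.

```python
import re


def parse_genotype(label):
    """Split 'gene<a1/a2>; gene<a1/a2>[background]' into partonomy levels."""
    whole = re.fullmatch(r"(?P<gvc>.*)\[(?P<background>[^\]]+)\]", label.strip())
    if whole is None:
        raise ValueError("expected a trailing [background]")
    gvc = whole.group("gvc").strip()

    vslcs = []
    for part in gvc.split(";"):
        part = part.strip()
        locus = re.fullmatch(r"(?P<gene>[^<]+)<(?P<a1>[^/>]+)/(?P<a2>[^/>]+)>", part)
        if locus is None:
            raise ValueError("unrecognized VSLC: " + part)
        a1, a2 = locus.group("a1"), locus.group("a2")
        vslcs.append({
            "label": part,
            "gene": locus.group("gene"),
            "alleles": (a1, a2),
            "zygosity": "homozygous" if a1 == a2 else "heterozygous",
        })
    return {"gvc": gvc, "background": whole.group("background"), "vslcs": vslcs}


parsed = parse_genotype("fgf8a<ti282a/ti282a>; fgf3<t24149/+>[AB]")
for vslc in parsed["vslcs"]:
    print(vslc["label"], vslc["zygosity"])
```

Once a genotype is split this way, a phenotype annotated to the whole genotype could be associated with each VSLC or gene in turn, which is the kind of propagation described above.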

genotype-phenotype association [GENO_0000833]

[genotype-phenotype association; association has object; has_qualifier; environmental system]

germline allele origin [GENO_0000888]

Describes an allele that is inherited from a parent in virtue of the allele being present in one or both of the parent’s germ cells (sperm or egg). We distinguish germline, somatic, and de novo allele origin based on a combination of two key criteria - whether the allele is inherited from a parent, and whether it is ‘heritable’ by offspring. Germline variants are both inherited (passed down from a parent) and heritable (passable down to offspring). By contrast, somatic variants are neither inherited nor heritable - having originated via a spontaneous mutation in a non-germ cell. De novo mutations in germ cells are not inherited but are typically heritable, as they originated through a spontaneous mutation that made them present in germ cells.
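
The two criteria above can be read as a small decision table. The function below is a sketch of that reading. The function name is hypothetical, and the "unclassified" branch is an assumption, since the text does not describe the inherited-but-not-heritable combination.

```python
def classify_allele_origin(inherited, heritable):
    """Map the two criteria (inherited, heritable) to an allele origin."""
    if inherited and heritable:
        return "germline"
    if not inherited and heritable:
        return "de novo"  # arose spontaneously in a germ cell
    if not inherited and not heritable:
        return "somatic"  # arose spontaneously in a non-germ cell
    return "unclassified"  # combination not covered by the definition


print(classify_allele_origin(inherited=True, heritable=True))
print(classify_allele_origin(inherited=False, heritable=True))
print(classify_allele_origin(inherited=False, heritable=False))
```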

gneg [GENO_0000620]

[chromosomal band intensity; gneg]

gpos [GENO_0000619]

[chromosomal band intensity; gpos]

gpos100 [GENO_0000622]

[gpos100]

gpos25 [GENO_0000625]

[gpos25]

gpos33 [GENO_0000633]

[gpos33]

gpos50 [GENO_0000624]

[gpos50]

gpos66 [GENO_0000632]

[gpos66]

gpos75 [GENO_0000623]

[gpos75]

gvar [GENO_0000621]

[chromosomal band intensity; gvar]

haplotype [GENO_0000871]

A set of discrete, genetically-linked sequence alterations that reside on the same chromosomal strand and are typically co-inherited within a haplotype block. A haplotype is a set of non-overlapping alleles that reside in close proximity on the same DNA strand. We model them as ‘complements’ because they include all known/relevant alleles within a defined region in the genome (e.g. a ‘gene’, or a ‘haplotype block’) - where this set may consist of 0, 1, or more alterations from some reference. Because they are genetically linked, the alleles comprising a haplotype are likely to be co-inherited and survive descent across many generations of reproduction. As highlighted in https://en.wikipedia.org/wiki/Haplotype, the term ‘haplotype’ is most commonly used to describe the following scenarios of genetic linkage between ‘alleles’:
1. The ‘alleles’ comprising the haplotype are ‘single nucleotide polymorphisms’ (SNPs) or other small alterations, which collectively tend to occur together on a chromosomal strand. This use of ‘haplotype’ is commonly seen in phasing of patient WGS or WES data, to describe a state where two or more alterations are believed to occur ‘in cis’ on the same chromosomal strand.
2. The ‘alleles’ comprising the haplotype are SNPs or other short alterations, which collectively define a specific version of a gene. In this case, the location bounding the haplotype corresponds to a gene locus, and the haplotype defines a specific allele of that gene (i.e. a ‘gene allele’). “Star alleles” of PGx genes are examples of this category of haplotype (e.g. https://www.ebi.ac.uk/cgi-bin/ipd/imgt/hla/get_allele_hgvs.cgi?A*33:01:01, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4724253/).
3. Each of the ‘alleles’ comprising the haplotype is itself a ‘gene allele’ (i.e. a specific version of an entire gene), such that the haplotype contains multiple complete ‘gene alleles’ that are co-inherited because they reside in tightly linked clusters on a single chromosome.
Each of these more specific definitions serves a purpose for a particular type of genetic analysis or use case. The GENO definition of ‘haplotype’ is broadly inclusive of these and any other scenarios where distinct ‘alleles’ of any kind on the same chromosomal strand are genetically linked, and thus tend to be co-inherited across successive generations.

haplotype block [GENO_0000898]

A sequence feature representing a region of the genome over which there is little evidence for historical recombination, such that sequence alterations it contains are typically co-inherited across generations. A particular haplotype block is defined by the set of sequence alterations it is known to contain, which collectively represent a ‘haplotype’. The boundaries of haplotype blocks are defined in efforts to identify haplotypes that exist in organisms or populations. A haplotype block may span any number of sequence alterations, and may cover small or large chromosomal regions - depending on the number of recombination events that have occurred between the alterations defining the haplotype.

hemizygous [GENO_0000134]

[hemizygous; disomic zygosity]

hemizygous insertion-linked [GENO_0000606]

[hemizygous insertion-linked]

hemizygous X-linked [GENO_0000604]

[hemizygous X-linked]

hemizygous Y-linked [GENO_0000605]

[hemizygous Y-linked]

heritabililty [GENO_0000138]

The disposition of an entity to be transmitted to subsequent generations following a genetic replication or organismal reproduction event. We can use these terms to describe the heritability of genetic material or sequence features - e.g. chromosomal DNA or genes are heritable in that they are passed on to child cells/organisms. Such genetic material has a heritable disposition in a cell or virion, in virtue of its being replicated in its cellular host and inherited by progeny cells (such that the sequence content it encodes is stably propagated in the genetic material of subsequent generations of cells). We can also use these terms to describe the heritability of phenotypes/conditions - e.g. the passage of a particular trait or disease across generations of reproducing cells/organisms.

heritable [GENO_0000139]

[heritable; heritabililty]

heteroplasmic [GENO_0000603]

An allelic state where more than one type of allele exists at a particular location in the organellar genome (mitochondrial or plastid) of a cell/organism.

heteroplasmic mitochondrial inheritance [GENO_0000892]

A mitochondrial inheritance pattern whereby manifestation of a trait is observed when some inherited mitochondria contain the causative allele and some do not.

heterozygous [GENO_0000135]

[heterozygous; disomic zygosity]

homo sapiens gene [GENO_0000054]

A gene that originates from the genome of a homo sapiens.

homoplasmic [GENO_0000602]

An allelic state where a single allele exists at a particular location in the organellar genome (mitochondrial or plastid) of a cell/organism.

homoplasmic mitochondrial inheritance [GENO_0000893]

A mitochondrial inheritance pattern whereby manifestation of a trait occurs when only mitochondria containing the causative allele are inherited.

homozygous [GENO_0000136]

[homozygous; disomic zygosity]

human population [GENO_0000111]

A population of homo sapiens grouped together in virtue of their sharing some commonality (either an inherent attribute or an externally assigned role).

in cis [GENO_0000131]

A quality inhering in a collection of discontinuous sequence features in a single genome that reside on the same macromolecule (e.g. the same chromosome).

in trans [GENO_0000132]

A quality inhering in a collection of discontinuous sequence features in a single genome that reside on different macromolecules (e.g. different chromosomes).

incomplete autosomal dominant inheritance [GENO_0000145]

An autosomal dominant inheritance pattern wherein the trait expressed in a heterozygous individual is intermediate between the trait expressed in individuals homozygous for either allele in the heterozygous locus.

incomplete X-linked dominant inheritance [GENO_0000938]

An X-linked dominant inheritance pattern wherein the trait expressed in a heterozygous individual is intermediate between the trait expressed in individuals homozygous for either allele in the heterozygous locus.

incomplete Z-linked dominant inheritance [GENO_0000945]

A Z-linked dominant inheritance pattern wherein the trait expressed in a heterozygous individual is intermediate between the trait expressed in individuals homozygous for either allele in the heterozygous locus.

inheritance pattern [GENO_0000141]

The pattern in which a genetic trait or condition is passed from one generation to the next, as determined by genetic interactions between alleles of the causal gene, and interactions between these alleles and the environment. An inheritance pattern results from the disposition of a genetic variant to cause a particular trait or phenotype when it is present in a particular genetic and environmental context. Here, “genetic context” refers to the allelic state of the variant, which depends on what other alleles exist at the same location/locus in the genome. Zygosities such as heterozygous and homozygous are simple, common examples of ‘states’ of an allele. These genetic and environmental “interactions” of alleles play out at the level of the gene products produced by the causal alleles, and are observable in the pattern with which the trait caused by an allele is inherited across generations of individuals. Thus, an inheritance pattern such as dominance is not inherent to a single allele or its phenotype, but rather a result of the relationship between two alleles of a gene and the phenotype that results in a given environment. This also means that the ‘dominance’ of an allele is context dependent - Allele 1 can be dominant over Allele 2 in the context of Phenotype X, but recessive to Allele 3 in the context of Phenotype Y.

inherited allele origin [GENO_0000974]

Describes an allele that is inherited from a parent.

integrated transgene [GENO_0000093]

A transgene that has been integrated into a chromosome in the host genome. An integrated transgene differs from a transgenic insertion in that a transgenic insertion may contain a single transgene, a partial transgene that needs endogenous sequences from the host genome to become functional (e.g. an enhancer trap), or multiple transgenes (i.e. be polycistronic). Furthermore, the transgenic insertion may contain sequences in addition to its transgene(s) - e.g. sequences flanking the transgene required for integration or replication/maintenance in the host genome. The term ‘integrated transgene’ covers individual transgenes that were delivered in whole or in part by a transgenic insertion. An ‘integrated transgene’ differs from its parent ‘transgene’ in that transgenes can include genes introduced into a cell/organism on an extra-chromosomal plasmid that is never integrated into the host genome.

intrinsic genotype [GENO_0000719]

A genotype that describes the total variation in heritable genomic sequence of a cell or organism, typically in terms of alterations from some reference or background genotype. 1. A genomic genotype is a short-hand specification of a genome that uses a representational syntax comprised of information about a reference genome (‘genomic background’), and all specific variants from this reference (the ‘genomic variation complement’). Conceptually, this variant genome sequence can be resolved by substituting all sequences specified by the ‘genomic variation complement’ for the corresponding sequences in the reference ‘genomic background’ sequence. 2. ‘Heritable’ genomic sequence is that which is passed on to subsequent generations of cells/organisms, and includes all chromosomal sequences, the mitochondrial genome, and any transmissible extrachromosomal replicons.

karyotype [GENO_0000644]

A genotype that describes what is known about variation in a genome at a gross structural level, in terms of the number and appearance of chromosomes in the nucleus of a eukaryotic cell. Karyotypes describe structural variation across a genome at the level of chromosomal morphology and banding patterns detectable in stained chromosomal spreads. This coarser level does not capture more granular levels of variation commonly represented in other forms of genotypes (e.g. specific alleles and sequence alterations). A base karyotype representing a genome with no known structural variation can be as simple as ‘46XY’, but karyotypes typically contain some gross variant component (such as a chromosome duplication or translocation).

knockdown reagent targeted gene complement [GENO_0000839]

[knockdown reagent targeted gene complement; has_variant_part; reagent-targeted gene complement]

location-qualified sequence feature [GENO_0000736]

A sequence feature whose identity is additionally dependent on the cellular or anatomical location of the genetic material bearing the feature. As a qualified sequence feature, the BRCA1c.5096G>A variant as materialized in a somatic breast epithelial cell could be distinguished as a separate entity from a BRCA1c.5096G>A variant in a different cell type or location (e.g. a germline BRCA1 variant in a sperm cell).

long chromosome arm [GENO_0000629]

A chromosome arm that is the longer of the two arms of a given chromosome.

lost aneusomic chromosomal segment [GENO_0000345]

A deletion of a terminal portion of a chromosome resulting from an unbalanced translocation to another chromosome. This is not a deletion in the sense defined by the Sequence Ontology in that it is not the result of an ‘excision’ of nucleotides, but an unbalanced translocation event. The allelic complement that results is comprised of the terminus or junction represented by this lost chromosomal segment, and the remaining normal segment in the homologous chromosome. The lost aneusomic chromosomal segment is typically accompanied by a gained aneusomic chromosomal segment from another chromosome. Loss of translocated chromosomal parts can confer a monosomic condition to a region of the chromosome. This results in a ‘variant single locus complement’ - in virtue of an abnormal number of features at a particular locus, rather than abnormal sequence within the locus.

lost aneusomic chromosome [GENO_0000339]

A ‘deletion’ resulting from the loss of a complete chromosome, typically as the result of a meiotic non-disjunction event or unbalanced translocation.

major polymorphic allele [GENO_0000498]

A polymorphic allele that is present at the highest frequency relative to other polymorphic variants at the same genomic location.

male intrinsic genotype [GENO_0000646]

A genomic genotype where the genomic background specifies a male sex chromosome complement.

material genome [GENO_0000108]

A material entity that represents all genetic material in a cell or virion. The material genome is typically a molecular aggregate of all the chromosomal DNA and epi-chromosomal DNA that represents all sequences that are heritable by progeny of a cell or virion. A genome is the collection of all nucleic acids in a cell or virus, representing all of an organism’s hereditary information. It is typically DNA, but many viruses have RNA genomes. The genome includes both nuclear chromosomes (i.e. nuclear and micronucleus chromosomes) and cytoplasmic chromosomes stored in various organelles (e.g. mitochondrial or chloroplast chromosomes), and can in addition contain non-chromosomal elements such as replicative viruses, plasmids, and transposable elements. Note that at present, a material genome and genetic material are necessarily part of some cell or virion. So a genomic library is not considered a material genome/genetic material - rather, we could say that this genomic library is a ‘genomic material sample’ that bears the concretization of some SO:genome.

maternal allele origin [GENO_0000878]

Describes an allele that is inherited from a female parent in virtue of the allele being present in the mother’s egg.

microsatellite alteration [GENO_0000873]

A relation used to describe an environment contextualizing the identity of an entity.

minor polymorphic allele [GENO_0000499]

A polymorphic allele that is not present at the highest frequency among all fixed variants at the locus (i.e. not the major polymorphic allele at a given location).

mitochondrial inheritance [GENO_0000949]

An inheritance pattern observed for traits related to a gene encoded on the mitochondrial genome. Because the mitochondrial genome is essentially always maternally inherited, a mitochondrial condition can only be transmitted by females, although the condition can affect both sexes. The proportion of mutant mitochondria can vary (heteroplasmy).

modification-qualified sequence feature [GENO_0000818]

A sequence feature whose identity is additionally dependent on a chemical modification made to the genetic material bearing the feature (e.g. binding of transcriptional regulators, or epigenetic modifications including direct DNA methylation, or modification of histones associated with a feature)

monogenic inheritance [GENO_0000933]

An inheritance pattern wherein the trait is determined by alleles of a single causal gene, possibly together with environmental factors.

mosaic [GENO_0000964]

A clonal distribution in which an allele arose during embryogenesis and is present in a subset of tissues derived from some common developmental cell or tissue type. [mosaic; clonal]

multifactorial inheritance [GENO_0000929]

An inheritance pattern that depends on a mixture of major and minor genetic determinants (i.e. alleles of more than one contributing gene), possibly together with environmental factors. Diseases inherited in this manner are termed ‘complex diseases’.

mus musculus gene [GENO_0000057]

A gene that originates from the genome of a mus musculus.

mus musculus strain [GENO_0000118]

[strain or breed; mus musculus strain]

mutant [GENO_0000480]

An attribute inhering in a feature bearing a sequence alteration that is present at very low levels in a given population (typically less than 1%), or that has been experimentally generated to alter the feature with respect to some reference sequence.

mutation [GENO_0000492]

A sequence alteration that is a very rare allele in a population (typically <1%), or an experimentally-induced variation that derives from a wild-type feature in a given strain.

non-heritable [GENO_0000140]

[non-heritable; heritabililty]

novel [GENO_0000685]

An attribute of a genomic feature that represents a feature not previously found in a given genome, e.g. an extrachromosomal replicon or aneusomic third copy of a chromosome.

novel extrachromosomal replicon [GENO_0000681]

An extrachromosomal replicon that is variant in a genome in virtue of its being a novel addition to the genome - i.e. it is not present in the reference for the genome in which it is found. Extrachromosomal replicons are replicated and passed on to descendents, and thus part of the heritable genome of a cell or organism. In cases where the presence of such a replicon is exogenous or aberrant (i.e. not included in the reference for that genome), the replicon is considered a ‘sequence alteration’.

novel replicon [GENO_0000684]

A genomic feature that represents an entirely new replicon in the genome, e.g. an extrachromosomal replicon or an extra copy of a chromosome. Novel replicons are considered as an ‘insertion’ in a genome, and as such, qualify as types of sequence_alterations and variant alleles. There is no pre-existing locus that it modifies, however, and thus it is not really an ‘allele of’ a named locus. But conceptually, we still consider these to represent genetic variants and classify them as variant alleles.

nullizygous [GENO_0000978]

A disomic zygosity quality inhering in a ‘single locus complement’ that is comprised of two non-functional copies of a gene. Loss of function may result from the gene being entirely missing via a deletion, or mutated in a way that eliminates its function.

obsolete reporter role [GENO_0000910]

[obsolete reporter role; sequence feature attribute]

oligogenic inheritance [GENO_0000931]

A multifactorial inheritance pattern that is determined by the simultaneous action of alleles in a few genes. It is recommended this term be used for traits governed by three gene loci, although it is noted that usage of this term in the literature is not uniform.

organellar plasmy [GENO_0000918]

An allelic state that describes the number of different alleles of a gene from an organellar genome (i.e. mitochondrial, plastid) that may exist in a cell. Cells with a population of organelles from a single origin that all share the same organellar genome will contain only one allele of each organellar gene, while cells with populations of organelles of different origins may contain more than one allele of a given organellar gene.
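
The homoplasmic and heteroplasmic states defined elsewhere in this document follow directly from counting distinct alleles at one organellar location. A minimal sketch is shown below. The function name and the allele labels are hypothetical.

```python
def organellar_plasmy(alleles):
    """Classify the alleles observed at one organellar genome location."""
    distinct = set(alleles)
    if not distinct:
        raise ValueError("at least one allele is required")
    return "homoplasmic" if len(distinct) == 1 else "heteroplasmic"


# Made-up allele labels observed across the organelles of a single cell.
print(organellar_plasmy(["m1", "m1", "m1"]))
print(organellar_plasmy(["m1", "wt", "m1"]))
```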

organismal entity [GENO_0000904]

A material entity that is an organism, derived from an organism, or composed of organisms (e.g. a cell line, biosample, tissue culture, population, etc).

oryzias latipes strain [GENO_0000887]

[strain or breed; oryzias latipes strain; has_member; Oryzias latipes]

P-element construct [GENO_0000850]

A construct that contains a mobile P-element, holding sequences to be delivered to a target cell or genome.

paternal allele origin [GENO_0000879]

Describes an allele that is inherited from a male parent in virtue of the allele being present in the father’s sperm.

phenotypic inheritance process [GENO_0000770]

[phenotypic inheritance process; biological process]

polygenic inheritance [GENO_0000932]

A multifactorial inheritance pattern that is determined by the simultaneous action of alleles in a large number of genes. Typically used for traits/conditions governed by more than three gene loci.

polymorphic [GENO_0000477]

An attribute inhering in a sequence feature for which there is more than one version fixed in a population at some significant percentage (typically 1% or greater), where the locus is not considered to be either reference or a variant.

polymorphic allele [GENO_0000497]

An allele that is fixed in a population at some stable level, typically > 1%. Polymorphic alleles reside at loci where more than one version exists at some significant frequency in a population. Polymorphic alleles are contrasted with mutant alleles (extremely rare variants that exist in <1% of a population), and ‘wild-type alleles’ (extremely common variants present in >99% of a population). Polymorphic alleles exist in equilibrium in a given population somewhere between these two extremes (i.e. >1% and <99%).
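The frequency thresholds above can be sketched as a small classifier. This is only an illustration: the function name is hypothetical, and the treatment of the exact boundary values (1% and 99%), which the definition leaves open, is an assumption.

```python
def classify_by_frequency(freq: float) -> str:
    """Classify an allele by its population frequency, given as a fraction in [0, 1]."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    if freq < 0.01:
        return "mutant"       # extremely rare: <1% of the population
    if freq > 0.99:
        return "wild-type"    # extremely common: >99% of the population
    # Assumption: exactly 1% and exactly 99% are counted as polymorphic here.
    return "polymorphic"


print(classify_by_frequency(0.30))
```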

qualified genomic feature [GENO_0000714]

A qualified sequence feature that carries sequence derived from the genome of a cell or organism.

qualified genomic feature set [GENO_0000715]

A set of qualified sequence features that carry genomic sequence. A ‘complement’ refers to an exhaustive collection of all objects that make up some well-defined set. This notion is useful for defining biologically-relevant sets of sequence features. For example, a haplotype is defined as the set of all genetically-linked alleles on a single chromosomal strand at a defined location - e.g. the SNP alleles {rs7412-C, rs429358-C} comprise the haplotype defining the APOEɛ4 gene allele. A complement may contain 0, 1, or more than one member. For example, the complement of alleles at a defined locus across homologous chromosomes in an individual’s genome will consist of two members for autosomal locations, and one member for non-homologous locations on the X and Y chromosome.

qualified sequence feature [GENO_0000919]

A sequence feature whose identity is additionally dependent on the context or state of the material sequence molecule in which the feature is concretized. This context/state describes factors external to the feature’s intrinsic sequence and position that can influence its expression, such as being targeted by gene-knockdown reagents, or an epigenetic modification. Modeling sequence entities at this ‘qualified’ level is useful for distinguishing features with identical sequence and position as separate instances - based on their material bearers being found in different contexts. For example, consider a situation where the zebrafish shha gene (a sequence feature) is targeted in two experimental groups of fish by two different morpholinos, and phenotypes are assessed for each. We want to be able to represent two ‘variants’ of the shha gene in this scenario as separate ‘qualified sequence feature’ instances so we can capture data about the phenotypes resulting from each - just as we would separately represent two different sequence variants (alleles) of the shha gene at the sequence feature level so that we can track their associated phenotypes. GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria. 1. ‘Biological sequence’ identity is dependent only on the ordering of units that comprise the sequence. 2. ‘Sequence feature’ identity is dependent on its sequence and the genomic location of the sequence (this is consistent with the definition of ‘sequence feature’ in the Sequence Ontology). 3. ‘Qualified sequence feature’ identity is additionally dependent on some aspect of the physical state or context of the genetic material in which the feature is concretized. This third criterion is extrinsic to its sequence and its genomic location. Examples include the feature’s physical concretization being targeted by a gene knockdown reagent in a cell (e.g. the zebrafish Shha gene as targeted by the morpholino ‘Shha-MO1’), or its being transiently expressed from a recombinant expression construct (e.g. the human SHH gene as expressed in a mouse Shh knock-out cell line), or its having been epigenetically modified in a way that alters its expression level or pattern (e.g. the human SHH gene with a specific methylation pattern).
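The three identity levels can be illustrated with a minimal sketch in which equality of each object follows its identity criteria. The class names, the placeholder residues, the coordinates, and the second morpholino name ‘Shha-MO2’ are all assumptions made for illustration; they are not part of GENO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiologicalSequence:
    residues: str  # identity depends only on the ordering of units


@dataclass(frozen=True)
class SequenceFeature:
    sequence: BiologicalSequence
    location: str  # identity depends on sequence plus genomic location


@dataclass(frozen=True)
class QualifiedSequenceFeature:
    feature: SequenceFeature
    context: str  # identity additionally depends on extrinsic context


# Placeholder residues and made-up coordinates, for illustration only.
shha = SequenceFeature(BiologicalSequence("ATGC"), "chr7:100-104")

# The same feature targeted by two different morpholinos.
shha_mo1 = QualifiedSequenceFeature(shha, "targeted by Shha-MO1")
shha_mo2 = QualifiedSequenceFeature(shha, "targeted by Shha-MO2")  # hypothetical reagent

print(shha_mo1.feature == shha_mo2.feature)  # same sequence feature
print(shha_mo1 == shha_mo2)                  # distinct qualified sequence features
```

Using frozen dataclasses makes equality structural, so two objects are the same exactly when all of their identity criteria match.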

qualified sequence feature or collection [GENO_0000713]

A sequence feature (or collection of features) whose identity is dependent on the context or state of its material bearer (in addition to its sequence and position). This context/state describes factors external to its inherent sequence and position that can influence its expression, such as being targeted by gene-knockdown reagents, or an epigenetic modification.

qualified sequence feature set [GENO_0000920]

A set of qualified sequence features. ‘Sets’ are used to represent entities that are typically collections of more than one member. But we allow for sets that contain 0 members (an ‘empty’ set) or 1 member (a ‘singleton’ or ‘unit’ set), consistent with the concept of ‘mathematical sets’.

reagent targeted gene [GENO_0000504]

A gene altered in its expression level in the context of some experiment as a result of being targeted by gene-knockdown reagent(s) such as a morpholino or RNAi. The identity of a given instance of a reagent-targeted gene is dependent on the experimental context of its knock-down - specifically what reagent was used and at what level. For example, the wild-type shha zebrafish gene targeted in experiment 1 by morpholino 1 and in experiment 2 by morpholino 2 represent two distinct instances of a ‘reagent-targeted gene’, despite sharing the same sequence and position.

reagent-targeted gene complement [GENO_0000527]

A set comprised of all reagent-targeted genes in a single genome in the context of a given experiment (e.g. the zebrafish shha and shhb genes in a zebrafish exposed to morpholinos targeting both of these genes). A ‘complement’ refers to an exhaustive collection of all objects that make up some well-defined set. Such a complement may contain 0, 1, or more than one member. The notion of a complement is useful for defining many biologically-relevant sets of sequence features. For example, a ‘reagent-targeted gene complement’ is the set of all genes in a particular genome that are targeted by reagents in the context of a particular experiment.

reagent-targeted gene subregion [GENO_0000534]

A region within a gene that is specifically targeted by a gene knockdown reagent, typically in virtue of bearing sequence complementary to the reagent.

reference [GENO_0000152]

An attribute inhering in a feature that is designated to serve as a standard against which ‘variant’ versions of the same location are compared. Being ‘reference’ is a role or status assigned in the context of a data set or analysis framework. A given allele can be reference in one context and variant in another.

reference allele [GENO_0000036]

An allele whose sequence matches what is considered to be the reference sequence at that location in the genome. Being a ‘reference allele’ is a role or status assigned in the context of a specific dataset or analysis. In human variation datasets, ‘reference’ status is typically assigned based on factors such as being the most common in a population, being an ancestral allele, or being identified first as a prototypical example of some feature or gene. For example, ‘reference alleles’ in characterizing SNPs often represent the allele first characterized in a reference genome, or the most common allele in a population. In model organism datasets, ‘reference’ alleles are typically (but not always) the ‘wild-type’ variant at a given locus, representing a functional and unaltered version of the feature that is part of a defined genomic background, and against which natural or experimentally-induced alterations are compared.

reference genome [GENO_0000914]

A genome whose sequence is identical to that of a genome sequence considered to be the reference.

reference sequence [GENO_0000017]

A sequence that serves as a standard against which other sequences at the same location are compared. A reference sequence is one that serves as a standard against which ‘variant’ versions of the feature are compared, or against which located sequence features within the reference region are aligned in order to assign position information. Being ‘reference’ does not imply anything about the frequency or function of features bearing the sequence; it implies only that some agent has used it to serve a reference role in defining a variant or locating a sequence.

regulatory transgene region [GENO_0000637]

A transgene part whose sequence regulates the synthesis of a functional product, but which is not itself transcribed.

repeat region alteration [GENO_0000874]

A relation used to describe a process contextualizing the identity of an entity.

reporter region [GENO_0000640]

[reporter region; expressed transgene region]

reporter transgene [GENO_0000667]

A transgene that codes for a product used as a reporter of gene expression or activity.

RNA residue [GENO_0000781]

[biological sequence unit; RNA residue]

RNA sequence [GENO_0000721]

[has_sequence_unit; biological sequence; RNA residue; RNA sequence]

selectable marker [GENO_0000911]

[selectable marker; sequence feature attribute]

selectable marker region [GENO_0000912]

[selectable marker region; expressed transgene region]

selectable marker transgene [GENO_0000642]

A transgene whose product is used as a selectable marker.

sequence feature attribute [GENO_0000788]

An attribute, quality, or state of a sequence feature or collection. Sequence feature attributes can be ‘intrinsic’ - reflecting feature-level characteristics that depend only on the sequence, location, or genomic context of a feature or collection, or ‘extrinsic’ - reflecting characteristics of the physical molecule in which the feature is concretized (e.g. its cellular context, source of origin, physical appearance, etc.). Intrinsic attributes include things like allelic state and allelic phase. Extrinsic attributes include things like cellular distribution and chromosomal band intensity.

sequence feature location [GENO_0000815]

The location of a sequence feature as defined by its start and end position on some reference coordinate system. 1. A sequence feature location is defined by its begin and end coordinates on a reference sequence, but it is not identified by a particular sequence that may reside there. The same location, as defined on a particular reference, may be occupied by different sequences in the genome of organism 1 vs that of organism 2 (e.g. if a SNV exists within this location in only one of the organisms). 2. The notion of a sequence feature location in the realm of biological sequences is analogous to a BFO:spatiotemporal region in the realm of physical entities. A spatiotemporal region can be ‘occupied by’ physical objects, while a genomic location is ‘occupied by’ sequence features. Just as a spatiotemporal region is distinct from an object that occupies it, so too a genomic location is distinct from a sequence feature that occupies it. As a more concrete example, consider the distinction between a street address and the building that occupies it as analogous to the relationship between a genomic locus and the sequence feature that resides there.
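The distinction between a location and the feature occupying it can be sketched as follows. The class name, the reference name, the coordinates, and the residues are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceFeatureLocation:
    reference: str  # name of the reference coordinate system
    start: int
    end: int


# One location, defined once by its coordinates on the reference.
loc = SequenceFeatureLocation("reference-1", 1000, 1001)

# The sequence occupying that location in each genome (made-up values).
occupying_sequence = {
    ("organism 1", loc): "A",
    ("organism 2", loc): "G",  # a SNV present in only one of the organisms
}

for (organism, location), residues in occupying_sequence.items():
    print(organism, location.start, location.end, residues)
```

The location is identified only by its coordinates, so the same `loc` object serves as the ‘address’ for both organisms while the occupying sequences differ.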

sequence feature or set [GENO_0000701]

A sequence feature or a set of such features. GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria. 1. ‘Biological sequence’ identity is dependent only on the ordering of units that comprise the sequence. 2. ‘Sequence feature’ identity is dependent on its sequence and the genomic position of the sequence (aligns with definition of ‘sequence feature’ in the Sequence Ontology). 3. ‘Qualified sequence feature’ identity is additionally dependent on some aspect of the physical context of the genetic material bearing the feature, extrinsic to its sequence and its genomic position. For example, its being targeted by gene knockdown reagents, its being transgenically expressed in a foreign cell from a recombinant expression construct, its having been epigenetically modified in a way that alters its expression level or pattern, or its being located in a specific cellular or anatomical location.

sequence feature set [GENO_0000659]

A set of sequence features. ‘Sets’ are used to represent entities that are typically collections of more than one member. But we allow for sets that contain 0 members (an ‘empty’ set) or 1 member (a ‘singleton’ or ‘unit’ set), consistent with the concept of ‘mathematical sets’. Sets may also include duplicates (i.e. contain more than one member representing the same feature). The notion of a ‘complement’ is a special case of a set, where the members necessarily comprise an exhaustive collection of all objects that make up some well-defined set. It is useful for defining many biologically-relevant sets of sequence features. For example, a ‘haplotype’ is the set of all genetically-linked alleles on a single chromosomal strand at a defined location - e.g. the SNP alleles {rs7412-C, rs429358-C} comprise the haplotype defining the APOEɛ4 gene allele [1]. And a ‘single locus complement’ is the set of all alleles at a specified location in a particular genome - e.g. the APOEɛ4 and APOEɛ2 gene alleles ([1], [2]) that make up the ‘Gs270’ APOE genotype [3]. [1] https://www.snpedia.com/index.php/APOE-%CE%B54 [2] https://www.snpedia.com/index.php/APOE-%CE%B52 [3] https://www.snpedia.com/index.php/Gs270
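Because these sets may be empty, may hold a single member, and may include duplicates, a multiset is a closer fit than a plain mathematical set. The sketch below uses Python's `collections.Counter` for this purpose; the variable names and the homozygous example complement are assumptions for illustration.

```python
from collections import Counter

# A haplotype: the linked SNP alleles named in the definition above.
haplotype_apoe4 = frozenset({"rs7412-C", "rs429358-C"})

# A hypothetical complement with two copies of the same gene allele.
# A plain set would collapse the duplicate, so a multiset is used instead.
complement = Counter(["APOE-e4", "APOE-e4"])

empty_set = Counter()                # 0 members
singleton_set = Counter(["APOE-e4"]) # 1 member

print(sum(complement.values()))  # number of members, duplicates included
print(len(complement))           # number of distinct features
```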

sequence interval [GENO_0000965]

A pair of integers representing start and end position of a location on a sequence coordinate system.

sex-limited autosomal dominant inheritance [GENO_0000952]

An autosomal dominant inheritance pattern wherein the trait manifests in heterozygotes in a sex-specific manner (i.e. only in males or only in females).

sex-limited autosomal recessive inheritance [GENO_0000953]

An autosomal recessive inheritance pattern wherein the trait manifests only in homozygotes, and in a sex-specific manner (i.e. only in males or only in females).

short chromosome arm [GENO_0000628]

A chromosome arm that is the shorter of the two arms of a given chromosome.

simple heterozygous [GENO_0000458]

A heterozygous quality inhering in a single locus complement comprised of one variant allele and one wild-type/reference allele (e.g. fgf8a<ti282a/+>). [simple heterozygous]

single locus complement [GENO_0000516]

A set representing the complement of all sequence features occupying a particular genomic location across all homologous chromosomes in the genome of a single organism. A ‘complement’ refers to an exhaustive collection of all objects that make up some well-defined set. Such a complement may contain 0, 1, or more than one member. The notion of a complement is useful for defining many biologically-relevant sets of sequence features. Here, a ‘single locus complement’ is the set of all alleles at a specified location in a particular genome. This complement is typically a pair of two features in a diploid genome (with two copies of each chromosome). E.g. a gene pair, a QTL pair, a nucleotide pair for a SNP, or a pair of entire chromosomes. The fact that we are counting how many copies of the same feature exist in a genome, as opposed to how many copies of the same sequence, is what sets feature-level concepts like ‘single locus complement’ apart from sequence-level concepts like ‘copy number complement’. To illustrate the difference, consider a duplication event that creates a new copy of the human APOE gene on a different chromosome. This creates an entirely new sequence feature at a distinct locus from that of the original APOE gene. The ‘copy number complement’ for sequence defined by the APOE gene locus would have a count of three, as this sequence is present three times in the genome. But the ‘single locus complement’ at the APOE gene locus would still have a count of two - because the duplicated copy is at a different location in the genome, and therefore does not represent a copy of the APOE locus.
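The APOE duplication example can be sketched as two different ways of counting over the same genome. The class, the function names, and the locus labels (including the site of the duplicated copy) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    sequence_id: str      # which sequence the feature carries
    locus: str            # where in the genome the feature resides
    chromosome_copy: str  # which homolog carries it


genome = [
    Feature("APOE", "APOE locus", "maternal homolog"),
    Feature("APOE", "APOE locus", "paternal homolog"),
    # Duplicated copy at a different, made-up location.
    Feature("APOE", "duplication site", "maternal homolog"),
]


def single_locus_complement(features, locus):
    """Feature-level: everything occupying one genomic location."""
    return [f for f in features if f.locus == locus]


def copy_number_complement(features, sequence_id):
    """Sequence-level: every copy of a sequence, wherever it resides."""
    return [f for f in features if f.sequence_id == sequence_id]


print(len(single_locus_complement(genome, "APOE locus")))  # counts by location
print(len(copy_number_complement(genome, "APOE")))         # counts by sequence
```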

somatic allele origin [GENO_0000882]

Describes an allele that results from some spontaneous mutation event in a somatic cell after fertilization, and thus is not present in every cell in the body. We distinguish germline, somatic, and de novo allele origin based on a combination of two key criteria - whether the allele is inherited from a parent, and whether it is ‘heritable’ by offspring. Somatic variants are neither inherited nor heritable - having originated via a spontaneous mutation in a non-germ cell. By contrast, germline variants are both inherited (passed down from a parent) and heritable (passable down to offspring). De novo mutations are not inherited but are typically heritable, as they originated through a spontaneous mutation that made them present in germ cells. These acquired mutations are called ‘somatic’ because they typically affect somatic (non-germ) cells. But when spontaneous mutations do occur in the germ cells of an organism, these can be passed on to offspring in whom they will be considered de novo mutations.
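The two criteria above form a small decision table, sketched here as a function. The function name is hypothetical, and the handling of the fourth combination (inherited but not heritable), which the definition does not address, is an assumption.

```python
def allele_origin(inherited: bool, heritable: bool) -> str:
    """Classify allele origin from the two criteria in the definition."""
    if inherited and heritable:
        return "germline"   # passed down from a parent and passable to offspring
    if not inherited and heritable:
        return "de novo"    # new mutation that is present in germ cells
    if not inherited and not heritable:
        return "somatic"    # new mutation in a non-germ cell
    # Assumption: this combination is not covered by the text.
    return "unclassified"


for inherited in (True, False):
    for heritable in (True, False):
        print(inherited, heritable, allele_origin(inherited, heritable))
```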

strain or breed [GENO_0000112]

A maximal collection of organisms of a single species that have been bred or experimentally manipulated with the goal of being genetically identical. Two mouse colonies with the same genotype information, but maintained in different labs, are different strains (there are many examples of this in MGI/IMSR).

taxonomic group [GENO_0000113]

[collection of organisms; taxonomic group]

terminus [GENO_0000688]

A sequence feature representing the end of a sequence that is bounded only on one side (e.g. at the end of a chromosome or oligonucleotide).

transgene part [GENO_0000460]

A structurally or functionally defined component of a transgene (e.g. a promoter, a region coding for a fluorescent protein tag, etc)

transiently-expressed transgene [GENO_0000506]

A transgene that is delivered as part of a DNA expression construct into a cell or organism in order to transiently express a specified product (i.e. it has not integrated into the host genome).

transiently-expressed transgene complement [GENO_0000528]

The set of all transgenes transiently expressed in a biological system in the context of a given experiment.

trisomic heterozygous [GENO_0000394]

[aneusomic zygosity; trisomic heterozygous]

trisomic homozygous [GENO_0000393]

[trisomic homozygous; aneusomic zygosity]

undetermined inheritance [GENO_0000889]

An inheritance pattern that is not determined or not known.

uniparental allele origin [GENO_0000975]

Describes an allele that is part of an allelic complement where both alleles are inherited from the same parent. From Wikipedia: Uniparental inheritance is a non-Mendelian form of inheritance that consists of the transmission of genotypes from one parental type to all progeny. That is, all the genes in offspring will originate from only the mother or only the father. This phenomenon is most commonly observed in eukaryotic organelles such as mitochondria and chloroplasts. https://en.wikipedia.org/wiki/Uniparental_inheritance

unknown allele origin [GENO_0000881]

Describes an allele whose origin is not known.

unspecified genomic background [GENO_0000649]

A background genotype whose sequence or identity is not known or specified.

unspecified life cycle stage [GENO_0000160]

[unspecified life cycle stage]

unspecified zygosity [GENO_0000137]

[unspecified zygosity]

variant [GENO_0000476]

An attribute inhering in a sequence feature that varies from some designated reference in virtue of alterations in its sequence or expression level.

variant allele [GENO_0000002]

An allele that varies in its sequence from what is considered the reference or canonical sequence at that location. Note that what is considered the ‘reference’ vs. ‘variant’ sequence at a given locus may be context-dependent - so being ‘variant’ is more a role played in a particular situation. A ‘variant allele’ contains a ‘sequence alteration’, or is itself a ‘sequence alteration’, that makes it vary_with some other allele to which it is being compared. But in any comparison of alternative sequences at a particular genomic location, the choice of a ‘reference’ vs the ‘variant’ is context-dependent - as comparisons in other contexts might consider a different feature to be the reference. So being ‘variant’ is more a role played in a particular situation - as an allele that is variant in one context/analysis may be considered reference in another. A variant allele can be variant along its entire extent, in which case it is considered a ‘sequence alteration’, or it can span a broader extent of sequence that contains sequence alteration(s) as part. An example of the former is a SNP, and an example of the latter is a variant gene allele that contains one or more point mutations in its sequence.
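The context-dependence of the ‘reference’ and ‘variant’ roles can be sketched as a lookup against whichever allele a given analysis designates as its reference. The function name, the context names, and the single-base alleles are made up for illustration.

```python
def role_in_context(allele, reference_by_context, context):
    """Return the role an allele plays in one analysis context."""
    return "reference" if reference_by_context[context] == allele else "variant"


# Two hypothetical analyses that designate different references at one location.
reference_by_context = {"analysis A": "C", "analysis B": "T"}

print(role_in_context("C", reference_by_context, "analysis A"))
print(role_in_context("C", reference_by_context, "analysis B"))
```

The allele itself never changes between the two calls; only the designated reference does, which is why the role is modeled as a function of context rather than as a fixed property of the allele.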

variant copy number complement [GENO_0000962]

A ‘copy number complement’ that has an abnormal number of members, as the result of deletion or duplication event(s). ‘Abnormal’ is typically more or less than two members for an autosomal sequence in a diploid genome, and more or less than one member for a sequence in a non-homologous region of a sex-chromosome.
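The ‘abnormal’ test described above amounts to comparing a member count against the expected count for that region. A minimal sketch follows, assuming a diploid genome; the function and parameter names are hypothetical.

```python
def is_variant_copy_number(member_count: int, autosomal: bool = True) -> bool:
    """Return True if the count deviates from the typical expectation.

    Assumes a diploid genome: two members are expected for an autosomal
    sequence, and one for a sequence in a non-homologous region of a
    sex chromosome.
    """
    expected = 2 if autosomal else 1
    return member_count != expected


print(is_variant_copy_number(3))                   # duplication of an autosomal sequence
print(is_variant_copy_number(1, autosomal=False))  # typical count for a non-homologous sex-chromosome region
```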

variant gene allele [GENO_0000515]

An allele of a gene that contains some sequence alteration. A gene allele is ‘variant’ in virtue of its containing a sequence alteration that varies from some reference gene standard. But note that a gene allele that is variant in one context/dataset can be considered a reference in another context/dataset.

variant genome [GENO_0000033]

A genome that varies at one or more loci from the sequence of some reference genome.

variant genomic genotype [GENO_0000777]

An intrinsic genotype that specifies variation from a defined reference genome.

variant single locus complement [GENO_0000030]

A single locus complement in which at least one member allele is considered variant, and/or the total number of features in the complement deviates from the normal ploidy of the reference genome (e.g. trisomy 13). Instances of this class are sets comprised of all alleles at a specified genomic location where at least one allele is variant (non-reference). In diploid genomes this complement typically has two members. Note that this class also covers cases where deviant numbers of genes or chromosomes are present in a genome (e.g. trisomy of chromosome 21), even if their sequence is not variant.

variation attribute [GENO_0000773]

An attribute describing a type of variation inhering in a sequence feature or collection.

W-linked inheritance [GENO_0000948]

An inheritance pattern wherein the trait is determined by alleles of a single causal gene on a W-chromosome.

wild-type [GENO_0000511]

An allele attribute describing a highly common variant (typically >99% in a population), that typically exhibits canonical function, and against which rare and/or non-functional mutant alleles are compared.

wild-type allele [GENO_0000501]

An allele representing a highly common variant (typically >99% in a population), that typically exhibits canonical function, and against which rare and/or non-functional mutant alleles are often compared. ‘Wild-type’ is typically contrasted with ‘mutant’, where ‘wild-type’ indicates a highly prevalent allele in a population (typically >99%), and/or some prototypical allele in a background genome that serves as a basis for some experimental alteration to generate a mutant allele, which can be selected for in establishing a mutant strain. The notion of wild-type alleles is more common in model organism databases, where specific mutations are generated against a wild-type reference feature. Wild-type alleles are typically but not always used as reference alleles in sequence comparison/analysis applications. More than one wild-type sequence can exist for a given feature, but typically only one allele is deemed wild-type in the context of a single dataset or analysis.

wild-type gene [GENO_0000502]

A gene allele representing the most common variant in a population (typically >99% frequency), that exhibits canonical function, and against which rare and/or non-functional mutant gene alleles are compared in characterizing the phenotypic consequences of genetic variation. [wild-type allele; wild-type gene]

X-linked dominant inheritance [GENO_0000146]

An X-linked inheritance pattern wherein the trait manifests in heterozygotes.

X-linked inheritance [GENO_0000936]

An inheritance pattern wherein the trait is determined by alleles of a single causal gene on an X-chromosome.

X-linked recessive inheritance [GENO_0000149]

An X-linked inheritance pattern wherein a trait caused by alleles of a gene on the X-chromosome manifests in homozygous but not heterozygous individuals.

Y-linked inheritance [GENO_0000941]

An inheritance pattern wherein the trait is determined by alleles of a single causal gene on a Y-chromosome.

Z-linked dominant inheritance [GENO_0000943]

A Z-linked inheritance pattern wherein the trait manifests in heterozygotes.

Z-linked inheritance [GENO_0000942]

An inheritance pattern wherein the trait is determined by alleles of a single causal gene on a Z-chromosome.

Z-linked recessive inheritance [GENO_0000947]

A Z-linked inheritance pattern wherein a trait caused by alleles of a gene on the Z-chromosome manifests in homozygous but not heterozygous individuals.

zebrafish phenotype [GENO_0000575]

ZFIN does not annotate with a pre-composed phenotype ontology - all annotations compose phenotypes on-the-fly using a combination of PATO, ZFA, GO and other ontologies. So while there is no manually curated zebrafish phenotype ontology, the Upheno pipeline generates one automatically (http://purl.obolibrary.org/obo/upheno/zp.owl). This ontology does not have a root ‘phenotype’ class, however, and so we generate our own in GENO as a stub placeholder for import of needed zebrafish phenotype classes. [zebrafish phenotype]

zygosity [GENO_0000133]

An allelic state that describes the degree of similarity of features at a particular location in the genome (i.e. whether the alleles or haplotypes are the same or different).
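For a two-member complement, the ‘same or different’ comparison can be sketched as below. The function name is hypothetical, the returned labels ‘homozygous’ and ‘heterozygous’ are the conventional terms for the two outcomes, and restricting the sketch to disomic complements is a simplifying assumption.

```python
def zygosity(complement):
    """Compare the two members of a disomic single locus complement."""
    if len(complement) != 2:
        raise ValueError("this sketch handles only two-member (disomic) complements")
    first, second = complement
    return "homozygous" if first == second else "heterozygous"


# Allele names follow the fgf8a<ti282a/+> example, where '+' denotes wild-type.
print(zygosity(("fgf8a<ti282a>", "+")))
print(zygosity(("fgf8a<ti282a>", "fgf8a<ti282a>")))
```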


Last modified January 5, 2022: adding GENO (491edea)