GENO ontology
GENO is an OWL model of genotypes, their more fundamental sequence components, and links to related biological and experimental entities.
One of a set of sequence features known to exist at a particular genomic location. An allele is a sequence feature at a genomic location where variation occurs (i.e. where >1 different sequence is known to exist). An allele can span only the extent of sequence known to vary (e.g. a single base SNP, or short insertion), or it can span a larger extent that includes one or more variable features as proper parts (e.g. a ‘gene allele’ that spans the extent of an entire gene which contains several sequence alterations). Alleles can carry ‘reference’ or ‘variant’ sequence - depending on whether its ‘state’ matches that considered to be the reference at that location. Alleles whose state differs from the reference are called ‘variant alleles’, and those that match the reference are called ‘reference alleles’. What is considered the ‘reference’ state at a particular location may vary, depending on the context/goal of a particular analysis. A ‘sequence alteration’ is a ‘variant allele’ that varies along its entire extent (i.e. every position varies from that of some defined reference sequence).
A quality inhering in an allele that describes its genetic origin (how it came to be part of a cell’s genome), i.e. whether it occurred de novo through some spontaneous mutation event, or was inherited from a parent.
A set of discrete alleles within a particular genome. ‘Sets’ are used to model entities that can be comprised of multiple discrete elements - but which can also contain zero or a single member. An ‘Allele Set’ represents any collection of 0 or more discrete alleles found within a particular genome. The alleles in such a set can be located at distant or close locations in the genome, and if on the same chromosome can be in trans, in cis, or even overlapping. When the members of such a set are found ‘in cis’ on the same chromosome, they may constitute a ‘haplotype’. When found ‘in trans’ at the same location on homologous chromosomes, they may constitute a ‘single locus complement’.
A quality inhering in an allele reflecting whether it is found in all cells of an organism’s body, or just some clonal subset (e.g. in mosaicism).
A genotype that specifies the ‘allelic state’ at a particular location in the genome - i.e. the set of alleles present at this locus across all homologous chromosomes. An ‘allelic genotype’ describes the set of alleles present at a particular location in the genome. This use of the term ‘genotype’ reflects its use in clinical genetics where variation has historically been assessed at a specific locus, and a genotype describes the allelic state at that particular location. This contrasts with the use of the term ‘genotype’ in model organism communities, where it commonly describes the allelic state at all loci in a genome known to vary from an established reference or background.
A quality inhering in a collection of discontinuous sequence features in a single genome in virtue of their relative position on the same or separate chromosomes.
A quality inhering in an ‘allelic complement’ (aka a ‘single locus complement’) that describes the allelic variability found at a particular locus in the genome of a single cell/organism.
An inheritance pattern wherein the trait is determined by alleles of a single causal gene on a sex chromosome.
[biological sequence unit; amino acid residue]
[has_sequence_unit; amino acid sequence; amino acid residue; biological sequence]
A polymorphic allele that is determined from the sequence of a recent ancestor in a phylogenetic tree.
A sequence attribute of a chromosome or chromosomal region that has been abnormally duplicated or lost, as the result of a non-disjunction event or unbalanced translocation.
A large deletion or terminal addition of part of some non-homologous chromosome, as the result of an unbalanced translocation. Aneusomic chromosomal parts are examples of “partial aneuploidy” as described in http://en.wikipedia.org/wiki/Aneuploidy: “The terms “partial monosomy” and “partial trisomy” are used to describe an imbalance of genetic material caused by loss or gain of part of a chromosome. In particular, these terms would be used in the situation of an unbalanced translocation, where an individual carries a derivative chromosome formed through the breakage and fusion of two different chromosomes. In this situation, the individual would have three copies of part of one chromosome (two normal copies and the portion that exists on the derivative chromosome) and only one copy of part of the other chromosome involved in the derivative chromosome.”
A complete chromosome that has been abnormally duplicated, or the absence of a chromosome that has been lost, typically as the result of a non-disjunction event or unbalanced translocation. Large sequence features gained in a genome are considered to be sequence alterations (akin to insertions), including aneusomic chromosome segments gained through unbalanced translocation events, entire aneusomic chromosomes gained through a non-disjunction event during replication, or extrachromosomal replicons that become part of the heritable genome of a cell or organism. Similarly, large sequence features lost from a genome are akin to deletions and therefore also considered sequence alterations. This includes the loss of chromosomal segments through unbalanced translocation events, and the loss of entire chromosomes through a non-disjunction event during replication.
[aneusomic zygosity]
An inheritance pattern wherein a trait caused by alleles of an autosomal gene manifests in heterozygotes.
An inheritance pattern wherein the trait is determined by alleles of a single causal gene on a non-sex chromosome.
An inheritance pattern wherein a trait caused by alleles of an autosomal gene manifests in homozygous but not heterozygous individuals.
A reference genome that represents the sequence of a genome from which a variant genome is derived (through the introduction of sequence alterations).
[biological process]
A linear ordering of units representing monomers of a biological macromolecule (e.g. nucleotides in DNA and RNA, amino acids in polypeptides). ‘Sequences’ differ from ‘sequence features’ in that instances are distinguished only by their inherent ordering of units, and not by any positional aspect related to alignment with some reference sequence. Accordingly, the ‘ATG’ translational start codon of the human AKT gene is the same sequence as the ‘ATG’ start codon of the human SHH gene, but these represent two distinct sequence features in virtue of their different positions in the genome.
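The distinction between a sequence and a sequence feature can be illustrated with a minimal Python sketch. The class names, the reference labels (`chrA`, `chrB`) and the coordinates are hypothetical placeholders chosen for illustration; they are not GENO identifiers or real genomic positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequence:
    """A biological sequence: identity depends only on the ordering of units."""
    units: str


@dataclass(frozen=True)
class SequenceFeature:
    """A located sequence: identity depends on the sequence AND its position."""
    sequence: Sequence
    reference: str  # hypothetical reference label, not a real accession
    start: int      # placeholder coordinate
    end: int        # placeholder coordinate


# Two start codons with made-up positions standing in for AKT and SHH.
akt_start_codon = SequenceFeature(Sequence("ATG"), "chrA", 100, 103)
shh_start_codon = SequenceFeature(Sequence("ATG"), "chrB", 500, 503)

# Same sequence, because only the unit ordering is compared.
same_sequence = akt_start_codon.sequence == shh_start_codon.sequence

# Different features, because position is part of the identity.
same_feature = akt_start_codon == shh_start_codon
```

Under these assumptions `same_sequence` is true while `same_feature` is false, mirroring the ‘ATG’ example in the definition.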
A biological sequence, or set of such sequences.
A set of biological sequences. ‘Sets’ are used to represent entities that are typically collections of more than one member. But we allow for sets that contain 0 members (an ‘empty’ set) or 1 member (a ‘singleton’ or ‘unit’ set), consistent with the concept of ‘mathematical sets’. A set may also include multiple copies of the same sequence. For example, in a ‘copy number complement’, members are all copies of this same biological sequence.
[biological sequence unit]
Describes an allele that is part of an allelic complement where one allele is maternally inherited and the other paternally inherited. Biparental inheritance of alleles is typical of normal Mendelian inheritance, where offspring inherit a maternal and a paternal copy of a given gene.
[chromosomal band intensity; sequence feature attribute]
An inheritance pattern wherein the trait is determined by inheritance of missing sections of one or more chromosomes, encompassing either 0 or multiple genes, possibly together with environmental factors.
An inheritance pattern wherein the trait is determined by inheritance of duplicated sections of one or more chromosomes, encompassing either 0 or multiple genes, possibly together with environmental factors.
An inheritance pattern wherein the trait is determined by inheritance of extra, missing, or re-arranged chromosomes possibly together with environmental factors.
An inheritance pattern wherein the trait is determined by inheritance of translocation or inversion of sections of one or more chromosomes, possibly together with environmental factors.
An extended part of a chromosome representing a term of convenience in order to hierarchically organize morphologically defined chromosome features: chromosome > arm > region > band > sub-band.
[has_sequence_attribute; chromosomal band intensity; chromosome band; is part of; chromosome_part; chromosome sub-band]
A cellular distribution in which an allele is found only in some clonal subset of cells in an organism, typically in virtue of its somatic origin.
An autosomal dominant inheritance pattern wherein a heterozygous individual simultaneously expresses the distinct traits associated with each allele in the heterozygous locus.
An X-linked dominant inheritance pattern wherein a heterozygous individual simultaneously expresses the distinct traits associated with each allele in the heterozygous locus.
A Z-linked dominant inheritance pattern wherein a heterozygous individual simultaneously expresses the distinct traits associated with each allele in the heterozygous locus.
An autosomal dominant inheritance pattern wherein the trait associated with one allele completely masks the trait associated with a different allele found at that locus.
An X-linked dominant inheritance pattern wherein the trait associated with one allele completely masks the trait associated with a different allele found at that locus.
A Z-linked dominant inheritance pattern wherein the trait associated with one allele completely masks the trait associated with a different allele found at that locus.
A heterozygous quality inhering in a single locus complement comprised of two different variant alleles and no wild type locus. (e.g. fgf8a
A cellular distribution in which an allele is found in all cells of an organism’s body, typically in virtue of its germline origin.
A set representing the complement of all copies of a particular biological sequence (typically at the scale of complete genes or larger) present in a particular genome. The notion of a ‘complement’ is useful as a special case of a set, where the members necessarily comprise an exhaustive collection of all objects that make up some well-defined set. Here, a ‘copy number complement’ represents the set of all copies of a specified sequence in a particular genome. Note that sequences can be duplicated in a set (i.e. a set can contain more than one member representing the same sequence). In the ‘copy number complement’ example, each set member is a copy of this same biological sequence. The count of how many copies of a particular sequence are found in a genome is the sequence’s ‘copy number’. In diploid organisms, the normal copy number for sequences at most locations is 2 (a notable exception being those on the X-chromosome in XY individuals, where normal copy number is 1). Variations in copy number occur if this count increases due to a duplication of the gene/region, or decreases due to a deletion of a gene/region. A driving use case for representing copy number is to support associations between variation in copy number of a particular sequence, and phenotypes or diseases that can result. A ‘complement’ refers to an exhaustive collection of all objects that make up some well-defined set. Such a set may contain 0, 1, or more than one member. The notion of a complement is useful for defining many biologically-relevant sets of sequence features, such as ‘copy number complements’ representing the set of all copies of a particular sequence in a genome. The fact that we are counting how many copies of the same sequence exist in a genome here, as opposed to how many of the same feature, is what sets sequence-level concepts like ‘copy number complement’ apart from feature-level concepts like ‘single locus complement’.
To illustrate the difference, consider a duplication event that creates a new copy of the human APOE gene on a different chromosome. This creates an entirely new sequence feature at a distinct locus from that of the original APOE gene. The ‘copy number complement’ for sequence defined by the APOE gene locus would have a count of three, as this sequence is present three times in the genome. But the ‘single locus complement’ at the APOE gene locus would still have a count of two - because the duplicated copy is at a different location in the genome, and therefore does not represent a copy of the APOE locus.
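The APOE illustration can be sketched in a few lines of Python. This is only an illustrative model under simple assumptions: the `Feature` class, the locus labels and the two counting helpers are hypothetical names, not part of GENO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    sequence_name: str  # which sequence the feature carries
    locus: str          # where it resides in the genome (hypothetical label)


# A diploid genome after the duplication event described above.
genome = [
    Feature("APOE", "APOE locus"),        # copy on one homolog
    Feature("APOE", "APOE locus"),        # copy on the other homolog
    Feature("APOE", "duplication site"),  # new copy on a different chromosome
]


def copy_number(features, sequence_name):
    """Count all copies of a sequence, regardless of where they reside."""
    return sum(1 for f in features if f.sequence_name == sequence_name)


def single_locus_count(features, locus):
    """Count only the features that occupy one particular locus."""
    return sum(1 for f in features if f.locus == locus)


apoe_copy_number = copy_number(genome, "APOE")
apoe_locus_count = single_locus_count(genome, "APOE locus")
```

Here `apoe_copy_number` counts sequence copies (three), while `apoe_locus_count` counts features at the original locus (two), which is the sequence-level versus feature-level distinction the text draws.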
A gene that originates from the genome of a Danio rerio.
[danio rerio strain; strain or breed; has_member; Danio rerio]
An attribute describing an allele that originated through a mutation event in a germ cell of one of the parents, or in the fertilized egg itself during early embryogenesis. We distinguish germline, somatic, and de novo allele origin based on a combination of two key criteria - whether the allele is inherited from a parent, and whether it is ‘heritable’ by offspring. De novo variants are heritable but not inherited - as they are not observed in either parent, but can be passed to offspring in virtue of their being present in the individual’s germ cells. By contrast, germline variants are both inherited (passed down from a parent) and heritable (passable down to offspring), and somatic variants are neither inherited nor heritable - having originated via a spontaneous mutation in a non-germ cell. De novo variants appear for the first time in one family member. They often explain genetic disorders in which an affected child has a mutation in every cell in the body but the parents do not, and there is no family history of the disorder.
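The two criteria above (inherited, heritable) amount to a small decision table, sketched here in Python. The function name and the returned labels are illustrative assumptions; the text does not say how an allele that is inherited but not heritable should be classified, so that case is left unclassified rather than guessed.

```python
def classify_allele_origin(inherited: bool, heritable: bool) -> str:
    """Map the two criteria from the definition onto an allele origin label."""
    if inherited and heritable:
        return "germline"   # passed down from a parent, passable to offspring
    if heritable and not inherited:
        return "de novo"    # absent in both parents, present in germ cells
    if not inherited and not heritable:
        return "somatic"    # spontaneous mutation in a non-germ cell
    # Inherited but not heritable is not covered by the definition.
    return "unclassified"
```

For example, `classify_allele_origin(inherited=False, heritable=True)` yields the de novo label.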
A multifactorial inheritance pattern that is determined by the simultaneous action of alleles in two genes.
An allelic genotype specifying the set of two alleles present at a particular location in a diploid genome (i.e., a diploid ‘single locus complement’). Alt: A sequence feature complement comprised of two haplotypes at a particular location on paired homologous chromosomes in a diploid genome. “Humans are diploid organisms; they have paired homologous chromosomes in their somatic cells, which contain two copies of each gene. An allele is one member of a pair of genes occupying a specific spot on a chromosome (called locus). Two alleles at the same locus on homologous chromosomes make up the individual’s genotype. A haplotype (a contraction of the term ‘haploid genotype’) is a combination of alleles at multiple loci that are transmitted together on the same chromosome. Haplotype may refer to as few as two loci or to an entire chromosome depending on the number of recombination events that have occurred between a given set of loci. Genewise haplotypes are established with markers within a gene; familywise haplotypes are established with markers within members of a gene family; and regionwise haplotypes are established within different genes in a region at the same chromosome. Finally, a diplotype is a matched pair of haplotypes on homologous chromosomes.” From https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4118015/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4118015/figure/sap-26-03-165-g002/
[disomic zygosity]
[DNA residue; biological sequence unit]
[DNA residue; has_sequence_unit; biological sequence; DNA sequence]
A genotype that describes the total intrinsic and extrinsic variation across a genome at the time of a phenotypic assessment (where ‘intrinsic’ refers to variation in genomic sequence, as mediated by sequence alterations, and ‘extrinsic’ refers to variation in gene expression, as mediated through transient gene-specific interventions such as gene knockdown reagents or overexpression constructs). An effective genotype is meant to summarize all factors related to genes and their expression that influence an observed phenotype - including ‘intrinsic’ alterations in genomic sequence, and gene-specific ‘extrinsic’ alterations in expression transiently introduced at the time of the phenotypic assessment.
An engineered region that is used to transfer foreign genetic material into a host cell. Constructs can be engineered to carry inserts of DNA from external sources, for purposes of cloning and propagation or gene expression in host cells. Constructs are typically packaged as part of delivery systems such as plasmids or viral vectors.
A transgene part whose sequence is expressed in a gene product through transcription and/or translation.
[engineered genetic construct; expression construct]
A sequence feature whose identity is additionally dependent on factors specifically influencing its level of expression in the context of a biological system (e.g. being targeted by gene-knockdown reagents, or driven from an exogenous expression system like a recombinant construct).
A gene altered in its expression level relative to some baseline of normal expression in the system under investigation (e.g. a cell line or model organism). Expression-variant genes are altered in their expression level through some modification or intervention external to their sequence and position. These may include endogenous mechanisms (e.g. direct epigenetic modifications that impact expression level, or altered regulatory networks controlling gene expression), or experimental interventions (e.g. targeting by a gene-knockdown reagent, or being transiently expressed as part of a transgenic construct in a host cell or organism). The identity of a given instance of an expression-variant gene is dependent on how its level of expression is manipulated in a biological system (i.e. via targeting by gene-knockdown reagents, or being transiently overexpressed). So expression-variant genes have the additional identity criteria of a genetic context of their material bearer (external to their sequence and position) that impacts their level of expression in a biological system.
A transgene that is not chromosomally integrated in the host genome, but instead exists as part of an extra-chromosomal construct.
A genetic feature that is not part of the chromosomal genome of a cell or virion, but rather a stable and heritable element that is replicated and passed on to progeny (e.g. a replicative plasmid or transposon).
A specification of the known state of gene expression across a genome, and how it varies from some baseline/reference state. An extrinsic genotype describes variation in the ‘expression level’ of genes in a cell or organism, as mediated by transient, gene-specific experimental interventions such as RNAi, morpholinos, TALENs, CRISPR, or construct overexpression. This concept is relevant primarily for model organisms and systems that are subjected to such interventions to determine how altered expression of specific genes may impact organismal or cellular phenotypes in the context of a particular experiment. The ‘extrinsic genotype’ concept is contrasted with the more familiar notion of an ‘intrinsic genotype’, describing variation in the inherent genomic sequence (i.e. ‘allelic state’). In G2P research, interventions affecting both genomic sequence and gene expression are commonly applied in order to assess the impact specific genomic features can have on phenotype and disease. It is in this context that we chose to model ‘extrinsic’ alterations in expression as genotypes - to support parallel conceptualization and representation of these different types of genetic variation that inform the discovery of G2P associations.
A genomic genotype where the genomic background specifies a female sex chromosome complement.
A set representing the complement of all functional versions of a specified sequence (typically that of a gene) in a particular genome. A ‘complement’ refers to an exhaustive collection of all objects that make up some well-defined set. Such a set may contain 0, 1, or more than one member. The notion of a complement is useful for defining many biologically-relevant sets of sequence features, such as the set of all functional copies of a particular sequence in a genome. This is known as the ‘functional copy number’ or ‘genetic dosage’ of the sequence. ‘Functional copies’ of a sequence are those that exhibit normal activity and/or produce gene products that exhibit normal activity associated with the sequence. The count of functional copies of a gene is often referred to as its ‘dosage’. In diploid organisms, the normal ‘dosage’ is 2 for autosomal genes/regions. Dosage increases if there is a duplication of a functional gene/region. Dosage decreases if there is either a deletion of a gene/region, or an inactivating mutation that eliminates gene function. This sets it apart from the notion of a ‘copy number complement’, which reflects how many copies of a sequence exist in a genome, regardless of their functionality. Addition of a non-functional allele of a gene will increase its copy number, but not increase its dosage. As we saw for ‘copy number complement’, the defining sequence here is specified in terms of a location on a reference sequence - typically the location where a gene or set of genes resides. But the criteria for membership in a ‘functional’ copy number complement require only that the feature can perform the functions associated with the gene or genes at the defining location. A gene allele that varies by only one nucleotide from the wild-type gene may not qualify as functional if that alteration eliminates the activity of the allele.
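The difference between copy number and dosage can be made concrete with a short Python sketch. The `GeneCopy` class, the helper functions and the gene label `geneX` are hypothetical and serve only to illustrate the counting rules stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneCopy:
    gene: str         # hypothetical gene label
    functional: bool  # whether this copy retains normal activity


def copy_number(copies, gene):
    """Count every copy of the gene, functional or not."""
    return sum(1 for c in copies if c.gene == gene)


def dosage(copies, gene):
    """Count only the functional copies (the 'functional copy number')."""
    return sum(1 for c in copies if c.gene == gene and c.functional)


# Normal diploid state plus one added non-functional allele:
# copy number rises to 3, dosage stays at 2.
with_extra_dead_copy = [
    GeneCopy("geneX", True),
    GeneCopy("geneX", True),
    GeneCopy("geneX", False),
]

# An inactivating point mutation in one of two copies:
# copy number stays at 2, dosage drops to 1.
with_inactivating_mutation = [
    GeneCopy("geneX", True),
    GeneCopy("geneX", False),
]
```

Calling `copy_number` and `dosage` on each list reproduces the two cases described in the definition: an added non-functional allele and an inactivating mutation.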
A part of some non-homologous chromosome that has been gained as the result of an unbalanced translocation event. Such additions of translocated chromosomal parts confer a trisomic condition to the duplicated region of the chromosome, and are thus considered to be ‘variant single locus complements’ in virtue of an abnormal number of features at a particular genomic location, rather than abnormal sequence within the location.
A complete chromosome that has been abnormally duplicated in a genome, typically as the result of a meiotic non-disjunction event or unbalanced translocation. This ‘gained’ chromosome is conceptually an ‘insertion’ in a genome that received two copies of a chromosome in a cell division following a non-disjunction event. As such, it qualifies as a type of sequence_alteration, and as an ‘extra’ chromosome.
A genomic feature that represents one of a set of versions of a gene (i.e. a haplotype whose extent is that of a gene). In SO, the concept of a ‘gene’ is functionally defined, in that a gene necessarily produces a functional product. By contrast, the concept of a ‘gene allele’ here is positionally defined - representing the sequence present at the location a gene resides in a reference genome (based on sequence alignment). An Shh gene allele, for example, may be a fully functional wild-type version of the gene, a non-functional version carrying a deleterious point mutation, a truncated version of the gene, or even a complete deletion. In all these cases, an ‘Shh gene allele’ exists at the position where the canonical gene resides in the reference genome - even if the extent of this allele is different from the wild-type, or even zero in the case of the complete deletion. A genomic feature being an allele_of a gene is based on its location in a host genome - not on its sequence. This means, for example, that the insertion of the human SMN2 gene into the genome of a mouse (see http://www.informatics.jax.org/allele/MGI:3056903) DOES NOT represent an allele_of the human SMN2 gene according to the GENO model - because it is located in a mouse genome, not a human one. Rather, this is a transgenic insertion that derives_sequence_from the human SMN2 gene. If this human SMN2 gene is inserted within the mouse SMN2 gene locus (e.g. used to replace mouse SMN2 gene), the feature it creates is an allele_of the mouse SMN2 gene (one that happens to match the sequence of the human ortholog of the gene). But again, it is not an allele_of the human SMN2 gene.
[gene knockdown reagent; engineered_region]
A genomic feature that is part of a gene, and delineated by some functional or structural role it serves (e.g. a promoter element, coding region, etc).
The molecular product resulting from transcription of a single gene (either a protein or RNA molecule).
[Insertion; gene trap insertion]
A nucleic acid molecule that contains one or more sequences serving as a template for gene expression in a biological system (i.e. a cell or virion). This class is different from genomic material in that genomic material is necessarily heritable, while genetic material includes genomic material, as well as any additional nucleic acids that participate in gene expression resulting in a cellular or organismal phenotype. So things like transiently transfected expression constructs would qualify as ‘genetic material’ but not ‘genomic material’. Things like siRNAs and morpholinos affect gene expression indirectly (i.e. are not templates for gene expression), and therefore do not qualify as genetic material.
A genomic genotype that specifies the baseline sequence of a genome from which a variant genome is derived (through the introduction of sequence alterations).
A generically dependent continuant that carries biological sequence that is part of or derived from a genome.
A sequence feature (continuous extent of biological sequence) that is of genomic origin (i.e. carries sequence from the genome of a cell or organism). 1. A feature being ‘of genomic origin’ here means only that its sequence has been located to the genome of some organism by alignment with some reference genome. This is because the sequence was originally identified in, or artificially created to replicate, sequence from an organism’s genome. 2. The location of a genomic feature is defined by start and end coordinates based on alignment with a reference genome. Genomic features can span any size from a complete chromosome, to a chromosomal band or region, to a gene, to a single base pair or even junction between base pairs (this would be a sequence feature with an extent of zero). 3. As sequence features, instances of genomic features are identified by both their inherent sequence and their position in a genome - as determined by an alignment with some reference sequence. Accordingly, the ‘ATG’ start codon in the coding DNA sequence of the human AKT gene and the ‘ATG’ start codon in the human SHH gene represent two distinct genomic features despite having the same sequence, in virtue of their different positions in the genome.
The location of a sequence feature in a genome, defined by its start and end position on some reference genomic coordinate system. 1. A genomic location (aka locus) is defined by its begin and end coordinates on a reference genome, independent of a particular sequence that may reside there. In GENO, we say that a genomic location is occupied_by a ‘sequence feature’ - where the identity of this feature depends on both its sequence, and its location in the genome (i.e. the locus it occupies). For example, the ‘ATG’ sequence beginning the ORF of the human SHH gene shares the same sequence as the ‘ATG’ beginning the ORF of the human AKT gene. But these are distinct sequence features because they occupy different genomic locations. 2. A given genomic location (e.g. the human SHH gene locus) may be occupied by different alleles (e.g. different alleles of the SHH gene). Within the genome of a single diploid organism, there is potential for two alleles to exist at such a locus (i.e. two different versions of the SHH gene). And across genomes of all members of a species, many more alleles of the SHH gene may exist and occupy this same locus. 3. The notion of a genomic location in the realm of biological sequences is analogous to a BFO:spatiotemporal region in the realm of physical entities. A spatiotemporal region can be occupied_by physical objects, while a genomic location is occupied_by sequence features. Just as a spatiotemporal region is distinct from an object that occupies it, so too a genomic locus is distinct from a sequence feature that occupies it. As a more concrete example, consider the distinction between a street address and the building that occupies it as analogous to the relationship between a genomic location and the feature that resides there.
A set of genomic features (i.e. sequence features that are of genomic origin). A genomic feature is any located sequence feature in the genome, from a single nucleotide to a gene to an entire chromosome. ‘Sets’ are used to represent entities that are typically collections of more than one member - e.g. the set of chromosomes that make up the human genome. But we allow for sets that contain 0 members (an ‘empty’ set) or 1 member (a ‘singleton’ or ‘unit’ set), consistent with the concept of ‘mathematical sets’. For example, a ‘single locus complement’ at an X-linked locus in an XY male will consist of only one allele, as there is only one X-chromosome in the genome. Note also that sets may contain duplicates (i.e. more than one member representing the same feature). For example, a homozygous ‘single locus complement’ is a set comprised of two of the same feature. The notion of a ‘genomic feature set’ differs from that of a ‘genomic sequence set’ in that we are counting how many copies of the same sequence feature exist in a genome, as opposed to how many of the same sequence. ‘Genomic feature sets’ are useful for representing things like ‘single locus complements’, where members are sequence features whose identity is dependent on their location. By contrast, ‘genomic sequence sets’ are useful for describing things like ‘copy number complements’, which are concerned only with how many copies of a sequence exist in a genome, regardless of the location where these reside.
A genotype that describes the total variation in heritable genomic sequence of a cell or organism, typically in terms of alterations from some reference or background genotype. 1. A genomic genotype is a short-hand specification of a genome that uses a representational syntax comprised of information about a reference genome (‘genomic background’), and all specific variants from this reference (the ‘genomic variation complement’). Conceptually, this variant genome sequence can be resolved by substituting all sequences specified by the ‘genomic variation complement’ for the corresponding sequences in the reference ‘genomic background’ sequence. 2. ‘Heritable’ genomic sequence is that which is passed on to subsequent generations of cells/organisms, and includes all chromosomal sequences, the mitochondrial genome, and any transmissible extrachromosomal replicons.
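The substitution step described in point 1 can be sketched as follows. This is a toy model under stated assumptions: the background is a single string, each variant is a hypothetical `(start, end, replacement)` tuple using 0-based half-open coordinates on the background, and variants must not overlap. None of these conventions come from GENO itself.

```python
def resolve_variant_genome(background, variants):
    """Substitute each variant's sequence for the background span it replaces.

    `variants` is a list of (start, end, replacement) tuples with 0-based,
    half-open coordinates on `background` (an assumed convention).
    """
    pieces = []
    cursor = 0
    for start, end, replacement in sorted(variants):
        if start < cursor:
            raise ValueError("overlapping variants are not supported")
        pieces.append(background[cursor:start])  # unchanged background
        pieces.append(replacement)               # variant sequence
        cursor = end                             # skip the replaced span
    pieces.append(background[cursor:])           # remaining background
    return "".join(pieces)


# A substitution at position 2 and a two-base deletion at positions 5-6.
resolved = resolve_variant_genome("ACGTACGT", [(2, 3, "T"), (5, 7, "")])
```

An insertion is expressed as a zero-extent span, for example `(2, 2, "TT")`, which matches the idea elsewhere in the text that a feature can have an extent of zero.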
A genomic genotype that does not specify the sex determining chromosomal features of its bearer (i.e. does not indicate the background sex chromosome complement). In practice, most genotype instances are classified as sex-agnostic genotypes because they are not sex-specific. When a genotype is indicated to be that of a male or female, it implies a known sex chromosome complement in the genomic background. This requires us to distinguish separate ‘sex-qualified’ genotype instances for males and females that share a common ‘sex-agnostic’ genotype. For example, male and female mice that are of the same strain/background and contain the same set of genetic variations will have the same sex-agnostic intrinsic genotype, but different sex-qualified intrinsic genotypes (which take into account background sex chromosome sequence as identifying criteria for genotype instances).
A genomic genotype where the genomic background specifies a male or female sex chromosome complement. We distinguish the notion of a sex-agnostic intrinsic genotype, which does not specify whether the portion of the genome defining organismal sex is male or female, from the notion of a sex-qualified intrinsic genotype, which does. Male and female mice that contain the same background and genetic variation complement will have the same ‘sex-agnostic intrinsic genotype’, despite their genomes varying in their sex-chromosome complement. By contrast, these two mice would have different ‘sex-qualified intrinsic genotypes’, as this class takes background sex chromosome sequences into account in the identity criteria for its instances. Conceptually, a sex-qualified genotype represents a superset of sequence features relative to a sex-agnostic intrinsic genotype, in that it specifies the background sex-chromosome complement of the genome.
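The identity criteria for sex-agnostic and sex-qualified genotypes can be sketched in Python. The class, its field names, and the strain and variant labels are hypothetical; the sketch assumes only that a sex-qualified genotype adds a sex chromosome complement on top of the background and variation complement.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class GenomicGenotype:
    background: str                        # hypothetical strain label
    variants: FrozenSet[str]               # genomic variation complement
    sex_chromosomes: Optional[str] = None  # None means sex-agnostic

    def sex_agnostic(self) -> "GenomicGenotype":
        """Drop the sex chromosome complement from the identity criteria."""
        return GenomicGenotype(self.background, self.variants, None)


variation = frozenset({"variant-1", "variant-2"})
male = GenomicGenotype("strain-A", variation, "XY")
female = GenomicGenotype("strain-A", variation, "XX")

# Distinct sex-qualified genotypes, one shared sex-agnostic genotype.
differ_when_sex_qualified = male != female
same_when_sex_agnostic = male.sex_agnostic() == female.sex_agnostic()
```

Both comparisons evaluate to true under these assumptions, matching the mouse example in the definition.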
A nucleic acid macromolecule that is part of a cell or virion and has been inherited from an ancestor cell or virion, and/or is capable of being replicated and inherited through successive generations of progeny. 1. Genomic material here is considered as a DNA or RNA molecule that is found in a cell or virus, and capable of being replicated and inherited by progeny cells or virus. As such, this nucleic acid is either chromosomal DNA, or some replicative epi-chromosomal plasmid or transposon. Genetic material is necessarily part of some ‘material genome’, and both are necessarily part of some cell or virion. So a genomic library is not considered a material genome/genetic material - rather, we could say that this genomic library is a ‘genomic material sample’ that bears the concretization of some genome. 2. Genomic material need not be inherited from an immediate ancestor cell or organism (e.g. a replicative plasmid or transposon acquired through some experimental modification), but such cases must be capable of being inherited by progeny cells or organisms.
A biological sequence that is of genomic origin (i.e. carries sequence from the genome of a cell or organism). A sequence being ‘of genomic origin’ here means only that it has been located to the genome of some organism by alignment with some reference genomic sequence. This is because the sequence was originally identified in, or artificially created to replicate, sequence from an organism’s genome.
A set of genomic sequences (a biological sequence that is of genomic origin). A ‘genomic sequence set’ differs from a ‘genomic feature set’ in that we are counting how many copies of the same sequence exist in a genome, as opposed to how many of the same sequence feature. ‘Genomic sequence sets’ are useful for describing things like ‘copy number complements’, which are concerned only with how many copies of a sequence exist in a genome, regardless of the location where these reside. By contrast, ‘genomic feature sets’ are useful for representing things like ‘single locus complements’, where members are sequence features whose identity is dependent on their location.
A genomic feature set representing all ‘variant single locus complements’ in a single genome, which together constitute the ‘variant’ component of a genomic genotype. A ‘complement’ refers to an exhaustive collection of all objects that make up some well-defined set. Such a complement may contain 0, 1, or more than one member. The notion of a complement is useful for defining many biologically-relevant sets of sequence features. Here, a ‘genomic variation complement’ is the set of all ‘single locus complements’ in a particular genome that harbor some known variation. In model organisms, the majority of genotypes describe variation at a single location in the genome (i.e. only one ‘single-locus variant complement’) that is variant relative to some reference background. For example, the genotype instance ‘fgf8a<t1282a/+>(AB)’ exhibits a mutation at only one locus. But some genotypes describe variation at more than one location (e.g. a double mutant that has alterations in the fgf8a gene and the shh gene).
A specification of the genetic state of an organism, whether complete (defined over the whole genome) or incomplete (defined over a subset of the genome). Genotypes typically describe this genetic state as a diff between some variant component and a canonical reference. 1. Scope of ‘Genetic State’: ‘Genetic state’ is considered quite broadly in GENO to describe two general kinds of ‘states’. First is the traditional notion of ‘allelic state’ - defined as the complement of alleles present at a particular location or locations in a genome (i.e. across all homologous chromosomes containing this location). Here, a genotype can describe allelic state at a specific locus in a genome (an ‘allelic genotype’), or describe the allelic state across the entire genome (a ‘genomic genotype’). Second, this concept can also describe states of genomic features ‘extrinsic’ to their intrinsic sequence, such as the expression status of a gene as a result of being specifically targeted by experimental interventions such as RNAi, morpholinos, or CRISPRs. 2. Genotype Subtypes: In GENO, we use the term ‘intrinsic’ for genotypes describing variation in genomic sequence, and ‘extrinsic’ for genotypes describing variation in gene expression (e.g. resulting from the targeted experimental knock-down or over-expression of endogenous genes). We use the term ‘effective genotype’ to describe the total intrinsic and extrinsic variation in a cell or organism at the time a phenotypic assessment is performed. Two more precise concepts are subsumed by the notion of an ‘intrinsic genotype’: (1) ‘allelic genotypes’, which specify allelic state at a single genomic location; and (2) ‘genomic genotypes’, which specify allelic state across an entire genome. In both cases, allelic state is typically specified in terms of a differential between a reference and a set of 1 or more known variant features. 3.
The Genotype Partonomy: ‘Genomic genotypes’ describing sequence variation across an entire genome are ‘decomposed’ in GENO into a partonomy of more granular levels of variation. These levels are defined to be meaningful to biologists in their attempts to relate genetic variation to phenotypic features. They include ‘genomic variation complement’ (GVC), ‘variant single locus complement’ (VSLC), ‘allele’, ‘haplotype’, ‘sequence alteration’, and ‘genomic background’ classes. For example, the components of the zebrafish genotype “fgf8a<ti282a/ti282a>; fgf3<t24149/+>[AB]”, described at zfin.org/ZDB-FISH-150901-9362, include the following elements: - GVC: fgf8a<ti282a/ti282a>; fgf3<t24149/+> (total intrinsic variation in the genome) - Genomic Background: AB (the reference against which the GVC is variant) - VSLC1: fgf8a<ti282a/ti282a> (homozygous complement of gene alleles at one known variant locus) - VSLC2: fgf3<t24149/+> (heterozygous complement of gene alleles at another known variant locus) - Allele 1: fgf8a
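The decomposition above can be sketched as a small parser. This is a rough illustration only: it assumes the label follows the `gene<allele/allele>; ...[background]` pattern of the example, the function name `parse_genotype` and the dictionary keys are hypothetical, and it stops at the VSLC level rather than modeling individual alleles or sequence alterations.

```python
import re


def parse_genotype(label):
    """Split a genotype label of the form 'gene<a/b>; gene<c/d>[BG]' into parts."""
    match = re.fullmatch(r"(?P<gvc>.*?)\s*\[(?P<background>[^\]]+)\]", label.strip())
    if match is None:
        raise ValueError(f"unrecognized genotype label: {label!r}")
    vslcs = []
    for part in match["gvc"].split(";"):
        locus = re.fullmatch(
            r"\s*(?P<gene>[^<\s]+)<(?P<a1>[^/>]+)/(?P<a2>[^/>]+)>\s*", part
        )
        if locus is None:
            raise ValueError(f"unrecognized locus complement: {part!r}")
        a1, a2 = locus["a1"], locus["a2"]
        vslcs.append(
            {
                "label": part.strip(),
                "gene": locus["gene"],
                "alleles": (a1, a2),
                # Identical allele symbols are treated as a homozygous complement.
                "zygosity": "homozygous" if a1 == a2 else "heterozygous",
            }
        )
    return {
        "gvc": match["gvc"].strip(),
        "background": match["background"],
        "vslcs": vslcs,
    }


parsed = parse_genotype("fgf8a<ti282a/ti282a>; fgf3<t24149/+>[AB]")
print(parsed["background"], [v["zygosity"] for v in parsed["vslcs"]])
```

Real genotype labels vary between databases, so a production parser would need rules specific to each source.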
[genotype-phenotype association; association has object; has_qualifier; environmental system]
Describes an allele that is inherited from a parent in virtue of the allele being present in one or both of the parent’s germ cells (sperm or egg). We distinguish germline, somatic, and de novo allele origin based on a combination of two key criteria - whether the allele is inherited from a parent, and whether it is ‘heritable’ by offspring. Germline variants are both inherited (passed down from a parent) and heritable (passable down to offspring). By contrast, somatic variants are neither inherited nor heritable - having originated via a spontaneous mutation in a non-germ cell. De novo mutations in germ cells are not inherited but are typically heritable, as they originated through a spontaneous mutation that made them present in a germ cell.
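The two criteria above amount to a small decision table, sketched here. The function name and the returned labels are illustrative assumptions, not GENO identifiers. The text says de novo germ cell mutations are only ‘typically’ heritable, so the third branch is a simplification.

```python
def classify_allele_origin(inherited, heritable):
    """Map the two criteria (inherited from a parent, heritable by offspring)
    onto the allele origin categories described in the text."""
    if inherited and heritable:
        return "germline"
    if not inherited and not heritable:
        return "somatic"
    if not inherited and heritable:
        # Simplification: the text says de novo germ cell mutations are
        # 'typically' heritable, not always.
        return "de novo"
    # Inherited but not heritable is not covered by the definition above.
    return "unclassified"


for flags in [(True, True), (False, False), (False, True), (True, False)]:
    print(flags, classify_allele_origin(*flags))
```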
[chromosomal band intensity; gneg]
[chromosomal band intensity; gpos]
[gpos100]
[gpos25]
[gpos33]
[gpos50]
[gpos66]
[gpos75]
[chromosomal band intensity; gvar]
A set of discrete, genetically-linked sequence alterations that reside on the same chromosomal strand and are typically co-inherited within a haplotype block. A haplotype is a set of non-overlapping alleles that reside in close proximity on the same DNA strand. We model them as ‘complements’ because they include all known/relevant alleles within a defined region in the genome (e.g. a ‘gene’, or a ‘haplotype block’) - where this set may consist of 0, 1, or more alterations from some reference. Because they are genetically linked, the alleles comprising a haplotype are likely to be co-inherited and survive descent across many generations of reproduction. As highlighted in https://en.wikipedia.org/wiki/Haplotype, the term ‘haplotype’ is most commonly used to describe the following scenarios of genetic linkage between ‘alleles’: 1. The ‘alleles’ comprising the haplotype are ‘single nucleotide polymorphisms’ (SNPs) or other small alterations, which collectively tend to occur together on a chromosomal strand. This use of ‘haplotype’ is commonly seen in phasing of patient WGS or WES data, to describe a state where two or more alterations are believed to occur ‘in cis’ on the same chromosomal strand. 2. The ‘alleles’ comprising the haplotype are SNPs or other short alterations, which collectively define a specific version of a gene. In this case, the location bounding the haplotype corresponds to a gene locus, and the haplotype defines a specific allele of that gene (i.e. a ‘gene allele’). “Star alleles” of PGx genes are examples of this category of haplotype (e.g. https://www.ebi.ac.uk/cgi-bin/ipd/imgt/hla/get_allele_hgvs.cgi?A*33:01:01, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4724253/). 3. Each of the ‘alleles’ comprising the haplotype is itself a ‘gene allele’ (i.e. a specific version of an entire gene), such that the haplotype contains multiple complete ‘gene alleles’ that are co-inherited because they reside in tightly linked clusters on a single chromosome.
Each of these more specific definitions serves a purpose for a particular type of genetic analysis or use case. The GENO definition of ‘haplotype’ is broadly inclusive of these and any other scenarios where distinct ‘alleles’ of any kind on the same chromosomal strand are genetically linked, and thus tend to be co-inherited across successive generations.
A sequence feature representing a region of the genome over which there is little evidence for historical recombination, such that sequence alterations it contains are typically co-inherited across generations. A particular haplotype block is defined by the set of sequence alterations it is known to contain, which collectively represent a ‘haplotype’. The boundaries of haplotype blocks are defined in efforts to identify haplotypes that exist in organisms or populations. A haplotype block may span any number of sequence alterations, and may cover small or large chromosomal regions - depending on the number of recombination events that have occurred between the alterations defining the haplotype.
[hemizygous; disomic zygosity]
[hemizygous insertion-linked]
[hemizygous X-linked]
[hemizygous Y-linked]
The disposition of an entity to be transmitted to subsequent generations following a genetic replication or organismal reproduction event. We can use these terms to describe the heritability of genetic material or sequence features - e.g. chromosomal DNA or genes are heritable in that they are passed on to child cells/organisms. Such genetic material has a heritable disposition in a cell or virion, in virtue of its being replicated in its cellular host and inherited by progeny cells (such that the sequence content it encodes is stably propagated in the genetic material of subsequent generations of cells). We can also use these terms to describe the heritability of phenotypes/conditions - e.g. the passage of a particular trait or disease across generations of reproducing cells/organisms.
[heritable; heritability]
An allelic state where more than one type of allele exists at a particular location in the organellar genome (mitochondrial or plastid) of a cell/organism.
A mitochondrial inheritance pattern whereby manifestation of a trait is observed when some inherited mitochondria contain the causative allele and some do not.
[heterozygous; disomic zygosity]
A gene that originates from the genome of Homo sapiens.
An allelic state where a single allele exists at a particular location in the organellar genome (mitochondrial or plastid) of a cell/organism.
A mitochondrial inheritance pattern whereby manifestation of a trait occurs when only mitochondria containing the causative allele are inherited.
[homozygous; disomic zygosity]
A population of Homo sapiens grouped together in virtue of their sharing some commonality (either an inherent attribute or an externally assigned role).
A quality inhering in a collection of discontinuous sequence features in a single genome that reside on the same macromolecule (e.g. the same chromosome).
A quality inhering in a collection of discontinuous sequence features in a single genome that reside on different macromolecules (e.g. different chromosomes).
An autosomal dominant inheritance pattern wherein the trait expressed in a heterozygous individual is intermediate between the trait expressed in individuals homozygous for either allele in the heterozygous locus.
An X-linked dominant inheritance pattern wherein the trait expressed in a heterozygous individual is intermediate between the trait expressed in individuals homozygous for either allele in the heterozygous locus.
A Z-linked dominant inheritance pattern wherein the trait expressed in a heterozygous individual is intermediate between the trait expressed in individuals homozygous for either allele in the heterozygous locus.
The pattern in which a genetic trait or condition is passed from one generation to the next, as determined by genetic interactions between alleles of the causal gene, and interactions between these alleles and the environment. An inheritance pattern results from the disposition of a genetic variant to cause a particular trait or phenotype when it is present in a particular genetic and environmental context. Here, “genetic context” refers to the allelic state of the variant, which depends on what other alleles exist at the same location/locus in the genome. Zygosities such as heterozygous and homozygous are simple, common examples of ‘states’ of an allele. These genetic and environmental “interactions” of alleles play out at the level of the gene products produced by the causal alleles, and are observable in the pattern with which the trait caused by an allele is inherited across generations of individuals. Thus, an inheritance pattern such as dominance is not inherent to a single allele or its phenotype, but rather a result of the relationship between two alleles of a gene and the phenotype that results in a given environment. This also means that the ‘dominance’ of an allele is context dependent - Allele 1 can be dominant over Allele 2 in the context of Phenotype X, but recessive to Allele 3 in the context of Phenotype Y.
Describes an allele that is inherited from a parent.
A transgene that has been integrated into a chromosome in the host genome. An integrated transgene differs from a transgenic insertion in that a transgenic insertion may contain a single transgene, a partial transgene that needs endogenous sequences from the host genome to become functional (e.g. an enhancer trap), or multiple transgenes (i.e. be polycistronic). Furthermore, the transgenic insertion may contain sequences in addition to its transgene(s) - e.g. sequences flanking the transgene required for integration or replication/maintenance in the host genome. The term ‘integrated transgene’ covers individual transgenes that were delivered in whole or in part by a transgenic insertion. An ‘integrated transgene’ differs from its parent ‘transgene’ in that transgenes can include genes introduced into a cell/organism on an extra-chromosomal plasmid that is never integrated into the host genome.
A genotype that describes the total variation in heritable genomic sequence of a cell or organism, typically in terms of alterations from some reference or background genotype. 1. A genomic genotype is a short-hand specification of a genome that uses a representational syntax comprised of information about a reference genome (‘genomic background’), and all specific variants from this reference (the ‘genomic variation complement’). Conceptually, this variant genome sequence can be resolved by substituting all sequences specified by the ‘genomic variation complement’ for the corresponding sequences in the reference ‘genomic background’ sequence. 2. ‘Heritable’ genomic sequence is that which is passed on to subsequent generations of cells/organisms, and includes all chromosomal sequences, the mitochondrial genome, and any transmissible extrachromosomal replicons.
A genotype that describes what is known about variation in a genome at a gross structural level, in terms of the number and appearance of chromosomes in the nucleus of a eukaryotic cell. Karyotypes describe structural variation across a genome at the level of chromosomal morphology and banding patterns detectable in stained chromosomal spreads. This coarser level does not capture more granular levels of variation commonly represented in other forms of genotypes (e.g. specific alleles and sequence alterations). A base karyotype representing a genome with no known structural variation can be as simple as ‘46XY’, but karyotypes typically contain some gross variant component (such as a chromosome duplication or translocation).
[knockdown reagent targeted gene complement; has_variant_part; reagent-targeted gene complement]
A sequence feature whose identity is additionally dependent on the cellular or anatomical location of the genetic material bearing the feature. As a qualified sequence feature, the BRCA1c.5096G>A variant as materialized in a somatic breast epithelial cell could be distinguished as a separate entity from a BRCA1c.5096G>A variant in a different cell type or location (e.g. a germline BRCA1 variant in a sperm cell).
A chromosome arm that is the longer of the two arms of a given chromosome.
A deletion of a terminal portion of a chromosome resulting from an unbalanced translocation to another chromosome. This is not a deletion in the sense defined by the Sequence Ontology in that it is not the result of an ‘excision’ of nucleotides, but an unbalanced translocation event. The allelic complement that results is comprised of the terminus or junction represented by this lost chromosomal segment, and the remaining normal segment in the homologous chromosome. The lost aneusomic chromosomal segment is typically accompanied by a gained aneusomic chromosomal segment from another chromosome. Loss of translocated chromosomal parts can confer a monosomic condition to a region of the chromosome. This results in a ‘variant single locus complement’ - in virtue of an abnormal number of features at a particular locus, rather than abnormal sequence within the locus.
A ‘deletion’ resulting from the loss of a complete chromosome, typically as the result of a meiotic non-disjunction event or unbalanced translocation.
A polymorphic allele that is present at the highest frequency relative to other polymorphic variants at the same genomic location.
A genomic genotype where the genomic background specifies a male sex chromosome complement.
A material entity that represents all genetic material in a cell or virion. The material genome is typically a molecular aggregate of all the chromosomal DNA and epi-chromosomal DNA that represents all sequences that are heritable by progeny of a cell or virion. A genome is the collection of all nucleic acids in a cell or virus, representing all of an organism’s hereditary information. It is typically DNA, but many viruses have RNA genomes. The genome includes both nuclear chromosomes (i.e. nuclear and micronucleus chromosomes) and cytoplasmic chromosomes stored in various organelles (e.g. mitochondrial or chloroplast chromosomes), and can in addition contain non-chromosomal elements such as replicative viruses, plasmids, and transposable elements. Note that at present, a material genome and genetic material are necessarily part of some cell or virion. So a genomic library is not considered a material genome/genetic material - rather, we could say that this genomic library is a ‘genomic material sample’ that bears the concretization of some SO:genome.
Describes an allele that is inherited from a female parent in virtue of the allele being present in the mother’s egg.
A relation used to describe an environment contextualizing the identity of an entity.
A polymorphic allele that is not present at the highest frequency among all fixed variants at the locus (i.e. not the major polymorphic allele at a given location).
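The major and minor polymorphic allele definitions can be illustrated with a toy frequency table. The allele symbols, the frequencies, and the function name are made up for illustration. Ties are broken by taking the first allele listed, which is an arbitrary choice that the definitions do not address.

```python
def major_and_minor_alleles(frequencies):
    """Return (major allele, list of minor alleles) for one genomic location.

    frequencies: dict mapping allele symbol -> population frequency.
    """
    # The major allele is the one with the highest frequency at the location.
    major = max(frequencies, key=frequencies.get)
    # Every other polymorphic allele at the location is a minor allele.
    minor = [allele for allele in frequencies if allele != major]
    return major, minor


# Hypothetical frequencies for three alleles at one location.
print(major_and_minor_alleles({"A": 0.70, "G": 0.25, "T": 0.05}))
```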
An inheritance pattern observed for traits related to a gene encoded on the mitochondrial genome. Because the mitochondrial genome is essentially always maternally inherited, a mitochondrial condition can only be transmitted by females, although the condition can affect both sexes. The proportion of mutant mitochondria can vary (heteroplasmy).
A sequence feature whose identity is additionally dependent on a chemical modification made to the genetic material bearing the feature (e.g. binding of transcriptional regulators, or epigenetic modifications including direct DNA methylation, or modification of histones associated with a feature)
An inheritance pattern wherein the trait is determined by alleles of a single causal gene, possibly together with environmental factors.
A clonal distribution in which an allele arose during embryogenesis and is present in a subset of tissues derived from some common developmental cell or tissue type. [mosaic; clonal]
An inheritance pattern that depends on a mixture of major and minor genetic determinants (i.e. alleles of more than one contributing gene), possibly together with environmental factors. Diseases inherited in this manner are termed ‘complex diseases’.
A gene that originates from the genome of Mus musculus.
[strain or breed; mus musculus strain]
An attribute inhering in a feature bearing a sequence alteration that is present at very low levels in a given population (typically less than 1%), or that has been experimentally generated to alter the feature with respect to some reference sequence.
A sequence alteration that is a very rare allele in a population (typically <1%), or an experimentally-induced variation that derives from a wild-type feature in a given strain.
[non-heritable; heritability]
An attribute of a genomic feature that represents a feature not previously found in a given genome, e.g. an extrachromosomal replicon or aneusomic third copy of a chromosome.
An extrachromosomal replicon that is variant in a genome in virtue of its being a novel addition to the genome - i.e. it is not present in the reference for the genome in which it is found. Extrachromosomal replicons are replicated and passed on to descendants, and are thus part of the heritable genome of a cell or organism. In cases where the presence of such a replicon is exogenous or aberrant (i.e. not included in the reference for that genome), the replicon is considered a ‘sequence alteration’.
A genomic feature that represents an entirely new replicon in the genome, e.g. an extrachromosomal replicon or an extra copy of a chromosome. Novel replicons are considered as an ‘insertion’ in a genome, and as such, qualify as types of sequence_alterations and variant alleles. There is no pre-existing locus that it modifies, however, and thus it is not really an ‘allele of’ a named locus. But conceptually, we still consider these to represent genetic variants and classify them as variant alleles.
A disomic zygosity quality inhering in a ‘single locus complement’ that is comprised of two non-functional copies of a gene. Loss of function may result from the gene being entirely missing via a deletion, or mutated in a way that eliminates its function.
A quality inhering in a particular allele in virtue of its presence only in a particular type of cell in an organism (e.g. somatic vs germ cells). Cellular context of an allele is typically defined in the context of evaluating an individual organism, as alleles that are somatic in one organism can be germline in others.
[obsolete autosomal recessive inheritance]
[obsolete biological sequence collection]
[obsolete biological sequence or collection]
One of a set of sequence features or haplotypes that exist at a particular genetic locus.
A single locus complement that represents the collection of all chromosome sequences for a given chromosome in a single genome.
A sequence alteration within the coding sequence of a gene.
An informational artifact that describes a canonical allele by defining its sequence and position relative to a particular reference sequence. The notion of a ‘contextual allele’ derives from the ClinGen Allele model. Here, each genetic allele in a patient corresponds to a single ‘canonical allele’, which in turn may aggregate any number of ‘contextual allele’ representations that may be defined against different reference sequences. Accordingly, many contextual alleles can describe a single canonical allele. For example, the contextual alleles “NC_000013.11:g.32319070T>A” and “NG_012772.3:g.8591T>A” both describe the same underlying canonical allele, a single nucleotide variation, in the BRCA2 gene.
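The many-to-one relationship between contextual and canonical alleles can be sketched as a simple grouping. The two HGVS-style strings come from the example above. The canonical identifier `CA_EXAMPLE_1` is a made-up placeholder, not a real ClinGen identifier, and the mapping is hand-written rather than looked up from any registry.

```python
from collections import defaultdict

# Contextual allele -> canonical allele. The canonical ID is a hypothetical
# placeholder used only to show that both descriptions share one target.
CONTEXTUAL_TO_CANONICAL = {
    "NC_000013.11:g.32319070T>A": "CA_EXAMPLE_1",
    "NG_012772.3:g.8591T>A": "CA_EXAMPLE_1",
}


def group_by_canonical(mapping):
    """Invert the mapping so each canonical allele lists its contextual alleles."""
    groups = defaultdict(list)
    for contextual, canonical in mapping.items():
        groups[canonical].append(contextual)
    return dict(groups)


print(group_by_canonical(CONTEXTUAL_TO_CANONICAL))
```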
A set of all features in a particular genome whose sequence aligns with a particular location on a reference genome. Such features are typically on the scale of complete genes or larger. 1. Features described by ‘copy number’ are larger regions of sequence spanning one or more complete genes, or large chromosomal segments. Copies of these regions often become distributed across a genome at unknown locations. By contrast, short repeats, such as trinucleotide ‘CAG’ repeats in the Huntingtin gene, occur at defined locations (adjacent to the originating ‘CAG’ sequence), and can therefore be modeled as proper alleles. 2. A copy number complement, like any sequence feature complement, is a set of features in a particular genome that meet some criterion. The criterion in this case is that their sequence maps to that of a particular location in a reference sequence. So a copy number complement is the set of all features that share or align with a specified sequence defined on some reference. The sequence of member sequences need not exactly match that of the reference, as copies may accrue some alterations. What is important is that conceptually they represent exact or inexact copies of the reference sequence at a defining location. 3. In a ‘normal’ diploid genome, the copy number complement for any feature (on a non-Y chromosome) contains two members. A copy number variation occurs when a complement contains more or fewer than two members - as the result of deletion or duplication event(s). In GENO, a ‘copy number variation’ refers to a ‘copy number complement’ that has an abnormal number of members.
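As a rough sketch of the membership criterion and the diploid expectation described above, the snippet below collects features by the reference location their sequence maps to and compares the count with two. The feature records, the `maps_to` key, and the location label `locus_X` are hypothetical. Real data would come from alignment, which is not modeled here.

```python
def copy_number_complement(features, reference_location):
    """Collect every feature whose sequence maps to the given reference location."""
    return [f for f in features if f["maps_to"] == reference_location]


def classify_copy_number(complement, expected=2):
    """Compare the complement size with the expected diploid count of two."""
    count = len(complement)
    if count > expected:
        return "gain"
    if count < expected:
        return "loss"
    return "normal"


# Toy genome: three features map to locus_X (e.g. after a duplication),
# regardless of where each copy actually resides in the genome.
genome_features = [
    {"id": "copy_1", "maps_to": "locus_X"},
    {"id": "copy_2", "maps_to": "locus_X"},
    {"id": "copy_3", "maps_to": "locus_X"},
    {"id": "other_1", "maps_to": "locus_Y"},
]
complement = copy_number_complement(genome_features, "locus_X")
print(len(complement), classify_copy_number(complement))
```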
A disposition inhering in a genetic locus variant that is realized in its inheritance by some offspring such that at least a partial variant-associated phenotype is apparent in heterozygotes.
[obsolete enhancer trapping technique]
[obsolete experimental insertion]
A sequence feature attribute that reflects characteristics of the physical molecule in which the feature is concretized (e.g. its cellular context, source of origin, etc.)
A set of all features representing functional versions of a specified sequence (typically that of a gene) in a particular genome. The notion of ‘functional copy number’ (aka ‘genetic dosage’) describes how many ‘functional’ copies of a sequence are present in a genome - i.e. sequences that retain their normal activity and/or produce gene products that retain their normal activity. In diploid organisms, the normal dosage is 2 for autosomal genes/regions. Dosage increases if there is a duplication of the gene/region. Dosage decreases if there is either a deletion of a gene/region, or an inactivating mutation that eliminates gene function. This latter condition sets it apart from the notion of a ‘copy number complement’, which reflects how many actual copies of a sequence exist in a genome. Addition of a non-functional allele of a gene will increase its genomic sequence complement count (i.e. its copy number), but not increase its dosage. As for copy number complements, the defining ‘sequence’ here is specified in terms of a location on a reference sequence - typically the location where a gene or set of genes resides. But the criteria for membership in a functional copy number complement require only that the feature can perform the functions associated with the gene or genes at the defining location. A gene allele that varies by only one nucleotide from the wild-type gene may not qualify if that alteration eliminates the function of the allele. This represents an important distinction between ‘copy number’ and ‘functional copy number’. The former is not concerned with the functionality of sequence copies - only that there is a duplication of sequence in the genome. Thus, the addition of a non-functional allele of a gene will increase its copy number, but not increase its ‘functional copy number’ (aka its dosage).
A quality inhering in a feature in virtue of its presence only in the genome of gametes (germ cells).
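The distinction between copy number and functional copy number can be made concrete with a toy example. The `Copy` class, its `functional` flag, and the copy names are illustrative assumptions. In practice, deciding whether a copy is functional is the hard part and is simply asserted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    functional: bool  # assumed to be known in advance for this sketch


def copy_number(copies):
    """Count every copy of the sequence, functional or not."""
    return len(copies)


def functional_copy_number(copies):
    """Count only copies that retain normal function (the 'dosage')."""
    return sum(1 for c in copies if c.functional)


# Two wild-type copies plus one added non-functional allele of the same gene.
copies = [
    Copy("wild_type_maternal", True),
    Copy("wild_type_paternal", True),
    Copy("added_nonfunctional", False),
]
print(copy_number(copies), functional_copy_number(copies))
```

Adding the non-functional copy raises the copy number to three while the dosage stays at two, as the definition describes.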
[obsolete gene trapping technique]
Genetic dosage reflects how many ‘functional’ copies of a sequence are present in a genome. In diploid organisms, the normal dosage is 2 for autosomal genes/regions. Dosage increases if there is a duplication of the gene/region. Dosage decreases if there is either a deletion of a gene/region, or an inactivating mutation that eliminates gene function. This sets it apart from the notion of ‘copy number’, which reflects how many actual copies of a sequence exist in a genome. Addition of a non-functional allele of a gene will increase its copy number, but not increase its dosage. Duplications of a sequence can occur at new locations in the genome, such that the resulting sequence represents a distinct sequence feature from the copy at its native locus. For example, duplication of a region containing the human APOE gene on a different chromosome creates a sequence feature that shares sequence from the original gene, but not location, and therefore represents a different sequence feature. The notions of dosage and copy number are therefore concerned with sequence-level entities (how many copies of a ‘sequence’ exist), as opposed to sequence feature-level entities. The notion of a single-locus complement would be used to describe how many of a particular feature are present in a genome - and describe which alleles of this feature are found. [obsolete genetic dosage]
[obsolete genetic insertion technique]
A sequence feature collection comprised of discontiguous sequences from a single genome. Conceptually, members of this collection are meant to be about the sum total genetic material in a single cell or organism. But these members need not be associated with an actual material in a real cell or organism individual. For example, things like a ‘reference genome’ may not actually represent the material genome of any individual cell or organism in reality. Here, there may be no genomic material referents of the sequences in such a collection because the genome is tied to an idealized, hypothetical cell or organism instance. The key is that conceptually, they are still tied to the idea of being contained in a single genome. In the case of a genotype, the individual sequence members are not all about the genetic material of a single cell or organism. Rather, it is the resolved sequence contained in the genotype that is meant to be about the total genomic sequence content of a genome - which we deem acceptable for classifying as a genetic locus collection.
A sequence feature position based on a genomic coordinate system, where the position specifies start and end coordinates based on its alignment with some reference genomic sequence.
A haplotype is an allele that represents one of many possible versions of a ‘haplotype block’, which defines a region of genomic sequence that is typically ‘co-inherited’ across generations due to a lack of historically observed recombination within it. 1. The relationship between ‘haplotype’ and ‘haplotype block’ is analogous to the relationship between ‘gene allele’ and ‘gene’: a ‘gene allele’ is one of many possible instances of a ‘gene’, while a ‘haplotype’ is one of many possible instances of a ‘haplotype block’. In this sense, a gene allele can be considered to be a haplotype whose extent is that of a gene (as it is generally true that there is a low probability of recombination within any given gene). 2. Haplotypes typically contain more than one ‘genetically-linked’ locus where sequence alterations are known to exist, such that a set of alterations will be co-inherited across many generations of reproduction. A common use of ‘haplotype’ is in phasing of patient WGS or WES data, where this term refers to sequence containing two or more alterations that are believed to occur ‘in cis’ on the same chromosomal strand. GENO’s definition is consistent with but more inclusive than this view, allowing for haplotypes with one or zero established alterations as long as there is a low probability of recombination within the region it spans (such that alterations found in cis are likely to remain in cis across successive generations). As a result, GENO considers any allele that spans an extent greater than that of a single sequence alteration to be a haplotype - as long as there is an expectation of low recombination frequency within the haplotype block occupied by the allele. For example, a ‘gene allele’ is a haplotype representing a particular version of a gene that contains one or more sequence alterations - as a ‘gene’ is a region of sequence with a low probability of recombination that is generally expected to be inherited as a unit. 3.
As highlighted in https://en.wikipedia.org/wiki/Haplotype, the term ‘haplotype’ is most commonly used to describe the following scenarios of genetic linkage between ‘alleles’: a. The first is regions containing multiple linked ‘gene alleles’ - i.e. specific versions of entire genes that are co-inherited because they reside in tightly linked clusters on a single chromosome. b. The second is a region containing multiple linked single nucleotide polymorphisms (SNPs) that tend to occur together on a chromosomal strand (i.e. be statistically associated). This use of ‘haplotype’ is commonly seen in phasing of patient WGS or WES data, to describe a state where two or more alterations are believed to occur ‘in cis’ on the same chromosomal strand. c. A third, which is related to the previous case, occurs when the extent of the region containing linked SNPs is that of a single gene. In this case, the haplotype represents a ‘gene allele’ - a version of an entire gene defined by the set of sequence alterations it contains. We may consider this a haplotype as most genes are small enough that there is little chance of recombination events moving cis alterations onto separate chromosomes. The GENO definition of ‘haplotype’ is broadly inclusive of these and any other scenarios where distinct ‘alleles’ of any kind on the same chromosomal strand are genetically linked, and thus tend to be co-inherited across successive generations.
A sequence feature representing a region of the genome over which there is little evidence for historical recombination, such that sequences it contains are typically co-inherited/transmitted across generations. A haplotype block is a class of genomic sequence defined by a lack of evidence for historical recombination, such that sequence alterations within it tend to be co-inherited across successive generations. A haplotype is considered to be one of many possible versions of a ‘haplotype block’ - defined by the set of co-inherited alterations it contains. In this sense, the relationship between ‘haplotype’ and ‘haplotype block’ is analogous to the relationship between ‘gene allele’ and ‘gene’* - a ‘gene allele’ is one of many possible instances of a ‘gene’, while a ‘haplotype’ is one of many possible instances of a ‘haplotype block’. The boundaries of haplotype blocks are defined in efforts to identify haplotypes that exist in organisms or populations. A haplotype block may span any number of sequence alterations, and may cover small or large chromosomal regions - depending on the number of recombination events that have occurred between the alterations defining the haplotype. ----- * One difference, however, is that gene instances are necessarily ‘functional’ - so non-functional alleles of a gene locus won’t qualify as gene instances. No such requirement exists for haplotype block instances.
A sequence feature attribute that reflects feature-level characteristics that depend only on the sequence, location, or genomic context of a feature or collection, but are independent of how it may be concretized in physical form.
[obsolete mutagen treatment technique]
An allele that is variant with respect to some wild-type allele, in virtue of its being very rare in a population (typically <1%), or being an experimentally-induced alteration that derives from a wild-type feature in a given strain. ‘Mutant’ is typically contrasted with ‘wild-type’, where ‘mutant’ indicates a natural but very rare allele in a population (typically <1%), or an experimentally-induced variation that derives from a wild-type background locus for a given strain, which can be selected for in establishing a mutant line.
A genomic feature that has an extent of zero.
[obsolete promoter trapping technique]
[obsolete random genetic insertion technique]
[obsolete random transgene insertion technique]
A sequence feature that references some biological macromolecule applied as a reagent in an experiment or technique (e.g. a morpholino expression plasmid, or oligonucleotide probe)
A version/allele of a gene that serves as a standard against which variant genes are compared. Being a ‘reference gene’ is a role or status assigned in the context of a specific dataset or analysis. In human variation datasets, ‘reference’ status is typically assigned based on factors such as being the most common version/allele in a population, being an ancestral allele, or being identified first as a prototypical example of a gene. In model organism datasets, ‘reference’ genes are typically the ‘wild-type’ allele for a given gene, representing a functional and unaltered version of the gene that is part of a defined genomic background, and against which natural or experimentally-induced versions are compared.
A junction found at a chromosomal position where an insertion has occurred on the homologous chromosome, such that the junction represents the reference feature paired with the hemizygously inserted feature. In the case of a transgenic insertion that creates a hemizygous locus, the reference locus that this insertion is variant_with is the junction on the homologous chromosome at the same position where the insertion occurred. This is the ‘hemizygous reference’ junction. The junction-insertion pair represents the allelic complement at that locus, which is considered to be hemizygous. Most genotype syntaxes represent this hemizygous state with a ’ /0’ notation.
A single locus complement that serves as a standard against which ‘variant’ sequences are compared
[obsolete reporter role; sequence feature attribute]
A collection of more than one sequence feature (i.e. a collection of discontinuous sequence features). 1. Note that members of this class can be features with extents of zero (e.g. junctions). This is likely different from the SO:sequence feature class, which has members that are regions.
A collection of more than one sequence feature.
A sequence attribute that can inhere only in a collection of more than one sequence feature.
An information entity that is intended to represent some biological sequence, sequence feature, qualified sequence feature, or a collection of one or more of these entities.
[obsolete targeted gene mutation technique]
[obsolete targeted genetic insertion technique]
[obsolete targeted knock-in technique]
A sequence attribute inhering in a feature whose identity is not specified.
A genomic feature known to exist, but remaining uncharacterized with respect to its identity (e.g. which allele exists at a given gene locus). An unspecified feature is known to exist as the partner of a characterized allele when the zygosity at that locus is not known. Its specific sequence/identity, however, is unknown (i.e. whether it is a reference or variant allele).
A ‘copy number complement’ that has an abnormal number of members (e.g. more or less than two for an autosomal sequence in a diploid genome), as a result of deletion or duplication event(s). In a ‘normal’ diploid genome, the copy number complement for any feature (on a non-Y chromosome) contains two members. A copy number variation occurs when a complement contains more or less than two members - as the result of deletion or duplication event(s). Note that the ‘copy number variation’ class in GENO is related to but ontologically distinct from the SO ‘copy_number_variation’ class. The GENO class refers to a set of all copies of a sequence in a genome, where the number of members in the set is in conflict with the genome’s normal ploidy (e.g. not two for a diploid genome). The SO class, which is defined as a sequence feature level concept and therefore represents a single continuous extent of sequence, refers to a single copy of duplicated (or deleted) sequence that comprises the set defined by the GENO CNV class.
A multifactorial inheritance pattern that is determined by the simultaneous action of alleles in a few genes. It is recommended this term be used for traits governed by three gene loci, although it is noted that usage of this term in the literature is not uniform.
An allelic state that describes the number of different alleles of a gene from an organellar genome (i.e. mitochondrial, plastid) that may exist in a cell. Cells with a population of organelles from a single origin that all share the same organellar genome will contain only one allele of each organellar gene, while cells with populations of organelles of different origins may contain more than one allele of a given organellar gene.
A material entity that is an organism, derived from an organism, or composed of organisms (e.g. a cell line, biosample, tissue culture, population, etc).
[strain or breed; oryzias latipes strain; has_member; Oryzias latipes]
A construct that contains a mobile P-element, holding sequences to be delivered to a target cell or genome.
Describes an allele that is inherited from a male parent in virtue of the allele being present in the father’s sperm.
[phenotypic inheritance process; biological process]
A multifactorial inheritance pattern that is determined by the simultaneous action of alleles of a large number of genes. Typically used for traits/conditions governed by more than three gene loci.
An attribute inhering in a sequence feature for which there is more than one version fixed in a population at some significant percentage (typically 1% or greater), where the locus is not considered to be either reference or a variant.
An allele that is fixed in a population at some stable level, typically > 1%. Polymorphic alleles reside at loci where more than one version exists at some significant frequency in a population. Polymorphic alleles are contrasted with mutant alleles (extremely rare variants that exist in <1% of a population), and ‘wild-type alleles’ (extremely common variants present in >99% of a population). Polymorphic alleles exist in equilibrium in a given population somewhere between these two extremes (i.e. >1% and <99%).
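The frequency thresholds above can be sketched as a small Python function. This is only an illustration: the function name classify_by_frequency is hypothetical, and assigning the exact 1% and 99% boundaries to ‘polymorphic’ is an assumption, since the definitions give these cutoffs only as typical values.

```python
def classify_by_frequency(freq: float) -> str:
    """Bucket an allele by its population frequency (0.0 to 1.0).

    Cutoffs follow the 'typical' values in the definitions:
    mutant < 1%, wild-type > 99%, polymorphic in between.
    Boundary handling (exactly 1% or 99%) is an assumption.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    if freq < 0.01:
        return "mutant"
    if freq > 0.99:
        return "wild-type"
    return "polymorphic"


print(classify_by_frequency(0.30))
```

Note that frequency is only one of the criteria named in these definitions; an experimentally-induced alteration is also ‘mutant’ regardless of frequency, which this sketch does not model.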
A qualified sequence feature that carries sequence derived from the genome of a cell or organism.
A set of qualified sequence features that carry genomic sequence. A ‘complement’ refers to an exhaustive collection of all objects that make up some well-defined set. This notion is useful for defining biologically-relevant sets of sequence features. For example, a haplotype is defined as the set of all genetically-linked alleles on a single chromosomal strand at a defined location - e.g. the SNP alleles {rs7412-C, rs429358-C} comprise the haplotype defining the APOEɛ4 gene allele. A complement may contain 0, 1, or more than one member. For example, the complement of alleles at a defined locus across homologous chromosomes in an individual’s genome will consist of two members for autosomal locations, and one member for non-homologous locations on the X and Y chromosome.
A sequence feature whose identity is additionally dependent on the context or state of the material sequence molecule in which the feature is concretized. This context/state describes factors external to the feature’s intrinsic sequence and position that can influence its expression, such as being targeted by gene-knockdown reagents, or an epigenetic modification. Modeling sequence entities at this ‘qualified’ level is useful for distinguishing features with identical sequence and position as separate instances - based on their material bearers being found in different contexts. For example, consider a situation where the zebrafish shha gene (a sequence feature) is targeted in two experimental groups of fish by two different morpholinos, and phenotypes are assessed for each. We want to be able to represent two ‘variants’ of the shha gene in this scenario as separate ‘qualified sequence feature’ instances so we can capture data about the phenotypes resulting from each - just as we would separately represent two different sequence variants (alleles) of the shha gene at the sequence feature level so that we can track their associated phenotypes. GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria. 1. ‘Biological sequence’ identity is dependent only on the ordering of units that comprise the sequence. 2. ‘Sequence feature’ identity is dependent on its sequence and the genomic location of the sequence (this is consistent with the definition of ‘sequence feature’ in the Sequence Ontology). 3. ‘Qualified sequence feature’ identity is additionally dependent on some aspect of the physical state or context of the genetic material in which the feature is concretized. This third criterion is extrinsic to its sequence and its genomic location. For example, the feature’s physical concretization being targeted by a gene knockdown reagent in a cell (e.g.
the zebrafish Shha gene as targeted by the morpholino ‘Shha-MO1’), or its being transiently expressed from a recombinant expression construct (e.g. the human SHH gene as expressed in a mouse Shh knock-out cell line), or its having been epigenetically modified in a way that alters its expression level or pattern (e.g. the human SHH gene with a specific methylation pattern).
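The three identity levels described above can be sketched with frozen Python dataclasses, where equality stands in for ‘identity criteria’. This is a rough illustration rather than part of GENO: the class and field names, the placeholder residues and coordinates, and the second morpholino name ‘Shha-MO2’ are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiologicalSequence:
    residues: str  # identity depends only on the ordering of units


@dataclass(frozen=True)
class SequenceFeature:
    sequence: BiologicalSequence
    reference: str  # hypothetical name of the reference sequence
    start: int      # placeholder coordinates
    end: int


@dataclass(frozen=True)
class QualifiedSequenceFeature:
    feature: SequenceFeature
    context: str  # extrinsic state, e.g. the reagent targeting the feature


seq = BiologicalSequence("ATGC")                 # placeholder residues
shha = SequenceFeature(seq, "ref1", 100, 104)    # placeholder location
elsewhere = SequenceFeature(seq, "ref1", 200, 204)

mo1 = QualifiedSequenceFeature(shha, "targeted by Shha-MO1")
mo2 = QualifiedSequenceFeature(shha, "targeted by Shha-MO2")  # hypothetical

print(shha.sequence == elsewhere.sequence)  # same biological sequence
print(shha == elsewhere)                    # different sequence features
print(mo1.feature == mo2.feature, mo1 == mo2)
```

The two morpholino-targeted instances share one underlying sequence feature but compare as distinct at the qualified level, which is the distinction the shha example is meant to capture.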
A sequence feature (or collection of features) whose identity is dependent on the context or state of its material bearer (in addition to its sequence and position). This context/state describes factors external to its inherent sequence and position that can influence its expression, such as being targeted by gene-knockdown reagents, or an epigenetic modification.
A set of qualified sequence features. ‘Sets’ are used to represent entities that are typically collections of more than one member. But we allow for sets that contain 0 members (an ‘empty’ set) or 1 member (a ‘singleton’ or ‘unit’ set), consistent with the concept of ‘mathematical sets’.
A gene altered in its expression level in the context of some experiment as a result of being targeted by gene-knockdown reagent(s) such as a morpholino or RNAi. The identity of a given instance of a reagent-targeted gene is dependent on the experimental context of its knock-down - specifically what reagent was used and at what level. For example, the wild-type shha zebrafish gene targeted in experiment 1 by morpholino 1 and in experiment 2 by morpholino 2 represent two distinct instances of a ‘reagent-targeted gene’, despite sharing the same sequence and position.
A set comprised of all reagent-targeted genes in a single genome in the context of a given experiment (e.g. the zebrafish shha and shhb genes in a zebrafish exposed to morpholinos targeting both of these genes). A ‘complement’ refers to an exhaustive collection of all objects that make up some well-defined set. Such a complement may contain 0, 1, or more than one member. The notion of a complement is useful for defining many biologically-relevant sets of sequence features. For example, a ‘reagent-targeted gene complement’ is the set of all genes in a particular genome that are targeted by reagents in the context of a particular experiment.
A region within a gene that is specifically targeted by a gene knockdown reagent, typically in virtue of bearing sequence complementary to the reagent.
An attribute inhering in a feature that is designated to serve as a standard against which ‘variant’ versions of the same location are compared. Being ‘reference’ is a role or status assigned in the context of a data set or analysis framework. A given allele can be reference in one context and variant in another.
An allele whose sequence matches what is considered to be the reference sequence at that location in the genome. Being a ‘reference allele’ is a role or status assigned in the context of a specific dataset or analysis. In human variation datasets, ‘reference’ status is typically assigned based on factors such as being the most common in a population, being an ancestral allele, or being identified first as a prototypical example of some feature or gene. For example, ‘reference alleles’ in characterizing SNPs often represent the allele first characterized in a reference genome, or the most common allele in a population. In model organism datasets, ‘reference’ alleles are typically (but not always) the ‘wild-type’ variant at a given locus, representing a functional and unaltered version of the feature that is part of a defined genomic background, and against which natural or experimentally-induced alterations are compared.
A genome whose sequence is identical to that of a genome sequence considered to be the reference.
A sequence that serves as a standard against which other sequences at the same location are compared. A reference sequence is one that serves as a standard against which ‘variant’ versions of the feature are compared, or against which located sequence features within the reference region are aligned in order to assign position information. Being ‘reference’ does not imply anything about the frequency or function of features bearing the sequence. It implies only that some agent has used it to serve a reference role in defining a variant or locating a sequence.
A transgene part whose sequence regulates the synthesis of a functional product, but which is not itself transcribed.
A relation used to describe a process contextualizing the identity of an entity.
[reporter region; expressed transgene region]
A transgene that codes for a product used as a reporter of gene expression or activity.
[biological sequence unit; RNA residue]
[has_sequence_unit; biological sequence; RNA residue; RNA sequence]
[selectable marker; sequence feature attribute]
[selectable marker region; expressed transgene region]
A transgene whose product is used as a selectable marker.
An attribute, quality, or state of a sequence feature or collection. Sequence feature attributes can be ‘intrinsic’ - reflecting feature-level characteristics that depend only on the sequence, location, or genomic context of a feature or collection, or ‘extrinsic’ - reflecting characteristics of the physical molecule in which the feature is concretized (e.g. its cellular context, source of origin, physical appearance, etc.). Intrinsic attributes include things like allelic state and allelic phase. Extrinsic attributes include things like cellular distribution and chromosomal band intensity.
The location of a sequence feature as defined by its start and end position on some reference coordinate system. 1. A sequence feature location is defined by its begin and end coordinates on a reference sequence, but it is not identified by a particular sequence that may reside there. The same location, as defined on a particular reference, may be occupied by different sequences in the genome of organism 1 vs that of organism 2 (e.g. if a SNV exists within this location in only one of the organisms). 2. The notion of a sequence feature location in the realm of biological sequences is analogous to a BFO:spatiotemporal region in the realm of physical entities. A spatiotemporal region can be ‘occupied by’ physical objects, while a genomic location is ‘occupied by’ sequence features. Just as a spatiotemporal region is distinct from an object that occupies it, so too a genomic location is distinct from a sequence feature that occupies it. As a more concrete example, consider the distinction between a street address and the building that occupies it as analogous to the relationship between a genomic locus and the sequence feature that resides there.
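The distinction between a location and the feature occupying it can be illustrated with a minimal Python sketch. The class name SequenceFeatureLocation, the reference name ‘ref1’, the coordinates, and the residues are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceFeatureLocation:
    reference: str  # hypothetical reference sequence name
    start: int
    end: int


# One location, defined only by its coordinates on the reference.
location = SequenceFeatureLocation("ref1", 1000, 1001)

# The same location is occupied by different sequences in two genomes,
# e.g. when a SNV exists at this position in only one organism.
occupant_in_organism_1 = {location: "A"}
occupant_in_organism_2 = {location: "G"}

same_location = SequenceFeatureLocation("ref1", 1000, 1001) == location
same_occupant = occupant_in_organism_1[location] == occupant_in_organism_2[location]
print(same_location, same_occupant)
```

As with the street address analogy, the location stays the same while its occupant differs between the two genomes.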
A sequence feature or a set of such features. GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria. 1. ‘Biological sequence’ identity is dependent only on the ordering of units that comprise the sequence. 2. ‘Sequence feature’ identity is dependent on its sequence and the genomic position of the sequence (aligns with definition of ‘sequence feature’ in the Sequence Ontology). 3. ‘Qualified sequence feature’ identity is additionally dependent on some aspect of the physical context of the genetic material bearing the feature, extrinsic to its sequence and its genomic position. For example, its being targeted by gene knockdown reagents, its being transgenically expressed in a foreign cell from a recombinant expression construct, its having been epigenetically modified in a way that alters its expression level or pattern, or its being located in a specific cellular or anatomical location.
A set of sequence features. ‘Sets’ are used to represent entities that are typically collections of more than one member. But we allow for sets that contain 0 members (an ‘empty’ set) or 1 member (a ‘singleton’ or ‘unit’ set), consistent with the concept of ‘mathematical sets’. Sets may also include duplicates (i.e. contain more than one member representing the same feature). The notion of a ‘complement’ is a special case of a set, where the members necessarily comprise an exhaustive collection of all objects that make up some well-defined set. It is useful for defining many biologically-relevant sets of sequence features. For example, a ‘haplotype’ is the set of all genetically-linked alleles on a single chromosomal strand at a defined location - e.g. the SNP alleles {rs7412-C, rs429358-C} comprise the haplotype defining the APOEɛ4 gene allele [1]. And a ‘single locus complement’ is the set of all alleles at a specified location in a particular genome - e.g. the APOEɛ4 and APOEɛ2 gene alleles ([1], [2]) that make up the ‘Gs270’ APOE genotype [3]. [1] https://www.snpedia.com/index.php/APOE-%CE%B54 [2] https://www.snpedia.com/index.php/APOE-%CE%B52 [3] https://www.snpedia.com/index.php/Gs270
A pair of integers representing start and end position of a location on a sequence coordinate system.
An autosomal dominant inheritance pattern wherein the trait manifests in heterozygotes in a sex-specific manner (i.e. only in males or only in females).
An autosomal recessive inheritance pattern wherein the trait manifests only in homozygotes, and in a sex-specific manner (i.e. only in males or only in females).
A chromosome arm that is the shorter of the two arms of a given chromosome.
A heterozygous quality inhering in a single locus complement comprised of one variant allele and one wild-type/reference allele (e.g. fgf8a<ti282a/+>) [simple heterozygous]
A set representing the complement of all sequence features occupying a particular genomic location across all homologous chromosomes in the genome of a single organism. A ‘complement’ refers to an exhaustive collection of all objects that make up some well-defined set. Such a complement may contain 0, 1, or more than one member. The notion of a complement is useful for defining many biologically-relevant sets of sequence features. Here, a ‘single locus complement’ is the set of all alleles at a specified location in a particular genome. This complement is typically a pair of two features in a diploid genome (with two copies of each chromosome). E.g. a gene pair, a QTL pair, a nucleotide pair for a SNP, or a pair of entire chromosomes. The fact that we are counting how many copies of the same sequence exist in a genome, as opposed to how many of the same feature, is what sets feature-level concepts like ‘single locus complement’ apart from sequence-level concepts like ‘copy number complement’. To illustrate the difference, consider a duplication event that creates a new copy of the human APOE gene on a different chromosome. This creates an entirely new sequence feature at a distinct locus from that of the original APOE gene. The ‘copy number complement’ for sequence defined by the APOE gene locus would have a count of three, as this sequence is present three times in the genome. But the ‘single locus complement’ at the APOE gene locus would still have a count of two - because the duplicated copy is at a different location in the genome, and therefore does not represent a copy of the APOE locus.
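The APOE duplication scenario can be worked through in a short Python sketch that counts members of each kind of complement. The class name FeatureCopy, the two helper functions, and the locus labels (including ‘dup_locus’ for the duplicated copy) are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FeatureCopy:
    sequence_name: str  # which sequence this copy carries
    locus: str          # hypothetical label for where the copy resides
    homolog: str        # hypothetical label for the chromosome copy


# A diploid genome after a duplication places a third APOE copy elsewhere.
genome = [
    FeatureCopy("APOE", "APOE_locus", "maternal"),
    FeatureCopy("APOE", "APOE_locus", "paternal"),
    FeatureCopy("APOE", "dup_locus", "maternal"),
]


def copy_number_complement(genome: List[FeatureCopy], name: str) -> List[FeatureCopy]:
    # Sequence-level: every copy of the sequence, wherever it is located.
    return [f for f in genome if f.sequence_name == name]


def single_locus_complement(genome: List[FeatureCopy], locus: str) -> List[FeatureCopy]:
    # Feature-level: only the features occupying one genomic location.
    return [f for f in genome if f.locus == locus]


print(len(copy_number_complement(genome, "APOE")))         # 3
print(len(single_locus_complement(genome, "APOE_locus")))  # 2
```

Filtering by sequence gives three members while filtering by location gives two, matching the counts stated in the definition.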
Describes an allele that results from some spontaneous mutation event in a somatic cell after fertilization, and thus is not present in every cell in the body. We distinguish germline, somatic, and de novo allele origin based on a combination of two key criteria - whether the allele is inherited from a parent, and whether it is ‘heritable’ by offspring. Somatic variants are neither inherited nor heritable - having originated via a spontaneous mutation in a non-germ cell. By contrast, germline variants are both inherited (passed down from a parent) and heritable (passable down to offspring). De novo mutations are not inherited but are typically heritable, as they originated through a spontaneous mutation that made them present in germ cells. These acquired mutations are called ‘somatic’ because they typically affect somatic (non-germ) cells. But when spontaneous mutations do occur in the germ cells of an organism, these can be passed on to offspring in whom they will be considered de novo mutations.
A maximal collection of organisms of a single species that have been bred or experimentally manipulated with the goal of being genetically identical. Two mouse colonies with the same genotype information, but maintained in different labs, are different strains (many examples of this in MGI/IMSR).
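The two criteria used to distinguish allele origins can be written as a small decision function. This is a sketch under stated assumptions: the function name classify_allele_origin is hypothetical, and the ‘unclassified’ result for an allele that is inherited but not heritable is an assumption, since the definition does not cover that combination.

```python
def classify_allele_origin(inherited: bool, heritable: bool) -> str:
    """Map the two criteria (inherited, heritable) to an allele origin."""
    if inherited and heritable:
        return "germline"   # passed down from a parent, passable to offspring
    if not inherited and heritable:
        return "de novo"    # spontaneous mutation present in germ cells
    if not inherited and not heritable:
        return "somatic"    # spontaneous mutation in a non-germ cell
    # Inherited but not heritable is not described in the definition (assumption).
    return "unclassified"


print(classify_allele_origin(inherited=False, heritable=False))
```

Because de novo mutations are described as only ‘typically’ heritable, a real classifier would need more evidence than these two flags.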
[collection of organisms; taxonomic group]
A sequence feature representing the end of a sequence that is bounded only on one side (e.g. at the end of a chromosome or oligonucleotide).
A structurally or functionally defined component of a transgene (e.g. a promoter, a region coding for a fluorescent protein tag, etc)
A transgene that is delivered as part of a DNA expression construct into a cell or organism in order to transiently express a specified product (i.e. it has not integrated into the host genome).
The set of all transgenes transiently expressed in a biological system in the context of a given experiment.
[aneusomic zygosity; trisomic heterozygous]
[trisomic homozygous; aneusomic zygosity]
An inheritance pattern that is not determined or not known.
Describes an allele that is part of an allelic complement where both alleles are inherited from the same parent. From Wikipedia: Uniparental inheritance is a non-mendelian form of inheritance that consists of the transmission of genotypes from one parental type to all progeny. That is, all the genes in offspring will originate from only the mother or only the father. This phenomenon is most commonly observed in eukaryotic organelles such as mitochondria and chloroplasts. https://en.wikipedia.org/wiki/Uniparental_inheritance
Describes an allele whose origin is not known.
A background genotype whose sequence or identity is not known or specified.
[unspecified life cycle stage]
[unspecified zygosity]
An attribute inhering in a sequence feature that varies from some designated reference in virtue of alterations in its sequence or expression level
An allele that varies in its sequence from what is considered the reference or canonical sequence at that location. Note that what is considered the ‘reference’ vs. ‘variant’ sequence at a given locus may be context-dependent - so being ‘variant’ is more a role played in a particular situation. A ‘variant allele’ contains a ‘sequence alteration’, or is itself a ‘sequence alteration’, that makes it vary_with some other allele to which it is being compared. But in any comparison of alternative sequences at a particular genomic location, the choice of a ‘reference’ vs the ‘variant’ is context-dependent - as comparisons in other contexts might consider a different feature to be the reference. So being ‘variant’ is more a role played in a particular situation - as an allele that is variant in one context/analysis may be considered reference in another. A variant allele can be variant along its entire extent, in which case it is considered a ‘sequence alteration’, or it can span a broader extent of sequence that contains sequence alteration(s) as part. An example of the former is a SNP, and an example of the latter is a variant gene allele that contains one or more point mutations in its sequence.
A ‘copy number complement’ that has an abnormal number of members, as the result of deletion or duplication event(s). ‘Abnormal’ is typically more or less than two members for an autosomal sequence in a diploid genome, and more or less than one member for a sequence in a non-homologous region of a sex-chromosome.
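The notion of an ‘abnormal’ member count can be sketched as a comparison against an expected count for the region. The function names and the region labels ‘autosomal’ and ‘sex_chromosome_non_homologous’ are hypothetical, and the sketch assumes a diploid genome as in the definition.

```python
# Expected member counts in a diploid genome, per the definition above.
# The region labels are hypothetical names chosen for this sketch.
EXPECTED_COPIES = {
    "autosomal": 2,
    "sex_chromosome_non_homologous": 1,
}


def is_copy_number_variation(observed_copies: int, region: str) -> bool:
    """Return True when the complement size deviates from the expected count."""
    if observed_copies < 0:
        raise ValueError("copy count cannot be negative")
    return observed_copies != EXPECTED_COPIES[region]


print(is_copy_number_variation(3, "autosomal"))
```

A count of zero is treated as a deviation here, which fits the earlier statement that complements may contain 0 members.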
An allele of a gene that contains some sequence alteration. A gene allele is ‘variant’ in virtue of its containing a sequence alteration that varies from some reference gene standard. But note that a gene allele that is variant in one context/dataset can be considered a reference in another context/dataset.
A genome that varies at one or more loci from the sequence of some reference genome.
An intrinsic genotype that specifies variation from a defined reference genome.
A single locus complement in which at least one member allele is considered variant, and/or the total number of features in the complement deviates from the normal ploidy of the reference genome (e.g. trisomy 13). Instances of this class are sets comprised of all alleles at a specified genomic location where at least one allele is variant (non-reference). In diploid genomes this complement typically has two members. Note that this class also covers cases where deviant numbers of genes or chromosomes are present in a genome (e.g. trisomy of chromosome 21), even if their sequence is not variant.
An attribute describing a type of variation inhering in a sequence feature or collection.
An inheritance pattern wherein the trait is determined by alleles of a single causal gene on a W-chromosome.
An allele attribute describing a highly common variant (typically >99% in a population), that typically exhibits canonical function, and against which rare and/or non-functional mutant alleles are compared.
An allele representing a highly common variant (typically >99% in a population), that typically exhibits canonical function, and against which rare and/or non-functional mutant alleles are often compared. ‘Wild-type’ is typically contrasted with ‘mutant’, where ‘wild-type’ indicates a highly prevalent allele in a population (typically >99%), and/or some prototypical allele in a background genome that serves as a basis for some experimental alteration to generate a mutant allele, which can be selected for in establishing a mutant strain. The notion of wild-type alleles is more common in model organism databases, where specific mutations are generated against a wild-type reference feature. Wild-type alleles are typically but not always used as reference alleles in sequence comparison/analysis applications. More than one wild-type sequence can exist for a given feature, but typically only one allele is deemed wild-type in the context of a single dataset or analysis.
A gene allele representing the most common variant in a population (typically >99% frequency), that exhibits canonical function, and against which rare and/or non-functional mutant gene alleles are compared in characterizing the phenotypic consequences of genetic variation. [wild-type allele; wild-type gene]
An X-linked inheritance pattern wherein the trait manifests in heterozygotes.
An inheritance pattern wherein the trait is determined by alleles of a single causal gene on an X-chromosome.
An X-linked inheritance pattern wherein a trait caused by alleles of a gene on the X-chromosome manifests in homozygous but not heterozygous individuals.
An inheritance pattern wherein the trait is determined by alleles of a single causal gene on a Y-chromosome.
A Z-linked inheritance pattern wherein the trait manifests in heterozygotes.
An inheritance pattern wherein the trait is determined by alleles of a single causal gene on a Z-chromosome.
A Z-linked inheritance pattern wherein a trait caused by alleles of a gene on the Z-chromosome manifests in homozygous but not heterozygous individuals.
ZFIN does not annotate with a pre-composed phenotype ontology - all annotations compose phenotypes on-the-fly using a combination of PATO, ZFA, GO and other ontologies. So while there is no manually curated zebrafish phenotype ontology, the Upheno pipeline generates one automatically here: http://purl.obolibrary.org/obo/upheno/zp.owl This ontology does not have a root ‘phenotype’ class, however, and so we generate our own in GENO as a stub placeholder for import of needed zebrafish phenotype classes. [zebrafish phenotype]
An allelic state that describes the degree of similarity of features at a particular location in the genome (i.e. whether the alleles or haplotypes are the same or different).
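As a minimal sketch of this ‘degree of similarity’ idea, the function below compares the two members of a diploid single locus complement. The function name zygosity is hypothetical, and restricting the sketch to two-member complements is an assumption; hemizygous, aneusomic, and unspecified states are deliberately left out.

```python
from typing import Sequence


def zygosity(complement: Sequence[str]) -> str:
    """Compare the two alleles of a diploid single locus complement."""
    if len(complement) != 2:
        # Other complement sizes are out of scope for this sketch.
        raise ValueError("this sketch only handles two-member complements")
    first, second = complement
    return "homozygous" if first == second else "heterozygous"


# The fgf8a<ti282a/+> example: one variant allele and one wild-type allele.
print(zygosity(["ti282a", "+"]))
```

Alleles are compared here as plain labels; a fuller model would compare the sequence features themselves.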