Exome Variant Analysis of Japanese Male

By Jack Cartee, Chris Monaco, and Hannah Hatchell

Introduction

The 1000 Genomes Project ran from 2008 to 2015 with the goal of creating the largest public catalog of human genomic data, focused on genetic variants that occur at relatively high frequencies in human populations. These data are used to study the patterns and significance of human genetic variation. Here, we ran a variant analysis on the exome of a Japanese male (sample NA18940), sequenced as part of the Japanese in Tokyo (JPT) population of the 1000 Genomes Project, to identify potentially pathogenic variants.

Pipeline

Table 1. Overview of analytical pipeline.

For our variant analysis, we followed the WGS/WES Mapping to Variant Calls workflow provided by samtools. We mapped the exome reads to the GRCh38 human reference genome to identify genetic variants. The output of this pipeline is a VCF file, which is used downstream in the analysis.
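As a rough illustration of what the VCF output contains, the sketch below parses the fixed columns of a single VCF data line. The position and alleles are those reported for Variant 1 in this report; the QUAL, FILTER and INFO values, and the names `VcfRecord` and `parse_vcf_line`, are placeholders invented for the example and are not pipeline output.

```python
from typing import Dict, List, NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: List[str]
    info: Dict[str, str]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed, tab-separated columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")[:8]
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields
    info_map: Dict[str, str] = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_map[key] = value  # flag-style keys get an empty string
    # ALT may list several comma-separated alleles
    return VcfRecord(chrom, int(pos), ref, alt.split(","), info_map)


# Hypothetical line: only CHROM, POS, REF and ALT come from the report.
example = "chr1\t223799909\t.\tC\tA\t50\tPASS\tDP=30"
record = parse_vcf_line(example)
print(record.chrom, record.pos, record.ref, record.alt)
```

Header lines (those starting with `#`) would need to be skipped before calling this function on a real file.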

Variant Analysis

The VCF file produced by the WES Mapping to Variant Calls pipeline was then annotated using wANNOVAR, a web service that provides annotations for a given set of genetic variants. We ran the annotation with hg38 as the reference genome build.

> Variant Filtering

The resulting annotations were then assessed for pathogenicity using CADD PHRED-scaled scores, which wANNOVAR includes in its annotation output. We kept only variants with a PHRED-scaled score greater than or equal to 30. A score of 30 or more places a variant among the 0.1% most deleterious of all possible single-nucleotide substitutions in the human genome, as ranked by CADD; it is a rank, not an allele frequency (Kircher et al.). Deleterious variants, those that reduce an individual’s organismal fitness, are depleted from a population by natural selection, so CADD measures deleteriousness, a property that strongly correlates with pathogenicity (Kircher et al.). In other words, rare variants are more likely to be deleterious than common ones. By using the CADD score as a measure of deleteriousness, we identified potentially pathogenic variants in our individual’s exome.
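The relationship between a PHRED-scaled score and a rank can be written out in a few lines. This is a minimal sketch of the general PHRED transformation (score = -10 × log10 of the rank fraction); the function names are ours and the code does not reproduce CADD itself.

```python
import math


def phred_from_fraction(rank_fraction: float) -> float:
    """PHRED-scale a rank fraction, e.g. 0.001 for the top 0.1%."""
    if not 0.0 < rank_fraction <= 1.0:
        raise ValueError("rank_fraction must be in (0, 1]")
    return -10.0 * math.log10(rank_fraction)


def fraction_from_phred(phred: float) -> float:
    """Inverse transformation: the rank fraction implied by a PHRED score."""
    return 10.0 ** (-phred / 10.0)


for score in (10, 20, 30):
    print(score, fraction_from_phred(score))
```

Under this scaling, scores of 10, 20 and 30 correspond to the top 10%, 1% and 0.1% of ranked substitutions.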

> Variant List

Table 2: Potentially pathogenic variants annotated through wANNOVAR. Pathogenicity was measured using CADD PHRED scores generated through wANNOVAR. Highlighted rows indicate variants selected by each group member to characterize.

Variant 1: Homo sapiens tumor suppressor p53-binding protein 2, transcript variant 1 (TP53BP2) – Hannah Hatchell

Gene: TP53BP2

Nucleotide Change: C -> A

Amino Acid Change: R -> L at 492

Type of SNV: Nonsynonymous SNV

Genomic Position: Chr 1:  223799909

Figure 1. JMOL rendition of unaltered p53 protein structure in yellow. The R492L mutation is highlighted in red.

The p53 pathway has been shown to mediate cellular stress responses, specifically by initiating DNA repair, cell-cycle arrest, senescence, and apoptosis. These responses have been implicated in an individual’s ability to suppress tumor formation and to respond to many types of cancer therapy (Vazquez et al.). TP53BP2 encodes a member of the ASPP (apoptosis-stimulating protein of p53) family of p53-interacting proteins. The protein contains four ankyrin repeats and an SH3 domain involved in protein-protein interactions [provided by RefSeq, Jul 2008]. It is localized to the perinuclear region of the cytoplasm and regulates apoptosis and cell growth through interactions with other regulatory molecules, including members of the p53 family. Multiple transcript variants encoding different isoforms have been found for this gene.

Importantly, dysregulation of TP53BP2 in vivo has been implicated in an array of cancer phenotypes, including pancreatic, gastric, and breast cancer (Song et al., Jun et al.). In a Korean population, one study suggested that the TP53BP2 locus is associated with susceptibility to gastric cancer in particular (Jun et al.). The allelic frequencies of four SNPs within TP53BP2 (g.206692C>T, g.198267A>T, g.164895G>A and g.152389A>T) differed significantly between cases and controls (p ≤ 0.0376). Compared to carriers of non-risk alleles, individuals homozygous for each of the risk alleles had a 50% increase in risk of gastric cancer. It could be hypothesized that the same association holds in the Japanese population from which our subject was drawn.

SWISS-MODEL was used to generate a PDB file for Jmol visualization. The PolyPhen-2 predicted pathogenicity score was 0.997, which places the mutation in the “probably damaging” category. The mutation replaces a hydrophilic arginine residue with a hydrophobic leucine residue, which could significantly change the protein’s tertiary structure. Hydrophilic residues tend to sit on the protein’s outer surface, where they interact favorably with solvent molecules, whereas hydrophobic residues are more often buried in the protein’s interior, where such interactions are avoided. It can therefore be hypothesized that the R492L mutation causes structural changes that affect the protein’s overall function.

Variant 2: Homo sapiens centrosomal protein 68, transcript variant 1 (CEP68 (G1983S)) – Jack Cartee

Gene: CEP68

Nucleotide Change: G -> A

Amino Acid Change: G -> S at 1983

Type of SNV: Nonsynonymous SNV

Genomic Position: Chr 2:  65072945

Figure 2. Jmol rendition of unaltered cep68 protein structure in yellow. The G1983S mutation is highlighted in red.

This variant is a nonsynonymous mutation in the CEP68 gene, which encodes the Homo sapiens centrosomal protein 68. The protein is required for centrosome cohesion (Cornejo-García et al.), an integral part of the cell cycle that maintains organization at the centre of the cell shortly before mitosis begins. The protein is required in mitosis and decorates fibres emanating from the proximal ends of the centrioles (Cornejo-García et al.), so it is of high importance during cell division.

Because this protein’s role in the cell cycle is highly conserved, nonsynonymous SNVs within the gene can be highly deleterious. In recent literature, CEP68 has been reported as the major locus associated with susceptibility to aspirin intolerance in asthmatics (Cornejo-García et al.). Non-steroidal anti-inflammatory drugs (NSAIDs) such as aspirin are widely used to treat minor pain and inflammatory diseases. Variants in CEP68 were found to be associated with hypersensitivity reactions to NSAIDs (Cornejo-García et al.).

This variant changes a glycine residue to a serine residue at position 1983. PolyPhen-2 predicted a pathogenicity score of 0.989, indicating that the variant is “probably damaging”. Serine is a polar amino acid, while glycine is an extremely small amino acid that can fit into either hydrophilic or hydrophobic environments. The change from glycine to serine may therefore alter the tertiary structure of the protein enough to cause loss of wild-type function. The variant is compound heterozygous, meaning both copies of the gene are affected. Taken together, this suggests that the variant may be pathogenic and could contribute to hypersensitivity reactions to NSAIDs, the disease phenotype previously associated with variants in this gene.

Variant 3: CYP4B1 (M331I mutant) – Chris Monaco

Gene: CYP4B1

Nucleotide Change: G -> A

Amino Acid Change: M -> I at 331

Type of SNV: Nonsynonymous SNV

Genomic Position: Chr 1:  46815187

Figure 3. Jmol rendition of unaltered CYP4B1 protein structure in yellow. The M331I mutation is highlighted in red.

This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. In rodents, the homologous protein has been shown to metabolize certain carcinogens; however, the specific function of the human protein has not been determined. Multiple transcript variants have been found for this gene. [provided by RefSeq, Jan 2016].

The individual in this study carries a nonsynonymous SNV in the CYP4B1 gene in which the guanine at chr1:46815187 is changed to an adenine. This polymorphism changes the amino acid at residue 331 from methionine to isoleucine. Because methionine and isoleucine both have hydrophobic side chains, this mutation is unlikely to have a significant impact on protein structure or function.
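One rough way to put numbers on the hydrophobicity arguments made for the three variants is the Kyte-Doolittle hydropathy scale, on which positive values are more hydrophobic. The sketch below is only an illustration: the scale was not part of the analysis described here, and a hydropathy difference says nothing about where a residue sits in the folded protein.

```python
# Kyte-Doolittle hydropathy values for the residues involved in the
# three variants discussed in this report (positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "R": -4.5,  # arginine
    "L": 3.8,   # leucine
    "G": -0.4,  # glycine
    "S": -0.8,  # serine
    "M": 1.9,   # methionine
    "I": 4.5,   # isoleucine
}


def hydropathy_shift(ref_aa: str, alt_aa: str) -> float:
    """Change in hydropathy when ref_aa is replaced by alt_aa."""
    return round(KYTE_DOOLITTLE[alt_aa] - KYTE_DOOLITTLE[ref_aa], 1)


VARIANTS = {
    "TP53BP2 R492L": ("R", "L"),
    "CEP68 G1983S": ("G", "S"),
    "CYP4B1 M331I": ("M", "I"),
}

for name, (ref_aa, alt_aa) in VARIANTS.items():
    print(name, hydropathy_shift(ref_aa, alt_aa))
```

On this scale the arginine to leucine change crosses from strongly hydrophilic to hydrophobic, while methionine and isoleucine both sit on the hydrophobic side.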

References

  1. Cornejo-García JA, Flores C, Plaza-Serón MC, et al. Variants of CEP68 Gene Are Associated with Acute Urticaria/Angioedema Induced by Multiple Non-Steroidal Anti-Inflammatory Drugs. Yao Y-G, ed. PLoS ONE. 2014
  2. Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nature genetics. 2014;46(3):310-315. doi:10.1038/ng.2892.
  3. http://www.htslib.org/workflow/
  4. The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012)
  5. Song, Bin, Qi Bian, Yi-Jie Zhang, Cheng-Hao Shao, Gang Li, An-An Liu, Wei Jing, Rui Liu, Ying-Qi Zhou, Gang Jin, and Xian-Gui Hu. “Downregulation of ASPP2 in Pancreatic Cancer Cells Contributes to Increased Resistance to Gemcitabine through Autophagy Activation.” Molecular Cancer 14.1 (2015): n. pag. Web.
  6. Jun, Yi, Khan. “TP53BP2 Locus Is Associated with Gastric Cancer Susceptibility.” International Journal of Cancer. U.S. National Library of Medicine, n.d. Web. 12 Dec. 2016.
  7. Vazquez, Alexei, Elisabeth E. Bond, Arnold J. Levine, and Gareth L. Bond. “The Genetics of the P53 Pathway, Apoptosis and Cancer Therapy.” Nature Reviews Drug Discovery 7.12 (2008): 979-87. Web.

 

 


Exome variant analysis of GBR female

By: Group G8 – Adam Dabrowski, Christian Colon, Harrison Kim, Juichang (David) Lu, Erisa Sula

Introduction:

The 1000 Genomes Project is the culmination of a collaborative effort to sequence the genomes of different populations from around the globe. It is the largest publicly accessible catalog of human genotype data and comprises 2,504 samples across 26 populations. The goal of the project is to identify variants with frequencies of at least 1% in the populations studied. The resulting data could be of great value for diagnosing genetic disorders and for future work on identifying and curing disease. This report is an analysis of the single nucleotide polymorphisms of one individual from the 1000 Genomes Project.

Methods:

This project began by selecting sample HG00111, a female from Great Britain. The two FASTQ files, ERR031963_1.filt.fastq and ERR031963_2.filt.fastq, were used with the GRCh38_full_analysis_set_plus_decoy_hla.fa reference file to generate our .vcf file in a variant calling pipeline built on VCFtools and SAMtools. We then used wANNOVAR and Variant Effect Predictor to annotate and filter the called variants. We kept variants with a CADD phred score greater than 30 (for their pathogenic tendency) and a Minor Allele Frequency (EUR) less than 0.01 (for rarity in the population). Those parameters left us with a list of 11 variants. Each group member selected a variant, carried out a literature review, and analyzed it using sources such as OMIM, NCBI and dbSNP, in addition to SWISS-MODEL to create 3D structural models of the proteins affected by the selected variants.
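The two-threshold filter described above can be sketched in a few lines. The field names (`cadd_phred`, `maf_eur`) are hypothetical, since the wANNOVAR column names are not reproduced in this report; the first two rows reuse the values reported for POLE and ERO1B, and the last two rows are invented to show variants that fail each criterion. Treating a missing score or frequency as a failure is also an assumption of this sketch.

```python
def passes_filter(variant: dict, min_cadd: float = 30.0, max_maf: float = 0.01) -> bool:
    """Keep variants with CADD phred > min_cadd and EUR MAF < max_maf."""
    cadd = variant.get("cadd_phred")
    maf = variant.get("maf_eur")
    if cadd is None or maf is None:
        # Assumption: variants lacking either annotation are dropped.
        return False
    return cadd > min_cadd and maf < max_maf


variants = [
    {"rsid": "rs138207610", "gene": "POLE", "cadd_phred": 33.0, "maf_eur": 0.002},
    {"rsid": "rs145821086", "gene": "ERO1B", "cadd_phred": 31.0, "maf_eur": 0.001},
    # Hypothetical rows, for illustration only:
    {"rsid": "hypothetical_common", "gene": "GENE_X", "cadd_phred": 35.0, "maf_eur": 0.15},
    {"rsid": "hypothetical_low_cadd", "gene": "GENE_Y", "cadd_phred": 12.0, "maf_eur": 0.001},
]

kept = [v["rsid"] for v in variants if passes_filter(v)]
print(kept)
```

Both comparisons are strict, matching the "greater than 30" and "less than 0.01" wording in the methods.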

Pipeline: blue – variant calling, green – variant filtering, purple – variant analysis

Table of Variants: the highlighted variants are the ones selected for closer study

Variant Gene Chromosome Reference Alternate
rs200198757 CFAP58 chr10 C T
rs181501140 WDR63 chr1 G T
rs202188944 DPAGT1 chr11 G A
rs186069877 ST18 chr8 C T
rs145821086 ERO1LB chr1 A G
rs139192433 SH3TC2 chr5 C T
rs138207610 POLE chr12 G A
rs144319704 LTK chr15 C T
rs146089422 OSBPL10 chr3 C T
rs34084056 MOK chr14 C T
rs149869826 SYTL5 chrX C T

POLE – Adam Dabrowski

  • Gene: POLE
  • Chromosome: 12
  • SNP: rs138207610
  • Base Change: G to A
  • Amino Acid Change: P [Pro] to S [Ser] (on the reverse strand)
  • CADD phred: 33
  • Minor Allele Frequency (EUR): 0.002
  • Mutation Type: Nonsynonymous

The variant rs138207610 is a single nucleotide change in the POLE gene on chromosome 12. This gene normally encodes the catalytic subunit of DNA polymerase epsilon, which plays a key role in leading-strand DNA synthesis and base repair (NCBI, 2016). Alterations to the gene can drastically affect the function and efficiency of the enzyme, which is the concern for individuals who carry this SNP. As shown in the data from NCBI’s dbSNP database (Figure 1), the mutation changes the base on the forward strand from G to A at location 132676611, which corresponds to a C to T change on the reverse strand.
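The correspondence between the forward-strand and reverse-strand alleles is simply base complementarity. A minimal sketch, with function names of our own choosing:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the sequence as read on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def alleles_on_other_strand(ref: str, alt: str) -> tuple:
    """Express a ref/alt allele pair on the opposite strand."""
    return reverse_complement(ref), reverse_complement(alt)


print(alleles_on_other_strand("G", "A"))  # ('C', 'T')
```

For single-base alleles the reversal has no effect, but using the reverse complement keeps the function correct for longer alleles as well.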

Figure 1: dbSNP Reference Data of Amino Acid Change

According to the gene models, in 4 of the 6 cases the SNP changes the amino acid from proline to serine, a potentially serious change to the POLE protein. The mutated version of the protein is not shown but, for reference, the normal POLE structure is shown in Figure 2. The change matters because the original proline is nonpolar while the new serine is polar. This could affect the structure of the protein and therefore its binding capabilities. The mutation and any resulting misfolding could lead to serious problems for the individual.

Figure 2: 3D Model of the normal POLE protein

Mutations in this gene can have very problematic results. The largest pathogenic risk associated with this variant of POLE is colorectal cancer and polyposis (OMIM, 2016). To be clear, no direct causal link has been established between malfunctioning POLE and colorectal cancer or polyposis, but results strongly suggest that a mutated version of POLE is an indicator for the disease and may even indicate early onset (Ambry, 2015). Malfunction of this gene coincides with colorectal cancer and polyposis often enough to warrant further study (Stuenkel et al., 2014). A study by Valle et al. (2014) found that one individual in their sample of 858 had both a POLE mutation at this SNP and colorectal cancer. This rate may seem exceedingly low; however, their sample consisted of 858 Spanish persons. The breakdown of allelic frequencies across populations provided by the 1000 Genomes Project (Figure 3) shows that the Spanish population (IBS) does not carry this variant at all (1000 Genomes Browser, 2016): the allele is 100% G. The individual studied by our group was of British descent (GBR), and Figure 3 shows the variant at a frequency of 1% in that population. Figure 3 also shows that this variant is not observed outside the European populations, and even within them it is found only at 1% in the CEU and GBR populations. Therefore, further research on this variant should focus on individuals from these cohorts.
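Because the discussion above moves between allele frequencies and numbers of people, a small conversion may help. The sketch below assumes Hardy-Weinberg proportions, which is an assumption made for illustration and is not tested in this report, and turns an allele frequency into the expected fraction of individuals carrying at least one copy.

```python
def carrier_fraction(allele_freq: float) -> float:
    """Expected fraction of individuals with at least one copy of the allele,
    assuming Hardy-Weinberg proportions: 1 - (1 - q)^2."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    return 1.0 - (1.0 - allele_freq) ** 2


def expected_carriers(allele_freq: float, cohort_size: int) -> float:
    """Expected number of carriers in a cohort of the given size."""
    return carrier_fraction(allele_freq) * cohort_size


print(round(carrier_fraction(0.01), 4))   # allele frequency of 1%
print(expected_carriers(0.0, 858))        # allele absent from the population
```

Under this assumption an allele frequency of 1% corresponds to roughly 2% of individuals carrying the variant, while a population in which the allele is absent is expected to yield no carriers regardless of cohort size.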

Figure 3: Allelic Frequency Breakdown Among Populations

This variant does seem to have pathogenic tendencies, but further research is most certainly required. Although there is no conclusive evidence, there seems to be a link between a mutation in the gene due to this variant and colorectal cancer and polyposis. Therefore, further research should be conducted to see if there is a definitive link between the variant and the diseases it has been thought to play a role in.

Citations

  1. Ambry Genetics. Accessed December 5, 2016. http://www.ambrygen.com/tests/pold1-and-pole-analysis.
  2. NCBI. “POLE DNA polymerase epsilon, catalytic subunit [ Homo sapiens (human) ].” Accessed December 7, 2016. https://www.ncbi.nlm.nih.gov/gene/5426.
  3. Stuenkel, A. J., Carin R. Espenschied, Brandon Smith, Rachel McFarland, and Tina Pesaran. “POLD1 and POLE: preliminary data from a laboratory-based multi-gene panel testing cohort.”
  4. OMIM. Accessed December 5, 2016. https://www.omim.org/entry/174762.
  5. Valle, L., et al., New insights into POLE and POLD1 germline mutations in familial colorectal cancer and polyposis. Human molecular genetics, 2014. 23(13): p. 3506-3512.
  6. 1000 Genomes Browser. “rs138207610 SNP.” http://browser.1000genomes.org/Homo_sapiens/Variation/Population?db=core;r=12:133252697-133253697;v=rs138207610;vdb=variation;vf=24278591.

 


ERO1B – Erisa Sula

  • Gene: ERO1B – ENDOPLASMIC RETICULUM OXIDOREDUCTIN 1-LIKE, BETA
  • Chromosome: 1
  • SNP: rs145821086
  • Base Change: A to G
  • Amino Acid Change: F [Phe] to L [Leu] (on the reverse strand)
  • CADD phred: 31
  • Minor Allele Frequency (EUR): 0.001
  • Mutation Type: Nonsynonymous SNV

The ERO1B gene is located on chromosome 1 at location 1:236,215,121-236,282,038 [2]. The observed variant in this individual, rs145821086, is a single nucleotide polymorphism (SNP) occurring at location 236,220,932 which is a coding region of the genome within the ERO1B gene. The ancestral allele at this position is an adenine (A) base, but people with this particular variant have a guanine (G) instead. As shown in figure 1, this variant results in an amino acid change from a phenylalanine to a leucine at position 415 in the ERO1B protein [2]. There are no alternative splice variants for the ERO1B protein and that is why only one accession number is listed in the figure.

Figure 1. dbSNP data [2]

This SNP is found at very low frequencies across all populations. According to the 1000 Genomes Project, it occurs at a frequency of about 0.1% in the EUR (European) population [2], as can be seen in the figure below.

Figure 2. Minor allele frequencies [2]

This specific site on chromosome one not only has very low minor allele frequencies across all populations, but the site is also very highly conserved among different species [3] as can be seen in Figure 3.

Figure 3. UCSC Genome browser shows high conservation of this site in the genome across many species [3]

The ERO1B gene codes for the ERO1B protein – Endoplasmic reticulum oxidoreductin 1-like, beta. One of this protein’s main functions is to reoxidize protein disulfide isomerases (PDI) in the endoplasmic reticulum [4]. This means that it is a highly important part of the formation of disulfide bonds and in ensuring the proper folding of proteins. The highest expression of this protein is found in the adult pancreas [4]. Studies have shown that treating human and monkey cells in ways that induce protein unfolding increases transcription of the ERO1B gene [4]. A model of the regular ERO1B protein with no mutations is shown in Figure 4. This model was created by SWISS-MODEL and used the ERO1-A protein structure as a template due to it having a sequence similarity of 64.57% with ERO1B [5].

Figure 4. ERO1B protein structure model created by SWISS-MODEL [5].

To get a better look at the exact SNP, rs145821086, the SWISS-MODEL generated PDB file was opened in JMOL to highlight the amino acid change in the protein. This is shown in Figure 5.

Figure 5. a) Ero1B model created in SWISS-MODEL, opened in jmol to zoom in and highlight the exact amino acid that is affected by the SNP. PHE 415 has been colored red. In the individual we are analyzing, this amino acid has been changed to a LEU. b) Same view of the protein as in a) but including visualization for the protein surface. It is apparent from this image that the phenylalanine residue is not on the surface of the protein.

Even though this variant has a high CADD score, implying it is likely to be pathogenic, the amino acid change from PHE (phenylalanine) to LEU (leucine) seems unlikely to cause any major change in protein structure. PHE and LEU are both very hydrophobic amino acids and are similar in size. Their hydrophobicity, together with the buried position seen in Figure 5, means that the residue is unlikely to be exposed to an aqueous environment on the protein surface and is less likely to be part of a binding site, unless the protein undergoes major conformational changes when oxidizing PDIs.

A 2006 paper by Dias-Gunasekara et al. concluded that mutations within the FAD binding domain of the ERO1B protein could compromise the conformational stability of the protein and lead to mis-oxidation [6]. However, the SNP being analyzed here is outside of the FAD binding domain and therefore unlikely to be linked to the same issues. Another paper by Dias-Gunasekara et al. showed that a mutation that resulted in a change in the amino acid cysteine 396 (C396), would prevent ERO1B homodimer formation [7]. This mutation falls in a highly conserved region of the ERO1B protein that follows the CXXCXXC pattern. PHE415 is not within this pattern but as seen in Figure 3 is also very highly conserved across many species. This gives cause to believe that a change at this location could potentially be deleterious. However, since this variant has such a low frequency in the global population (it isn’t found at all in some populations and has the highest incidence in the EUR population of only 0.1%), it has not been properly studied yet and it is not known what the direct effects of this variant are.

Citations:

  1. Chang X, Wang K. wANNOVAR: annotating genetic variants for personal genomes via the web. J Med Genet. 2012 Jun 20
  2. Reference SNP (refSNP) Cluster Report: Rs145821086. (n.d.). Retrieved December 12, 2016, from https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=145821086
  3. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. The human genome browser at UCSC. Genome Res. 2002 Jun;12(6):996-1006.
  4. Hartz, P. A. (2013, October 8). ENDOPLASMIC RETICULUM OXIDOREDUCTIN 1-LIKE, BETA; ERO1LB. Retrieved December 12, 2016, from https://www.omim.org/entry/615437?search=ero1b&highlight=ero1b
  5. Marco Biasini; Stefan Bienert; Andrew Waterhouse; Konstantin Arnold; Gabriel Studer; Tobias Schmidt; Florian Kiefer; Tiziano Gallo Cassarino; Martino Bertoni; Lorenza Bordoli; Torsten Schwede. (2014). SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Research (1 July 2014) 42 (W1): W252-W258; doi: 10.1093/nar/gku340.
  6. Dias-Gunasekara S, van Lith M, Williams JA, Kataky R, Benham AM. Mutations in the FAD binding domain cause stress-induced misoxidation of the endoplasmic reticulum oxidoreductase Ero1beta. The Journal of biological chemistry. Sep 01 2006;281(35):25018-25025.
  7. Dias-Gunasekara S, Gubbens J, van Lith M, et al. Tissue-specific expression and dimerization of the endoplasmic reticulum oxidoreductase Ero1beta. The Journal of biological chemistry. Sep 23 2005;280(38):33066-33075.

 


ST18 – Juichang (David) Lu

  • Chromosome: 8
  • SNP: rs186069877
  • Base Change: C to T
  • Amino Acid Change: From glutamic acid [Glu] to Lysine [Lys].
  • CADD phred: 34
  • Minor Allele Frequency (EUR): 0.001
  • Mutation Type: Nonsynonymous

rs186069877 is a single nucleotide variant located on chromosome 8. This mutation changes one base from C to T at location 52149740. As a consequence, the glutamic acid at position 682 is changed to lysine in one of the transcript versions (Figure 1). The report was obtained from the dbSNP database [1].

Figure 1. rs186069877 mutation (missense) changes the glutamic acid to lysine in protein.

The location of this SNP is particularly interesting. Loss of heterozygosity on chromosome 8 is commonly observed during the progression of cancer in humans, and a previous study identified ST18 as one of the genes affected [2]. The ST18 gene encodes a zinc-finger DNA-binding protein with six fingers and an SMC domain, and has been identified as a potential transcriptional regulator. ST18 is expressed in multiple normal tissues but is significantly downregulated in breast cancer cells. Moreover, a 160 bp region within the promoter of ST18 was hypermethylated in most of the breast cancer samples [2]. The correlation between hypermethylation and underexpression suggests that an epigenetic mechanism, rather than a mutation in the protein, is more likely to drive the progression of cancer.

A 3D image was generated using the SWISS-MODEL tool [3]. The model was selected based on its GMQE score, a quality estimate that ranges between 0 and 1 and reflects the expected accuracy of a model built with that alignment; higher numbers indicate higher quality. Due to the limitations of the template database, the full ST18 sequence could not be modeled. The template did not cover the whole sequence, so the model was built from only the partial sequence it covered (KIAA0535 protein), and the mutation site is not mapped in the image. A possible cause is that the protein does not contain many highly conserved domains, so the algorithm could not determine the structure from the amino acid sequence alone.

 

Figure 2. 3D image generated with SWISS-MODEL from a partial alignment. The template is the KIAA0535 protein, which aligns with 100% identity.

The rs186069877 mutation is classified as a missense mutation, a type of point mutation: a single nucleotide in the gene is changed so that the codon encodes a different amino acid. In our case, the mutation changes the original glutamic acid to lysine. While we were not able to obtain a viable model, we can say that the chemical properties of the two amino acids are very different. Glutamic acid carries a negative charge on its carboxylic acid group, whereas lysine carries a positive charge on its amine group. Besides the difference in charge, the two side chains also differ in length by two carbon bonds. Although the mutated amino acid has very different chemical properties, the effect of the mutation depends on the position of the amino acid within the structure. At a critical position, the mutation can disrupt secondary structure and lead to a dysfunctional protein. rs186069877 is very rare in the population: the SNP was observed only in the European population, at a frequency of 0.1% [1].
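How a single-base change produces a missense mutation can be shown with the standard genetic code. In the sketch below the codon GAA is an assumed example (it is one of the two glutamic acid codons); the report does not state the actual codon at this site, and the function names are ours.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def substitute(codon: str, position: int, new_base: str) -> str:
    """Return the codon with the base at `position` (0-2) replaced."""
    return codon[:position] + new_base + codon[position + 1:]


def classify(codon: str, position: int, new_base: str) -> tuple:
    """Return (original amino acid, new amino acid, mutation class)."""
    before = CODON_TABLE[codon]
    after = CODON_TABLE[substitute(codon, position, new_base)]
    if before == after:
        kind = "synonymous"
    elif after == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return before, after, kind


# Assumed codon GAA (Glu); changing the first base to A gives AAA (Lys).
print(classify("GAA", 0, "A"))  # ('E', 'K', 'missense')
```

Changing the third base of the same codon to G would give GAG, which still encodes glutamic acid, so not every single-base change alters the protein.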

Since we were not able to assess the importance of our SNP by building a model, we looked at whether the amino acid at the mutated position is conserved across species (Figure 3). As Figure 3 shows, the mutated site is conserved in most species, indicating its importance from an evolutionary point of view.

Figure 3. Conservation of mutation site obtained from UCSC genome browser.

Figure 4. Population diversity of SNP rs186069877

While we are uncertain about the effect of rs186069877 on the protein, variants in the ST18 gene have been associated with several diseases, including pemphigus vulgaris (PV) [4], Alzheimer’s disease [5], and primary angle closure glaucoma [6]. However, unlike rs186069877, most of the documented variants are located in introns or intergenic regions. Considering the regulatory function of introns, this suggests that these diseases are mainly caused by altered regulation of ST18 rather than by structural dysfunction of the protein.

Citations

  1. Reference SNP (refSNP) Cluster Report: rs186069877. (n.d.). Retrieved December 12, 2016, from https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=rs186069877
  2. Jandrig, B. et al. ST18 is a breast cancer tumor suppressor gene at human chromosome 8q11.2. Oncogene 23, 9295–9302 (2004).
  3. Marco Biasini, Stefan Bienert, Andrew Waterhouse, Konstantin Arnold, Gabriel Studer, Tobias Schmidt, Florian Kiefer, Tiziano Gallo Cassarino, Martino Bertoni, Lorenza Bordoli, Torsten Schwede (2014). SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information Nucleic Acids Research 2014 (1 July 2014) 42 (W1): W252-W258
  4. Sarig, Ofer et al. “Population-Specific Association between a Polymorphic Variant in ST18, Encoding a Pro-Apoptotic Molecule, and Pemphigus Vulgaris” Journal of Investigative Dermatology , Volume 132 , Issue 7 , 1798 – 1805
  5. Sherva, Richard et al. “Genome Wide Association Study of the Rate of Cognitive Decline in Alzheimer’s Disease.” Alzheimer’s & dementia : the journal of the Alzheimer’s Association 10.1 (2014): 10.1016/j.jalz.2013.01.008. PMC. Web. 12 Dec. 2016.
  6. Day, Alexander C et al. “Genotype–phenotype Analysis of SNPs Associated with Primary Angle Closure Glaucoma (rs1015213, rs3753841 and rs11024102) and Ocular Biometry in the EPIC-Norfolk Eye Study.” The British journal of ophthalmology 97.6 (2013): 704–707. PMC. Web. 12 Dec. 2016.

 


SH3TC2 – Harrison Kim

  • Location: Chr5
  • Dbsnp: rs139192433
  • Base Change: C to T
  • CADD phred score: 33
  • Minor allele frequency(EUR) = .002
  • Mutation Type: Nonsynonymous
  • Amino acid change: R[Arg] => Q[Gln]

SNP rs139192433 is a variant that changes the amino acid arginine to glutamine. It is located on chromosome 5 at position 149008949, within the SH3TC2 gene (Figure 1).

Figure 1. SNP causes a missense mutation changing arginine to glutamine.

The SH3TC2 gene is named for two motifs in the protein it encodes: the Src homology 3 (SH3) domain and the tetratricopeptide repeat (TPR). The functions of SH3 domains are not well known, but they are thought to be involved in regulating various processes, especially by increasing local concentrations of proteins and promoting the formation of multiprotein complexes [3]. The tetratricopeptide repeat is a structural motif that mediates protein-protein interactions and the formation of multiprotein complexes. It is involved in numerous biological processes, most notably neurogenesis and protein folding [4].

SH3TC2 is a member of a small gene family and is proposed to be an adapter or docking molecule that plays a role in the assembly of multiprotein complexes [5]. Furthermore, the gene is responsible for producing a protein expressed in Schwann cells of the peripheral nerves and localized to the plasma membrane and perinuclear endocytic recycling compartment. The gene’s role suggests a possible function in myelination and regions of axoglial interactions [6].

The mutation is classified as a homozygous missense mutation. The primary disease associated with this mutation is Charcot-Marie-Tooth disease, better known as CMT. CMT is a progressive disorder affecting the peripheral nerves, which are crucial for detecting sensations. Although it is not fatal, CMT can set in very early in life and varies in the severity of muscle weakness. CMT4 is the form of CMT associated with this SNP; it involves abnormalities in the myelin sheath and axon and shows an autosomal recessive pattern of inheritance that distinguishes it from the other forms of CMT [2].

CMT4C is caused by biallelic pathogenic variants in SH3TC2 and is a frequent cause of autosomal recessive CMT4. Among those already affected by CMT, the prevalence of CMT4C is approximately 18%, a figure drawn from a sample of 299 patients in a separate study. Pathogenic variants have been found in individuals from Algeria, Morocco, France, Belgium, England, the Netherlands, Germany, Austria, Italy, Bosnia, the Czech Republic, Greece, Turkey, Iran, Japan, and Canada. Of affected individuals of Turkish descent, 20% carried biallelic SH3TC2 pathogenic variants [1].

Lupo et al. studied SH3TC2 missense mutations in a clinical study of Caucasian non-Gypsy patients affected by CMT. They report that 19 SH3TC2 mutations in total have been identified in non-Gypsy Caucasian families, with backgrounds ranging from Turkish to British, and that the disease has a prevalence of 28 in 100,000 people. To understand the mechanism of the pathology, they focused on the subcellular localization of the protein and on how missense mutations that alter it ultimately lead to the demyelinating neuropathy. Based on their findings, they state that two pathways, the endocytic and membrane-trafficking pathways, are involved in the pathogenesis of CMT4C, and that the missense mutations affect communication between the abnormal Schwann cells and axons [7].

Figure 2. dbSNP table of population diversity (alleles in RefSNP orientation).

This variant is quite rare in the European population: dbSNP shows that only ~0.2% carry it (Figure 2). Even so, it is more prevalent in the European population than in the other populations. For the subject chosen for this project, it is relevant that the variant is more prevalent in the Great Britain population than in the European population overall, suggesting a higher chance of carrying the variant and thus of possible pathogenicity (Figure 3).

Figure 3. NCBI’s 1000 Genomes Browser shows that the variant is seen in only ~1% of Great Britain’s population.

Figure 4. UCSC Genome Browser view of conservation at chromosome 5, position 149008949, across all species shown in the table.

A homology report for the SH3TC2 gene shows that the amino acid in question is largely conserved across many different species (Figure 4). Only in the lamprey (not shown in the figure) was the amino acid unidentified.

Figure 5. SWISS-MODEL model of tetratricopeptide repeat protein 7B.

SH3TC2 produces 5 isoforms through alternative splicing. The structural model shown above is for isoform 1 and was chosen because it had the highest sequence identity, 14.35%, among the more than 20 models generated. The template structure is tetratricopeptide repeat protein 7B, and the image was taken from SWISS-MODEL (Figure 5). Unfortunately, a model covering both the SH3 and TC2 domains could not be produced with SWISS-MODEL or other protein visualization software, so the exact location of the mutation could not be found or marked. However, one may surmise that a change from arginine to glutamine could drive functional or structural changes to the protein at large. Arginine has a charged R group, whereas glutamine has an uncharged polar R group. If the change occurs at the surface of the protein, it could plausibly disrupt or alter interactions with surrounding proteins. If instead the change occurs in the interior, it would more likely alter the structure of the SH3TC2 protein itself, which could in turn affect the protein’s original function.

Based on the available literature and data, this variant appears to possess pathogenic qualities. However, databases like ClinVar interpret the variant as having uncertain clinical significance, so it cannot be conclusively said to pose a pathogenic threat. Further research on the variant will help elucidate whether it is definitively pathogenic.

Citations

  1. Azzedine H, LeGuern E, Salih MA. Charcot-Marie-Tooth Neuropathy Type 4C. 2008 Mar 31 [Updated 2015 Oct 15]. In: Pagon RA, Adam MP, Ardinger HH, et al., editors. GeneReviews® [Internet]. Seattle (WA): University of Washington, Seattle; 1993-2016. Available from: https://www.ncbi.nlm.nih.gov/books/NBK1340/
  2. “Charcot-Marie-Tooth Disease – Genetics Home Reference.” U.S. National Library of Medicine. National Institutes of Health, n.d. Web. 09 Dec. 2016.
  3. EMBL-EBI, InterPro. “InterPro.” SH3 Domain (IPR001452). N.p., n.d. Web. 09 Dec. 2016.
  4. EMBL-EBI, InterPro. “InterPro.” Tetratricopeptide Repeat 2 (IPR013105). N.p., n.d. Web. 09 Dec. 2016.
  5. Genecards.org. N.p., n.d. Web. 09 Dec. 2016.
  6. “SH3 DOMAIN AND TETRATRICOPEPTIDE REPEAT DOMAIN 2; SH3TC2.” Omim.org. N.p., n.d. Web. 9 Dec. 2016.
  7. Vincenzo Lupo, Máximo I. Galindo, Dolores Martínez-Rubio, Teresa Sevilla, Juan J. Vílchez, Francesc Palau, and Carmen Espinós. Missense mutations in the SH3TC2 protein causing Charcot-Marie-Tooth disease type 4C affect its localization in the plasma membrane and endocytic pathway. Hum. Mol. Genet. (2009) 18 (23): 4603-4614 first published online September 10, 2009 doi:10.1093/hmg/ddp427

 


SYTL5 – Christian Colon

  • Gene: SYTL5 (slp5)
  • Chromosome: X
  • SNP: rs149869826
  • Base Change: C to T
  • Amino Acid Change: A [Ala] to V [Val]
  • Minor Allele Frequency (EUR): 0.0099
  • Mutation Type: Nonsynonymous

The SYTL5 gene is located on the X chromosome, spanning 37,906,068 bp to 38,129,294 bp from pter [1]. The variant rs149869826 is a single-nucleotide polymorphism at position 1756 within the gene, where a thymine is present instead of a cytosine. This leads to a residue change from alanine to valine at position 467 of the SYTL5 protein.

Figure 1. Model of the reference SYTL5 protein with an alanine (red) residue at position 467. Model was generated with Jmol.

Figure 2. Model of the rs149869826 variant of the SYTL5 protein, with a valine (green) residue at position 467. Model was generated with Jmol.

Figure 3. Allele frequencies for different regions.

This SNP occurs in about 0.99% of the European population and is even less frequent in other populations. Both alanine and valine are hydrophobic amino acids that tend to be buried inside the protein, so this substitution should have little effect on the overall structure. The SYTL5 gene encodes a member of the synaptotagmin-like protein family. Proteins in this family bind Rab27A, which plays a part in protein transport [2]. One study found that SYTL5 interacting with Rab27a decreases the interaction between Rab27a and ENaC, an epithelial sodium channel [3]. This is important because Rab27a inhibits ENaC-mediated currents. Changes to SYTL5 that prevent it from binding Rab27a would leave Rab27a free to inhibit ENaC and alter absorption of sodium ions. A different study found that the interaction between SYTL5 and Rab27a is associated with human hemophagocytic syndrome [4]. Neither study mentions our specific SNP, and it is unlikely to be pathogenic unless it causes a structural change that affects binding to Rab27a.

Citations

  1. http://www.genecards.org/cgi-bin/carddisp.pl?gene=SYTL5
  2. https://www.ncbi.nlm.nih.gov/gene?cmd=Retrieve&dopt=Graphics&list_uids=94122
  3. Saxena, S. K., Horiuchi, H., & Fukuda, M. (2006). Rab27a regulates epithelial sodium channel (ENaC) activity through synaptotagmin-like protein (SLP-5) and Munc13-4 effector mechanism. Biochemical and Biophysical Research Communications, 344(2), 651-657. doi:10.1016/j.bbrc.2006.03.160
  4. Kuroda, T. S., Fukuda, M., Ariga, H., & Mikoshiba, K. (2002). Synaptotagmin-like protein 5: A novel Rab27A effector with C-terminal tandem C2 domains. Biochemical and Biophysical Research Communications, 293(3), 899-906. doi:10.1016/s0006-291x(02)00320-0

Pathogenic Variant Analysis of Han Chinese Exome

Michael Finlayson, Meixue Duan, Ruoyu Tian, Brandon Smith, Bowen Yang

Introduction

The 1000 Genomes Project ran from 2008 to 2015 in three phases. Its aim was to characterize the genetic variants among geographical populations to get a sense of the genetic contributions to disease. Low-coverage whole-genome sequencing and exome sequencing were conducted on 1092 individuals from 14 populations, validly capturing 98% of accessible SNPs, short indels, and long insertions. In our study, we analyzed one exome sequencing dataset downloaded from the 1000 Genomes database. The individual, HG00556, is a male from the Southern Han Chinese population. We called the variants in this individual, kept those with high CADD PHRED scores, and further characterized five of the possibly pathogenic variants.

Workflow

Figure 1: Workflow

Methodology

Alignment and variant calling

The raw exome reads were aligned to the GRCh38 reference genome using BWA version 0.7.12-r1039. The aligned reads were cleaned, sorted, and indexed with samtools version 1.3.1. Variants were called with samtools and bcftools, both version 1.3.1.
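The steps above can be sketched as a short Python helper that only assembles the command lines and runs nothing. This is a minimal sketch under stated assumptions, not the exact commands the group used: the file names are hypothetical, paired-end reads are assumed, the reference is assumed to be already indexed with `bwa index`, and the options follow the samtools 1.3-era mapping-to-variant-calls workflow.

```python
import shlex


def build_pipeline(ref, fq1, fq2, sample):
    """Return shell commands for a BWA -> samtools -> bcftools run.

    Nothing is executed; the commands are only assembled as strings.
    All file names derived from `sample` are hypothetical.
    """
    q = shlex.quote
    fixmate_bam = f"{sample}.fixmate.bam"
    sorted_bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf.gz"
    return [
        # Align paired-end reads, then fill in mate information ("cleaning").
        f"bwa mem {q(ref)} {q(fq1)} {q(fq2)} | samtools fixmate -O bam - {q(fixmate_bam)}",
        # Coordinate-sort the alignments.
        f"samtools sort -O bam -T {q(sample + '.tmp')} -o {q(sorted_bam)} {q(fixmate_bam)}",
        # Index the sorted BAM.
        f"samtools index {q(sorted_bam)}",
        # Pile up reads against the reference and call variants into a compressed VCF.
        f"samtools mpileup -ugf {q(ref)} {q(sorted_bam)} | bcftools call -vmO z -o {q(vcf)}",
    ]


for command in build_pipeline("GRCh38.fa", "reads_1.fastq", "reads_2.fastq", "HG00556"):
    print(command)
```

Keeping the commands as plain strings makes it easy to inspect them before running them by hand or passing them to a shell.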

Genomic variant annotation

All variants were annotated with wANNOVAR (web server ANNOtate VARiation; http://wannovar.wglab.org/), which provides fast and easy gene-based, region-based, and filter-based annotation of the variant call format (VCF) file generated by alignment and variant calling. The functional annotations shown in the results include different types of gene annotations, alternative allele frequency in the 1000 Genomes Project, conserved element annotation, dbSNP annotation, deleteriousness prediction scores for nonsynonymous variants, ClinVar variant annotation, and genome-wide association study (GWAS) variant annotation. We chose hg38 as the reference genome for annotation.

Potential pathogenic variants

We filtered the variants provided by wANNOVAR based on two criteria: CADD score and variant category. Combined Annotation Dependent Depletion (CADD) scores the deleteriousness of SNVs as well as indel variants in the human genome (Kircher, Witten et al. 2014). It applies to both coding and non-coding variants and gives a quantitative measure for prioritizing causal variants in disease. wANNOVAR reported a CADD PHRED score for each variant, and we set a cutoff of 30. A CADD PHRED score of 30 places a variant in the top 0.1% of variants ranked by predicted deleteriousness; the higher the score, the more deleterious the variant is predicted to be. We also filtered by category, keeping variants that are nonsynonymous, gain or lose a stop codon, or cause a frameshift. Mutations of DNA can be categorized as frameshift, insertion, deletion, loss of stop codon, gain of stop codon, nonsynonymous, and synonymous. Since a synonymous mutation causes no amino acid change, it is likely to have a less deleterious effect on protein function. The other categories change, add, or remove amino acids and may severely disturb protein function. To reduce the workload of variant analysis and find potential pathogenic variants more effectively, we applied these filters and obtained the attached Excel sheet. The table below summarizes the 199 candidate pathogenic variants we found.
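The two filtering criteria can be illustrated with a short sketch. The dictionary keys (`gene`, `function`, `cadd_phred`), the category labels, and the example rows are hypothetical placeholders, not the actual wANNOVAR column names or the group's data; the filtering in this report was done in Microsoft Excel.

```python
def cadd_phred_to_top_fraction(phred):
    """Fraction of ranked variants at or above a PHRED-scaled CADD score."""
    return 10 ** (-phred / 10)


def filter_candidates(variants, min_phred=30.0,
                      kept_functions=("nonsynonymous", "stopgain", "stoploss", "frameshift")):
    """Keep variants that pass both the score cutoff and the category filter."""
    kept = []
    for variant in variants:
        score_ok = variant["cadd_phred"] >= min_phred
        # Substring match so that labels such as "frameshift insertion" also pass.
        function_ok = any(name in variant["function"] for name in kept_functions)
        if score_ok and function_ok:
            kept.append(variant)
    return kept


# Hypothetical rows for illustration only.
example_rows = [
    {"gene": "GENE_A", "function": "nonsynonymous SNV", "cadd_phred": 35.0},
    {"gene": "GENE_B", "function": "synonymous SNV", "cadd_phred": 31.0},
    {"gene": "GENE_C", "function": "stopgain", "cadd_phred": 12.0},
    {"gene": "GENE_D", "function": "stopgain", "cadd_phred": 30.0},
]

print(cadd_phred_to_top_fraction(30))
print([row["gene"] for row in filter_candidates(example_rows)])
```

In this toy input, GENE_B fails the category filter and GENE_C fails the score cutoff, so only GENE_A and GENE_D remain.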

Function Stopgain Nonsynonymous Total
Number of variants 111 88 199

Table 1: A functional summary of potential pathogenic exonic variants. All variants from wANNOVAR were filtered in Microsoft Excel to keep those with a CADD PHRED score greater than or equal to 30 and a stopgain or nonsynonymous function.

Individual Variant Analysis

After filtering the variants, each member of the group picked one variant for further analysis, as shown in Table 2 below:

Group Member Location Gene Reference Variant
Meixue Duan chr8:19962213 LPL C G
Bowen Yang chr1:46815187 CYP4B1 G A
Ruoyu Tian chr12:111803962 ALDH2 G A
Michael Finlayson chr11:108244022 ATM G A
Brandon Smith chr1:223111858 TLR5 C T

Table 2: Individual Picked Variants

Variant 1: LPL

dbSNP: rs328

Clinvar ID: RCV000001598.1

This variant is located in the lipoprotein lipase (LPL) gene, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism [provided by RefSeq, Jul 2008]. Variant rs328 is a heterozygous C-to-G transversion at position 1421 that changes the codon TCA to TGA, replacing serine (S) with a stop codon (Ter, opal) (see Figure 3). The SWISS-MODEL structure of LPL is shown in Figure 2; the amino acid circled in green is 8 residues away from the site of rs328.
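The effect of a codon change like this one can be checked against the standard genetic code. The following is a small illustrative sketch; the function name and the category labels are our own choices, not part of any annotation tool. It classifies the LPL change (TCA to TGA) and, for comparison, the CYP4B1 change discussed under Variant 2 (ATG to ATA).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_change(ref_codon, alt_codon):
    """Translate both codons and label the kind of coding change."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "stopgain"
    elif ref_aa == "*":
        kind = "stoploss"
    else:
        kind = "nonsynonymous"
    return ref_aa, alt_aa, kind


print(classify_codon_change("TCA", "TGA"))  # LPL rs328: ('S', '*', 'stopgain')
print(classify_codon_change("ATG", "ATA"))  # CYP4B1 rs2297810: ('M', 'I', 'nonsynonymous')
```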

The gene visualization of LPL with SNP rs328 is shown in Figure 4. Ariza et al. [4] studied the influence of several variants in genes related to triglyceride (TG) metabolism, genotyping 1825 Spanish subjects (80% men, mean age 36 years) for the LPL-HindIII (rs320), S447X (rs328), D9N (rs1801177), and N291S (rs268) polymorphisms. They reported that rs328 had a significant TG-lowering effect. The nonsense polymorphism S447X has been associated with a gain of LPL activity because of the premature truncation of the enzyme [2]. Consequently, it has also been related to lower fasting TG levels [1] and, very recently, to a favourable influence on the longitudinal changes of these levels [3].

Figure 2. Swiss Model of LPL Structure. The amino acid in green circle is the nearest amino acid to variant Ser447.

Tang et al. [3] studied the relationship of SNPs in the lipoprotein lipase gene with plasma levels of high-density lipoprotein cholesterol (HDL-C) and triglycerides, using data from 2045 African Americans and 2116 European Americans in the Coronary Artery Risk Development in Young Adults study. Consistent with the overall pattern of associations across individual examinations, rs326, rs328, and rs13702 were significantly associated with triglycerides in both groups and with HDL-C in European Americans. The G, G, and C alleles of rs326, rs328, and rs13702, respectively, were associated with lower levels of triglycerides, consistent with previous reports, and with higher levels of HDL-C. The association between rs328 and HDL-C in African Americans was in the same direction as that in European Americans, but the magnitude of the association was weaker and did not reach statistical significance (P = 0.21).

Figure 3. Visualization of rs328

Figure 4. GeneView via analysis of contig annotation: LPL (lipoprotein lipase, with SNP)

Diseases associated with LPL include lipoprotein lipase deficiency and familial combined hyperlipidemia. rs328 in LPL is mostly related to levels of triglycerides and high-density lipoprotein cholesterol, and the gene’s related pathways include the PPAR signaling pathway. Gene Ontology annotations related to this gene include receptor binding and carboxylic ester hydrolase activity.

Reference

1. Corella D, Guillen M, Saiz C, Portoles O, Sabater A, Folch J, Ordovas JM: Associations of LPL and APOC3 gene polymorphisms on plasma lipids in a Mediterranean population: interaction with tobacco smoking and the APOE locus. J Lipid Res. 2002, 43 (3): 416-27.

2. Rip J, Nierman MC, Ross CJ, Jukema JW, Hayden MR, Kastelein JJ, Stroes ES, Kuivenhoven JA: Lipoprotein lipase S447X: a naturally occurring gain-of-function mutation. Arterioscler Thromb Vasc Biol. 2006, 26 (6): 1236-45. 10.1161/01.ATV.0000219283.10832.43.

3. Tang W, Apostol G, Schreiner PJ, Jacobs DR, Boerwinkle E, Fornage M: Associations of Lipoprotein Lipase Gene Polymorphisms with Longitudinal Plasma Lipid Trends in Young Adults: the CARDIA Study. Circ Cardiovasc Genet. 2010.

4. María-José Ariza, Miguel-Ángel Sánchez-Chaparro, Francisco-Javier Barón, Ana-María Hornos, Eva Calvo-Bonacho, José Rioja, Pedro Valdivielso, José-Antonio Gelpi and Pedro González-Santos: Additive effects of LPL, APOA5 and APOE variant combinations on triglyceride levels and hypertriglyceridemia: results of the ICARIA genetic sub-study. BMC Medical Genetics. 2010, 11:66.

Variant 2: CYP4B1

Variant rs2297810 is found in the CYP4B1 gene at coordinates chr1:46815187. This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. In rodents, the homologous protein has been shown to metabolize certain carcinogens; however, the specific function of the human protein has not been determined (Yokotani N, et al. 1990). Multiple transcript variants have been found for this gene.

The variant is documented in dbSNP. The codon changes from ATG to ATA, and the residue changes from methionine (M) to isoleucine (I). Although no specific disease has been shown to be related to the CYP4B1 gene, it is an important gene in metabolism, and the Genetic Testing Registry (GTR) lists tests for hereditary disease that include it. These tests prepare DNA from a blood sample for deletion/duplication analysis and also perform next-generation sequencing of the entire coding region. In liver microsomes, this enzyme is involved in an NADPH-dependent electron transport pathway. It oxidizes a variety of structurally unrelated compounds, including steroids, fatty acids, and xenobiotics. This gene is also conserved in many species (see Figure 6 below).

Figure 5. rs2297810 figure from variation viewer

Figure 6. HomoloGene:128045 conserved in Euteleostomi

Thum and Borlak (2000) investigated the gene expression of major human cytochrome P450 genes in various regions of explanted hearts from 6 patients with dilated cardiomyopathy, 1 patient with transposition of the arterial trunk, and 2 samples of normal heart. mRNA for cytochrome 4B1 was predominantly expressed in the right ventricle. A strong correlation between tissue-specific gene expression and enzyme activity was found. Thum and Borlak (2000) concluded that expression of genes for cytochrome P450 monooxygenases and verapamil metabolism are found predominantly in the right side of the heart, and suggested that this observation may explain the lack of efficacy of certain cardioselective drugs.

Imaoka et al. (2000) developed a microassay for CYP4B1 mRNA based on RT-PCR. Using this method, they assayed CYP4B1 mRNA levels in transurethral resection samples from the bladders of patients with bladder tumors and compared the levels with those from nonbladder tumor patients and from nontumor sections from the patients with bladder tumors. The bladder tumor patients had significantly higher expression of CYP4B1 in both tumor and normal tissue from their bladders than the nonbladder tumor patients.

Reference

1. Yokotani N, Sogawa K, Matsubara S, et al. (1990). “cDNA cloning of cytochrome P-450 related to P-450p-2 from the cDNA library of human placenta. Gene structure and expression.” Eur. J. Biochem. 187 (1): 23–9. doi:10.1111/j.1432-1033.1990.tb15273.x. PMID 2298205.

2. Nhamburo, P. T., Gonzalez, F. J., McBride, O. W., Gelboin, H. V., Kimura, S. Identification of a new P450 expressed in human lung: complete cDNA sequence, cDNA-directed expression, and chromosome mapping. Biochemistry 28: 8060-8066, 1989.

3. Thum, T., Borlak, J. Gene expression in distinct regions of the heart. Lancet 355: 979-983, 2000.

4. Imaoka, S., Yoneda, Y., Sugimoto, T., Hiroi, T., Yamamoto, K., Nakatani, T., Funae, Y. CYP4B1 is a possible risk factor for bladder cancer in humans. Biochem. Biophys. Res. Commun. 277: 776-780, 2000.

Variant 3: rs671 (ALDH2 E504K)

The rs671 SNP is located on chromosome 12 at position 111803962, in the aldehyde dehydrogenase 2 family gene (ALDH2). There are two alleles: G is the reference allele and A is the alternative allele. Based on the VCF file, the individual we chose is heterozygous (A/G). The point mutation from G to A changes the glutamate at position 504 to lysine.

Aldehyde dehydrogenase is the second enzyme of the major oxidative pathway of alcohol metabolism and has two isoforms, cytosolic and mitochondrial. Most Caucasians have both isoforms, while about 50% of East Asians lack the active mitochondrial isoform. rs671 lies in ALDH2, which encodes the mitochondrial isoform. According to the 1000 Genomes database, the A allele frequency is 0.036 across all populations, 0.0015 in the African population, 0.0029 in the American population, and 0.17 in the East Asian population (Figure 7). The alternative allele is therefore far more common in East Asians than in other populations (Figure 8).
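To relate these allele frequencies to genotypes such as the heterozygous one seen in this individual, one can compute the genotype frequencies expected under Hardy-Weinberg equilibrium. This is a rough illustration only: Hardy-Weinberg equilibrium is an assumption we introduce here, not something tested in this report, and the genotype labels are our own.

```python
def hardy_weinberg(alt_freq):
    """Expected genotype frequencies for a biallelic site with alt allele frequency q."""
    q = alt_freq
    p = 1.0 - q
    return {"ref/ref": p * p, "ref/alt": 2 * p * q, "alt/alt": q * q}


# A allele frequencies quoted above from the 1000 Genomes data.
for population, freq in [("ALL", 0.036), ("AFR", 0.0015), ("AMR", 0.0029), ("EAS", 0.17)]:
    expected = hardy_weinberg(freq)
    print(population, {genotype: round(value, 4) for genotype, value in expected.items()})
```

Under this assumption, roughly 28% of East Asian individuals would be expected to be G/A heterozygotes (2 × 0.83 × 0.17 ≈ 0.28).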

rs671 has been well documented as the “alcohol flush” or “Asian flush” SNP. The A allele of rs671 causes a glutamate-to-lysine substitution, and the protein product of this variant is a defective form of aldehyde dehydrogenase that cannot properly metabolize acetaldehyde, the product of alcohol oxidation (Figure 9). The crystal structure of the defective ALDH2 E504K protein has not been reported yet. I used SWISS-MODEL to build 3D structures of ALDH2 and ALDH2 E504K, shown in Figure 10. There is a small structural difference in the regions circled in Figure 10A and 10C. The inactivation of ALDH2 may be due to disruption of its catalytic domain. The inactive ALDH2 can be activated by the small molecule Alda-1, even though it does not target Lys504 (Figure 11) (Chen, Ferreira et al. 2014).

Apart from its alcohol-related effects, the A allele of rs671 is also a genetic risk allele for esophageal cancer. The CADD PHRED score is 35 and the PolyPhen-2 HDIV score is 1, with a prediction of “D” (damaging). The CADD score (Kircher, Witten et al. 2014) measures the deleteriousness of a variant, and PolyPhen-2 (Adzhubei, Schmidt et al. 2010) predicts the damage caused by missense mutations. The two scores agree, which indicates that the mutation is deleterious. rs671 has also been studied in genome-wide association studies (GWAS) of esophageal cancer. The p-value is 3E-24, which indicates a significant association between the variant and esophageal cancer, and the odds ratio is 1.67: people with the A allele have 1.67-fold higher odds of esophageal cancer than people with the reference G allele. Alcohol drinkers with the A allele show a higher risk of esophageal cancer than nondrinkers or drinkers with the G/G genotype. The risk is also alcohol dose-dependent: in individuals carrying the A allele, the risk of esophageal cancer rises with alcohol consumption (Ding, Li et al. 2010). A significant gene-environment interaction between the rs671 polymorphism and alcohol consumption has also been verified (Matsuo, Hamajima et al. 2001). Other studies have revealed associations between rs671 and myocardial infarction (Han, Wang et al. 2013) and gastric cancer (Hidaka, Sasazuki et al. 2015). All in all, this variant is likely to be pathogenic.

Figure 7: 1000 Genomes Project phase 3 allele frequencies of rs671 by population. rs671 has two alleles, A and G. ALL: all populations; AFR: African; AMR: American; EAS: East Asian; EUR: European; SAS: South Asian.

Figure 8: Geographical genotype frequency distribution of rs671 (from Wikipedia).

Figure 9: Alcohol metabolism pathway in normal and ALDH2-deficient liver (Chen, Ferreira et al. 2014). After alcohol consumption, alcohol enters the liver, where alcohol dehydrogenase (ADH) converts it to acetaldehyde in a reversible reaction. Acetaldehyde is then converted by ALDH2 to acetate, which is the rate-limiting step. In ALDH2*2 (ALDH2-deficient) liver, acetaldehyde accumulates.

Figure 10: The 3D protein structure of ALDH2 and ALDH2 E504K. A, the normal structure of ALDH2; B, the zoomed in structure at position 504 with glutamate (grey); C, the structure of the mutated protein, ALDH2 E504K; D, the zoomed in structure at position 504 with lysine (grey). The black circles point out the structural difference between normal and mutated ALDH2. 3D images are generated by SWISS-MODEL (Biasini, Bienert et al. 2014).

Figure 11: Crystal structure of Alda-1 with mutated ALDH2 (Chen, Ferreira et al. 2014). Alda-1 activates the mutated ALDH2 enzyme activity by binding to the catalytic domain.

Reference

1.Adzhubei, I. A., S. Schmidt, L. Peshkin, V. E. Ramensky, A. Gerasimova, P. Bork, A. S. Kondrashov and S. R. Sunyaev (2010). “A method and server for predicting damaging missense mutations.” Nat Methods 7(4): 248-249.

2.Biasini, M., S. Bienert, A. Waterhouse, K. Arnold, G. Studer, T. Schmidt, F. Kiefer, T. Gallo Cassarino, M. Bertoni, L. Bordoli and T. Schwede (2014). “SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information.” Nucleic Acids Res 42(Web Server issue): W252-258.

3.Chen, C. H., J. C. Ferreira, E. R. Gross and D. Mochly-Rosen (2014). “Targeting aldehyde dehydrogenase 2: new therapeutic opportunities.” Physiol Rev 94(1): 1-34.

4.Ding, J. H., S. P. Li, H. X. Cao, J. Z. Wu, C. M. Gao, Y. T. Liu, J. N. Zhou, J. Chang and G. H. Yao (2010). “Alcohol dehydrogenase-2 and aldehyde dehydrogenase-2 genotypes, alcohol drinking and the risk for esophageal cancer in a Chinese population.” J Hum Genet 55(2): 97-102.

5.Han, H., H. Wang, Z. Yin, H. Jiang, M. Fang and J. Han (2013). “Association of genetic polymorphisms in ADH and ALDH2 with risk of coronary artery disease and myocardial infarction: a meta-analysis.” Gene 526(2): 134-141.

6.Hidaka, A., S. Sasazuki, K. Matsuo, H. Ito, N. Sawada, T. Shimazu, T. Yamaji, M. Iwasaki, M. Inoue, S. Tsugane and J. S. Group (2015). “Genetic polymorphisms of ADH1B, ADH1C and ALDH2, alcohol consumption, and the risk of gastric cancer: the Japan Public Health Center-based prospective study.” Carcinogenesis 36(2): 223-231.

7.Kircher, M., D. M. Witten, P. Jain, B. J. O’Roak, G. M. Cooper and J. Shendure (2014). “A general framework for estimating the relative pathogenicity of human genetic variants.” Nat Genet 46(3): 310-315.

8.Matsuo K, Hamajima N, Shinoda M, et al. Gene–environment interaction between an aldehyde dehydrogenase-2 (ALDH2) polymorphism and alcohol consumption for the risk of esophageal cancer[J]. Carcinogenesis, 2001, 22(6): 913-916.

Variant 4: ATM

dbSNP: rs79075295

This change is a nonsynonymous SNV, G to A at position 566 in exon 6 of the ATM serine/threonine kinase gene. It results in an amino acid substitution from arginine to lysine at residue 189 of the gene’s known protein product. Annotation with ANNOVAR produced a CADD PHRED score of 33 for this variant, suggesting a high chance that it is deleterious and possibly pathogenic. The read depth for this variant in the source data is 22, and the PHRED-scaled quality score of the variant call is 52.
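A PHRED-scaled quality Q corresponds to an error probability of 10^(-Q/10). The sketch below converts the reported score of 52, assuming it is the QUAL field of the VCF record (the estimated probability that the variant call is wrong); the function names are our own.

```python
import math


def phred_to_error_probability(quality):
    """Convert a PHRED-scaled quality score to an error probability."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(probability):
    """Convert an error probability back to a PHRED-scaled quality score."""
    return -10 * math.log10(probability)


print(f"{phred_to_error_probability(52):.2e}")  # 6.31e-06
```

A score of 52 therefore corresponds to an estimated error probability of about 6 in a million, so the call itself is well supported even at a read depth of 22.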

Prevalence

In the 1000 Genomes data the variant allele is the minor allele in every group, although it is relatively common in the Admixed American, East Asian, and European groups (see Table 4 below).

1000G_ALL 1000G_AFR 1000G_AMR 1000G_EAS 1000G_EUR 1000G_SAS
0.16 0.079 0.21 0.3 0.25 0.01

Table 4: ATM Variant among 1000 genome data

ExAC reports this variant to be less common, with a frequency of roughly 11% or below in all groups (see Table 5 below).

ExAC_Freq ExAC_AFR ExAC_AMR ExAC_EAS ExAC_FIN ExAC_NFE ExAC_OTH ExAC_SAS
0.0479 0.0865 0.0106 0.022 0.1078 0.0581 0.0879 0.0015

Table 5: ATM Variant prevalence among all groups

Conservation

The GERP++ score reported by ANNOVAR for this variant was 5.59, indicating that residue 189 is highly conserved and supporting the notion that the variant is pathogenic. This was further supported by multiple sequence alignment using the UCSC Genome Browser. These results showed that this residue is conserved across vertebrates with no detected substitutions, and the region around the residue is also conserved.

Figure 12. Multiple sequence alignment of reference ATM allele showing conservation of residue 189

Splice Variants

ATM has 25 splice variants documented in Ensembl release 87, although only one of them has reliable evidence. Of these 25 splice variants, this SNP is present in 4, including the most reliably supported isoform (Yates et al 2015).

Figure 13. Splice variants of human ATM

Structure

There are no published structures of this protein in either its common form or this variant, so comparative modeling of both forms was performed using Geno3D (Combet et al 2002). The resulting structures are apparently identical. While this suggests that the conversion of arginine to lysine at amino acid 189 has no effect on protein structure, protein structure prediction is unreliable, so a structural effect cannot be ruled out.

Figure 14. ATM protein structures modeled by Geno3D, reference sequence left and variant sequence right

Function and Pathogenesis

The ATM protein product is a 370-kDa-protein member of the phosphatidylinositol 3-kinase family of Ser/Thr protein kinases (Savitsky et al 1995). It is a major factor in the cellular response to DNA double strand breaks. ATM activity is thought to involve the phosphorylation of p53, Chk2, and Nbs1 which leads to the arrest of the cell cycle while DNA repair takes place. ATM is also thought to phosphorylate the histone H2AX at serine 139 which leads to the recruitment of repair factors to the site of breakage and modifications of chromatin state that facilitate DNA repair. Loss of function in ATM leads to impaired response to DNA damaging stimuli such as ionizing radiation (Burma et al 2001).

This SNP is found in the SMART domain of ATM, which is involved in telomere-length maintenance and DNA damage repair (Yates et al 2015). Disruption of this domain could lead to loss of function or impaired function in this ATM variant. Modeling with SNPs3D gave an SVM profile score of -0.22, indicating a likelihood of functional impairment in this variant.

The condition associated with loss of ATM function is Ataxia-Telangiectasia (AT). It is associated with a loss of motor control (ataxia), visible threadlike veins (telangiectasia), immunodeficiency, and a predisposition toward cancer (Jaspers et al 1988). AT-affected individuals are sensitive to ionizing radiation and respiratory infection, and typically experience progressive neurologic degeneration. In addition to the effects of impaired DNA damage repair, AT is associated with other symptom-producing effects unrelated to DNA repair, such as increased intracellular recombination and problems with axonal transport and central nervous system vesicle trafficking (Brown et al 1999, OMIM 2016).

Reference

1.Yates A., Akanni W., Amode M. R., Barrell D., Billis K., Carvalho-Silva D., Cummins C., Clapham P., Fitzgerald S., Gil L., Girón C. G., Gordon L., Hourlier T., Hunt S. E., Janacek S. H., Johnson N., Juettemann T., Keenan S., Lavidas I., Martin F. J., Maurel T., McLaren W., Murphy D. N., Nag R., Nuhn M., Parker A., Patricio M., Pignatelli M., Rahtz M., Riat H. S., Sheppard D., Taylor K., Thormann A., Vullo A., Wilder S. P., Zadissa A., Birney E., Harrow J., Muffato M., Perry E., Ruffier M., Spudich G., Trevanion S. J., Cunningham F., Aken B. L., Zerbino D. R., Flicek P (2015). Ensembl 2016. Nucleic Acids Research. 44(D1): D710-6. doi:10.1093/nar/gkv1157

2.Combet C., Jambon M., Deléage G., Geourjon C. Geno3D: automatic comparative molecular modelling of protein (2002). Bioinformatics 18(1): 213-214. doi: 10.1093/bioinformatics/18.1.213

3.Carranza D., Vega A. K., Torres-Rusillo S., Montero E., Martinez L. J., Santamaría M., Santos J. L., Molina I. J. (2016). Molecular and Functional Characterization of a Cohort of Spanish Patients with Ataxia-Telangiectasia. Neuromolecular Medicine. doi:10.1007/s12017-016-8440-8

4.Burma S., Chen B. P., Murphy M., Kurimasa A., Chen D. J. (2001). ATM Phosphorylates Histone H2AX in Response to DNA Double-strand Breaks. The Journal of Biological Chemistry 276(45): 42462-42467. doi:10.1074/jbc.C100466200

5.Savitsky K., Bar-Shira A., Gilad S., Rotman G., Ziv Y., Vanagaite L, Tagle D. A., Smith S., Uziel T., Sfez S., et al (1995). A single ataxia telangiectasia gene with a product similar to PI-3 kinase. Science 268(5218): 1749-1753. doi:10.1126/science.7792600

6.Jaspers N. G. J., Gatti R. A., Baan C., Linssen P. C. M. L., Bootsma D (1988). Genetic complementation analysis of ataxia telangiectasia and Nijmegen breakage syndrome: a survey of 50 patients. Cytogenet. Cell Genet. 49: 259-263.

7.Brown, K. D., Barlow, C., Wynshaw-Boris, A (1999). Multiple ATM-dependent pathways: an explanation for pleiotropy. (Editorial) Am. J. Hum. Genet. 64: 46-50.

8.Online Mendelian Inheritance in Man, OMIM®. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD), 2016. World Wide Web URL: http://omim.org/

Variant 5: TLR5 (R392X)

dbSNP: rs5744168

Introduction

TLR5 (Toll-like receptor 5) is part of a family of receptors that are critical to the function of the innate immune system. These Toll-like receptors serve to differentiate between what is considered self and non-self by binding to molecular patterns associated with microbial infection (Beutler, B.A. 2009). Stimulation of Toll-like receptors initiates a signaling cascade that leads to the production of cytokines, chemokines and growth factors which help to initiate an inflammatory response and to direct how the immune system should respond to the invader. Each member of the TLR family recognizes a distinct pathogen-associated molecular pattern. So far the only known group of agonists that stimulate TLR5 are flagellins, which are essential protein components of flagella (Parker et al 2007).

Figure 15. Dimerization of TLR5 complexed to flagellin (Yoon et al 2012).

TLR5, like other TLRs, signals when binding to flagellin brings the C-termini into juxtaposition so that the intracellular regions can initiate signaling cascades that upregulate the proinflammatory transcription factor NF-κB (Yoon et al 2012). In this way, epithelial cells can provide information to the immune system on the relative presence of flagella-containing bacteria.

Variant

The variant rs5744168 is a nonsense mutation from C to T at position 1174 of exon 6. By changing the codon CGA to TGA, it introduces a premature stop codon in place of the arginine at position 392 of the TLR5 protein. The raw CADD score is 7.367 and the scaled CADD PHRED score is 39, which suggests a high likelihood that this variant is deleterious. This score makes sense, since this 858-amino-acid protein would effectively be missing over half of its amino acids. Thus, we would expect the final protein product to be nonfunctional. From Figure 16, we see that this variant of TLR5 has lost the cytoplasmic part of the protein, which would render it incapable of triggering a signaling cascade in response to ligand binding. It would also lose its transmembrane region, meaning it should have no way to anchor itself in the cell membrane.
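The codon change and the size of the truncation described above can be checked with a short sketch. The codon table is the standard genetic code; the helper and variable names are illustrative and were not part of the original analysis.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon):
    """Return the one-letter amino acid for a DNA codon ('*' means stop)."""
    return CODON_TABLE[codon.upper()]


ref_aa = translate_codon("CGA")  # reference codon from the text
alt_aa = translate_codon("TGA")  # variant codon from the text

# Values taken from the text: 858-residue protein, stop codon at position 392.
PROTEIN_LENGTH = 858
STOP_POSITION = 392
retained_residues = STOP_POSITION - 1
lost_fraction = 1 - retained_residues / PROTEIN_LENGTH

print(f"{ref_aa}{STOP_POSITION}{alt_aa}: {retained_residues} residues retained, "
      f"{lost_fraction:.1%} of the protein lost")
```

The same helper can be reused for any of the single-codon changes discussed in this report.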

Figure 16. Protein feature view of TLR5 from RCSB Protein Data Bank.

This variant seems to be most prevalent in the South Asian population with an allele frequency of around 10% which is significantly higher than the total allele frequency of around 5.5% (Table 6).

Table 6. Population frequencies of TLR5 variant rs5744168 from the ExAC Browser (http://exac.broadinstitute.org/variant/1-223285200-G-A).

Pathogenicity

Since this SNP causes the TLR5 protein to become nonfunctional, we would expect it to be correlated with conditions related to infection with motile microbes that contain flagella. Indeed, this SNP has been associated with increased risk of pneumonia, specifically Legionnaires’ Disease. In lung epithelial cells, expression of the cytokine IL-8 is predominantly upregulated through TLR5 stimulation in the case of L. pneumophila infection. In fact, this allele was shown to have a dominant effect since heterozygotes also showed higher susceptibility to infection due to expression of IL-8 being too low to generate an appropriate immune response (Hawn et al 2003).

A case study of patients from North India showed a positive correlation between this allele and ulcerative colitis. The allele frequency differed significantly between the control group (1.7%) and patients (4.4%). This correlation may be due to disruption of cytokine homeostasis in cells that lack the ability to properly express cytokines such as TNFα, IL-6, and IFNγ (Meena et al 2015). In fact, one study showed that mRNA levels of TNFα were lower in flagellin-stimulated homozygous mutant R392X cells than in wild-type cells (Klimosch et al 2013).

Another study found a correlation between this variant and recurrent urinary tract infections (rUTIs) in women. Whereas control women had heterozygote and homozygote genotype frequencies of 0.068 and 0.008, respectively, women with rUTIs had frequencies of 0.122 and 0.003. These differences were found to be statistically significant when comparing genotype frequencies with a dominant model (Hawn et al 2009).
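To make the dominant-model comparison concrete, the sketch below pools heterozygotes and variant homozygotes into a single carrier group using the genotype frequencies quoted above. The crude odds ratio is only an illustration computed from those frequencies. It is not the statistic reported by Hawn et al, and testing significance would require the underlying genotype counts, which are not given here.

```python
# Genotype frequencies quoted in the text (Hawn et al 2009).
controls = {"het": 0.068, "hom": 0.008}
cases = {"het": 0.122, "hom": 0.003}


def carrier_frequency(genotypes):
    """Dominant model: anyone with at least one variant allele is a carrier."""
    return genotypes["het"] + genotypes["hom"]


def odds(probability):
    """Convert a probability into odds."""
    return probability / (1 - probability)


control_carriers = carrier_frequency(controls)
case_carriers = carrier_frequency(cases)

# Crude (unadjusted) odds ratio of being a carrier, cases versus controls.
crude_odds_ratio = odds(case_carriers) / odds(control_carriers)

print(f"carriers among controls: {control_carriers:.3f}")
print(f"carriers among cases:    {case_carriers:.3f}")
print(f"crude odds ratio:        {crude_odds_ratio:.2f}")
```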

Reference

1.Beutler, B. A. (2009). TLRs and innate immunity. Blood, 113(7), 1399–1407. http://doi.org/10.1182/blood-2008-07-019307

2.Parker, L. C., Prince, L. R., & Sabroe, I. (2007). Translational Mini-Review Series on Toll-like Receptors: Networks regulated by Toll-like receptors mediate innate and adaptive immunity. Clinical and Experimental Immunology, 147(2), 199–207. http://doi.org/10.1111/j.1365-2249.2006.03203.x

3.Hawn, T. R., Verbon, A., Lettinga, K. D., Zhao, L. P., Li, S. S., Laws, R. J., … Aderem, A. (2003). A Common Dominant TLR5 Stop Codon Polymorphism Abolishes Flagellin Signaling and Is Associated with Susceptibility to Legionnaires’ Disease. The Journal of Experimental Medicine, 198(10), 1563–1572. http://doi.org/10.1084/jem.20031220

4.Meena, N. K., Ahuja, V., Meena, K., & Paul, J. (2015). Association of TLR5 Gene Polymorphisms in Ulcerative Colitis Patients of North India and Their Role in Cytokine Homeostasis. PLoS ONE, 10(3), e0120697. http://doi.org/10.1371/journal.pone.0120697

5.Klimosch SN, Forsti A, Eckert J, Knezevic J, Bevier M, von Schonfels W, et al. Functional TLR5 Genetic Variants Affect Human Colorectal Cancer Survival. Cancer Res. 2013; 73:7232–7242. doi: 10.1158/0008-5472.CAN-13-1746

6.Hawn, T. R., Scholes, D., Li, S. S., Wang, H., Yang, Y., Roberts, P. L., … Hooton, T. M. (2009). Toll-Like Receptor Polymorphisms and Susceptibility to Urinary Tract Infections in Adult Women. PLoS ONE, 4(6), e5990. http://doi.org/10.1371/journal.pone.0005990

7.Yoon, S., Kurnasov, O., Natarajan, V., Hong, M., Gudkov, A. V., Osterman, A. L., & Wilson, I. A. (2012). Structural basis of TLR5-flagellin recognition and signaling. Science (New York, N.Y.), 335(6070), 859–864. http://doi.org/10.1126/science.1215584

Exome Variant Analysis project, G4

Zhao Wang, Tian Jin, Cong Gao

Introduction:

For our analysis, we selected the sample individual HG00419 from the 1000 Genomes Project, which sequenced 2709 individuals across 26 different populations.

The details of our sample HG00419 are shown below:

Sex: Female

Biosample ID: SAME122991

Population: Southern Han Chinese

Code: CHS

Super population: East Asian (EAS)

Pipeline:

  • Processed the BAM file through alignment and variant calling to produce a VCF file.
  • Ran the VCF file through wANNOVAR, which annotated the functional consequences of each variant.
  • Filtered the data by keeping only variants that met all of the following:
    1. Located in an exonic region
    2. Nonsynonymous mutation
    3. Pathogenic SNP as indicated by ClinVar
    4. Allele frequency in the 1000 Genomes data < 0.05
    5. Allele frequency in the ExAC database < 0.05
    6. CADD PHRED score > 10
  • Each member then chose one variant from the list to analyze.
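The filtering step above can be sketched as follows. This is a minimal illustration, not the script we ran. The column names (Func.refgene, ExonicFunc.refgene, CLINSIG, 1000G_ALL, ExAC_ALL, CADD_phred) are assumptions modeled on typical annotation output and may differ from an actual wANNOVAR export. The second example row is invented. Treating a missing frequency ("." in the table) as rare is also an assumption; it is made here so that variants with no recorded frequency are kept.

```python
import re


def to_float(value, default):
    """Parse a numeric field; '.' or empty values fall back to the default."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


def is_clinvar_pathogenic(clinsig):
    """True if any ClinVar significance term is exactly 'pathogenic'."""
    terms = re.split(r"[|,;/]", clinsig or "")
    return any(term.strip().lower() == "pathogenic" for term in terms)


def passes_filters(row, max_af=0.05, min_cadd=10.0):
    """Apply the six criteria from the pipeline list to one annotated variant."""
    return (
        row.get("Func.refgene") == "exonic"
        and row.get("ExonicFunc.refgene", "").startswith("nonsynonymous")
        and is_clinvar_pathogenic(row.get("CLINSIG"))
        and to_float(row.get("1000G_ALL"), 0.0) < max_af  # missing counts as rare
        and to_float(row.get("ExAC_ALL"), 0.0) < max_af
        and to_float(row.get("CADD_phred"), 0.0) > min_cadd
    )


# Two illustrative rows: the first mirrors the CD36 entry in our list,
# the second is a made-up synonymous variant that should be dropped.
rows = [
    {"Gene": "CD36", "Func.refgene": "exonic",
     "ExonicFunc.refgene": "nonsynonymous SNV", "CLINSIG": "Pathogenic",
     "1000G_ALL": ".", "ExAC_ALL": ".", "CADD_phred": "29.4"},
    {"Gene": "GENE_X", "Func.refgene": "exonic",
     "ExonicFunc.refgene": "synonymous SNV", "CLINSIG": "Benign",
     "1000G_ALL": "0.30", "ExAC_ALL": "0.28", "CADD_phred": "3.1"},
]

kept = [row for row in rows if passes_filters(row)]
print([row["Gene"] for row in kept])
```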

Result:

Here is our list:

Gene Ref Alt dbSNP SIFT_score Otherinfo CADD_phred
NCF2 G A rs13306575 0.07 het 21.1
UGT1A1 G A rs4148323 0.07 hom 12.79
CLRN1 A G rs121908142 0 het 16.77
CD36 T C rs142186404 . het 29.4
PTS C T rs104894276 0.62 het 18.6
APOE G A rs190853081 0.21 het 16.69

 

Variant Analysis

Variant 1: CD36


The SNP that I’m analyzing occurs on chromosome 7 at position 80669964 (cytogenetic location 7q21.11). It is a nonsynonymous missense mutation that replaces a thymine (T) with a cytosine (C). This change replaces the amino acid phenylalanine (Phe) with a leucine (Leu) at position 254. This is a very rare mutation, since no allele frequency is recorded for it in either the 1000 Genomes data or the ExAC database. It has a CADD_phred score of 29.4. Since this is greater than 20, the variant ranks within the top 1% of most deleterious substitutions.
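The relationship between a PHRED-scaled CADD score and the "top X%" interpretation used above follows the usual PHRED convention (a score of 10 means top 10%, 20 means top 1%, 30 means top 0.1%). The function names in this small sketch are illustrative.

```python
import math


def cadd_phred_to_top_fraction(phred):
    """Fraction of all possible substitutions ranked at least this deleterious."""
    return 10 ** (-phred / 10)


def top_fraction_to_cadd_phred(fraction):
    """Inverse conversion: rank fraction back to a PHRED-scaled score."""
    return -10 * math.log10(fraction)


for score in (10, 20, 29.4):
    fraction = cadd_phred_to_top_fraction(score)
    print(f"CADD_phred {score}: top {fraction:.3%} of substitutions")
```

By this conversion, a score of 29.4 places the CD36 variant close to the top 0.1% of substitutions.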

By putting the Fasta file into Swiss-model, I was able to generate 3D structure models as shown below.


Figure 1 Reference with Phe


Figure 2 Mutant with Leu

Although there is no SIFT score to predict how damaging the substitution is to the protein structure, the model shows that the substituted position is buried within the protein, which means that it might affect the function of the protein much more than if it were located on the periphery. Also, phenylalanine (Phe) has a phenyl ring that adds to the stability of the structure. By replacing it with a leucine (Leu), the protein may have lost some of its stability and function due to the structural differences.

From the ClinVar database, I learned that this mutation is pathogenic and causes platelet glycoprotein IV deficiency. CD36 is glycoprotein IV, which is expressed in platelets, monocytes, erythroblasts, capillary endothelial cells, and mammary epithelial cells, and this mutation makes the protein non-functional. CD36 deficiency is also more commonly observed in East Asian populations, which fits our sample subject’s description.

In 2002, Hanawa and colleagues studied 11 subjects with CD36 deficiency, and one of those subjects had a homozygous mutation of Phe to Leu at the same position as our subject. However, our subject has only a heterozygous mutation, so she might not show symptoms of CD36 deficiency.

Reference:

Amino Acids. (n.d.). Retrieved December 04, 2016, from http://www.biology.arizona.edu/biochemistry/problem_sets/aa/aa.html

Hanawa, H. (2002). Identification of cryptic splice site, exon skipping, and novel point mutations in type I CD36 deficiency. Journal of Medical Genetics, 39(4), 286-291. doi:10.1136/jmg.39.4.286

NM_001001547.2(CD36):c.760T>C (p.Phe254Leu) Simple – Variation Report – ClinVar – NCBI. (n.d.). Retrieved December 04, 2016, from https://www.ncbi.nlm.nih.gov/clinvar/variation/13540/

Swissmodel. (n.d.). Retrieved December 04, 2016, from https://swissmodel.expasy.org/interactive/vnmQF4/

 

Variant 2: PTS

chr11    6-pyruvoyltetrahydropterin synthase    PTS

PTS:NM_000317:exon5:c.C259T:p.P87S      RCV000000509.1

6-pyruvoyl-tetrahydropterin_synthase_deficiency

Ethnicity: not provided
Origin: germline
Affected: not provided
Individuals: not provided
Families: not provided
Chromosomes tested: not provided
Number tested: not provided
Family history: not provided
Method: literature only

Only one paper, published in 1998, describes this variant. No further research or review of this variant has been conducted.

Hyperphenylalaninemia is a medical condition characterized by mildly or strongly elevated concentrations of the amino acid phenylalanine in the blood. Hyperphenylalaninemia (HPA) may be caused by deficiency of phenylalanine hydroxylase or tetrahydrobiopterin (BH4), the essential cofactor for the aromatic amino acid hydroxylases. 6-Pyruvoyl-tetrahydropterin synthase (PTPS) deficiency is a major cause of BH4-deficient HPA. Seven single-base mutations at nucleotides 73 (C>G), 155 (A>G), 166 (G>A), 209 (T>A), 259 (C>T), 286 (G>A), and 317 (C>T) of the PTPS cDNA were detected in Chinese PTPS-deficient HPA by polymerase chain reaction and solid-phase DNA sequencing.

In enzymology, a 6-pyruvoyltetrahydropterin synthase (PTPS) (EC 4.2.3.12) is an enzyme that catalyzes the following chemical reaction:

7,8-Dihydroneopterin triphosphate <=>  6-pyruvoyltetrahydropterin + triphosphate

Hence, this enzyme has one substrate, 7,8-Dihydroneopterin triphosphate, and two products, 6-pyruvoyltetrahydropterin and triphosphate.

6-Pyruvoyltetrahydropterin synthase deficiency is a rare autosomal recessive disorder that causes malignant hyperphenylalaninemia due to tetrahydrobiopterin deficiency. Commonly reported symptoms are initial truncal hypotonia, subsequent appendicular hypertonia, bradykinesia, cogwheel rigidity, generalized dystonia, and marked diurnal fluctuation. Other reported clinical features include difficulty in swallowing, oculogyric crises, somnolence, irritability, hyperthermia, and seizures. Chorea, athetosis, hypersalivation, rash with eczema, and sudden death have also been reported.


Figure. Structures from PDB and predicted by Swiss-model. Top left and top right: overall structure of the wild-type protein and the mutated protein. Bottom left and bottom right: structure of the possible active site of the protein.

Based on the results from PDB and Swiss-model, the overall structure of the protein changes only slightly, and the structure of the possible active site is unchanged. The mutation also lies on the back of the active site and has no direct interaction with the metal ion (Ni2+) in the protein. The mutation may therefore not cause severe protein dysfunction.

ref:

[mut] Liu, T. T., Hsiao, K. J., Sheng-Feng, L., Wu, S. J., Wu, K. F., Chiang, S. H., … & Wei-Min, Y. (1998). Mutation analysis of the 6-pyruvoyl-tetrahydropterin synthase gene in Chinese hyperphenylalaninemia caused by tetrahydrobiopterin synthesis deficiency. Human mutation, 11(1), 76.

[swiss] Biasini, M., Bienert, S., Waterhouse, A., Arnold, K., Studer, G., Schmidt, T., … & Schwede, T. (2014). SWISS-MODEL: modeling protein tertiary and quaternary structure using evolutionary information. Nucleic acids research, gku340.

 

Variant 3: CLRN1

chr3       clarin1(Also known as RP61; USH3; USH3A), Usher syndrome type-3 protein

CLRN1:NM_001195794:exon4:c.T488C:p.L163P    RCV000004646.1   Usher syndrome, type 3

Ethnicity: not provided
Origin: germline
Affected: not provided
Individuals: not provided
Families: not provided
Chromosomes tested: not provided
Number tested: not provided
Family history: not provided
Method: literature only


Ensembl:ENSG00000163646
 MIM:606397;Vega:OTTHUMG00000140368

Location:3q25.1

Mutation in different isoforms:

CLRN1: NM_052995:exon3:c.T221C:p.L74P,

CLRN1: NM_174878:exon3:c.T449C:p.L150P,

CLRN1: NM_001195794:exon4:c.T488C:p.L163P (the one in the present research, NP_001182723.1, clarin-1 isoform d)

SIFT score: 1

Polyphen2_HVAR score: 0

Prediction: pathogenic

Mutation type: heterozygous missense mutation

Usher syndrome (USH) is a genetically heterogeneous condition characterized by the association of retinitis pigmentosa with sensorineural deafness. Age at onset and differences in auditory and vestibular function distinguish Usher syndrome type 1 (USH1), type 2 (USH2), and type 3 (USH3). Usher syndrome type III is an autosomal recessive disorder characterized by progressive sensorineural hearing loss, vestibular dysfunction, and retinitis pigmentosa symptoms, including nyctalopia, constriction of the visual fields, and loss of central visual acuity, usually by the second decade of life.

Usher syndrome type IIIA (USH3A) is caused by homozygous or compound heterozygous mutation in the CLRN1 gene (Phenotype MIM number 606397) on chromosome 3q25. Seven monogenic mutations have been reported to be involved in USH3A in Europe and the United States, and two other missense mutations have been detected in patients with retinitis pigmentosa 61.

The CLRN1 (clarin 1) gene encodes a protein that contains a cytosolic N-terminus, multiple helical transmembrane domains, and an endoplasmic reticulum membrane retention signal, TKGH, in the C-terminus. The encoded protein may be important in the development and homeostasis of the inner ear and retina. Mutations within this gene have been associated with Usher syndrome type IIIa. Multiple transcript variants encoding distinct isoforms have been identified for this gene. An important paralog of this gene is CLRN2. The mature form of the protein is predicted to contain three transmembrane domains and 204 residues (see Figure 1).


Figure 1.

In our present research, the mutation is located in clarin-1 isoform d, the longest isoform of CLRN1 (Figure 2). The mutated position lies within the third transmembrane segment, which forms a helical structure in the protein, so the mutation is expected to affect the corresponding protein structure (Figure 3).


Figure 2.


Figure 3A.  CLRN1 reference


Figure 3B. CLRN1 mutant protein

This mutation is classified as pathogenic in the ClinVar database and has been reported in two publications (the same missense mutation in different isoforms). In the 3D view, the structure around the mutated position appears to change slightly. However, the mutation in our sample probably does not produce a disease phenotype, meaning the sampled person is unlikely to show symptoms associated with Usher syndrome type IIIA. Because USH3A is autosomal recessive and the mutation we found is a heterozygous missense mutation, a single copy is not expected to cause manifestations in a carrier. Even if this person did have relevant symptoms, they would likely not be severe, since normal protein is still produced, albeit in smaller amounts.

Reference

[mut] Fields RR, Zhou G, Huang D, Davis JR, Möller C, Jacobson SG, Kimberling WJ, Sumegi J. (2002). Usher syndrome type III: revised genomic structure of the USH3 gene and identification of novel mutations. Am J Hum Genet, 71(3):607-17.

[mut] Herrera W, Aleman TS, Cideciyan AV, Roman AJ, Banin E, Ben-Yosef T, Gardner LM, Sumaroka A, Windsor EA, Schwartz SB, Stone EM, Liu XZ, Kimberling WJ, Jacobson SG. (2008). Retinal disease in Usher syndrome III caused by mutations in the clarin-1 gene. Invest Ophthalmol Vis Sci, 49(6):2651-60.

[swiss] Biasini, M., Bienert, S., Waterhouse, A., Arnold, K., Studer, G., Schmidt, T., … & Schwede, T. (2014). SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic acids research, gku340.

Exome Variant Analysis of a Mende Individual from Sierra Leone (Data taken from the 1000 Genomes Project)

Individual chosen: HG03058

Individual profile:

Sex: Female
Population: Mende
Population code: MSL
SuperPopulation: African
SuperPopulation code: AFR

The analytical pipeline we used is described below:

We found a few pathogenic variants in this individual’s exome sequence; 6 of them are listed here.

Jessica Rowell

Gene: DUSP22
Gene name (full): Dual specificity protein phosphatase 22
Chromosome: 6
Start position: 291,630
Stop position: 351,355
Variation: R119 ⇒ H119 (CGT ⇒ CAT)
rsID (if available): rs7768224

The individual under study has a potentially deleterious SNP at position 348195 of the forward strand of chromosome 6 (rs7768224), located within the DUSP22 gene. This nonsynonymous missense mutation is a nucleotide change from guanine (ancestral allele) to adenine. It results in an amino acid change from arginine to histidine at position 119 (in an alpha helix) of the encoded protein, dual specificity protein phosphatase 22 (Q9NRW4). Both arginine and histidine are basic amino acids that can carry a positive charge; their similarities make the change less likely to disrupt the protein structure. However, histidine’s status as an electron donor changes within a small, physiologically relevant pH range. Because the same is not true for arginine, this amino acid change could alter the function of the protein in some biological circumstances.
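The pH argument above can be illustrated with the Henderson-Hasselbalch relation, which gives the fraction of a side chain that is protonated at a given pH. The pKa values below are approximate textbook values for the free amino acid side chains (about 6.0 for histidine and about 12.5 for arginine) and are assumptions here; the effective pKa of residue 119 inside the folded protein could differ.

```python
def protonated_fraction(ph, pka):
    """Henderson-Hasselbalch: fraction of the group in its protonated form."""
    return 1 / (1 + 10 ** (ph - pka))


# Approximate textbook side-chain pKa values (assumed, not measured for DUSP22).
HIS_PKA = 6.0
ARG_PKA = 12.5

for ph in (5.5, 6.5, 7.4):
    his = protonated_fraction(ph, HIS_PKA)
    arg = protonated_fraction(ph, ARG_PKA)
    print(f"pH {ph}: His {his:.1%} protonated, Arg {arg:.4%} protonated")
```

Across this range arginine stays essentially fully charged while histidine shifts from mostly protonated to mostly neutral, which is the difference the paragraph refers to.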

The DUSP22 gene belongs to the protein-tyrosine phosphatase family. Its product plays a role in activating the JNK signaling pathway, and it dephosphorylates and deactivates p38 and JNK. It is also known as JNK pathway-associated phosphatase, or JKAP1. C-Jun N-terminal kinases (JNKs) comprise a family of stress-activated protein kinase enzymes that are involved in the regulation of many different physiological processes. JNKs appear to play opposing roles in the body: they are involved in cellular apoptosis as well as cell survival and proliferation, and they are believed to be expressed in every tissue type of the body [2,3]. One study characterizing DUSP22 found that it suppresses phosphorylation of FAK, a key regulator of cell motility in normal and tumor cells [1]. Research indicates that JNKs are differentially regulated in cancer tissues, though not in a consistent direction. The GeneCards entry for DUSP22 lists an association with anaplastic large cell lymphoma; however, there is no ClinVar entry for the region around rs7768224 in particular. Some lines of evidence suggest that different JNKs play roles in different cancer types; JNK2 may act as a tumor suppressor in lymphomas. One study found increased expression of p38, as well as an increase in expression of JNK1 (but not JNK2), in breast cancer tissue compared with normal breast tissue in 14 human patients. Although expression was increased, phosphorylation of c-Jun was actually decreased [3]. There is not much information to date regarding the function of DUSP22; however, given its role in deactivating p38 and activating the JNK signaling pathway, one might expect disruption of its function to be related to cancer risk. Given the still poorly defined relationship between JNKs and cancer risk, it is difficult to say whether a mutation in DUSP22, even if it affected the JNK signaling pathway, would increase or decrease cancer risk (and it might increase the risk of one cancer while decreasing the risk of another).

The overall prevalence of the rs7768224 variant allele (A) in the 1,000 Genomes population is 5.8%, but it is much more prevalent in African populations than in either European or Asian populations (AFR: 19.4%, EUR: 0.7%, EAS: 0%, SAS: 0.61%, AMR: 3.03%).  The ExAC data and HapMap data both show similar trends in distribution of this variant allele.  Notably, in the HapMap data the prevalence of this allele among the sampled Yoruba in Ibadan, Nigeria, is 40%.  Given that our individual is from the West African country of Sierra Leone, the existence of this variant is less surprising; indeed, it is common in this geographic area.
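As a rough illustration of why carrying this allele is unsurprising for an individual of African ancestry, the sketch below converts the super-population allele frequencies quoted above into expected genotype frequencies. It assumes Hardy-Weinberg equilibrium and uses the broad 1000 Genomes super-population figures rather than a Mende-specific frequency, so the numbers are indicative only.

```python
# Variant allele (A) frequencies quoted in the text, by super population.
allele_frequency = {"AFR": 0.194, "EUR": 0.007, "EAS": 0.0, "SAS": 0.0061, "AMR": 0.0303}


def hardy_weinberg(p):
    """Expected genotype frequencies for variant allele frequency p."""
    q = 1 - p
    return {"hom_ref": q * q, "het": 2 * p * q, "hom_alt": p * p}


def carrier_probability(p):
    """Probability of carrying at least one copy of the variant allele."""
    return 1 - (1 - p) ** 2


for population, p in allele_frequency.items():
    genotypes = hardy_weinberg(p)
    print(f"{population}: het {genotypes['het']:.3f}, "
          f"carrier {carrier_probability(p):.3f}")
```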

The Ensembl entry for this variant lists a SIFT score of 0 (“deleterious”) and a PolyPhen score of 0.977 (“probably damaging”). The PHRED-scaled CADD scores we obtained for rs7768224 are 26.2 (wANNOVAR) and 34 (CADD), both above the less-conservative score of 20 often used as a cutoff for potentially deleterious SNPs. This individual is heterozygous for the variant allele, which may lessen the likelihood of a deleterious effect depending on the underlying genetic model (dominant, recessive, or additive). Given the data on JNKs’ differential effects on tumor cells and the complexity of the JNK pathways, it appears considerably more research is required to determine whether the variant allele A of rs7768224 increases one’s risk of anaplastic large cell lymphoma or any other disease.

Protein structure visualization:


Figure 1: Highlighting position 119 in DUSP22 (PDB ID: 4WOH)


Figure 2: SNP location information from Ensembl


Figure 3: SNP conservation information from UCSC Genome Browser

Sources:

1. dbSNP
2. UniProtKB/SwissProt
3. Ensembl
4. Protein Data Bank (PDB)
5. GeneCards

  1. Li JP, Fu YN, Chen YR, Tan TH. JNK pathway-associated phosphatase dephosphorylates focal adhesion kinase and suppresses cell migration. The Journal of biological chemistry. 2010;285(8):5472-5478.
  2. Bode AM, Dong Z. The functional contrariety of JNK. Molecular carcinogenesis. 2007;46(8):591-598.
  3. Vlahopoulos S, Zoumpourlis VC. JNK: a key modulator of intracellular signaling. Biochemistry Biokhimiia. 2004;69(8):844-854.

Jacob Boswell

Gene: TEK
Gene name (full): Membrane-bound tyrosine kinase receptor
Chromosome: 9
Start position: 27,109,141 
Stop position: 27,230,178
Variation: T => C

Individual HG03058 has a potentially deleterious single nucleotide polymorphism in the gene TEK, which encodes a membrane-bound tyrosine kinase receptor involved in signaling and vascular quiescence. Mutations in this gene often lead to improper signaling, which results in multiple small lesions within vascular and cutaneous tissues. This gene is located on human chromosome 9 from nucleotide position 27109141 to 27230178. Our observed SNP at position 27213582 occurs in a small 3-10 helical domain near the C-terminus of the protein and has a PHRED-scaled CADD score of 22.6, meaning that it is potentially deleterious. In the reference genome, a thymine resides at the second position of the codon GTG, which codes for the non-polar amino acid valine. In our individual, we instead observe a cytosine, which changes the codon to GCG, coding for another non-polar amino acid, alanine. The SNP lies in a cytoplasmic domain involved in receiving the tyrosine signal, so significant conformational changes caused by a different amino acid may have severe consequences related to compromised signaling.


Figure 4: Mapping from SNP to protein structure

In the above structure, we can see that the TEK protein has two main domains. The large arm on the right side of the structure is the extracellular domain, which changes in response to signals received by the arm on the left side of the structure. Our SNP occurs in the left side, that is, in the cytoplasmic receptor domain, in a small 3-10 helix at the very end of the protein sequence. A 3-10 helix forms when the backbone C=O group of one residue hydrogen-bonds with the backbone N-H group of the residue three positions downstream. In our case, the bond occurs between glutamate and lysine, which flank our SNP. Since the non-synonymous change resulted in a functionally and structurally related amino acid, we would expect any influence on folding to be fairly limited, especially because this SNP occurs towards the very end of the protein and therefore folds after most of the protein’s conformation has already been determined. Despite this intuitive reasoning, clinical data indicate that this specific SNP is indeed associated with multiple cutaneous and mucosal venous malformations, suggesting that the slight change in amino acid side chain is enough to significantly affect the signaling function and therefore to promote the development of cutaneous and vascular lesions.

Quinn Dickinson

EPAS1 is a gene found on chromosome 2 of the human genome, spanning nucleotide 46,524,540 to nucleotide 46,611,800. Using CADD to generate a VCF, we found that our person’s genome has a SNP in EPAS1 at position 46,588,225 of the chromosome, with a cytosine changed to an adenine. Using PDB, we found that this mutation occurs near the end of exon 6 of the gene, at position 201 (a.a. 67) of the exon and position 774 of the translated region of the protein. The PDB shows this position as a guanine, so the variant changes either an aspartic acid (PDB reference) or a histidine (our reference) to a tyrosine. In the case of histidine, this does not represent a large change in amino acid composition: both are polar molecules that contain a large aromatic structure. There is a change in the position of the charge and the atom that carries it, which may have some effect. The change from aspartic acid is larger: while both may be polar molecules, the space the aromatic ring takes up may affect the function of the protein. According to UniProt, this region is part of the PAS 2 domain, which functions as a signal sensor. PAS domains are often found in proteins that are hypoxia inducible, such as EPAS1. Additionally, this region is part of an alpha helix, ensuring the side chain will be on the outside of the protein.

EPAS1 is a transcription factor involved in the induction of oxygen-regulated genes, particularly in response to hypoxia. It likely recruits coactivators such as CREBBP and EP300. It binds to the sequence 5'-[AG]CGTG-3' within the hypoxia response element of target genes such as Tie-2 and vascular endothelial growth factor.

EPAS1 has been implicated in disorders such as erythrocytosis, an unusual increase in red blood cells, along with paragangliomas (Comino-Méndez et al., 2013). Additionally, mutations in EPAS1 have been linked to adaptation to high altitude in Tibetans (Simonson et al., 2010), which in turn has been linked to interbreeding with ancient Denisovans. However, this person is of sub-Saharan ancestry, so it seems highly unlikely that the variant is linked to mating with Denisovans; it seems more likely to be the result of a random mutation.
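The binding motif quoted above can be searched for in a DNA sequence with a simple pattern match. This is a minimal sketch: the example sequence is made up for illustration, only the forward strand is scanned, and a match to the five-base core does not by itself indicate a functional hypoxia response element.

```python
import re

# Core motif from the text: A or G followed by CGTG.
# The lookahead lets overlapping matches be reported as well.
HRE_CORE = re.compile(r"(?=([AG]CGTG))")


def find_hre_sites(sequence):
    """Return (0-based start, matched motif) for each core motif occurrence."""
    sequence = sequence.upper()
    return [(match.start(), match.group(1)) for match in HRE_CORE.finditer(sequence)]


# Made-up example sequence containing two copies of the core motif.
example_sequence = "TTGACGTGCCATGCGTGAA"
sites = find_hre_sites(example_sequence)
print(sites)
```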

http://www.uniprot.org/uniprot/Q99814

http://www.rcsb.org/pdb/protein/Q99814


Figure 5: Amino acid mapped to protein structure using PDB


Figure 6: Left to right: Model of wild type D (sequence identity = 98.24%; GMQE = 0.26), Model of wild type H (sequence identity = 97.95%; GMQE = 0.26), Model of wild type Y (sequence identity = 97.95%; GMQE = 0.26)

Sources:

Comino-Méndez, I., de Cubas, A. A., Bernal, C., Álvarez-Escolá, C., Sánchez-Malo, C., Ramírez-Tortosa, C. L., … Cascón, A. (2013). Tumoral EPAS1 (HIF2A) mutations explain sporadic pheochromocytoma and paraganglioma in the absence of erythrocytosis. Human Molecular Genetics, 22(11), 2169–76. https://doi.org/10.1093/hmg/ddt069

Simonson, T. S., Yang, Y., Huff, C. D., Yun, H., Qin, G., Witherspoon, D. J., … Ge, R. (2010). Genetic Evidence for High-Altitude Adaptation in Tibet. Science, 329(5987).

Camila Medrano Trochez

Gene: PAX5
Gene name (full): Paired box transcription factor
Chromosome: 9
Start position: 36,833,275
Stop position: 37,035,319
Variation: C => T

This gene encodes a member of the paired box (PAX) family of transcription factors. Paired box transcription factors are important regulators in early development, and alterations in the expression of their genes are thought to contribute to neoplastic transformation.

This gene encodes the B-cell lineage specific activator protein that is expressed at early B-cell differentiation. Its expression has also been detected in developing CNS and testis and so the encoded protein may also play a role in neural development and spermatogenesis.

This gene is located at chromosome 9p13, starting at 36,833,275 bp and ending at 37,035,319 bp.

The individual we studied has a mutation in this gene at position 37,006,515: a guanine (G) changed to an adenine (A).

This nucleotide change leads to an amino acid change at position 106: a cysteine (C) is replaced by a threonine (T).

The mutated form has a shortened N-terminus in exon 3.

Mutations in the PAX5 gene are associated with acute lymphoblastic leukemia 3 (ALL3), a subtype of acute leukemia, a cancer of the white blood cells. Acute lymphoblastic leukemia is a malignant disease of the bone marrow.

http://www.uniprot.org/uniprot/Q02548

http://www.rcsb.org/pdb/explore/explore.do?pdbId=1K78

cmt

Figure 7: Structure of PAX5

Krithika Ravindran Naidu

Gene: CHID1
Gene name: Chitinase domain-containing protein 1
Chromosome: 11
Start position: 867859
Stop position: 915058
Variation: G => A

CHID1 (chitinase domain-containing protein 1) is a gene found on chromosome 11 of the human genome at location p15.5, spanning nucleotides 867859 to 915058. The genome has a SNP in CHID1 at position 870446 of the chromosome, with a guanine changed to an adenine (SNP rs6682). Its molecular function is catalysis of the hydrolysis of (1->4)-beta linkages of N-acetyl-D-glucosamine (GlcNAc) polymers of chitin and chitodextrins. The protein is also a saccharide- and LPS-binding protein with possible roles in pathogen sensing and endotoxin neutralization; its ligand-binding specificity relates to the length of the oligosaccharides, with a preference for chitotetraose (in vitro). This gene also has several aliases, the most common of which is stabilin-1 interacting chitinase-like protein, or SI-CLP, which indicates its known interaction with STAB1. CHID1 interacts with the endocytic/sorting receptor stabilin-1 (STAB1; 608560) and is present in late endosomes and secretory lysosomes in alternatively activated macrophages. Protein pull-down assays confirmed the interaction between STAB1 and SI-CLP and showed that it occurs through fasciclin domain-7 of STAB1. RT-PCR analysis showed upregulated expression of CHID1 in human macrophages after stimulation with IL4 (147780) and/or dexamethasone and revealed that IFNG (147570) suppressed this effect.

The major disease associated with the CHID1 gene in GWAS and other genetic association datasets from the GAD Gene-Disease Associations dataset is tobacco use disorder. In total, there are 26 disease associations for this gene, of which pulmonary hypertension and cardiovascular system disease are related to tobacco abuse.
There are 4 increased expression associations:

Tachycardia myocardial tissue; Nephrolithiasis; Leukemia, Adult T Cell Blood monocyte; Escherichia coli infection of the central nervous system (CNS)

And there are 4 decreased expression associations:

Sepsis Splenocyte; Osteoarthritis Chondrocyte; Androgen insensitivity syndrome Fibroblast; Parkinson’s Disease Substantia Nigra

http://www.uniprot.org/uniprot/Q9BWS9

http://www.rcsb.org/pdb/protein/Q9BWS9

kr1

Figure 8: Structure of CHID1 from PDB

kr2

Figure 9: The barrel structure of CHID1 found by x-ray crystallography

Shashwat Deepali Nagar

Gene: PIK3CA
Gene name (full): Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
Chromosome: 3
Start position: 178,866,311
Stop position: 178,952,497
Variation: L829 => I829

Phosphoinositide-3-kinase (PI3K) phosphorylates PtdIns (Phosphatidylinositol), PtdIns4P (Phosphatidylinositol 4-phosphate) and PtdIns(4,5)P2 (Phosphatidylinositol 4,5-bisphosphate) to generate phosphatidylinositol 3,4,5-trisphosphate (PIP3). PIP3 plays a key role by recruiting PH domain-containing proteins to the membrane, including AKT1 and PDPK1, activating signaling cascades involved in cell growth, survival, proliferation, motility and morphology. Participates in cellular signaling in response to various growth factors. Involved in the activation of AKT1 upon stimulation by receptor tyrosine kinases ligands such as EGF, insulin, IGF1, VEGFA and PDGF. Involved in signaling via insulin-receptor substrate (IRS) proteins. Essential in endothelial cell migration during vascular development through VEGFA signaling, possibly by regulating RhoA activity.

PIK3CA has been implicated in various types of cancer. Mutated isoforms participate in cellular transformation and tumorigenesis induced by oncogenic receptor tyrosine kinases (RTKs) and HRAS/KRAS. Interaction with HRAS/KRAS is required for Ras-driven tumor formation. Mutations increasing the lipid kinase activity are required for oncogenic signaling. The protein kinase activity may not be required for tumorigenesis.

The PDB file for PIK3CA was obtained from the Protein Data Bank (ID: 4YKN) and was visualized using Jmol.

sdn

Figure 10: Highlighting Position 829 in PIK3CA.

 

Exome SNV analysis of Utah Resident with European ancestry

By Mrunal Dehankar, Aditi Paranjpe, Kalyani Patankar, Rohini Mopuri

 

Exome profile

The exome analyzed in this report was retrieved from the B-lymphocyte cell type of a resident of Utah, USA, with ancestry from Northern and Western Europe. The exome of this individual can be found under SRA067010 in the NCBI SRA database. The family ID of this individual is 1447. The entire family of this individual was sequenced; this exome belongs to a male designated as the maternal grandfather in the family. The individual is described as Caucasian, with Utah/Mormon ethnicity.

Protocol

The SRA file for the selected sample was downloaded from NCBI SRA and converted into paired-end FASTQ files using the SRA Toolkit. The GRCh38 reference genome and corresponding index files were downloaded from the 1000 Genomes server. The paired-end FASTQ files were aligned to the human reference genome using BWA-MEM. SAMtools was used to convert the SAM file generated by BWA-MEM into a BAM file and to index it. To identify variants in the sample, samtools mpileup and bcftools were used to generate a VCF file, which contains the chromosome location, reference allele, alternative allele and statistical information for each variant. Variants were further annotated using ANNOVAR. The databases employed for ANNOVAR annotation were: 1000g2015aug, RefGene, avSNP147, dbNSFP30a, ESP6500siv2_all, ExAC03, CADD13gt20.
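The steps in this protocol can be sketched as an ordered list of shell commands. This is a minimal sketch and not the exact commands used for the report: the run accession, file names and the `build_pipeline_commands` helper are hypothetical, a sort step is added because indexing a BAM file requires coordinate-sorted input, and the pileup step uses `bcftools mpileup` as a stand-in for the `samtools mpileup` call named above. The function only builds the command strings so they can be reviewed before anything is run.

```python
import shlex


def build_pipeline_commands(run_id, reference_fasta, sample="sample"):
    """Return shell commands for a FASTQ-to-VCF workflow, in execution order.

    Nothing is executed here. All file names are illustrative placeholders.
    """
    ref = shlex.quote(reference_fasta)
    run = shlex.quote(run_id)
    # fastq-dump --split-files writes <run>_1.fastq and <run>_2.fastq
    fq1 = shlex.quote(f"{run_id}_1.fastq")
    fq2 = shlex.quote(f"{run_id}_2.fastq")
    sam = shlex.quote(f"{sample}.sam")
    bam = shlex.quote(f"{sample}.sorted.bam")
    vcf = shlex.quote(f"{sample}.vcf")
    return [
        # 1. Convert the SRA run into paired-end FASTQ files.
        f"fastq-dump --split-files {run}",
        # 2. Align the read pairs to the (already indexed) reference.
        f"bwa mem {ref} {fq1} {fq2} > {sam}",
        # 3. Convert SAM to coordinate-sorted BAM (assumed extra step).
        f"samtools sort -o {bam} {sam}",
        # 4. Index the sorted BAM file.
        f"samtools index {bam}",
        # 5. Pile up reads and call variants (-m multiallelic caller,
        #    -v variant sites only).
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -o {vcf}",
    ]


if __name__ == "__main__":
    # "SRR000000" and "GRCh38.fa" are placeholders, not the real inputs.
    for command in build_pipeline_commands("SRR000000", "GRCh38.fa"):
        print(command)
```

Keeping the commands as data makes it easy to log exactly what was run for each sample before handing the VCF file to ANNOVAR.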

Protein models for wild-type and mutated genes were constructed using the SWISS-MODEL server. The reliability of each selected model was assessed on the basis of its GMQE and QMEAN scores. Jmol was used to further inspect the resulting structural variations in the mutated protein models.

List of pathogenic/potentially pathogenic SNVs

The following is the list of rare stop-gain and non-synonymous SNVs obtained by filtering the ANNOVAR results. The list was obtained by keeping variants with an allele frequency across the whole genomes of the world population (1000G_ALL) of at most 0.05 (to obtain rare and significant SNVs) and a CADD_PHRED score greater than 30 (the top 0.1% of deleterious variants in the human genome).

table1

Table 1: List of pathogenic/potentially pathogenic SNVs with respect to world population

The following is the list of rare SNVs obtained when the 1000G_EUR allele frequency was filtered instead (<= 0.05). This was done to see how frequently a particular SNV occurs in the European population, as this individual has European ancestry. Here the CADD_PHRED filter was relaxed to scores greater than 20 (the top 1% of deleterious variants in the human genome).

table2

Table 2: List of pathogenic/potentially pathogenic SNVs with respect to European population
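The two CADD thresholds follow from the PHRED-like scaling of the score, in which a scaled score s corresponds to the top 10^(-s/10) fraction of ranked variants. The sketch below illustrates that conversion and the two filters described above. It is an illustration only: the column names `1000G_ALL` and `CADD_phred`, the `passes_filters` helper, and the choice to treat a missing frequency (`.`) as "not observed" are assumptions, not the exact settings used to build Tables 1 and 2.

```python
def cadd_phred_to_top_fraction(phred):
    """Fraction of top-ranked variants implied by a scaled CADD score."""
    return 10 ** (-phred / 10)


def passes_filters(variant, max_freq=0.05, min_cadd=30.0,
                   freq_key="1000G_ALL", cadd_key="CADD_phred"):
    """Keep a variant if it is rare and its CADD PHRED score is high enough.

    `variant` is a dict of annotation column -> string value, with "."
    marking a missing value (the column names are assumed).
    """
    cadd_raw = variant.get(cadd_key, ".")
    if cadd_raw in (".", ""):
        return False  # no CADD score, so the variant cannot be ranked
    freq_raw = variant.get(freq_key, ".")
    # Assumption: a missing frequency means the allele was not observed.
    freq = 0.0 if freq_raw in (".", "") else float(freq_raw)
    return freq <= max_freq and float(cadd_raw) > min_cadd


if __name__ == "__main__":
    print(cadd_phred_to_top_fraction(30))  # about 0.001, i.e. top 0.1%
    print(cadd_phred_to_top_fraction(20))  # about 0.01, i.e. top 1%

    # Frequency and score reported in this report for the METTL17 SNV.
    mettl17 = {"1000G_ALL": "0.0002", "CADD_phred": "34"}
    print(passes_filters(mettl17))  # True under the world-population filter
```

For the European filter, the same function can be called with `freq_key` set to the European frequency column and `min_cadd=20`.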

This report performs an in-depth analysis of four rare non-synonymous SNVs (rs146385179, rs142676640, rs567959911, rs201293896). In addition, the report discusses intriguing SNVs of other categories such as loss and gain of stop codons and frameshift insertions and deletions observed in this exome (rs567834406, rs79220013, rs35782494, rs754029158).

 

  • SNV-1

table3

Table 3: Summary of SNV in METTL17 gene

Genome Position : Chromosome 14 : 1463040

Gene : METTL17

The METTL17 gene codes for mitochondrial methyltransferase-like protein 17. The protein belongs to the methyltransferase superfamily and is predicted to be a component of the mitochondrial small ribosomal subunit.

Variation Type:  Non-synonymous Single Nucleotide Variation.

It results in a missense SNP, thus altering methyltransferase-like protein 17, mitochondrial isoform 1 (NP_001025162.1) and methyltransferase-like protein 17, mitochondrial isoform 2 (NP_073571.1). Due to this SNV, arginine is replaced with cysteine in the protein sequence at position 286 (replacement position in codon: 1) in both proteins. Arginine is a positively charged polar hydrophilic amino acid, while cysteine is a non-polar hydrophilic amino acid, so the substitution may result in a minor alteration of the protein structure.

In the ALL variants dataset of the 1000 Genomes Project released in Aug 2015 (1000g2015aug_all), the allele frequency of this SNV is 0.0002, but no allele frequency is available in the EUR variants dataset (1000g2015aug_eur). This suggests that the SNV is not only rare across whole genomes of different ethnic populations, but also has not yet been seen in the European population. In addition, this SNV has not yet been observed in American and East Asian populations, but is seen in African populations with an allele frequency (1000g2015aug_afr) as low as 0.0008, making it extremely rare. Its observation in African populations raises questions, given that this individual is a resident of Utah (USA) with European ancestry while the SNV has so far not been observed in American and European populations. The whole-exome allele frequency (ExAC_ALL) of this SNV is 0.0002, and the ESP6500 allele frequency is 0.0007, adding to its rarity.

SIFT, PolyPhen, MutationAssessor and MutationTaster scores predict a deleterious effect for this SNV; however, FATHMM predicts that it might be tolerated and LRT predicts it to be neutral. The CADD PHRED score can be used to weigh against the conflicting assessments from FATHMM and LRT: it is as high as 34, placing the SNV in the top 0.1% of deleterious variants in the human genome. These conflicting predictions leave ample scope for testing the pathogenicity of this SNV.

Its global MAF is T=0.0002/1, meaning minor allele is ‘T’ and has a frequency of 0.02% in the 1000Genomes phase 1 population and that ‘T’ is observed only once in the sample population of 1088 people (2176 chromosomes).
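A quick arithmetic check helps when reading a MAF written as frequency/count. For an autosomal variant, the frequency is the allele count divided by twice the number of individuals. The sketch below applies this to the numbers quoted in this report; the helper names are hypothetical. It suggests that one copy among 2176 chromosomes would give a frequency nearer 0.0005 than 0.0002, whereas one copy among 5008 chromosomes (2,504 individuals, the size of the 1000 Genomes phase 3 panel) reproduces 0.0002. The frameshift deletion reported later in this report (MAF 0.3556/1781) points to the same denominator. Whether phase 3 is really the panel behind these MAF values is an assumption.

```python
def allele_frequency(allele_count, n_individuals):
    """Allele frequency for a diploid (autosomal) site."""
    return allele_count / (2 * n_individuals)


def implied_chromosomes(allele_count, frequency):
    """Number of chromosomes implied by a count and a reported frequency."""
    return allele_count / frequency


if __name__ == "__main__":
    # One 'T' allele among 1088 people (2176 chromosomes), as quoted above.
    print(round(allele_frequency(1, 1088), 5))    # 0.00046

    # One 'T' allele among 2504 people (5008 chromosomes).
    print(round(allele_frequency(1, 2504), 5))    # 0.0002

    # Denominator implied by the frameshift deletion MAF of 0.3556/1781.
    print(round(implied_chromosomes(1781, 0.3556)))  # 5008
```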

 

Mapping SNV on 3D structure

selection_049

 (a)                                                                     (b)

Figure 1: (a)Wild type, (b)Mutant. Structural alterations highlighted in green, marked by oval.

 

Figure 1 shows 3D models of methyltransferase-like protein 17, mitochondrial isoform 1. Figure 1a shows the structure with the original amino acid arginine at position 286, while in Figure 1b arginine is substituted with cysteine. Significant structural changes are observed due to this SNV: originally part of a turn, position 286 becomes part of the succeeding helix when arginine is substituted with cysteine.

 

Related information:

The clinical significance of this SNV is uncertain (last reviewed: Jan 20, 2016). The gene was investigated in 22 patients with a combined enzymatic deficiency of primarily the OXPHOS complexes I, III and IV; however, no mutations were found that could explain the mitochondrial disorder in these patients [1]. Also, recent studies show that METTL17 is a novel co-activator of estrogen receptors and may play a role in breast tumorigenesis [2].

 

  • SNV-2

table4

Table 4: Summary of SNV in APH1B gene

Genome Position: Chromosome 15:  63597969

Gene: APH1B

APH1 is a multipass transmembrane protein that interacts with presenilin and nicastrin as a functional component of the gamma-secretase complex. APH1B codes for one of the subunits of the gamma-secretase complex, which is required for the intramembrane proteolysis of a number of membrane proteins.

Variation Type:  Non-synonymous Single Nucleotide variation

It results in a missense SNP, thus altering gamma-secretase subunit APH-1B isoform 1 (NP_112591.2) and gamma-secretase subunit APH-1B isoform 2 (NP_001139118.1). Due to this SNV, arginine is replaced with cysteine in the protein sequence at position 255 (replacement position in codon: 1) in gamma-secretase subunit APH-1B isoform 1, while the same amino acid substitution takes place at position 214 (replacement position in codon: 1) in gamma-secretase subunit APH-1B isoform 2. Arginine is a positively charged polar hydrophilic amino acid, while cysteine is a non-polar hydrophilic amino acid, so the substitution may result in a minor alteration of the protein structure.

In the ALL variants dataset of the 1000 Genomes Project released in Aug 2015 (1000g2015aug_all), the allele frequency of this SNV is 0.0002, and in the EUR variants dataset (1000g2015aug_eur) it is 0.001. These frequencies suggest that the SNV is rare not only across whole genomes of different ethnic populations but also in the European population. In addition, this SNV has not yet been observed in American, East Asian and African populations, making it extremely rare. The whole-exome allele frequency (ExAC_ALL) of this SNV is as low as 0.0002, and the ESP6500 allele frequency is 0.0004, which bolsters its rarity.

SIFT, PolyPhen, MutationTaster and LRT scores predict a deleterious effect for this SNV; however, MutationAssessor predicts that it might be non-functional and FATHMM predicts that it is tolerated. The CADD PHRED score can be used to weigh against the conflicting assessments from MutationAssessor and FATHMM: it is as high as 34, placing the SNV in the top 0.1% of deleterious variants in the human genome. These conflicting predictions leave ample scope for testing the pathogenicity of this SNV.

Its global MAF is T=0.0002/1, meaning minor allele is ‘T’ and has a frequency of 0.02% in the 1000Genomes phase 1 population and that ‘T’ is observed only once in the sample population of 1088 people (2176 chromosomes).

Mapping SNV on 3D structure

Model for this SNV could not be included in the report as the resulting model was not reliable (it did not model the required amino acids).

Related information

The clinical significance of the variation is still unknown. Multiple variations are reported in dbVar within this gene region, such as copy number variations, insertions and single nucleotide variations. One of the copy number variations in this gene is reported as pathogenic in the ClinVar database. Also, a non-synonymous single nucleotide polymorphism in this gene (Phe217Leu; rs1047552) showed a tendency for association with HIV-1 infection in a Xhosa indigenous South African Bantu study (P = 0.087) and was significantly associated in a Caucasian Dutch study (P = 0.049), suggesting a role for the gamma-secretase pathway in susceptibility to HIV-1 infection [3]. The same SNP shows a male-specific association with premature coronary atherosclerosis [4]. Therefore, further investigation of the SNP identified in this study may uncover an association between the variation and a disease condition.

 

  • SNV-3

table5

Table 5: Summary of SNV in GZMM gene

Genome Position: Chromosome 19: 548946

Gene: GZMM

The protein encoded by the GZMM gene is granzyme M. There are 4 types of granzymes, which are expressed and stored by human natural killer (NK) cells and activated lymphocytes in large cytoplasmic granules.

Variation Type: Non-synonymous SNV

It results in a missense SNP, thus altering 2 proteins: Granzyme M isoform 1 preproprotein (NP_005308.1) and Granzyme M isoform 2 (NP_001245280.1). Due to this SNV, arginine is replaced with tryptophan in the protein sequence at position 86 in Granzyme M isoform 2 (replacement position in codon: 1, replacement position in mRNA: 407), while the same amino acid substitution takes place at position 125 in Granzyme M isoform 1 preproprotein (replacement position in codon: 1, replacement position in mRNA: 411). Arginine is a positively charged polar hydrophilic amino acid, while tryptophan is a non-polar hydrophobic amino acid, so the substitution may result in a major alteration of the protein structure.
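The "replacement position in codon: 1" annotations can be cross-checked against the standard genetic code. The sketch below lists which arginine codons yield cysteine (as in SNV-1 and SNV-2) or tryptophan (as in SNV-3 and SNV-4) when only the first base changes. The helper name is hypothetical, codons are written on the coding strand, and the actual reference codons of these genes are not given in this report, so the output only shows which changes are possible.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def first_position_changes(ref_aa, alt_aa):
    """List (reference codon, mutant codon) pairs that turn ref_aa into
    alt_aa by changing only the first base of the codon."""
    changes = []
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != ref_aa:
            continue
        for base in BASES:
            if base == codon[0]:
                continue  # not a substitution
            mutant = base + codon[1:]
            if CODON_TABLE[mutant] == alt_aa:
                changes.append((codon, mutant))
    return changes


if __name__ == "__main__":
    print(first_position_changes("R", "C"))  # [('CGT', 'TGT'), ('CGC', 'TGC')]
    print(first_position_changes("R", "W"))  # [('CGG', 'TGG'), ('AGG', 'TGG')]
```

In every case the mutant codon starts with T, which is consistent with 'T' being reported as the minor allele for these SNVs, provided the allele is reported on the coding strand.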

In the ALL variants dataset of the 1000 Genomes Project released in Aug 2015 (1000g2015aug_all), the allele frequency of this SNV is 0.0002, and in the EUR variants dataset (1000g2015aug_eur) it is 0.001. These frequencies suggest that the SNV is rare not only across whole genomes of different ethnic populations but also in the European population. In addition, this SNV has not yet been observed in American, East Asian and African populations, making it extremely rare. The whole-exome allele frequency (ExAC_ALL) of this SNV is as low as 7.62e-05, and no ESP6500 allele frequency is available, suggesting that it has not yet been observed in the 6500 exomes sequenced, which bolsters its rarity.

SIFT, PolyPhen, MutationAssessor and FATHMM scores predict a deleterious effect for this SNV; however, MutationTaster predicts that it might be neutral, and LRT reports its functional effect as unknown. The CADD PHRED score can be used to weigh against the conflicting assessments from MutationTaster and LRT: it is as high as 23.7, placing the SNV in the top 1% of deleterious variants in the human genome.

Its global MAF is T=0.0002/1, meaning minor allele is ‘T’ and has a frequency of 0.02% in the 1000Genomes phase 1 population and that ‘T’ is observed only once in the sample population of 1088 people (2176 chromosomes).

Mapping SNV on 3D structure

selection_050

 (a)                                                        (b)

Figure 2: (a)Wild type, (b)Mutant. Structural alterations highlighted in green, marked by oval.

 

Figure 2 represents 3D models of Granzyme M isoform 1 preproprotein. Figure 2a shows structure with original amino acid arginine at position 125, while in figure 2b arginine is substituted with tryptophan. Significant structural changes were expected due to opposing natures of amino acids in question. However, as seen from the models, major structural changes are not observed.

Related information:

Thirteen pathogenic variations have been reported in and around the GZMM gene; however, no pathogenicity or disease linkage has been reported yet for this SNV.

 

  • SNV-4

table6

Table 6: Summary of SNV in TYRP1 gene

Genome Position: Chromosome 9:  12695565

Gene : TYRP1

This gene encodes a melanosomal enzyme that belongs to the tyrosinase family and plays an important role in the melanin biosynthetic pathway. Defects in this gene are the cause of rufous oculocutaneous albinism and oculocutaneous albinism type III.

Variation Type: Non-synonymous Single Nucleotide variation.

It results in a missense SNP, thus altering 5,6-dihydroxyindole-2-carboxylic acid oxidase precursor (NP_000541.1). Due to this SNV, arginine is replaced with tryptophan in the protein sequence at position 146 (replacement position in codon: 1). Arginine is a positively charged polar hydrophilic amino acid, while tryptophan is a non-polar hydrophobic amino acid, so the substitution may result in a major alteration of the protein structure.

In the ALL variants dataset of the 1000 Genomes Project released in Aug 2015 (1000g2015aug_all), the allele frequency of this SNV is 0.0002, and in the EUR variants dataset (1000g2015aug_eur) it is 0.001. These frequencies suggest that the SNV is rare not only across whole genomes of different ethnic populations but also in the European population. In addition, this SNV has not yet been observed in American, East Asian and African populations, making it extremely rare. The whole-exome allele frequency (ExAC_ALL) of this SNV is as low as 9.91e-05, and no ESP6500 allele frequency is available, suggesting that it has not yet been observed in the 6500 exomes sequenced, which bolsters its rarity.

SIFT, PolyPhen, MutationTaster and FATHMM scores predict a deleterious effect for this SNV; however, MutationAssessor predicts that it might be non-functional and LRT predicts it to be neutral. The CADD PHRED score can be used to weigh against the conflicting assessments from MutationAssessor and LRT: it is as high as 32, placing the SNV in the top 0.1% of deleterious variants in the human genome.

Its global MAF is T=0.0002/1, meaning minor allele is ‘T’ and has a frequency of 0.02% in the 1000Genomes phase 1 population and that ‘T’ is observed only once in the sample population of 1088 people (2176 chromosomes).

Mapping SNV on 3D structure

selection_051

 (a)                                                                      (b)

Figure 3: (a)Wild type, (b)Mutant. Structural alterations highlighted in green, marked by oval.

Figure 3 represents 3D models of 5,6-dihydroxyindole-2-carboxylic acid oxidase precursor protein. Figure 3a shows structure with original amino acid arginine at position 146, while in figure 3b arginine is substituted with tryptophan. Significant structural changes were expected due to opposing natures of amino acids in question. However, as seen from the models, major structural changes are not observed.

Related information:

Pathogenicity or disease linkage has not been reported yet for this SNV.
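The predictor calls reported for the four non-synonymous SNVs above can be collected in one place to see how much the tools agree. The sketch below simply transcribes the calls and CADD PHRED scores stated in this report into a small table and tallies them; the data structure and function names are hypothetical, and any call not described as deleterious in the text is counted as a dissent.

```python
from collections import Counter

TOOLS = ("SIFT", "PolyPhen", "MutationAssessor",
         "MutationTaster", "FATHMM", "LRT")

# gene -> (tools predicting a deleterious effect, CADD PHRED score),
# transcribed from the SNV-1 to SNV-4 descriptions in this report.
PREDICTIONS = {
    "METTL17": ({"SIFT", "PolyPhen", "MutationAssessor", "MutationTaster"}, 34.0),
    "APH1B": ({"SIFT", "PolyPhen", "MutationTaster", "LRT"}, 34.0),
    "GZMM": ({"SIFT", "PolyPhen", "MutationAssessor", "FATHMM"}, 23.7),
    "TYRP1": ({"SIFT", "PolyPhen", "MutationTaster", "FATHMM"}, 32.0),
}


def summarize(predictions):
    """Return {gene: (number of deleterious calls, number of tools, CADD)}."""
    return {gene: (len(deleterious), len(TOOLS), cadd)
            for gene, (deleterious, cadd) in predictions.items()}


def dissent_counts(predictions):
    """Count how often each tool did not make a deleterious call."""
    counts = Counter()
    for deleterious, _cadd in predictions.values():
        for tool in TOOLS:
            if tool not in deleterious:
                counts[tool] += 1
    return counts


if __name__ == "__main__":
    for gene, (n_del, n_tools, cadd) in summarize(PREDICTIONS).items():
        print(f"{gene}: {n_del}/{n_tools} deleterious, CADD PHRED {cadd}")
    print(dissent_counts(PREDICTIONS).most_common())
```

On these numbers each SNV has four of six tools calling it deleterious, SIFT and PolyPhen never dissent, and LRT dissents most often (three of four SNVs).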

 

Besides these non-synonymous SNVs, following are few interesting SNVs of other categories found in this exome.

  • Stop gain SNV

Chromosome 20, Position 37653893, avSNP147: rs567834406

It results in gain of a stop codon in DHX35 gene, thus altering following proteins:

a) probable ATP-dependent RNA helicase DHX35 isoform X1 (XP_011527274)

b) probable ATP-dependent RNA helicase DHX35 isoform X2 (XP_016883493)

c) probable ATP-dependent RNA helicase DHX35 isoform X3 (XP_011527276)

d) probable ATP-dependent RNA helicase DHX35 isoform X4 (XP_006723911)

e) probable ATP-dependent RNA helicase DHX35 isoform 1 (NP_068750)

f) probable ATP-dependent RNA helicase DHX35 isoform 2 (NP_001177738)

This is a rare SNV, as it has low allele frequencies (0.0002 in the ALL variants dataset of the 1000 Genomes Aug 2015 release, an ExAC_ALL frequency of 8.27e-06, and no ESP6500 frequency). It has a deleterious effect according to LRT, MutationTaster and FATHMM predictions, and a high CADD PHRED score of 45 (in the top 0.1% of deleterious variants in the human genome).

Its global MAF is A=0.0002/1, meaning minor allele is ‘A’ and has a frequency of 0.02% in the 1000Genomes phase 1 population and that ‘A’ is observed only once in the sample population of 1088 people (2176 chromosomes).

 

  • Stop loss SNV

Chromosome 9, Position 116800, avSNP147: rs79220013

It results in the loss of a stop codon in the FOXD4 gene, thus altering forkhead box protein D4 (NP_997188). The stop codon is substituted by tyrosine in the protein sequence at position 440 (codon replacement position: 3). However, it has a very low CADD PHRED score of 2.86 and is neutral according to FATHMM predictions, and hence might not be highly pathogenic.

 

  • Frameshift deletion SNV

A frameshift deletion is reported at position 704605 on chromosome 11, resulting in alterations in the TMEM80 and EPS8L2 genes. Its AVSNP147 ID is rs35782494. It is a commonly seen frameshift deletion variant, with a MAF of 0.3556/1781, and is observed in populations of all ethnicities. However, its pathogenicity cannot be determined due to the absence of CADD PHRED, SIFT, PolyPhen and other relevant scores.

 

  • Frameshift insertion SNV

A frameshift insertion of nucleotide C is reported at position 44113484 on chromosome 7, resulting in alterations in the POLM gene and thus affecting DNA-directed DNA/RNA polymerase mu isoform 1 (NP_037416), DNA-directed DNA/RNA polymerase mu isoform 2 (NP_001271259) and DNA-directed DNA/RNA polymerase mu isoform 3 (NP_001271260), along with other predicted proteins. Its AVSNP147 ID is rs754029158. It is an uncommon frameshift insertion variant, with an ExAC_ALL allele frequency of 8.26e-06, and is not observed in most ethnic populations (African, American, Southeast Asian and Finnish). However, its pathogenicity cannot be determined due to the absence of CADD PHRED, SIFT, PolyPhen and other relevant scores.

 

References:

  1. Smits P, Rodenburg RJ, Smeitink JA, van den Heuvel LP. “Sequence variants in four candidate genes (NIPSNAP1, GBAS, CHCHD1 and METT11D1) in patients with combined oxidative phosphorylation system deficiencies.” J Inherit Metab Dis. 2010 Dec;33 Suppl 3:S13-9.
  2. Du P, Yuan B, Cao J, Zhao J, Ding L, Chen L, Ying S, Jiang L, Lin J, Xu X, Cheng L, Ye Q. “Methyltransferase-like 17 physically and functionally interacts with estrogen receptors.” IUBMB Life. 2015 Nov;67(11):861-8. doi: 10.1002/iub.1444. Epub 2015 Oct 21.
  3. Van Loo KM, van Schijndel JE, van Zweeden M, van Manen D, Trip MD, Petersen DC, Schuitemaker H, Hayes VM, Martens GJ. “Correlation between HIV-1 seropositivity and prevalence of a gamma-secretase polymorphism in two distinct ethnic populations.” J Med Virol. 2009 Nov;81(11):1847-51. doi: 10.1002/jmv.21601.
  4. Van Loo KM, van Schijndel JE, van Zweeden M, van Manen D, Trip MD, Petersen DC, Schuitemaker H, Hayes VM, Martens GJ. “Male-specific association between a gamma-secretase polymorphism and premature coronary atherosclerosis.” PLoS One. 2008;3(11):e3662. doi: 10.1371/journal.pone.0003662. Epub 2008 Nov 6.

Exome Variant Analysis Project U2

Brief outline of analysis pipeline:

The analysis pipeline used the 1000 Human Genome Project and then the ClinVar file for our chosen test subject. https://collections.su92l.arvadosapi.com/collections/bc56e4f7d6c0d892119114ea3278c137+1945/clinvar-report.html

We filtered the report to include only the pathogenic variants, leaving 18 pathogenic variants in total.

Personal Description

Gender: male

Ethnicity: not available

Age: 63

Weight: 215 lbs.

Height: 6 feet 1 inch

Conditions: hyperlipidemia, myalgia, nocturia, protein S deficiency, DVT of lower extremity, Pulmonary embolism, spinal cord lesion

Variant Descriptions of 5 Pathogenic Variants:

 

Victoria

NM_000903.2(NQO1):c.559C>T (p.Pro187Ser) Breast_cancer,_post-chemotherapy_poor_survival_in

Map of wildtype (SWISS Modeling):

vic1st

Map of variant (Pro → Ser at position 187) (SWISS Modeling):

vic2nd

Published Literature/Database Info:

This mutation is a single nucleotide variant at cytogenetic location 16q22. It results in a protein change in which proline is replaced by serine at position 187. The variation is in the NQO1 gene, and its functional consequence is missense. The mutation is of germline origin, with cytosine converted to thymine (c.559C>T) at chromosomal position 16:69711242. The mutation is associated with poor prognosis in breast cancer, and other mutations of the NQO1 gene are associated with increased susceptibility to and incidence of bladder cancer, lung cancer, and lymphoblastic leukemia.

Brief summary/why is it pathogenic or maybe not?

Published literature provides evidence that poor breast cancer prognosis after chemotherapy treatment is associated with mutations in the NQO1 gene. Disease recurrence after chemotherapy was associated with this mutation due to tumor-infiltrating immune cells that exhibit the missense mutation. The mutation was associated with a poor response to chemotherapy and reduced survival.

Elisa:

Arylsulfatase A isoform a precursor [Homo sapiens]–Metachromatic_leukodystrophy

Map of Wild-Type protein:

elisa1st

Map of Variant protein (Asn → Ser at position 352):

 elisa-2nd

Published literature: Metachromatic leukodystrophy (MLD) is a neurodegenerative lysosomal storage disorder that affects the metabolism of sphingolipids mainly caused by the mutations in the arylsulfatase A (ARSA) gene. MLD is a condition that is categorized as an arylsulfatase A deficiency with an onset during adolescence. Published literature provides evidence that shows 16 mutations in the ARSA gene found in patients affected with MLD. 9 of the 16 mutations were missense mutations, 3 were nonsense mutations, 3 were frameshift mutations and one was a splice site mutation.

Very brief summary of the variant – why is it pathogenic or maybe not? The variant contains a missense mutation at nucleotide position 1055 that converts adenine to guanine, causing an amino acid change from asparagine to serine at position 352. Both amino acids are classified as polar; however, asparagine is an aliphatic amino acid. The variant is a single nucleotide variant. It is considered pathogenic because the single nucleotide change results in an amino acid change, without affecting the length of the protein. The allele frequency is 0.2248 and the zygosity is homozygous.
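The reported allele frequency gives a rough sense of how many people would be expected to carry this variant in the homozygous state, which is relevant to the question of whether it is pathogenic. The sketch below is a back-of-the-envelope calculation that assumes Hardy-Weinberg equilibrium, an assumption this report does not state; the function name is hypothetical.

```python
def hardy_weinberg_genotypes(alt_allele_freq):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    Returns (homozygous reference, heterozygous, homozygous alternate).
    """
    q = alt_allele_freq
    p = 1.0 - q
    return p * p, 2 * p * q, q * q


if __name__ == "__main__":
    hom_ref, het, hom_alt = hardy_weinberg_genotypes(0.2248)
    print(round(hom_ref, 4))  # 0.6009
    print(round(het, 4))      # 0.3485
    print(round(hom_alt, 4))  # 0.0505
```

Under this assumption, roughly 5% of individuals would be homozygous for the variant allele, which is worth weighing against the rarity of metachromatic leukodystrophy when interpreting the pathogenic label.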

Audrey: NM_000418.3(IL4R):c.223A>G (p.Ile75Val) AND Acquired immunodeficiency syndrome, slow progression to

Map of Wildtype:

aud1st

Map of Variant (Ile -> Val at position 75):

aud-2nd

Published Literature/Database Info:

This variant is caused by an A>G mutation on chromosome 16, resulting in an isoleucine to valine missense change at position 75 of the human interleukin 4 receptor. It has a clinically significant link to the phenotype of slow progression to Acquired Immunodeficiency Syndrome (AIDS) and was deemed pathogenic in 2005. The mutation is of germline origin and therefore affects germline cells, exacerbating the mutagenic effects of the variant.

Brief Summary of the Variant:

The human interleukin 4 receptor binds interleukin 4, a cytokine produced by T-lymphocytes that is important in the regulation and differentiation of lymphoid and myeloid cells; the cytokine binds its receptor with high affinity. While the mutation in question is a nonpolar to nonpolar amino acid missense mutation, the functionality of the protein depends on binding affinity. Although the models do not show any explicit alterations in the protein structure, the change from a larger hydrocarbon side chain to a smaller isopropyl group could affect the binding affinity of the protein. Because the mutation is at position 75, it is not deeply embedded in the protein and could be on the outer surface, making it more likely to affect protein binding and subtle conformation.

Kristin: Methylmalonic_aciduria_cblB_type

Map of Wild Type Protein:

model

Map of Variant (Met → Lys at position 239):

model

Published Literature:

Methylmalonic aciduria is a genetically heterogeneous disorder of methylmalonate and cobalamin (cbl; vitamin B12) metabolism. Different forms of isolated methylmalonic aciduria have been classified according to complementation groups of cells in vitro. Patients with defects in the synthesis of AdoCbl are usually responsive to vitamin B12 therapy and are classified as ‘cbl’ type: these include cblB and cblA. The cblA type is caused by mutation in the MMAA gene. The ‘mut’ type is caused by mutation in the MUT gene; in general, the mut form of MMA is unresponsive to vitamin B12 therapy (OMIM database). The MMAB gene encodes a protein that catalyzes the final step in the conversion of vitamin B(12) into adenosylcobalamin (AdoCbl), a vitamin B12-containing coenzyme for methylmalonyl-CoA mutase. Mutations in the gene are the cause of vitamin B12-dependent methylmalonic aciduria linked to the cblB complementation group (NCBI).

Brief Summary of the Variant:

This variant, associated with methylmalonic aciduria, is a homologous mutation at cytogenetic location 12q24.11. It is classified as a single nucleotide variant, missense variant, and non-coding transcript variant, with a nucleotide change from thymine to adenine that results in a methionine changing to a lysine at position 239. The mutation did not show any specific alterations in the 3D modeling, but the change from Met to Lys could cause issues with binding other proteins due to lysine’s different affinity and characteristics.

Christopher Lee: Hermansky-Pudlak_syndrome_5

Map of Wild-Type protein:

2nd-last

Map of Variant protein (Thr → Ile at position 1098):

capture-last

Published literature:

Hermansky-Pudlak syndrome (HPS) is a multifaceted disorder characterized by tyrosinase-positive oculocutaneous albinism; bleeding as a result of platelet deficiency; and, more rarely, pulmonary fibrosis, granulomatous colitis, and immunodeficiency. The albinism causes hypopigmentation of both the skin and the hair, as well as a number of ocular complications: HPS can cause reduced retinal and iris pigment, foveal hypoplasia, nystagmus, and optic nerve fiber issues. The skin tone of individuals with HPS is also generally a shade lighter than that of most other family members. The bleeding diathesis has both short-term and long-term symptoms, including prolonged bleeding and hemorrhage.

The HPS5 gene has been mapped to chromosome 11p15-p13. (Zhang, Q., Zhao, B., Li, W., Oiso, N., Novak, E. K., Rusiniak, M. E., Gautam, R., Chintala, S., O’Brien, E. P., Zhang, Y., Roe, B. A., Elliott, R. W., and 9 others. Ru2 and Ru encode mouse orthologs of the genes mutated in human Hermansky-Pudlak syndrome types 5 and 6. Nature Genet. 33: 145-154, 2003. )

Very brief summary of the variant

The variant contains a missense mutation at nucleotide position 3293 that converts cytosine to thymine, which causes an amino acid change from threonine to isoleucine at amino acid position 1098. It is classified as a single nucleotide variant and has been identified as a pathogenic variant by GeneReviews in 2012; however, the review status and origin are not provided.

SNP analysis of one Dai Chinese female in 1000 Genome Project

Introduction:

First, we mapped reads to the reference with BWA-MEM. BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. Among its algorithms, BWA-MEM is generally recommended for high-quality queries because it is faster and more accurate. SAMtools then manipulates alignments in the SAM format, including sorting, merging, indexing, and generating alignments in a per-position format. Finally, BCFtools manipulates variant calls in the Variant Call Format (VCF).

ANNOVAR was developed to fill the gap between the generation of massive sequencing data and the ability to fully exploit their biological content. It is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes with user-specified genome builds (including human genome hg18, hg19, and hg38, as well as mouse, worm, fly, yeast, and many others). The web-based wANNOVAR addresses the critical need for functional annotation of genetic variants from personal genomes. It lets users select gene definition systems, produces additional annotation results, including predicted functional importance scores for non-synonymous variants, and builds in a ‘variant reduction’ pipeline to facilitate identifying potential disease-causing variants and genes from personal genomes.

We performed our analysis on the exome sequencing data of a female from the Chinese Dai population (1000G ID: HG00759). The FASTQ files were obtained from the FTP site and mapped to the hg38 reference. A brief profile of the sample is shown below:

Gender Female
Population Dai Chinese
Biosample ID SAME125020
Sample ID HG00759

Table 1. Sample Profile.


PIPELINE

SNP calling was then performed with SAMtools (Li H, 2011). The SNP calls were further annotated using the online tool wANNOVAR (Wang K, 2010; Chang X, 2012) with custom filters: MAF = 0.01 and missense/nonsense/splicing variants only.

From the list of annotated SNPs, we then selected nonsynonymous SNPs that were annotated as pathogenic and recorded in the dbSNP database to obtain the gene list.
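The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the wANNOVAR implementation: the column names are the ones shown in the variant tables of this report, the third example row is hypothetical, and the exact matching rules (for example, case-insensitive matching of the ClinVar label) are our own choice.

```python
def select_candidates(rows):
    """Keep nonsynonymous SNVs that have a dbSNP ID and a pathogenic ClinVar label."""
    selected = []
    for row in rows:
        is_nonsyn = row.get("ExonicFunc.refgene", "") == "nonsynonymous SNV"
        has_rsid = row.get("dbSNP", "").startswith("rs")
        # The report shows both "pathogenic" and "Pathogenic", so compare in lower case.
        is_pathogenic = row.get("ClinVar_SIG", "").lower() == "pathogenic"
        if is_nonsyn and has_rsid and is_pathogenic:
            selected.append(row)
    return selected


rows = [
    {"Gene.refgene": "EDAR", "dbSNP": "rs3827760",
     "ExonicFunc.refgene": "nonsynonymous SNV", "ClinVar_SIG": "pathogenic"},
    {"Gene.refgene": "OPTN", "dbSNP": "rs75654767",
     "ExonicFunc.refgene": "nonsynonymous SNV", "ClinVar_SIG": "Pathogenic"},
    # Hypothetical row (not from the report) that should be filtered away.
    {"Gene.refgene": "GENE_X", "dbSNP": ".",
     "ExonicFunc.refgene": "synonymous SNV", "ClinVar_SIG": "."},
]

genes = [row["Gene.refgene"] for row in select_candidates(rows)]
print(genes)
```

In practice the rows would be read from the annotation file exported by wANNOVAR rather than typed in by hand.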

The calling scripts were as follows:

bwa mem reference.fa readset1.fq readset2.fq > bwamem.sam

samtools fixmate -O bam bwamem.sam bwamemfixmate.bam

samtools sort -O bam -o sorted.bam -T /tmp/temp bwamemfixmate.bam

samtools index sorted.bam

samtools mpileup -ugf reference.fa sorted.bam | bcftools call -vmO z -o HG00759.vcf.gz
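Because each step consumes the file written by the previous one, it can help to generate the command lines from a single set of file names. The sketch below only assembles and prints the commands listed above; it does not run them. The function name and the file-naming scheme are hypothetical, and the tools and flags are copied unchanged from the commands above.

```python
def build_pipeline(reference, reads1, reads2, sample):
    """Return the mapping and calling command lines, in execution order."""
    sam = f"{sample}.sam"
    fixmate_bam = f"{sample}.fixmate.bam"
    sorted_bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf.gz"
    return [
        # Map paired reads to the reference.
        f"bwa mem {reference} {reads1} {reads2} > {sam}",
        # Clean up mate information and convert to BAM.
        f"samtools fixmate -O bam {sam} {fixmate_bam}",
        # Sort by coordinate, then index the sorted file.
        f"samtools sort -O bam -o {sorted_bam} -T /tmp/temp {fixmate_bam}",
        f"samtools index {sorted_bam}",
        # Pile up reads and call variants into a compressed VCF.
        f"samtools mpileup -ugf {reference} {sorted_bam} | bcftools call -vmO z -o {vcf}",
    ]


commands = build_pipeline("reference.fa", "readset1.fq", "readset2.fq", "HG00759")
for command in commands:
    print(command)
```

Deriving every intermediate name from the sample ID avoids mismatches between the output of one step and the input of the next.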


Figure 1. Analysis pipeline


RESULTS

The pipeline described above yielded 40 SNPs after filtering. Each group member then examined one variant in detail, as shown below:

Group Member Gene dbSNP Location Transcript site Change
Cheng Chen SLC45A2 rs26722 Chr5:33963765 NM_001012509:exon3 Nucleotide:G814A Protein:E272K
Chen Lin EDAR rs3827760 Chr2:108897145 NM_022336:exon12 Nucleotide:T1109C Protein:V370A
Junke Wang OPTN rs75654767 Chr10:13136766 NM_021980:exon14 Nucleotide:G1634A Protein:R545Q
Yunfei Xia CCR2 rs1799864 Chr3:46357717 NM_001123041:exon2 Nucleotide:G190A Protein:V64I
Sujun Zhao FCGR3B rs2290834 Chr1:161629781 NM_001244753:exon3 Nucleotide:A424G Protein:I142V
Xinrui Zhou SLC24A5 rs1426654 Chr15:48134287 NM_205850:exon3 Nucleotide:A331G Protein:T111A

Table 2. Variants chosen for further analysis

Variant Analysis


1. SLC45A2(rs26722): Cheng Chen

VARIANT INFO:

Chr Chr5
Start 33963765
End 33963765
Change C->T
dbSNP rs26722
Gene.refgene SLC45A2
ExonicFunc.refgene nonsynonymous SNV
ClinVar_SIG pathogenic
ClinVar_DIS Skin/hair/eye_pigmentation\x2c_variation_in\x2c_5
ClinVar_STATUS Single
ClinVar_ID RCV000004762.2
ClinVar_DB MedGen OMIM
ClinVar_DBID C2673584 227240

Table 3. Brief information of chosen variant from wANNOVAR results.

Alternative Splice Variants: 1 alternative splice sequence

Transcript ID Exon Nucleotide change AA change
NM_001012509 3 814 G->A 272 E->K

Table 4. Alternative splice transcripts related to rs26722

Prevalence: The frequency of this variant is high in American, East Asian, and South Asian populations, but low in African and European populations.

ALL African American East Asia Europe South Asian
0.18 0.044 0.31 0.39 0.024 0.21

Table 5. Allele frequency of the variant in the 1000 Genomes database

PUBLISHED LITERATURE

Graf, Justin, Richard Hodgson, and Angela Van Daal. “Single nucleotide polymorphisms in the MATP gene are associated with normal human pigmentation variation.” Human mutation 25.3 (2005): 278-284.

3-D STRUCTURE

The mutated position did not play an important role in the secondary structure, so it is not shown in the figure.

Figure 2. Predicted 3-D structure of original protein and changed protein in Jmol.

 

VARIANT SUMMARY

Name Skin/hair/eye_pigmentation\x2c_variation_in\x2c_5
Synonyms Skin/hair/eye_pigmentation
Identifiers MedGen: C2673584; OMIM: 227240

A significant increase in 272Lys allele frequency was observed in the Asian and African-American populations compared to Caucasians. Australian Aborigines did not show any significant difference in 272Lys allele frequency when compared to Caucasians despite their obviously different pigmentation phenotype. The variation in allele frequencies for p.Glu272Lys was also highly statistically significant (P ≤ 0.0001). The p.Glu272Lys and p.Phe374Leu polymorphisms result in significant amino acid substitutions and could potentially alter the function of MATP. The p.Glu272Lys polymorphism causes a glutamate to lysine substitution in the third cytoplasmic loop, while the phenylalanine to leucine substitution occurs in the eighth transmembrane domain of MATP (Graf et al., 2005).


2. EDAR(rs3827760) – Chen Lin

 VARIANT INFO:

Summary:

Chr chr2
Start 108897145
End 108897145
Change A->G
dbSNP rs3827760
Gene.refgene EDAR
ExonicFunc.refgene nonsynonymous SNV
ClinVar_SIG pathogenic
ClinVar_DIS Hair_morphology_1\x2c_hair_thickness
ClinVar_STATUS single
ClinVar_ID RCV000006216.1
ClinVar_DB MedGen OMIM Orphanet
ClinVar_DBID .-

Table 6. Brief information of chosen variant from wANNOVAR results.

Alternative Splice Variants: There is 1 alternative splice sequence shown in table below:

Transcript ID Exon Nucleotide change AA change
NM_022336 12 1109 T -> C 370 V -> A

Table 7. Alternative splice transcripts related to rs3827760.

Prevalence: The frequency of this variant is extremely high in East Asian populations and quite low in African, European, and South Asian populations. The prevalence is shown in the table below:

ALL African American East Asia Europe South Asian
0.24 0.003 0.39 0.87 0.011 0.013

Table 8. Allele frequency of the variant in the 1000 Genomes database

PUBLISHED LITERATURE

  1. Fujimoto, Akihiro, et al. “A replication study confirmed the EDAR gene to be a major contributor to population differentiation regarding head hair thickness in Asia.” Human genetics 124.2 (2008): 179-185.
  2. Fujimoto, Akihiro, et al. “A scan for genetic determinants of human hair morphology: EDAR is associated with Asian hair thickness.” Human molecular genetics 17.6 (2008): 835-843.
  3. Mou, Chunyan, et al. “Enhanced ectodysplasin‐A receptor (EDAR) signaling alters multiple fiber characteristics to produce the East Asian hair form.” Human mutation 29.12 (2008): 1405-1411.

3-D STRUCTURE

3-D structure prediction shows that the variant caused a significant change in the 3-D structure of the EDAR protein.

Figure 3. Predicted 3-D structure of original protein and changed protein in Jmol.

(Blue: original protein; Yellow: protein variant 370V->A; Red: 370 V&A)

CLINICAL OBSERVATION SUMMARY:

Name Hair morphology 1, hair thickness
Synonyms .

The variant is pathogenic because it changes the 3D structure of the Homo sapiens ectodysplasin A receptor (EDAR) protein (as shown in the figure above).

By analyzing data from the International HapMap Project, Fujimoto et al. (2008) found that the Asian-specific non-synonymous single nucleotide polymorphism (1540T/C, 370Val/Ala) was associated with Asian hair thickness. Furthermore, they found that the Asian-specific 1540C allele was associated with an increase in hair thickness. Reporter gene assays suggested that 1540T/C affects the activity of the downstream transcription factor NF-κB.

By elevating the EDAR activity in transgenic mice, Mou C et al. (2008) found the hair phenotype of the mice changed. The thick hair fibers were produced by enlarged hair follicles. Their work showed that the multiple differences in hair form between East Asian and other human populations can be explained by the simplest of genetic alterations.

 


3. OPTN(rs75654767) – Junke Wang

 VARIANT INFO:

Summary:

Chr chr10
Start 13136766
End 13136766
Change G -> A
dbSNP rs75654767
Gene.refgene OPTN
ExonicFunc.refgene nonsynonymous SNV
ClinVar_SIG Pathogenic
ClinVar_DIS Glaucoma_1\x2c_open_angle\x2c_e
ClinVar_STATUS Single
ClinVar_ID RCV000007515.1
ClinVar_DB MedGen OMIM Orphanet
ClinVar_DBID C1842026

Table 9. Brief information of chosen variant from wANNOVAR results.

 Alternative Splice Variants: There are 4 alternative splice sequences shown in table below:

Transcript ID Exon Nucleotide change AA change
NM_021980 14 1634 G -> A 545 R -> Q
NM_001008212 15 1634 G -> A 545 R -> Q
NM_001008211 16 1634 G -> A 545 R -> Q
NM_001008213 16 1634 G -> A 545 R -> Q

Table 10. Alternative splice transcripts related to rs75654767.

Prevalence: The target variant is present only in Asian populations (East and South Asian), which is consistent with reports that this variant was first found in Japanese individuals. Its worldwide frequency is extremely low. The prevalence is shown in the table below (a dash marks populations in which the variant was not observed):

ALL African American East Asia Europe South Asian
0.0076 - - 0.035 - 0.0031

Table 11. Allele frequency of the variant in the 1000 Genomes database

PUBLISHED LITERATURE

  1. Maruyama, Hirofumi, et al. “Mutations of optineurin in amyotrophic lateral sclerosis.” Nature 465.7295 (2010): 223-226.
  2. Pottier, Cyril, et al. “Whole-genome sequencing reveals important role for TBK1 and OPTN mutations in frontotemporal lobar degeneration without motor neuron disease.” Acta neuropathologica 130.1 (2015): 77-92.

3-D STRUCTURE:

The predicted 3-D structures of the original and mutated proteins show no difference from each other. Furthermore, the changed residue is not represented in the 3-D structure. This may suggest that the variant does not strongly influence the protein structure, and thus may not have a functional effect.

Figure 4. Predicted 3-D structure of original protein and changed protein in Jmol.

(Blue: original protein; Yellow: protein variant 1634 G -> A)

CLINICAL OBSERVATION SUMMARY: 

Name Amyotrophic Lateral Sclerosis
Synonyms lose the ability to speak, eat, move and breathe

Maruyama et al. (2010) identified 2 different homozygous null mutations in the OPTN gene, a deletion of exon 5 (602432.0005) and a nonsense mutation (602432.0006), in 4 Japanese individuals with autosomal recessive amyotrophic lateral sclerosis-12 (ALS12; 613435). These mutations were not identified in over 6,800 individuals with glaucoma. In addition, Maruyama et al. (2010) identified a missense mutation, E478G (602432.0007), segregating as an apparently autosomal dominant mutation with incomplete penetrance in 2 families. This mutation was not seen in a total of 5,000 Japanese chromosomes. In cell transfection assays, Maruyama et al. (2010) observed that nonsense and missense mutations of OPTN abolished the inhibition of activation of nuclear factor kappa-B (NFKB; see 164011) and that E478G mutant OPTN had a cytoplasmic distribution different from that of wildtype OPTN or OPTN carrying a mutation causing primary open-angle glaucoma (POAG). A patient with the E478G mutation showed OPTN-immunoreactive cytoplasmic inclusions. Furthermore, TDP43 (605078)- or SOD1 (147450)-positive inclusions in sporadic and familial cases of ALS were also noticeably immunolabeled by anti-OPTN antibodies.

In a deceased patient (case A) with frontotemporal dementia (FTD; see 600274), Pottier et al. (2015) identified compound heterozygous mutations in the OPTN gene (Q235X and A481V). The patient had no obvious features of motor neuron disease. The mutations, which were found by whole-genome sequencing, were filtered against the dbSNP (build 137), 1000 Genomes Project, and Exome Sequencing Project databases. OPTN protein levels were dramatically reduced in patient cerebellum, consistent with a loss of function. Levels of TBK1 (604834) were also decreased compared to controls.

 


4. CCR2(rs1799864) – Yunfei Xia

 VARIANT INFO:

Summary:

Chr chr3
Start 46357717
End 46357717
Change G -> A
dbSNP rs1799864
Gene.refgene CCR2
ExonicFunc.refgene nonsynonymous SNV
ClinVar_SIG pathogenic
ClinVar_DIS Congenital_human_immunodeficiency_virus
ClinVar_STATUS single
ClinVar_ID RCV000008756.1
ClinVar_DB MedGen OMIM
ClinVar_DBID C1836230 609423

Table 11. Brief information of chosen variant from wANNOVAR results.

 Alternative Splice Variants: There are 2 alternative splice sequences shown in the table below:

Transcript ID Exon Nucleotide change AA change
NM_001123041 2 190 G -> A 64 V -> I
NM_001123396 2 190 G -> A 64 V -> I

Table 12. Alternative splice transcripts related to rs1799864.

Prevalence: This variant is relatively more frequent in African, American, and East Asian populations and much less frequent in European and South Asian populations. The CCR2-64I alteration was common in all ethnic groups, with the following allele frequencies: 0.098 in Caucasians (n = 1847 individuals); 0.151 in African Americans (n = 899); 0.172 in Hispanics (n = 207); and 0.250 in Asians (n = 40).

The prevalence is shown in the table below:

ALL African American East Asia Europe South Asia
0.15 0.17 0.21 0.21 0.086 0.098

Table 13. Allele frequency of the variant in the 1000 Genomes database
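The cohort frequencies quoted in the prevalence paragraph come from groups of very different sizes, so a pooled estimate has to weight each group by its sample size. The following sketch is our own illustration of that arithmetic; the frequencies and sample sizes are the ones quoted above, and the function name is hypothetical.

```python
# (allele frequency, number of individuals) as quoted in the prevalence paragraph
groups = {
    "Caucasians": (0.098, 1847),
    "African Americans": (0.151, 899),
    "Hispanics": (0.172, 207),
    "Asians": (0.250, 40),
}


def pooled_frequency(groups):
    """Sample-size-weighted mean allele frequency across groups."""
    total_individuals = sum(n for _, n in groups.values())
    weighted_sum = sum(freq * n for freq, n in groups.values())
    return weighted_sum / total_individuals


pooled = pooled_frequency(groups)
print(f"pooled CCR2-64I allele frequency: {pooled:.3f}")
```

Because each individual contributes two alleles in every group, weighting by individuals gives the same result as weighting by chromosomes. The small Asian group (n = 40) has the highest frequency but contributes little to the pooled value.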

PUBLISHED LITERATURE

  1. Smith MW, Dean M., et al. “Contrasting genetic influence of CCR2 and CCR5 variants on HIV-1 infection and disease progression. Hemophilia Growth and Development Study (HGDS), Multicenter AIDS Cohort Study (MACS), Multicenter Hemophilia Cohort Study (MHCS), San Francisco City Cohort (SFCC), ALIVE study.” Science. 1997 Aug 15;277(5328):959-65.
  2. Mummidi S, Ahuja SS., et al. “Genealogy of the CCR5 locus and chemokine system gene variants associated with altered rates of HIV-1 disease progression.” Nat Med. 1998 Jul;4(7):786-93.

3-D STRUCTURE:

A G-to-A nucleotide substitution was detected at position 190 (counting from the ATG start codon) that substitutes the CCR2-+ amino acid residue valine at position 64 to isoleucine (CCR2-64I).

Smith et al. found that this is a conservative change located within the first transmembrane domain of the CCR2 receptor. That domain has completely conserved amino acid sequence identity with CCR5, which suggests functional constraints on mutational variation.

Figure 5. Predicted 3-D structure of original protein and changed protein in Jmol.

(Blue: original protein; Yellow: protein variant 64 V -> I; Red: 64 V & I)

CLINICAL OBSERVATION SUMMARY: 

Name Congenital human immunodeficiency virus
Synonyms HUMAN IMMUNODEFICIENCY VIRUS TYPE 1, SUSCEPTIBILITY TO; HIV-1, SUSCEPTIBILITY TO; HUMAN IMMUNODEFICIENCY VIRUS TYPE 1, RESISTANCE TO

Identifiers MedGen: C1836230; OMIM: 609423
Age NA
Prevalence NA

In a comparison of the genetic influence of CCR2 and CCR5, Smith et al. (1997) described a mutation (CCR2-64I) within the first transmembrane region of the CCR2 chemokine and HIV-1 receptor gene that occurred at an allele frequency of 10 to 15 percent among Caucasians and African Americans. Genetic association analysis of five acquired immunodeficiency syndrome (AIDS) cohorts (3003 patients) revealed that although CCR2-64I exerts no influence on the incidence of HIV-1 infection, HIV-1-infected individuals carrying the CCR2-64I allele progressed to AIDS 2 to 4 years later than individuals homozygous for the common allele. An estimated 38 to 45 percent of AIDS patients whose disease progresses rapidly (less than 3 years until onset of AIDS symptoms after HIV-1 exposure) can be attributed to their CCR2-+/+ or CCR5-+/+ genotype, whereas the survival of 28 to 29 percent of long-term survivors, who avoid AIDS for 16 years or more, can be explained by a mutant genotype for CCR2 or CCR5.

In an analysis of the genealogy of the CCR5 locus and chemokine system gene variants, Mummidi et al. (1998) reported that allelic variants for the HIV-1 co-receptors chemokine receptor 5 (CCR5) and CCR2, as well as the ligand for the co-receptor CXCR4, stromal-derived factor (SDF-1), have been associated with a delay in disease progression. The disease-retarding effects of the CCR2-64I allele were found in African Americans but not in Caucasians, and the SDF1-3’A/3’A genotype was associated with an accelerated progression to death. In contrast, the CCR5-delta32 allele and a CCR5 promoter mutation with which it is tightly linked were associated with limited disease-retarding effects. Collectively, these findings draw attention to a complex array of genetic determinants in the HIV-host interplay.

 


5. FCGR3B(rs2290834)-Sujun Zhao

 VARIANT INFO:

Chr chr1
Start 161629781
End 161629781
Change T->C
dbSNP rs2290834
Gene.refgene FCGR3B
ExonicFunc.refgene nonsynonymous SNV
AAChange.refgene FCGR3B:NM_001244753:exon3:c.A424G:p.I142V,FCGR3B:NM_001271035:exon3:c.A421G:p.I141V,FCGR3B:NM_001271036:exon3:c.A265G:p.I89V,FCGR3B:NM_001271037:exon3:c.A265G:p.I89V,FCGR3B:NM_000570:exon4:c.A316G:p.I106V

ClinVar_SIG pathogenic
ClinVar_DIS Neutrophil-specific_antigens_na1/na2
ClinVar_STATUS single
ClinVar_ID RCV000030607.1
ClinVar_DB
ClinVar_DBID

Table 14. Brief information of chosen variant from wANNOVAR results.

Alternative Splice Variants:

There are 5 alternative splice sequences shown in the table below:

Transcript ID Exon Nucleotide change AA change
NM_001244753 3 424 A->G 142 I->V
NM_001271035 3 421 A->G 141 I->V
NM_001271036 3 265 A->G 89 I->V
NM_001271037 3 265 A->G 89 I->V
NM_000570 4 316 A->G 106 I->V

Table 15. Alternative splice transcripts related to rs2290834.
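The transcript table can be cross-checked against the AAChange.refgene string, because the codon number implied by a coding position is that position divided by three and rounded up. The sketch below is a minimal illustration, assuming the colon-separated `gene:transcript:exon:c.change:p.change` layout shown in Table 14 and coding positions counted from the start codon; the function names are hypothetical.

```python
import re


def parse_aachange(entry):
    """Split one gene:transcript:exon:c.X:p.Y entry into its fields."""
    gene, transcript, exon, cdna, protein = entry.split(":")
    c_match = re.fullmatch(r"c\.([ACGT])(\d+)([ACGT])", cdna)
    p_match = re.fullmatch(r"p\.([A-Z])(\d+)([A-Z])", protein)
    if c_match is None or p_match is None:
        raise ValueError(f"unsupported AAChange entry: {entry}")
    return {
        "gene": gene,
        "transcript": transcript,
        "exon": exon,
        "cdna_pos": int(c_match.group(2)),
        "aa_pos": int(p_match.group(2)),
    }


def expected_codon(cdna_pos):
    """Codon number containing a 1-based coding position (ceiling of pos / 3)."""
    return (cdna_pos + 2) // 3


aachange = (
    "FCGR3B:NM_001244753:exon3:c.A424G:p.I142V,"
    "FCGR3B:NM_001271035:exon3:c.A421G:p.I141V,"
    "FCGR3B:NM_001271036:exon3:c.A265G:p.I89V,"
    "FCGR3B:NM_001271037:exon3:c.A265G:p.I89V,"
    "FCGR3B:NM_000570:exon4:c.A316G:p.I106V"
)

records = [parse_aachange(entry) for entry in aachange.split(",") if entry]
for record in records:
    ok = expected_codon(record["cdna_pos"]) == record["aa_pos"]
    print(record["transcript"], record["cdna_pos"], record["aa_pos"], "consistent" if ok else "MISMATCH")
```

A mismatch flagged by this check usually points to a transcription error in the table rather than a problem with the variant call.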

Prevalence: The frequency of this variant is relatively high in American, East Asian, and South Asian populations, and relatively low in African and European populations. The prevalence is shown in the table below:

All African American East Asian European South Asian
0.3374 0.2413 0.4937 0.5589 0.29 0.3962

Table 16. Allele frequency of the variant in the 1000 Genomes database

PUBLISHED LITERATURE

  1. Ory, P., Clark, M., Kwoh, E., Clarkson, S. and Goldstein, I. (1989). Sequences of complementary DNAs that encode the NA1 and NA2 forms of Fc receptor III on human neutrophils. Journal of Clinical Investigation, 84(5), pp.1688-1691.
  2. Pradhan, V., Deshpande, N., Nadkarni, A., Patwardhan, M., Surve, P. and Ghosh, K. (2010). Fc γ R IIIB polymorphisms: their association with clinical manifestations and autoantibodies in SLE patients from Western India. International Journal of Rheumatic Diseases, 13(2), pp.138-143.

 3-D STRUCTURE

3-D structure prediction shows that the variant in the protein didn’t cause significant change in the 3-D structure of FCGR3B protein.


Figure 6. Predicted 3-D structure of original protein and changed protein in Jmol.

(Blue: original protein; Yellow: protein variant 142 I->V; Red: 142 I&V)

CLINICAL OBSERVATION SUMMARY

Name Prototype autoimmune disease like SLE
Synonyms Severe fatigue, joint swelling, joint pain, a rash on nose and face, hair loss, anemia, blood-clotting problems, Raynaud phenomenon

Two polymorphic forms of Fc receptor III (FcR III) are expressed on human neutrophils.

These differ with respect to their apparent molecular masses after digestion with N-glycanase, and with respect to their reactivity with MAb Gran 11 and alloantisera which recognize determinants (NA1 and NA2) of the biallelic neutrophil antigen (NA) system.

The percentage distribution of NA1/NA1, NA1/NA2 and NA2/NA2 was 22.5%, 40% and 37.5%, respectively, among the normal population; and among SLE patients it was 25%, 40% and 35%, respectively. The percentage distribution of SH allele was 68.8% among the normal population, while in SLE patients it was 60%. No statistical difference was found in the distribution of Fc γ R IIIB genotypes in patients of lupus nephritis and SLE without nephritis (P > 0.05).

Among the SLE patients studied, NA2 was the predominant allele. It was commonly associated with clinical manifestations such as skin rash, arthritis, and hematological and immunological disorders. This suggests that the Fc γ R IIIB NA2 allele is likely involved in susceptibility to SLE.
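The genotype distributions quoted above can be converted to allele frequencies by counting each homozygote fully and each heterozygote as half. The sketch below applies that arithmetic to the NA1/NA2 percentages reported for the normal population and for SLE patients; it is an illustration derived only from those quoted figures, and the function name is hypothetical.

```python
def allele_frequencies(hom_first, het, hom_second):
    """Allele frequencies from genotype percentages (or counts) of a biallelic locus."""
    total = hom_first + het + hom_second
    first = (hom_first + het / 2) / total
    return first, 1 - first


# NA1/NA1, NA1/NA2, NA2/NA2 percentages quoted in the text above
controls_na1, controls_na2 = allele_frequencies(22.5, 40.0, 37.5)
patients_na1, patients_na2 = allele_frequencies(25.0, 40.0, 35.0)

print(f"controls: NA1 = {controls_na1:.3f}, NA2 = {controls_na2:.3f}")
print(f"patients: NA1 = {patients_na1:.3f}, NA2 = {patients_na2:.3f}")
```

On these figures NA2 is the more common allele in both groups, and its frequency is similar in patients and controls, which is in line with the reported absence of a statistically significant difference in genotype distribution.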

 


6. SLC24A5(rs1426654) – Xinrui Zhou

 VARIANT INFO:

Summary:

Chr chr15
Start 48134287
End 48134287
Change A -> G
dbSNP rs1426654
Gene.refgene SLC24A5
ExonicFunc.refgene nonsynonymous SNV
ClinVar_SIG pathogenic
ClinVar_DIS Skin/hair/eye pigmentation
ClinVar_STATUS single
ClinVar_ID RCV000001552.2
ClinVar_DB MedGen OMIM Orphanet
ClinVar_DBID C2676042 113750 370097

Table 17. Brief information of chosen variant from wANNOVAR results.

 Alternative Splice Variants: There is 1 alternative splice sequence shown in table below:

Transcript ID Exon Nucleotide change AA change
NM_205850 3 331 A -> G 111 T -> A

Table 18. Alternative splice transcripts related to rs1426654.

Prevalence: The frequency of this variant is extremely high in African and East Asian populations and quite low in European populations, which is consistent with observed differences in skin pigmentation among human populations. The prevalence is shown in the table below:

ALL African American East Asia Europe South Asian
0.56 0.93 0.41 0.99 0.003 0.31

Table 19. Allele frequency of the variant in the 1000 Genomes database

PUBLISHED LITERATURE

  1. Lamason, Rebecca L., et al. “SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans.” Science 310.5755 (2005): 1782-1786.
  2. Stokowski, Renee P., et al. “A genomewide association study of skin pigmentation in a South Asian population.” The American Journal of Human Genetics 81.6 (2007): 1119-1132.

3-D STRUCTURE:

3-D structure prediction shows that the variant did not cause a significant change in the 3-D structure of the SLC24A5 protein.

Figure 7. Predicted 3-D structure of original protein and changed protein in Jmol.

(Blue: original protein; Yellow: protein variant 111 T -> A; Red: 111 T & A)

CLINICAL OBSERVATION SUMMARY: 

Name Skin/hair/eye pigmentation, variation in, 4 (SHEP4)
Synonyms SKIN/HAIR/EYE PIGMENTATION 4, FAIR/DARK SKIN
Identifiers MedGen: C2676042; Orphanet: 370097; OMIM: 113750
Age Of onset Infancy Neonatal
Prevalence < 1 / 1 000 000

In a genomewide association study of skin pigmentation variation using 1,620,742 SNPs in a population of 737 individuals of South Asian ancestry living in the United Kingdom, Stokowski et al. (2007) found that the SLC24A5 SNP rs1426654, representing the nonsynonymous change A111T, was the primary associated SNP, with the largest effect on the melanin contribution to skin pigmentation as measured by skin reflectance spectrophotometry. The association was replicated in a second independent cohort of 235 individuals. Stokowski et al. (2007) noted that rs1426654 is in strong linkage disequilibrium with the most significant SNP from the genomewide scan, rs1834640, a G-to-A change located 21 kb from SLC24A5, indicating that the 2 probably represent a single associated locus.

Lamason et al. (2005) reported that, in 98.7 to 100% of European American population samples, a G-to-A transition at amino acid 111 in exon 3 of the SLC24A5 gene (rs1426654) results in an alanine-to-threonine substitution. The threonine is associated with lighter skin pigmentation (113750) among European-Americans and among admixed African Americans and African Caribbeans.

 


DISCUSSION

Our pilot analysis of this individual’s exome revealed several putatively deleterious variants and their related genes. However, on further study of each variant, we either did not find much information associating the affected gene with a specific disease or could not observe a significant change in the 3-D protein structure when the variant was present. Experimental records are also so limited that we cannot conclude which genes are clinically pathogenic.

All of the above suggests that exome analysis is only a starting point for a GWAS study, one that can give researchers looking for therapeutic targets a hint about where to start. But to answer the big question that everyone is curious about, “specifically how a change in the exome can cause fatal disease,” we still have a long way to go and many difficulties to overcome.


Reference

Fujimoto, Akihiro, et al. “A scan for genetic determinants of human hair morphology: EDAR is associated with Asian hair thickness.” Human molecular genetics 17.6 (2008): 835-843.

Mou, Chunyan, et al. “Enhanced ectodysplasin‐A receptor (EDAR) signaling alters multiple fiber characteristics to produce the East Asian hair form.” Human mutation 29.12 (2008): 1405-1411.

Maruyama, Hirofumi, et al. “Mutations of optineurin in amyotrophic lateral sclerosis.” Nature 465.7295 (2010): 223-226.

Pottier, Cyril, et al. “Whole-genome sequencing reveals important role for TBK1 and OPTN mutations in frontotemporal lobar degeneration without motor neuron disease.” Acta neuropathologica 130.1 (2015): 77-92.

Smith MW, Dean M., et al. “Contrasting genetic influence of CCR2 and CCR5 variants on HIV-1 infection and disease progression. Hemophilia Growth and Development Study (HGDS), Multicenter AIDS Cohort Study (MACS), Multicenter Hemophilia Cohort Study (MHCS), San Francisco City Cohort (SFCC), ALIVE study.” Science. 1997 Aug 15;277(5328):959-65.

Mummidi S, Ahuja SS., et al. “Genealogy of the CCR5 locus and chemokine system gene variants associated with altered rates of HIV-1 disease progression.” Nat Med. 1998 Jul;4(7):786-93.

Bio-bwa.sourceforge.net. (2016). Burrows-Wheeler Aligner. [online] Available at: http://bio-bwa.sourceforge.net/ [Accessed 9 Dec. 2016].

Samtools.sourceforge.net. (2016). SAMtools. [online] Available at: http://samtools.sourceforge.net/ [Accessed 9 Dec. 2016].

Htslib.org. (2016). bcftools. [online] Available at: http://www.htslib.org/doc/bcftools-1.0.html [Accessed 9 Dec. 2016].

Kai Wang, e. (2016). ANNOVAR Documentation. [online] Annovar.openbioinformatics.org. Available at: http://annovar.openbioinformatics.org/en/latest/ [Accessed 9 Dec. 2016].

Chang, X. and Wang, K. (2012). wANNOVAR: annotating genetic variants for personal genomes via the web. Journal of Medical Genetics, 49(7), pp.433-436.

Lamason, Rebecca L., et al. “SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans.” Science 310.5755 (2005): 1782-1786.

Stokowski, Renee P., et al. “A genomewide association study of skin pigmentation in a South Asian population.” The American Journal of Human Genetics 81.6 (2007): 1119-1132.

 

 

Pathogenic Variant Analysis of hu1FD496

Introduction

The purpose of the project was to identify and characterize potentially pathogenic variants found within a specific individual’s genome from the Personal Genome Project (PGP) database. Individual hu1FD496 was a Caucasian male with multiple disease conditions. Five variants likely to be pathogenic, with high CADD_phred scores (>30) filtered from the outputs of the programs ANNOVAR and CADD, were subjected to in-depth analysis to further understand how these variants affected the sample individual.

Table 1. Participant demographic information. Patient hu1FD496 is a white male.

Participant ID hu1FD496
Date of Birth 1959-06-23 (57 years old)
Gender Male
Weight 180lbs (82kg)
Height 6ft (182cm)
Race White

Table 2. Participant condition information. Patient hu1FD496 is affected by various disease conditions.

Name Start Date End Date
Dupuytren’s contracture 2000-01-01
Gout 1987-01-01
Peyronie’s Disease 1997-01-01 2000-01-01
Spondylosis without Spondylolisthesis 1975-01-01
Strabismus 1977-01-01 2009-01-01
Tendonitis 1998-01-01

Methods

The variant call format (VCF) file of individual hu1FD496 was retrieved from https://my.pgp-hms.org/profile/hu1FD496. ANNOVAR was used to functionally annotate the retrieved variants. CADD analysis confirmed the ANNOVAR functional annotation, and variants with CADD_phred scores greater than 30 were retained. Altered amino acid sequences were mapped to 3D structures via SWISS-MODEL. Variant function and disease associations were researched via ClinVar, OMIM, and other databases.

Figure 1. Variant analyses workflow. Potentially pathogenic and pathogenic variants of individual hu1FD496 were identified using the programs ANNOVAR and CADD.
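To make the CADD_phred threshold concrete: CADD reports PHRED-scaled ranks, so a score of 10 corresponds to the top 10% of CADD-ranked possible substitutions, 20 to the top 1%, and 30 to the top 0.1%. The short sketch below converts a score into that top fraction; the function name is our own, and the scores used are the threshold and two values from Table 3.

```python
def top_fraction(cadd_phred):
    """Fraction of CADD-ranked substitutions scoring at least this PHRED-scaled value."""
    return 10 ** (-cadd_phred / 10)


for score in (30, 40, 49):
    print(f"CADD_phred {score}: top {top_fraction(score):.4%} of ranked substitutions")
```

The scale is logarithmic, so each additional 10 points narrows the selection by a further factor of ten. The score ranks predicted deleteriousness; it is not a population allele frequency.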

Results

By analyzing the VCF file of individual hu1FD496 using ANNOVAR and CADD, we identified the top 20 pathogenic and potentially pathogenic variants, listed below. These are assumed to be pathogenic or potentially pathogenic based on their high CADD_phred scores (>30). The patient’s compound heterozygosity was then explored: patient hu1FD496 is a compound heterozygote for 13 protein-coding genes. Five variants are analyzed further below.

Table 3. Table of top 20 pathogenic and potentially pathogenic variants. ANNOVAR-scored variants are ranked from highest CADD_phred score to lowest score.

Chr Position Ref/Alt Gene.refGene CADD_phred
chr9 42712595 G/T GXYLT1P3,FOXD4L4 53
chr9 42714027 C/T GXYLT1P3,FOXD4L4 52
chr1 144852390 C/T PDE4DIP 49
chr2 96525771 A/C ANKRD36C 49
chrX 70361131 C/T MED12 45
chr7 64438667 G/A ZNF117 43
chr11 1017041 G/T MUC6 42
chr11 6555244 G/T DNHD1 42
chr19 57089408 T/G ZNF470 42
chr8 19819724 C/G LPL 42
chr9 69201291 T/A FOXD4L6 42
chr9 70918189 A/T FOXD4L3 42
chrX 53654463 C/A HUWE1 42
chr2 233414014 C/T TIGD1 41
chrX 123200067 C/A STAG2 41
chr11 62848487 A/C SLC22A24 40
chr1 54262628 G/T NDC1 40
chr1 144915561 G/A PDE4DIP 40
chr2 97909676 T/G ANKRD36 40
chr7 139077317 T/A C7orf55-LUC7L2,LUC7L2 40
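The thresholding and ranking behind Table 3 can be sketched as follows. This is an illustration of the filtering logic only, not the scripts actually used: the first four rows are taken from Table 3, the last row is a hypothetical low-scoring variant added to show the filter at work, and the function name is our own.

```python
# (chromosome, position, gene, CADD_phred)
variants = [
    ("chr1", 144852390, "PDE4DIP", 49),
    ("chr7", 64438667, "ZNF117", 43),
    ("chr9", 42712595, "GXYLT1P3,FOXD4L4", 53),
    ("chr11", 1017041, "MUC6", 42),
    ("chrN", 0, "HYPOTHETICAL", 12.5),  # hypothetical row below the threshold
]


def rank_by_cadd(variants, threshold=30.0, top_n=20):
    """Keep variants scoring above the threshold, highest score first."""
    kept = [variant for variant in variants if variant[3] > threshold]
    return sorted(kept, key=lambda variant: variant[3], reverse=True)[:top_n]


ranked = rank_by_cadd(variants)
for chrom, pos, gene, score in ranked:
    print(chrom, pos, gene, score)
```

Python's sort is stable, so variants with equal scores keep their input order.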

Table 4. Table of protein coding genes where patient hu1FD496 is a compound heterozygote. Patient hu1FD496 is a compound heterozygote for 13 protein coding genes.

Gene Number of deleterious alleles
PDE4DIP 6
RSPH10B 2
USP29 2
ECHDC3 2
SLC22A24 2
LAMA2 2
ZNF117 2
SEC31B 2
ANKRD36 2
COL12A1 2
KRT13 2
MTCH2 2
FRMPD3 2
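A table like Table 4 can be produced by counting deleterious variants per gene and keeping genes with at least two. The sketch below shows that counting step under stated assumptions: the input is a hypothetical list with one gene name per deleterious heterozygous variant, and the counts for PDE4DIP and LAMA2 mirror Table 4, while the single MUC6 entry is only there to show a gene that does not qualify.

```python
from collections import Counter


def compound_het_candidates(variant_genes, min_alleles=2):
    """Genes carrying at least `min_alleles` deleterious heterozygous variants."""
    counts = Counter(variant_genes)
    return {gene: n for gene, n in counts.items() if n >= min_alleles}


# One entry per deleterious heterozygous variant (illustrative input)
variant_genes = ["PDE4DIP"] * 6 + ["LAMA2"] * 2 + ["MUC6"]

candidates = compound_het_candidates(variant_genes)
print(candidates)
```

Counting alone identifies candidates only. Confirming compound heterozygosity also requires showing that the variants lie on different copies of the gene, for example with phased genotypes or parental data.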

In-depth Variant Analysis

In this section, we characterize 5 unique variants from Table 3: PDE4DIP, ZNF117, DNHD1, MUC6, and FOXD4L6.


PDE4DIP

The gene product, myomegalin, of the protein coding gene PDE4DIP (phosphodiesterase 4D interacting protein) anchors and localizes cAMP-dependent pathway components and phosphodiesterase 4D to the Golgi complex, centrosome, and skeletal muscle sarcomeric structures to affect intracellular signaling (Verde et al. 2000). PDE4DIP is associated with increased risk for ischemic stroke (Auer et al. 2015), a myeloproliferative disorder (MBD) associated with eosinophilia (Wilkinson et al. 2003), and peritoneal cancer (Lai et al. 2016). A nonsense mutation in PDE4DIP is very likely to be pathogenic with a CADD_phred score of 49 (Table 5). Modeling of functional and mutated PDE4DIP (Figure 2) via SWISS-MODEL shows the effects of a nonsense mutation on protein structure.

Figure 2. Functional (left) and mutated (right) PDE4DIP 3D structures. A nonsense mutation converts a tryptophan (W) to a stop codon and alters the structure of functional PDE4DIP.

Table 5. Variant PDE4DIP. Stop-gained (nonsense) mutation in PDE4DIP is likely to be pathogenic.

Variant  PDE4DIP
GeneID  9659 / ENSG00000178104
Chromosome  1
Position  144852390-144852390
Reference  C
Alternate  T
Amino acid change  W -> stop
Consequence  Stop-gain (nonsense) mutation
CADD_phred score  49
PolyPhen score  NA
SIFT_score  1
GERP++_RS  3.48
Read depth  32
Mapping quality  342.56

ZNF117

ZNF117 (zinc finger protein 117) is a protein coding gene on chromosome 7q11.2 that encodes a KRAB zinc finger protein (Li et al. 1999). Its product is thought to function as a transcription factor through sequence-specific DNA binding and zinc ion binding. The DNA sequences recognized depend on the particular zinc finger protein; in this case, cysteines are reported to be replaced by F, H, or Y (Li et al. 1999). Mutations in this gene are not well researched, but they may lead to under- or over-expression of target genes in select human tissues. As shown in Figure 3, a nonsense mutation introduces a premature stop codon in ZNF117. This premature stop codon can alter gene expression and signal transduction by changing the activity of the transcription factor at its target sites (Balasubramanian et al. 2011).

Figure 3. Functional (left) and mutated (right) ZNF117 3D structures. A nonsense mutation converts an arginine (R) to a stop codon and alters the structure of functional ZNF117.

Table 6. Variant ZNF117. Stop-gained (nonsense) mutation in ZNF117 is likely to be pathogenic.

Variant  ZNF117
GeneID  51351 / ENSG00000152926
Chromosome  7
Position  64438667-64438667
Reference  G
Alternate  A
Amino acid change  R -> stop
Consequence  Stop-gain (nonsense) mutation
CADD_phred score  43
PolyPhen score  NA
SIFT_score  0.7
GERP++_RS  -2.21
Read depth  19
Mapping quality  548.56

DNHD1

The protein coding gene DNHD1 encodes dynein heavy chain domain-containing protein 1 and is involved in regulating microtubule motor activity (microtubule-based movement). The dynein heavy chain is the longest and most compact of the three cytoskeletal motors, the others being myosin and kinesin (Asai and Wilkes 2004), and it converts chemical energy stored in ATP into mechanical work. Proteins associated with the dynein heavy chain therefore play important roles in the transport of various cellular cargos, the sorting and movement of proteins and organelles, and the generation of the forces and movements behind mitosis, especially spindle dynamics (Carminati and Stearns 1997). DNHD1 is associated with increased risk of neurodegenerative diseases, particularly those involving locomotor abnormalities and motor impairment (Braunstein et al. 2010). With a CADD_phred score of 42 (Table 7), a nonsense mutation in this gene is likely to be pathogenic. Modeling of functional and mutated DNHD1 (Figure 4) via SWISS-MODEL shows the effects of a nonsense mutation on protein structure.

Figure 4. Functional (left) and mutated (right) DNHD1 3D structures. A nonsense mutation converts a glutamic acid (E) to a stop codon and alters the structure of functional DNHD1.

Table 7. Variant DNHD1. Stop-gained (nonsense) mutation in DNHD1 is likely to be pathogenic.

Variant  DNHD1
GeneID  144132 / ENSG00000179532
Chromosome  11
Position  6555244-6555244
Reference  G
Alternate  T
Amino acid change  E -> stop
Consequence  Stop-gain (nonsense) mutation
CADD_phred score  42
PolyPhen score  NA
SIFT_score  0.18
GERP++_RS  4.23
Read depth  17
Mapping quality  206.60

MUC6

MUC6 is a protein coding gene that encodes gastric mucin, a fundamental component of cytoprotection in epithelial tissues. Gastric mucin protects the gastrointestinal tract against acids, microorganisms, and other trauma (Toribara et al. 1997). Some forms of adenocarcinoma and cystadenoma have been associated with MUC6 (Reis et al. 2000). Interest in MUC6 has grown as it has become evident that the gene offers an opportunity to modulate the composition of the lumen’s protective mucus layer, particularly in relation to acids, noxious agents, and bacteria in the lumen of the GI tract and alimentary canal (Toribara et al. 1997). Most of the extensive literature stems from studies of gastric carcinomas, to which this gene is linked (Reis et al. 2000). MUC6 may also be useful as a tumor marker for various cancers, and applications to epithelial organogenesis have been speculated.

Figure 5. Functional (left) and mutated (right) MUC6 3D structures. A nonsense mutation converts a tyrosine (Y) to a stop codon and alters the structure of functional MUC6.

Table 8. Variant MUC6. Stop-gained (nonsense) mutation in MUC6 is likely to be pathogenic.

Variant  MUC6
GeneID  4588 / ENSG00000184956
Chromosome  11
Position  1017041-1017041
Reference  G
Alternate  T
Amino acid change  Y -> stop
Consequence  Stop-gain (nonsense) mutation
CADD_phred score  42
PolyPhen score  NA
SIFT_score  1
GERP++_RS  -6.04
Read depth  74
Mapping quality  437.95

FOXD4L6

FOXD4L6 (Forkhead Box D4-Like 6) is a protein coding gene. The FOX proteins are a family of transcription factors with major roles in development and organogenesis, as well as in metabolism and the immune system. Jackson et al. (2010) catalog 50 FOX genes in the human genome, and FOXD4L6 is one of them. Its annotated functions are RNA polymerase transcription factor activity, sequence-specific DNA binding, and protein binding. Even though the FOX genes are highly conserved, mutations in them still occur. FOX genes are also differentially expressed in a large number of cancers, and FOX gene overexpression has been linked to breast cancer specifically (Katoh et al. 2013). Because these genes act in so many different contexts, they can behave as either oncogenes or tumor suppressors depending on the individual and cell type. Some drugs, especially proteasome inhibitors, have been used to target FOX genes; however, research is limited by the complex interactions and changing expression of the FOX genes over a lifetime (Jackson et al. 2010).

Figure 6. Functional (left) and mutated (right) FOXD4L6 3D structures. A nonsense mutation converts a lysine (K) to a stop codon and alters the structure of functional FOXD4L6.

Table 9. Variant FOXD4L6. Stop-gained (nonsense) mutation in FOXD4L6 is likely to be pathogenic.

Variant  FOXD4L6
GeneID  653404 / ENSG00000204793
Chromosome  9
Position  69201291-69201291
Reference  T
Alternate  A
Amino acid change  K -> stop
Consequence  Stop-gain (nonsense) mutation
CADD_phred score  42
PolyPhen score  NA
SIFT_score  1
GERP++_RS  2.2
Read depth  19
Mapping quality  NA

References

  1. A global reference for human genetic variation, The 1000 Genomes Project Consortium. 2015. Nature 526: 68-74. doi:10.1038/nature15393.
  2. Asai, D. J. and Wilkes, D. E. (2004), The Dynein Heavy Chain Family. Journal of Eukaryotic Microbiology, 51: 23–29. doi:10.1111/j.1550-7408.2004.tb00157.x
  3. Auer, P. L., M. Nalls, J. F. Meschia, B. B. Worrall, et al. 2015. Rare and Coding Region Genetic Variants Associated With Risk of Ischemic Stroke: The NHLBI Exome Sequence Project. JAMA Neurol 72: 781-788.
  4. Balasubramanian, S., L. Habegger, A. Frankish, D. G. Macarthur, R. Harte, C. Tyler-Smith, J. Harrow, and M. Gerstein. “Gene Inactivation and Its Implications for Annotation in the Era of Personal Genomics.” Genes & Development 25.1 (2011): 1-10. Web.
  5. Braunstein, K.E. et al. 2010. A point mutation in the dynein heavy chain gene leads to striatal atrophy and compromises neurite outgrowth of striatal neurons. Hum. Mol. Genet. 19, 4385-4398.
  6. Carminati, J. L., & Stearns, T. (1997, August 11). Microtubules Orient the Mitotic Spindle in Yeast through Dynein-dependent Interactions with the Cell Cortex. The Journal of Cell Biology, 138(3), 629-641.
  7. Jackson, B. C., Carpenter, C., Nebert, D. W., & Vasiliou, V. (2010). Update of human and mouse forkhead box (FOX) gene families. Human Genomics, 4(5), 345. doi:10.1186/1479-7364-4-5-345
  8. Katoh, M., Igarashi, M., Fukuda, H., Nakagama, H., & Katoh, M. (2013). Cancer genetics and genomics of human FOX family genes. Cancer Letters, 328(2), 198-206. doi:http://dx.doi.org/10.1016/j.canlet.2012.09.017
  9. Kircher, M., D. M. Witten, P. Jain, B. J. O’Roak, G. M. Cooper, and J. Shendure. 2014. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet doi: 10.1038/ng.2892.
  10. Lai, J., Z. Zhou, X. J. Tang, Z. B. Gao, et al. 2016. A tumor-specific neo-antigen caused by a frameshift mutation in BAP1 is a potential personalized biomarker in malignant peritoneal mesothelioma. Int J Mol Sci 17: pii: E739.
  11. Li, Xiang-An, Koichi Kokame, Kousaku Okubo, Kentaro Shimokado, Yoshitane Tsukamoto, Toshiyuki Miyata, Hisao Kato, and Chikao Yutani. “Cloning and Characterization of a Novel Human Gene Encoding a Zinc Finger Protein with 25 Fingers.” Biochimica Et Biophysica Acta (BBA) – Gene Structure and Expression 1489.2-3 (1999): 405-12. Web.
  12. Reis, C. A., L. David, F. Carvalho, U. Mandel, C. D. Bolos, E. Mirgorodskaya, H. Clausen, and M. Sobrinho-Simoes. “Immunohistochemical Study of the Expression of MUC6 Mucin and Co-expression of Other Secreted Mucins (MUC5AC and MUC2) in Human Gastric Carcinomas.” Journal of Histochemistry & Cytochemistry 48.3 (2000): 377-88. Web. 3 Dec. 2016.
  13. Toribara, N. W., S. B. Ho, E. Gum, J. R. Gum, P. Lau, and Y. S. Kim. “The Carboxyl-terminal Sequence of the Human Secretory Mucin, MUC6: ANALYSIS OF THE PRIMARY AMINO ACID SEQUENCE.” Journal of Biological Chemistry 272.26 (1997): 16398-6403. Web.
  14. Verde, I., G. Pahlke, M. Salanova, G. Zhang, et al. 2000. Myomegalin is a novel protein of the golgi/centrosome that interacts with cyclic nucleotide phosphodiesterase. The Journal of Biological Chemistry 276: 11189-11198.
  15. Wang K., M. Li, and H. Hakonarson. 2010. ANNOVAR: Functional annotation of genetic variants from next-generation sequencing data. Nucleic Acids Research 38: e164.
  16. Wilkinson, K., E. R. Velloso, L. F. Lopes, C. Lee, et al. 2003. Cloning of the t(1;5)(q23;q33) in a myeloproliferative disorder associated with eosinophilia: involvement of PDGFRB and response to imatinib.

Analysis of Pathogenic Variants in Exome of Sierra Leonean Male

Group Members

Cristian Crisan, Alli Gombolay, Youngkyu Jeon, Joseph Knipe, and Taehwan Yang

Introduction

The 1000 Genomes Project (2008-2015) was an international research initiative to create the largest publicly available catalog of human genetic variation and genotype information.  The main goal of the project was to close the gap in understanding how genetic variants relate to disease.  The International Genome Sample Resource (IGSR) continues to maintain 1000 Genomes Project data.  To study pathogenic variants, we selected the exome of a male individual of the Mende people in Sierra Leone, Africa (Sample ID: HG03224) who participated in the project.  From his exome, we identified 22 potentially pathogenic variants.

Methods

From the 1000 Genomes Project FTP website, we obtained the FASTQ files containing the sequenced paired end reads of sample HG03224’s exome and the hg38 reference genome FASTA file.  To map the reads to the reference genome, we used BWA-MEM, and to call variants, we used SAMtools and BCFtools.  To obtain a list of potentially pathogenic variants, we filtered the VCF file through wANNOVAR.  Since the 1000 Genomes Project uses hg38 as its reference genome, we chose wANNOVAR because it allows variant filtering based on hg38 unlike CADD.  In addition, wANNOVAR combines the power of the ANNOVAR algorithm and CADD scores to filter variants.  We eliminated all synonymous mutations and filtered the remaining variants using a CADD phred score > 30 and an ExAC allele frequency < 0.1%.  We also included frameshift mutations (in addition to nonsynonymous and stop-gain mutations); however, none of them were present in our final list of variants.  To determine the relationship between the variants and disease, we researched the scientific literature for studies involving these variants.  If no such studies existed, we researched the literature for similar variants located on the same gene to hypothesize about the effect of the variants.  To visualize how the variants impact protein structure, we compared the 3D structures of the wild type protein with the mutant protein using SWISS-MODEL.

Figure 1. Analytical pipeline used to study potentially pathogenic variants.

Figure 2. Commands for alignment to reference genome and variant calling.
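
As a rough sketch of the alignment and variant-calling steps described in Methods (BWA-MEM, then SAMtools and BCFtools), the Python snippet below assembles one plausible set of shell commands. The file names, thread count, and specific options are assumptions for illustration; the text does not state the exact options the group used, and the commands are printed here, not executed.

```python
import shlex


def build_pipeline(ref, fastq1, fastq2, prefix, threads=4):
    """Return shell command strings for mapping paired-end reads and calling variants."""
    q = shlex.quote
    bam = f"{prefix}.sorted.bam"
    vcf = f"{prefix}.vcf.gz"
    return [
        # Index the reference so BWA can align against it.
        f"bwa index {q(ref)}",
        # Align paired-end reads and write a coordinate-sorted BAM.
        f"bwa mem -t {threads} {q(ref)} {q(fastq1)} {q(fastq2)} | samtools sort -o {q(bam)} -",
        # Index the sorted BAM for random access.
        f"samtools index {q(bam)}",
        # Pile up reads against the reference and call variant sites only.
        f"bcftools mpileup -f {q(ref)} {q(bam)} | bcftools call -mv -Oz -o {q(vcf)}",
    ]


# Hypothetical input file names for sample HG03224.
for command in build_pipeline("hg38.fa", "HG03224_1.fastq.gz", "HG03224_2.fastq.gz", "HG03224"):
    print(command)
```

The resulting compressed VCF is the kind of file that would then be uploaded to wANNOVAR for annotation.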

Results

Based on our analysis of HG03224’s exome, we identified 22 potentially pathogenic variants, 5 of which we studied in detail.  We also looked for protein coding genes in which the individual carries a different damaging variant on each copy, which would make him a compound heterozygote for that gene.

Table 1. List of potentially pathogenic genes.  Variants obtained from wANNOVAR with a CADD phred score > 30 and an ExAC allele frequency < 0.1% were included.  All variants were found to be heterozygous.  CERCAM and PABPC3 (highlighted in pink) had two variants each.
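
The inclusion criteria stated in Methods and in the Table 1 caption (no synonymous mutations, CADD phred score > 30, ExAC allele frequency < 0.1%) can be written as a single predicate. This is a minimal sketch: the record fields and example values are hypothetical, and treating a variant that is absent from ExAC as rare is an assumption, since the text does not say how missing frequencies were handled.

```python
def is_candidate(variant, cadd_threshold=30.0, max_exac_af=0.001):
    """Return True if an annotated variant record passes the three filters."""
    if variant["effect"] == "synonymous":
        return False
    cadd = variant["cadd_phred"]
    if cadd is None or cadd <= cadd_threshold:
        return False
    exac_af = variant.get("exac_af")
    # Assumption: a variant with no ExAC entry (None) is treated as rare.
    return exac_af is None or exac_af < max_exac_af


# Hypothetical records, not taken from the sample's data.
example_records = [
    {"gene": "GENE_A", "effect": "stopgain", "cadd_phred": 38.0, "exac_af": 0.0002},
    {"gene": "GENE_B", "effect": "synonymous", "cadd_phred": 35.0, "exac_af": None},
    {"gene": "GENE_C", "effect": "nonsynonymous", "cadd_phred": 24.1, "exac_af": 0.0001},
    {"gene": "GENE_D", "effect": "nonsynonymous", "cadd_phred": 33.0, "exac_af": 0.02},
]

candidates = [record["gene"] for record in example_records if is_candidate(record)]
print(candidates)
```

Only the first record passes: the second is synonymous, the third falls below the CADD cutoff, and the fourth is too common in ExAC.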

Variant 1 (Cristian Crisan)


Gene name: CAPN1

Type of Mutation: Nonsynonymous

Variant locus: Chromosome 11, 65182766 (G to A)

Amino acid change: Arginine to Glutamine at position 22

CAPN1 encodes a calcium-activated neutral protease that is primarily found in muscle, stomach, and neuronal mammalian tissue.  The enzyme is a heterodimer of a large catalytic subunit and a small subunit involved in regulating activity.  Calcium acts as a positive regulator for CAPN1 in the cytoplasm (Momeni).  Increased concentrations of calcium ions activate CAPN1, and elevated levels of CAPN1 have been linked to Alzheimer’s disease (Kurbatskaya et al.).  High levels of CAPN1 have been shown to positively correlate with acute lymphoblastic leukemia as well as laryngeal cancer (Mikosik et al. and Ueyama et al.).  Finally, biopsies from muscular dystrophy patients have revealed that CAPN1 was significantly upregulated in patients suffering from the disease.

This specific mutation does not have a ClinVar ID.  According to the SWISS-MODEL predictions, the 3D structure of the protein was not affected.  Since the mutation is heterozygous, it is possible this individual has a normal CAPN1 on the other copy of chromosome 11, and thus this mutation might not have any effect.  However, the observed variant could be damaging to the protein if the alpha helix in which the amino acid change occurs (see image below) is required for the enzymatic reaction and both copies of the gene are affected.
 
Arginine is a positively charged amino acid while Glutamine does not have a charge at physiological pH.  Therefore, if the helix in which the mutation occurs is part of the catalytic active site or is involved in the recognition of the substrate, the variant could result in a non-functional enzyme due to the change in the chemical properties of the amino acid at position 22.  Even though this individual might suffer from certain effects of inactive CAPN1, he is also less likely to develop the diseases described above with which CAPN1 upregulation was found to be correlated.  A less probable outcome would be that the mutation actually increases the catalytic ability of the enzyme.

Figure 3. CAPN1 predicted structure. Purple arrows represent alpha helices and cyan ribbons represent beta sheets. The affected Arginine located at position 22 is highlighted in red.

Figure 4. Wild type CAPN1. The affected Arginine located at position 22 is highlighted in red.

Figure 5. Glutamine 22 variant. No overall change in protein structure compared to wild type.

Variant 2 (Alli Gombolay)


Gene name: PABPC3

Type of mutation: Stop-gain

Variant Locus: Chromosome 13, 25097231 (G to T)

Amino Acid Change: Glutamic acid to stop signal at position 345

Polyadenylate-binding protein 3 (PABPC3), or Testis-Specific PABPC3, is an intronless gene located on chromosome 13 (OMIM).  PABPC3 is one of the poly(A)-binding proteins (PABPs) that regulate mRNA stability and translation initiation in testes (Ozturk et al.).  The main function of the PABPC3 protein is to bind the poly(A) tail of mRNA (GeneCards).  In addition, this protein might also be involved in the cytoplasmic regulatory processes of mRNA metabolism (GeneCards).  mRNA stabilization is critically important during the development of sperm cells, known as spermatogenesis.  Interestingly, an unusually high proportion of disease-associated intronless genes play a role in biochemical pathways in the nervous system and testes (Grzybowska).  Abnormal expression of intronless genes is associated with neuropathies, developmental disorders (such as infertility), and cancer (Grzybowska).  In 2016, Ozturk et al. evaluated the expression levels of several PABPs in testicular biopsies.  They found that altered expression of PABPs, including PABPC3, is linked to infertility in men (Ozturk et al.).  Furthermore, although currently understudied, alterations in the expression of PABPC3 have been linked to testicular cancer (Human Protein Atlas).  Although the mutation does not result in any major changes to the predicted protein structure (Figure 6), it likely results in a nonfunctional protein product.  Since stop-gain mutations result in premature termination of translation and, thus, typically a nonfunctional protein product, this mutation likely alters the amount of functional PABPC3 considerably in this individual.

Figure 6. 3D Structure of wild type protein (left) and mutated protein (right).

Variant 3 (Taehwan Yang)


Gene name: F11 (Factor XI)

Type of mutation: Stop-gain

Variant Locus: Chromosome 4, 186273156 (C to T)

Amino Acid Change: Glutamine to stop signal at position 102

Factor XI, also known as plasma thromboplastin antecedent, plays an important role in blood coagulation.  The F11 gene is located on chromosome 4 and is 23 kb long (Bolton-Maggs).  Patients who are deficient in the F11 enzyme often suffer from hemophilia, which causes excessive bleeding after haemostatic challenges, such as tooth extraction, surgeries, or trauma (Bolton-Maggs).  Approximately 8% of Ashkenazi Jews are affected by the disease compared to less than 1% in other populations (Bolton-Maggs).  According to Shpilberg et al., there are at least 4 mutations that cause Factor XI deficiency, including one nonsense mutation.  Although it is different from the one known nonsense mutation, this particular nonsense mutation likely also affects F11 gene expression in this individual.  The noticeable differences in the wild type protein structure of F11 compared to the mutant version (Figure 7) support the hypothesis that this mutation affects gene function.  Since nonsense mutations result in premature termination of translation and, thus, usually a nonfunctional protein product, this mutation likely reduces the amount of functional Factor XI considerably in this individual.

Figure 7. 3D Structure of wild type protein (left) and mutated protein (right).

Variant 4 (Joseph Knipe)


Gene name: MAPK4

Type of mutation: Nonsynonymous

Variant locus: Chromosome 18, 50664289 (G to A)

Amino acid change: Aspartic acid to Asparagine at position 111

The MAPK4 gene codes for the mitogen-activated protein kinase 4 enzyme, a member of the mitogen-activated protein kinase family.  MAPK4 is activated by tyrosine kinase growth factors, causing the protein to migrate into the nucleus and phosphorylate nuclear targets (UniProt).  Very little is known about the function and regulation of MAPK4.  MK5, a protein kinase encoded by a tumor suppressor gene, is the only known substrate of the MAPK4 protein (UniProt).  The MK5 protein kinase is activated in response to proinflammatory cytokines and cellular stress (UniProt).  There do not appear to be any significant structural differences between the wild type and mutant protein structures (Figure 8); however, it is possible this mutation affects the function of the MAPK4 protein.  Although there is little information provided in the published scientific literature or SNP databases regarding this specific mutation, another mutation, the Met1Ile mutation, on the MAPK4 gene has been found to be associated with melanoma (NCBI).  In addition, this variant may also be associated with malignant cancers due to possible tumor suppression inefficiencies.

Figure 8. MAPK4 wild type (right) vs. mutant (left) 3D protein structures.

Variant 5 (Youngkyu Jeon)


Gene name: CERCAM

Type of mutation: Stop-gain

Variant locus: Chromosome 9, 128431225 (C to G)

Amino acid change: Tyrosine to stop signal at position 375

CERCAM, cerebral endothelial cell adhesion molecule, is a cell adhesion protein that is thought to be used by leukocytes expressing CD18 to migrate across the blood-brain barrier (OMIM).  Most bloodborne molecules do not cross the sealed blood-brain barrier; however, leukocytes regularly traverse the barrier in response to inflammatory signals (Starzyk et al.).  This suggests leukocytes have the ability to relax the restrictions on migration across the barrier (Starzyk et al.).  Pathogens, such as Bordetella pertussis, have been shown to take advantage of leukocytes’ ability to traverse the barrier by invading them (Starzyk et al.).  This mutation could affect expression of the CERCAM gene, possibly leading to a deregulation of the movement of leukocytes infected with pathogens across the blood-brain barrier.  As shown in Figure 9, this mutation noticeably changes the structure of the CERCAM protein, suggesting it likely changes the functionality of the protein.

Figure 9. Homology model of the wild type CERCAM protein (left) and mutant protein (right).

Compound Heterozygous Mutations


We found two genes that had two separate variants, CERCAM and PABPC3.
 
The two mutations on the CERCAM gene are located two nucleotides apart on chromosome 9: 128,431,223 (T to C) and 128,431,225 (C to G).  Both fall in the codon for tyrosine (Y) at position 375.  Taken individually, they correspond to the Y375H and Y375X mutations, for which no haplotype information is available on the Ensembl website.  However, Ensembl does list a Y375Q mutation, which is the net result when both observed changes are applied to the same codon.  Because wANNOVAR annotates each nucleotide change separately, not at the codon level, it reported the two changes as independent variants when they most likely occur together to produce Y375Q.  We therefore concluded that the two CERCAM mutations lie on the same chromosome copy.
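
The codon arithmetic behind this conclusion can be checked directly. Given the reported reference alleles (T at 128,431,223 and C at 128,431,225) and a tyrosine at position 375, the forward-strand codon must be TAC, since tyrosine is encoded only by TAT and TAC. The sketch below applies each change alone and then both together using the standard genetic code; the function name is hypothetical, and the codon coordinates are inferred from the reported positions.

```python
from itertools import product

# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def apply_snvs(codon, codon_start, snvs):
    """Apply (position, ref, alt) changes to a forward-strand codon."""
    bases = list(codon)
    for position, ref, alt in snvs:
        offset = position - codon_start
        if bases[offset] != ref:
            raise ValueError(f"reference mismatch at {position}")
        bases[offset] = alt
    return "".join(bases)


CODON_START = 128431223          # inferred first base of the Y375 codon
first = (128431223, "T", "C")    # first reported CERCAM change
second = (128431225, "C", "G")   # second reported CERCAM change

for label, snvs in (("first only", [first]), ("second only", [second]), ("both", [first, second])):
    mutated = apply_snvs("TAC", CODON_START, snvs)
    print(label, mutated, CODON_TABLE[mutated])
```

Applied alone, the changes give CAC (histidine) and TAG (stop); applied together on one chromosome copy they give CAG (glutamine), matching Y375Q.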
 
The two mutations on the PABPC3 gene are located 159 nucleotides apart on chromosome 13, and both result in stop signals.  Haplotype information for these mutations is not available on the Ensembl website, which may indicate that the two mutations have not been observed together on the same chromosome copy.  We therefore consider it likely that the mutations are located on separate copies of chromosome 13, meaning this individual is a compound heterozygote.  Since both mutations result in stop signals and, thus, premature termination of translation, the PABPC3 protein in this individual is likely nonfunctional.

Discussion

All five variants we studied in detail have the potential to cause disease in the individual we studied.  Although the scientific literature does not yet contain information regarding the effects of these variants on disease, similar mutations located on the same genes have been shown to cause a variety of diseases, such as Alzheimer’s Disease, muscular dystrophy, cancer, infertility, and hemophilia.  Some of the variants we studied resulted in dramatic changes to protein structure, while others did not noticeably change the structure.  In the near future, scientific studies should be conducted to investigate the role of these variants in disease.

References

1. Bolton-Maggs, PH. “Factor XI deficiency”. Baillieres Clinical Haematology. 1996;9(2):355–68.

2. “CERCAM.” OMIM. <http://omim.org/entry/616626#2>.

3. “Expression of PABPC3 in Cancer.” The Human Protein Atlas. <http://www.proteinatlas.org/>.

4. Grzybowska, EA. Human intronless genes: Functional groups, associated diseases, evolution, and mRNA processing in absence of splicing. Biochemical and Biophysical Research Communications. 2012;424(1):1–6.

5. Kurbatskaya, K et al. Upregulation of calpain activity precedes tau phosphorylation and loss of synaptic proteins in Alzheimer’s disease brain. Acta Neuropathologica Communications. 2016;4(34).

6. “Malignant Melanoma.” NCBI ClinVar. <https://www.ncbi.nlm.nih.gov/clinvar/RCV000071842/#clinical-assertions>.

7. “MAPK4.” UniProt. <http://www.uniprot.org/uniprot/P31152>.

8. Mikosik, A et al. Increased μ-Calpain Activity in Blasts of Common B-Precursor Childhood Acute Lymphoblastic Leukemia Correlates with Their Lower Susceptibility to Apoptosis. PloS ONE. 2015;10(8):1-16.

9. Momeni, HR. Role of Calpain in Apoptosis. Cell Journal. 2011;13(2):65–72.

10. Ozturk S, et al. The poly(A)-binding protein genes, EPAB, PABPC1, and PABPC3 are differentially expressed in infertile men with non-obstructive azoospermia. Journal of Assisted Reproduction and Genetics. 2016;33(3):335-48.

11. “PABPC3.” OMIM. <https://www.omim.org/entry/604680>.

12. “PABPC3 Gene.” GeneCards. <http://www.genecards.org/cgi-bin/carddisp.pl?gene=PABPC3>.

13. Shpilberg, O et al. One of the two common mutations causing factor XI deficiency in Ashkenazi Jews (type II) is also prevalent in Iraqi Jews, who represent the ancient gene pool of Jews.

14. Starzyk, RM et al. Cerebral Cell Adhesion Molecule: A Novel Leukocyte Adhesion Determinant on Blood-Brain Barrier Capillary Endothelium. Journal of Infectious Disease. 2000;181(1):181-87.

15. Ueyama, H, et al. Expression of three calpain isoform genes in human skeletal muscles. Journal of Neurological Sciences. 1998;155:163-169.