Exome Variant Analysis (Undergrad Group 2)


Our group was provided with a VCF file containing SNPs identified between an unknown individual’s genotype and the hg19 reference genome. The VCF file was created by mapping the sequenced genome of an individual in the 1000 Genomes database to a reference FASTA file and calling variants that differ between the individual’s sequence and the reference sequence. We then submitted the VCF to wANNOVAR [1], a tool that annotates variants with population allele frequencies from the 1000 Genomes, ESP, and ExAC databases, clinical interpretations from ClinVar, and predictions from tools such as SIFT and PolyPhen, which estimate the effect of a mutation from the physical and chemical properties of the amino acid change. Likely pathogenic variants were identified based on low population frequencies in the ExAC, 1000 Genomes, and ESP databases, together with SIFT, ClinVar, and PolyPhen annotations.

In addition to these filters, all of the likely pathogenic variants listed in table 1 are nonsynonymous mutations that occur in exonic sequences.

Table 1. Likely Pathogenic variants

pathogenic variants.png


1. wANNOVAR. (2017). Retrieved November 23, 2017, from http://wannovar.wglab.org/


VARIANT 1 (Joseph): plexin domain containing protein 2 (PLXDC2)

Variant genotype [1]:

G/A substitution at 10:20,046,930 (forward strand)

Variant frequency [2]:

Overall human population: 1.997e-04

Ethnic population: 0.001 in East Asians

Regional population: 0.005 in Chinese Dai in Xishuangbanna, China

Variant effect on protein sequence:

Missense substitution; CGG/CAG codon variant results in arginine>glutamine substitution at residue 129
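The codon change above can be checked with a small lookup. This is only an illustrative sketch: the table holds just the two codons named in this section (standard genetic code), and the function name is hypothetical.

```python
# Minimal codon table: only the codons discussed above (standard genetic code).
CODON_TO_AA = {
    "CGG": "Arg",
    "CAG": "Gln",
}


def describe_missense(ref_codon, alt_codon, residue):
    """Return the protein-level change and the codon positions that differ."""
    ref_aa = CODON_TO_AA[ref_codon]
    alt_aa = CODON_TO_AA[alt_codon]
    # 1-based positions within the codon where the two sequences differ
    changed = [i + 1 for i, (a, b) in enumerate(zip(ref_codon, alt_codon)) if a != b]
    return f"p.{ref_aa}{residue}{alt_aa}", changed


print(describe_missense("CGG", "CAG", 129))
```

A change at the second codon position (G to A) is enough to swap arginine for glutamine at residue 129.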

Variant effect on protein function:

According to COSMIC, this variant has a FATHMM prediction score of 0.94, categorizing it as pathogenic [1].

PLXDC2, or plexin domain containing protein 2, also known as tumor endothelial marker protein 7, is a transmembrane protein expressed at increased levels in tumor endothelial cells and is thought to be associated with angiogenesis. PLXDC2 is thought to be a receptor for Pigment Epithelium-Derived Factor (PEDF), a factor that inhibits cell migration and angiogenesis and has been shown to inhibit growth of several types of cancer, including melanoma [3]. A mutation in the extracellular domain of PLXDC2 could be detrimental to PLXDC2-PEDF binding, which could in turn affect the factor’s inhibitory function. Loss of the PLXDC2-PEDF interaction, caused by the substitution of a polar uncharged residue (glutamine) for a basic residue (arginine), is a likely explanation of the pathogenicity of this variant, because the substitution falls within the extracellular domain of the PLXDC2 protein and the variant is associated with a malignant melanoma phenotype.

Few studies of PLXDC2 and its variants have been published. One of these relates a mutation near the PLXDC2 gene to the development of glaucoma in an East Asian population [4]. Since PEDF has been shown to have therapeutic effects in the treatment of glaucoma, the published variant could provide evidence for the PLXDC2-PEDF interaction knockdown explanation for the pathogenicity of the variant discussed above.

Structural model:

Screen Shot 2017-12-13 at 7.37.29 PM

Figure 1. Partial structure of PLXDC2 extracellular domain (full protein structure is unavailable). Location of variant R/Q substitution at residue 129 is shown in red.

Advice for the individual with this variant:

This variant may impair the downstream effects of an important anti-tumor factor (PEDF) and has been associated with malignant melanoma. Knowledge of this possible predisposition to cancer development could allow the individual to detect and treat the disease early. Regular cancer screening is recommended.


1. http://cancer.sanger.ac.uk/cosmic/mutation/overview?id=3436427

2. http://ensembl.org/Homo_sapiens/Location/View?r=10:20046925-20046935

3. Cheng, G., et al. Identification of PLXDC1 and PLXDC2 as the transmembrane receptors for the multifunctional factor PEDF. eLife 3, (2014)

4. Mabuchi, F. et al. Genetic Variant Near PLXDC2 Influences the Risk of Primary Open-angle Glaucoma by Increasing Intraocular Pressure in the Japanese Population. Journal of Glaucoma 26(11), 963-966 (2017)


VARIANT 2 (Sydney Nelson): Insulin Receptor Beta Subunit (INSR)

Variant genotype:

G/A substitution at 19:7,166,376

Quality of genotype call:

2546.77 (PASS)

Variant frequency:

Prevalence in overall population according to ExAC [1]: 8.30e-06

Prevalence in overall population according to Ensembl [2]: 2.03421e-05 (5)

Ethnic population frequency according to ExAC [1]: 1.51e-05 in Non-Finnish Europeans

Ethnic population frequency according to Ensembl [2]: 3.59337e-05 in Non-Finnish Europeans

    3.24929e-05 in South Asians

Variant effect on protein sequence:

Missense substitution: p.G547R

The SNP results in a substitution of glycine to arginine at position 547 of the amino acid sequence, in exon 8 of the gene.

Variant effect on protein function [3]:

The protein product is an insulin receptor (a tyrosine kinase receptor) composed of 2 alpha and 2 beta subunits which are joined by disulfide bonds. The receptor is normally embedded in the cell’s outer membrane, with the alpha subunits protruding from the surface of the cell to detect insulin that is circulating in the bloodstream. The binding of insulin to the alpha subunits of the receptor causes the beta subunits to initiate a signaling pathway that results in the uptake of glucose into the cell (effectively regulating blood sugar levels and allowing cells to take in glucose to produce energy). More specifically, the binding of insulin to its receptor results in the translocation of the Glut-4 glucose transporter to the plasma membrane (where it aids in the movement of glucose into the cell), glycogen synthesis, glycolysis, and fatty acid synthesis.

More than 150 INSR mutations have been identified that are linked to insulin resistance disorders including Donohue Syndrome, Rabson-Mendenhall Syndrome, and Type A Insulin Resistance Syndrome. All of these mutations result in improper insulin response – whether this is due to a mutation in the alpha subunit that affects the receptor’s ability to bind to the insulin hormone, a mutation in the beta subunit that impairs the protein’s ability to phosphorylate downstream targets and initiate signaling pathways, or a general mutation that results in a conformational change leading to the protein’s premature degradation.

Donohue Syndrome is the most severe form of insulin resistance, with both copies of the INSR gene severely mutated, resulting in complete inability to respond to insulin signals. Patients with these mutations do not live past the age of 2.

Rabson-Mendenhall Syndrome is slightly less severe, with patients living into their teens and twenties. Mutations generally still occur in both copies of the gene, but result in less severe abnormalities, so that some of the protein product is still able to reach the cell membrane and detect some circulating insulin in the bloodstream. However, without enough functional receptors being produced and carried to the outer membrane, these mutations eventually lead to diabetes mellitus (in which the blood sugar level becomes dangerously high).

Type A Insulin Resistance Syndrome is the least severe form of insulin resistance. Mutations resulting in type A insulin resistance are not life-threatening, and aren’t generally noticed until puberty. In general, males display no symptoms other than the development of diabetes mellitus, while females can also experience abnormalities in menstrual cycles, excessive body hair growth, ovarian cysts, and acanthosis nigricans.

Tools and criteria used to identify pathogenicity:

The first filter used to identify pathogenic variants involved population genetics. The allele frequency for this variant is very low across all populations surveyed, which is consistent with the allele being deleterious. According to Ensembl [2], the highest minor allele frequency observed in any population from the 1000 Genomes Phase 3, ESP, and ExAC databases is very low (2.03e-05).

Two tools on the Ensembl website, SIFT and PolyPhen, classified the effect of this variant as “deleterious” and “probably damaging,” respectively. SIFT predicts the effect of a variant on protein function based on sequence homology and the chemical and physical similarities between the original and mutated amino acids. For this variant, SIFT gave a score of 0, indicating that the mutation is likely deleterious. Similarly, PolyPhen predicts the effect of a variant on protein function based on the physical properties of the resulting protein, and gave a score of 0.986 (probably damaging).

In addition to population genetics and online predictions, the amino acid change itself is a good indicator of pathogenicity. The fact that this mutation occurred in the exonic portion of an important gene indicates that this is a possible pathogenic variant. In this case, the result of the base pair change from C to T at position 7166376 of chromosome 19 resulted in a non-synonymous substitution from the amino acid glycine (G) to the amino acid arginine (R). While glycine is an uncharged, hydrophobic/non-polar amino acid with the lowest molecular weight (75.1 g/mol), arginine is a positively charged (basic), polar amino acid with a much higher molecular weight of 174.2 g/mol. This drastic difference in amino acid characterization is highly likely to disrupt the folding of the INSR protein and its interactions with other proteins and ligands.
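The genotype for this variant is listed earlier as a G/A substitution, while the paragraph above describes a C-to-T change. The two descriptions agree if they are read from opposite DNA strands, which is the assumption made in this small sketch (the function name is hypothetical).

```python
# Watson-Crick base pairing
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_on_other_strand(ref, alt):
    """Return the same single-base substitution as read on the complementary strand."""
    return COMPLEMENT[ref], COMPLEMENT[alt]


print(substitution_on_other_strand("G", "A"))
```

Under that assumption, G>A on one strand and C>T on the other describe the same variant.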

Structural Model:

The structure of the mutated protein was predicted using SWISS-MODEL [4][5][6][7][8] and entered into Chimera [9] software in order to visualize the protein product and compare the structures of the mutated and unmutated proteins. Figure 1 depicts the mutated protein product, with the mutated amino acid residue colored red.

image of mutated structure.png

Figure 1. Predicted model of mutated INSR protein

Figure 2 depicts the un-mutated protein superimposed over the mutated protein, with the corresponding amino acid highlighted in green. The un-mutated protein is shown in blue, while the mutated protein is shown in tan. The majority of the two protein products overlap perfectly; however, Figure 2 shows that the portion of the protein surrounding the mutated residue has undergone a conformational change that is likely to affect the activity of this protein.

image of both structures.png

Figure 2. Models of mutated and un-mutated proteins

Advice for the individual with this variant:

Because the individual is heterozygous for this mutation, they are likely to develop only a mild form of type A insulin resistance, if any symptoms occur at all. However, because this variant could increase the individual’s predisposition to diabetes mellitus, it would be wise to monitor their blood sugar levels in order to detect diabetes early if it develops later in life.


1. Lek, M., et al. (2016). Analysis of protein-coding genetic variation in 60,706 humans. Nature, 536, 285-291. doi:10.1038/nature19057

2. McLaren, W., & Cunningham, F. (2016). Ensembl Variant Effect Predictor (VEP): webinar. Genome Biology,17(122). doi:10.6019/tol.ensvep-w.2016.00001.1

3. INSR gene – Genetics Home Reference. (2017, December 6). Retrieved December 13, 2017, from https://ghr.nlm.nih.gov/gene/INSR#resources

4, 5. Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, Kiefer F, Cassarino TG, Bertoni M, Bordoli L, Schwede T (2014). SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Research 42 (W1): W252-W258.

6. Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T (2009). The SWISS-MODEL Repository and associated resources. Nucleic Acids Res. 37, D387-D392.

7. Arnold K, Bordoli L, Kopp J, and Schwede T (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics.,22,195-201.

8. Guex, N., Peitsch, M.C. Schwede, T. (2009). Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: A historical perspective. Electrophoresis, 30(S1), S162-S173.

9. UCSF Chimera–a visualization system for exploratory research and analysis. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. J Comput Chem. 2004 Oct;25(13):1605-12.


VARIANT 3 (Jaimisha Patel): acyl-CoA dehydrogenase family member 11 (ACAD11)

Variant Genotype [1]:

G/T substitution at 3:132,294,680

Variant Frequency [2]:

Overall human population: 1.997e-04

Ethnic population: 0.001 in Africans

Regional population: 0.006 in Mende in Sierra Leone

Variant effect on protein sequence [1]:

Missense substitution; CGC/CAC codon variant results in arginine>histidine substitution at residue 646

Variant effect on protein function:

This gene encodes an acyl-CoA dehydrogenase enzyme with a preference for carbon chain lengths between 20 and 26. Acyl-CoA dehydrogenases, also known as ACADs, are a class of enzymes that catalyze the initial step in each cycle of fatty acid β-oxidation in the mitochondria of cells [3]. These enzymes are important in mammalian cells due to their role in metabolizing fatty acids found in ingested food. This enzyme’s action represents the first step in fatty acid metabolism [4]. A mutation in this gene could be pathogenic if it impairs the individual’s ability to produce a functional acyl-CoA dehydrogenase enzyme. Deficiencies in acyl-CoA dehydrogenases decrease the ability to oxidize fatty acids, resulting in metabolic dysfunction. These enzymes fall into 3 classes: short chain, medium chain, and long chain. Medium-chain acyl-CoA dehydrogenase deficiency (MCADD) is the most common of the acyl-CoA dehydrogenase deficiencies. This deficiency can lead to fatty acid oxidation disorders and potentially life-threatening metabolic disease. Symptoms of this deficiency include intolerance to fasting, hypoglycemia, and sudden infant death syndrome. These symptoms result directly from the inability to metabolize fats. Whereas most people can draw on stored fat for energy, affected individuals cannot obtain energy or make sugar from their fat stores, which results in intolerance to fasting and hypoglycemia. Fatty acids can also begin to accumulate in the blood, which lowers the blood’s pH and causes acidosis [3].

Based on my search of ClinVar, there is one published study on this variant, but it reported uncertain significance. Other sequence variants in the ACAD11 gene did not allow further conclusions about my variant: some were also of uncertain significance, one remained untested, and the other two had no clinical significance [5]. Although no published study reported clinical significance, two prediction scores were available. According to COSMIC, this variant has a FATHMM prediction score of 0.95, categorizing it as pathogenic [1]. In addition, the SIFT score of 0.01 reported on Ensembl classifies this variant as deleterious [2].

Structural Model:

The structure of the mutated protein was predicted using SWISS-MODEL [6] and entered into Chimera [7] software to visualize the protein product.

Screen Shot 2017-12-13 at 6.15.56 PM

Figure 1. Full protein structure of ACAD11. Location of the variant residue 646 (arginine>histidine substitution) is shown in green.

Advice for the individual with this variant [8]:

Individuals with a deficiency in acyl-CoA dehydrogenase should avoid fasting for prolonged periods of time. Supplementation of simple carbohydrates or glucose during illness is important in preventing catabolism. The amount of time an individual with this deficiency can fast varies with age. For example, infants typically require frequent feedings or a slow-release source of carbohydrates. Illnesses and other stresses can significantly reduce the fasting tolerance of affected individuals.


1. http://cancer.sanger.ac.uk/cosmic/mutation/overview?id=3695823

2. http://useast.ensembl.org/Homo_sapiens/Variation/Explore?db=core;r=3:132575339-132576339;v=rs377349315;vdb=variation;vf=58937982

3. Thorpe C, Kim JJ (June 1995). “Structure and mechanism of action of the acyl-CoA dehydrogenases”. FASEB J. 9 (9): 718–25.

4. Touma EH, Charpentier C (January 1992). “Medium chain acyl-CoA dehydrogenase deficiency”. Arch. Dis. Child. 67 (1): 142–5.

5. http://www.genecards.org/cgi-bin/carddisp.pl?gene=ACAD11

6. Arnold K, Bordoli L, Kopp J, and Schwede T (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics.,22,195-201.

7. UCSF Chimera–a visualization system for exploratory research and analysis. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. J Comput Chem. 2004 Oct;25(13):1605-12.

8. Morris, Andrew A.M.; Spiekerkoetter, Ute (2012). “Disorders of Mitochondrial Fatty Acid Oxidation and Related Metabolic Pathways”. In Saudubray, Jean-Marie; van den Berghe, Georges; Walter, John H. Inborn Metabolic Diseases: Diagnosis and Treatment (5th ed.). New York: Springer. pp. 201–216.



Exome Variant Analysis of Mende male individual from Sierra Leone

Group 5 – Qinwei Zhuang, Jiani Long, Harshini C, Chendi Jiang, Sarthak Sharma


Objective: To identify likely pathogenic variants in the exome of an individual and suggest possible advice for people with such variants

Data Source: 1000 Genomes Project – SRA ERX225034

Ethnicity: Mende in Sierra Leone HapMap population

Analytical Pipeline:

From the 1000 Genomes Project database, we selected the ERR250500 dataset for our study. The reference genome was the GRCh38 human assembly. The pipeline begins by indexing the reference genome file and mapping the FASTQ input files to the reference using the Burrows-Wheeler Aligner (BWA). The alignments are then indexed and realigned with the Genome Analysis Toolkit (GATK) to reduce the number of miscalled INDELs and thus improve the accuracy of the calls, and finally the variants are called. The output is a VCF file that contains all differences between the input reads and the reference genome.
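The write-up does not list the exact commands, so the following is only a sketch of how the described steps might be strung together. It assumes GATK 3-style invocations (where indel realignment is a separate step), uses samtools for sorting and indexing (not mentioned above), picks HaplotypeCaller as the caller (an assumption), and uses hypothetical file names. Preparatory steps that GATK typically needs, such as a reference dictionary and read-group tags, are left out. The function only builds the command strings; it does not run anything.

```python
def build_pipeline(ref, fastq_1, fastq_2, sample):
    """Return shell commands for an index -> map -> realign -> call workflow.

    All file names are hypothetical; the tool options are assumptions,
    not the group's recorded settings.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    intervals = f"{sample}.intervals"
    realigned = f"{sample}.realigned.bam"
    vcf = f"{sample}.vcf"
    gatk = "java -jar GenomeAnalysisTK.jar"  # GATK 3-style launcher (assumed)
    return [
        f"bwa index {ref}",                            # index the reference
        f"bwa mem {ref} {fastq_1} {fastq_2} > {sam}",  # map paired reads
        f"samtools sort -o {bam} {sam}",               # coordinate-sort alignments
        f"samtools index {bam}",                       # index the sorted BAM
        f"{gatk} -T RealignerTargetCreator -R {ref} -I {bam} -o {intervals}",
        f"{gatk} -T IndelRealigner -R {ref} -I {bam} "
        f"-targetIntervals {intervals} -o {realigned}",
        f"{gatk} -T HaplotypeCaller -R {ref} -I {realigned} -o {vcf}",
    ]


for command in build_pipeline("GRCh38.fa", "reads_1.fastq", "reads_2.fastq", "sample"):
    print(command)
```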

Tool used to identify deleterious genes: wANNOVAR (Web-based).

The scores reported by wANNOVAR include SIFT (the most important for our analysis), LRT, and CADD, among others. A SIFT score predicts whether an amino acid substitution affects protein function. The overall scoring formula is complicated, so we gave the SIFT score top priority when evaluating the significance of a variant. We narrowed down the list of potentially deleterious variants manually, keeping variants with population frequencies below 0.01 whose clinical significance contained the keyword “pathogenic”.
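The manual filtering described above could be expressed roughly as follows. This is a sketch under stated assumptions: the field names (`freq_1000g`, `freq_exac`, `sift`, `clinvar`) and the example records are hypothetical, and the way the criteria are combined (rare, and either a SIFT score at or below 0.05 or a “pathogenic” keyword) is one plausible reading rather than the group's exact procedure.

```python
def is_candidate(variant, max_freq=0.01, max_sift=0.05):
    """Flag a variant as a likely pathogenic candidate (illustrative rule)."""
    freqs = [variant.get("freq_1000g"), variant.get("freq_exac")]
    # Rare in every database that reports a frequency
    rare = all(f < max_freq for f in freqs if f is not None)
    sift = variant.get("sift")
    # SIFT scores at or below the cutoff are treated as deleterious
    damaging = sift is not None and sift <= max_sift
    # Keyword match on the clinical significance text
    flagged = "pathogenic" in variant.get("clinvar", "").lower()
    return rare and (damaging or flagged)


# Hypothetical records, not taken from the group's wANNOVAR output
variants = [
    {"gene": "GENE_A", "freq_1000g": 0.0014, "freq_exac": 0.0004, "sift": 0.0, "clinvar": ""},
    {"gene": "GENE_B", "freq_1000g": 0.25, "freq_exac": 0.22, "sift": 0.0, "clinvar": ""},
    {"gene": "GENE_C", "freq_1000g": 0.0006, "freq_exac": 0.0003, "sift": None, "clinvar": "Pathogenic"},
    {"gene": "GENE_D", "freq_1000g": 0.002, "freq_exac": 0.001, "sift": 0.4, "clinvar": "Benign"},
]

candidates = [v["gene"] for v in variants if is_candidate(v)]
print(candidates)
```

A plain keyword match is crude: it would also accept text such as “conflicting interpretations of pathogenicity”, so the shortlist still needs manual review.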

Group Member | Gene Name | Exonic Function | 1000 Genomes Frequency | ExAC Frequency | SIFT Score (wANNOVAR filter: SIFT <= 0.05)
Qinwei | DYNC2H1 | nonsynonymous SNV | 0.0014 | 0.0004 | 0
Jiani | POF1B | nonsynonymous SNV | 0.0029 | 0.0027 | 0.05
Harshini | RYR1 | nonsynonymous SNV | 0.006 | 0.0013 | 0.001
Chendi | ZEB1 | nonsynonymous SNV | 0.017 | 0.0051 | 0.05
Sarthak | CLCN1 | stopgain | 0.0006 | 0.0003 | N

Table 1: Summary of likely pathogenic variants.

Author: Qinwei Zhuang 

Gene Name: DYNC2H1 

Variant genotype:

  1. DNA: A->G (c.A11284G)
  2. Amino Acid: M(Met) -> V(Val) (p.M3762V)

The 1000 Genomes browser search result (NM_001080463):


Figure 1: 1000 Genomes Search Result

Variant frequency:

  1. Overall population:  0.14%;
  2. Mende in Sierra Leone HapMap population: 0.53%

Variant effect on protein: Nonsynonymous variant.  Substitution of Methionine with Valine (codon from AUG -> GUG)

Variant effect on the protein function (description of the pathogenic variant chosen): DYNC2H1 is critical for producing the protein dynein-2, which is found in cell structures known as cilia [2]. The quality score of the variant call was 222. The mutation we found in DYNC2H1 could lead to asphyxiating thoracic dystrophy and short rib-polydactyly syndrome [3,4]. DYNC2H1 is crucial for normal cell function.


The key scores used for evaluating negative impacts of this variant (description and critique of evidence for pathogenicity):

  • SIFT score: 0 (deleterious)
  • Polyphen2_HVAR_score: 0.62 (possibly damaging)
  • LRT pred: D (deleterious)
  • CADD phred: 26.6 (within the top 1% most deleterious)
  • Frequency in regional population is smaller than 1%


Table 2: Pathogenicity scores and their significance 
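CADD PHRED-scaled scores are a log-rank transform: a score of 10 marks the top 10% of possible substitutions, 20 the top 1%, and 30 the top 0.1%. The sketch below converts a score to the corresponding fraction, which helps when interpreting a value such as 26.6; the function name is hypothetical.

```python
def cadd_phred_to_top_fraction(phred):
    """Fraction of all possible substitutions ranked at or above this score."""
    return 10 ** (-phred / 10)


for score in (20, 26.6, 30):
    print(score, f"{cadd_phred_to_top_fraction(score):.2%}")
```

By this conversion a score of 26.6 falls at roughly the top 0.2%, that is, between the top 1% and the top 0.1%.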

Structural model of protein:

   The variant position is highlighted in yellow. The structure was visualized with Jena3D [5]. To highlight the variant, we entered “Select 3762” in the script box and checked the box “show selected part”.


Figure 2: DYNC2H1 protein structure

Advice for people with this variant:

Unfortunately, there is no cure for dysplasia. In severe cases, such as Jeune asphyxiating thoracic dystrophy (ATD), this mutation is fatal in the first year of life [4]. Even for people who survive the early stage of life, there is no cure for the diseases caused by this mutation. Immunotherapy might be possible, but currently there is no promising therapy protocol.

Author: Jiani Long

Gene Name: POF1B

Variant genotype:

POF1B has two splice variants. Both carry a G to A base substitution at position 986 of the mRNA sequence, in exon 10 (C to T at position 85308188 of the DNA sequence). The corresponding amino acid change is arginine to glutamine at position 329 of the protein sequence.

Quality of genotype call: 225

Variant frequency:

  1. Overall population: 0.29 %;
  2. Mende in Sierra Leone HapMap population: 0.6%;
  3. One study measured the incidence of the R329Q variant in the Lebanese general population by testing 92 individuals from similar ethnic backgrounds for the nucleotide variant. No one carried the mutation in the homozygous state, but 4 of the 92 control women carried it in the heterozygous state. The estimated allele frequency in Lebanon is therefore 2.2%, and the predicted homozygote rate in this population is 0.048% [6].
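The Lebanese estimate follows from simple allele counting plus the Hardy-Weinberg expectation. A minimal sketch of the arithmetic, assuming two gene copies per tested woman and random mating (the function name is hypothetical):

```python
def allele_frequency(heterozygotes, homozygotes, individuals):
    """Variant allele frequency from genotype counts (two copies per person)."""
    return (heterozygotes + 2 * homozygotes) / (2 * individuals)


q = allele_frequency(heterozygotes=4, homozygotes=0, individuals=92)
expected_homozygotes = q ** 2  # Hardy-Weinberg expectation

print(f"allele frequency: {q:.1%}")
print(f"expected homozygote rate: {expected_homozygotes:.3%}")
```

Four variant alleles out of 184 give about 2.2%. Squaring the unrounded frequency gives about 0.047%, while squaring the rounded 2.2% gives the 0.048% quoted above.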

Variant effect on protein:

This nucleotide substitution results in an arginine to glutamine change at position 329 of the protein sequence (mutation R329Q).

Variant effect on gene/protein function:

POF1B is a candidate gene for premature ovarian failure (POF) which is characterized by elevated gonadotropins and amenorrhea in women aged <40 years.[7] The protein encoded by this gene binds non-muscle actin filaments. The role this gene may play in the etiology of premature ovarian failure remains to be determined.[8]

POF1B co-localizes with markers of both adherens junctions and tight junctions in human jejunum. When stably expressed in the polarized MDCK epithelial cell line, POF1B kept its localization at tight junctions, but this localization was lost in the POF1B R329Q variant. Cells expressing the POF1B R329Q variant show defects in ciliogenesis and cystogenesis. The role of POF1B in the regulation of the actin cytoskeleton was further verified by shRNA silencing of the endogenous protein in human intestinal Caco-2 cells. The localization of POF1B to tight junctions therefore appears to play a key role in organizing the actin cytoskeleton of epithelial cell monolayers.[9]

Structural model of the protein and the location of the variant amino acid:

No crystal structure is available for the POF1B protein. The structure of residues 255 to 428 was predicted with SWISS-MODEL. The R329Q variant is marked.


Figure 3: POF1B Protein Structure

Advice for individual with this variant:

  1. Monitor hormone levels, and watch for frequent depression or anxiety in daily life.
  2. Have a healthy lifestyle and diet to reduce the risk of cardiovascular disease and osteoporosis.

Author: Harshini C

Gene Name: Ryanodine receptor RYR1

Pathogenic Variant:

The RYR1 gene encodes a ryanodine receptor (RYR1), which is found in skeletal muscle and functions as a calcium release channel in the sarcoplasmic reticulum; it also serves to connect the sarcoplasmic reticulum and the transverse tubule [10]. Abnormal RYR-1 channels lead to dysfunctional muscle contraction and weakness. Breathing problems associated with RYR-1 related muscle disease can range from non-existent to severe and are due to weakness in the muscles of the chest wall [11].

Description of the variant:

  • Variant type: Single nucleotide variant (nonsynonymous)
  • Genomic Location:    Chr19: 38543804 (on Assembly GRCh38)                            
  • Cytogenetic location: 19q13.2
  • Mutation in Nucleotide: NM_000540.2:c.11941C>T: missense variant [Cytosine —> Thymine]
  • Mutation in amino acid: In Exon87, Histidine at position 3981 changed to Tyrosine (His3981Tyr)
  • Regional Population: Mende in Sierra Leone HapMap population
  • Variant Frequency:
    • Overall Population: 0.6%
    • Regional: 2.2%
  • Comparison of Allele Frequency:
    • Global:
      • C=0.9940
      • T=0.0060
    • Regional:
      • C=0.9765
      • T=0.0235
  • Global MAF: T=0.0060/30
  • Quality score of variant call: 222
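The global MAF entry pairs a frequency with an allele count. Assuming the count refers to the 1000 Genomes Phase 3 panel of 2,504 individuals (an assumption, since the list does not name the panel), the frequency can be reproduced as count divided by the number of chromosomes sampled:

```python
def frequency_from_count(allele_count, n_individuals, ploidy=2):
    """Allele frequency as observed copies over chromosomes sampled."""
    return allele_count / (n_individuals * ploidy)


# 30 copies of the T allele among 2,504 diploid individuals (assumed panel size)
maf = frequency_from_count(30, 2504)
print(f"{maf:.4f} = {maf:.2%}")
```

Thirty copies out of 5,008 chromosomes is about 0.0060, or 0.60%.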

Evidence for Pathogenicity:

A)     Scores for evaluating the Pathogenicity of the variant:

  • CADD_phred score: 24.5 (within the top 1% most deleterious)
  • SIFT score = 0.001 (Deleterious)
  • LRT pred = U (Unknown)
  • Polyphen2_HVAR_score = 0.986 (Probably damaging)

B) The ClinVar Database[12] for this RYR1 variant (NM_000540.2(RYR1):c.11941C>T (p.His3981Tyr)) summarizes that there are conflicting interpretations of pathogenicity.

Clinical Significance | Study done by | Conditions | PubMed Record
Benign | EGL Classification (Eurofins Clinical Diagnostics) [May 2013]; Genetic Services Laboratory, University of Chicago [April 2016] | Not specified | NA
Likely Benign | PreventionGenetics [2015]; Illumina Clinical Services Laboratory, Illumina [Jun 2016]; Center for Pediatric Genomic Medicine, Children’s Mercy Hospital and Clinics [Sept 2017] | Neuromuscular disease, congenital, with uniform type 1 fiber; Malignant hyperthermia susceptibility; Central core disease; Multiminicore disease |
Pathogenic | OMIM | Minicore myopathy with external ophthalmoplegia | Available

Table 3: Summary of entries in ClinVar for RYR1 gene variants

The study cited by OMIM reported several missense and nonsense mutations in RYR1 that have been implicated in centronuclear myopathy (CNM), a rare congenital myopathy characterized by prominence of central nuclei on muscle biopsy. Of the mutations that were identified, the missense mutation His3981Tyr in exon 87 was recurrent within the population that was studied [13]. Two other studies reported this RYR1 variant, among other RYR1 mutations, as a common cause of various myopathies and malignant hyperthermia susceptibility (MHS) traits [14][15].

Structural model of protein with variant highlighted:

The structure of RYR1 protein was obtained from PDB. It was visualized using JSmol and the variant amino acid residue (His3981Tyr) was marked in Red using Jena3D software [16]


Figure 4: Structure of RYR1 Protein(left) with the variant shown in RED (right)

Advice to Patient:

Currently there is no cure for disease caused by RYR1 mutations. However, a clinical trial of the drug N-acetylcysteine (NAC), which is known to have only a few side effects, is currently underway. The patient is advised to avoid physical exhaustion from exercise or other activities, since muscle cramps and pain would be common.

Author: Chendi Jiang

Gene name:

The ZEB1 gene (189909), short for zinc finger E box-binding homeobox 1, is located on chromosome 10p11.22. Its alternative symbols include transcription factor 8, TCF8, T-lymphocyte-specific interleukin 2 inhibitor, delta-EF1, and NIL2A [17].

Variant genotype and variant effect on protein:

From the Excel file, the variant annotations for this gene are as follows:

  1. ZEB1:NM_001128128:exon2:c.A182C:p.N61T (variant 1),
  2. ZEB1:NM_001174093:exon2:c.A233C:p.N78T (variant 6),
  3. ZEB1:NM_001174094:exon2:c.A182C:p.N61T (variant 7),
  4. ZEB1:NM_001174096:exon2:c.A233C:p.N78T (variant 9),
  5. ZEB1:NM_001323674:exon2:c.A233C:p.N78T (variant 37),
  6. ZEB1:NM_001323675:exon2:c.A191C:p.N64T (variant 38),
  7. ZEB1:NM_001323676:exon2:c.A191C:p.N64T (variant 39),
  8. ZEB1:NM_001323677:exon2:c.A191C:p.N64T (variant 40),
  9. ZEB1:NM_001323678:exon2:c.A182C:p.N61T (variant 41),
  10. ZEB1:NM_030751:exon2:c.A233C:p.N78T (variant 2)

In the annotation of variant 1, ZEB1 is the gene name and NM_001128128 is the accession number of the mRNA transcript variant. exon2 denotes the exon in which the substitution occurs. c.A182C means that at base 182 of the coding sequence, the A of the reference sequence is replaced by C in the variant. p.N61T shows that amino acid 61 of the protein, asparagine, is mutated to threonine. The variant numbers at the end were obtained from the NCBI nucleotide database. The other variants can be interpreted in the same way.
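The annotation fields explained above can also be pulled apart programmatically. The following is a minimal sketch that handles only single-base substitutions written in the gene:transcript:exon:c.:p. style shown in the list; the function and field names are hypothetical.

```python
import re

# gene : transcript : exonN : c.<ref><pos><alt> : p.<ref><pos><alt>
AACHANGE = re.compile(
    r"^(?P<gene>[^:]+):\s*(?P<transcript>[^:]+):exon(?P<exon>\d+)"
    r":c\.(?P<ref_base>[ACGT])(?P<cdna_pos>\d+)(?P<alt_base>[ACGT])"
    r":p\.(?P<ref_aa>[A-Z])(?P<aa_pos>\d+)(?P<alt_aa>[A-Z*])$"
)


def parse_aachange(annotation):
    """Split one annotation string into named fields."""
    match = AACHANGE.match(annotation.strip())
    if match is None:
        raise ValueError(f"unrecognized annotation: {annotation!r}")
    fields = match.groupdict()
    for key in ("exon", "cdna_pos", "aa_pos"):
        fields[key] = int(fields[key])
    return fields


def codon_index(cdna_pos):
    """Codon number containing a 1-based coding-sequence position."""
    return (cdna_pos + 2) // 3


record = parse_aachange("ZEB1:NM_001128128:exon2:c.A182C:p.N61T")
print(record)
print(codon_index(record["cdna_pos"]) == record["aa_pos"])
```

The last line is a quick consistency check: coding position 182 falls in codon 61, which matches the residue number in p.N61T.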

After searching for the mRNA variant accession numbers in the NCBI nucleotide database, we found some structural and functional information. These transcripts encode zinc finger transcription factors. The encoded protein likely plays a role in transcriptional repression of interleukin 2 [19]. Mutations in this gene are associated with posterior polymorphous corneal dystrophy-3 and late-onset Fuchs endothelial corneal dystrophy [20].

Regarding the structural information, variant 1 encodes isoform a. Variant 6 differs in the 5′ UTR, lacks a portion of the 5′ coding region, and initiates translation at an alternate start codon, compared to variant 1. This variant also lacks an in-frame exon in the 5′ coding region compared to variant 1. The encoded isoform c is shorter and has a distinct N-terminus, compared to isoform a. Variant 7 uses an alternate in-frame splice site in the 5′ coding region, compared to variant 1. This results in a shorter protein (isoform d), compared to isoform a. Variant 9 differs in the 5′ UTR, lacks a portion of the 5′ coding region, and initiates translation at an alternate start codon, compared to variant 1. The encoded isoform f is longer and has a distinct N-terminus, compared to isoform a. Variant 2, also known as Zfhx1a-1, differs in the 5′ UTR, lacks a portion of the 5′ coding region, and initiates translation at an alternate start codon, compared to variant 1. This variant also uses an alternate in-frame splice site in the 5′ coding region, compared to variant 1. The encoded isoform b is longer and has a distinct N-terminus, compared to isoform a. No detailed structural information is available for variants 37-41 [21].

Quality of genotype: 214

Variant frequency:

  1. Overall population: 1.7 %;
  2. Mende in Sierra Leone HapMap population [18]: 6.4 %;

Structural model of the protein:

No structural model was found for ZEB1. We attempted to predict the structure with SWISS-MODEL, which produced a model covering residues 167 to 307. The rest of the sequence could not be modeled because no template is available.


Figure 5: ZEB1 Protein Structure

Normal protein function:

The ZEB family of zinc finger transcription factors plays an important role in normal development. They induce a developmental process called epithelial to mesenchymal transition (EMT). EMT is a process in which cells undergo a molecular switch from a polarized, epithelial phenotype to a highly motile, non-polarized mesenchymal phenotype. It is essential for processes such as gastrulation, neural crest formation, heart morphogenesis and formation of the musculoskeletal system and craniofacial structures. E-cadherin is a major target gene of these transcriptional repressors, and its downregulation is considered a hallmark of EMT. Mutations in ZEB-encoding genes cause severe syndromic malformations, including disorders of the corneal endothelium. [22]

Variant effect on protein function:

Fuchs endothelial corneal dystrophy (FECD) is the most common genetic disorder of the corneal endothelium. It is characterized by abnormal deposition of extracellular matrix (ECM), such as corneal guttae, accompanied by a loss of endothelial cells. Okumura et al. demonstrated that the EMT-inducing genes ZEB1 and SNAI1 (Snail1) were highly expressed in corneal endothelial cells in FECD and were involved in excessive production of ECM proteins, such as type I collagen and fibronectin, through the transforming growth factor (TGF)-β signaling pathway. They also found that SB431542, a specific inhibitor of TGF-β type I ALK receptors, suppressed the expression of ZEB1 and Snail1, which was followed by reduced production of ECM. The research suggested that increased expression levels of ZEB1 and Snail1 in FECD cells were responsible for an increased responsiveness to the TGF-β present in the aqueous humor and for excessive production of ECM [23].

Author: Sarthak Sharma

Gene Name: CLCN1

Description of pathogenic variant chosen:

CLCN1 is a member of the CLC family of genes[24,25]. The CLC family of genes provides the instructions for making chloride channels, which aid in the inter- and intracellular transport of chloride ions. This transport is essential for the cell to transmit electrical signals. Specifically, CLCN1 provides instructions for making ClC-1 channels, which are found in skeletal muscle cells. Each ClC-1 channel is made up of two identical protein subunits produced from the CLCN1 gene.

  • Cytogenetic location –
    • 7q34 – long arm(q) of chromosome 7, at position 34
  • Genomic location –
    • Chr7: 143351924 (on Assembly GRCh38)
    • Chr7: 143049017 (on Assembly GRCh37)
  • Mutation –
    • Single Nucleotide Variation
    • CLCN1:NM_000083:exon23:c.C2926T:p.R976X
  • Protein change –
    • R976*; ARG976TER
  • Regional Population – MSL (Mende in Sierra Leone)
  • MAF score –
    • Global:
      • T=0.0003/32 (ExAC)
      • T=0.0006/3 (1000 Genomes)
      • T=0.0008/10 (GO-ESP)
      • T=0.0013/37 (TOPMED)
    • Comparison of allele frequency (Global vs Regional):
      • Global:
        • C=0.9994
        • T=0.0006
      • Regional:
        • C=0.9941
        • T=0.0059
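
The global and regional frequencies above can be related to allele counts with a short calculation. The sketch below is illustrative only: the sample sizes (2,504 individuals overall and 85 in MSL) and the regional allele count of 1 are assumptions that happen to be consistent with the reported 1000 Genomes values (T=0.0006/3 and T=0.0059); they are not stated in this report. The function names are hypothetical.

```python
def allele_frequency(alt_allele_count: int, n_individuals: int) -> float:
    """Frequency of an allele in a diploid sample (two chromosomes per person)."""
    return alt_allele_count / (2 * n_individuals)


def fold_enrichment(regional_af: float, global_af: float) -> float:
    """How many times more frequent the allele is regionally than globally."""
    return regional_af / global_af


# Assumed sample sizes and counts (see the note above).
global_af = allele_frequency(3, 2504)   # 3 T alleles among 5,008 chromosomes
regional_af = allele_frequency(1, 85)   # 1 T allele among 170 chromosomes
print(round(global_af, 4), round(regional_af, 4))
print(round(fold_enrichment(regional_af, global_af), 1))
```

Under these assumptions the regional excess rests on a single observed allele, so the fold difference should be read with caution.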

Quality of genotype call:

Quality in Phred Score: 222. Thus, -10log10(prob(ALT call is wrong)) = 222. Probability that alternate nucleotide call is wrong = 10^(-22.2).
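
The Phred conversion above can be checked numerically. This is a minimal sketch; the function names are hypothetical and chosen only for illustration.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality Q to an error probability: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse conversion: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


# A QUAL of 222 corresponds to an error probability of 10^(-22.2).
p_wrong = phred_to_error_prob(222)
print(p_wrong)
```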

Description and critique of evidence for pathogenicity:

  • CADD phred score: 42
  • SIFT score: NA
  • LRT phred: 0.001 (D : deleterious)
  • Polyphen2_HVAR_score: NA

The cytosine (C) nucleotide at coding position 2926 is substituted with a thymine (T) nucleotide.

This SNP causes a stop-gain mutation, changing the codon CGA (which codes for arginine) at position 976 to the stop codon TGA.
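
The effect of a codon change such as CGA to TGA can be verified against the standard genetic code. The following is a small sketch, not the annotation method used by wANNOVAR; the helper name `classify_codon_change` is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Label a single-codon change as synonymous, missense, stop-gain or stop-loss."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop-gain"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


print(CODON_TABLE["CGA"], CODON_TABLE["TGA"], classify_codon_change("CGA", "TGA"))
```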

A study of individuals with sporadic epilepsy of unknown origin found that 96.7% of them had at least one missense variant in the CLCN genes, compared with 28.2% of 139 controls[25]. Further, nonsynonymous single nucleotide polymorphisms in the “skeletal” chloride channel gene CLCN1 and in CLCN2, a putative human epilepsy gene, were detected in threefold excess in cases relative to controls.


The structure of the CLCN1 protein was not available in the PDB database, so it was modeled using the Phyre2 web portal[26]. In the complete structure, the site of the mutation (Arg976) is shown as a red strand, and in the same image the original residue is shown as a group of red balls. Because of the stop-gain mutation, the protein terminates at residue 976 (shown as a group of red balls in the truncated structure).


Figure 6: Structure of CLCN1 protein (left) and structure of mutated CLCN1 protein (right).

Advice for people with this variant:

  1. This mutation has been linked to pharmacoresistant epilepsy. Therefore, the first advice would be to get tested, since special investigations might be necessary to diagnose epilepsy properly and to prescribe suitable medication[25].
  2. In most cases, epilepsy can be fully controlled with anticonvulsant (anti-epileptic) medications. However, that is not the case with pharmacoresistant epilepsy. In almost half the cases, epilepsy surgery could be a potential cure.
  3. When surgery is not possible, one can use a vagus nerve stimulator[27], an FDA-approved treatment that requires minor surgery to implant a device under the skin near the collarbone. This device transmits weak electrical signals that help prevent the electrical bursts in the brain that cause seizures.
  4. Lastly, a ketogenic diet (rich in fats and low in carbohydrates) is also suggested for people suffering from seizures[28].


  1. https://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/?gts=rs137853026
  2. https://ghr.nlm.nih.gov/gene/DYNC2H1
  3. Dagoneau, Nathalie, Marie Goulet, David Geneviève, Yves Sznajer, Jelena Martinovic, Sarah Smithson, Céline Huber et al. “DYNC2H1 mutations cause asphyxiating thoracic dystrophy and short rib-polydactyly syndrome, type III.” The American Journal of Human Genetics 84, no. 5 (2009): 706-711.
  4. Zhang, Wenjuan, S. Paige Taylor, Hayley A. Ennis, Kimberly N. Forlenza, Ivan Duran, Bing Li, Jorge A. Ortiz Sanchez et al. “Expanding the genetic architecture and phenotypic spectrum in the skeletal ciliopathies.” Human mutation (2017).
  5. Jena 3D Protein Visualization, http://jenalib.leibniz-fli.de
  6. Lacombe A, Lee H, Zahed L, et al. Disruption of POF1B Binding to Nonmuscle Actin Filaments Is Associated with Premature Ovarian Failure. American Journal of Human Genetics. 2006;79(1):113-119.
  7. Riva, P., Magnani, I., Conti, A. M. F., Gelli, D., Sala, C., Toniolo, D. and Larizza, L. (1996), FISH characterization of the Xq21 breakpoint in a translocation carrier with premature ovarian failure. Clinical Genetics, 50: 267–269.
  8. S. Bione, F. Rizzolio, C. Sala, et al. Mutation analysis of two candidate genes for premature ovarian failure, DACH2 and POF1B. Human Reproduction, Volume 19, Issue 12, 1 December 2004, Pages 2759–2766.
  9. Padovano V. et al. The POF1B candidate gene for premature ovarian failure regulates epithelial polarity. J. Cell Sci. 124, 3356–3368 (2011).
  10. NIH GHR database https://ghr.nlm.nih.gov/gene/RYR1
  11. http://www.enmc.org/home
  12. NCBI ClinVar Database: https://www.ncbi.nlm.nih.gov/clinvar/variation/29878/
  13. Wilmshurst, J. M., S. Lillis, H. Zhou, K. Pillay, H. Henderson, W. Kress, C. R. Müller et al. “RYR1 mutations are a common cause of congenital myopathies with central nuclei.” Annals of neurology 68, no. 5 (2010): 717-726.
  14. Shaaban S, Ramos-Platt L, Gilles FH, et al. RYR1 mutations cause ophthalmoplegia, facial weakness, and malignant hyperthermia. JAMA ophthalmology. 2013;131(12):10.1001
  15. Snoeck, M., B. G. M. Engelen, Benno Küsters, Martin Lammens, R. Meijer, J. P. F. Molenaar, J. Raaphorst et al. “RYR1‐related myopathies: a wide spectrum of phenotypes throughout life.” European journal of neurology 22, no. 7 (2015): 1094-1112.
  16. Hanson, Robert M., and Xiang-Jun Lu. “DSSR-Enhanced Visualization of Nucleic Acid Structures in Jmol.” Nucleic Acids Research 45.Web Server issue (2017): W528–W533. PMC. Web. 13 Dec. 2017.
  17. https://www.omim.org/entry/189909
  18. https://www.ncbi.nlm.nih.gov/sra/ERR250500/
  19. Williams T M, Moolten D, Burlein J, et al. Identification of a zinc finger protein that inhibits IL-2 gene expression[J]. Science, 1991, 254(5039): 1791-1794.
  20. Riazuddin S A, Zaghloul N A, Al-Saif A, et al. Missense mutations in TCF8 cause late-onset Fuchs corneal dystrophy and interact with FCD4 on chromosome 9p[J]. The American Journal of Human Genetics, 2010, 86(1): 45-53.
  21. https://www.ncbi.nlm.nih.gov/nuccore
  22. Vandewalle C, Van Roy F, Berx G. The role of the ZEB family of transcription factors in development and disease [J]. Cellular and molecular life sciences, 2009, 66(5): 773-787.
  23. Okumura N, Minamiyama R, Ho L T Y, et al. Involvement of ZEB1 and Snail1 in excessive production of extracellular matrix in Fuchs endothelial corneal dystrophy[J]. Laboratory Investigation, 2015, 95(11): 1291-1304
  24. Dunø, Morten, et al. “Difference in allelic expression of the CLCN1 gene and the possible influence on the myotonia congenita phenotype.” European journal of human genetics 12.9 (2004): 738-743.
  25. Chen, Tim T., et al. “Novel brain expression of ClC-1 chloride channels and enrichment of CLCN1 variants in epilepsy.” Neurology 80.12 (2013): 1078-1085.
  26. Kelley, Lawrence A et al. “The Phyre2 Web Portal for Protein Modelling, Prediction and Analysis.” Nature protocols 10.6 (2015): 845–858. PMC. Web. 13 Dec. 2017.
  27. Giordano, F; Zicca, A; Barba, C; Guerrini, R; Genitori, L (April 2017). “Vagus nerve stimulation: Surgical technique of implantation and revision and related morbidity”. Epilepsia. 58 Suppl 1: 85–90. doi:10.1111/epi.13678. PMID 28386925.
  28. Barañano, Kristin W., and Adam L. Hartman. “The Ketogenic Diet: Uses in Epilepsy and Other Neurologic Illnesses.” Current treatment options in neurology 10.6 (2008): 410–419.

Exome Analysis of Female from Southern Han Chinese Population

BIOL6150 Group 1 Project 2

Dong Jo Ban, Saurabh Gulati, Monica McNerney, Beatriz Saldaña, Xinyu Wang


The 1000 Genomes project aims to analyze the genetic variation found across different populations by analyzing genomes from people around the world. The project fully sequenced over 1000 genomes with the goal of detecting genomic variants with a frequency of 1% (The 1000 Genomes Project Consortium).  In this project, we sought to analyze the genetic variation of one of the sequenced individuals (HG00537) by analyzing the variants in the individual’s exome.  This individual is a female from the Southern Han Chinese population, and at least three out of the four grandparents are also Southern Han.


Workflow Figure/Pipeline


1) Alignment and 2) Variant Calling

To analyze a genome, the sequencing reads must first be mapped to a complete genome, known as a reference. The pipeline uses the Burrows-Wheeler Aligner (BWA) to map the sequence data to the reference genome, specifically BWA-MEM, the latest BWA algorithm, which is recommended for high-quality queries because of its speed and accuracy.

Once the mapping is finished, it is generally worthwhile to realign the raw gapped alignment, which reduces the number of miscalled INDELs in the data. The Broad Institute’s GATK realigner performs this second step of the pipeline. The resulting BAM file is then indexed using samtools before the variants are called.

The last step of the pipeline is variant calling. The BAM file is first summarized by genomic position using the BCFtools mpileup command. This generates a BCF file containing genotype likelihoods for the positions covered by the alignment, which allows the pipeline to call genotypes and restrict the output to variant sites. Using the BCF file, the pipeline runs the BCFtools call command to perform variant calling. The resulting VCF file is then indexed with tabix to prepare it for querying.
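
The steps above can be sketched as a list of command lines assembled in Python. This is only an outline under stated assumptions, not the exact pipeline that was run: the file names are hypothetical, paired-end reads are assumed, the reference is assumed to be already indexed with `bwa index`, and the GATK realignment step is omitted because its invocation depends on the GATK version.

```python
import subprocess
from typing import List, Optional, Tuple

# Each step is (argv, file to receive stdout or None).
Step = Tuple[List[str], Optional[str]]


def build_pipeline(ref: str, reads1: str, reads2: str, prefix: str) -> List[Step]:
    """Assemble alignment and variant-calling commands (hypothetical file names)."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    bcf = f"{prefix}.raw.bcf"
    vcf = f"{prefix}.vcf.gz"
    return [
        # Map reads to the reference; bwa mem writes SAM to stdout.
        (["bwa", "mem", ref, reads1, reads2], sam),
        # Coordinate-sort and index the alignment.
        (["samtools", "sort", "-o", bam, sam], None),
        (["samtools", "index", bam], None),
        # Compute genotype likelihoods as uncompressed BCF.
        (["bcftools", "mpileup", "-f", ref, "-Ou", "-o", bcf, bam], None),
        # Call variants (multiallelic caller, variant sites only), bgzipped VCF.
        (["bcftools", "call", "-mv", "-Oz", "-o", vcf, bcf], None),
        # Index the compressed VCF for fast querying.
        (["tabix", "-p", "vcf", vcf], None),
    ]


def run_pipeline(steps: List[Step]) -> None:
    """Run each step in order, stopping at the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, stdout=out, check=True)


steps = build_pipeline("hg19.fa", "reads_1.fq", "reads_2.fq", "sample")
for argv, stdout_path in steps:
    print(" ".join(argv) + (f" > {stdout_path}" if stdout_path else ""))
```

Building the commands separately from running them makes it easy to inspect the plan before anything touches the data.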

3) Variant Analysis

We passed our VCF through wANNOVAR (Wang K et al, 2010, Chang X et al, 2012) to annotate variants from the aligned sequence file.  The VCF file of the alignment was uploaded and compared to the reference genome hg19.  The output from wANNOVAR details how each variant differs from the reference genome (e.g., synonymous or nonsynonymous SNV, frameshift deletion or insertion, etc.) and gives additional quantitative information about each variant, which we used to filter our variants.  We used the criteria of a CADD phred score > 10 and an allele frequency in East Asian populations of < 0.01 to identify variants with the potential to be pathogenic. This gave about 450 candidate variants, which were then compared against the Human Gene Mutation Database to determine which variants have clinical significance.
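
The filtering criteria above can be expressed as a short predicate. This is a sketch under assumptions: the dictionary keys `cadd_phred` and `exac_eas` are hypothetical stand-ins for the corresponding wANNOVAR output columns, a missing frequency is treated as 0 (unobserved), and the third example record is invented to show a variant being rejected.

```python
from typing import Dict, List, Optional

Variant = Dict[str, Optional[float]]


def is_candidate(variant: dict, cadd_min: float = 10.0, eas_af_max: float = 0.01) -> bool:
    """Keep variants with CADD phred > cadd_min and East Asian AF < eas_af_max."""
    cadd = variant.get("cadd_phred")
    if cadd is None:
        return False  # no deleteriousness score, cannot be ranked
    eas_af = variant.get("exac_eas")
    if eas_af is None:
        eas_af = 0.0  # assumption: absent from the database means unobserved
    return cadd > cadd_min and eas_af < eas_af_max


def filter_candidates(variants: List[dict]) -> List[dict]:
    return [v for v in variants if is_candidate(v)]


variants = [
    {"rsid": "rs41293503", "cadd_phred": 32.0, "exac_eas": 0.0},
    {"rsid": "rs587777106", "cadd_phred": 35.0, "exac_eas": 0.0001},
    {"rsid": "hypothetical_common", "cadd_phred": 15.0, "exac_eas": 0.2},
]
print([v["rsid"] for v in filter_candidates(variants)])
```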

4) List of Clinically Significant Variants

We made a list of the dbSNP IDs (rsIDs) of the variants extracted from the annotated exome file and uploaded the list to the GenomeTrax HGMD and ClinVar MySQL databases. We then compared the uploaded variants to those in the databases and created a table containing the basic information for each variant found there; these variants are typically correlated with diseases and are therefore clinically significant. Five variants were determined to be potentially pathogenic and are listed in Table 1.
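
The cross-referencing step amounts to a lookup of each rsID in a table of known clinical annotations. The sketch below replaces the MySQL databases with an in-memory dictionary, so it illustrates the idea rather than the queries that were actually run. The record layout and the rsID `rs0000001` are hypothetical; the significance labels follow the descriptions given later in this report.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the ClinVar / HGMD tables.
CLINICAL_DB: Dict[str, Tuple[str, str]] = {
    "rs587777106": ("EIF2AK4", "Pathogenic"),
    "rs41293503": ("BRCA2", "Uncertain significance"),
    "rs773500082": ("MEN1", "Uncertain significance"),
}


def annotate(rsids: List[str], db: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (rsid, gene, significance) for every rsID present in the database."""
    return [(rsid, *db[rsid]) for rsid in rsids if rsid in db]


candidates = ["rs587777106", "rs0000001", "rs41293503"]
for row in annotate(candidates, CLINICAL_DB):
    print(row)
```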

rsID | Gene | Variant | Consequence | CADD Phred score | ExAC frequency | ExAC EAS frequency
rs41293503 | BRCA2 | G>A | Missense variant | 32.0 | 3.3E-05 | 0
rs141274774 | PKD1 | C>T | Missense variant | 24.4 | 7.8E-04 | 0
rs143235330 | PMS2 | T>C | Missense variant | 13.6 | 3.3E-05 | 0
rs773500082 | MEN1 | T>C | Missense variant | 28.0 | 2.6E-05 | 0
rs587777106 | EIF2AK4 | G>A | Missense variant | 35.0 | 8.3E-06 | 0.0001

Table 1: List of variants with potential clinical significance

Individual Variant Analyses

Monica McNerney: rs141274774 (PKD1)

Gene Name | RSID | Allele Change | Amino Acid Change | Frequency | Regional Frequency
PKD1 | rs141274774 | G → A | S (Ser) → N (Asn) | 7.8E-04 | 0

The variant rs141274774 is an exonic mutation in the PKD1 gene, which codes for the polycystin-1 (PC-1) protein.  This gene is on chromosome 16, and the mutation changes a G to an A at position 4055 in the fifteenth exon.  This causes a single amino acid substitution from S (serine) to N (asparagine) at position 1352 in the protein. The variant has a CADD phred score of 24.4. CADD scores rank predicted deleteriousness rather than base-calling error, so this score places the variant among roughly the top 0.4% of most deleterious possible substitutions in the genome. The variant has an allele frequency of 7.8 × 10^-4 in all populations and a frequency of 0 in East Asian populations.

Description of variant:

Polycystin-1 (PC-1) is necessary for renal tubular development and proper mechanosensing.  It is a very large protein (4303 amino acids) that functions as a complex with polycystin-2 to modulate calcium signaling in response to different external inputs (Dalagiorgou et al 2010, Zhou 2009). PC-1 has a long N-terminal extracellular domain that senses a variety of signals, nine transmembrane domains, and a C-terminal domain that interacts with polycystin-2 (Figure 1).  As Figure 1 shows, polycystin-1’s extensive extracellular domain contains a variety of motifs that can interact with carbohydrates, lipids, proteins, and other molecules. PC-1 shares many characteristics with G protein-coupled receptors (GPCRs).


Figure 1: Schematic of polycystin-1 structure, showing its extracellular sensing domain and interaction with polycystin-2.  Figure from Zhou 2009.

Assessment of pathogenicity:

Mutations in polycystin-1 and polycystin-2 can cause autosomal dominant polycystic kidney disease (ADPKD), which is characterized by the development of cysts in the kidney that can cause renal failure (Potts et al 2017).  Mutations in the PKD1 gene are responsible for 85% of ADPKD cases. Genetic diagnosis of ADPKD is complicated by the large number of mutations that can cause this disease: over 250 different mutations in PKD1 have been found in patients with ADPKD, and many mutations in the gene are benign (Sha et al 2017).

The specific mutation that this variant causes, a serine to asparagine substitution at amino acid position 1352, is in an extracellular PKD domain of the protein (green circles in Figure 1).  PKD domains are immunoglobulin-like regions thought to be involved in mechanosensing and intercellular adhesion, and PC-1 has sixteen of these domains (Hughes et al 1995).  Individual mutations in single PKD domains can fully disrupt intercellular adhesion and sensing (Ibraghimov-Beskrovnaya et al 2000), but the effect of the variant of interest (S1352N) on intercellular adhesion has not been tested. Because serine and asparagine are chemically similar (both have polar uncharged side chains), it is possible that this mutation has little effect on protein function.

Similarly, in clinical tests the S1352N mutation has indeterminate pathogenicity.  In a deep sequencing analysis of samples from patients with ADPKD, whenever this variant was found, a different known pathogenic variant was also found (Eisenberger et al 2015).  The ClinVar database of mutations has only one submission for this variant, supporting the difficulty of assessing the pathogenicity of this mutation (ClinVar).  

Protein model:

Because of the large size of the PC-1 protein and the difficulty of predicting transmembrane regions, only a portion of the protein coding sequence was used in structure predictions.  A sequence of about 1000 amino acids (amino acids 852-1852) was entered into the SWISS-MODEL protein modeling web interface (Arnold et al 2006, Kiefer et al 2009, Biasini et al 2014). Because of the structural similarity of the PKD domains to the muscle protein titin (Zhou 2009), the crystal structure of titin was used as a template for structural alignment.  However, titin has only 16% sequence identity to the input sequence (which is still more than any other available template), so the alignment likely has some flaws that confound structure interpretation. The PDB file of the structure was downloaded and visualized using PyMol.  The same procedure was used to determine the structure of the variant protein.
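
Preparing the wild-type and variant inputs for modeling comes down to cutting out the residue window and swapping one amino acid. The helpers below are a sketch with hypothetical names; the toy sequence in the example is invented, since the PC-1 sequence is not reproduced in this report. All positions are 1-based, as in protein numbering.

```python
def apply_substitution(seq: str, position: int, ref: str, alt: str) -> str:
    """Return seq with the residue at a 1-based position changed from ref to alt."""
    idx = position - 1
    if seq[idx] != ref:
        raise ValueError(f"expected {ref} at position {position}, found {seq[idx]}")
    return seq[:idx] + alt + seq[idx + 1:]


def extract_window(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive)."""
    return seq[start - 1:end]


def position_in_window(position: int, window_start: int) -> int:
    """1-based index of a full-length position inside a window starting at window_start."""
    return position - window_start + 1


# Toy example: substitute S with N at position 4 of an invented peptide.
toy = "MKTSLLA"
print(apply_substitution(toy, 4, "S", "N"))
# Residue 1352 of the full protein within the 852-1852 window.
print(position_in_window(1352, 852))
```

Checking the reference residue before substituting guards against off-by-one errors when switching between full-length and window coordinates.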

The overall structure of the 1000 aa sequence analyzed (Figure 2A) shows five distinct PKD domains with characteristic beta sheets.  The amino acid of interest is located in the second visible PKD domain (highlighted in Figure 2A in blue).  Figure 2B shows a zoomed-in structure of this alignment with the wild type residue (serine) highlighted in red, and Figure 2C shows the mutated residue (asparagine) highlighted in yellow.


Figure 2: (A) PyMol rendering of amino acids 852-1852 of polycystin-1, with the PKD region of interest highlighted in blue. (B) PyMol visualization of the wild-type PKD domain structure. (C) PyMol visualization of the variant PKD domain structure, which contains the S1352N mutation.

Notably, the amino acid substitution appears to have little effect on the structure of the PKD domain. Both residues are polar and face away from the protein into the aqueous solution.  The residue lies in a loop connecting the beta sheets, which suggests that it is not integral to the structure of the PKD domain.  However, the substitution could change the interaction of the PKD domain with other molecules, and a structure of that interaction would be necessary to better visualize the true effect of the mutation on protein function.

Advice to individual with this variant:

Though possible, it seems unlikely that this mutation would cause ADPKD.  Since this mutation generally occurs with other mutations in the PKD1 gene, I recommend further screening to determine whether there are other known pathogenic mutations in this individual’s PKD1 gene.  Although ADPKD is unlikely, the individual should be aware of its symptoms (high blood pressure, side pain, blood in urine, urinary or kidney tract infections) and should alert her healthcare provider to her potential genetic predisposition to ADPKD if any of these symptoms appear, so that they can be treated early to prevent kidney failure.

Xinyu Wang: rs41293503 (BRCA2)

The gene is BRCA2, which provides instructions for making a tumor suppressor protein that is also involved in repairing damaged DNA (Chen, Sining, and Giovanni Parmigiani). This variant is BRCA2 c.6706G>A at the cDNA level: a single nucleotide change from G (guanine) to A (adenine). This change results in an amino acid change from glutamic acid (Glu) to lysine (Lys) at codon 2236 of the BRCA2 protein (GeneDx). It has an allele frequency of 3.301e-05 across all human populations (ExAC Browser). More specifically, the allele frequency is 9.696e-05 in African populations, 6.057e-05 in South Asian populations and 3.003e-05 in European (Non-Finnish) populations.
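
The relationship between a coding-sequence position and its codon number is simple integer arithmetic, which makes it a useful consistency check on annotations. The function below is a minimal sketch with a hypothetical name; it assumes positions are numbered from the first base of the start codon, as in cDNA (c.) notation.

```python
from typing import Tuple


def codon_from_cdna(position: int) -> Tuple[int, int]:
    """Map a 1-based coding position to (codon number, base position within the codon)."""
    if position < 1:
        raise ValueError("coding positions are 1-based")
    codon_number = (position - 1) // 3 + 1
    base_in_codon = (position - 1) % 3 + 1
    return codon_number, base_in_codon


# c.6706 falls in codon 2236, matching the reported Glu2236Lys change.
print(codon_from_cdna(6706))
```

The same arithmetic is consistent with the other variants in this report: c.2926 in CLCN1 maps to codon 976 (p.R976X), and position 4055 in PKD1 maps to codon 1352 (S1352N).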

This change is considered a non-conservative amino acid substitution because glutamic acid and lysine differ in properties including polarity, charge and size. In silico analyses predict that this variant may be damaging to protein structure and function. It has also been reported in many individuals with breast and/or ovarian cancer and prostate cancer (Kluska, Anna, et al 2015 and Kote-Jarai, Z., et al 2011). However, current information is insufficient to determine whether this variant is benign or pathogenic. Therefore, it is classified as a variant of uncertain significance.


Attached are pictures showing the structural model of the protein encoded by the BRCA2 gene, breast cancer type 2 susceptibility protein (Bank, RCSB Protein Data). The amino acid change (Glu to Lys, caused by the G>A substitution) occurs at codon 2236.

There are more than 1800 identified BRCA2 mutations, and many of them increase the risk of breast and/or ovarian cancer. More than 30 of those mutations are associated with increasing the risk of prostate cancer. Most of the variants are very rare, and the least rare causal SNP is rs90359550 (SNPedia). Other variations include rs1799944, rs766173 and rs144848. The variant rs28897756 also has a mutation from guanine to adenine, but does not result in any amino acid change. However, it is identified as pathogenic and a causal mutation for breast cancer. The mutation is silent at the coding level, and it can disrupt a splice donor site. It has been shown in functional assays to cause aberrant RNA processing, specifically a skipping of exon 23 (ClinVar). This leads to a frameshift at codon 2985, causing a premature stop signal and is expected to result in an absent or disrupted protein (Acedo, Alberto, et al 2015, Colombo, Mara, et al 2013, Houdayer, Claude, et al 2012, and Willems‐Jones, Amber, et al 2012). In ClinVar, there are several submissions from different submitters and the results agree with each other. Criteria vary, and the most common criteria for identifying pathogenicity are from American College of Medical Genetics and Genomics (ACMG) (Green, Robert C., et al 2013).  The evidence about this variant meets the supporting criterion “multiple lines of computational evidence support a deleterious effect on the gene or gene product”  listed in ACMG guidelines. The mutation we study is similar to this mutation only in the sense that both of them are from a single nucleotide mutating from G to A. However, the mutation location and the resulting amino acid are both different.  

Saurabh Gulati: rs773500082 (MEN1)

Gene Name | RSID | Codon Change | Amino Acid Change | Frequency | Regional Frequency
MEN1 | rs773500082 | TAT -> TGT | Y (Tyr) -> C (Cys) | 0.00002599 | 0

This variant represents a SNP in the MEN1 gene, which encodes menin, a putative tumor suppressor associated with a syndrome known as multiple endocrine neoplasia type 1 or Wermer’s syndrome. This is an autosomal dominant disorder that affects the endocrine system by forming neoplastic lesions in the parathyroid, pituitary gland and pancreas (Lemmens, Irma, et al 1997). Although the exact function of the MEN1 gene and protein is not known, affected people are generally born with one mutated copy of the gene, and during their lifetime the other copy is mutated in some cells. This results in those cells dividing uncontrollably and forming tumors. This phenomenon is known as Knudson’s two-hit hypothesis (Knudson, Alfred G. 1971), and the fact that MEN1 follows this pattern is consistent with it being a tumor suppressor gene. Several types of mutation have been observed in patients with Wermer’s syndrome, including nonsense mutations, frameshift deletions or insertions, in-frame deletions or insertions, donor splice site mutations, and missense mutations (Pannett, A. A., and R. V. Thakker 1999).

In this variant the codon TAT changes to TGT, which results in a missense mutation from the amino acid tyrosine (Y) to cysteine (C) (Sherry et al. 2001).  The two amino acids differ substantially in their physicochemical properties: tyrosine has a bulky aromatic side chain with a hydroxyl group, whereas cysteine has a small side chain with a thiol group. The variant has a high CADD phred score (28), which ranks it among roughly the top 0.2% of most deleterious possible substitutions, and a low ExAC frequency of 0.00002599; both observations are consistent with a potentially pathogenic variation (Landrum et al. 2015). While ClinVar contains an entry for this variant, it has not been reported in the literature in individuals with MEN1-related disease. It has been predicted to be disruptive by several algorithms, such as SIFT (Vaser, Robert, et al. 2016), PolyPhen2 (Adzhubei, Ivan A., et al. 2010) and Align-GVGD (Tavtigian, Sean V., et al. 2006), but these predictions have not been confirmed by any published functional studies. Because there is not enough evidence to show that this variant either causes disease or is benign, ClinVar classifies it as a ‘Variant of Uncertain Significance’.


I used Swiss-Model (Biasini, Marco, et al. 2014) to model the structure of the wild-type and Y273C mutant proteins. Below are the images showing the wild-type (left) and mutant (right) proteins with the 273rd amino acid highlighted in red. The predicted structure does appear to differ between the two models, but no conclusion can be drawn, since these are predicted structures and not experimentally determined ones.

My advice for the individual with this mutation would be to have regular checks for multiple endocrine neoplasia after the age of 40, which is usually when this disease starts to manifest in individuals with MEN1 mutations. The typical sites for this disease are the parathyroid, pituitary gland and pancreas.

Dong Jo: rs143235330 (PMS2)

Gene Name | RSID | Codon Change | Amino Acid Change | Frequency | Regional Frequency
PMS2 | rs143235330 | GAC -> GGC | D (Asp) -> G (Gly) | 0.00003327 | 0

Figure 1: Table of RSID, codon change, amino acid change, and frequency

rs143235330 is a single nucleotide variant in an exon of the PMS2 gene, which lies on chromosome 7. The codon and amino acid changes are noted in Figure 1. The CADD phred score for the variant was reported to be 13.55. With a CADD score of 20 corresponding to the top 1% of most deleterious variants in the human genome, a score of 13.55 places this variant in roughly the top 4%, suggesting that it is moderately deleterious. In East Asian populations, it has a frequency of 0. The protein encoded by PMS2 plays a key role in the mismatch repair system, which corrects DNA mismatches as well as the small insertions and deletions that can occur during DNA replication and homologous recombination. This protein forms a heterodimer with the product of the mutL homolog 1 (MLH1) gene, known as the MutL-alpha heterodimer (Entrez). The physical interaction between MutL-alpha and the clamp loader subunits of DNA polymerase III implies that the heterodimer could play a part in recruiting DNA polymerase III to the site of mismatch repair. Moreover, the MutL-alpha heterodimer is associated with DNA damage signaling, a process known to cause cell cycle arrest, which can eventually result in apoptosis if the DNA damage is severe (UniProtKB).
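
The CADD scaling mentioned above, in which a score of 20 corresponds to the top 1%, follows the usual Phred-style relationship between a score and a rank fraction. The sketch below applies that relationship; the function name is hypothetical, and the result is a rank among scored substitutions, not a probability of pathogenicity.

```python
def cadd_phred_to_top_fraction(cadd_phred: float) -> float:
    """Fraction of ranked substitutions at least as deleterious: 10^(-score/10)."""
    return 10 ** (-cadd_phred / 10)


for score in (10, 13.55, 20, 24.4, 30):
    percent = 100 * cadd_phred_to_top_fraction(score)
    print(f"CADD {score}: top {percent:.2f}%")
```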

Variant analysis of the genome of the person selected for this project detected a variant identified by RSID rs143235330. According to Entrez, mutations in this gene have been linked with hereditary nonpolyposis colorectal cancer (HNPCC), also known as Lynch syndrome, and with Turcot syndrome. Lynch syndrome is a genetic disease with autosomal dominant inheritance that arises from a defect in at least one of several mismatch repair genes. Over 70% of all HNPCC cases are attributed to the hMSH2 and hMLH1 genes. PMS2 seems to play a smaller part, along with other unidentified defects affecting individuals with Lynch syndrome (Green et al., 1998). In one study, the risk for patients with PMS2 mutations appeared to be lower than that for patients with other mutations, such as MSH mutations. However, there is a lack of empirical evidence to support this claim because PMS2 mutations are so rare (Steinke et al., 2013). Turcot syndrome is a rare hereditary disease in which a brain tumor is associated with colonic polyps, which are mostly harmless but can develop into cancer (Ohsato et al., 1979). In one patient with Turcot syndrome, two germline mutations were found within the PMS2 gene: a G deletion (1221delG) and a four-base-pair deletion (2361delCTTC) in exon 14. Both of these mutations were inherited from the individual’s parents, who did not suffer from the condition. This suggests that these frameshift mutations in PMS2 are not pathogenic on their own, but can become so when both are present as a compound heterozygote (De Rosa et al., 2000).


Figure 2: Template (left) and variant model (right). Colors blue and red indicate variant amino acid. In the variant model, we can observe region highlighted in yellow that doesn’t show up in the template. Images generated in Swiss Model and Jmol.

For the person with this variant, I would advise frequent screening for Lynch syndrome. In the past, researchers mainly examined the contributions of MSH and MLH mutations. However, more recent studies link PMS2 mutations with Lynch syndrome (Daniels et al., 2015). As the role of PMS2 in the development of the disease becomes clearer, it would be best to visit a doctor periodically so that the person with the variant can receive appropriate treatment as soon as possible. Turcot syndrome requires more validation and study than Lynch syndrome, as there are not enough empirical data to make any predictions. Furthermore, one study states that its complex inheritance pattern makes this syndrome even more difficult to study. For either syndrome associated with PMS2, an individual with the variant would benefit from regular medical check-ups.

Beatriz Saldaña: rs587777106 (EIF2AK4)

Gene Name | RSID | Codon Change | Amino Acid Change | Frequency | Regional Frequency | Inheritance
EIF2AK4 | rs587777106 | CGA -> CAA | R (Arg) -> Q (Gln) | 8.28E-06 | 0.0001 | Autosomal Recessive

Eukaryotic Translation Initiation Factor 2-Alpha Kinase 4 (EIF2AK4) belongs to a family of kinases that phosphorylate the alpha subunit of eukaryotic translation initiation factor-2 (eIF2) (Berlanga et al, 1999). eIF2 is required for the initiation of translation, facilitating the binding of tRNAiMet to the ribosome. The phosphorylation takes place at a serine residue of the alpha subunit and downregulates protein synthesis as a response to cellular stress. When amino acids are scarce, EIF2AK4 phosphorylates eIF2, increasing its affinity for its guanine nucleotide exchange factor, eIF2B, which converts inactive eIF2-GDP to active eIF2-GTP (Kimball, 1999). Phosphorylated eIF2 acts as a competitive inhibitor of eIF2B, causing it to lose its ability to activate eIF2-GDP and initiate translation, thus leading to global repression of protein synthesis.

The genome of the person we chose to analyze contained a variant with the rsID rs587777106, which is classified as pathogenic and associated with familial pulmonary capillary hemangiomatosis, a rare lung disorder that causes the proliferation of capillaries in the lungs. The single nucleotide polymorphism changes arginine to glutamine at the 585th amino acid of the EIF2AK4 protein sequence, due to a substitution of guanine with adenine in the nucleotide sequence (Landrum et al, 2015). Arginine is basic and positively charged, while glutamine is polar and uncharged, so the substitution removes a positive charge and could alter the protein’s interactions. This change will most likely cause a major loss of function in EIF2AK4, which would consequently become unable to phosphorylate eIF2. The lack of downregulation of eIF2 would probably be responsible for the proliferation of capillaries in the lungs. The only viable solution to the problem caused by the variant is a complete lung transplant. My advice for anyone with this variant would be to monitor their lungs’ condition frequently and to sign up for a lung transplant waiting list as soon as they notice problems with capillary proliferation in their lungs.

In order to determine the pathogenic variants, we first filtered the VCF file to retain only variants with a population frequency less than or equal to 5% and a Phred score greater than or equal to 10. We then compared the dbSNP rsids of the remaining variants to two databases: ClinVar and the Human Gene Mutation Database (Landrum et al, 2015 and Stenson et al, 2003). The comparison provided a list of variants that appeared in the databases, along with the clinical significance of each variant. Variants with an associated disease and either pathogenic or uncertain clinical significance were analyzed further. This specific variant had a definite pathogenic clinical significance.
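The selection logic described above can be sketched in a few lines. This is a minimal sketch, not the script we ran: it assumes the intent is to keep variants with a frequency at or below 5% and a Phred score at or above 10, the field names (`rsid`, `freq`, `qual`) and the function `shortlist` are hypothetical, and the frequency and quality values in the example records are placeholders rather than project data.

```python
def shortlist(variants, clinical_db, max_freq=0.05, min_qual=10):
    """Keep rare, well-supported variants that also have a database record
    with an associated disease and a significance of interest."""
    keep_significance = {"pathogenic", "uncertain significance"}
    hits = []
    for v in variants:
        # Frequency and quality filter (thresholds as described in the text).
        if v["freq"] > max_freq or v["qual"] < min_qual:
            continue
        # Look the rsid up in the (ClinVar/HGMD-like) table.
        entry = clinical_db.get(v["rsid"])
        if entry and entry["disease"] and entry["significance"].lower() in keep_significance:
            hits.append({**v, **entry})
    return hits


# Placeholder records for illustration only.
variants = [
    {"rsid": "rs587777106", "freq": 0.0001, "qual": 50},
    {"rsid": "rs0000001", "freq": 0.30, "qual": 60},   # too common
    {"rsid": "rs0000002", "freq": 0.001, "qual": 5},   # low call quality
]
clinical_db = {
    "rs587777106": {"disease": "pulmonary capillary hemangiomatosis",
                    "significance": "Pathogenic"},
    "rs0000001": {"disease": "placeholder disease", "significance": "Pathogenic"},
}

result = shortlist(variants, clinical_db)
print([hit["rsid"] for hit in result])
```

Keeping the database as a dictionary keyed by rsid makes each lookup a constant-time operation, which matters when the variant list is long.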

SWISS-MODEL was used to predict the 3-D structure of the wild-type protein and the mutant protein. The mutant amino acid is highlighted in yellow in the images below. The structure seems to change slightly, but we cannot draw any definite conclusions because these are predicted structures and not experimentally determined crystal structures.


Figure 1. Protein model; the amino acid affected by the variant is shown in yellow. The left image shows the normally functioning protein, and the right image shows the protein structure with the mutation caused by the variant.


1000 Genomes Project Consortium. “A global reference for human genetic variation.” Nature 526.7571 (2015): 68-74.

Acedo, Alberto, et al. “Comprehensive splicing functional analysis of DNA variants of the BRCA2 gene by hybrid minigenes.” Breast Cancer Research 14.3 (2012): R87.

Arnold, K., et al., The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics, 2006. 22(2): p. 195-201.

Acedo, Alberto, et al. “Functional classification of BRCA2 DNA variants by splicing assays in a large minigene with 9 exons.” Human mutation 36.2 (2015): 210-221.

Bank, RCSB Protein Data. “Breast Cancer Type 2 Susceptibility Protein – P51587 (BRCA2_HUMAN).” RCSB PDB – Protein Feature View – Breast Cancer Type 2 Susceptibility Protein – P51587 (BRCA2_HUMAN), www.rcsb.org/pdb/protein/P51587.

Berlanga, J. J., Santoyo, J., de Haro, C. Characterization of a mammalian homolog of the GCN2 eukaryotic initiation factor 2-alpha kinase. Europ. J. Biochem. 265: 754-762, 1999.

Biasini, M., et al., SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res, 2014. 42(Web Server issue): p. W252-8.

Chang X, Wang K. wANNOVAR: annotating genetic variants for personal genomes via the web. J Med Genet. 2012 Jul;49(7):433-6.

Chen, Sining, and Giovanni Parmigiani. “Meta-analysis of BRCA1 and BRCA2 penetrance.” Journal of clinical oncology 25.11 (2007): 1329-1333.

ClinVar. “NM_000059.3(BRCA2):C.9117G>A (P.Pro3039=) Simple – Variation Report – ClinVar – NCBI.” National Center for Biotechnology Information, U.S. National Library of Medicine, www.ncbi.nlm.nih.gov/clinvar/variation/38215/#summary-evidence.

Colombo, Mara, et al. “Comparative in vitro and in silico analyses of variants in splicing regions of BRCA1 and BRCA2 genes and characterization of novel pathogenic mutations.” PLoS One 8.2 (2013): e57173.

Dalagiorgou, G., E.K. Basdra, and A.G. Papavassiliou, Polycystin-1: function as a mechanosensor. Int J Biochem Cell Biol, 2010. 42(10): p. 1610-3.

Eisenberger, T., et al., An efficient and comprehensive strategy for genetic diagnostics of polycystic kidney disease. PLoS One, 2015. 10(2): p. e0116680.

ExAC Browser. “Variant: 13:32915198 G / A.” ExAC Browser, exac.broadinstitute.org/variant/13-32915198-G-A.

Eyries, M., Montani, D., Girerd, B., Perret, C., Leroy, A., Lonjou, C., Chelghoum, N., Coulet, F., Bonnet, D., Dorfmuller, P., Fadel, E., Sitbon, O., Simonneau, G., Tregouet, D.-A., Humbert, M., Soubrier, F. EIF2AK4 mutations cause pulmonary veno-occlusive disease, a recessive form of pulmonary hypertension. Nature Genet. 46: 65-69, 2014.

GeneDx. “Submissions for Variant NM_000059.3(BRCA2):C.6706G>A (P.Glu2236Lys) (rs41293503).” Submissions for Variant NM_000059.3(BRCA2):C.6706G>A (P.Glu2236Lys) (rs41293503) , 30 June 2017, clinvarminer.genetics.utah.edu/submissions-by-variant/NM_000059.3%28BRCA2%29%3Ac.6706G%3EA%20%28p.Glu2236Lys%29.

Green, Robert C., et al. “ACMG recommendations for reporting of incidental findings in clinical exome and genome sequencing.” Genetics in Medicine 15.7 (2013): 565-574.

He, Hongzhen, et al. “Crystal structures of GCN2 protein kinase C-terminal domains suggest regulatory differences in yeast and mammals.” Journal of Biological Chemistry 289.21 (2014): 15023-15034.

Houdayer, Claude, et al. “Guidelines for splicing analysis in molecular diagnosis derived from a set of 327 combined in silico/in vitro studies on BRCA1 and BRCA2 variants.” Human mutation 33.8 (2012): 1228-1238.

Hughes, J., et al., The polycystic kidney disease 1 (PKD1) gene encodes a novel protein with multiple cell recognition domains. Nat Genet, 1995. 10(2): p. 151-60.

Ibraghimov-Beskrovnaya, O., et al., Strong homophilic interactions of the Ig-like domains of polycystin-1, the protein product of an autosomal dominant polycystic kidney disease gene, PKD1. Hum Mol Genet, 2000. 9(11): p. 1641-9.

Kiefer, F., et al., The SWISS-MODEL Repository and associated resources. Nucleic Acids Res, 2009. 37(Database issue): p. D387-92.

Kluska, Anna, et al. “New recurrent BRCA1/2 mutations in Polish patients with familial breast/ovarian cancer detected by next generation sequencing.” BMC medical genomics 8.1 (2015): 19.

Kimball SR (1999). “Eukaryotic initiation factor eIF2”. Int. J. Biochem. Cell Biol. 31 (1): 25–9.

Kote-Jarai, Z., et al. “BRCA2 is a moderate penetrance gene contributing to young-onset prostate cancer: implications for genetic testing in prostate cancer patients.” British journal of cancer 105.8 (2011): 1230-1234.

Landrum MJ, Lee JM, Benson M, Brown G, Chao C, Chitipiralla S, Gu B, Hart J, Hoffman D, Hoover J, Jang W, Katz K, Ovetsky M, Riley G, Sethi A, Tully R, Villamarin-Salomon R, Rubinstein W, Maglott DR. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 2015 Nov 17.

National Center for Biotechnology Information (NCBI)[Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988] – [cited 2017 Apr 06]. Available from: https://www.ncbi.nlm.nih.gov/

Potts, J.W. and S.A. Mousa, Recent advances in management of autosomal-dominant polycystic kidney disease. Am J Health Syst Pharm, 2017. 74(23): p. 1959-1968.

The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.

Roobol, Anne, et al. “p58IPK is an inhibitor of the eIF2α kinase GCN2 and its localization and expression underpin protein synthesis and ER processing capacity.” Biochemical Journal 465.2 (2015): 213-225.

Sha, Y.K., et al., Use of targeted sequence capture and high-throughput sequencing identifies a novel PKD1 mutation involved in adult polycystic kidney disease. Gene, 2017. 634: p. 1-4.

SNPedia. “BRCA2.” BRCA2 – SNPedia, June 2017, www.snpedia.com/index.php/BRCA2.

Stenson et al (2003), The Human Gene Mutation Database (HGMD®): 2003 Update. Hum Mutat (2003) 21:577-581.

Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Research, 38:e164, 2010.

Willems‐Jones, Amber, et al. “High grade prostatic intraepithelial neoplasia does not display loss of heterozygosity at the mutation locus in BRCA2 mutation carriers with aggressive prostate cancer.” BJU international 110.11c (2012).

Zhou, J., Polycystins and primary cilia: primers for cell cycle progression. Annu Rev Physiol, 2009. 71: p. 83-113.

Steinke, Verena et al. “Hereditary Nonpolyposis Colorectal Cancer (HNPCC)/Lynch Syndrome.” Deutsches Ärzteblatt International 110.3 (2013): 32–38. PMC. Web. 13 Dec. 2017.

Itoh, H et al. “Turcot’s Syndrome and Its Mode of Inheritance.” Gut 20.5 (1979): 414–419. Print.

Daniels, Molly S., and Karen H. Lu. “Clearer Picture of PMS2-Associated Lynch Syndrome Is Emerging.” Journal of Clinical Oncology, vol. 33, no. 4, 2015, pp. 299–300., doi:10.1200/jco.2014.58.9796.

Green, S. E., et al. “Hereditary non-Polyposis colorectal cancer.” International Journal of Colorectal Disease, vol. 13, no. 1, Apr. 1998, pp. 3–12., doi:10.1007/s003840050123.

Hegde, M. R. “A Homozygous Mutation in MSH6 Causes Turcot Syndrome.” Clinical Cancer Research, vol. 11, no. 13, Jan. 2005, pp. 4689–4693., doi:10.1158/1078-0432.ccr-04-2025.

Rosa, Marina De, et al. “Evidence for a recessive inheritance of Turcot’s syndrome caused by compound heterozygous mutations within the PMS2 gene.” Oncogene, vol. 19, no. 13, 2000, pp. 1719–1723., doi:10.1038/sj.onc.1203447.

Exome Variant Analysis on Female from United Kingdom

BIOL6150 Group 8 Project 2

Ryan Place, Brian Merritt, Tomáš Brůna, Yusuph Mavura


Exome Background

Sex: Female

Population: British

Code: GBR

Description: British in England and Scotland

Superpopulation: European

Superpopulation code: EUR

Donor, parents, and grandparents were all born in the United Kingdom

Biopsy source was from the peripheral vein

Exome data retrieved from B-Lymphocytes derived from blood source

Retrieved from the NHGRI Sample Repository for Human Genetic Research



exome pipeline

Variant Calling

Variant calling started with choosing an individual from the 1000 Genomes Project. The reads of our individual, HG00102-SRR077485, were then aligned to the GRCh38 reference genome and variants were called using bwa, samtools, and bcftools, which generated a VCF file on which we could run further analysis.

Variant Analysis and Filtering

Variant analysis was done using wANNOVAR, a web-based tool that provides access to the functions of ANNOVAR. The filtering parameters used were 1000G_All <= 0.05, CADD PHRED > 30, and nonsynonymous SNV. We sorted the remaining variants by PHRED score and chose four variants from the top 15.
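The filter and ranking described above can be reproduced offline on an exported annotation table. The following is a sketch under stated assumptions: the column names (`Gene`, `ExonicFunc`, `1000G_ALL`, `CADD_phred`) and the tab-delimited layout are assumed rather than taken from an actual wANNOVAR export, the helper names are hypothetical, and treating a missing frequency as a failed filter is a choice made for this example. The first four rows use the frequencies and CADD scores reported for our four variants below; `GENE_X` and `GENE_Y` are placeholders.

```python
import csv
import io


def to_float(text):
    """Convert an annotation field to float; non-numeric values become None."""
    try:
        return float(text)
    except ValueError:
        return None


def top_candidates(rows, max_af=0.05, min_cadd=30.0, n=15):
    """Apply the three filters, then return the n highest-scoring variants."""
    kept = []
    for row in rows:
        af = to_float(row["1000G_ALL"])
        cadd = to_float(row["CADD_phred"])
        if row["ExonicFunc"] != "nonsynonymous SNV":
            continue
        if af is None or af > max_af:       # assumption: missing frequency fails
            continue
        if cadd is None or cadd <= min_cadd:
            continue
        kept.append({**row, "CADD_phred": cadd})
    return sorted(kept, key=lambda r: r["CADD_phred"], reverse=True)[:n]


sample = (
    "Gene\tExonicFunc\t1000G_ALL\tCADD_phred\n"
    "FRZB\tnonsynonymous SNV\t0.044\t34\n"
    "ABCB4\tnonsynonymous SNV\t0.0044\t35\n"
    "AP1S3\tnonsynonymous SNV\t0.0024\t34\n"
    "INSRR\tnonsynonymous SNV\t0.0004\t34\n"
    "GENE_X\tsynonymous SNV\t0.001\t35\n"      # placeholder: wrong variant type
    "GENE_Y\tnonsynonymous SNV\t.\t33\n"       # placeholder: no frequency
)
rows = csv.DictReader(io.StringIO(sample), delimiter="\t")
candidates = top_candidates(rows)
print([r["Gene"] for r in candidates])
```

Because Python's sort is stable, variants with equal CADD scores keep their original order in the table.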

Variant Table

t1 (1)

Tomáš Brůna

Variant Info

Gene affected: FRZB

Position : chr2, 182838608 (GRCh38)

Mutation: G → A

Quality of genotype call: 222

Read depth: 59

Type: nonsynonymous SNV

AA Change: R → W at position 200

dbSNP ID: rs288326

ClinVar ID: RCV000005531.2

Frequency in overall human population: 0.044

Frequency in European population: 0.12


Reason For Choosing

The main reason for choosing this variant is its high CADD PHRED [1] score of 34, which indicates that it belongs to the roughly 0.1% most deleterious substitutions that can occur in the human genome. A secondary reason is that its frequency in the overall human population is lower than 0.05.
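The scaled CADD score is a Phred-like rank, so a score can be converted back into the fraction of all possible substitutions that score at least as high. A small sketch of that conversion (the function name is hypothetical):

```python
def cadd_top_fraction(phred):
    """Fraction of all possible substitutions ranked at or above a scaled
    CADD score, using the Phred-like definition score = -10 * log10(fraction)."""
    return 10 ** (-phred / 10)


for score in (10, 20, 30, 34, 35):
    print(f"CADD {score}: top {cadd_top_fraction(score):.4%}")
```

A score of 30 corresponds to the top 0.1%, and a score of 34 to roughly the top 0.04%, so the variants discussed here sit comfortably inside the 0.1% group.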

Possible Mutation Effects

Hip Osteoarthritis

Gene FRZB encodes a protein which is involved in the regulation of bone development [2]. This particular SNV in FRZB has been reported as a strong risk factor for primary osteoarthritis of the hip in females [3], with p-value = 0.0004. Therefore, this individual should avoid environmental risk factors such as obesity or poor posture, which are known to contribute to the development of osteoarthritis.

The pathogenicity of this mutation surprised me at first since it is still relatively common, especially in the European population (0.12). However, after learning that osteoarthritis is the most common joint disorder in the United States [4], the high frequency of this mutation is not as surprising.

The higher occurrence of this mutation in European (0.12) and American populations (0.086) compared to the overall frequency (0.044) made me investigate whether there is evidence for hip osteoarthritis being more common in these populations. A study by Litwic et al. [5] in fact reports that the prevalence of primary radiographic hip osteoarthritis is 1.4% and 2.8% in Asia and Africa, compared to the mean prevalence of 10.1% and 7.2% in Europe and North America.

This does not mean that the difference is caused by this SNV. However, it is an interesting observation: the mutation might be one of the factors explaining the higher prevalence of hip osteoarthritis in European and North American populations.

Tumor Suppressor

FRZB is also known to inhibit Wnt signalling pathways. The Wnt signalling pathway has been implicated in several human tumors, including those of the skin, connective tissue, colon, stomach, lung, breast, and prostate [6, 7]. Furthermore, FRZB is located in a region frequently deleted in cancers [6]. These facts make it a candidate tumor suppressor gene, and several studies [7, 8] support this hypothesis. Therefore, there is a possibility that the mutation might also affect this function of the protein encoded by FRZB.


Structural Model of the Protein


3D predictions of the reference and mutated protein. The reference protein (with arginine at position 200) is displayed on the left, and the mutated protein (with tryptophan at position 200) on the right. As seen from the picture, the mutation has virtually no effect on the protein’s secondary structure (at least according to the prediction). The affected amino acid is highlighted in red (in the upper left corner).


Brian Merritt
Variant Information
Gene affected: ABCB4:ATP binding cassette subfamily B member 4
Position: chr7 87431528-87431528 (GRCh38)
Nucleotide Change: C->T
dbSNP ID: rs45575636
Read Depth: 64
Genotype Call Quality: 222
ClinVar_ID(s): RCV000014696.26|RCV000033067.20|RCV000249752.2
Type: nonsynonymous SNV
AA Change: R->Q at position 590 (cDNA change G->A at position 1769)
Frequency in all populations: 0.0044
Frequency in Europeans: 0.0089

Reason For Choosing
This variant was chosen because of its high CADD_phred score of 35. CADD is a method for scoring the deleteriousness of single nucleotide variants (SNVs) within a given genome [9]. These scores are important for understanding and prioritizing causal variation in a multitude of settings. A scaled CADD score above 30 places a variant among the top 0.1% of all scores, so this variant falls within that group. The variant also passed our frequency filter, meaning that its allele frequency in the global population is below 0.05 (5%).

Possible Effects of Variant

Pregnancy Complications
Gene ABCB4 is a member of the ATP binding cassette (ABC) transporter subfamily B [10]. These transporters are important for the transportation of molecules across cellular membranes [11]. In particular, this protein belongs to the MDR/TAP subfamily, whose members are involved in drug resistance and antigen presentation [11]. Additionally, this protein may also be involved in the transport of phospholipids from liver hepatocytes into bile [11]. Expression of the gene is biased primarily towards liver, adrenal, and spleen tissues.
This variant is associated with an increased likelihood of intrahepatic cholestasis of pregnancy (ICP). This disorder is characterized by maternal pruritus in the third trimester, raised serum bile acids, and increased rates of adverse fetal outcomes [12]. Variants in ABCB4 were significantly associated with the disorder (p = 0.0017) when pregnant Caucasian women with ICP were compared with control groups [13]. Since our patient is female, this disorder could cause problems during pregnancy should she decide to have children.

ICP has been reported in pregnant women as early as 8 weeks of gestation [14]. However, the majority of pregnant women afflicted with ICP experience symptoms after the 30th week [15]. Typical symptoms include pruritus, which may result in insomnia [12]. Additional constitutional symptoms include pale stools, dark urine, anorexia, malaise, and abdominal pain. Clinical jaundice has also been reported in 10-15% of women with ICP, but symptoms tend to be mild [12].
Some studies have been published on the correlation between serum biochemistry and fetal condition. However, many of their sample sizes are not large enough to predict correlations accurately. Recent studies have shown that an increase in serum bile acids is associated with a slightly increased risk of spontaneous preterm labor, asphyxia events, or meconium staining of amniotic fluid and/or placenta and membranes [12,16].

Liver Fibrosis
A total of 124 volunteers underwent ABCB4 sequencing between February 2004 and March 2007 [17]. Of these, 32 volunteers between the ages of 16 and 69 years were selected. Eight different mutations were found in 11 of these patients (34%). No mutations were present in control groups for several of the exons of this gene [17]. The fibrosis score and macrophagic infiltration of portal tracts were significantly higher in the patients carrying an ABCB4 mutation (p = 0.01). Multidrug-resistant P-glycoprotein 3 (MDR3) immunostaining was also performed. All patients with an ABCB4 mutation exhibited reduced or absent MDR3 immunostaining [17]. Reduced activity of this protein may result in improperly functioning bile systems.


Structural Model of ABCB4



The wild-type protein model is on the left and the variant on the right. The template used was multi-drug resistance protein 1A (4ksd.1.A). The change from arginine to glutamine (red) replaces an electrically charged amino acid with one that has a polar uncharged side chain. As a result, the interactions of this residue with neighbouring amino acids may change significantly. There is no noticeable secondary structure change at this location. However, there is a noticeable alteration in the interactions of other amino acid chains throughout the structure. The variant results in the loss of a strong interaction between the arginine (590) and a short stretch of amino acids (yellow): Threonine-Serine-Glycine (632-634). This produces the effect seen above, where many of the chains in the central region of the protein are altered. This may disrupt the functionality of the transporter.



Variant Information

Gene Name: AP1S3


Type of Mutation: Nonsynonymous

Genomic coordinates (GRCh38): the gene spans 2:223,755,329-223,837,601; the variant occurs at chr2:223,777,776

Molecular Consequence: NM_001039569.1:c.97 C>T – missense variant

Amino acid change: Arg33Trp

ClinVarID: RCV000148042.3


Genotype: heterozygous(0/1)

Sequencing depth (DP): 22

Frequency in all populations: 0.0024

Frequency in Europeans: 0.0089

Disease associated with: Susceptibility to psoriasis


Reason for Choosing

The variant was chosen because it had a high CADD_Phred score of 34. The Combined Annotation Dependent Depletion (CADD) score is a metric used for scoring the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome [18].

Possible Mutation Effects

The gene AP1S3 encodes a subunit of an adaptor protein (AP) complex. AP complexes are cytosolic heterotetramers that promote the assembly and trafficking of small transport vesicles. AP-1 is dedicated to the transport of cargo between the trans-Golgi network and endosomes [20]. AP1S3 is a core AP-1 subunit that is predicted to stabilize AP-1 heterotetramers. It is involved in endosomal translocation of Toll-like receptor-3.

According to OMIM, the allelic variant rs138292988 (C/T) is a missense mutation that leads to the amino acid change Arg33Trp at an interface of the protein with another protein, AP1M1A. The mutated protein showed reduced TLR3 (Toll-Like Receptor 3) processing. The frequency of the c.97C>T allele was significantly higher in affected individuals (3.6%) than in 1,695 unrelated controls (0.7%; p = 2.3 x 10^-5) [21][24].

The affected individuals showed signs of pustular psoriasis. Psoriasis is a skin condition that speeds up the life cycle of skin cells. It causes cells to build up rapidly on the surface of the skin. The extra skin cells form scales and red patches that are itchy and sometimes painful [21][22][23].

From the protein models, we can see that the native protein has the positively charged arginine on its surface, so this residue most likely interacts with the external environment, most probably with AP1M1A. The change to tryptophan, an aromatic and neutral amino acid, might weaken this interaction because the positive charge is lost.

It is suggested that the lack of family history among patients with AP1S3 mutations reflects the requirement of environmental triggers, such as infection, pregnancy, or drug exposure, for disease development. One can inherit the disease allele from an unaffected parent.

Hence my advice to individuals with this variant is to avoid known triggers such as drug exposure, smoking, or infections, particularly when trying to conceive or while pregnant, in order to lower the chance that psoriasis develops [21].


Structural Model of The Protein


The wild-type protein is shown on the left and the mutant on the right [19]. There was no overall change in secondary structure. To show the possible effect of the Arg33Trp mutation, I chose to represent the models using ball and stick. As hypothesized, the positively charged Arg is replaced by the neutral aromatic Trp. This may reduce the interaction of the mutated protein with its external environment, critically with the protein AP1M1A. This may lead to reduced TLR-3 processing and ultimately psoriasis.


Ryan Place

Gene affected: INSRR

Position: GRCh38, chr1 156842504-156842504

Nucleotide Change: C->T

Type: Nonsynonymous SNV

Read Depth: 43

AA change: R -> H at position 1044

dbSNP ID: rs138327752

COSMIC ID: COSM463170|COSM1645652

Frequency in all populations: 0.0004

Frequency in Europeans: 0.001


Reason for Choosing

One reason for choosing this variant is its CADD PHRED score of 34 [25]. In addition, most of the other potential variants I analyzed were considered benign or likely benign by ClinVar, so I decided to look at the variants that were linked to a COSMIC ID. COSMIC stands for Catalogue Of Somatic Mutations In Cancer, and this particular variant has a FATHMM prediction of pathogenic with a score of 0.99. FATHMM stands for Functional Analysis Through Hidden Markov Models and is used to predict the functional effects of protein missense variants [26, 27].


Possible Mutation Effects

Insulin receptor-related receptor (IRR or INSRR) belongs to the insulin receptor minifamily of receptor tyrosine kinases, which also includes the insulin receptor (IR) and insulin-like growth factor receptor (IGF-IR). No peptide or protein that activates INSRR has been discovered [28]. INSRR is, however, activated in alkaline media with a pH > 7.9, which is consistent with its expression in cell subsets of the kidney that secrete bicarbonate into urine. Bicarbonate is secreted in response to alkali load or metabolic alkalosis [29].

No papers have been published on this variant, but it showed up in a study of kidney renal clear cell carcinoma (KIRC). In that study, only four of 442 mutated samples had mutations in INSRR [30]. KIRC is a common cancer and is known to be the most lethal of all the genitourinary tumours. The disease is known to be resistant to radiotherapy and chemotherapy, but if detected early it can potentially be cured by surgical resection, and recurrence rates are not high [31]. Cancer is extremely complex, and usually many genetic variants contribute to cells turning malignant [31]. With this in mind, having this genetic variant does not mean that this person will get this cancer; it means only that the variant appeared in a study on kidney renal clear cell carcinoma and could potentially be considered a genetic marker for the disease upon further study. Some risk factors for KIRC are smoking, obesity, workplace exposure to carcinogens, and family history [32].

Advice for this individual, if they are worried, would be to talk to their doctor about imaging such as an ultrasound or CT scan, to tell their doctor immediately about any blood in the urine or pain in the lower back on one side, and to get a yearly urinalysis [32]. Surgery is considered the only viable option for this cancer because it is resistant to most conventional treatments, so vigilance is key.


Protein Structure



These images were created using SWISS-MODEL [33]. The top image shows as much of INSRR as could be predicted, with the red near the bottom of the structure marking the site of the amino acid change. The model is built from two templates: the first is 4zxb.1.E (52.58% identity, coverage from 1-896) and the second is 4xlv.1.A (74.92% identity, coverage from 933-1237), so there are large gaps where the predictive model did not have a template (897-932 and 1237-1268). The two images below show a closer view of the position of the change with the amino acid structure shown, INSRR (R) on the left and the INSRR variant (H) on the right. There is no difference in the predicted backbone since both models use the same template; however, the structures of the two amino acids are very different, with arginine (R) being charged and histidine (H) being polar [34]. This difference might account for the variant being associated with KIRC, but further study is necessary.



[1] http://cadd.gs.washington.edu/info

[2] http://www.genecards.org/cgi-bin/carddisp.pl?gene=FRZB

[3] Loughlin J, Dowling B, Chapman K, et al. Functional variants within the secreted frizzled-related protein 3 gene are associated with hip osteoarthritis in females. Proceedings of the National Academy of Sciences of the United States of America. 2004;101(26):9757-9762. doi:10.1073/pnas.0403456101.

[4] Zhang Y, Jordan JM. Epidemiology of Osteoarthritis. Clinics in geriatric medicine. 2010;26(3):355-369. doi:10.1016/j.cger.2010.03.001.

[5] Litwic A, Edwards M, Dennison E, Cooper C. Epidemiology and Burden of Osteoarthritis. British medical bulletin. 2013;105:185-199. doi:10.1093/bmb/lds038.

[6] Xiaolin Zi, Yi Guo, Anne R. Simoneau, et al. Expression of Frzb/Secreted Frizzled-Related Protein 3, a Secreted Wnt Antagonist, in Human Androgen-Independent Prostate Cancer PC-3 Cells Suppresses Tumor Growth and Cellular Invasiveness. Cancer Res November 1 2005 (65) (21) 9762-9770; DOI: 10.1158/0008-5472.CAN-05-0103

[7] Guo Y, Xie J, Rubin E, et al. Frzb, a Secreted Wnt Antagonist, Decreases Growth and Invasiveness of Fibrosarcoma Cells Associated with Inhibition of Met Signaling. Cancer research. 2008;68(9):3350-3360. doi:10.1158/0008-5472.CAN-07-3220.

[8] Byun T, Karimi M, Marsh JL, Milovanovic T, Lin F, Holcombe RF. Expression of secreted Wnt antagonists in gastrointestinal tissues: potential role in stem cell homeostasis. Journal of Clinical Pathology. 2005;58(5):515-519. doi:10.1136/jcp.2004.018598.

[9] Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46(3):310-315.

[10] https://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=HGNC:45
[11] https://www.ncbi.nlm.nih.gov/gene/5244
[12] Geenes V, Williamson C. Intrahepatic cholestasis of pregnancy. World J Gastroenterol. 2009;15(17):2049-2066. doi:10.3748/wjg.15.2049.
[13] Bacq Y, Gendrot C, Perrotin F, et al. ABCB4 gene mutations and single-nucleotide polymorphisms in women with intrahepatic cholestasis of pregnancy. J Med Genet. 2009;46(10):711-715.
[14] Berg B, Helm G, Petersohn L, Tryding N. Cholestasis of pregnancy: clinical and laboratory studies. Acta Obstet Gynecol Scand. 1986;65(2):107-113.
[15] Kenyon AP, Piercy CN, Girling J, Williamson C, Tribe RM, Shennan AH. Obstetric cholestasis, outcome with active management: a series of 70 cases. BJOG An Int J Obstet Gynaecol. 2002;109(3):282-288.
[16] Glantz A, Marschall H, Mattsson L. Intrahepatic cholestasis of pregnancy: relationships between bile acid levels and fetal complication rates. Hepatology. 2004;40(2):467-474.
[17] Ziol M, Barbu V, Rosmorduc O, et al. ABCB4 heterozygous gene mutations associated with fibrosing cholestatic liver disease in adults. Gastroenterology. 2008;135(1):131-141.

[18] Nature Genetics in 2014: Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014 Feb 2. doi: 10.1038/ng.2892. PubMed PMID: 24487276.

[19] Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/

[20] Hartz, P. A. Personal Communication. Baltimore, Md. 5/9/2014.

[21] Setta-Kaffetzi, N., Simpson, M. A., Navarini, A. A., Patel, V. M., Lu, H.-C., Allen, M. H., Duckworth, M., Bachelez, H., Burden, A. D., Choon, S.-E., Griffiths, C. E. M., Kirby, B., and 9 others. AP1S3 mutations are associated with pustular psoriasis and impaired Toll-like receptor 3 trafficking. Am. J. Hum. Genet. 94: 790-797, 2014.

[22] https://www.mayoclinic.org/diseases-conditions/psoriasis/symptoms-causes/syc-20355840

[23] Mössner R, Wilsmann-Theis D, Oji V, Gkogkolou P, Löhr S, Schulz P, Körber A, Christoph-Prinz J, Renner R, Schäkel K, Vogelsang L, Peters KP, Philipp S, Reich K, Ständer H, Jacobi A, Weyergraf A, Kingo K, Kõks S, Gerdes S, Steinz K, Schill T, Griewank KG, Müller M, Frey S, Ebertsch L, Uebe S, Sticherling M, Sticht H, Hüffmeier U. The genetic basis for most patients with pustular skin disease remains elusive. Br J Dermatol. 2017 Aug 5. doi: 10.1111/bjd.15867.

[24] https://www.omim.org/entry/615781#1

[25] Kircher M, Witten DM, Jain P, O’roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46(3):310-315.

[26] Shihab HA, Gough J, Cooper DN, et al. Predicting the Functional, Molecular, and Phenotypic Consequences of Amino Acid Substitutions using Hidden Markov Models. Human Mutation. 2013;34(1):57-65. doi:10.1002/humu.22225.

[27] http://cancer.sanger.ac.uk/cosmic/mutation/overview?id=1645652

[28] Deyev IE, Chachina NA, Zhevlenev ES, Petrenko AG. Site-Directed Mutagenesis of the Fibronectin Domains in Insulin Receptor-Related Receptor. International Journal of Molecular Sciences. 2017;18(11):2461. doi:10.3390/ijms18112461.

[29] Deyev IE, Sohet F, Vassilenko KP, et al. Insulin receptor-related receptor as an extracellular alkali sensor. Cell metabolism. 2011;13(6):679-689. doi:10.1016/j.cmet.2011.03.022.

[30] http://cancer.sanger.ac.uk/cosmic/study/overview?study_id=416

[31] Yang W, Yoshigoe K, Qin X, et al. Identification of genes and pathways involved in kidney renal clear cell carcinoma. BMC Bioinformatics. 2014;15(Suppl 17):S2. doi:10.1186/1471-2105-15-S17-S2.

[32] https://www.cancer.org/cancer/kidney-cancer.html

[33] https://swissmodel.expasy.org/

[34] https://proteinstructures.com/Structure/Structure/amino-acids.html


Exome Analysis of Male with Northern and Western European Ancestry

The purpose of this project is to analyze a specific sample from the 1000 Genomes Project, compare it against the reference genome, and identify potentially pathogenic variants in the sample. The sample our group chose is NA07048, a Caucasian male resident of Utah, USA.

1. Sample profile:
Sample ID: NA07048
Gender: Male
Population: CEPH (Utah residents with Northern and Western European ancestry)
Superpopulation: EUR (Europe)
Biopsy Source: Peripheral vein
Cell Type: B-Lymphocyte
Tissue Type: Blood
Race: Caucasian
Country of Origin: USA
Species: Homo sapiens


2.  Methods:


The sources of the sample and the reference genome are shown below, along with a detailed variant calling pipeline.

After the variant calling file is obtained, it is uploaded to wANNOVAR for preliminary filtering. A total of 1312358 variants were fed into the annotation tool, and 2072 variants remained after filtering. The parameters used for filtering were MAF<0.01, exonic, and non-synonymous mutation.

Further filtering was then applied by removing variants with a CADD_phred score lower than 30 or without a dbSNP record. The CADD_phred score indicates the predicted deleteriousness of the variant; a score higher than 30 indicates that the variant is predicted to be among the 0.1% most deleterious substitutions in the human genome. The result was then ranked by SIFT score from low to high.


  • Sample source: SRR032632_1.filt.fastq.gz and SRR032632_2.filt.fastq.gz


  • reference source: GRCh38_full_analysis_set_plus_decoy_hla.fa



Command lines of the bwa-mem and samtools pipeline for mapping and variant calling:

  • bwa index -a bwtsw GRCh38_full_analysis_set_plus_decoy_hla.fa
  • bwa mem -M -R '@RG\tID:flowcell\tSM:NA07048\tPL:microarray' GRCh38_full_analysis_set_plus_decoy_hla.fa SRR032632_1.filt.fastq SRR032632_2.filt.fastq > NA07048bwamem.sam
  • samtools fixmate -O bam NA07048bwamem.sam NA07048bwamemfixmate.bam
  • samtools sort -O bam -o NA07048sorted.bam -T /tmp/NA07048temp NA07048bwamemfixmate.bam
  • samtools index NA07048sorted.bam
  • samtools mpileup -go NA07048.bcf -f GRCh38_full_analysis_set_plus_decoy_hla.fa NA07048sorted.bam
  • bcftools call -vmO z -o NA07048.vcf.gz NA07048.bcf
  • tabix -p vcf NA07048.vcf.gz


wANNOVAR working pipeline:

Step 1: NA07048 variants (1312358 variants)

Step 2: identify exonic variants (7031 variants)

Step 3: identify missense, nonsense and splicing variants (4206 variants)

Step 4: Remove variants in the 1000 Genomes Project(ALL) with MAF>0.01 (2145 variants)

Step 5: Remove variants in gnomAD exome database with MAF>0.01 (2072 variants)
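The five steps above amount to a cascade of filters in which each step only sees the survivors of the previous one. The following is a minimal sketch of that logic on toy records. The field names (`region`, `effect`, `maf_1000g_all`, `maf_gnomad_exome`) are hypothetical and are not wANNOVAR's actual column names, and treating a missing frequency as rare is an assumption made here for illustration.

```python
from typing import Dict, List, Optional, Tuple

Variant = Dict[str, object]

# Effect labels kept in step 3 (hypothetical labels for this sketch).
KEPT_EFFECTS = {"missense", "nonsense", "splicing"}


def is_rare(maf: Optional[float], cutoff: float = 0.01) -> bool:
    """Keep a variant if its MAF is not above the cutoff.

    Assumption: a variant absent from the database (None) is treated as rare.
    """
    return maf is None or maf <= cutoff


def run_cascade(
    variants: List[Variant], cutoff: float = 0.01
) -> Tuple[List[Variant], List[Tuple[str, int]]]:
    """Apply the filters in order and record how many variants survive each step."""
    counts = [("input", len(variants))]
    kept = [v for v in variants if v["region"] == "exonic"]
    counts.append(("exonic", len(kept)))
    kept = [v for v in kept if v["effect"] in KEPT_EFFECTS]
    counts.append(("missense/nonsense/splicing", len(kept)))
    kept = [v for v in kept if is_rare(v.get("maf_1000g_all"), cutoff)]
    counts.append(("1000 Genomes (ALL) MAF filter", len(kept)))
    kept = [v for v in kept if is_rare(v.get("maf_gnomad_exome"), cutoff)]
    counts.append(("gnomAD exome MAF filter", len(kept)))
    return kept, counts


# Toy records, invented for illustration only.
toy = [
    {"id": "v1", "region": "exonic", "effect": "missense",
     "maf_1000g_all": 0.0004, "maf_gnomad_exome": 0.001},
    {"id": "v2", "region": "exonic", "effect": "synonymous",
     "maf_1000g_all": 0.0004, "maf_gnomad_exome": 0.001},
    {"id": "v3", "region": "intronic", "effect": "none",
     "maf_1000g_all": 0.2, "maf_gnomad_exome": 0.2},
    {"id": "v4", "region": "exonic", "effect": "missense",
     "maf_1000g_all": 0.3, "maf_gnomad_exome": 0.3},
    {"id": "v5", "region": "exonic", "effect": "nonsense",
     "maf_1000g_all": None, "maf_gnomad_exome": 0.02},
]

survivors, step_counts = run_cascade(toy)
for label, count in step_counts:
    print(f"{label}: {count}")
```

Recording the count after every step reproduces the kind of step-by-step tally listed above and makes it easy to see which filter removes the most variants.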

Table of variants chosen for further analysis:


 VARIANT 1: ITGA5 (Liu Yiqiuyi)

Gene Name: ITGA5

Variant genotype:

  • dbSNP ID: rs143754928
  • DNA change: G to A
  • Amino acid change: R558W (Arginine to Tryptophan)

Variant frequency in the overall human population and in the EUR population:

  • From 1000G_ALL, phase 3:
    • Major allele frequency (G): 0.9996
    • Minor allele frequency (A): 0.0004
  • From 1000G_EUR, phase 3:
    • Major allele frequency (G): 0.998
    • Minor allele frequency (A): 0.002

These frequencies show that the variant is rare in the human population; it is less rare in the European population than in the overall population, but its frequency is still below 1%.

Variant effect on protein:

  • refGene: Nonsynonymous SNV
  • Functional Consequence: missense mutation

The mutation changes the R (arginine) at position 558 to W (tryptophan).

Variant effect evaluation:

  • SIFT score: 0
  • Polyphen2 score: 1
  • CADD_phred: 34


rs143754928 has a SIFT score of 0 and a PolyPhen2 score of 1, indicating that the mutation is very unlikely to be benign. It also has a CADD_phred score of 34, indicating that this variant is predicted to be among the 0.1% most deleterious substitutions possible in the human genome.

Overall, rs143754928 is predicted to be a highly deleterious variant.
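The three scores point in the same direction even though their scales run in opposite ways (a low SIFT score and a high PolyPhen2 score both suggest damage). The sketch below turns the scores into labels. The cutoffs used (SIFT below 0.05, PolyPhen2 above 0.85 or 0.15) are commonly quoted rules of thumb and are an assumption here, not values taken from this report; only the CADD_phred cutoff of 30 comes from the Methods section.

```python
from typing import Dict


def interpret_scores(sift: float, polyphen2: float, cadd_phred: float) -> Dict[str, str]:
    """Map raw predictor scores to coarse labels.

    Assumed cutoffs (rules of thumb, not from this report):
      SIFT      < 0.05 -> deleterious, otherwise tolerated
      PolyPhen2 > 0.85 -> probably damaging, > 0.15 -> possibly damaging
    The CADD_phred cutoff of 30 is the one used in the Methods section.
    """
    calls = {}
    calls["SIFT"] = "deleterious" if sift < 0.05 else "tolerated"
    if polyphen2 > 0.85:
        calls["PolyPhen2"] = "probably damaging"
    elif polyphen2 > 0.15:
        calls["PolyPhen2"] = "possibly damaging"
    else:
        calls["PolyPhen2"] = "benign"
    calls["CADD_phred"] = "passes cutoff of 30" if cadd_phred >= 30 else "below cutoff of 30"
    return calls


# Scores reported above for rs143754928.
for tool, call in interpret_scores(sift=0, polyphen2=1, cadd_phred=34).items():
    print(f"{tool}: {call}")
```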


Variant effect on gene/protein function:

Integrin alpha-5/beta-1 is a receptor for fibronectin and fibrinogen. It recognizes the sequence R-G-D in its ligands. ITGA5:ITGB1 acts as a receptor for fibrillin-1 (FBN1) and mediates R-G-D-dependent cell adhesion to FBN1 (PubMed:12807887, PubMed:17158881). It also acts as a receptor for human metapneumovirus (PubMed:12907437). Integrin ITGA2:ITGB1 acts as a receptor for human parvovirus B19 (PubMed:24478423).

Because integrin alpha-5 is a major receptor, it is plausible that the mutation underlying rs143754928 is deleterious: it could interfere with the protein’s binding ability and cause mismatched binding or loss of binding function.

The variant has not been published and does not have a ClinVar ID. However, multiple ITGA5 variants have been associated with cancer, and rs143754928 has been linked to lung cancer. The BioMuta database lists 238 ITGA5 variants that have been associated with 23 types of cancer, especially skin cancer, lung cancer, and colon cancer.


Predicted model of integrin alpha-5.

Left: integrin alpha-5 model; Right: rs143754928 (R558W highlighted with red and blue)

In the Swiss-Model prediction, the mutation does not impose an evident change on the protein’s structure.

The evaluation scores predict a highly deleterious effect, and there are abundant examples of highly deleterious mutations near site 558, yet the structural study of rs143754928 and the relevant literature yield no obvious evidence that this variant is deleterious. This is probably because site 558 lies outside the major substrate binding sites and metal binding sites of integrin alpha-5, so it is unlikely to interfere with the protein’s binding ability; it is also not located at a junction between subunits or domains, so it is unlikely to interfere with the protein’s overall structure.

rs143754928 has nevertheless been linked with lung cancer and labeled “probably damaging” in the HIVE database, likely because the combined effect of multiple ITGA5 mutations is highly deleterious, while this variant alone is not sufficient to cause serious damage.


 VARIANT 2: HPD (4-hydroxyphenylpyruvate dioxygenase) (Nirav Shah)


Gene Affected                : HPD (4-hydroxyphenylpyruvate dioxygenase)

dbSNP                        : rs1154510

Cytogenetic Location         : 12q24.31

Mutation Position            : chr12:121857429 (GRCh38.p7)

Mutation (DNA)               : T>C

Mutation (Protein)           : Tryptophan > Arginine

Genomic Placements           : NC_000012.11:g.122295335T>C

Quality of Genotype Call     : 4.23583 (Phred-Scaled Score) (From the variant file)

Variation Type               : SNV (Single Nucleotide Variation)

Gene Consequence             : Missense Variant


Fig1: Variant change site within the genome

Gene Function  :

The HPD gene encodes an essential enzyme, 4-hydroxyphenylpyruvate dioxygenase, which is the second in a series of five enzymes that break down the amino acid tyrosine (a non-essential amino acid). Tyrosine is a crucial amino acid required for building proteins and occurs frequently in them. The enzyme is abundant in the liver, and smaller amounts are found in the kidneys, where the residual proteins of consumed food are processed.

Fig2: Breakdown of Tyrosine requiring HPD enzyme

The protein encoded by this gene is an enzyme in the catabolic pathway of tyrosine. It catalyzes the conversion of 4-hydroxyphenylpyruvate, a tyrosine byproduct, to homogentisate (homogentisic acid). Homogentisic acid can follow two fates: it is broken down into smaller molecules and later excreted (likely the case), or it is used to produce energy through the Krebs (tricarboxylic acid) cycle.

Defects in this gene are a cause of tyrosinemia type 3 (TYRO3) and hawkinsinuria (HAWK).

Observed Frequency in the Population   :

The average frequency observed globally for the mutation within the gene is about 0.85 (mutation of T to C).

Study            Population   Group        Sample Size   Ref Allele   Alt Allele
ExAC Release 1   Global       Study-wide   121370        T=0.14978    C=0.85022
                 Europe       Sub          73326         T=0.1445     C=0.8555
                 Asian        Sub          25160         T=0.1516     C=0.8484
                 American     Sub          21976         T=0.1655     C=0.8345
                 Other        Sub          908           T=0.15       C=0.85

Table 1 : Observed frequency of the mutation (rs1154510)
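The table gives reference and alternate allele frequencies, whereas the filter in the Methods section is stated in terms of minor allele frequency (MAF<0.01). A minimal sketch of how the minor allele and its frequency can be read off each row is shown below; the function name and dictionary layout are illustrative only.

```python
from typing import Dict, Tuple


def minor_allele(freqs: Dict[str, float]) -> Tuple[str, float]:
    """Return the less common allele and its frequency."""
    allele = min(freqs, key=lambda a: freqs[a])
    return allele, freqs[allele]


# ExAC Release 1 frequencies for rs1154510, copied from the table above.
EXAC_RS1154510 = {
    "Global": {"T": 0.14978, "C": 0.85022},
    "Europe": {"T": 0.1445, "C": 0.8555},
    "Asian": {"T": 0.1516, "C": 0.8484},
    "American": {"T": 0.1655, "C": 0.8345},
    "Other": {"T": 0.15, "C": 0.85},
}

MAF_CUTOFF = 0.01  # cutoff named in the Methods section

for population, freqs in EXAC_RS1154510.items():
    allele, maf = minor_allele(freqs)
    status = "below" if maf < MAF_CUTOFF else "not below"
    print(f"{population}: minor allele {allele}, MAF {maf:.4f} ({status} {MAF_CUTOFF})")
```

In every population listed, the less common allele is the reference allele T, so the MAF is the T frequency rather than the C frequency.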

Protein Mutation and change in Protein structure  :

Tryptophan (aromatic, non-polar, neutral) changes to arginine (basic, polar, positively charged), causing a change in the character and conformation of the protein. The site of the mutation is part of an alpha helix and is an important residue that interacts with a ligand within the protein. The human protein has yet to be crystallized, so for visualization purposes the same protein from Arabidopsis thaliana is used. Thus, the mutation can affect the protein’s interaction with the ligand. The mutation is indicated in the figure below.

Fig 3: Mutation Site of the protein HPD (in Arabidopsis thaliana)

Advice for patients with this mutation :

Patients bearing this mutation develop metabolic acidosis, which is especially harmful for women who are pregnant and/or breast-feeding. A simple approach in such cases is a low-protein diet. This is not good advice for a child, because proteins are an important dietary component; in that case, a normal protein intake with reduced amounts of phenylalanine and tyrosine can be used to nourish the child. Phenylalanine is an amino acid that is easily converted to tyrosine. Because tyrosine is a conditionally essential amino acid that the human body can synthesize, such a diet should not affect the growth of the child. In addition, one could take HPD protein subcutaneously after meals (lunch/dinner).

References :

  1. Aarenstrup L, Falch AM, Jakobsen KK, Neve S, Henriksen L LØ, Tommerup N, Leffers H, Kristiansen K. Expression and post-translational modification of human 4-hydroxy-phenylpyruvate dioxygenase. Cell Biol Int. 2002;26(7):615-25.
  2. Brownlee JM, Heinz B, Bates J, Moran GR. Product analysis and inhibition studies of a causative Asn to Ser variant of 4-hydroxyphenylpyruvate dioxygenase suggest a simple route to the treatment of Hawkinsinuria. Biochemistry. 2010 Aug 24;49(33):7218-26. doi: 10.1021/bi1008112.
  3. Rüetschi U, Cerone R, Pérez-Cerda C, Schiaffino MC, Standing S, Ugarte M, Holme E. Mutations in the 4-hydroxyphenylpyruvate dioxygenase gene (HPD) in patients with tyrosinemia type III. Hum Genet. 2000 Jun;106(6):654-62.
  4. Tomoeda K, Awata H, Matsuura T, Matsuda I, Ploechl E, Milovac T, Boneh A, Scott CR, Danks DM, Endo F. Mutations in the 4-hydroxyphenylpyruvic acid dioxygenase gene are responsible for tyrosinemia type III and hawkinsinuria. Mol Genet Metab. 2000 Nov;71(3):506-10.
  5. NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Research. 2016;44(Database issue):D7-D19. doi:10.1093/nar/gkv1290.
  6. Chang X, Wang K. wANNOVAR: annotating genetic variants for personal genomes via the web. Journal of medical genetics. 2012;49(7):433-436. doi:10.1136/jmedgenet-2012-100918.



VARIANT 3: PMM2 (phosphomannomutase 2) (SHREY MATHUR)


Gene Information: The PMM2 gene codes for an enzyme called phosphomannomutase 2 (PMM2). It is 51.49 kb in size with eight exons and codes for a transcript 2290 bp long. This enzyme is involved in glycosylation, the process that attaches groups of sugar molecules (oligosaccharides) to proteins. Glycosylation modifies proteins so they can perform a wider variety of functions. In one of the early steps of glycosylation, the PMM2 enzyme converts a molecule called mannose-6-phosphate to mannose-1-phosphate. Mannose-1-phosphate is then converted into GDP-mannose, which transfers a small sugar molecule called mannose to the growing oligosaccharide chain. Once the correct number of small sugar molecules are linked together to form the oligosaccharide, it can be attached to a protein.[1]

Cytogenetic location:


Genomic location:

  • Chr16: 8811153 (on Assembly GRCh38)
  • Chr16: 8905010 (on Assembly GRCh37)


Protein change:


DNA Change:

  • Position 422, G -> A










ExAC Europe      16045   G=0.9883   A=0.0117
ExAC Asian       10448   G=0.9992   A=0.0008
ExAC American    5448    G=0.997    A=0.003
ExAC Other       306     G=0.99     A=0.01

UK twin cohort: G=0.994, A=0.006


Tool Used: dbSNP and ClinVar



The change is a single nucleotide variation that results in a missense variant.

The mutation results in an amino acid change from arginine to histidine at position 141.[2]



Mutations in this gene have been shown to cause defects in glycoprotein biosynthesis, which result in carbohydrate-deficient glycoprotein syndrome type I.[3]

In individuals with PMM2-CDG (CDG-Ia), i.e. congenital disorder of glycosylation type 1a, the level of PMM2 activity is very low: PMM2 enzyme activity in fibroblasts and leukocytes is typically 0% to 10% of normal [Van Schaftingen & Jaeken 1995, Carchon et al 1999, Jaeken & Carchon 2001].

The major clinical features of PMM2-CDG (CDG-Ia), i.e. congenital disorder of glycosylation type 1a, are the following:

  • Cerebellar hypoplasia/atrophy and small brain stem [Aronica et al 2005]
  • Esotropia
  • Seizures
  • Stroke-like episodes
  • Hepatic dysfunction (elevated transaminases)
  • Coagulopathy with low serum concentration of factors IX and XI, antithrombin III, protein C, and/or protein S [4]
  • Absent puberty in females, small testes in males.[4]






Fig 1: 3-D structure of PMM2(Mutation site in Red )


ADVICE TO INDIVIDUALS WITH THIS VARIANT: Since a treatment for the disease has yet to be found, individuals with this mutation should follow precautionary measures.

  • First, steps have to be taken to identify the mutation and the associated risks.
  • Then the following tests should be done on a regular basis to prevent further complications from the conditions mentioned below:
  1. Liver function tests, to check for ketosis and elevated transaminases.[5]
  2. Measurement of serum albumin concentration
  3. Thyroid function tests to evaluate for decreased thyroid binding globulin, elevated serum concentration of TSH, and low serum concentration of free T4
  4. Coagulation studies including protein C, protein S, antithrombin III, and factor IX, to check for low levels.
  5. Urinalysis to evaluate for proteinuria[6,7]
  6. Echocardiogram to evaluate for pericardial effusions[6,7]
  7. Renal ultrasound examination to evaluate for microcysts.

It is better to detect the above conditions in advance, before the mutation affects the person in a life-threatening manner.



A lot of research on PMM2 mutations is ongoing.



[1] https://ghr.nlm.nih.gov/gene/PMM2#resources

[2] https://www.ncbi.nlm.nih.gov/snp/

[3] https://www.ncbi.nlm.nih.gov/clinvar/

[4] Akaboshi S, Ohno K, Takeshita K. Neuroradiological findings in the carbohydrate-deficient glycoprotein syndrome. Neuroradiology. 1995;37:491–5.

[5] Rush JS, Panneerselvam K, Waechter CJ, Freeze HH. Mannose supplementation corrects GDP-mannose deficiency in cultured fibroblasts from some patients with congenital disorders of glycosylation (CDG). Glycobiology 2000; 10: 829–835.

[6] www.jmol.org/

[7] Westphal V, Srikrishna G, Freeze HH. Congenital disorders of glycosylation: have you encountered them?. Genet Med 2000; 2: 329–337.


VARIANT 4: Gene Name: Interleukin 4 receptor (IL4R) (KUNAL AGARWAL)

Gene Description:  Interleukin 4 Receptor is a protein coding gene. IL4R gene codes for the alpha chain of interleukin-4 receptor, a type I transmembrane protein that binds to IL4 and IL13 for regulating IgE production. The encoded protein can promote differentiation of Th2 cells by binding to Interleukin 4. Its related pathways include: Hematopoietic cell lineage and p70S6K Signaling.

Variant genotype:

  • Variant: SNP rs1805010
  • DNA change: c.223A>G
  • Amino acid change: Ile 75 Val
  • Variation Location: NM_000418.3:c.223A->G
  • Cytogenetic location: 16p12
  • Genomic location: Chr16: 27344882 (on Assembly GRCh38) Chr16: 27356203 (on Assembly GRCh37)


Variant frequency in the overall human population and in the particular ethnic/regional population:

According to ExAC Release 1 Study-

Overall population: A = 0.54878, G = 0.45122

Europe: A = 0.5595, G = 0.4405

Asian: A = 0.5410, G = 0.4590

American: A = 0.5227, G = 0.4773


Variant Effect on Protein:

It leads to missense mutation. A missense mutation is a single nucleotide mutation in a codon which leads to the coding of a different amino acid. Isoleucine at position 75 changes to Valine due to the variant effect.


Variant effect on Gene/Protein Function:

IL4R gene codes for the alpha chain of interleukin-4 receptor, a type I transmembrane protein that binds to IL4 and IL13 for regulating IgE production.

This variant has been associated with atopy which can lead to development of allergic diseases like allergic rhinitis, asthma and atopic dermatitis. Atopy is associated with heightened immune responses to common allergens. Variation in the gene has also been associated with resistance to human immunodeficiency virus type-1 infection.

Tools Used: ClinVar


If the variant has been published, cite and summarize:


  • This research concluded that the variant (Ile 75 Val) in the IL4R gene is unlikely to contribute significantly to increased IgE levels and variations outside the coding region may influence atopy susceptibility (Franjkovic et al. 2005). https://www.ncbi.nlm.nih.gov/pubmed/15712015


A structural model of the protein:


Fig 1: IL4R protein structure with the variant location marked in yellow, made using PyMOL


  1. Franjkovic, Izolda, et al. “Effects of Common Atopy-Associated Amino Acid Substitutions in the IL-4 Receptor Alpha Chain on IL-4 Induced Phenotypes.” Immunogenetics, vol. 56, no. 11, 2005, pp. 808–817., doi:10.1007/s00251-004-0763-1.
  2. Soriano, Alex, et al. “Polymorphisms in the Interleukin-4 Receptor α Chain Gene Influence Susceptibility to HIV-1 Infection and Its Progression to AIDS.” Immunogenetics, vol. 57, no. 9, 2005, pp. 644–654., doi:10.1007/s00251-005-0041-x.
  3. http://jmol.sourceforge.net/
  4. https://www.ncbi.nlm.nih.gov/projects/snp/rs1805010#clinical_significance
  5. https://www.ncbi.nlm.nih.gov/clinvar/variation/14666/

 Worked by Group 4 : Liu Yiquyi, Kunal Agarwal, Shrey Mathur, Nirav Shah

















Group 6 – Exome analysis of a Dai Chinese woman

Group 6 members: Genevieve Brandt, Victoria Caban Figueroa, Mohit Thakur, Qinyu Yue


We chose a dataset from a Chinese woman of the Dai ethnic minority who lives in Southwest China; at least 3 of her 4 grandparents are Dai. Southwest China is a historical settlement area for many ethnic minorities. Historically, these minorities had unique cultures and lived separately as small tribes, which allowed them to maintain genetic specificity and diversity. In this project, we hoped to find unique variants in the genome of this Dai woman and to compare them to the whole human population and the East Asian population.



Figure 1. Pipeline of exome analysis.

We aligned our fasta files to the reference genome, GRCh38, using bwa. Then we used samtools to sort the BAM file and samtools mpileup to call the variants into a VCF file. We uploaded the VCF file to wANNOVAR to get more annotation information about our variants and compared them to HGMD (Human Gene Mutation Database) and ClinVar to see which variants were clinically significant. We then filtered our raw ANNOVAR output table to find variants with a frequency below 0.1 and a Phred score above 30, and determined which of those variants were clinically significant. There were 32 variants that fit our quality criteria, and 4 of those were found in HGMD. Only 2 were possibly pathogenic and interesting to research, so we then chose two other variants that were likely pathogenic, though they did not have as low frequencies in the population.

Summary of Variants

Table 1: Sequential filtering of the variants from subject HG00879 down to 4 significant variants. Of these, 2 were used for analysis

Filter                                       Number of Variants
ANNOVAR                                      8760
ClinVar                                      727
HGMD (Human Gene Mutation Database)          293
Quality Filter                               32
Quality Filter and Clinically Significant    4

Genevieve Brandt: rs73969684

Variant ID: rs73969684
Gene name: SCN9A (Sodium voltage-gated channel alpha subunit 9)
Variant genotype: 554 C > A (Phred score 34)
Variant frequency: Whole population (ExAC): 0.003; East Asian: 0.008
Variant effect on protein: missense
Change in the protein: R185H (Arginine to Histidine)
Pathogenicity: Possibly pathogenic

This variant, rs73969684, codes for a missense mutation of arginine to histidine. Arginine is charged while histidine is polar, which indicates that the SCN9A protein could have a changed shape as a result of this variation. The gene SCN9A codes for sodium voltage-gated channel alpha subunit 9, which is a part of the sodium ion channel NaV1.7. These sodium ion channels are essential in transmitting electric signals, specifically in nociceptors, a type of nerve cell. Because charge is essential to the function of these channels, changing the amino acids could alter that function. I was not able to find a published structure of the SCN9A protein, so I used a structure predicted by the software RaptorX. I used JMOL to highlight the amino acid that is changed (Figure 2a) and also show a zoomed-out version of the protein, though the structure does not seem to be correct because of the long chains at both ends (Figure 2b). Changes in NaV1.7 lead to many different diseases, including pain disorders, insensitivity to pain, extreme pain, and seizures.

The specific variant, R185H, has been linked to diseases such as small fiber neuropathy and paroxysmal extreme pain disorder in studies published in the last five years. A study by Meglič et al., 2014, on paroxysmal extreme pain disorder (PEPD) found the R185H variant in a child and in other family members who also experienced mild pain disorders. The child experienced moments of extreme pain, and the researchers concluded that, since this location is highly conserved in the genome, it is likely essential for neuron sensitivity (Meglič et al., 2014). Han et al., 2012 correlated the R185H variant with small fiber neuropathy in some patients, while other patients with small fiber neuropathy carried different variants of the SCN9A gene. They found that R185H, the specific variant I studied, enhances the current and makes the dorsal root ganglion neurons hyperexcitable, which can lead to the pain problems. Over time, the signals degenerate and the individual experiences differing pain sensitivity (Han et al., 2012). The different profiles of the disorder indicate that small changes to this gene have similar effects.

To summarize these results, this variant can lead to different pain disorders depending on how it specifically affects the individual and where the neurons are affected. Different variants of the same gene seem to lead to similar disorders, with some being more severe than others, which is why the pathogenicity on ClinVar is only listed as possible. Within our pipeline, our comparison to a clinical database and choice of high quality and low frequency variants indicates that this variant could cause problems for this individual, especially over time. However, it may also lead to no effects.
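The residue number in a protein change can be cross-checked against the coding position given in the variant summary, since every three coding nucleotides make up one codon. The sketch below assumes that the position is counted from the first nucleotide of the start codon (position 1); the function names are illustrative.

```python
def codon_number(coding_pos: int) -> int:
    """Residue number of the codon that contains a coding-sequence position.

    Assumes 1-based numbering starting at the first base of the start codon.
    """
    if coding_pos < 1:
        raise ValueError("coding position must be 1 or greater")
    return (coding_pos - 1) // 3 + 1


def position_in_codon(coding_pos: int) -> int:
    """Which base of the codon (1, 2 or 3) the position falls on."""
    if coding_pos < 1:
        raise ValueError("coding position must be 1 or greater")
    return (coding_pos - 1) % 3 + 1


# Coding positions taken from the variant summaries in this section.
for gene, pos in (("SCN9A", 554), ("TP53", 374), ("ADRB2", 79), ("HSD17B4", 317)):
    print(f"{gene}: position {pos} -> codon {codon_number(pos)}, "
          f"base {position_in_codon(pos)} of the codon")
```

For SCN9A, position 554 falls in codon 185, which agrees with the R185H change described by Han et al., 2012.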


Figure 2. Structure of the predicted protein from SCN9A gene, from the raptorX software. SCN9A is a small subunit of the NaV1.7 sodium ion channel. a) The pink is the predicted protein structure and the yellow is the location of the missense mutation. b) A zoomed out version of the protein structure, demonstrating the long chains that raptorX was not able to place in the prediction, indicating that this predicted structure may not be accurate.

Mohit Thakur: rs786201057

Variant ID: rs786201057
Gene name: TP53, Tumor suppressor p53
Variant genotype: 374C>T (Phred score 34)
Variant frequency: See below for explanation*
Variant effect on protein: Missense variation
Change in the protein: T125M (Threonine to Methionine)
Pathogenicity: Possibly pathogenic

*No exact frequency data were available in either dbSNP or ExAC. However, the frequency is very low according to Bode and Dong 2004, since the variant was found in very low numbers in a study of 6500 individuals of European or African American descent. Given its high Phred score and the information that the region is usually very highly conserved, we felt justified in including it in our analysis, since our individual had very few pathogenic variants to begin with. We can assume the frequency of C is fairly large in both global and East Asian populations (the variant frequency being small).

TP53 (a tumor suppressor gene) creates the protein p53, which arrests cell growth and induces apoptosis in a variety of different cell types. It activates oxidative-stress induced apoptosis in order to cause necrosis in cells which are growing out of control. It does this by binding directly to DNA (it is located within the nucleus of cells) and controlling whether the DNA is repaired if damaged (p53 NIH, 2017).

Other, relatively more common (but still rare) mutations in p53 have been associated with a wide variety of cancers, most commonly breast/colon cancers, lymphomas/leukemias, lung/esophageal cancers, and hepatocellular cancers (Hollstein et al. 1991). The mutation in our subject is rarer still. The threonine 125 to methionine mutation was not a common variation in a study of approximately 6,500 individuals of European and African American ancestry in the NHLBI Exome Sequencing Project, which highlights its low frequency. Furthermore, the location of the mutation is highly conserved and lies in a DNA binding domain (Bode and Dong 2004). The variant disrupted protein function in yeast assays (Petitjean et al. 2007), which suggests that the change might not be fully benign. It was found in women with breast cancer (Kleibl and Kristensen 2016) and adrenocortical carcinoma (Bougeard et al. 2015). The amino acid change is from a polar residue (threonine) to a hydrophobic residue (methionine) (Figure 3). The structure of the protein is shown below (Figure 4).

Figure 3. On the left is Threonine, on the right is Methionine. M is hydrophobic and T is polar.


Figure 4. Model of the p53 protein; the Threonine to Methionine amino acid change at position 125 is colored blue.
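Several of the variants in this section are discussed in terms of how the side-chain character changes (for example polar to hydrophobic for T125M). The sketch below looks up a coarse side-chain class for each residue of a substitution written as, for example, "T125M". The grouping is one common textbook scheme and is an assumption here; it covers only the residues mentioned in this section, and coarse classes hide finer differences (histidine, for instance, is grouped as basic although it is only partly charged at physiological pH).

```python
from typing import Tuple

# Coarse side-chain classes (one common textbook grouping; an assumption
# of this sketch). Only residues that appear in this section are listed.
SIDE_CHAIN_CLASS = {
    "R": "basic",
    "H": "basic",
    "E": "acidic",
    "T": "polar uncharged",
    "Q": "polar uncharged",
    "M": "nonpolar",
    "P": "nonpolar",
}


def describe_substitution(change: str) -> Tuple[int, str, str, bool]:
    """Parse a change such as 'T125M' into (position, ref class, alt class, class changed)."""
    ref, alt = change[0], change[-1]
    position = int(change[1:-1])
    ref_class = SIDE_CHAIN_CLASS[ref]
    alt_class = SIDE_CHAIN_CLASS[alt]
    return position, ref_class, alt_class, ref_class != alt_class


# Protein changes taken from the variant summaries in this section.
for change in ("R185H", "T125M", "Q27E", "R106P"):
    position, ref_class, alt_class, changed = describe_substitution(change)
    note = "class changes" if changed else "same coarse class"
    print(f"{change}: residue {position}, {ref_class} -> {alt_class} ({note})")
```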

The evidence supporting the pathogenicity of this variant was of medium to low quality. There were two assertions, one from Ambry Genetics on Jan 6, 2017 and one from the University of Washington on Nov 20, 2015, that evaluated its clinical significance as likely pathogenic. However, few further details were provided for either assertion. Because of this, but also considering the seriousness of the cancers that TP53 is known to affect, the advice I would give to this woman would be to make sure she and her children are screened appropriately for cancers.

Victoria Caban Figueroa: rs1042714

Variant ID: rs1042714
Gene name: ADRB2 (adrenoceptor beta 2); gene encodes a major lipolytic receptor in human fat cells
Variant genotype: 79 C > G (Phred score 5.76)
Variant frequency: Whole population: 0.8; East Asian population: 0.93
Variant effect on protein: missense
Change in the protein: Q27E (Glutamine to Glutamic Acid)
Pathogenicity: Risk Factor

The level of review supporting the assertion of clinical significance for this variant in ClinVar is 0, but various studies have used this variant to support the genetic basis of obesity.

The beta-2 adrenoceptor (BAR-2) is a major lipolytic receptor in human fat cells. Polymorphisms in codons 16, 27 and 164 of this receptor affect the function of subcutaneous adipocyte BAR-2 (Large et al. 1997). The mobilization of lipids through lipolysis, the breakdown of lipids, in fat cells is key to maintaining a healthy system and energy balance (Lafontan and Berlan 1993). This individual has a single nucleotide variant that changes an amino acid from glutamine to glutamic acid. This change affects the mobilization of lipids through lipolysis in human adipose tissue in vivo (Enocksson et al. 1995), and it eventually becomes a major risk factor for obesity, as observed in previous studies in women (Large et al. 1997). It is well known from animal models that obesity has a strong genetic influence. Though the same can be said about human obesity, it is also well established that environmental factors play a big role. A person with this variant could take precautions in lifestyle choices so that environmental factors do not add to the potential risk for obesity.


Figure 5: Structure of the nanobody-stabilized active state of the beta2 adrenoceptor. The green segment marks the mutation that occurs at chr5:148826910. The yellow fragment is the Camelid Antibody Fragment and the pink fragment is the Beta-2 Adrenergic Receptor. (This image was created using MuPIT: http://mupit.icm.jhu.edu/MuPIT_Interactive/)

Qinyu Yue: rs25640

Variant ID: rs25640
Gene name: HSD17B4 (Hydroxysteroid 17-beta dehydrogenase 4)
Variant genotype: 317G>C (Phred score 34)
Variant frequency: Whole population (ExAC): 0.44; East Asian: 0.5166
Variant effect on protein: missense
Change in the protein: R106P (Arginine to Proline)
Pathogenicity: Pathogenic

This variant in the gene HSD17B4, rs25640, changes a single nucleotide from G to C at chr5:119475838 (GRCh38), which changes the amino acid from arginine to proline. This may alter the function of the protein, leading to bifunctional peroxisomal enzyme deficiency.

Protein function and structure:

The HSD17B4 gene encodes an enzyme involved in peroxisomal fatty acid beta-oxidation. It was first identified as a 17-beta-estradiol dehydrogenase (Leenders et al., 1996; van Grunsven et al., 1998). The enzyme, also known as multifunctional protein-2 (MFP2) or D-bifunctional protein, catalyzes the second (hydration) and third (dehydrogenation) reactions of the peroxisomal beta-oxidation of fatty acids and fatty acid derivatives (Ferdinandusse et al., 2006). HSD17B4 is expressed in many tissues as an approximately 3.0-kb mRNA transcript, with the highest expression in liver, heart, prostate, and testis (Adamski et al., 1995). Here, I used an online tool, MuPIT Interactive, to locate the amino acid variant in this protein (Figure 6) (Niknafs et al., 2013).


Figure 6. The structure of hydroxysteroid 17-beta dehydrogenase 4. The green part is the position of variant, which changed from Arginine to Proline (R>P).

This variant has been reported in a male patient with D-bifunctional protein deficiency (Nakano et al., 2001). The patient had a missense mutation, R106P, and a 52bp deletion in the gene for the peroxisomal beta-oxidation enzyme D-3-hydroxyacyl-CoA dehydratase/D-3-hydroxyacyl-CoA dehydrogenase (D-bifunctional protein). He had fetal abnormalities including chylous ascites, polyhydramnios, claw hands, and hammer toes, and died at 7 months of age. Because the patient also carried a 52bp deletion in HSD17B4, we cannot conclude that the missense mutation 317G>C is directly responsible for the loss of function of D-bifunctional protein, but it may still be a pathogenic mutation.


Although the disease may not develop in the Dai woman herself, her offspring should consider prenatal genetic testing, because the mortality rate associated with this mutation appears to be higher in infancy.


Adamski, Jerzy, et al. “Molecular cloning of a novel widely expressed human 80 kDa 17β-hydroxysteroid dehydrogenase IV.” Biochemical journal 311.2 (1995): 437-443.

Bode, Ann M., and Zigang Dong. “Post-translational modification of p53 in tumorigenesis.” Nature Reviews Cancer 4.10 (2004): 793-805.

Bougeard, Gaëlle, et al. “Revisiting Li-Fraumeni syndrome from TP53 mutation carriers.” Journal of Clinical Oncology 33.21 (2015): 2345-2352.

Enocksson, Staffan, et al. “Demonstration of an in vivo functional beta 3-adrenoceptor in man.” Journal of Clinical Investigation 95.5 (1995): 2239.

Ferdinandusse, Sacha, et al. “Mutational spectrum of D-bifunctional protein deficiency and structure-based genotype-phenotype analysis.” The American Journal of Human Genetics 78.1 (2006): 112-124.

Han, Chongyang, et al. “Functional profiles of SCN9A variants in dorsal root ganglion neurons and superior cervical ganglion neurons correlate with autonomic symptoms in small fibre neuropathy.” Brain 135.9 (2012): 2613-2628.

Hollstein, Monica, et al. “p53 mutations in human cancers.” Science 253.5015 (1991): 49-53.

Kleibl, Zdenek, and Vessela N. Kristensen. “Women at high risk of breast cancer: Molecular characteristics, clinical presentation and management.” The Breast 28 (2016): 136-144.

Lafontan, M., and M. Berlan. “Fat cell adrenergic receptors and the control of white and brown fat cell function.” Journal of lipid research 34.7 (1993): 1057-1091.

Large, V et al. “Human Beta-2 Adrenoceptor Gene Polymorphisms Are Highly Frequent in Obesity and Associate with Altered Adipocyte Beta-2 Adrenoceptor Function.” Journal of Clinical Investigation 100.12 (1997): 3005–3013. Print.

Leenders, Frauke, et al. “Porcine 80-kDa protein reveals intrinsic 17-hydroxysteroid dehydrogenase, fatty acyl-CoA-hydratase/dehydrogenase, and sterol transfer activities.” Journal of Biological Chemistry 271.10 (1996): 5438-5442.

Meglič, Anamarija, et al. “Painful micturition in a small child: an unusual clinical picture of paroxysmal extreme pain disorder.” Pediatric nephrology 29.9 (2014): 1643-1646.

Niknafs N, Kim D, Kim R, Diekhans M, Ryan M, Stenson PD, Cooper DN, Karchin R. Hum Genet. 2013 Nov;132(11):1235-43. doi: 10.1007/s00439-013-1325-0. Epub 2013 Jun 23. PMID: 23793516.

Nakano, Kazutoshi, et al. “D-bifunctional protein deficiency with fetal ascites, polyhydramnios, and contractures of hands and toes.” The Journal of pediatrics 139.6 (2001): 865-867.

Petitjean, Audrey, et al. “Impact of mutant p53 functional properties on TP53 mutation patterns and tumor phenotype: lessons from recent developments in the IARC TP53 database.” Human mutation 28.6 (2007): 622-629.

Van Grunsven, Elisabeth G., et al. “Peroxisomal D-hydroxyacyl-CoA dehydrogenase deficiency: resolution of the enzyme defect and its molecular basis in bifunctional protein deficiency.” Proceedings of the National Academy of Sciences 95.5 (1998): 2128-2133.


Exome Variant Analysis on British Female


Group 7: Casey Smith, Yihao Ou, Yihan Lu, Yuntian He, Rong Jin


In this project, we collected data from the 1000 Genomes Project and applied a series of analyses to identify likely pathogenic variants in the chosen subject.

The 1000 Genomes Project is a global project aiming to find genetic variants with frequencies of at least 1% in the populations studied. This project was run between 2008 and 2015, and created the largest public catalog of human variation and genotype data. The final data set contains data for 2504 individuals from 26 different populations. Low coverage and exome sequence data are present for all individuals, and 24 individuals were also sequenced to high coverage for validation purposes.

Details about the subject we chose to study are shown in the following table:

Identifiers: BioSample: SAMN00004632; SRA: SRS006847; Coriell: HG00106; 1000G: HG00106
Population: GBR
Super Population Description: European
Population Description: British From England and Scotland, UK
Sex: Female
Family role: Unrelated
Coriell panel: MGP00003
DNA-ID: HG00106


This is the workflow we followed to identify likely pathogenic variants in our chosen individual:

The criteria we adopted to filter out unwanted variants are as follows:
  CADD-Phred score >= 20 (this corresponds to the 1% most deleterious variants)
  SIFT score < 0.05 (variants with scores in this range are considered deleterious; variants with scores closer to 0.0 are more confidently predicted to be deleterious)
  ExAC frequency < 10^-5 (we only wanted to consider low-frequency variants)
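
The three thresholds above can be sketched as a small filter. This is only an illustration under stated assumptions, not the exact wANNOVAR workflow we ran: the Variant class, the field names and the EXAMPLE_COMMON entry are hypothetical, while the XYLT1 values are the ones reported in Table 1. The helper cadd_phred_to_top_fraction shows why a CADD-Phred score of 20 corresponds to the top 1% (PHRED-style scaling, 10^(-score/10)).

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical container; field names are illustrative only.
    gene: str
    cadd_phred: float
    sift_score: float
    exac_freq: float


def cadd_phred_to_top_fraction(phred: float) -> float:
    """PHRED-style scaling: a score of 20 maps to the top 10^-2 (1%)."""
    return 10 ** (-phred / 10)


def passes_filters(v: Variant) -> bool:
    """Apply the three thresholds listed in the text."""
    return v.cadd_phred >= 20 and v.sift_score < 0.05 and v.exac_freq < 1e-5


variants = [
    # Values reported for XYLT1 in Table 1 of this report.
    Variant("XYLT1", cadd_phred=31, sift_score=3.0e-3, exac_freq=8.25e-6),
    # Hypothetical common variant, included only for contrast.
    Variant("EXAMPLE_COMMON", cadd_phred=12.4, sift_score=0.40, exac_freq=0.12),
]

kept = [v.gene for v in variants if passes_filters(v)]
print(kept)
print(cadd_phred_to_top_fraction(20))
```

Keeping each threshold as a separate condition makes it easy to relax one of them (for example the frequency cutoff) without touching the others.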

After filtering out all the unwanted data, five genes were chosen according to our group members’ interests.

Casey Smith: XYLT1
Protein Accession: Q86Y38
Variation type: Non-synonymous Single Nucleotide variation.

XYLT1 gene
Location | ExAC_Freq | SIFT_score | CADD_phred | Nucleotide Change
chr16:17198335 | 8.25 x 10^-6 | 3.0 x 10^-3 | 31 | G → T

Table 1: XYLT1 variation information

XYLT1 (Xylosyltransferase 1) is a protein-coding gene that is associated with Desbuquois Dysplasia 2 and Pseudoxanthoma Elasticum. The encoded protein catalyzes the transfer of xylose from UDP-xylose to serine residues of an acceptor protein substrate, and is a key enzyme in the biosynthesis of glycosaminoglycan chains.

In the case of the heritable disorder Pseudoxanthoma Elasticum (PXE), mutations in the gene ABCC6 lead to accumulation of deposits of calcium and other minerals in elastic fibers of the skin, eyes, blood vessels, and less frequently, the digestive tract. The missense mutation ala115 to ser in the gene XYLT1 is associated with higher serum xylosyltransferase activity, suggesting that XYLT1 variants can increase the severity of PXE. In the same study, all PXE patients with the variation suffered from skin lesions, compared to only 75% of the wildtype patients (Schon et al. 2006).

Desbuquois Dysplasia 2 (DBQD) is a heritable disorder related to the development of bones and cartilage, characterized by short extremities, severe joint laxity with dislocation, osteopenia, kyphoscoliosis, distinctive facial characteristics and other abnormalities. A missense mutation c.C1441T in XYLT1, which encodes XT1, within a large stretch on chromosome 16p13.12-p12.1 is not found in over 13,000 alleles in the Exome Variant Server and is predicted to change a highly conserved arginine at position 481. Immunostaining of fibroblast samples and western blot analysis of the protein decorin show a loss of predominance of Golgi localization in mutant cells and glycosylation differences between mutant and control cells, providing evidence that functional alterations of XT1 cause an autosomal recessive short stature syndrome associated with intellectual disability (Schreml et al. 2014).

When compared to the ALL Variants dataset of the 1000 Genomes Project, the allelic frequency of this single nucleotide variation is 0.0002; and when compared to the EUR Variants dataset its allelic frequency is 0.001. The variant was not seen in the AFR (African), AMR (Admixed American), EAS (East Asian), and SAS (South Asian) datasets, meaning that the variant is quite rare. The whole-exome allelic frequency of this variant is 8.25 x 10^-6 and no ESP6500 allele frequency is reported, meaning that it has not been observed in the 6500 exomes sequenced.
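
As a rough worked example, the reported frequencies can be converted back into approximate allele counts. The sketch assumes diploid samples; the figure of 2504 individuals is the one given earlier in this report, whereas the EUR panel size of 503 is an assumption that is not stated in the report. Because the reported frequencies are rounded, the result is only approximate.

```python
def implied_allele_count(frequency: float, n_individuals: int) -> int:
    """Approximate number of allele copies behind a rounded frequency,
    assuming two alleles per individual."""
    return round(frequency * 2 * n_individuals)


ALL_INDIVIDUALS = 2504  # stated earlier in this report
EUR_INDIVIDUALS = 503   # assumption: size of the EUR panel, not given in the report

print(implied_allele_count(0.0002, ALL_INDIVIDUALS))  # ALL frequency
print(implied_allele_count(0.001, EUR_INDIVIDUALS))   # EUR frequency
```

Under these assumptions both frequencies correspond to about one observed copy, which would be consistent with the allele being seen in very few samples, possibly only in the subject herself.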

Figure 1: The missense mutation occurs at the green section of the image (G → T).

Yihan Lu:  DRAM2
Protein Accession: Q6UX65
Variation type: Non-synonymous Single Nucleotide variation.

DRAM2 gene
Location | ExAC_Freq | SIFT_score | CADD_phred | Nucleotide Change
chr1:111119896 | 4.148 x 10^-5 | 0.008 | 25.9 | T → C

Table 2: DRAM2 variation information

The DRAM2 gene encodes a 266-amino-acid protein called DNA damage regulated autophagy modulator 2 (alternative name: transmembrane protein 77). The protein encoded by this gene shares 45% identity with the DRAM1 protein and is able to bind microtubule-associated protein 1. This binding ability facilitates autophagy.

In previous research, O’Prey et al. (2009) indicated that DRAM2 is mostly expressed in placenta and heart, with almost no expression in brain and thymus. Park et al. (2009) showed that the protein has 6 transmembrane domains and colocalizes with DRAM1 in the lysosome. In the same study, they also indicated that DRAM2 mRNA and protein are downregulated in ovarian tumor cells. On the other hand, overexpression of both DRAM1 and DRAM2 in HEK293 cells increased apoptotic cell death, whereas overexpression of only one of them did not affect the cells. When DRAM2 was silenced in a human colon carcinoma cell line, the cells were unaffected by p53, which implies that DRAM2 plays an important role in p53-mediated apoptosis.

Deficiency in this gene is reported to cause retinal disorders. El-Asrag et al. (2015) indicated that in many cases, deletion or substitution of a single nucleotide base pair (most frequently in exon 3 or 6) could cause cone-rod dystrophy 21. Two patients in this study had a c.G131A transition in exon 3, which results in an S44N substitution in the peptide chain. This mutation causes adult-onset retinal and macular dystrophy. In another patient, who suffered from central vision loss, a c.A362T transversion was found in exon 6.

In the data we analyzed, the single nucleotide variant is annotated against several transcripts. The results are listed below:

Transcript ID | Exon | Nucleotide Change | AA Change
NM_001349889 | 6 | A191G | H64R
NM_001349885 | 7 | A581G | H194R
NM_001349887 | 7 | A311G | H104R
NM_001349890 | 7 | A191G | H64R
NM_001349892 | 7 | A191G | H64R
NM_001349893 | 7 | A191G | H64R
NM_178454 | 7 | A581G | H194R
NM_001349881 | 8 | A581G | H194R
NM_001349882 | 8 | A581G | H194R
NM_001349884 | 8 | A581G | H194R
NM_001349886 | 8 | A311G | H104R
NM_001349888 | 8 | A311G | H104R
NM_001349891 | 9 | A191G | H64R

Table 3: Several single nucleotide variants
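
The relationship between the nucleotide and amino acid positions in Table 3 can be checked with simple arithmetic: coding position p falls in codon (p + 2) // 3. The sketch below is illustrative only; the function names are hypothetical and the three change pairs are copied from Table 3. The last line compares the genomic call (T → C) with the cDNA call (A → G); the two are complementary, which would be consistent with a transcript on the reverse strand (an inference, not something stated in the annotation output).

```python
import re

# Complementary bases, used to compare a genomic-strand call with a cDNA call.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def codon_number(cdna_pos: int) -> int:
    """Residue number (1-based) containing a 1-based coding position."""
    return (cdna_pos + 2) // 3


def position_in_codon(cdna_pos: int) -> int:
    """1, 2 or 3: where the base sits inside its codon."""
    return (cdna_pos - 1) % 3 + 1


def parse_change(text: str) -> tuple[str, int, str]:
    """Split strings such as 'A191G' or 'H64R' into (ref, position, alt)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", text)
    if match is None:
        raise ValueError(f"unrecognised change: {text!r}")
    return match.group(1), int(match.group(2)), match.group(3)


# The three distinct (nucleotide change, amino acid change) pairs in Table 3.
rows = [("A191G", "H64R"), ("A311G", "H104R"), ("A581G", "H194R")]

for nt_change, aa_change in rows:
    _, nt_pos, _ = parse_change(nt_change)
    _, aa_pos, _ = parse_change(aa_change)
    consistent = codon_number(nt_pos) == aa_pos
    print(nt_change, aa_change, consistent, position_in_codon(nt_pos))

# Genomic call T -> C expressed on the opposite strand.
print(COMPLEMENT["T"], "->", COMPLEMENT["C"])
```

In all three cases the residue number matches and the changed base is the second base of the codon, as expected for a histidine (CAT/CAC) to arginine (CGT/CGC) change.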

When compared to the ALL Variants dataset of the 1000 Genomes Project, the allelic frequency of this single nucleotide variation is 0.0002; and when compared to the EUR Variants dataset its allelic frequency is 0.001. The variant was not seen in the AFR (African), AMR (Admixed American), EAS (East Asian), and SAS (South Asian) datasets, meaning that the variant is quite rare. The whole-exome allelic frequency of this variant is 4.148 x 10^-5. The ESP6500 allele frequency is 0.0002, the SIFT score is 0.008 and the CADD_phred score is 25.9.


Figure 2: Homology model of DRAM2. The side chain labeled in green shows the variant amino acid.

Rong Jin: ITPA
Protein Accession: Q9BY32
Variation type: Non-synonymous Single Nucleotide variation.

ITPA gene
Location | ExAC_Freq | SIFT_score | CADD_phred | Nucleotide Change
chr20:3213364 | 0.0002 | 0 | 31 | G → A

Table 4: ITPA variant information

ITPA encodes an inosine triphosphate pyrophosphohydrolase (Q9BY32). The encoded protein hydrolyzes inosine triphosphate and deoxyinosine triphosphate to the monophosphate nucleotide and diphosphate. This protein, which is a member of the HAM1 NTPase protein family, is found in the cytoplasm and acts as a homodimer. Defects in the encoded protein can result in inosine triphosphate pyrophosphorylase deficiency, which causes an accumulation of ITP in red blood cells. Alternate splicing results in multiple transcript variants.

The associated disease takes the form of epileptic encephalopathy, a heterogeneous group of severe childhood-onset epilepsies characterized by refractory seizures, neurodevelopmental impairment, and poor prognosis. In this group of disorders, development is typically normal prior to seizure onset, after which cognitive and motor delays become apparent. The ITPA-related form, early infantile epileptic encephalopathy 35 (EIEE35), is characterized by onset of seizures in the first months of life associated with essentially no normal development. Many patients die in early childhood.


Kevelam et al. (2015) reported 7 patients from 4 unrelated families with severe early-onset epileptic encephalopathy associated with a distinctive pattern of MRI abnormalities on brain imaging. Two of the families were consanguineous. Between 2 and 4 months of age, the patients showed T2 signal abnormalities and diffusion restriction in the posterior limb of the internal capsule. The optic radiation, brainstem tract, and cerebellar white matter regions were often affected. Imaging also showed delayed myelination and progressive brain atrophy. The patients presented shortly after birth with microcephaly, seizures, and failure to achieve developmental milestones; there was virtually no cognitive or motor development after disease onset, and all showed hypotonia with poor feeding. Additional features included cardiomyopathy (1 patient), electrocardiographic abnormalities (3 patients), and cataracts (3 sibs born of consanguineous parents). Six patients died between 10 months and 2.5 years; 1 was alive but severely disabled at age 3 years.

When compared to the ALL Variants dataset of the 1000 Genomes Project, the allelic frequency of this single nucleotide variation is 0.0002; and when compared to the EUR Variants dataset its allelic frequency is 0.001. The variant was not seen in the AFR (African), AMR (Admixed American), EAS (East Asian), and SAS (South Asian) datasets, meaning that the variant is quite rare. The whole-exome allelic frequency of this variant is 8.24 x 10^-5, and the ESP6500 allele frequency is 0.0001.


Figure 3: The SNV occurs at the green section of the image (G → A). (from MuPIT_Interactive)

Yihao Ou: CYP4V2
Protein Accession: Q6ZWL3
Variation type: Non-synonymous Single Nucleotide variation.

CYP4V2 gene
Location | ExAC_Freq | SIFT_score | CADD_phred | Nucleotide Change
chr4 (start: 186000000) | 7.41 x 10^-5 | 1.3 x 10^-2 | 31 | G → A

Table 5: CYP4V2 variant information

The CYP4V2 gene (cytochrome P450, family 4, subfamily V, polypeptide 2) encodes a member of the cytochrome P450 heme-thiolate protein superfamily, whose members oxidize various substrates in metabolic pathways. It is implicated in the metabolism of fatty acid precursors into n-3 polyunsaturated fatty acids. The coding sequence begins in exon 1 and continues through exon 11 (Li et al., 2004). CYP4V2 is widely expressed in a variety of tissues: heart, brain, placenta, lung, liver, skeletal muscle, kidney, pancreas, retina, retinal pigment epithelium, and lymphocytes. The highest expression was found in the retina. The predicted transmembrane segment of CYP4V2 resides near the N terminus, followed by a globular structural domain typical of the CYP450 family. The globular domain of CYP4V2 comprises 18 helices and beta structural segments. The heme group is located close to the surface of the protein, coordinated by the I helix toward the protein interior and the L helix superficially (Li et al., 2004).

This gene is strongly linked to Bietti crystalline corneoretinal dystrophy (BCD), an autosomal recessive retinal dystrophy characterized by multiple glistening intraretinal crystals scattered over the fundus, a characteristic degeneration of the retina, and sclerosis of the choroidal vessels, ultimately resulting in progressive night blindness and constriction of the visual field. In 23 of 25 unrelated patients with BCD, Li et al. (2004) identified 13 mutations in the CYP4V2 gene. As CYP4V2 is homologous to other members of CYP450 family 4, Li et al. (2004) suggested that it might have a role in fatty acid and steroid metabolism, which would be consistent with biochemical studies of patients with BCD (Lee et al., 2001). Patients with BCD are found across the world, but the disease is more common in East Asia, particularly in Chinese and Japanese populations (Hu 1982).

The mutation found in our subject is a single nucleotide substitution in which G is replaced by A at position 1339, in exon 10 of the gene on chromosome 4, leading to an amino acid change from E to K. When compared to the ALL Variants dataset of the 1000 Genomes Project, the allelic frequency of this single nucleotide variation is 0.0004; and when compared to the EUR Variants dataset its allelic frequency is 0.001. The whole-exome allelic frequency of this variant is 7.41 x 10^-5 and the ESP6500 allele frequency is 0.0001.


Figure 4: 

Yuntian He: HPS1
Protein Accession: Q92902
Variation type: Non-synonymous Single Nucleotide variation.

HPS1 gene
Location | ExAC_Freq | SIFT_score | CADD_phred | Nucleotide Change
chr10 (start: 98418200) | 0.0005 | 0 | 33 | C → T

Table 6: HPS1 variant information

Hermansky-Pudlak syndrome (HPS) is characterized by oculocutaneous albinism, a bleeding diathesis, and, in some individuals, pulmonary fibrosis, granulomatous colitis, or immunodeficiency (Sánchez-Guiu et al. 2014).

Martina et al. (2003) found that HPS1 and HPS4 (OMIM 606682) form a lysosomal complex that they termed BLOC3 (biogenesis of lysosome-related organelles complex-3). Coimmunoprecipitation experiments demonstrated that epitope-tagged and endogenous HPS1 and HPS4 proteins assembled with each other in vivo. The complex was predominantly cytosolic, with a small amount peripherally associated with membranes. Size exclusion chromatography and sedimentation velocity analysis of the cytosolic fraction indicated that HPS1 and HPS4 formed a moderately asymmetric complex with a molecular mass of about 175 kD. HPS1 and HPS4 are therefore components of a cytosolic complex that is involved in the biogenesis of lysosome-related organelles through a mechanism distinct from that operated by the AP3 complex (Martina et al. 2003).

Homozygous frameshifts in the HPS1 gene have been identified in Puerto Rican, Swiss, Irish, and Japanese patients with Hermansky-Pudlak syndrome (HPS1; OMIM 203300) (Oh et al. 1996). Oh et al. (1998) performed mutation analysis on 44 unrelated Puerto Rican and 24 unrelated non-Puerto Rican HPS patients. A 16-bp frameshift duplication (OMIM 604982.0001), the result of an apparent founder effect, was nearly ubiquitous among Puerto Rican patients. A frameshift at codon 322 may be the most frequent HPS mutation in Europeans. The mutation in these cases was a 1-bp insertion (or duplication) in a poly(C) tract at codons 322 to 324.

According to the wANNOVAR results, one of the HPS1 mutations is located at chr10 98418200, where T replaces C, changing Glu to Ser at site 639. According to the ClinVar database, this mutation is of uncertain significance, with a SIFT score of 0 and a CADD_phred score of 33. Compared to the ALL Variants dataset of the 1000 Genomes Project, the allelic frequency of this single nucleotide variation is 0.0004; compared to the EUR Variants dataset the frequency is 0.001. The whole-exome allelic frequency of this variant is 0.005.


Figure 5: HPS1 protein with variant amino acid in green.


In this project, a total of 20657 variants were analyzed. Five of these variants were selected for further detailed study. DRAM2 and CYP4V2 are related to the retina, and mutations in these genes are associated with cone-rod dystrophy 21 and Bietti crystalline corneoretinal dystrophy, respectively. Mutations in HPS1 cause Hermansky-Pudlak syndrome. XYLT1 and ITPA are multifunctional genes, and mutations in these two genes can cause more than one disease. XYLT1 mutations are related to both Desbuquois Dysplasia 2 and Pseudoxanthoma Elasticum, while ITPA mutations can cause cardiomyopathy and cataracts. Mutations in these genes have been reported to cause severe human genetic disorders through changes in the encoded amino acids. The 1000 Genomes frequencies also show that the variants listed above were observed mainly in European samples and were rare or absent in the other populations.


1000 Genome Project Consortium, Auton, et al. “A global reference for human genetic variation”. Nature (2015); 526(7571):68-74.



El-Asrag, E. E., Sergouniotis, P. I., McKibbin, M., Plagnol, V., Sheridan, E., Waseem, N., Abdelhamed, Z., McKeefry, D., Van Schil, K., Poulter, J. A., UK Inherited Retinal Disease Consortium, Johnson, C. A., Carr, I. M., Leroy, B. P., De Baere, E., Inglehearn, C. F., Webster, A. R., Toomes, C., Ali, M. Biallelic mutations in the autophagy regulator DRAM2 cause retinal dystrophy with early macular involvement. Am. J. Hum. Genet. 96: 948-954, 2015.


Horikawa, T., Araki, K., Fukai, K., Ueda, M., Ueda, T., Ito, S., Ichihashi, M. Heterozygous HPS1 mutations in a case of Hermansky-Pudlak syndrome with giant melanosomes. Brit. J. Derm. 143: 635-640, 2000.

Hu, D. N. Genetic aspects of retinitis pigmentosa in China. Am. J. Med. Genet. 12: 51-56, 1982.

Lee, J., Jiao, X., Hejtmancik, J. F., Kaiser-Kupfer, M., Gahl, W. A., Markello, T. C., Guo, J., Chader, G. J. The metabolism of fatty acids in human Bietti crystalline dystrophy. Invest. Ophtal. Vis. Sci. 42: 1707-1714, 2001.

Li, A., Jiao, X., Munier, F. L., Schorderet, D. F., Yao et al. Bietti crystalline corneoretinal dystrophy is caused by mutations in the novel gene CYP4V2. Am. J. Hum. Genet. 74: 817-826, 2004.

Martina, J. A., Moriyama, K., Bonifacino, J. S. BLOC-3, a protein complex containing the Hermansky-Pudlak syndrome gene products HPS1 and HPS4. J. Biol. Chem. 278: 29376-29384, 2003.

Niknafs N, Kim D, Kim R, Diekhans M, Ryan M, Stenson PD, Cooper DN, Karchin R. (2013). MuPIT interactive: webserver for mapping variant positions to annotated, interactive 3D structures. Hum Genet. 132(11):1235-43.

Oh, J., Bailin, T., Fukai, K., Feng, G. H., Ho, L., Mao, J., Frenk, E., Tamura, N., Spritz, R. A. Positional cloning of a gene for Hermansky-Pudlak syndrome, a disorder of cytoplasmic organelles. Nature Genet. 14: 300-306, 1996.

O’Neill, M. J., & McKusick, V. A. (2004, April 27). CYTOCHROME P450, FAMILY 4, SUBFAMILY V, POLYPEPTIDE 2; CYP4V2. Retrieved December 8, 2017, from http://omim.org/entry/608614#4

O’Prey, J., Skommer, J., Wilkinson, S., Ryan, K. M. Analysis of DRAM-related proteins reveals evolutionarily conserved and divergent roles in the control of autophagy. Cell Cycle 8: 2260-2265, 2009.

Park, S.-M., Kim, K., Lee, E.-J., Kim, B.-K., Lee, T. J., Seo, T., Jang, I.-S., Lee, S.-H., Kim, S., Lee, J.-H., Park, J. Reduced expression of DRAM2/TMEM77 in tumor cells interferes with cell death. Biochem. Biophys. Res. Commun. 390: 1340-1344, 2009.

Sánchez-Guiu, I., Torregrosa, J. M., Velasco, F., Antón, A. I., Lozano, M. L., Vicente, V., et al. (2014). Hermansky-Pudlak syndrome: overview of clinical and molecular features and case report of a new HPS-1 variant. Hämostaseologie, 34(4), 301-9.

Schon et al. (2006). Polymorphisms in the xylosyltransferase genes cause higher serum XT‐I activity in patients with pseudoxanthoma elasticum (PXE) and are involved in a severe disease course. J Med Genet. 43(9): 745-749.

Schreml et al. (2014). The missing “link”: an autosomal recessive short stature syndrome caused by a hypofunctional XYLT1 mutation. Hum Genet. 133(1):29-39.

Kevelam, S. H., Bierau, J., et al. Recessive ITPA mutations cause an early infantile encephalopathy. Annals of Neurology 78(4): 649-658, 2015.

Yang H, Wang K. Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR. Nature Protocols, 10:1556-1566, 201

Exome Variants in Person of European Descent

Undergraduate Group 1: Allie Caughman, Kyle Hamilton, Robert Leon, Shin Park, Jessica Rosenfield


Variant Analysis: 

Exome: 34481510240802_annotated.vcf from the PGP Project of individual of European descent.

Analytical Pipeline:

Using the VCF file provided, we ran the data through wANNOVAR using reference genome hg19, RefSeq genes and default parameters. Variants were then filtered to create a table containing only clinically significant pathogenic variants according to ClinVar Significance. Each group member then researched one variant. RaptorX was used to model the protein encoded by the gene for each chosen variant.
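
The ClinVar filtering step can be sketched as follows. This is a minimal illustration, not the exact procedure we used: the comma-separated layout and the helper names are assumptions, the column names follow our summary table, the chr1 and chrX rows are copied from that table, and the chr2 row is a hypothetical non-pathogenic entry added for contrast. A "." in a frequency column is treated as a missing value.

```python
import csv
import io

# Hypothetical CSV excerpt; column names follow the summary table in this report.
SAMPLE = """Chr,Ref,Alt,Exonic Function,1000G_ALL,ExAC_Freq,ClinVar_SIG
chr1,G,A,stopgain,0.0034,0.0087,Pathogenic
chr2,C,T,nonsynonymous SNV,0.21,0.19,Benign
chrX,C,G,nonsynonymous SNV,.,0.0017,Pathogenic
"""


def parse_freq(text):
    """Convert a frequency field to float; '.' or empty means not reported."""
    return None if text in (".", "") else float(text)


def pathogenic_rows(handle):
    """Keep only rows whose ClinVar significance is exactly 'Pathogenic'."""
    kept = []
    for row in csv.DictReader(handle):
        if row["ClinVar_SIG"].strip().lower() == "pathogenic":
            row["1000G_ALL"] = parse_freq(row["1000G_ALL"])
            row["ExAC_Freq"] = parse_freq(row["ExAC_Freq"])
            kept.append(row)
    return kept


rows = pathogenic_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row["Chr"], row["Exonic Function"], row["1000G_ALL"], row["ExAC_Freq"])
```

An exact match on the significance label is used here so that labels such as "Likely pathogenic" or conflicting interpretations are not silently included.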

Summary table:

Chr | Ref | Alt | Exonic Function | 1000G_ALL | ExAC_Freq | ClinVar_SIG
chr1 | T | C | nonsynonymous SNV | 0.53 | 0.5887 | Pathogenic
chr3 | AA |  | frameshift deletion | . | 0.0001 | Pathogenic
chr4 | G | A | nonsynonymous SNV | 0.6 | 0.5365 | Pathogenic
chr5 | C | T | synonymous SNV | 0.3 | 0.1913 | Pathogenic
chr10 | A | C | nonsynonymous SNV | 0.14 | 0.2122 | Pathogenic
chr12 | T | C | nonsynonymous SNV | 0.88 | 0.8502 | Pathogenic
chr19 | C | T | synonymous SNV | 0.29 | 0.271 | Pathogenic
chr19 | C | T | synonymous SNV | 0.1 | 0.088 | Pathogenic
chrX | C | A | nonsynonymous SNV | 0.18 | 0.1434 | Pathogenic
chrX | A | T | nonsynonymous SNV | 0.15 | 0.1975 | Pathogenic
chr5 | T | C | nonsynonymous SNV | 0.98 | 0.9708 | Pathogenic
chr16 | C | T | nonsynonymous SNV | 1 | 0.9938 | Pathogenic
chr11 | G | T | synonymous SNV | 0.15 | 0.1611 | Pathogenic
chr17 | C | T | synonymous SNV | 0.2 | 0.2656 | Pathogenic
chr1 | G | A | nonsynonymous SNV | 0.74 | 0.7652 | Pathogenic
chr1 | T | C | nonsynonymous SNV | 0.89 | 0.9138 | Pathogenic
chr7 | A | T | nonsynonymous SNV | . | 0.4751 | Pathogenic
chr11 | C | T | nonsynonymous SNV | 0.024 | 0.0647 | Pathogenic
chr16 | C | T | nonsynonymous SNV | 0.18 | 0.2446 | Pathogenic
chrX | C | G | nonsynonymous SNV | . | 0.0017 | Pathogenic
chr1 | G | A | nonsynonymous SNV | 0.35 | 0.383 | Pathogenic
chr6 | C | G | nonsynonymous SNV | 0.073 | 0.1066 | Pathogenic
chr1 | T | C | nonsynonymous SNV | 0.99 | 0.9785 | Pathogenic
chr1 | G | A | stopgain | 0.0034 | 0.0087 | Pathogenic
chr22 | C | T | nonsynonymous SNV | 0.91 | 0.9171 | Pathogenic
chr17 | G | A | nonsynonymous SNV | 0.0008 | 0.0047 | Pathogenic
chr8 | A | G | nonsynonymous SNV | 0.053 | 0.0863 | Pathogenic

Variant 1: Allie – Gene Name: Immunity-related GTPase M (IRGM)

Variant Genotype:

  • Variant: c.313C>T
  • Amino Acid Change: Leu105 -> Leu

Variant Effect on Protein:

  • Synonymous SNV causes no amino acid change.
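
A short sketch shows how a C to T change at coding position 313 can leave leucine unchanged. It builds the standard genetic code and lists the leucine codons for which the substitution is synonymous. The function name is hypothetical, and the actual codon in IRGM is not given in our data, so the sketch only lists the candidates.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_matching(ref_aa, alt_aa, cdna_pos, ref_base, alt_base):
    """List (reference codon, mutated codon) pairs consistent with a reported change."""
    offset = (cdna_pos - 1) % 3  # 0-based position of the base inside its codon
    matches = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa or codon[offset] != ref_base:
            continue
        mutated = codon[:offset] + alt_base + codon[offset + 1:]
        if CODON_TABLE[mutated] == alt_aa:
            matches.append((codon, mutated))
    return matches


# c.313C>T, Leu105 -> Leu: position 313 is the first base of codon 105.
print(codons_matching("L", "L", 313, "C", "T"))
```

Leucine is one of the few amino acids with codons that differ at the first base (CTA/CTG versus TTA/TTG), which is why a first-position change can still be synonymous.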

Variant Effect on Protein Function:

  • Does not change protein function

Variant Effect on Gene Function:

  • Changes the ability of miRNA to bind to the mRNA at this location

Variant Frequency:

  • GMAF = 0.3037, ExAC = 0.19130, ExAC European = 0.1112

Variant Description:

Inflammatory Bowel Disease 19

This variant has been implicated in Crohn’s Disease. The SNP inhibits the binding of miRNA to the IRGM mRNA. This binding normally reduces expression of the gene, but with a T at this position the inhibition is not achieved, which leads to inflammation of the large intestine.

Evidence of pathogenicity: Evidence has been presented in the literature only. Brest et al. used microarray data to examine gene expression and saw that expression of the 313C variant was reduced in the presence of the miRNA, whereas the 313T variant continued to be expressed.

Advice to person with variant: There is no cure for IBD or Crohn’s disease, but there are treatment options available to help manage symptoms. Diagnosis can be made via colonoscopy, biopsy, and blood tests that check for inflammation of the large intestine, or for the anemia and increased white blood cell counts characteristic of these diseases.

Protein Model:

Screen Shot 2017-12-12 at 9.01.40 PM


Variant 2: Kyle – Gene Name: proline dehydrogenase 1 (PRODH)

Variant genotype:

  • DNA Change: 1562A>G, missense, nonsynonymous variant
  • Minor (30% reduction in POX activity): arg431-to-his
  • Medium (30-70%): arg431-to-his
  • Major (>70%): leu441-to-pro

Variant frequency:

  • GO-ESP 0.00461 (T)
  • GMAF 0.00320 (T)
  • ExAC 0.00419 (T)
  • ExAC 0.9171 (T) (Regional population)

Variant effect on protein:

  • Missense Variant

Variant Description:

Only one paper details this mutation and replicates it. The authors describe their expression studies as follows: “For all expression studies, we utilized a subclone (CHO-K1-C9) of CHO-K1 cells that lack endogenous POX activity (Valle et al. 1973). For alleles with severe reduction in POX activity, we repeated the assay with the addition of 1 mM FAD.” They also describe their genotyping: “For those PRODH alleles whose frequency was not available in the literature (A167V, D426N, Q521E, and Q521R), we genotyped 50 North American controls. Because these mutations are also present in ΨPRODH, we used a long-range PCR strategy to selectively amplify an ∼10-kb fragment present only in PRODH, using primers that are not complementary to any ΨPRODH sequence.”

Description of variant effect on gene:

The normal gene encodes proline oxidase (POX) which is a mitochondrial inner-membrane enzyme expressed in kidney, liver, and brain that catalyzes the conversion of proline to P5C. High POX activity may play a role in apoptosis. Defects in the proline catabolic pathway are associated with increased plasma proline levels (Hyperprolinemia type I).

Evidence of pathogenicity: Hyperprolinemia type I (HPI) is not well characterized, but it has been described as asymptomatic with some neurological issues, and at least two cases with schizophrenia have been reported (Humbertclaude et al. 2001; Jacquet et al. 2002, 2003).

Suggestions for person with variant: “HP-I is recognized by elevated blood proline levels. (The normal level is approximately 450 units, but people with HP-1 may have levels of 1900 to 2000 units.) Often, the diagnosis is made by exclusion. After failure to arrive at a diagnosis by other means, a blood proline level is ordered. The result confirms the diagnosis.”

“Because proline is so widespread among foods, attempts to control blood proline levels by restrictive dieting have not succeeded. Because the medical consequences of this particular inborn error of metabolism appear to be modest or inconsequential, many physicians do not take an aggressive approach toward treatment.”

“Hyperprolinemia Type I.” NORD (National Organization for Rare Disorders), rarediseases.org/rare-diseases/hyperprolinemia-type-i/.

Protein model:

Screen Shot 2017-12-12 at 9.12.42 PM

Variant 3: Robert – Gene Name: Aspartoacylase (ASPA)

Variant genotype:

  • C–>T, Synonymous

Variant frequency:

  • ~0.2% in overall population; mainly affects individuals of Ashkenazi Jewish descent; autosomal recessive inheritance.

Variant effect on protein:

  • The single-nucleotide variant causes a premature stop signal at the Tyr231 codon.
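
The two descriptions above (a synonymous C to T call and a premature stop at Tyr231) can be compared against the standard genetic code. The sketch below enumerates every single-base change of the two tyrosine codons and reports which are synonymous and which create a stop codon. It assumes the bases are given on the coding strand and uses hypothetical helper names; the actual codon used at Tyr231 is not given in our data.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(ref_codon, alt_codon):
    """Label a codon change as synonymous, stopgain or nonsynonymous."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stopgain"
    return "nonsynonymous"


def single_base_changes(codon):
    """Yield (ref base, alt base, mutated codon) for all nine substitutions."""
    for i, ref in enumerate(codon):
        for alt in BASES:
            if alt != ref:
                yield ref, alt, codon[:i] + alt + codon[i + 1:]


tyr_codons = [codon for codon, aa in CODON_TABLE.items() if aa == "Y"]
results = [
    (codon, ref, alt, mutated, classify(codon, mutated))
    for codon in tyr_codons
    for ref, alt, mutated in single_base_changes(codon)
]

for codon, ref, alt, mutated, effect in results:
    if effect != "nonsynonymous":
        print(f"{codon} -> {mutated} ({ref}>{alt}): {effect}")
```

Under these assumptions, a C to T change in a tyrosine codon is synonymous, while the stop-creating changes replace the third base with A or G, so the exact base change of the reported variant is worth confirming against the ClinVar record.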

Variant effect on gene/protein function:

The normal gene codes for an enzyme that catalyzes the conversion of N-acetyl-L-aspartic acid (NAA) to aspartate and acetate, which helps maintain white matter in the central nervous system.

Variant Description:

The pathogenic variant can be difficult to detect within DNA because it results in a synonymous base change for the protein. The variant can instead be recognized from its physical manifestations.

Canavan disease is an autosomal recessive disease in which white matter tissue, such as myelin, is degraded from early infancy to early childhood, causing hypotonia, intellectual disability, and eventually death at a young age. The mutation is a nonsense mutation in which a single nucleotide change at Tyr231 results in a premature stop codon and an overall lack of aspartoacylase enzyme in the blood to degrade N-acetylaspartic acid (NAA). Accumulated NAA impairs normal myelination, resulting in spongy tissue of the nervous system.

Description of variant effect on gene:

The variant leads to an enzyme that is truncated at the Tyr231 codon or completely absent, which ultimately leads to degradation of white matter in the central nervous system.

Evidence of pathogenicity: The pathogenic variant leads to degradation of white matter in the central nervous system, which can be determined by a brain biopsy. Large amounts of N-acetylaspartic acid (200x the normal amount) will also be present in urine and blood, because the enzyme aspartoacylase is not present to break down the acid.

Suggestions for person with variant: Gene therapy that delivers the aspartoacylase enzyme to the brain by neurosurgical insertion of a modified adeno-associated virus (Janson, C., Mcphee, S. 2004).

Protein model:

Screen Shot 2017-12-12 at 9.23.02 PM


Variant 4: Shin – Gene Name: Hemochromatosis (HFE)

Variant genotype:

  • C > G / Histidine > Aspartic acid (H63D)

Variant frequency:

  • GO-ESP 0.11072 (G)
  • GMAF 0.07310 (G)
  • ExAC 0.10660 (G)

Variant effect on protein:

  • Missense mutation

Variant effect on gene/protein function:

The HFE gene encodes a protein involved in the regulation of hepcidin, the principal iron-regulatory hormone. Hepcidin works by preventing intestinal cells and macrophages from releasing iron into the bloodstream after iron levels have reached a certain threshold within the blood. Conversely, when the body is lacking iron, hepcidin levels decrease to promote the release of iron into the bloodstream. The H63D mutation, together with C282Y, causes a conformational change in the protein such that overall hepcidin levels decrease throughout the body. This decrease results in the body absorbing more iron than is necessary, which can lead to severe hereditary hemochromatosis (characterized by liver failure, heart disease, and diabetes).

Variant Summary:

The H63D mutation of HFE results in a significant increase in serum transferrin saturation but does not result in a significant iron overload. The H63D mutation is not clinically significant in the absence of the C282Y mutation. The protein encoded by the HFE gene acts in a similar manner to MHC class I-type proteins by associating with beta2-microglobulin. The function of the encoded protein is to regulate blood iron levels by altering the interaction of transferrin with the transferrin receptor.

Evidence of pathogenicity: Multiple studies cite H63D as being a hemochromatosis associated allele with resulting effects of increased cellular uptake of iron. It is important to note, however, that few cases of H63D homozygotes actually manifest in clinical hemochromatosis unless otherwise paired with another mutation in the HFE gene.

Suggestions for person with variant – Treatment for patients with hemochromatosis involves removal of blood with phlebotomy to maintain normal blood iron levels or iron chelation therapy using specialized medicine for anemic patients.

Protein Model:

Screen Shot 2017-12-12 at 9.28.20 PM


Variant 5: Jessica – 4-hydroxyphenylpyruvate dioxygenase (HPPD)

Variant genotype:

  • 12q24.31 c.97G -> A

Variant frequency in the overall human population:

  • GMAF 0.12360 (T)
  • ExAC 0.14978 (T)

Variant effect on protein:

The variant results in a missense mutation, substituting alanine with threonine at codon 33. This is a change in the N-terminus region of the protein.


Variant effect on gene/protein function:

HPD is highly expressed in the liver and kidneys and is the second enzyme in a pathway that breaks down tyrosine, converting 4-hydroxyphenylpyruvate to homogentisic acid. While other missense variants can cause a deficiency in the enzyme that leaves an excess of the substrate (a form of tyrosinemia), this particular mutation results in the production of an intermediate that then reacts with glutathione to form hawkinsin.

Evidence of pathogenicity – Symptoms first appear during the transition away from breast milk. The notable symptoms are failure to thrive, acidosis, having fine and sparse hair, and excretion of hawkinsin, quinolacetic acid, and pyroglutamic acid. This disorder is termed hawkinsinuria.

Suggestions for person with variant – Traditional treatment involves a diet low in tyrosine and phenylalanine during the first few years of life. Those with hawkinsinuria are asymptomatic by late childhood with the exception of the excretion of hawkinsin, resulting in their urine having a chlorine-like smell. More recent research has shown that N-acetyl-L-cysteine treatment likely causes a decrease in hawkinsin in urine. It is proposed that this mitigates glutathione depletion caused by the variant.


Protein model:

Screen Shot 2017-12-12 at 9.41.04 PM






  1. Amre, D. K. et al. Autophagy gene ATG16L1 but not IRGM is associated with Crohn’s disease in Canadian children. Inflamm. Bowel Dis. 15, 501–507 (2009).
  2. Brest, P. et al. A synonymous variant in IRGM alters a binding site for miR-196 and causes deregulation of IRGM-dependent xenophagy in Crohn’s disease. Nature Genetics 43, 242–245 (2011).
  3. Parkes, M. et al. Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn’s disease susceptibility.
  4. Weersma, R. K. et al. Confirmation of multiple Crohn’s disease susceptibility loci in a large Dutch-Belgian cohort. Am. J. Gastroenterol. 104, 630–638 (2009).
  5. https://www.ncbi.nlm.nih.gov/clinvar/variation/30716/
  6. Bender HU, Almashanu S, Steel G, Hu CA, Lin WW, Willis A, Pulver A, Valle D. Functional consequences of PRODH missense mutations. The American Journal of Human Genetics. 2005;76:409–420. doi: 10.1086/428142.
  7. Jacquet H., Raux G., Thibaut F., Hecketsweiler B., Houy E., Demilly C., Haouzir S., Allio G., Fouldrin G., Drouin V. et al. (2002). PRODH mutations and hyperprolinemia in a subset of schizophrenic patients. Hum. Mol. Genet. 11, 2243-2249. 10.1093/hmg/11.19.2243
  8. Gochee, P. A., Powell, L. W., Cullen, D. J., Du Sart, D., Rossi, E., & Olynyk, J. K. (2002). A population-based study of the biochemical and clinical expression of the H63D hemochromatosis mutation. Gastroenterology, 122(3), 646-651.
  9. https://www.ncbi.nlm.nih.gov/clinvar/RCV000000026.10/
  10. https://www.ncbi.nlm.nih.gov/clinvar/variation/10/
  11. Matalon, R., Kaul, R., Casanova, J., Michals, K., Johnson, A., Rapin, I., … & Deanching, M. (1989). Aspartoacylase deficiency: the enzyme defect in Canavan disease. Journal of inherited metabolic disease, 12(2), 329-331.
  12. https://www.ncbi.nlm.nih.gov/clinvar/variation/2609/#supporting-observations
  13. https://www.ncbi.nlm.nih.gov/clinvar/RCV000002727/
  14. https://www.ncbi.nlm.nih.gov/gene/443
  15. Janson, C., McPhee, S., Bilaniuk, L., Haselgrove, J., Testaiuti, M., Freese, A., … & Saslow, E. (2002). Gene therapy of Canavan disease: AAV-2 vector for neurosurgical delivery of aspartoacylase gene (ASPA) to the human brain. Human gene therapy, 13(11), 1391-1412.
  16. Tomoeda K. et al. Mutations in the 4-Hydroxyphenylpyruvic Acid Dioxygenase Gene Are Responsible for Tyrosinemia Type III and Hawkinsinuria. Molecular Genetics and Metabolism, Volume 71, Issue 3, 2000, Pages 506-510.
  17. Knox WE, LeMay-Knox M (Oct 1951). “The oxidation in liver of l-tyrosine to acetoacetate through p-hydroxyphenylpyruvate and homogentisic acid”. The Biochemical Journal. 49 (5): 686–93.
  18. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Research, 38:e164, 2010.
  19. Chang X, Wang K. wANNOVAR: annotating genetic variants for personal genomes via the web. J Med Genet. 2012 Jul;49(7):433-6.
  20. Amre, D. K. et al. Autophagy gene ATG16L1 but not IRGM is associated with Crohn’s disease in Canadian children. Inflamm. Bowel Dis. 15, 501–507 (2009).
  21. Brest, P. et al. A synonymous variant in IRGM alters a binding site for miR-196 and causes deregulation of IRGM-dependent xenophagy in Crohn’s disease. Nature Genetics 43, 242–245 (2011).
  22. Parkes, M. et al. Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn’s disease susceptibility.
  23. Weersma, R. K. et al. Confirmation of multiple Crohn’s disease susceptibility loci in a large Dutch-Belgian cohort. Am. J. Gastroenterol. 104, 630–638 (2009).
  24. Morten Källberg, Haipeng Wang, Sheng Wang, Jian Peng, Zhiyong Wang, Hui Lu, and Jinbo Xu. Template-based protein structure modeling using the RaptorX web server. Nature Protocols 7, 1511-1522, 2012.
  25. Jianzhu Ma, Jian Peng, Sheng Wang and Jinbo Xu. A Conditional Neural Fields model for protein threading. Bioinformatics, Vol. 28, Issue 12, i59-i66, 2012.
  26. Jianzhu Ma, Sheng Wang, Feng Zhao, and Jinbo Xu. Protein threading using context-specific alignment potential. Bioinformatics, Vol. 29, Issue 13, pp. i257-i265.
  27. Jian Peng and Jinbo Xu. A multiple-template approach to protein threading. PROTEINS, 2011 Jun;79(6):1930-9. doi: 10.1002/prot.23016. Epub 2011 Apr 4.
  28. Jian Peng and Jinbo Xu. RaptorX: exploiting structure information for protein alignment by statistical inference. PROTEINS, 2011, Vol 79, Issue S10, pp. 161-171.
  29. Gochee, P. A., Powell, L. W., Cullen, D. J., Du Sart, D., Rossi, E., & Olynyk, J. K. (2002). A population-based study of the biochemical and clinical expression of the H63D hemochromatosis mutation. Gastroenterology, 122(3), 646-651.
  30. Bender HU, Almashanu S, Steel G, Hu CA, Lin WW, Willis A, Pulver A, Valle D. Functional consequences of PRODH missense mutations. The American Journal of Human Genetics. 2005;76:409–420. doi: 10.1086/428142.
  31. Jacquet H., Raux G., Thibaut F., Hecketsweiler B., Houy E., Demilly C., Haouzir S., Allio G., Fouldrin G., Drouin V. et al. (2002). PRODH mutations and hyperprolinemia in a subset of schizophrenic patients. Hum. Mol. Genet. 11, 2243-2249. 10.1093/hmg/11.19.2243


Group 9 – Stephen Wist, Vishnu Raghuram, Ayush Semwal, Kai Yuan


The 1000 Genomes Project [1] sought to build a large database of global human genetic variation. Healthy individuals’ genomes were collected, sequenced, and stored for public access. This is a boon for basic and integrated biomedical research. We chose to use these data to investigate the pathogenic exonic variants in one individual’s genome. From an online database [2] we chose sample HG00119. Details are:

  • Male
  • British (United Kingdom)
  • Parents and Grandparents born in the UK
  • Unknown age
  • Source from B-lymphocyte




FASTQ files (SRR099967) were downloaded from that database. In general, the htslib workflow [3] was followed. Briefly, the tools used were bwa for mapping to hg38, samtools for sorting the output BAM, the Genome Analysis Toolkit for improving BAM quality, and bcftools for calling variants. The final VCF file was given to wANNOVAR for analysis. This software was chosen for its ease of use and its support of hg38. Our criteria for variant filtering were a CADD Phred score > 20, an entry in dbSNP, and a ‘Pathogenic’ listing in ClinVar.
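The three filtering criteria can be sketched as a small Python predicate. This is only an illustration under stated assumptions: the field names `dbsnp_id`, `cadd_phred`, and `clinvar` are hypothetical and do not reflect the actual wANNOVAR column headers, and the last two rows are invented to show rejections. The CADD scores in the first three rows are the ones quoted in the variant write-ups that follow.

```python
import re

def passes_filters(row, cadd_min=20.0):
    """Return True when an annotated variant row meets all three criteria."""
    try:
        cadd = float(row.get("cadd_phred", ""))
    except (TypeError, ValueError):
        return False  # missing or non-numeric CADD score
    in_dbsnp = row.get("dbsnp_id", "").startswith("rs")
    # Split multi-valued ClinVar labels so "Likely pathogenic" does not match.
    labels = {s.strip().lower() for s in re.split(r"[|,;/]", row.get("clinvar", ""))}
    return cadd > cadd_min and in_dbsnp and "pathogenic" in labels

rows = [
    {"dbsnp_id": "rs2238472", "cadd_phred": "23", "clinvar": "Pathogenic"},
    {"dbsnp_id": "rs1800450", "cadd_phred": "28.9", "clinvar": "Pathogenic"},
    {"dbsnp_id": "rs351855", "cadd_phred": "24.8", "clinvar": "Pathogenic"},
    # Hypothetical rows that should be rejected:
    {"dbsnp_id": "rs0000001", "cadd_phred": "12.1", "clinvar": "Pathogenic"},
    {"dbsnp_id": ".", "cadd_phred": "31.0", "clinvar": "Likely pathogenic"},
]

kept = [r["dbsnp_id"] for r in rows if passes_filters(r)]
print(kept)
```

Matching the ClinVar label exactly, rather than by substring, is a deliberate choice here so that labels such as "Likely pathogenic" are not counted as "Pathogenic".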




  1. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature; 2015 Oct 1; 526; 68-74
  2. International Genome Sample Resource
  3. htslib samtools workflow



Variant 1: rs143137713

Condition: Polyglucosan body myopathy  2

By: Stephen Wist

This SNP is located at coding position 304 of the glycogenin 1 mRNA (transcript variant 1). It changes the first base of the codon from G to C, resulting in an aspartate-to-histidine substitution at amino acid 102 (Asp102His, D102H) [1]. In the VCF file the read depth at this SNP is 95 and the mapping quality is 59, indicating a good-quality call.
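The relationship between a coding position and its codon can be checked with a few lines of Python. This is a minimal sketch that assumes 1-based coding positions counted from the first base of the start codon; the function name `codon_of` is illustrative.

```python
def codon_of(cds_pos):
    """Map a 1-based coding position to (codon number, position within codon)."""
    if cds_pos < 1:
        raise ValueError("coding positions are 1-based")
    codon = (cds_pos - 1) // 3 + 1
    offset = (cds_pos - 1) % 3 + 1
    return codon, offset

# Coding position 304 falls on the first base of codon 102.
print(codon_of(304))
```

Under that assumption, coding position 304 maps to the first base of codon 102, which agrees with the D102H name used for this variant.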

Glycogenin 1 (GYG1) is a member of the glycogenin family, a group broadly involved in converting glucose, the primary energy source of human cells, into glycogen, an energy reserve. Specifically, GYG1 is a glucosyltransferase that catalyzes its own glucosylation, using uridine diphosphate glucose (UDP-glucose) as the donor to build a short glucose polymer. GYG1 uses divalent cations, usually manganese, as a cofactor. The aspartate at position 102 is important for the function of GYG1. The Asp102His missense mutation may decrease GYG1’s affinity for cations, since aspartate carries a negative charge whereas histidine is neutral or positively charged.

Polyglucosan body myopathy 2 is an autosomal recessive disorder that results in weakness of the lower limbs and sometimes the arms or hands. The disease typically progresses slowly but varies in severity and in age of onset, from childhood to late adulthood [2]. Muscle biopsy has shown a build-up of polyglucosan with unusually long and poorly branched glucose chains. This abnormal polyglucosan also exhibits variable resistance to degradation by alpha-amylase [2], consistent with the D102H variant yielding a nonfunctional protein.

This variant has been reported twice. The first report was a clinical study of patients with myopathy [2], in which morphological, protein function, and genetic assays were performed for seven patients with polyglucosan body myopathy. Of these, only one patient was reported to have the D102H mutation. An in vitro autoglucosylation assay of each patient’s GYG1 variant was performed. Addition of UDP-glucose did not result in a gel shift for the D102H variant, indicative of nonfunctional GYG1 [2].

The second report was by GeneDX [3], a company specializing in genetic testing for rare disorders. There is no attached publication, only a brief description of their finding:

“The D102H variant is a strong candidate for a pathogenic variant, however the possibility it may be a rare benign variant cannot be excluded.” [4]

Next to nothing is revealed about the methods, apart from the fact that NGS is used and that the test is 98% accurate (self-reported by GeneDX). NCBI’s Genetic Testing Registry does not show how this test was developed and gives no citations for its accuracy [4].

Multiple sequence alignment shows this aspartate residue to be highly conserved across many species.


Fig 1. MSA of human GYG1 protein along with, from top to bottom of figure, Stylophora pistillata, Danio rerio, Ophiophagus hannah, Rattus norvegicus, Oryctolagus cuniculus, Bos taurus, Homo sapiens.

Together, these results are evidence that GYG1 D102H is a pathogenic variant implicated in polyglucosan body myopathy 2.


Fig 2. GYG1 alpha helix and beta sheet structure, complexed with manganese, UDP-glucose, and glucose [5]. The 102nd residue, aspartic acid, is colored blue in the center right of the image.

Polyglucosan body myopathy 2 is a rare disease (1 in 1 million [6]) about which little is known, and there is little incentive to find a treatment. As long as human gene editing is not safe, those with this mutation will have to live with it. The effects of this disease are slow to develop and thus may be mitigated by continual strength training, especially of the legs. Having stronger muscles should keep one more capable even as the disease progresses. Besides exercise, individuals should consider using mobility aids as their ability to walk decreases.



  1. NCBI Nucleotide Sequence,  NCBI dbSNP snpref
  2. Malfatti et al. A New Muscle Glycogen Storage Disease Associated with Glycogenin-1 Deficiency. Ann Neurol. 2014 Dec; 76(6); 891-898
  3. GeneDX
  4. NCBI Genetic Testing Registry
  5. Chaikuad, A. et al., Proc Natl Acad Sci USA. 2011 Dec 27; 108(52); 21028-21033
  6. OrphaNet


Variant: rs2238472
Condition: Pseudoxanthoma elasticum
By: Ayush Semwal

The SNP (CADD phred score 23) is located at position 3840 of chromosome 16. It is a G-to-A change that leads to a residue change from Arg (R; large, basic) to Gln (Q; medium-sized, polar) at position 1268. The exact functional effect of this variant has not been reported; however, it has been cited for an association with plasma triglyceride and HDL cholesterol levels by Wang J et al. [1].

Pseudoxanthoma elasticum (PXE) is a progressive disorder that results from the accumulation of calcium and other minerals (mineralization) in the elastic fibers. Symptoms of PXE may include yellowish bumps called papules on the neck, underarms, and other areas of skin. There may also be abnormalities in the eyes, such as changes in the pigmented cells of the retina or bleeding and scarring of the retina [2].

The ATP binding cassette subfamily C member 6 (ABCC6) gene codes for multidrug resistance-associated protein 6 (MRP6), which is primarily found in the liver and kidneys. In addition, it is found in other tissues such as the skin, stomach, blood vessels, and eyes. The MRP6 protein, as its name suggests, belongs to the multidrug resistance family of proteins, which transport molecules across cell membranes. However, little is known about the substances transported by MRP6 [2].

Some research suggests that MRP6 stimulates the release of ATP from cells by a mechanism that is still unknown. The ATP is broken down to AMP and pyrophosphate, which helps control the deposition of calcium and other minerals in the body. Mutations in the ABCC6 gene are believed to cause an absent or nonfunctional MRP6 protein. However, it is not clearly understood how a lack of properly functioning MRP6 protein leads to PXE. ATP release from cells may be impaired, resulting in little pyrophosphate production, which may lead to accumulation of calcium and other minerals in the tissues. Alternatively, the transport of a substance that would normally prevent mineralization may be impaired [2].

Ringpfeil et al. (2000) demonstrated that, in a family in which 2 brothers and a sister had PXE, the affected siblings were compound heterozygotes for the R1141X mutation and the R1268Q mutation. In a so-called sporadic case of PXE, the R1138Q mutation was also found in compound heterozygosity with the R1268Q mutation. In one of the families with PXE in which R1141X was identified by Ringpfeil et al. (2000), Germain et al. found the R1268Q variant in the homozygous state in the proband’s unaffected husband. Their investigation of the R1268Q variant found the Q1268 allele at a relatively high frequency (0.19) in a control population of 62 Caucasians. Genotype frequencies were in Hardy-Weinberg equilibrium, and 3 healthy volunteers were homozygous for the Q1268 allele. R1268Q is thus a harmless polymorphism when present in the homozygous state.
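The Hardy-Weinberg statement can be made concrete with the two numbers quoted above (allele frequency 0.19, 62 controls). The sketch below is an illustration only, not the calculation performed by Germain et al.; the function name is hypothetical.

```python
def hardy_weinberg_expected(q, n):
    """Expected genotype counts (ref/ref, ref/alt, alt/alt) among n individuals."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    p = 1.0 - q
    return (p * p * n, 2 * p * q * n, q * q * n)

# Q1268 allele frequency of 0.19 in 62 controls, as reported in the text.
ref_hom, het, alt_hom = hardy_weinberg_expected(0.19, 62)
print(f"R/R: {ref_hom:.1f}, R/Q: {het:.1f}, Q/Q: {alt_hom:.1f}")
```

With these inputs the expected number of Q/Q homozygotes is a little over two, which is in line with the three healthy homozygous volunteers that were observed.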


Since the MRP6 structure could not be found in any of the protein databases, MRP1 was selected as the template (47.43% sequence identity) for predicting the MRP6 structure.




  1. Wang, J., et al. “ABCC6 Gene Polymorphism Associated with Variation in Plasma Lipoproteins.” Journal of Human Genetics, vol. 46, no. 12, Jan. 2001, pp. 699–705., doi:10.1007/s100380170003.
  2. “Pseudoxanthoma Elasticum – Genetics Home Reference.” U.S. National Library of Medicine, National Institutes of Health, ghr.nlm.nih.gov/condition/pseudoxanthoma-elasticum#resources.
  3. Ringpfeil, F., Lebwohl, M.G., Christiano, A. M., Uitto, J. Pseudoxanthoma elasticum: mutations in the MRP6 gene encoding a transmembrane ATP-binding cassette (ABC) transporter. Proc. Nat. Acad. Sci. 97: 6001-6006, 2000.
  4. Germain, D. P., Perdu, J., Remones, V., Jeunemaitre, X. Homozygosity for the R1268Q mutation in MRP6, the pseudoxanthoma elasticum gene, is not disease-causing. Biochem. Biophys. Res. Commun. 274: 297-301, 2000.


Variant: rs1800450

Condition: Mannose-binding protein deficiency

By Vishnu Raghuram – MBL2 variant rs1800450

Description of pathogenic variant chosen:
MBL2 gene: Mannose binding protein or Mannose-binding lectin.

Gene loci: chr10q11.2-q21.


Function: MBLs are pattern recognition receptors that activate the complement system of innate immunity through the lectin pathway. MBLs recognize and bind to carbohydrate pathogen-associated molecular patterns, such as lipopolysaccharides in the cell membranes of gram-negative bacteria. This binding activates MBL-associated serine proteases (MASPs) and triggers the complement cascade, ultimately leading to lysis of the pathogen. [1]


Variant: Nonsynonymous missense mutation
Variant genomic location: chr10 52771475
Variant dbSNP ID: rs1800450
Human genome variation: C > T
Overall allele frequency: C = 0.8780; T = 0.1220
GBR population Allele frequency: C = 0.8901; T = 0.1099
Amino acid change: Gly54Asp
CADD phred score: 28.9
Condition: Mannose-binding protein deficiency

Description and critique of evidence for pathogenicity:
This codon 54 variant has been extensively studied and documented, and is one of the common causes of MBL deficiency. As MBL plays a critical role in innate immunity, MBL deficiency severely increases the chances of systemic infections [2]. It is associated with several diseases such as diabetes [3-5], rheumatoid arthritis [6], acute lymphoblastic leukemia [7], systemic lupus erythematosus [8], and more. Thiel et al. (2006) have published a comprehensive review highlighting the potential consequences of MBL deficiency [9].
Structural model of protein with variant AA highlighted: Modeled using G23D [10].

Advice for individual with this variant:
The primary symptom of MBL deficiency is a weakened immune system, as it cripples lectin-based complement activation. Fortunately, this is a common mutation and treatment is as simple as administering antibiotics and regular vaccinations. However, this would only lower the chances of systemic infections. The other diseases associated with this mutation, such as diabetes and ALL, require their own specialized treatment.


  1. Eddie Ip, W., Takahashi, K., Alan Ezekowitz, R. and Stuart, L. (2009). Mannose-binding lectin and innate immunity. Immunological Reviews, 230(1), pp.9-21.
  2. Summerfield, J., Ryder, S., Sumiya, M., Thursz, M., Gorchein, A., Monteil, M. and Turner, M. (1995). Mannose binding protein gene mutations associated with unusual and severe infections in adults. The Lancet, 345(8954), pp.886-889.
  3. Muller, Y., Hanson, R., Bian, L., Mack, J., Shi, X., Pakyz, R., Shuldiner, A., Knowler, W., Bogardus, C. and Baier, L. (2010). Functional Variants in MBL2 Are Associated With Type 2 Diabetes and Pre-Diabetes Traits in Pima Indians and the Old Order Amish. Diabetes, 59(8), pp.2080-2085.
  4. Tsutsumi, A., Ikegami, H., Takahashi, R., Murata, H., Goto, D., Matsumoto, I., Fujisawa, T. and Sumida, T. (2003). Mannose binding lectin gene polymorphism in patients with type I diabetes. Human Immunology, 64(6), pp.621-624.
  5. Megia, A., Gallart, L., Fernández-Real, J., Vendrell, J., Simón, I., Gutierrez, C. and Richart, C. (2004). Mannose-Binding Lectin Gene Polymorphisms Are Associated with Gestational Diabetes Mellitus. The Journal of Clinical Endocrinology & Metabolism, 89(10), pp.5081-5087.
  6. Xie, Q., Wang, S., Bian, G., Zhan, F., Xie, J. and Li, J. (2012). Association of MIF−173G/C and MBL2 codon 54 gene polymorphisms with rheumatoid arthritis: A meta-analysis. Human Immunology, 73(9), pp.966-971.
  7. Neth, O., Hann, I., Turner, M.W., Klein, N.J., 2001. Deficiency of mannose-binding lectin and burden of infection in children with malignancy: a prospective study. Lancet 358, 614–618.
  8. Lee, Y., Lee, H., Choi, S., Ji, J. and Song, G. (2011). The association between the mannose-binding lectin codon 54 polymorphism and systemic lupus erythematosus: a meta-analysis update. Molecular Biology Reports, 39(5), pp.5569-5574.
  9. Thiel, S., Frederiksen, P. and Jensenius, J. (2006). Clinical manifestations of mannan-binding lectin deficiency. Molecular Immunology, 43(1-2), pp.86-96.
  10. Solomon O., Kunik V., Simon A., Kol N., Barel O., Lev A., Amariglio N., Somech R., Rechavi G., Eyal E. G23D: online tool for mapping and visualization of genomic variants on 3D protein structures. BMC Genomics. 2016; 17:681.



Variant: rs351855

Condition: increased disease susceptibility, accelerated cancer progression and poor prognosis

By: Kai Yuan

Variant rs351855 is a single nucleotide polymorphism (SNP) in exon 9 of the FGFR4 (fibroblast growth factor receptor 4) gene on chromosome 5, a G-to-A change at coding position 1162. The SNP produces a missense mutation, Gly (G) to Arg (R), at codon 388 in the transmembrane domain of the receptor. Thus, the SNP is commonly referred to as the Gly388Arg variant. The variant has a high mapping quality of 60 in the VCF file and a CADD phred score of 24.8, which suggests that it is a deleterious substitution.

According to our results based on the 1000 Genomes Project, the overall frequency of the Gly388Arg variant is 0.2995, with frequencies of 0.11 in the African group, 0.31 in the American group, 0.46 in the East Asian group, 0.29 in the European group, and 0.39 in the South Asian group. From these results, it can be concluded that individuals of East Asian and South Asian ancestry are more likely to carry the variant than those of other groups.

FGFR4 is a member of a family of transmembrane receptors with ligand-induced tyrosine kinase activity. The normal FGFR4 gene encodes a tyrosine kinase that acts as a cell surface receptor for fibroblast growth factors. The protein encoded by FGFR4 is involved in the regulation of several critical pathways, including cell proliferation, cell differentiation, cell migration, lipid metabolism, bile acid biosynthesis, vitamin D metabolism, glucose uptake, and phosphate homeostasis [1]. Expression of FGFR4 is high in lung, kidney, ovary, liver, colon, and duodenum. However, the Gly388Arg variant alters the transmembrane-spanning segment and exposes a membrane-proximal cytoplasmic signal transducer and activator of transcription 3 (STAT3) binding site, Y390-(P)XXQ393 [1]. The missense mutation thus leads to an exposed binding site for STAT3 and thereby accelerates tumor growth.

A proposed structure of the soluble protein has been published and is shown in Figure 1. The entire protein structure is provided by SWISS-MODEL [4] and contains 802 amino acids, of which amino acids 445-753 form a homo-3-mer structure that is the main functional group of the protein. The sequence alignment for this part is shown in Figure 4. In addition, amino acids 146-355 combine to form a monomer that lies inside the homo-3-mer structure (Figure 2). However, the Gly388Arg variant is in the transmembrane domain, which is not characterized in the protein structure. For that reason, a graphical structure representation is given in Figure 3, which shows that amino acids 370-390 form a helical structure in the transmembrane domain (blue rectangle inside the orange box) and that the Gly-to-Arg missense mutation falls within it (the red line).

According to the published literature, the rs351855 G>A polymorphism has been associated with increased disease susceptibility, accelerated cancer progression, and poor prognosis. Xiong et al. (2017) performed a meta-analysis of the functional Gly388Arg polymorphism in the FGFR4 gene based on 27 studies comprising 8,682 cases and 9,731 controls. Their results showed that the variant was associated with increased cancer risk under the recessive model, with an odds ratio (OR) of 1.19 reported alongside a 95% confidence interval (CI) [2]. Xiong et al. also reported that the rs351855 variant was associated with an increased risk of breast cancer (homozygous: OR=1.73, 95% CI=1.35-2.20) and prostate cancer (heterozygous: OR=1.16, 95% CI=1.02-1.32) [2]. One criticism of this study is that the researchers did not find a significant correlation between the Gly388Arg variant and lung disease, even though the lung is one of the organs with the highest expression of the FGFR4 gene. Another case-control study, carried out by Cha et al., investigated the association between FGFR4 Gly388Arg and non-Hodgkin lymphoma (NHL) in the Chinese population and found that the frequency of Gly388Arg was significantly higher in a group of 412 NHL patients than in 476 healthy controls (OR=2.12, 95% CI) [3].
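To show how an odds ratio and its 95% confidence interval relate, the sketch below takes the breast and prostate cancer figures quoted above and recovers the implied standard error and z-score. It assumes the intervals were built as Wald intervals on the log-odds scale, which is a common convention but is not stated in the text; the function name `summarize_or` is illustrative.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal distribution

def summarize_or(odds_ratio, ci_low, ci_high):
    """Recover log-scale summary statistics from an OR and its 95% CI."""
    log_lo, log_hi = math.log(ci_low), math.log(ci_high)
    se = (log_hi - log_lo) / (2 * Z95)           # standard error of log(OR)
    midpoint = math.exp((log_lo + log_hi) / 2)   # should sit close to the OR
    z = math.log(odds_ratio) / se                # Wald z-score against OR = 1
    return {
        "se_log_or": se,
        "ci_midpoint": midpoint,
        "z": z,
        "excludes_one": ci_low > 1 or ci_high < 1,
    }

breast = summarize_or(1.73, 1.35, 2.20)
prostate = summarize_or(1.16, 1.02, 1.32)
for name, s in (("breast", breast), ("prostate", prostate)):
    print(name, {k: round(v, 3) if isinstance(v, float) else v for k, v in s.items()})
```

Both intervals exclude 1, which is what makes the reported associations nominally significant at the 5% level; the prostate interval only narrowly does so.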

To sum up, rs351855 (Gly388Arg) is characterized as a pathogenic variant that alters the helical structure of the transmembrane domain of human Fibroblast Growth Factor Receptor 4. The binding site exposed by the variant interacts with STAT3, resulting in STAT3 phosphorylation and signaling activation, which causes increased cell motility and tumor cell invasion. Thus, the variant is considered a disease-susceptibility variant that may accelerate cancer progression. For patients with the variant, chemotherapy targeting the variant, together with gene therapy to knock down the STAT3 signal, may be helpful. It may also be worthwhile to construct a molecule that competitively binds STAT3 to reduce the effect of the exposed binding site.


Figure 1. Crystal structure of human Fibroblast Growth Factor Receptor 4 [4].


Figure 2. Monomer 146-335 of human Fibroblast Growth Factor Receptor 4 [4].


Figure 3. Structure representation of human Fibroblast Growth Factor Receptor 4 with the transmembrane domain labeled in the orange box [4].


Figure 4. Sequence alignments for human Fibroblast Growth Factor Receptor 4 [4].



  1. “FGFR4 Fibroblast Growth Factor Receptor 4 [Homo Sapiens (Human)] – Gene – NCBI.” National Center for Biotechnology Information, U.S. National Library of Medicine, www.ncbi.nlm.nih.gov/gene/2264.
  2. Xiong, Si-Wei, et al. “Functional FGFR4 Gly388Arg Polymorphism Contributes to Cancer Susceptibility: Evidence from Meta-Analysis.” Oncotarget, Impact Journals, 1 Mar. 2017
  3. Cha, Zhanshan, et al. “Fibroblast Growth Factor Receptor 4 Polymorphisms and the Prognosis of Non-Hodgkin Lymphoma.” SpringerLink, Springer Netherlands, 1 Jan. 2014
  4. SWISS-MODEL | P22455, swissmodel.expasy.org/repository/uniprot/P22455.


Searching for Pathogenic Variants in Exomic Profile of Bengali Male

Group 2 – Penghao Xu, Prachiti Prabhu, Ragy Haddad, and William Harvey


The 1000 Genomes Project was the first project to sequence the genomes of a large number of people in order to provide a comprehensive resource on human genetic variation. Data from the 1000 Genomes Project were quickly made available to the worldwide scientific community through freely accessible public databases and were used for this project [1]. Our exomic analysis centered on a Bengali male from Bangladesh. Additional information is listed below:

Gender: Male
Age: Unknown
Affected: Unknown
Repository: NHGRI Sample Repository for Human Genetic Research
Cell Type: B-Lymphocyte
Tissue Type: Blood
Transformant: Epstein-Barr Virus
Ethnicity: Bengali
Country of Origin: Bangladesh
Remarks: At least three out of four grandparents are of Bengali ancestry.

Our analytical pipeline began with identifying variants between our 1000 Genomes genomic sequence and the hg38 human reference genome. This involves indexing the reference genome with the Burrows-Wheeler Aligner (BWA). Using a combination of BWA and Samtools, one can map the reads to the reference, realign to improve the accuracy of calls, index these mappings, and call variants. This produces a file containing the variations from the reference genome, which can then be annotated and analyzed.
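The mapping and calling steps might look roughly like the commands assembled below. This is a sketch under several assumptions rather than the exact commands we ran: it assumes paired-end FASTQ input, uses bcftools for the calling step, omits the realignment step, and all file names are hypothetical. The Python function only builds the command strings; it does not execute them.

```python
import shlex

def build_commands(reference, reads_1, reads_2, prefix):
    """Return the shell commands for index -> map/sort -> index -> call."""
    ref, r1, r2 = (shlex.quote(p) for p in (reference, reads_1, reads_2))
    bam = shlex.quote(prefix + ".sorted.bam")
    vcf = shlex.quote(prefix + ".vcf")
    return [
        # Build the BWA index for the reference genome.
        f"bwa index {ref}",
        # Map reads and write a coordinate-sorted BAM.
        f"bwa mem {ref} {r1} {r2} | samtools sort -o {bam} -",
        # Index the sorted BAM for random access.
        f"samtools index {bam}",
        # Pile up reads against the reference and call variant sites only.
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -Ov -o {vcf}",
    ]

# Hypothetical file names, for illustration only.
commands = build_commands("hg38.fa", "sample_1.fastq", "sample_2.fastq", "sample")
for cmd in commands:
    print(cmd)
```

Keeping the commands as data makes it easy to log exactly what was run before handing the resulting VCF to the annotation step.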

Annotation takes place via wANNOVAR, which is used to identify the functional significance of the genetic variants identified in the previous step. This process also uses the hg38 human reference genome. After this analysis, we were left with 18 likely pathogenic variants. From this list, we chose 4 to analyze that had Combined Annotation Dependent Depletion (CADD) phred scores of at least 10 and entries in ClinVar and dbSNP. CADD phred scores rank the predicted deleteriousness of single nucleotide variants and insertions/deletions [2]. The four chosen variants are highlighted in the summary table below.
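A scaled CADD score is a rank on a Phred-like scale, so a threshold can be translated into "top fraction of all possible substitutions". The snippet below is a minimal sketch of that conversion; the function name is illustrative.

```python
def cadd_top_fraction(phred):
    """Fraction of variants ranked at least as deleterious as this scaled score."""
    if phred < 0:
        raise ValueError("scaled CADD scores are non-negative")
    return 10 ** (-phred / 10)

for score in (10, 20, 23):
    print(f"CADD phred {score}: top {cadd_top_fraction(score):.2%}")
```

A cutoff of 10 therefore keeps variants predicted to be in the most deleterious 10%, while a score of 23 corresponds to roughly the top 0.5%.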

Screen Shot 2017-12-11 at 10.38.29 PM

ABCC6 – Penghao Xu


  1. Phred score: 45.4007
  2. CADD Phred score: 23

Genotype Information

  1. Chromosome: Chr16
    NC_000016.10: g.16157742C>T
  2. Gene: ABCC6
    NG_007558.2: g.70730G>A
  3. mRNA: ABCC6, transcript variant 1
    NM_001171.5: c.3803G>A
  4. Protein: MRP6, multidrug resistance-associated protein 6 isoform 1
    NP_001162.4: p.Arg1268Gln


Allele Frequency

  1. Global
    1. T=0.2446/29488 (ExAC)
    2. T=0.1813/908 (1000 Genomes)
    3. T=0.2240/2910 (GO-ESP)
    4. T=0.2073/6036 (TOPMED)
  2. Regional and ethnic: SAS, South Asian
    1. T=0.1833 (ExAC SAS)
    2. T=0.19 (1000 Genomes SAS)
    3. Data not available for ESP and TOPMED
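Each global entry above pairs a frequency with a count, so the number of chromosomes sampled can be back-calculated. The sketch below assumes the number after the slash is the count of T alleles observed, which is our reading of the dbSNP format; the function name is illustrative.

```python
def implied_allele_number(alt_count, alt_freq):
    """Estimate the total chromosomes sampled from an allele count and frequency."""
    if not 0 < alt_freq <= 1:
        raise ValueError("frequency must be in (0, 1]")
    return round(alt_count / alt_freq)

# 1000 Genomes entry from the list above: T=0.1813/908
total = implied_allele_number(908, 0.1813)
print(total, "chromosomes, i.e.", total // 2, "diploid individuals")
```

For the 1000 Genomes entry this gives 5008 chromosomes, consistent with the 2504 individuals in that dataset.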

Protein Background
The ABCC6 gene provides instructions for making a protein called multidrug resistance-associated protein 6 (MRP6, also known as the ABCC6 protein). ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White); the ABCC6 gene encodes the MRP6 protein in the MRP subfamily. This protein is found primarily in the liver and kidneys, with small amounts in other tissues such as the skin, stomach, blood vessels, and eyes. The MRP6 protein belongs to a group of proteins that transport molecules across cell membranes. The variant generates a missense substitution in the protein, from Arg to Gln at position 1268, as noted in the dbSNP Gene Model section. This missense variant appears to cause pseudoxanthoma elasticum (PXE), and the SNP is marked as “Pathogenic” in ClinVar. Pseudoxanthoma elasticum is characterized by fragmentation and mineralization of elastic fibers in the body.

The protein was located in UniProtKB, and its structure was then retrieved from the SWISS-MODEL database. This protein has only one chain, and its main secondary structure is alpha-helix when it is properly folded. The variant position is R1268Q, as shown in the picture.

Medical Advice

Currently, there is no single medical treatment for pseudoxanthoma elasticum, but with monitoring, preventative plastic surgery can be used. There is also some evidence that antiangiogenic drugs have been successful in preventing these build-ups in the eye and later in the blood vessels.

PRODH – Prachiti Prabhu 

ClinVar ID: RCV000004222.5
dbSNP: rs450046

Genotype Information

    1. Chromosome: Chr22
    2. Cytogenetic location: 22q11.21
    3. Gene: PRODH:proline dehydrogenase 1
    4. Contig: NT_001520.13 (position: 203927)
    5. Nucleotide change: 1562A>G – missense, nonsynonymous variant
    6. Protein change: Q521R; GLN521ARG
    7. Inheritance: autosomal recessive
    8. Gene encodes FAD-linked oxidoreductase and is involved in proline metabolism
    9. Conditions: hyperprolinemia type 1 (HYPRO1) and increased susceptibility to schizophrenia type 4 (SCZD4)

Minor Allele Frequency:


      1. C=0.0829/5514 (ExAC)
      2. C=0.0944/473 (1000 Genomes)
      3. C=0.0958/1245 (GO-ESP)
      4. C=0.1031/3003 (TOPMED)

Comparison of allele frequencies (1000 Genomes Project):


  • C=0.0944
  • T=0.9056

Regional– Bengali from Bangladesh (BEB):

  • C=0.0872
  • T=0.9128

Protein Background

The gene proline dehydrogenase (PRODH) is located on chromosome 22 at 22q11.21 [3]. The gene comprises 15 exons [4]. The protein encoded by PRODH, proline dehydrogenase, is a mitochondrial protein found primarily in the brain, kidney, and liver [5]. The protein catalyzes the first step in proline degradation by converting proline to pyrroline-5-carboxylate, which is subsequently converted to glutamate [5]. The conversion of proline to glutamate is essential for protein production and energy transfer within the cell [5].

In all, 16 PRODH missense mutations that alter the proline oxidase (POX) activity of the enzyme have been identified [4]. These mutations are linked to hyperprolinemia type 1 and to an increased susceptibility to schizophrenia type 4 [4].


The quality score (QUAL) expresses, on the Phred scale, the probability P that the ALT allele call is wrong: QUAL = -10 × log10(P). The higher the quality score, the higher the confidence that the variant was called correctly. In the VCF file, the base call for the PRODH gene showed a change from base “A” (REF) to base “G” (ALT) at genomic position 18913491 on chr22. This call had a quality score of 225, a high value, so the variant is almost certainly present at that location. The variant also had a CADD Phred score of 23. This is a separate measure that ranks predicted deleteriousness rather than call confidence; a value above 20 places the variant among the 1% of substitutions predicted to be most deleterious, which supports the deleterious nature of this variation.
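
The relationship between a Phred-scaled quality and the error probability can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard Phred definition, Q = -10 × log10(P); the function name `phred_to_error_prob` is ours and is not part of any tool used in this analysis.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality score to the probability that the call is wrong."""
    return 10 ** (-q / 10)


qual = 225  # QUAL reported for the PRODH call in our VCF
p_wrong = phred_to_error_prob(qual)
print(f"QUAL {qual}: P(call is wrong) = {p_wrong:.2e}")
```

A score of 10 corresponds to a 1 in 10 chance of a wrong call and a score of 20 to 1 in 100, so a score of 225 leaves essentially no room for a miscall.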


According to the results obtained, the change of base “A” to base “G” at coding position 1562 is a germline missense mutation that replaces the glutamine residue at position 521 of the protein with arginine. ClinVar describes this mutation as “pathogenic” [6]. The mutation is recessive, leads to a slight increase in POX activity, and is one of the 16 identified PRODH mutations that may cause hyperprolinemia type 1 and increase susceptibility to schizophrenia type 4 [4].
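
As a quick consistency check, nucleotide position 1562 and protein position 521 can be related by simple arithmetic, since each codon spans three coding bases. The sketch below assumes that 1562 is a 1-based coordinate in the coding sequence; the helper name `codon_of` is hypothetical.

```python
from typing import Tuple


def codon_of(cds_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, base within codon), both 1-based."""
    zero_based = cds_pos - 1
    return zero_based // 3 + 1, zero_based % 3 + 1


codon, base = codon_of(1562)
print(codon, base)  # 521 2
```

The change falls on the second base of codon 521, which is consistent with a glutamine codon (CAA or CAG) becoming an arginine codon (CGA or CGG).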

POX activity has been shown to depend on residue Y540, which suggests that Y540 lies in the active site of the enzyme [7]. Since the mutation (at position 521) is close to this residue in the sequence, it is likely that the mutation alters the active site and hence changes the POX activity.


The following is the structure of proline dehydrogenase protein obtained from PDB. It was visualized using JSmol and the variant amino acid residue (Q521R) was marked using Jena3D software [9].

Medical Advice

If a person is found to have this mutation, they are highly susceptible to hyperprolinemia type 1 and schizophrenia type 4. Thus, the first step would be diagnostic testing to check whether the person actually has hyperprolinemia or schizophrenia.

Hyperprolinemia can be detected by a simple blood test that measures the level of proline in the blood, which in affected individuals is 3 to 10 times higher than normal. A newborn can also be screened for this condition if one of the parents is known to have this mutation, or if the child has tested positive for it; this can help early diagnosis. Hyperprolinemia type 1 may cause seizures and intellectual disability. There is no cure for the condition, but controlling protein intake in the diet can help reduce the level of proline in the blood and may ease the symptoms.

Schizophrenia can be assessed through a combination of physical screening and psychiatric evaluation; if the diagnosis is confirmed, the patient can be prescribed antipsychotic drugs.

LDLR – Ragy Haddad

ClinVar ID: CN169374
dbSNP: rs2228671

Genotype Information

  1. Chromosome: chr19
  2. Pathogenic: Familial hypercholesterolemia
  3. UniProt ID: P01130
  4. Method of inheritance: Autosomal dominant
  5. Contig position in hg38: 11040236
  6. mRNA position: 268
  7. Protein position: 27
  8. PDB: 4NE9
  9. HGVS (Human Genome Variation Society):
    1. NG_009060.1:g.15856C>A
    2. NM_000527.4:c.81C>A

Possible variations of the SNP and their outcomes (our variant is the stop-gain form):

  • C [Cys] ⇒ Ter [*] [OPA]
  • C [Cys] ⇒ W [Trp]
  • C [Cys] ⇒ C [Cys]

The change of the nucleotide C in the cysteine codon TGC to A produces TGA, a stop codon. This stop-gain (nonsense) mutation terminates translation of LDLR prematurely and yields a defective low-density lipoprotein receptor (LDLR). A defective LDLR cannot mediate endocytosis of cholesterol-rich low-density lipoprotein (LDL), so LDL remains in the blood and cholesterol levels rise, a condition known as hypercholesterolemia.
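
The codon change described above can be checked with a few lines of Python. This sketch assumes only the standard genetic code and uses a deliberately small lookup table that covers the codons reachable from TGC by changing its third base; the names `CODONS` and `classify` are ours.

```python
# Partial standard genetic code: only the codons reachable from TGC
# by changing its third base.
CODONS = {"TGC": "Cys", "TGT": "Cys", "TGG": "Trp", "TGA": "Ter"}


def classify(ref_codon: str, alt_codon: str) -> str:
    """Label a single-codon change as synonymous, missense, or stop-gain."""
    ref_aa, alt_aa = CODONS[ref_codon], CODONS[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "Ter":
        return "stop-gain"
    return "missense"


ref = "TGC"
for alt_base in "AGT":
    alt = ref[:2] + alt_base  # substitute the third base of the codon
    print(f"{ref}>{alt}: {CODONS[ref]} -> {CODONS[alt]} ({classify(ref, alt)})")
```

Only the C>A substitution introduces a stop codon; C>G gives a missense change to tryptophan and C>T is synonymous.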


The increase in cholesterol levels in the blood leads to accumulation of cholesterol on the arterial walls, which in turn leads to accelerated atherosclerosis and premature coronary disease.

A large number of mutations in the LDLR gene have been reported in patients with autosomal dominant hypercholesterolemia [10]. In addition, a study of LDLR in New Zealand provided a pool of LDLR variants that included rs2228671, which has been shown to be linked to familial hypercholesterolemia [11].


Structure of LDLR with highlighted variant: Location of STOP-GAIN (highlighted in yellow)

Medical Advice

Avoid food that is high in cholesterol or salt, exercise often, and try to maintain a stable blood pressure. In addition, have frequent check-ups with a doctor and keep a low-stress lifestyle, since stress could increase blood pressure and contribute to the accumulation of LDL in the coronary arteries.

Furthermore, this is an autosomal dominant allele, so it would be advisable to offer genetic screening to other family members and children of the person carrying the variant in order to avoid complications and coronary conditions.

ELAC2 – William Harvey

ClinVar ID: RCV000005358.3
dbSNP: rs4792311

Genotype Information

  1. Location: Chromosome 17, position 13011692
  2. Nucleotide Change: 13011692G>A; missense mutation
  3. Amino Acid Change: S217L

Minor Allele Frequency


      1. C=0.2711 (ExAC)
      2. C=0.21 (1000 Genomes)
      3. C=0.28 (GO-ESP)


ELAC2 encodes zinc phosphodiesterase ELAC protein 2, which carries out mitochondrial tRNA 3′-processing endonuclease activity. Accurate processing of mitochondrial tRNA is crucial for proper mitochondrial gene expression [12]. In our sample, a missense mutation at chr17, position 13011692, changes a guanine to an adenine and thereby changes the amino acid at position 217 of the ELAC2 protein from serine to leucine. This SNP is observed in our annotated variant file and has a Phred score of 10.6, which corresponds to an error probability of 10^(-1.06) ≈ 0.087, i.e. a likelihood of about 91.29% that the call is correct.

This SNP has been associated with an increased risk of prostate cancer [13]. The association is supported by a meta-analysis of 18 case-control studies, which estimated an odds ratio of 1.13 for the Ser217Leu polymorphism, with even larger values in Asian and Caucasian populations.


The pathogenicity of this variant may be explained by structural changes in the ELAC2 protein, given both the marked difference in properties between serine and leucine and the importance of proper mitochondrial activity. Serine, a polar amino acid, is replaced by leucine, a non-polar amino acid; such a substitution can cause conformational changes in the protein that could affect its function. If mitochondria do not function properly, the energy requirements of certain cells could balloon, which is one of the major characteristics of a cancerous cell line.

Medical Advice

Although prostate cancer is one of the most common types of cancer in men, it can be treated if detected at an early stage [14]. If this individual were to undergo regular prostate check-ups, any adverse signs could be detected promptly and addressed appropriately.


  1. A global reference for human genetic variation, The 1000 Genomes Project Consortium, Nature 526, 68-74 (01 October 2015) doi:10.1038/nature15393.
  2. Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014 Feb 2. doi: 10.1038/ng.2892. PubMed PMID: 24487276.
  3. Kempf, Lucas et al. “Functional Polymorphisms in PRODH Are Associated with Risk and Protection for Schizophrenia and Fronto-Striatal Structure and Function.” Ed. Nicholas Katsanis. PLoS Genetics 4.11 (2008): e1000252.
  4. Bender, Hans-Ulrich  et al. “Functional Consequences of PRODH Missense Mutations.” American Journal of Human Genetics 76.3 (2005): 409–420.
  5. NIH GHR database, ghr.nlm.nih.gov , last accessed on 10th December 2017
  6. NCBI ClinVar database, https://www.ncbi.nlm.nih.gov/clinvar/variation/4010/ ,  last accessed on 10th December 2017.
  7. Ostrander, Elizabeth L. et al. “A Conserved Active Site Tyrosine Residue of Proline Dehydrogenase Helps Enforce the Preference for Proline over Hydroxyproline as the Substrate,” Biochemistry 48.5 (2009): 951–959.
  8. NCBI dbSNP, https://www.ncbi.nlm.nih.gov/projects/SNP/, last accessed on 10th December 2017.
  9. Jena 3D Protein Visualization, http://jenalib.leibniz-fli.de, last accessed on 10th December 2017.
  10. Bertolini, Stefano et al. Spectrum of mutations and phenotypic expression in patients with autosomal dominant hypercholesterolemia identified in Italy. Atherosclerosis, Volume 227, Issue 2, 342–348.
  11. Laurie AD, Scott RS, George PM. Genetic screening of patients with familial hypercholesterolaemia (FH): a New Zealand perspective. Dec 2004.
  12. Brzezniak LK, Bijata M, Szczesny RJ, Stepien PP. Involvement of human ELAC2 gene product in 3′ end processing of mitochondrial tRNAs. RNA Biol. 2011;8:616–626. doi: 10.4161/rna.8.4.15393.
  13. Xu B, Tong N, Li J, Zhang Z, Wu H. ELAC2 polymorphisms and prostate cancer risk: a meta-analysis based on 18 case–control studies. Prostate Cancer and Prostatic Diseases. 2010;13(3):270-277. doi:10.1038/pcan.2010.6.
  14. Cuzick J, Thorat MA, Andriole G, Brawley OW, Brown PH, Culig Z, Eeles RA, Ford LG, Hamdy FC, Holmberg L, Ilic D, Key TJ, La Vecchia C, Lilja H, Marberger M, Meyskens FL, Minasian LM, Parker C, Parnes HL, Perner S, Rittenhouse H, Schalken J, Schmid HP, Schmitz-Drager BJ, Schroder FH, Stenzl A, Tombal B, Wilt TJ, Wolk A (2014) Prevention and early detection of prostate cancer. Lancet Oncol 15(11): e484–e492.
  15. Ringpfeil F, Lebwohl MG, Christiano AM, Uitto J. Pseudoxanthoma elasticum: Mutations in the MRP6 gene encoding a transmembrane ATP-binding cassette (ABC) transporter. Proceedings of the National Academy of Sciences of the United States of America. 2000;97(11):6001-6006.
    This article summarizes the common mutations in MRP6 found in eight kindreds with PXE, including missense, nonsense, and splice site mutations, as well as large deletions resulting in allelic loss of the MRP6 locus. Clinically unaffected family members were then examined, and a basis for genetic diagnosis was established.
  16. Dominique P. Germain, Jérôme Perdu, Véronique Remones, Xavier Jeunemaitre, Homozygosity for the R1268Q Mutation in MRP6, the Pseudoxanthoma Elasticum Gene, Is Not Disease-Causing, In Biochemical and Biophysical Research Communications, Volume 274, Issue 2, 2000, Pages 297-301, ISSN 0006-291X, https://doi.org/10.1006/bbrc.2000.3101.
    The researchers found several pathogenic mutations in the MRP6 gene and then focused on R1268Q, which is generated by the SNP analyzed. They found that Q1268 occurs at high frequency in a Caucasian control population, which indicates that Q1268 does not cause PXE per se but may play a role in compound heterozygotes.
  17. Le Saux O, Beck K, Sachsinger C, et al. A Spectrum of ABCC6 Mutations Is Responsible for Pseudoxanthoma Elasticum. American Journal of Human Genetics. 2001;69(4):749-764.
    Mutational analysis of ABCC6 was performed in 122 PXE patients, and 28 novel variants were found. Moreover, the distribution pattern of the mutations points to disease-causing variants within exons encoding a large C-terminal cytoplasmic loop and in the C-terminal nucleotide-binding domain (NBD2); the findings are discussed in the context of the complex relationship between the PXE phenotype and the function of ABCC6.
  18. Kim JY, Cheong HS, Park T-J, et al. Screening for 392 polymorphisms in 141 pharmacogenes. Biomedical Reports. 2014;2(4):463-476. doi:10.3892/br.2014.272.
    This study screened 392 polymorphisms in 141 pharmacogenes, which may help in developing personalized medicine based on pharmacogene polymorphisms.