
William L. Duax, Ph.D.
Hauptman-Woodward Institute
H.A. Hauptman Distinguished Scientist

State University of New York at Buffalo
Structural Biology Department
Professor

American Crystallographic Association (ACA)
Executive Officer

 

 

International Union of Crystallography
President (2002 - 2005)

EDUCATION
B.A., Chemistry, St. Ambrose College, Davenport, Iowa 1961
Ph.D., Physical Chemistry, University of Iowa, Iowa City, Iowa, 1967

   
 
MAILING ADDRESS:
Hauptman-Woodward
Medical Research Institute
700 Ellicott Street
Buffalo, NY 14203-1102
CONTACT INFORMATION:
Phone: (716) 898-8616
Fax: (716) 898-8660
E-mail: duax@hwi.buffalo.edu

Research Interests

An Ancient Family of Proteins Vital to Health and Disease

The short-chain oxidoreductase (SCOR) enzymes are an ancient family of proteins that are vital to the growth and development of all forms of life.  The most highly evolved members of this protein family are found in the human body, where they control the levels of hormones that are vital to fertility, salt balance, and sugar metabolism.  Malfunction of these proteins can cause many diseases, including cancer, high blood pressure, and Alzheimer disease.  The DNA sequences of the human genome, of the genomes of dozens of plants and animals, and of hundreds of bacteria and viruses are available for analysis and study in international gene banks.  Over 10,000 of those genes are members of the SCOR superfamily.  We do not know exactly what each of the proteins produced by the 3 million genes in the gene bank does.  Determining this is the greatest challenge facing the biomedical community today.  We are combining our knowledge of the structure and function of 50 members of the SCOR family with the DNA and protein sequence information in the gene bank to predict the function and substrate specificity of all 10,000 SCOR genes.  We are testing our predictions by X-ray crystallographic analysis and biochemical studies.  In this way, we hope to identify the genetic basis of many diseases for which the cause is currently unknown.

Polycystic Kidney Disease

It has been estimated that polycystic kidney disease (PKD) affects as many as one in 500 people.  Symptoms of the disease do not arise until long after people have passed defective genes to their children.  The protein produced by one of the genes responsible for PKD, polycystin-1 (Pc-1), is one of the largest proteins in the human body.  This 4200-residue protein has 16 subdomains that have different functions.  Because mutations have been found at hundreds of sites in this protein, it is difficult to identify the exact causes of the disease.  Determining the three-dimensional structure of Pc-1 is critical to understanding the normal function of the protein, how mutations alter that function, and how to correct for metabolic errors.  Because of its size and complex nature, it is not possible to crystallize the entire protein.  We are purifying individual domains of Pc-1 for crystallographic analysis, modeling other domains on structurally related proteins for which three-dimensional structures have been reported, and modeling the way in which the various domains are assembled into the complete protein.  Our initial efforts to model a sugar-binding domain in Pc-1 suggest that our approach is valid.  In addition, we are exploring the possibility that mutation at one end of the Pc-1 gene may produce an unnatural protein that not only destroys the vital function of the protein, but also triggers an immune response in individuals with PKD.


The Origin and Evolution of the Genetic Code, Protein Structure, and Protein Function

Multiple Open Reading Frames.  In principle, any stretch of double-helical DNA could be read in six different ways to produce six proteins having completely different sequences, folds, and functions.  The six readings correspond to the six frames in which the gene can be read: three on each of the two strands, because codons are three bases long and reading can start at the first, second, or third base.  The reading frame that corresponds to a protein product is called its open reading frame (ORF).  It has been assumed that, after three billion years of evolution, only one of the six possible sequences is able to produce a viable protein.  Contrary to this assumption, we have discovered that 18% of all the genes in the gene bank have retained the potential to produce more than one protein by reading alternate frames of the gene.  These genes have multiple open reading frames (MORFs).  We have shown that genes having MORFs occur 200 times more often than would be expected at random.
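
The six reading frames can be illustrated with a short Python sketch.  It is only an illustration, not the analysis pipeline used in this work: the function names are hypothetical, and "open" is taken here to mean simply a long run of codons uninterrupted by a standard stop codon (TAA, TAG, or TGA), which is an assumption about how a MORF might be screened for.

```python
# Illustrative sketch only; names and the "openness" criterion are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frames(seq):
    """Split a DNA sequence into codons for all six reading frames.

    Frames +1..+3 start at bases 1..3 of the given strand;
    frames -1..-3 do the same on the reverse-complement strand.
    """
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames


def longest_open_run(codons):
    """Length, in codons, of the longest stretch containing no stop codon."""
    best = run = 0
    for codon in codons:
        if codon in STOP_CODONS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best


def open_frames(seq, min_codons):
    """Labels of the frames with a stop-free run of at least min_codons."""
    return [
        label
        for label, codons in six_frames(seq).items()
        if longest_open_run(codons) >= min_codons
    ]
```

Under this simplified criterion, a sequence for which open_frames returns more than one label would be a candidate for having multiple open reading frames.  A sequence built only from G and C can never contain a stop codon, so every one of its six frames is open.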

Codon Bias.  We have also shown that over 90% of the genes that have MORFs have a severe bias in their use of the genetic code.  The use of the 64 codons that define the 20 amino acids in human proteins is random.  However, we have found that only half of the 64 codons are used in the genes that have MORFs.  No codon bias of comparable severity or frequency has previously been detected.  There is accumulating evidence that the codons containing two or three of the bases G and C were defined first.  We have found that MORFs and a GC bias are most pronounced in the genes of a few families of proteins that are present in all species of bacteria and eukaryotes (yeast, plants, insects, and animals) and that are vital to basic life processes common to all living things.  These protein families include ribosomal proteins, ATP-binding proteins, SCORs, and heat shock proteins.
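
As a matter of simple counting, exactly 32 of the 64 codons contain two or three G or C bases.  The Python sketch below enumerates them and measures how much of a coding sequence is written with that GC-rich half.  Whether this is the same half of the code observed in the MORF genes is an assumption made here for illustration, and the function names are hypothetical.

```python
# Illustrative sketch only; the GC-rich subset is an assumed stand-in
# for the codon subset described in the text.
from collections import Counter
from itertools import product

ALL_CODONS = ["".join(bases) for bases in product("ACGT", repeat=3)]


def gc_count(codon):
    """Number of G or C bases in a codon (0 to 3)."""
    return sum(base in "GC" for base in codon)


# Codons containing two or three G/C bases.
GC_RICH_CODONS = {codon for codon in ALL_CODONS if gc_count(codon) >= 2}


def codon_usage(coding_seq):
    """Count each codon in a sequence, read in frame from its first base."""
    seq = coding_seq.upper()
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def gc_rich_fraction(coding_seq):
    """Fraction of the codons in a sequence that belong to the GC-rich half."""
    usage = codon_usage(coding_seq)
    total = sum(usage.values())
    if total == 0:
        return 0.0
    rich = sum(n for codon, n in usage.items() if codon in GC_RICH_CODONS)
    return rich / total
```

If codons were drawn uniformly from all 64, gc_rich_fraction would be close to 0.5; a value near 1.0 would indicate the kind of restricted codon use described above.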

Amino Acid Bias.  We discovered that proteins that have MORFs and GC codon bias also have a pronounced amino acid bias.  This suggests that the most ancient proteins were not only encoded by a subset of the genetic code, but they were also composed of a subset of the 20 amino acids.  We conclude that tryptophan and cysteine were the last of the 20 amino acids to appear in proteins.

A Primordial Two-Letter Code.  Our most recent analysis of the molecular details of the phenomenon of gene duplication has led to the identification of an ancient family of highly symmetric barrel-shaped molecules that were originally encoded by just 20 of the 64 codons.  These ancient genes support the possibility of a two-letter genetic code that preceded the three-letter code two billion years ago.  This work could revolutionize how we think about the evolution of protein sequences and folding, and it could help us to design proteins having specific functions.