Introduction

The chemical instability of the genome toward attack by a variety of reactive species with which it comes into routine contact has given rise to dedicated surveillance and maintenance systems that seek out DNA damage and repair it. Most forms of damage to the heterocyclic nucleobases of DNA are repaired by mechanistically related pathways collectively referred to as the base-excision repair (BER) mechanism. Both the recognition of damaged sites and initiation of repair by BER are performed by DNA glycosylases, lesion-specific enzymes that recognize specific nucleobase damages in the genome and catalyze their excision through cleavage of the glycosidic bond1,2,3,4.

DNA glycosylases carry out the formidable task of locating, on average, one aberrant base embedded among ~10^6–10^7 normal bases5. Many of these lesions differ from their undamaged counterparts by at most a few atoms and cause no frank distortion and little energetic destabilization of the DNA duplex6,7,8,9. Among them, 8-oxoguanine (oxoG, OG) presents the greatest challenge to the DNA glycosylases responsible for its repair, due to its unique combination of structural innocuousness, extraordinary mutagenicity, and chronic low-level production. Arising through the attack of the reactive by-products of aerobic respiration on guanine (G), oxoG differs from G by only two “atoms”: =O versus –H at C8, and –H versus lone pair at N7 (Fig. 1a). Moreover, oxoG is highly mutagenic and mis-pairs with adenine at a greater than 90% frequency during processive DNA replication10. For these reasons, oxoG is believed to be responsible for the majority of G:C to T:A transversion mutations, which are the second most common type of spontaneous genetic change in humans11. The G:C to T:A transversion mutation has also been found in codon 12 of the highly oncogenic protein K-ras, which resulted in the formation of lung tumors in mice deficient in the oxidative DNA repair genes, myh and ogg41, the entire process is accelerated by DNA bending and the extent of protein–DNA contacts on the minor groove face of DNA. For example, hOGG1 uses a non-specific breakage of the target base-pair, assisted by N149. In addition, K249 contacts the 3′-side phosphate of oxoG/G at the beginning of the process (Figs. 5b, 6), thereby establishing a pivot for base extrusion. H270 is the first residue that specifically interacts with extruded oxoG, followed by K249 with C8=O of oxoG. This suggests that H270 and K249 function as “cherry-picking” residues in hOGG1, playing a role similar to that of R112 in MutM, albeit via a significantly different mechanism22.
By contrast, in the case of G, its extrusion cannot be stabilized by the two residues and competes with the translocation of the enzyme along the DNA strand.

Figure 5 also suggests that hOGG1 extrudes oxoG through the major groove with a barrier of 8.1 kcal/mol, in accordance with the previously determined hOGG1/DNA complex structure32. The free energy profile for the minor-groove oxoG extrusion is also presented in Fig. 5a, and the free energy profiles along the entire base extrusion process are shown in Supplementary Fig. 9. The barrier for the minor-groove extrusion is 17.9 kcal/mol. This result can be compared with the differing results reported for MutM for major-groove30,42 and minor-groove22,29 base extrusion.
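To put the two barriers in perspective, the short sketch below converts their difference into an approximate ratio of extrusion rates. It assumes a simple Arrhenius/transition-state picture with identical prefactors for both pathways and a temperature of 300 K (the simulation temperature); these are simplifying assumptions made here for illustration, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def rate_ratio(barrier_low, barrier_high, temperature=300.0):
    """Approximate k_low / k_high for two barriers (kcal/mol).

    Assumes both pathways share the same pre-exponential factor,
    which is an illustrative simplification.
    """
    delta = barrier_high - barrier_low
    return math.exp(delta / (R_KCAL * temperature))

# Barriers quoted in the text: 8.1 (major groove) and 17.9 (minor groove) kcal/mol
ratio = rate_ratio(8.1, 17.9)
print(f"major/minor groove rate ratio ~ {ratio:.1e}")
```

Under these assumptions, the 9.8 kcal/mol difference would favor the major-groove route by roughly seven orders of magnitude.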

In summary, we present X-ray crystallographic structures of the human DNA glycosylase hOGG1 interrogating DNA lesions in their intrahelical position, achieved by covalent trapping of an ordinarily transient state in DNA recognition. They reveal how hOGG1 discriminates oxoG from G while both are embedded in the DNA duplex. Specifically, the enzyme utilizes unique protein/DNA contacts to induce DNA bending at the target site. This bending brings the repulsive functional group of oxoG into the immediate vicinity of the DNA backbone, resulting in an oxoG-specific distortion of the DNA backbone in its intrahelical orientation. In silico molecular dynamics simulations and free energy calculations corroborate the structural results and help to elucidate the role of the human enzyme in discriminating oxoG from G prior to complete extrusion from the DNA stack. The results presented here broaden our understanding of one of the earliest events that occur as this extraordinary enzyme patrols the genome in its surveillance of DNA damage.

Methods

Cross-linked complex formation and crystallization

A fragment of hOGG1 (amino acids 12–327, UniProtKB O15527) bearing the Y207C, Y207C/C253W, or Y207C/K249Q mutation was expressed in Escherichia coli BL21(DE3)pLysS cells. The cells were lysed by sonication in a solution of 50 mM sodium phosphate pH 8.0, 10 mM imidazole, 500 mM NaCl, 5 mM BME, and 10% glycerol. The protein was immobilized on Ni-NTA resin (Qiagen) and eluted with 50 mM sodium phosphate pH 8.0, 250 mM imidazole, 500 mM NaCl, 5 mM BME, and 10% glycerol. Protein was concentrated, centrifuged, diluted with 10 mM Tris pH 7.4 to 50 mM NaCl, loaded onto a HiTrap SP column (GE Healthcare), and eluted with increasing NaCl concentration. The N-terminal histidine tag was cleaved by enterokinase digestion (New England Biolabs) using a 1:1 solution of 1 M CaCl2 for 36 h at 4 °C. Protein was further purified by Superdex-200 gel filtration chromatography (GE Healthcare) equilibrated with 10 mM Tris pH 7.4, 100 mM NaCl, 1 mM EDTA, and 10% glycerol43. Each mutant was prepared using the QuikChange mutagenesis kit (Stratagene) and confirmed by sequencing (see Supplementary Table 2 for primer sequences used in mutagenesis).

Phosphoramidite derivatives of 8-oxoG and 2-F-dI were purchased from Glen Research. DNA oligomers 5′-AGCGTCCAXG*TCTACC-3′, where X denotes 8-oxoG or G and G* refers to the site of modification with the thiol-bearing tether, were synthesized using an ABI Expedite 8909 DNA synthesizer and functionalized with X8 (NH2CH2CH2OCH2CH2OCH2CH2S–)2 using post-synthetic modification44. DNA oligonucleotides were deprotected with ammonium hydroxide and purified by 20% denaturing urea polyacrylamide gel electrophoresis (PAGE). For DNA containing the tether and oxoG on the same strand, 50 μM BME was added to prevent oxidative degradation of 8-oxoG. DNA was purified by 20% urea-PAGE, dissolved in 10 mM Tris, pH 8.0, 1 mM EDTA, and annealed with the complementary strand 5′-TGGTAGACCTGGACGC-3′.

Cross-linked complexes were formed by mixing duplex DNA with a 2-fold molar excess of protein and incubating at 4 °C for several days. Unreacted DNA and protein were removed by Mono Q chromatography (GE Healthcare). The purified complexes were buffer-exchanged to 10 mM Tris pH 7.4 and 100 mM NaCl, concentrated, and crystallized by hanging-drop vapor diffusion at 20 °C. For each complex, crystals were allowed to grow for several days, transferred to a cryoprotectant solution containing mother liquor supplemented with 25% glycerol, and frozen in liquid nitrogen for data collection. For LRC (Y207C/K249Q), diffraction-quality crystals appeared at a complex concentration of 16.4 mg/mL (protein concentration measured using the Bradford assay) within a few days in well solution containing 100 mM sodium cacodylate, pH 6.5, 200 mM MgCl2, and 18% polyethylene glycol 8000. For IC (Y207C), diffraction-quality crystals appeared at a complex concentration of 12 mg/mL within a few days in well solution containing 200 mM NH4NO3 and 20% polyethylene glycol 3350. For EC (Y207C/C253W), diffraction-quality crystals appeared at a complex concentration of ~17 mg/mL within a few days in well solution containing 100 mM sodium cacodylate, pH 6.1, 200 mM MgOAc, and 17% polyethylene glycol 8000.

Structure determination

Diffraction datasets were collected at −170 °C at the 24-ID-C and 24-ID-E beamlines (NE-CAT) of the Advanced Photon Source and processed using the HKL program suite45. Initial molecular replacement solutions were obtained with PHASER in the CCP4 suite46,47, using the coordinates of a previously determined hOGG1 structure (PDB ID: 1EBM16), with the DNA omitted, as the search model. Each hOGG1–DNA model was built through iterative cycles of manual model building in COOT48 and structure refinement using REFMAC549,50 and PHENIX51. Ramachandran plots, calculated with MolProbity (http://molprobity.biochem.duke.edu), confirmed that no residues fall in disallowed regions for any of the structures. Full details on the data collection and structure refinement are provided in Supplementary Table 1. PyMOL (The PyMOL Molecular Graphics System, Version 2.0, Schrödinger, LLC) was used to prepare all structure model figures presented in the paper.

System preparation for molecular dynamics (MD) simulations

Two systems were prepared based on the intrahelical IC (for G) and EC (for oxoG) structures, respectively. In the preparation of the IC system, we further refined the IC X-ray structure to build more base pairs. Although this resulted in a slightly lower quality of the DNA structure, the core DNA base pairs were essentially unchanged. We used this refined structure to build the IC system. In each system, protonation states of all ionizable residues were determined based on their hydrogen bonding interactions deduced from the X-ray structures as well as on their pKa values in water. All crystal waters were included. For DNA, the central 14 base pairs of the sequence presented in Fig. 1d were used in the simulations, in which any nucleotide coordinates missing from the crystal structures were modeled as standard B-form DNA. Then, the HBUILD facility of the CHARMM program52,53 was used to assign atomic coordinates of hydrogen atoms. The resulting systems were solvated with a rhombic dodecahedron (RHDO) box of 11,712 TIP3P water molecules54, and any water molecules42 within 2.5 Å of any heavy atom of protein, DNA, or crystal water were removed, leaving, for example, 9554 TIP3P waters for the IC system. Finally, each system was neutralized by adding 50 Na+ and 26 Cl− ions at random positions, making its ionic concentration equal to 150 mM. In addition, two further systems were prepared based on the LRC X-ray structure (PDB-ID: 1YQR)23 to be used as the reference state of the targeted molecular dynamics (TMD) simulations55 (see the Supplementary Methods for details).
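The ion counts quoted above can be rationalized with a back-of-the-envelope estimate. The sketch below is not the authors' procedure: it assumes bulk water at roughly 55.5 mol/L and infers a net solute charge of −24 e from the difference between the Na+ and Cl counts given in the text. All function names are illustrative.

```python
WATER_MOLARITY = 55.5  # mol/L, approximate value for bulk water (assumption)

def salt_pairs(n_water, conc_molar):
    """Estimate the number of salt ion pairs giving `conc_molar` in `n_water` waters."""
    return round(conc_molar * n_water / WATER_MOLARITY)

def ion_counts(n_water, conc_molar, solute_charge):
    """Return (n_cation, n_anion) for monovalent ions.

    Salt pairs set the concentration; extra counter-ions cancel the
    net solute charge (in units of e).
    """
    pairs = salt_pairs(n_water, conc_molar)
    n_cation = pairs + max(-solute_charge, 0)
    n_anion = pairs + max(solute_charge, 0)
    return n_cation, n_anion

# 9554 waters (IC system) at 150 mM, with an inferred solute charge of -24 e
print(ion_counts(9554, 0.150, -24))  # (50, 26)
```

With these inputs the estimate reproduces the 50 cations and 26 anions reported for the neutralized system.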

MD simulations

Each system was first energy minimized for 5000 steps and equilibrated for 500 ps at 300 K. The energy minimization and equilibration procedures were very similar to those employed in our previous studies of MutM/DNA complexes22,29. Then, the production MD was carried out for 500 ns for the IC and EC systems, during which the atomic coordinates of the entire system were saved every 2 ps for later analysis. The all-atom CHARMM2256 and 27 force fields57 were used to represent the protein, DNA, and ions, the CMAP correction58 for protein backbone dihedrals, and the TIP3P water model54 for water molecules, respectively. For oxoG, we used the force field parameters developed in our previous study22. The RHDO periodic boundary conditions were imposed with a lattice length parameter of 78.5 Å. Electrostatic interactions were evaluated using the smooth particle mesh Ewald (PME) sum method59, and van der Waals interactions were evaluated using a switching function between 9.0 Å and 11.0 Å. All MD simulations were performed with a 2 fs integration time step and SHAKE60 applied to all bonds involving hydrogen atoms. The Langevin thermostat was used to maintain the system temperature at 300 K. In all simulations, we also applied harmonic restraints to the terminal base pairs to prevent them from fraying.
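For readers unfamiliar with switched van der Waals interactions, the sketch below evaluates a switching factor over the 9.0–11.0 Å window. The text does not specify the functional form; the polynomial used here is the standard CHARMM energy-switching function, which is assumed for illustration, and the function name is hypothetical.

```python
def vdw_switch(r, r_on=9.0, r_off=11.0):
    """Multiplicative factor applied to the van der Waals energy at distance r (in Å).

    Equals 1 below r_on, 0 beyond r_off, and decays smoothly in between
    (CHARMM-style energy switch; the choice of form is an assumption).
    """
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    on2, off2, r2 = r_on**2, r_off**2, r**2
    return (off2 - r2) ** 2 * (off2 + 2.0 * r2 - 3.0 * on2) / (off2 - on2) ** 3

for r in (8.0, 9.0, 10.0, 11.0, 12.0):
    print(f"r = {r:4.1f} Å  switch = {vdw_switch(r):.4f}")
```

The factor leaves interactions inside 9.0 Å untouched and removes them entirely beyond 11.0 Å, avoiding the energy discontinuity of a hard cutoff.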

System preparations, trajectory analysis, the TMD simulations, and the string method simulations36 (see below) were carried out using the CHARMM program (version c37a1)52,53 and the 500 ns production MD simulations were performed using the NAMD program60.

String method (SM) simulations

The base extrusion pathways for G and oxoG were determined by applying the string method in collective variables (SMCV)36. In SMCV, a path connecting two end-state conformations (i.e., the intrahelical IC/EC and LRC conformations) is represented by N discretized images (called MD replicas), which are evenly distributed along the path. In the present work, we used N = 64 discretized images to represent the entire base extrusion pathway described by a total of 45 CVs defined in Supplementary Fig. 10a. Starting from the initial path generated by the TMD simulation (Supplementary Methods), each path was optimized for 20 ns in an iterative manner. A total of 1.28 μs path optimization MD was performed. Then, the Markovian milestoning simulations with Voronoi tessellations36,61 were performed for 10 ns for each MD replica (thus, 0.64 μs MD simulations collectively) to determine the free energy change along the optimized base extrusion paths. The details of the SMCV path optimization and Voronoi tessellation simulations are provided in the Supplementary Methods.
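One ingredient of any string-method calculation is keeping the images evenly spaced along the path. The sketch below shows that reparameterization step alone, using piecewise-linear interpolation in a toy two-dimensional CV space. It is a generic illustration, not the CHARMM implementation used in this work; the function name and the example path are hypothetical.

```python
import math

def reparametrize(images):
    """Redistribute images at equal arc-length intervals along a piecewise-linear path.

    `images` is a list of points in CV space (sequences of floats).
    The two end points are kept fixed.
    """
    n = len(images)
    # Cumulative arc length at each original image
    cum = [0.0]
    for a, b in zip(images, images[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    total = cum[-1]
    if n < 3 or total == 0.0:
        return [list(p) for p in images]

    new_images = [list(images[0])]
    seg = 0
    for k in range(1, n - 1):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target arc length
        while cum[seg + 1] < target:
            seg += 1
        t = (target - cum[seg]) / (cum[seg + 1] - cum[seg])
        a, b = images[seg], images[seg + 1]
        new_images.append([x + t * (y - x) for x, y in zip(a, b)])
    new_images.append(list(images[-1]))
    return new_images

# Toy path with unevenly spaced images
path = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
print(reparametrize(path))
```

In a full string-method cycle, a step of this kind would typically follow each update of the image positions, so that the images do not drift toward the free energy minima at the two ends of the path.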

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.