Mathematical modelling of DNA
General information
Instructor:
Schedule:
Lectures: Mondays, 16:15 to 18:00, room MAA330
Exercises: Fridays, 15:15 to 17:00, room MAA330
Assistant:
Course
Requirements
First- and second-year courses in mathematics or physics (or the teacher's permission).
Helpful, although not required:
Differential Geometry of Framed Curves (MATH-423).
Contents
This course is designed to be an introduction, within the particular context of DNA, to the interplay between analysis, computation and experiment that makes up the process called mathematical modelling. In addition to students whose primary interest is in DNA, the syllabus is intended for students who want an introduction to the modelling process in general, and the course will describe a number of widely encountered mathematical and computational techniques.
The course will be a detailed introduction to the cgDNA sequence-dependent coarse-grain model of DNA. It covers both how to use the model, together with an associated Monte Carlo code, to predict various biologically pertinent sequence-dependent expectations, and the extensive underlying applied mathematics needed to estimate cgDNA parameter sets from a library of Molecular Dynamics simulations. The cgDNA model is a research tool that has its own web page. The course will work through the details of publications described on that page, specifically [1], [2] and [3] below.
The course has five chapters.
0) Introduction to DNA and a brief overview of its coarse-grain models.
1) The sequence-dependent, rigid-base cgDNA model.
2) Monte Carlo methods for sampling cgDNA model equilibrium distributions, with application to DNA persistence lengths.
3) Parameter estimation for the cgDNA model from Molecular Dynamics time series.
4) Equality-constrained nonlinear optimisation, with application to computing cgDNA equilibria.
Week-by-week schedule
Week 1 (20.2): Description of the basic structure of DNA, and of multiscale (or coarse-graining) approaches. The need for a tertiary-structure model of DNA, i.e. a sequence-dependent coarse-grain model. Overview of the cgDNA coarse-grain model, which predicts a Gaussian probability density function (PDF) for the configuration distribution of a DNA fragment of given sequence. (Three periods of lecture, one period of exercises.) Here is the link to the supplementary material for this first lecture.
Week 2 (27.2): Coarse graining groups of atoms (in our case, the atoms forming a base) to a rigid body or frame (R, r), with R ∈ SO(3) and r ∈ ℝ^3. The Lie group SE(3) of rigid-body displacements and its 4x4 matrix representation.
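The 4x4 matrix representation of SE(3) can be sketched in Python (NumPy) as follows. This is a minimal illustration, not course code: the helper names (`se3_matrix`, `se3_inverse`, `rot_z`) and the example frames are hypothetical. A frame (R, r) is embedded as the block matrix [[R, r], [0, 1]], so composing displacements becomes matrix multiplication, (R1, r1)(R2, r2) = (R1 R2, r1 + R1 r2), and the displacement of frame 2 relative to frame 1 is g1^{-1} g2 = (R1^T R2, R1^T (r2 - r1)).

```python
import numpy as np

def se3_matrix(R: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Embed a frame (R, r), R in SO(3), r in R^3, as a 4x4 homogeneous matrix."""
    g = np.eye(4)
    g[:3, :3] = R
    g[:3, 3] = r
    return g

def se3_inverse(g: np.ndarray) -> np.ndarray:
    """Inverse displacement: (R, r)^{-1} = (R^T, -R^T r)."""
    R, r = g[:3, :3], g[:3, 3]
    return se3_matrix(R.T, -R.T @ r)

def rot_z(angle: float) -> np.ndarray:
    """Rotation by `angle` radians about the z-axis (hypothetical example helper)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Two hypothetical frames.
g1 = se3_matrix(rot_z(0.3), np.array([1.0, 0.0, 0.0]))
g2 = se3_matrix(rot_z(0.6), np.array([0.0, 2.0, 0.5]))

g12 = g1 @ g2                 # composition: (R1 R2, r1 + R1 r2)
rel = se3_inverse(g1) @ g2    # frame 2 seen from frame 1: (R1^T R2, R1^T (r2 - r1))
```

The relative displacement `rel` is the object from which the relative (internal) coordinates of the following weeks are built.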
Week 3 (6.3): Relative coordinates of a double chain of rigid bodies. Definition of the Watson (or reading) and Crick strands, and the re-embedding of frames on the Crick strand. Here you can find the supplementary material of [2] (see the Bibliography at the bottom of the page). This week we covered pages 1-2, up to Figure S3.
Week 4 (13.3): Completion of the cgDNA internal coordinates. The Watson (reading) strand, and the definition of base, base-pair and junction frames. cgDNA configuration coordinates: translations expressed in midframes (the base-pair frame between two base frames for intra coordinates, the junction frame between two base-pair frames for inter coordinates), and Cayley vectors of the relative rotations for both intra and inter coordinates (with matrix multiplication on the right). Sketch of how frames transform under a Crick-Watson change of reading strand, and of the associated transformation rules for cgDNA coordinates (treated in detail in an exercise session later in the semester). Start of the construction of the cgDNA Gaussian PDF approximation to the equilibrium distribution of a DNA fragment in solution, given its sequence and a cgDNA parameter set. (Much of this material is covered on pages 2-5 of the PDF linked in the Week 3 summary.)
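A minimal sketch of these coordinates for a single pair of frames, under stated assumptions: the Cayley vector of a rotation by angle φ about the unit axis n is taken to be c = 2 tan(φ/2) n, the midframe is obtained by applying half of the relative rotation to the first frame, and no rescaling between rotations and translations is applied (the rescaling actually used in cgDNA is discussed in Week 13 and is omitted here). All function names are hypothetical and this is not the cgDNA code.

```python
import numpy as np

def skew(v):
    """Matrix [v] such that [v] @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def cayley_to_rotation(c):
    """R = I + 4/(4 + |c|^2) ([c] + [c]^2 / 2), assuming c = 2 tan(phi/2) n."""
    C = skew(c)
    return np.eye(3) + 4.0 / (4.0 + c @ c) * (C + C @ C / 2.0)

def rotation_to_cayley(R):
    """Inverse map c = 2 vec(R - R^T) / (1 + tr R), valid for rotation angles below pi."""
    W = R - R.T
    return 2.0 / (1.0 + np.trace(R)) * np.array([W[2, 1], W[0, 2], W[1, 0]])

def half_rotation(L):
    """Rotation about the same axis as L by half the angle (the 'square root' of L)."""
    c = rotation_to_cayley(L)
    norm = np.linalg.norm(c)
    if norm == 0.0:
        return np.eye(3)
    phi = 2.0 * np.arctan(norm / 2.0)
    return cayley_to_rotation(2.0 * np.tan(phi / 4.0) * c / norm)

def relative_coordinates(Ra, ra, Rb, rb):
    """Six relative coordinates of frame b with respect to frame a (hypothetical helper)."""
    L = Ra.T @ Rb                     # relative rotation, multiplication on the right
    R_mid = Ra @ half_rotation(L)     # midframe between the two frames
    rotation = rotation_to_cayley(L)
    translation = R_mid.T @ (rb - ra) # translation expressed in the midframe
    return np.concatenate([rotation, translation])

c = np.array([0.1, -0.2, 0.3])
R = cayley_to_rotation(c)
w = relative_coordinates(np.eye(3), np.zeros(3), R, np.array([0.0, 0.0, 3.4]))
```

Expressing the translation in the midframe rather than in either end frame treats the two frames symmetrically, which is what makes the change-of-strand rules simple.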
Week 5 (20.3): Completion of the definition of, and the assumptions underlying, the cgDNA model free energy and Gaussian PDF. Nearest-neighbour interactions and bandedness. Localised sequence dependence of stiffness and sigmas, and non-local sequence dependence of mu. Structure of a cgDNA parameter set. End of Chapter 1.
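The combination of a banded stiffness built from local blocks with a non-local ground state can be illustrated with a small sketch. Assumptions, none taken from a real cgDNA parameter set: the free energy is U(w) = (1/2)(w - mu)^T K (w - mu), with Gaussian PDF proportional to exp(-U); coordinates are ordered (y_1, z_1, ..., y_N) with 6 intra coordinates y_i and 6 inter coordinates z_i, giving 12N - 6 in total; K and sigma are sums of hypothetical random 18x18 nearest-neighbour blocks acting on (y_i, z_i, y_{i+1}); and the ground state is recovered as mu = K^{-1} sigma. Changing one local block changes mu along the whole fragment, because K^{-1} is dense even though K is banded.

```python
import numpy as np

def assemble_stiffness(blocks, sigma_blocks, n_bp):
    """Sum overlapping 18x18 local blocks into a banded K and a vector sigma.

    Block i acts on (y_i, z_i, y_{i+1}); consecutive blocks overlap on one 6x6 intra block.
    """
    n = 12 * n_bp - 6
    K = np.zeros((n, n))
    sigma = np.zeros(n)
    for i, (B, s) in enumerate(zip(blocks, sigma_blocks)):
        k = 12 * i
        K[k:k + 18, k:k + 18] += B
        sigma[k:k + 18] += s
    return K, sigma

def random_spd(d, rng):
    """Hypothetical symmetric positive definite local block."""
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

rng = np.random.default_rng(0)
n_bp = 4
blocks = [random_spd(18, rng) for _ in range(n_bp - 1)]
sigma_blocks = [rng.standard_normal(18) for _ in range(n_bp - 1)]

# Positive definite local blocks that together cover every coordinate give a
# positive definite K (a sufficient condition, not a necessary one).
K, sigma = assemble_stiffness(blocks, sigma_blocks, n_bp)
mu = np.linalg.solve(K, sigma)

# Perturb only the first local block: mu changes at the far end as well.
blocks_mod = list(blocks)
blocks_mod[0] = blocks[0] + 5.0 * np.eye(18)
K_mod, _ = assemble_stiffness(blocks_mod, sigma_blocks, n_bp)
mu_mod = np.linalg.solve(K_mod, sigma)
```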

Weeks 6-8 (27.3, 3.4, 10.4): What can be done with the cgDNA model? Discussion of (i) expectations, chain correlations and persistence lengths, and (ii) probabilities and looping experiments. Numerical approximation of both via the cgDNAmc Monte Carlo code. Definition and analytical computation of persistence lengths in a simplified model (a version of the Helical Worm-Like Chain, or HWLC, model), and relation to numerics for the cgDNA model. Lecture notes (polycopiés) are available for the material of weeks 6 and 7 and for the Monte Carlo part of week 8. The shape-factorised persistence length was introduced and is treated in Exercise Session 7.
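The following is a deliberately simplified Monte Carlo sketch of a persistence-length computation, not the cgDNAmc code. Assumptions (all hypothetical): only rotations are modelled; the junction rotations are independent, with isotropic Gaussian Cayley vectors c ~ N(0, s^2 I) and zero ground-state rotation (so there is no helical or intrinsic-bend shape, unlike the HWLC model); and the tangent is the third column of each frame. In this isotropic case E[Lambda] = a I with a = E[Lambda_33], hence <t_0 . t_n> = a^n and the persistence length is -1/ln(a) in units of the step length, which the sampled correlations can be checked against.

```python
import numpy as np

def cayley_to_rotation_batch(c):
    """Vectorised Cayley map (c = 2 tan(phi/2) n): c has shape (..., 3), result (..., 3, 3)."""
    x, y, z = c[..., 0], c[..., 1], c[..., 2]
    zero = np.zeros_like(x)
    C = np.stack([np.stack([zero, -z, y], axis=-1),
                  np.stack([z, zero, -x], axis=-1),
                  np.stack([-y, x, zero], axis=-1)], axis=-2)
    factor = 4.0 / (4.0 + np.sum(c * c, axis=-1))
    return np.eye(3) + factor[..., None, None] * (C + C @ C / 2.0)

def tangent_correlations(n_samples, n_steps, s, rng):
    """Monte Carlo estimate of <t_0 . t_n> for n = 0, ..., n_steps."""
    c = s * rng.standard_normal((n_samples, n_steps, 3))   # i.i.d. isotropic Cayley vectors
    lam = cayley_to_rotation_batch(c)                       # junction rotations
    R = np.broadcast_to(np.eye(3), (n_samples, 3, 3)).copy()
    corr = [1.0]
    for k in range(n_steps):
        R = R @ lam[:, k]                 # R_{k+1} = R_k Lambda_k (multiplication on the right)
        corr.append(R[:, 2, 2].mean())    # t_0 = e_3, so t_0 . t_{k+1} = (R_{k+1})_33
    return np.array(corr), lam

rng = np.random.default_rng(1)
corr, lam = tangent_correlations(n_samples=20000, n_steps=20, s=0.1, rng=rng)
a = lam[..., 2, 2].mean()               # estimate of E[Lambda_33]
lp_steps = -1.0 / np.log(a)             # persistence length in units of steps
predicted = a ** np.arange(21)          # exact decay law for this isotropic model
```

Because the sampled rotations are independent, this is direct sampling rather than Markov chain Monte Carlo; with a helical ground state the simple a^n law no longer holds, which is where the HWLC analysis becomes necessary.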
Week 9 (17.4): Easter break.
Week 10 (24.4): Start of Chapter 3, estimation of a cgDNA parameter set. Recap: definition of, and assumptions underlying, the cgDNA model free energy and Gaussian PDF. Nearest-neighbour interactions and bandedness. Localised sequence dependence of stiffness and sigmas. Sufficient conditions for (a) the stiffness matrix to be positive definite for all sequences, and (b) parameters for palindromic sequences to satisfy Crick-Watson symmetry conditions. Counting the number of independent scalar parameters in a cgDNA parameter set.
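The combinatorial core of the Crick-Watson symmetry in (b) can be illustrated in Python: a sequence read on the Crick strand is the reverse complement of the Watson sequence, so under the symmetry assumption the local parameters for a dinucleotide XY determine those for its reverse complement. The sketch below (helper names are hypothetical) groups k-mers into such pairs; for dimers it gives 10 classes, four of which (AT, CG, GC, TA) are self-complementary. How these classes enter the cgDNA parameter count is covered in the lectures and is not reproduced here.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq: str) -> str:
    """Sequence of the complementary (Crick) strand, read in its own 5' to 3' direction."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def is_palindromic(seq: str) -> bool:
    """True if both strands read the same, so Crick-Watson symmetry constrains its parameters."""
    return seq == reverse_complement(seq)

def symmetry_classes(k: int) -> list:
    """One representative per class {s, reverse_complement(s)} of k-mers."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({min(s, reverse_complement(s)) for s in kmers})

dimers = symmetry_classes(2)
self_complementary = [d for d in dimers if is_palindromic(d)]
```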
Week 11 (1.5): Maximum likelihood and start of maximum entropy parameter estimation, giving rise to Gaussians with both unbanded and banded stiffness matrices. Jensen's inequality. Entropy and Kullback-Leibler relative entropy.
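A sketch of the unbanded maximum likelihood fit and of the Kullback-Leibler divergence between two Gaussians, in Python (NumPy). It assumes independent samples (real MD time series are correlated) and uses the standard closed form KL(N0 || N1) = (1/2)[tr(Sigma1^{-1} Sigma0) + (mu1 - mu0)^T Sigma1^{-1} (mu1 - mu0) - n + ln det Sigma1 - ln det Sigma0]. The ML estimates are the sample mean and the 1/N sample covariance, whose inverse is in general a full (unbanded) stiffness matrix. By Jensen's inequality the divergence is non-negative and vanishes only for identical distributions. The "true" parameters below are hypothetical test data.

```python
import numpy as np

def fit_gaussian_ml(samples):
    """Maximum likelihood Gaussian fit to samples of shape (n_samples, n)."""
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False, bias=True)   # ML estimate normalises by n_samples
    return mu, cov

def kl_gaussian(mu0, cov0, mu1, cov1):
    """Kullback-Leibler divergence KL(N(mu0, cov0) || N(mu1, cov1))."""
    n = mu0.size
    K1 = np.linalg.inv(cov1)                # stiffness matrix of the second Gaussian
    d = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(K1 @ cov0) + d @ K1 @ d - n + logdet1 - logdet0)

rng = np.random.default_rng(2)
true_mu = np.array([1.0, -0.5, 0.2])
A = rng.standard_normal((3, 3))
true_cov = A @ A.T + np.eye(3)
samples = rng.multivariate_normal(true_mu, true_cov, size=50000)

mu_hat, cov_hat = fit_gaussian_ml(samples)
stiffness_hat = np.linalg.inv(cov_hat)      # full, unbanded stiffness estimate
```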
Week 12 (8.5): Computation of the first-order necessary conditions for maximum entropy fits, and use of the constraints to determine the associated Lagrange multipliers.
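For a banded fit, the maximum entropy Gaussian that reproduces the covariance entries inside overlapping diagonal windows has a known closed form when the windows form a chain (covariance selection for chordal sparsity patterns): sum the inverses of the window covariance blocks, then subtract the inverses of the blocks where consecutive windows overlap. The first-order conditions are visible in the result: the stiffness vanishes outside the windows, and its inverse matches the data covariance inside them. The sketch uses windows of size 18 with step 12 to mimic a cgDNA-like nearest-neighbour layout; these sizes and the random covariance are illustrative assumptions, and the course's derivation via Lagrange multipliers may organise the computation differently.

```python
import numpy as np

def max_entropy_banded(C, window, step):
    """Max entropy stiffness matching C on the diagonal windows [k, k + window)."""
    n = C.shape[0]
    starts = list(range(0, n - window + 1, step))
    assert starts[-1] + window == n, "windows must tile the coordinates exactly"
    K = np.zeros_like(C)
    for k in starts:                      # add inverses of window blocks
        W = slice(k, k + window)
        K[W, W] += np.linalg.inv(C[W, W])
    for k in starts[1:]:                  # subtract inverses of overlap blocks
        S = slice(k, k + window - step)
        K[S, S] -= np.linalg.inv(C[S, S])
    return K

rng = np.random.default_rng(4)
n = 42
A = rng.standard_normal((n, n))
C = A @ A.T / n + np.eye(n)               # hypothetical "data" covariance
K = max_entropy_banded(C, window=18, step=12)
Kinv = np.linalg.inv(K)
```

Since C itself also matches all window entries, the maximum entropy property implies det(K^{-1}) >= det(C), which gives a simple numerical check.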
Week 13 (15.5): How to design a good sequence library. How to better estimate the oligomer-based ground state mu(S) and covariances from MD time series for palindromic sequences. Units of the cgDNA internal coordinates, and the internal rescaling between rotation and translation coordinates chosen in the cgDNA model.
Week 14 (22.5): Extraction of a cgDNAparamset using a sum of Kullback-Leibler divergences as the objective fitting functional. Some details about cgDNAparamset2 and the ABC molecular dynamics simulation project. A palindromic sequence library. The simpler case study of fitting with an L^2-norm objective functional, and the associated least-squares approach to computing a parameter set: an illustration of how to deal with triple-overlap blocks in cgDNA stiffness matrices, and of the associated null space in the parameter set. For the least-squares system we refer to Sections 7.4 and 7.5 of [1].
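The null-space issue can be seen in a toy least-squares problem that is not the cgDNA system. Suppose each trimer XYZ contributes one observed scalar at its central position, equal to the sum of three hypothetical local parameters: a right part R_XY of dimer XY, a monomer term M_Y, and a left part L_YZ of dimer YZ, so that three contributions overlap at the same position. The 64 trimer equations in 36 unknowns have rank only 28: for each base Y, one can shift R_.Y, M_Y and L_Y. by constants summing to zero without changing any observation. NumPy's `lstsq` then returns the minimum-norm solution; the treatment used for cgDNA is the one in [1].

```python
import numpy as np
from itertools import product

BASES = "ACGT"
DIMERS = ["".join(p) for p in product(BASES, repeat=2)]

def design_matrix():
    """Rows: trimers XYZ. Columns: R_XY (16), M_Y (4), L_YZ (16); all hypothetical."""
    trimers = ["".join(p) for p in product(BASES, repeat=3)]
    A = np.zeros((len(trimers), 36))
    for row, (x, y, z) in enumerate(trimers):
        A[row, DIMERS.index(x + y)] = 1.0        # right part of dimer XY
        A[row, 16 + BASES.index(y)] = 1.0         # monomer term for Y
        A[row, 20 + DIMERS.index(y + z)] = 1.0    # left part of dimer YZ
    return trimers, A

rng = np.random.default_rng(3)
trimers, A = design_matrix()
p_true = rng.standard_normal(36)
obs = A @ p_true                                  # consistent synthetic observations

# Minimum-norm least-squares solution of the rank-deficient system.
p_ls, residuals, rank, singular_values = np.linalg.lstsq(A, obs, rcond=None)
```

The fitted parameters reproduce the observations exactly, yet generally differ from `p_true` by a null-space vector, so any parameter set extracted this way is only defined up to such shifts unless extra conditions fix them.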
Week 15 (29.5): Monday exercise session, and Friday exercise session with a demo of cgDNAeq. Here you can find some complementary notes that may be useful for Session 13.
Exercises
Exercise sheets | Solutions


Bibliography
The following references for the cgDNA model are available on the cgDNA web page.
[1] A DNA Coarse-Grain Rigid Base Model and Parameter Estimation from Molecular Dynamics Simulations, D. Petkevičiūtė, Thesis #5520, EPFL (2012).
[2] cgDNA: a software package for the prediction of sequence-dependent coarse-grain free energies of B-form DNA, D. Petkevičiūtė, M. Pasi, O. Gonzalez and J. H. Maddocks, Nucleic Acids Research 42, no. 20 (2014), p. e153.
[3] A sequence-dependent rigid-base model of DNA, O. Gonzalez, D. Petkevičiūtė and J. H. Maddocks, Journal of Chemical Physics 138, no. 5 (2013), p. 055122 (1-28).
General books on DNA.

[4] Understanding DNA: The Molecule and How It Works, C. R. Calladine, H. R. Drew, B. F. Luisi and A. A. Travers, third edition, 2004, Academic Press, ISBN 9780121550893.
Summary: Understanding DNA explains, step by step, why DNA forms specific structures, what form these structures take, and how they fundamentally affect the biological processes of transcription and replication.
[5] Unraveling DNA: The Most Important Molecule of Life, M. D. Frank-Kamenetskii, revised and updated edition, 1997, Perseus Publishing, ISBN 9780201155846.
Summary: A curious blend of history and biographical detail, covering the development of molecular biology from the influence of physicists earlier in the twentieth century, through the central dogma of molecular biology, to a discussion of the social issues raised by genetic engineering.
[6] DNA Topology, A. D. Bates and A. Maxwell, 2005, Oxford University Press, ISBN 9780198506553.
Summary: A clear, concise explanation of the relevance of supercoiling and catenation to the biological activity of the DNA molecule.
[7] DNA Structure and Function, R. R. Sinden, 1994, Academic Press, ISBN 9780126457506.
Summary: A timely and comprehensive resource that provides a simple yet thorough introduction to nearly all aspects of DNA structure. It also explains current ideas on the biological significance of classic and alternative DNA conformations.