Mathematical modelling of DNA
General information
Lecturer:
Schedule:
Lectures: Monday 16h15 to 18h00, room MAA331
Exercises: Friday 15h15 to 17h00, room MAA112
Assistant:
Course
Requirements
First and second year courses in mathematics or physics (or the teacher's permission)
Helpful although not required
Differential Geometry of Framed Curves (MATH423).
Contents
This course is designed to be an introduction, within the particular context of DNA, to the interplay between analysis, computation and experiment that makes up the process called mathematical modelling. In addition to students whose primary interest is in DNA, the syllabus is intended for students wishing an introduction to the modelling process in general, and the course will describe a number of widely encountered mathematical and computational techniques.
The course will be a detailed introduction to the cgDNA sequence-dependent coarse-grain model of DNA, including both how to use it to predict various biologically pertinent sequence-dependent expectations with an associated Monte Carlo code, and all the extensive underlying applied mathematics necessary to estimate cgDNA parameter sets from a library of Molecular Dynamics simulations. The cgDNA model is a research tool that has its own web page. The course will work through the details of the publications described on that page, specifically [1], [2] and [3] below.
The course has five chapters.
0) Introduction to DNA and a brief overview of its coarse-grain models.
1) The sequence-dependent, rigid-base cgDNA model.
2) Monte Carlo methods for sampling cgDNA model equilibrium distributions and application to DNA persistence lengths.
3) Parameter estimation for the cgDNA model from Molecular Dynamics time series.
4) Equality-constrained nonlinear optimisation with application to computing cgDNA equilibria.
Note: the login ID for the lecture notes is "moddna"; the password will be announced in class.
Week-by-week correspondence
Week 1 (17.2.2020) 
Description of the basic structure of DNA, and multiscaling (or coarse-graining) approaches. The need for a tertiary structure model of DNA, i.e. a sequence-dependent coarse-grain model. Overview of the cgDNA coarse-grain model, which predicts a Gaussian PDF for the configuration distribution of a DNA fragment of given sequence. (Three periods of lecture, one period of exercises.) The supplementary material for this first lecture is available here.
Week 2 (24.2.2020)  Definition of the Watson (or reading) and Crick strands. Coarse graining of groups of atoms (in our case the atoms forming a base) to a rigid body or frame (R, r), and start of the description of the matrix groups SO(3) of proper orthogonal matrices (i.e. 3x3 matrices R such that R^{-1} = R^T and det R = +1) and (the homogeneous representation of) SE(3) of rigid body displacements. The notes of this lesson are here.
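As an illustration of these definitions, the following minimal sketch checks the two conditions defining SO(3) and builds the 4x4 homogeneous matrix of a rigid body displacement (R, r). The function names are hypothetical and are not taken from the course notes or from any cgDNA code.

```python
import numpy as np

def is_proper_rotation(R, tol=1e-10):
    """True if R is 3x3 with R^T R = I (so R^{-1} = R^T) and det R = +1."""
    R = np.asarray(R, dtype=float)
    return bool(R.shape == (3, 3)
                and np.allclose(R.T @ R, np.eye(3), atol=tol)
                and abs(np.linalg.det(R) - 1.0) < tol)

def homogeneous(R, r):
    """4x4 homogeneous matrix representing the rigid body displacement (R, r)."""
    g = np.eye(4)
    g[:3, :3] = R
    g[:3, 3] = r
    return g

def inverse(g):
    """Inverse displacement: (R, r)^{-1} = (R^T, -R^T r)."""
    R, r = g[:3, :3], g[:3, 3]
    return homogeneous(R.T, -R.T @ r)

# Example: rotation by 90 degrees about the z axis, combined with a translation.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
g = homogeneous(Rz, [1.0, 2.0, 3.0])
```

Composition of displacements is then ordinary 4x4 matrix multiplication, and a reflection such as diag(1, 1, -1) is rejected because its determinant is -1.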
Week 3 (2.3.2020)  Further description of the group SO(3) of proper rotation matrices, and choices for local coordinates. Interpretation of elements of SO(3) as direction cosine matrices. The group SE(3) of rigid body displacements and its 4x4 matrix representation, both algebraic definition and geometrical interpretation. Exercises of this week: introduction of the Cayley transform and completion of the identification of R^3 as coordinates for matrices in SO(3) with rotation angle less than \pi. The notes of this lesson are here. 
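A minimal sketch of the Cayley transform and its inverse may help here. It assumes the unscaled convention R = (I - [u x])^{-1} (I + [u x]), for which the rotation angle theta satisfies |u| = tan(theta/2); the course notes may use a different scaling, and the function names are hypothetical.

```python
import numpy as np

def skew(u):
    """Skew-symmetric matrix [u x], so that skew(u) @ w equals np.cross(u, w)."""
    return np.array([[0.0, -u[2], u[1]],
                     [u[2], 0.0, -u[0]],
                     [-u[1], u[0], 0.0]])

def cayley(u):
    """Rotation matrix (I - [u x])^{-1} (I + [u x]) for a Cayley vector u in R^3."""
    U = skew(u)
    I = np.eye(3)
    return np.linalg.solve(I - U, I + U)

def cayley_vector(R):
    """Inverse map, defined for rotation angles less than pi: [u x] = (R + I)^{-1} (R - I)."""
    I = np.eye(3)
    U = np.linalg.solve(R + I, R - I)
    return np.array([U[2, 1], U[0, 2], U[1, 0]])

u = np.array([0.3, -0.2, 0.5])
R = cayley(u)
```

Every u in R^3 gives a proper rotation, and cayley_vector recovers u; the inverse breaks down only as the rotation angle approaches pi, where R + I becomes singular.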
Week 4 (09.3.2020) 
Symmetric coordinates for relative SE(3) displacements between a pair of rigid bodies, and the importance of introducing a midpoint frame. The definition of the midframe involves the (principal) square root of a rotation matrix, but the Euclidean average of the origins (more detail on the choices in Exercise Session 4). With the components of the relative translation expressed in the midframe, and the components of the Cayley vector of the relative rotation expressed in any of the three frames R, R^+, R^-, the transformation of the coordinates corresponding to reversing the roles of the + and - bodies is u <-> -u, v <-> -v. A related transformation for the Crick-Watson strand symmetry will be used in our coordinate system for double-stranded DNA, but it is different, because for the moment no account is taken of the additional feature that the Crick and Watson rules for embedding a frame into a base are different. See Qu 2 in Series 4. The notes of this lesson are here. In the following lectures and exercises we will make use of various matrix factorisations. A brief summary of the results we will use is provided in this PDF. Most or indeed all of the factorisations should be familiar to you.
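The symmetry of these coordinates can be checked numerically. The sketch below assumes the unscaled Cayley convention R = (I - [u x])^{-1} (I + [u x]), a midframe orientation R^- sqrt((R^-)^T R^+), and hypothetical function names; it is an illustration and not the course's reference implementation.

```python
import numpy as np

def skew(u):
    return np.array([[0.0, -u[2], u[1]],
                     [u[2], 0.0, -u[0]],
                     [-u[1], u[0], 0.0]])

def cayley(u):
    I = np.eye(3)
    return np.linalg.solve(I - skew(u), I + skew(u))

def cayley_vector(R):
    I = np.eye(3)
    U = np.linalg.solve(R + I, R - I)
    return np.array([U[2, 1], U[0, 2], U[1, 0]])

def sqrt_rotation(L):
    """Principal square root of a rotation L (angle < pi): same axis, half the angle."""
    u = cayley_vector(L)                   # |u| = tan(theta/2), direction = rotation axis
    t = np.linalg.norm(u)
    if t < 1e-14:
        return np.eye(3)
    half = np.tan(0.5 * np.arctan(t))      # tan(theta/4), Cayley norm of the half rotation
    return cayley(u * (half / t))

def midframe(Rm, rm, Rp, rp):
    """Midframe: square root of the relative rotation, Euclidean average of the origins."""
    return Rm @ sqrt_rotation(Rm.T @ Rp), 0.5 * (rm + rp)

def relative_coordinates(Rm, rm, Rp, rp):
    """Cayley vector u of the relative rotation and translation v expressed in the midframe."""
    R_mid, _ = midframe(Rm, rm, Rp, rp)
    u = cayley_vector(Rm.T @ Rp)
    v = R_mid.T @ (rp - rm)
    return u, v

# Two arbitrary frames playing the roles of the - and + bodies.
Rm, rm = cayley(np.array([0.1, 0.2, -0.3])), np.array([0.0, 0.0, 0.0])
Rp, rp = cayley(np.array([-0.4, 0.1, 0.2])), np.array([0.5, -0.2, 3.3])
u, v = relative_coordinates(Rm, rm, Rp, rp)
u_swapped, v_swapped = relative_coordinates(Rp, rp, Rm, rm)
```

Swapping the two bodies leaves the midframe unchanged, which is why both u and v simply change sign.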
Week 5 (16.3.2020)  Generalisation of relative SE(3) coordinates between a pair of rigid bodies to the cgDNA model internal coordinates and the associated tree structure for a double chain of rigid bodies, with intra and inter coordinates. Watson or reading strand, and the re-embedding of frames on the Crick strand to avoid rotations through angles close to \pi. Definition of basepair and junction frames as midframes. cgDNA model configuration coordinates: translations expressed in midframes (basepair frame between two base frames for intras, junction frames between two basepair frames for inters) and Cayley vectors for both intra and inter relative rotations (with matrix multiplication on the right). First discussion of the transformation of frames under Crick-Watson change of reading strand and the associated transformation rules for cgDNA coordinates. Introduction to the cgDNAweb interface. Much of the material this week can be found here, in the supplementary material for article [2] in the Bibliography at the bottom of the page; see in particular Figures S3 and S4 for the cgDNA coordinates and the definitions of base, basepair and junction frames.
Week 6 (23.3.2020)  Transformation of frames under Crick-Watson change of reading strand and associated transformation rules for cgDNA coordinates (more detailed treatment in the exercise session). Further details on cgDNA coordinates. Non-dimensionalisation and scaling of cgDNA coordinates. Remarks on the parametrisation of the rotation group. Introduction of the matrix exponential of skew matrices and the matrix logarithm of SO(3) matrices. Explanation of why Cayley vectors are more suitable than exponential coordinates as degrees of freedom in the context of a Gaussian distribution. After rescaling by one half, the Cayley transform coincides with the matrix exponential to leading order for small rotations.
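The small-rotation agreement between the two parametrisations can be seen numerically. This sketch assumes that "rescaling by one half" means the convention (I - W/2)^{-1} (I + W/2) with W = [w x], and evaluates the matrix exponential through the Rodrigues formula; function names are hypothetical.

```python
import numpy as np

def skew(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rot_exp(w):
    """Matrix exponential of skew(w), evaluated with the Rodrigues formula."""
    theta = np.linalg.norm(w)
    W = skew(w)
    if theta < 1e-12:
        return np.eye(3) + W
    return (np.eye(3) + (np.sin(theta) / theta) * W
            + ((1.0 - np.cos(theta)) / theta**2) * (W @ W))

def rot_cayley_half(w):
    """Cayley transform with the factor one half: (I - W/2)^{-1} (I + W/2)."""
    W = skew(w)
    I = np.eye(3)
    return np.linalg.solve(I - 0.5 * W, I + 0.5 * W)

def gap(scale, axis=(1.0, 2.0, 2.0)):
    """Frobenius distance between the two rotations for a vector w of norm `scale`."""
    n = np.asarray(axis) / np.linalg.norm(axis)
    w = scale * n
    return np.linalg.norm(rot_exp(w) - rot_cayley_half(w))
```

Both maps rotate about the same axis; with this scaling the Cayley rotation angle is 2 arctan(|w|/2) rather than |w|, so the gap shrinks like |w|^3 as the rotation becomes small.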
Week 7 (30.3.2020)  Definitions and assumptions underlying the cgDNA rigid-base coarse-grain model free energy and its associated Gaussian PDF: a) (five) nearest-neighbour base interactions, plus b) dimer sequence-dependence of parameter set blocks. This leads to a Gaussian model in which the stiffness matrix has a banded structure with overlapping 18x18 blocks. Description of the Crick-Watson symmetry property of cgDNA predicted ground states and stiffness matrices, and of the Crick-Watson symmetry property of the elements of the parameter set. The latter property reduces the total number of independent entries of the parameter set, starting from the numbers of independent bases and independent dimer steps, which are 2 and 10 respectively.
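The banded structure can be made concrete with a small assembly sketch. It assumes that coordinates are ordered intra, inter, intra, ... with 6 components each, so that consecutive 18x18 blocks are shifted by 12 and overlap in a 6x6 intra block. The blocks used here are random placeholders and not a real cgDNA parameter set, and the function names are hypothetical.

```python
import numpy as np

def assemble_stiffness(blocks):
    """Sum 18x18 blocks along the diagonal, each shifted by 12 so that neighbours overlap in 6x6."""
    n_steps = len(blocks)
    dim = 12 * n_steps + 6                 # 12 N - 6 coordinates for N = n_steps + 1 base pairs
    K = np.zeros((dim, dim))
    for i, B in enumerate(blocks):
        s = 12 * i
        K[s:s + 18, s:s + 18] += B
    return K

def bandwidth(K):
    """Largest |i - j| over the nonzero entries K[i, j]."""
    rows, cols = np.nonzero(K)
    return int(np.max(np.abs(rows - cols))) if rows.size else 0

# Placeholder blocks: random symmetric positive definite 18x18 matrices.
rng = np.random.default_rng(0)
blocks = []
for _ in range(3):                         # 3 dimer steps, i.e. a 4 base pair fragment
    A = rng.standard_normal((18, 18))
    blocks.append(A @ A.T + 18.0 * np.eye(18))
K = assemble_stiffness(blocks)
```

The resulting 42x42 matrix is symmetric, positive definite, and has no entries further than 17 places from the diagonal, which is the property exploited by banded factorisations.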
Week 8 (06.4.2020)  Start of Chapter 2: What can be done with the cgDNA model? Brief discussion of i) probabilities and looping experiments, and longer discussion of ii) expectations, specifically correlations along a polymer chain. Both can be approximated numerically from an ensemble of configurations generated by an appropriate Monte Carlo code, e.g. cgDNAmc (see for instance the article and its supplementary material): counting hits and misses for i), and averaging over the ensemble as a simple quadrature rule for ii). For our multivariate Gaussian pdf direct sampling is possible, as opposed to the Metropolis or Markov chain sampling that is necessary for more complicated pdfs. First mention of expectations leading to persistence lengths. Importance for the efficiency of MC of the bandedness of the stiffness matrix, and use of the associated banded Cholesky factorisation to diagonalise. Lecture notes (polycopiés) are available for the material of weeks 8 and 9.
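Direct sampling from a Gaussian specified by its stiffness matrix can be sketched as follows. This is not the cgDNAmc code: it uses a dense NumPy Cholesky factorisation on a small tridiagonal stand-in matrix, whereas a production code would use a banded factorisation to exploit the sparsity. All names and numbers are illustrative assumptions.

```python
import numpy as np

def sample_gaussian(mu, K, n_samples, rng):
    """Draw samples from the pdf proportional to exp(-(x - mu)^T K (x - mu) / 2)."""
    L = np.linalg.cholesky(K)                      # K = L @ L.T with L lower triangular
    z = rng.standard_normal((len(mu), n_samples))  # independent standard normal variables
    y = np.linalg.solve(L.T, z)                    # y has covariance inv(L.T) @ inv(L) = inv(K)
    return (np.asarray(mu)[:, None] + y).T         # one configuration per row

# Small tridiagonal stiffness matrix standing in for the banded cgDNA one.
K = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
mu = np.array([1.0, 0.0, -1.0])
rng = np.random.default_rng(1)
X = sample_gaussian(mu, K, 200_000, rng)

# Expectations are then approximated by ensemble averages (a simple quadrature rule).
mean_estimate = X.mean(axis=0)
cov_estimate = np.cov(X.T)
```

Each sample costs one triangular solve and no rejection step, which is what makes direct sampling attractive compared with Metropolis-type schemes when the pdf is Gaussian.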
Week 9 (20.4.2020)  Discussion of expectations leading to tangent-tangent and Flory persistence lengths. Correlations of relative frame rotations and translations along a chain using homogeneous coordinates in SE(3) and the associated matrix multiplication. Simplifications when junction statistics are independent (the I.D. case), and when the chain is uniform (the I.I.D. case). Exponential decay of frame rotation correlations as the index difference grows, and convergence of the translation block to the Flory persistence vector. In the exercise session, comparison is made with cgDNAmc data for poly(XY), i.e. sequences with close to intrinsically straight ground states, and with cgDNAmc data for lambda-phage sequences, some with significantly bent ground states, for which the shape-factorised semi-log tan-tan plots remain close to linear. End of Chapter 2.
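The I.D. and I.I.D. simplifications can be summarised in a short worked equation. The notation is an assumption made for this sketch and may differ from the lecture notes: frames satisfy R_n = R_{n-1} A_n and r_n = r_{n-1} + R_{n-1} s_n, the tangent is t_n = R_n e_3, and M and m denote the expected junction rotation and translation.

```latex
% Homogeneous coordinates: the relative displacement is a product of junction displacements.
g_n = g_{n-1} a_n, \qquad
a_n = \begin{pmatrix} A_n & s_n \\ 0 & 1 \end{pmatrix}, \qquad
g_0^{-1} g_n = a_1 a_2 \cdots a_n .

% Independent junctions (I.D.): the expectation of the product factorises.
\mathbb{E}\left[ g_0^{-1} g_n \right]
  = \mathbb{E}[a_1] \, \mathbb{E}[a_2] \cdots \mathbb{E}[a_n] .

% Uniform chain (I.I.D.): every factor is the same matrix.
\mathbb{E}\left[ g_0^{-1} g_n \right]
  = \begin{pmatrix} M & m \\ 0 & 1 \end{pmatrix}^{n}
  = \begin{pmatrix} M^n & \left( I + M + \cdots + M^{n-1} \right) m \\ 0 & 1 \end{pmatrix} .

% Tangent-tangent correlation, and limit of the translation block
% when the spectral radius of M is less than 1.
\langle t_0 \cdot t_n \rangle = e_3^T M^n e_3 , \qquad
\lim_{n \to \infty} \left( I + M + \cdots + M^{n-1} \right) m = (I - M)^{-1} m .
```

Under the stated spectral radius condition the rotation block M^n decays geometrically in n, which appears as a straight line on a semi-log plot, and the translation block converges as a geometric series.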
Week 10 (27.4.2020)  Start of Chapter 3: Parameter Estimation in the cgDNA model. Start of the estimation of oligomer-based means and centred covariances from MD time series data for the cgDNA coarse-grain variables. Maximum likelihood approach to obtain estimates of oligomer-based Gaussian pdfs from an ensemble of configuration snapshots. Cases both with and without an imposed banded sparsity pattern in the stiffness matrix.
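For the case without an imposed sparsity pattern, the maximum likelihood estimates have a closed form, sketched below. The snapshot array is a toy stand-in for an MD time series of coarse-grain coordinates, the function name is hypothetical, and the banded case (which needs a constrained fit) is not covered here.

```python
import numpy as np

def gaussian_mle(snapshots):
    """Maximum likelihood mean, centred covariance and stiffness from snapshots (one per row)."""
    X = np.asarray(snapshots, dtype=float)
    mu = X.mean(axis=0)                # ML estimate of the mean (ground state)
    D = X - mu                         # centred snapshots
    C = D.T @ D / X.shape[0]           # ML covariance divides by N, not by N - 1
    K = np.linalg.inv(C)               # stiffness = inverse covariance, generally dense
    return mu, C, K

# Toy data: four snapshots of a two-dimensional configuration vector.
snapshots = [[0.0, 0.0],
             [2.0, 0.0],
             [0.0, 2.0],
             [2.0, 2.0]]
mu, C, K = gaussian_mle(snapshots)
```

For the toy data the estimated mean is (1, 1) and the covariance is the identity. The inverse of a sample covariance is in general dense, which is why imposing a banded stiffness pattern requires more than simply inverting C.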
Week 11 (04.5.2020)  Introduction of entropy for a continuous pdf with respect to an associated measure, and relative entropy (or Kullback-Leibler divergence) between two continuous pdfs with an associated measure. Jensen's inequality to prove that the entropy-minimizing (or maximizing, depending on sign convention) pdf on a bounded domain is uniform with respect to the measure. Jensen's inequality to prove that relative entropy is always non-negative. Start of the Jaynes maximum entropy principle for a pdf constrained by prescribed values of some moments, and sufficiency of the associated first-order necessary conditions.
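For two Gaussian pdfs the relative entropy has a standard closed form, which gives a quick numerical illustration of non-negativity. The sketch assumes both pdfs are taken with respect to Lebesgue measure on R^n and are specified by mean and covariance; the function name is hypothetical.

```python
import numpy as np

def kl_gaussian(mu1, C1, mu2, C2):
    """Relative entropy D(p1 || p2) of N(mu1, C1) with respect to N(mu2, C2)."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    C1, C2 = np.asarray(C1, dtype=float), np.asarray(C2, dtype=float)
    n = mu1.size
    d = mu2 - mu1
    trace_term = np.trace(np.linalg.solve(C2, C1))   # tr(C2^{-1} C1)
    mean_term = d @ np.linalg.solve(C2, d)           # (mu2 - mu1)^T C2^{-1} (mu2 - mu1)
    _, logdet1 = np.linalg.slogdet(C1)
    _, logdet2 = np.linalg.slogdet(C2)
    return 0.5 * (trace_term + mean_term - n + logdet2 - logdet1)

# One-dimensional examples.
same = kl_gaussian([0.0], [[1.0]], [0.0], [[1.0]])      # identical pdfs
shifted = kl_gaussian([0.0], [[1.0]], [1.0], [[1.0]])   # unit shift of the mean
wider = kl_gaussian([0.0], [[1.0]], [0.0], [[4.0]])     # same mean, larger variance
narrower = kl_gaussian([0.0], [[4.0]], [0.0], [[1.0]])  # arguments exchanged
```

The divergence vanishes only when the two pdfs coincide, and it is not symmetric in its arguments, as the last two examples show.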
Summary and description of the exercises
This document contains an overview and a description of all the exercises given so far.
Access to videos of lectures
This document contains information about how to access the videos, audio and lecture notes of the recent lectures.
Exercises
Exercise sheets  Solutions

Bibliography
The following references for the cgDNA model are available on the cgDNA web page.
 [1] A DNA Coarse-Grain Rigid Base Model and Parameter Estimation from Molecular Dynamics Simulations, D. Petkevičiūtė, Thesis #5520, EPFL (2012).
 [2] cgDNA: a software package for the prediction of sequence-dependent coarse-grain free energies of B-form DNA, D. Petkevičiūtė, M. Pasi, O. Gonzalez and J. H. Maddocks, Nucleic Acids Research 42, no. 20 (2014), p. e153.
 [3] A sequence-dependent rigid-base model of DNA, O. Gonzalez, D. Petkevičiūtė and J. H. Maddocks, Journal of Chemical Physics 138, no. 5 (2013), p. 055122 128.
 [4] Sequence-dependent persistence lengths of DNA, J. S. Mitchell, J. Glowacki, A. E. Grandchamp, R. S. Manning and J. H. Maddocks, Journal of Chemical Theory and Computation, no. 13 (2017), p. 1539-1555.
 [5] Absolute versus relative entropy parameter estimation in a coarse-grain model of DNA, O. Gonzalez, M. Pasi, D. Petkevičiūtė, J. Glowacki and J. H. Maddocks, Multiscale Modeling and Simulation 15, no. 3 (2017), p. 1073-1107.
References for general books on DNA.

[6] Understanding DNA, The molecule & how it works, C. R. Calladine, H. R. Drew, B. F. Luisi, A. A. Travers, Third Edition, 2004, Academic Press, ISBN 9780121550893.
Summary: Understanding DNA explains, step by step, why DNA forms specific structures, the form of these structures and how they fundamentally affect the biological processes of transcription and replication. 
[7] Unraveling DNA: The Most Important Molecule Of Life, M. D. Frank-Kamenetskii, Revised and Updated Edition, 1997, Perseus Publishing, ISBN 9780201155846.
Summary: A curious blend of history and biographical detail, covering the development of molecular biology from the influence of physicists earlier in the century, through the central dogma of molecular biology, to a discussion of the social issues raised by genetic engineering. 
[8] DNA Topology, A. D. Bates & A. Maxwell, 2005, Oxford University Press, ISBN 9780198506553.
Summary: A clear, concise explanation of the relevance of supercoiling and catenation in the context of biological activity of the DNA molecule. 
[9] DNA Structure and Function, R. R. Sinden, 1994, Academic Press, ISBN 9780126457506.
Summary: A timely resource that provides a simple yet comprehensive introduction to nearly all aspects of DNA structure. It also explains current ideas on the biological significance of classic and alternative DNA conformations.