Sources of Error in DNA Computation

Presentation for Computer Science 881a

Kris Langohr




Abstract


The purpose of this paper is to present information about errors in the biological operations used to implement DNA computations. The error rates of several of the most common bio-operations are presented, along with proposed techniques for limiting the impact of these errors. A model of computation is presented that avoids the notoriously error-prone affinity purification operation. Lastly, some useful tips are presented that researchers in the area of DNA computing should keep in mind when designing models of computation.


Introduction

Work in DNA computing can be divided into two distinct areas of concentration. The first lives in the realm of theoretical computer science and is concerned with the development of algorithms and mathematical systems to perform calculations. The second lives in the realm of molecular biology and is concerned with the real-world implementation of the algorithms designed by the computer scientists. It is this second area that will be discussed here. The basis of the molecular biologist's work is a set of well-defined operations used to accomplish a goal. Each of these operations has a procedure that the biologist must follow to complete the operation successfully. Unfortunately, even if these procedures are followed exactly, there is still a possibility of errors creeping into the final product of the operation. The purpose of this paper is to discuss these sources of error and to present potential solutions to the problems they create.


Discussion

The fundamental problem that must be overcome for DNA computing to become a reality is the inherent errors that come with the use of biological operations. These errors reduce the reliability of the system to the point that running the same algorithm on the same set of data may not yield the same result. This is, of course, unacceptable, and until the problem can be effectively dealt with, the future of DNA computing is limited. Let us now examine some of the potential errors that can occur within DNA computations.

One of the most commonly used bio-operations is affinity purification. This operation is used to filter a heterogeneous solution of DNA strings with the goal of selecting strings that match certain criteria. The operation has an accepted error rate of 5%; that is, 5% of the time the process will fail to retain strings that it should, or it will retain strings that it should not. This error rate does not seem unusually high until you consider that an algorithm may perform the operation tens or hundreds of times. This translates to less than a 1% chance of a "good" string surviving the entire process if it is iterated one hundred times. There is hope, however: in [1] the authors present two techniques for increasing the survival probability of a given string.
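
To see where the figure of less than 1% comes from, here is a quick back-of-the-envelope calculation in Python (assuming, for illustration, that each extraction independently retains a target string with probability 0.95):

    # Survival probability of a "good" string across repeated extractions,
    # assuming each step independently retains the string with probability 0.95.
    p_retain = 0.95
    for steps in (10, 50, 100):
        print(f"{steps:3d} extractions: survival = {p_retain ** steps:.4f}")
    # 100 extractions -> 0.0059, i.e. less than a 1% chance of survival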

The first technique is based on the fact that affinity purification is volume decreasing: each pass through the process reduces the number of DNA strands in solution. This means that the number of "good" strings is reduced, lowering the chances of a successful run of the algorithm. The obvious solution is to make the algorithm constant volume, which is accomplished by using PCR periodically to increase the number of strands in the solution. The second technique is an alteration of the encoding scheme, in which all of the information is encoded twice. The purpose of doing this is to increase the probability that affinity purification will successfully retain "good" strings, by increasing the number of possible binding locations.
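
The effect of the doubled encoding can be illustrated with the same kind of calculation. The assumptions here are mine, not taken from [1]: the 5% error is treated purely as a missed capture, and the two copies of the information are assumed to bind independently, so a string survives a step unless both binding sites are missed:

    # Illustrative comparison of single vs. doubled encodings over 100 steps.
    # Assumes the 5% error is a missed capture and that the two binding
    # sites on a doubly-encoded string are captured independently.
    p_miss = 0.05
    steps = 100
    p_single = (1 - p_miss) ** steps        # one binding site per string
    p_double = (1 - p_miss ** 2) ** steps   # survives unless both sites miss
    print(f"single encoding:  {p_single:.4f}")   # ~0.006
    print(f"doubled encoding: {p_double:.4f}")   # ~0.78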

Another commonly used operation is the polymerase chain reaction (PCR). The purpose of this technique is to reproduce the DNA strands within a solution. It is often assumed that this process is error-free; this is, however, not entirely true. PCR depends upon an enzyme called DNA polymerase, which builds double-stranded DNA. DNA polymerase is prone to what are known as misincorporation errors, meaning that it will occasionally make wrong base pairings when assembling a DNA double strand. Typical error rates are in the range of one misincorporation per one hundred thousand bases. This error rate is low enough to be ignored for small problems with short encodings, but will need to be considered when large problems are being solved. Another potential problem with PCR is noted in [7]. The authors found that, while doing experiments, PCR created DNA strands of unexpected sizes. After much investigation the researchers discovered that, when using large volumes of template, the templates themselves began to interact with one another. They also noted that by reducing the volumes of template within the PCR process these errors can be avoided completely.
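
At one misincorporation per 10^5 bases, the chance that a single copy comes out letter-perfect depends strongly on strand length. A rough estimate, assuming errors strike each base independently:

    # Probability of an error-free PCR copy at a misincorporation
    # rate of 1e-5 per base, assuming independent errors.
    rate = 1e-5
    for length in (100, 1000, 100000):
        print(f"{length:6d} bases: P(error-free copy) = {(1 - rate) ** length:.4f}")
    # 100 -> 0.9990, 1000 -> 0.9900, 100000 -> 0.3679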

Ligation is the process by which double-stranded DNA molecules are joined to one another using the enzyme DNA ligase and ATP. This process is relatively error-free, with the exception that some DNA ligases are "hungry"; that is, they are not picky about what they ligate. This can present problems if a certain degree of specificity is required within ligation reactions. The error can be compensated for by choosing a less "hungry" ligase.

Annealing, or hybridization, is the joining of two single-stranded DNA molecules into a single double-stranded molecule. The usual result is that complementary bases join to one another and a "normal" molecule is formed. However, it is possible that when two strands anneal they will join imperfectly, causing bulges of unjoined bases within the double strand. As well, it is possible for a molecule to fold over and bind to itself, thus forming an unusual product. Thankfully, most of these problems are rare and can be guarded against by choosing smart encodings of data.
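
One practical reading of "smart encodings" is to screen candidate strands for regions that are reverse complements of other regions in the same strand, since such regions are what allow a molecule to fold back on itself. Below is a minimal screening sketch in Python; the window size, minimum loop length and function names are my own illustrative choices:

    # Minimal hairpin screen: flag a strand if any window of `window` bases
    # has its reverse complement appearing further downstream, separated by
    # at least `loop` bases (room for the strand to fold back on itself).
    COMPLEMENT = str.maketrans("ACGT", "TGCA")

    def revcomp(s):
        return s.translate(COMPLEMENT)[::-1]

    def may_fold(strand, window=6, loop=3):
        for i in range(len(strand) - window + 1):
            probe = revcomp(strand[i:i + window])
            if probe in strand[i + window + loop:]:
                return True
        return False

    print(may_fold("ACGTACGTTTTACGTACGT"))  # True: the two ends can pair
    print(may_fold("AAAAAACCCCCCAAAAAA"))   # False: no self-complementarity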

At this point the sources of error may seem overwhelming. Take heart: several models that use low-error operations have been developed, one of which is presented here.


A Model for Error-Resistant Computation

In [4] the authors present a model of computation designed to reduce the current dependence on affinity purification as one of the bio-operations used. They present a system that uses four operations, each of which is implemented by a relatively error-free molecular biology operation.

The model presented in [4] is based on four set operations: remove, union, copy and select. The remove operation removes from the input set all strings that contain a given substring. The union operation creates an output set that is the union of a collection of input sets. The copy operation makes a number of copies of the input set. Lastly, the select operation selects a random element of the input set; if the input set is empty it returns empty. The authors present solutions to several NP-complete problems, including 3-vertex-colourability, Hamiltonian path, subgraph isomorphism, maximum clique and maximum independent set. Since solutions are presented for several NP-complete problems, the model can solve any problem within NP. This represents a tremendous amount of computational power for the system, though this power can only be achieved if a set of bio-operations implementing the presented operations exists.
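
To make the four set operations concrete, the following toy simulation models tubes as Python sets of strings and uses a copy-remove-union pattern to solve 3-vertex-colourability for a small graph. The string encoding (tokens such as v0R;) and the exact filtering order are my own illustrative reconstruction, not a transcription of the algorithm in [4]:

    from itertools import product

    COLOURS = "RGB"

    def encode(assignment):
        # One delimited token per vertex, e.g. "v0R;v1G;v2B;" (hypothetical encoding)
        return "".join(f"v{j}{c};" for j, c in enumerate(assignment))

    def remove(tube, substring):
        # remove: discard every string containing the given substring
        return {s for s in tube if substring not in s}

    def union(tubes):
        # union: pour several tubes together
        return set().union(*tubes)

    def copy(tube, k):
        # copy: split into k tubes, each assumed to retain every string
        return [set(tube) for _ in range(k)]

    def select(tube):
        # select: return an arbitrary element, or None if the tube is empty
        return next(iter(tube), None)

    def three_colour(n, edges):
        # Start with every possible colouring of the n vertices.
        U = {encode(a) for a in product(COLOURS, repeat=n)}
        for j in range(n):
            tubes = copy(U, 3)
            for i, colour in enumerate(COLOURS):
                # In tube i, keep only strings where vertex j has this colour...
                for other in COLOURS:
                    if other != colour:
                        tubes[i] = remove(tubes[i], f"v{j}{other};")
                # ...then discard strings where a neighbour of j shares it.
                for u, v in edges:
                    if u == j:
                        tubes[i] = remove(tubes[i], f"v{v}{colour};")
                    elif v == j:
                        tubes[i] = remove(tubes[i], f"v{u}{colour};")
            U = union(tubes)
        return select(U)  # a proper colouring, or None if none exists

    print(three_colour(4, [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]))

Note that the simulation destroys invalid colourings and never touches valid ones, mirroring the error-resistance argument made for the model below.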

The authors provide bio-operation implementations of each of the set operations they have defined. The bio-operations used were chosen because of their low error rates, thus making the model very resistant to errors. A description of the bio-operations used follows.

The remove operation is a composite operation made up of a mark and a destroy operation. The mark operation marks all strings in the input set that contain at least one copy of a substring S. This is accomplished by adding an amount of primer corresponding to the complement of S; the primer will anneal to any string that contains S. DNA polymerase is then used to extend the primers, with the result that all strings containing S become double-stranded. The destroy operation uses a restriction enzyme to cut up the double-stranded strings that contain S; the intact single strands can then be recovered by gel electrophoresis. The union operation is accomplished by simply pouring the several tubes we want to combine into a single tube. The copy operation is accomplished by pouring equal volumes of the contents of the input tube into i tubes. Since it is assumed that the input tube contains multiple copies of each string, the assumption is made that each tube will contain at least one copy of each string. The last operation, select, is accomplished using PCR to determine if any strings remain.

The strength of this model is that it eliminates the dependence on error-prone affinity purification. This is very useful, since models that do depend on that method have a very low probability of success. Another strength is that algorithms based on this model simply destroy wrong answers, leaving correct answers untouched; this may also help to prevent a failure to capture the correct response.

In reading [4], several possible objections to the described model become evident. The copy operation seems to make assumptions that may not necessarily be true with respect to the distribution of strings among the "copied" tubes. This problem can be overcome if the distribution of strings within the sample is sufficiently random, making the solution essentially homogeneous.

As well, the copy operation does not really copy anything, as copying implies that you have more than you started with. This is more a problem with the semantics of the operation's name than with the operation it represents.

The remove operation requires that you know some substring of the results you wish to remove so that you can use an appropriate restriction enzyme. It seems that there would be situations where this would not be the case. Luckily, if the user is careful in planning the encoding scheme to be used, it is possible to insert a known string into each of the encodings.

Overall, this model of computation is reasonable. It demonstrates sufficient power to solve hard problems and can be implemented easily using off-the-shelf components and well-understood processes.


Advice from the Chemists

In [2] the Department of Chemistry at New York University provides some useful suggestions for investigators researching DNA computing. Their advice pertains to working with DNA and is based on their experiences building unusual DNA structures for genetic recombination and nanofabrication. In the paper the authors make several points about working with DNA, which they summarise at the end of the paper. Those points are:
Symmetry Presents Problems to Control
Encodings that are similar will cause problems in determining answers.

Non-Watson-Crick Pairing is Possible
It is possible that strange structures may form in solution. Each base appears to be able to bond to every other base, including itself.

Correct Protocols for Hybridization are Important
You must ensure that the conditions are appropriate for the formation of the desired structure.

DNA Concentration and Solution Makeup are Important
It is important to use appropriate concentrations of DNA in the formation of complexes. As well, the chemical makeup, pH and temperature at which the reactions are run are important.

Ligation is not Perfect
Ligase may attempt to bond the wrong ligands if their structure is similar.

DNA Molecules Breathe
Occasionally segments of a DNA molecule will unbind and may be replaced by other strands of the same sequence.

DNA Strands May Join to Form Closed Molecules
It is possible that long DNA molecules may join with others to form closed molecules. This will present problems when trying to interpret results.

Affinity Binding Acts as a Filter
The process of affinity binding is imperfect. Special measures, such as multiple tags, may be required to achieve higher success rates.

Complex Ligations Can Cause Problems
Large complicated mixtures used for ligation may form unusual structures such as cyclic molecules.

Restriction Endonucleases Produce Undigested Products
It may be necessary to find methods for removing these undigested molecules.

Treat DNA as a Physical Chemical System
A general purpose system will have to be able to vary the conditions under which the chemical reactions occur. It may be necessary to have conditions that are not optimal for each element of the system but which provide an overall balance.


References

  1. R. Lipton, D. Boneh and C. Dunworth. Making DNA computers error resistant. Proceedings of the 2nd Annual Meeting of DNA Based Computers, pages 102-110, Princeton, 1996.

  2. N. Seeman et al. The perils of polynucleotides: the experimental gap between the design and assembly of unusual DNA structures. Proceedings of the 2nd Annual Meeting of DNA Based Computers, pages 191-204, Princeton, 1996.

  3. Sir John Kendrew, editor. Encyclopedia of Molecular Biology. Blackwell Science, 1994.

  4. M. Amos, A. Gibbons and D. Hodgson. Error-resistant implementation of DNA computations. Proceedings of the 2nd Annual Meeting of DNA Based Computers, pages 87-99, Princeton, 1996.

  5. P. Berg and M. Singer. Genes and Genomes. University Science Books, 1991.

  6. N. Campbell. Biology. Benjamin Cummings Publishing Company, second edition, 1990.

  7. A. Libchaber, G. Cecchi and P. Kaplan. DNA based molecular computation: template-template interactions in PCR. Proceedings of the 2nd Annual Meeting of DNA Based Computers, pages 159-165, Princeton, 1996.

  8. J. Royer, J. Simon, S. Kurtz and S. Mahaney. Active transport in biological computing (preliminary version). Proceedings of the 2nd Annual Meeting of DNA Based Computers, pages 111-121, Princeton, 1996.


If you have any questions, I can be reached at langohr@csd.uwo.ca.