Published by Geoff Harrison | 6 July 2023
A very good case that explains DNA testing and reporting is the murder trial and appeal in Xie v R [2021] NSWCCA 1, which is extracted below as it relates to the DNA testing. Of particular relevance was the discovery and testing of Stain 91, which was located in the accused's garage. Stain 91 contained the blood/DNA of four of the five victims.
....
Discovery of Stain 91
91. As noted, late in the afternoon of 13 May 2010, a stain that became known as Stain 91 was discovered on an area of the floor of the garage at the Beck Street home. It was located on the floor, in a location previously covered by a “tallboy”, by forensic biologists Ms Melanie Le Compte, Ms Nicole Campbell and Ms Jae Gerhard.[183] It was an elliptical shape, approximately 2cm long and 6mm at its widest point and described as having a “dark grey/brown staining/disclouration”.[184]
92. At the scene an “O‑tol” test was performed on the stain which produced a strong reaction (“dark, intense blue colour”).[185] The forensic biologists who found Stain 91 thought that it looked like an aged bloodstain. Presumptive testing of the stain was positive for blood, with the strength of the reaction indicating that it was more likely than not to have been a true positive.[186]
Mr Walton’s Evidence on DNA Testing
93. At the time of the 2016 trial, Mr Walton was the acting manager of the NSW Forensic and Analytical Science Service (“FASS”) which was formerly known as the Division of Analytical Laboratories (“DAL”). He holds an Honours degree in science and has worked as a DNA analyst since 2001. Neither at the trial, nor on appeal, was there any challenge to his expertise or his evidence. Mr Walton gave evidence on three occasions during the 2016 trial. On the second occasion he explained his analysis of Stain 91.[187] On the third occasion he was recalled to address various matters that arose during the evidence of Dr Perlin.
94. On the first occasion that he gave evidence, Mr Walton explained DNA and DNA analysis as follows.[188] A deoxyribonucleic acid (or DNA) double helix is the fundamental unit of genetic material. Each cell of the human body, other than red blood cells, has a nucleus. Inside the nucleus there are 23 pairs of chromosomes each of which consists of a segment of DNA[189] of different length.[190] One of the chromosomal pairs determines gender. Biological females have two X chromosomes and males have one X and one Y chromosome. The remaining 22 pairs of chromosomes determine various human characteristics. Each member of the pair of chromosomes is inherited from one parent.[191]
95. Specific areas on the DNA strand are known as loci (singular, locus). Those areas that contain coding information, such as for hair colour and eye colour, are known as genes.[192] DNA testing focuses on the non‑coding areas as they contain differences between individuals that aid identification of a person’s DNA.[193] The relevant form of testing for this case involves detecting Short Tandem Repeats (“STR”), that is, analysing these loci and determining the number of repetitions of a particular combination of the four “bases” of DNA (Adenine, Thymine, Guanine and Cytosine) at that location.[194]
96. Three tests adopted in NSW laboratories over time were the “Profiler Plus”, which examines 9 loci plus an area that determines biological gender, the Identifiler Kit, which examines 15 loci plus an area that determines biological gender and the “PowerPlex 21”, which examines 20 loci plus an area that determines biological gender.[195] Most of the loci examined with the PowerPlex 21 were located on different chromosomes although, according to Mr Walton, “[t]here is a couple on the same chromosome”.[196] In addition, there is a “Y filer” test which examines only the Y chromosome passed from fathers to sons,[197] specifically 16 “areas” or loci on the Y chromosome.[198]
97. At each locus, each person has two DNA sequences known as alleles, one from each parent.[199] Across the population at that locus there are many different alleles as measured by STR.[200] One sense of the phrase “genotype” is to describe a pair of alleles at a particular locus.[201] For example, at a particular locus a person may have a genotype of “10,12” meaning that there were “10” and “12” short tandem repeats of a particular combination of four bases at that locus.[202] Another sense of the phrase “genotype” is to describe the collection of pairs of alleles for a particular DNA sample, which in the case of PowerPlex 21, is 21 pairs.[203]
98. Mr Walton explained to the jury that DNA analysis involves extracting DNA from a sample taken from a suspect, crime scene, victim or location, measuring it, amplifying it by making many copies of the area of interest (known as the Polymerase Chain Reaction method; “PCR”)[204] and then subjecting it to an STR analysis.[205] A graphical representation of the digital results of the analysis can be presented in the form of an electropherogram (“EPG”) which, in the case of a simple sample concerning one person, illustrates the presence of two alleles at a particular locus by two peaks. Hence, a simplified example of part of an EPG for a DNA sample from one contributor is as follows:[206]
99. This extract from an EPG depicts a DNA sample that a STR analysis shows as having an allele pair of “15,17” at one locus, “14,18” at another and “19,21” at a third. Mr Walton explained that an EPG for the locus, known as amelogenin (“AMEL”), shows the result for an allele that depicts gender. He explained that the y‑axis to this image measures “Relative Fluorescence Unit” (“RFU”) and can be considered “roughly equivalent to the amount of DNA that may have been present”.[207] Mr Walton described the small bumps in the above EPG as “various artefacts” which the interpreting scientists are “trained to try and identify”.[208] He explained that some are “technical artefacts, due to power fluctuations and things like that and some of them are due to the actual copying or testing process”,[209] which he described as a “stutter”.[210]
Reporting Results
100. Mr Walton explained that, if an EPG revealed more than two peaks at a particular locus, then that would confirm that there was more than one contributor to a sample. He explained that there would then be a determination of the minimum number of contributors, such that “if we have an area that has three or four peaks, then that indicates there’s at least two people there”, but it “could be three” as there may be common alleles between the contributors.[211] Mr Walton said that with mixtures it is possible to exclude a person as a contributor if their DNA profile (ie, combination of allele pairs) does not “match” the sample, in the sense of there being a correspondence with the peaks on an EPG.[212] If there is a correspondence between the profile contained in the sample of interest and a reference sample, it means that person “is not excluded”. In that circumstance, there are two possible explanations for the presence of the person’s DNA profile: either it is their DNA or it is a random match.[213] In that event, an assessment is made of “how likely a random person could match” that sample and that is made by reference to how “common it might be in the population”.[214] Mr Walton explained that this is undertaken by reference to a population frequency database, which is Australia‑wide and broken down into Caucasian, Asian and Aboriginal databases.
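The two rules Mr Walton describes in [100], the minimum number of contributors and the "not excluded" test, can be sketched in a few lines. This is a minimal illustration, not laboratory software: the locus names, allele values and function names are hypothetical. It assumes each person carries at most two alleles per locus, and it ignores allele drop-out (the possibility, discussed later in the evidence, that a contributor's types are present but not detected).

```python
import math

# Hypothetical types: alleles observed per locus, and a person's allele pair per locus.
Mixture = dict[str, set[int]]
Profile = dict[str, tuple[int, int]]

def min_contributors(mixture: Mixture) -> int:
    """Each person carries at most two alleles per locus, so a locus with
    n distinct peaks needs at least ceil(n / 2) contributors."""
    return max(math.ceil(len(peaks) / 2) for peaks in mixture.values())

def not_excluded(profile: Profile, mixture: Mixture) -> bool:
    """A reference profile is 'not excluded' if every one of its alleles
    appears among the mixture's peaks at every locus compared."""
    return all(set(pair) <= mixture.get(locus, set())
               for locus, pair in profile.items())

# Invented data at three made-up loci.
mixture = {"L1": {15, 16, 17}, "L2": {14, 18}, "L3": {19, 20, 21, 22}}
person_a = {"L1": (15, 17), "L2": (14, 18), "L3": (19, 21)}
person_b = {"L1": (15, 16), "L2": (14, 13), "L3": (20, 22)}

print(min_contributors(mixture))        # 2
print(not_excluded(person_a, mixture))  # True
print(not_excluded(person_b, mixture))  # False: allele 13 absent at L2
```

Four peaks at L3 give a minimum of two contributors, but, as Mr Walton noted, shared alleles mean the true number "could be three" or more.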
101. As it is of relevance to the attack mounted on Dr Perlin’s evidence on appeal, Mr Walton’s evidence concerning the construction of the population database should be noted:[215]
“A. ... What we’ve done is we’d create a population database. We get some samples, we [take] them and get profiles, and we break-down all those profiles and determine for a particular area, how much of each type is in that group of samples that we’ve taken. For the statistical and scientific purposes we try and get a random sample for the population for doing these things. In the past, when we did our Profiler Plus testing we had 739 samples that we took randomly from the population and broke them down and got a frequency table for the Profiler Plus types. Q. A frequency table, is it a case what you’re saying with all those samples that you got, there was a record made of each type that was present in a particular area and you could then work out how common a particular type was over the entire population that you’d put in? A. That’s right, and we use that [to] extrapolate it out to the entire population.” (emphasis added).
102. The reference to a “particular area” in this evidence is to a particular locus. Thus, Mr Walton explained that the population database is created by determining, from the “739 samples”, the frequency of a particular allele pair at a particular locus. He explained that this approach was “conservative”, in the sense of increasing the estimate of a random match, in two respects. First, because if a particular allele pair is not present in the sample it is nevertheless assumed to be present in 2% of the population.[216] Second, because this method treats the presence of a particular allele pair as independent of the presence of another pair, it therefore assumes an equal likelihood that “any person in the population could mate with any other person in the population”.[217] According to Mr Walton, experience shows that there is a greater tendency for people to marry within their racial and ethnic groups.[218]
103. Mr Walton also said that the possibility of a random match in Australia was calculated by reference to three different populations, namely Caucasian (European/Middle Eastern), Asian and Aboriginal,[219] and “the lowest one, the most common one [is used] as the statistic for the general population”.[220]
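The database construction described in [101] to [103] can be sketched as follows. This is a minimal illustration under stated assumptions, not FASS's software: the function names, the two made-up populations and their sample profiles are hypothetical; the 2% figure is applied literally as [102] describes it (to an allele pair absent from the samples); loci are multiplied on the independence assumption Mr Walton described; and "the most common one" is taken to be the largest frequency across the databases.

```python
from collections import Counter
from math import prod

Profile = dict[str, tuple[int, int]]

# Per [102]: an allele pair absent from the database samples is assumed
# to occur in 2% of the population (applied literally here).
UNSEEN_PAIR_FREQUENCY = 0.02

def frequency_table(samples: list[Profile]) -> dict[str, Counter]:
    """Count how often each allele pair occurs at each locus."""
    table: dict[str, Counter] = {}
    for profile in samples:
        for locus, pair in profile.items():
            # sort so that (12, 10) and (10, 12) count as the same pair
            table.setdefault(locus, Counter())[tuple(sorted(pair))] += 1
    return table

def pair_frequency(table: dict[str, Counter], n: int, locus: str, pair: tuple[int, int]) -> float:
    count = table.get(locus, Counter())[tuple(sorted(pair))]
    return count / n if count else UNSEEN_PAIR_FREQUENCY

def profile_frequency(table: dict[str, Counter], n: int, profile: Profile) -> float:
    """Product rule: loci are treated as independent, so frequencies multiply."""
    return prod(pair_frequency(table, n, locus, pair) for locus, pair in profile.items())

def reported_frequency(databases: dict[str, tuple[dict[str, Counter], int]], profile: Profile) -> float:
    """Report the population in which the profile is most common (largest frequency)."""
    return max(profile_frequency(table, n, profile) for table, n in databases.values())

# Hypothetical sample data for two made-up populations.
pop_1 = [{"L1": (10, 12), "L2": (7, 8)}, {"L1": (12, 10), "L2": (8, 8)},
         {"L1": (11, 12), "L2": (7, 8)}, {"L1": (10, 10), "L2": (9, 9)}]
pop_2 = [{"L1": (10, 12), "L2": (8, 8)}, {"L1": (11, 11), "L2": (9, 9)}]

query = {"L1": (10, 12), "L2": (7, 8)}
databases = {"pop_1": (frequency_table(pop_1), len(pop_1)),
             "pop_2": (frequency_table(pop_2), len(pop_2))}
print(reported_frequency(databases, query))  # 0.25, ie 1 in 4, from pop_1
```

In pop_2 the pair 7,8 never occurs at L2, so the 2% figure is used there, giving 0.01; the larger pop_1 figure is the one reported.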
104. Mr Walton explained that the “preferred way of reporting”[221] was to use a “likelihood ratio” that compares two different hypotheses that explain the evidence.[222] Hence, with a single source a likelihood ratio will compare two hypotheses, namely, the probability that the evidence can be explained by a contribution from a particular victim (or offender) compared to the probability that “it comes from someone else who matches by chance”, with the result that “we can get a statistic to explain those two”.[223] With mixtures involving more than one contributor, Mr Walton stated that a likelihood ratio can be provided by comparing two hypotheses, for example the probability that the evidence is explained by DNA being contributed by person A and an unknown person, compared to two unknown persons.[224] However, such a ratio can only be provided if it is known (or assumed) “how many people [are] in a mixture”.[225]
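In the simplest single-source case described in [104], the likelihood ratio reduces to one divided by the random match probability. The sketch below assumes, as a common simplification, that the evidence profile would certainly be observed if the named person were the source (ignoring typing error and relatives). The function name is hypothetical, and the sketch does not cover mixtures, which need an assumed number of contributors.

```python
def single_source_likelihood_ratio(random_match_probability: float) -> float:
    """LR = P(E | Hp) / P(E | Hd) for a single-source profile that matches.

    Hp: the named person is the source, so observing their profile is certain.
    Hd: someone else who matches by chance is the source, so the probability
    of observing this profile is the random match probability.
    """
    if not 0 < random_match_probability <= 1:
        raise ValueError("random_match_probability must be in (0, 1]")
    p_evidence_given_hp = 1.0
    p_evidence_given_hd = random_match_probability
    return p_evidence_given_hp / p_evidence_given_hd

# A profile shared by 1 in 4 people gives an LR of 4.
print(single_source_likelihood_ratio(0.25))  # 4.0
```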
105. Mr Walton explained that, depending on how much DNA of each of the contributors is present, which is measured by the height of the various peaks on the EPG, the DNA sample is often separated into a “major” and “minor” component.[226] With some samples these components may themselves be a mixture of different DNA contributors.[227] Mr Walton explained that some DNA mixtures are too complicated for analysts to determine individual contributors,[228] although in cross‑examination he added that the minimum number of contributors can be determined.[229] Mr Walton foreshadowed Dr Perlin’s evidence by explaining that there are “computer programs that have been developed specifically for interpreting complex mixtures in DNA profiles”.[230]
Mr Walton’s Evidence on Stain 91
106. On the second occasion that he gave evidence at the 2016 trial, Mr Walton told the jury that Stain 91 was subjected to each of the Y filer, Profiler Plus, Identifiler and PowerPlex 21 tests, although the latter could not be performed until it became available in 2013.[231] Of present relevance are the EPG results for the PowerPlex 21 which were displayed to the jury.[232] Mr Walton addressed the results at various loci. For the locus that determines gender,[233] the results showed two large peaks of slightly different height. Mr Walton explained that this meant that the results definitely included a male and, given that the other loci indicated multiple contributors, “there might be multiple males present”,[234] although the small height differential in the peaks “may indicate there is also a female individual present”[235] or “it could just be an imbalance from the testing”.[236]
107. Mr Walton told the jury that the difference in the peaks on the results for Stain 91 allowed for a “differentiation of what we would call a major component and a minor component part”.[237] He said that, given the number of peaks at some locations, he “determined there was at least three individuals ... in the major component”[238] and “at least one” in the minor component,[239] yielding a minimum of four contributors.[240] In relation to the major component, Mr Walton said that at least one of the contributors was male and most likely all were. However, he could not exclude a female contributor,[241] although “[if] there’s a female present, then it probably indicates it’s more likely to be in the minor component”.[242]
108. Mr Walton said that each of Min, Henry and Terry Lin could not be excluded as one of the major contributors because all their genotypes (ie, pairs of alleles) were present in the major component.[243] He said that each of Lily Lin and Irene Lin could be excluded as major contributors, but could not be excluded as minor contributors.[244] He added that the sharing of allele pairs by the major contributors “can be indicative of related individuals”.[245] Mr Walton stated that, at one of the loci for Stain 91, the major component was all “consumed” with a peak for a STR of “11”, meaning that all of the contributors to the major component were “homozygous”, ie, of the same allele type.[246] Mr Walton said that the appellant, Kathy Lin and XX were excluded as contributors to the major component.[247] He said he could not exclude Kathy Lin and XX as contributors to the minor component, but contrasted their position with that of Lily and Irene Lin in relation to the minor component as follows:[248]
“...They [Lily and Irene Lin] were not excluded [from the minor component], because I could see all their types, but many others, such as Kathy could not be excluded because even though I couldn’t see all their types, there is a possibility that their types could be present and just have dropped out of the mixture, and we could not see them.”
109. Mr Walton then referred to the Y filer testing of Stain 91. With the major component, he stated that he could only identify one “Y profile” from the major component and it was the same Y profile that each of Min, Henry and Terry Lin shared. It was different to the appellant’s and XX’s Y profile.[249] In relation to the minor component, Mr Walton said that there was “no trace of a second Y” so that it was “probable that they are not in the mixture”, although that possibility could not be excluded.[250]
110. As noted above, a Y filer test generates a profile referable to 16 loci on the Y chromosome. Mr Walton said that the Y profile in the major component of Stain 91 which matched Min, Henry and Terry Lin was compared to a database containing 2200 samples taken from NSW residents, one of which was Min Lin’s, his DNA having been collected following a theft from his newsagency some years previously.[251] The only match was to Min Lin’s sample. Based on that Mr Walton stated:[252]
“... we determined that all male relatives of the Lin family cannot be excluded [as contributors to the major component], and approximately one in 760 unrelated males in the general population cannot be excluded as the source of that Y profile.”
111. Mr Walton also said that the Y profile was compared to an international database containing 30,300 Y profiles, which included “just over 5,000 Chinese profiles”, and yielded no matches.[253]
112. Mr Walton concluded that the DNA mixture extracted from Stain 91 contained a minimum of four contributors (“definitely four”[254]) and possibly more (“[f]ive, yes, or six or seven or – yeah, or more”[255]). Mr Walton said he could not exclude Brenda Lin as either a contributor to the major component or minor component,[256] although if she was a contributor, it was more likely to the minor component than the major component given his conclusion that probably all three minimum contributors to the major component were males.[257] He undertook a so‑called “Random Man Not Excluded” calculation to determine how large a segment of the population was not excluded from contributing to Stain 91.[258] He determined that one in 730,000 members of the Australian Caucasian population and one in 210,000 members of the Australian South-East Asian population could not be excluded as a major contributor to the mixture.[259] He explained that, taking the latter figure, it was the “equivalent of saying out of every million people we would expect five to not be able to be excluded” from the DNA mixture in Stain 91.[260]
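[112] does not set out the formula behind the "Random Man Not Excluded" figure. A common textbook form, often called the combined probability of inclusion, is sketched below as an assumption rather than as the method FASS used: at each locus, the chance that a random person carries only alleles seen in the mixture is the squared sum of those alleles' frequencies (assuming Hardy-Weinberg proportions), and loci are multiplied together. The allele frequencies and function names are invented. The last line reproduces the arithmetic behind Mr Walton's restatement of one in 210,000 as about "five" per million.

```python
from math import prod

def probability_of_inclusion(mixture_alleles: set[int], allele_freqs: dict[int, float]) -> float:
    """Chance that a random person carries only alleles seen in the mixture at
    one locus: (sum of those alleles' frequencies) squared, assuming
    Hardy-Weinberg proportions."""
    p = sum(allele_freqs[a] for a in mixture_alleles)
    return p * p

def random_man_not_excluded(mixture: dict[str, set[int]],
                            freqs: dict[str, dict[int, float]]) -> float:
    """Combine loci by multiplication (independence assumption)."""
    return prod(probability_of_inclusion(alleles, freqs[locus])
                for locus, alleles in mixture.items())

def expected_per_million(one_in_n: float) -> float:
    """Convert a '1 in N' figure into an expected count per million people."""
    return 1_000_000 / one_in_n

# Hypothetical allele frequencies and mixture alleles at two made-up loci.
freqs = {"A": {10: 0.1, 11: 0.2, 12: 0.3, 13: 0.4}, "B": {7: 0.5, 8: 0.25, 9: 0.25}}
mixture = {"A": {10, 11, 12}, "B": {8, 9}}
rmne = random_man_not_excluded(mixture, freqs)
print(f"about 1 in {1 / rmne:.1f}")              # about 1 in 11.1
print(round(expected_per_million(210_000), 2))   # 4.76
```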
113. The cross‑examination of Mr Walton did not seek to challenge any of the above conclusions but was instead directed to the validation studies of Dr Perlin’s TrueAllele system, and the potential for human bias to affect TrueAllele’s results. The evidence in relation to the validation of TrueAllele is addressed in detail below. In summary, Mr Walton explained that TrueAllele had been validated for “four contributors to a mixture and are trying to do five people” but the process was time consuming.[261] In relation to the potential for human bias, Mr Walton observed that the “computer program will analyse the sample without any bias”.[262]
114. Mr Walton was also cross‑examined on the possibility that, in the swabbing of Stain 91, the forensic biologists may have collected DNA from outside the area of the stain,[263] and on the results for other stains.[264] He explained that with some very small peaks on an EPG an analyst will exercise a judgment to exclude that peak as representing a peak from a contributor and instead treat it as “static”.[265]
Dr Perlin’s Evidence
115. During a break in Mr Walton’s evidence, and in the absence of the jury, it emerged that counsel for the appellant was raising an objection to the admissibility of Dr Perlin’s evidence. Whether that objection was ultimately pressed and its disposition is the subject of Ground 2 of the appeal. It is addressed below and rejected. Of present relevance is that, after Mr Walton had left the witness box for the second time and before Dr Perlin was called, counsel for the appellant agreed that it “must be accepted” that the DNA “of the three deceased males was ... represented in [the] mixed profile”;[266] ie, it was accepted that the DNA mixture obtained from Stain 91 included the DNA of Min, Henry and Terry Lin. Counsel for the appellant advised that the real issue was whether Brenda Lin was also present and Irene and Lily Lin were not.[267]
116. This concession is of particular significance to an assessment of the scope and content of the evidence that was led from Dr Perlin at the 2016 trial and the criticisms that are made of it on appeal, especially those that concern the potential presence of DNA from Min, Henry and Terry Lin in the DNA mixture obtained from Stain 91. In particular, these latter criticisms all concern attempts to reagitate a matter that was not in issue at the 2016 trial. The making of this concession by counsel for the appellant is one of the particulars of Ground 8 of the appeal. It is addressed below and also rejected.
Dr Perlin’s Qualifications
117. Amongst other academic qualifications, Dr Perlin has a PhD in mathematics, a PhD in computer science and a medical degree from the Pritzker School of Medicine at the University of Chicago. His PhD in mathematics concerned probability theory and his PhD in computer science concerned artificial intelligence.[268] At the time Dr Perlin gave evidence at the 2016 trial, he was an adjunct Professor at Duquesne University in Pittsburgh, Pennsylvania and a member of the American Academy of Forensic Sciences, the American Society of Human Genetics and the American Statistical Association.[269] He has published extensively on the topic of statistics and DNA analysis.[270]
118. Dr Perlin is the Chief Executive Officer and Chief Scientist of Cybergenetics, a company owned by him and his family, which predominantly undertakes forensic DNA analysis, although it also undertakes research in “genetics, cancer research and medical diagnostics”.[271]
119. Dr Perlin told the jury that he had been involved in “400 cases to date” for either the prosecution, police forces, defence or “innocence projects”, predominantly in the USA,[272] but also in Canada, England and the Netherlands.[273] Of those 400 cases, 79 cases involved an examination of samples that had five or more contributors, being a total of 134 items.[274] A list of those cases was marked for identification.[275]
TrueAllele’s Determination of Match Statistics
120. Dr Perlin developed the TrueAllele software. He explained its operation to the jury at the 2016 trial. Dr Perlin stated that, with each of the 20 loci the subject of the PowerPlex 21 tests, there were approximately 100 potential allele pairs. Treating the areas as genetically independent meant that there are $100^{20}$ (or $10^{40}$) possible combinations of allele pairs or genotypes “which is a vast number of possible genotypes relative to the number of people [on] earth”, being $10^{10}$ (in fact less).[276]
121. The first step undertaken by TrueAllele is to take the results of the DNA testing of the evidence sample provided, in this case Stain 91, and without considering any reference sample (eg Min, Henry or Terry Lin’s DNA), determine the probability that a particular contributor had a particular allele pair at a particular locus.[277] The result of this step is sometimes described as the “inferred genotype”. The nature of this inferred genotype and what was conveyed about it at first instance was the subject of debate on appeal. It suffices to state at this point that it is not a single set of 20 allele pairs, but instead a probability distribution of allele pairs, in that it attributes a probability to each possible allele pair that may be found at a particular locus. As we will explain, contrary to the appellant’s submissions, that matter was clearly conveyed by Dr Perlin to the jury and judges who sat at first instance.
122. Dr Perlin explained to the jury the process of inferring the probability of a particular allele pair at a locus being a contributor to a mixed DNA sample.[278] He commenced with a simple example in which the jury was shown a slide depicting a sample EPG result for one locus showing a STR peak for each of 10, 11 and 12, with the peak for 12 twice the height of each of the peaks for 10 and 11.[279] On the assumption there were two contributors and one of the contributors had an allele pair of 10,12, then by a process of matching by trial and error, or in effect running simulations using the “Markov-chain Monte Carlo” method,[280] it could be inferred that the probability that the other contributor had an allele pair of 11,12 was 50%, the probability they had an allele pair of 11,11 was 30% and the probability they had an allele pair of 10,11 was 20%.[281]
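[122] describes inferring the second contributor's genotype by trial and error, which TrueAllele does with Markov chain Monte Carlo sampling. Because this toy example has only six candidate genotypes, the sketch below swaps MCMC for exhaustive enumeration and assumes a simple Gaussian peak-height model with an arbitrarily chosen noise level (the hypothetical `SIGMA`), a grid over the first contributor's mixture proportion, and a uniform prior over candidates. It is not TrueAllele's model, and its sharper result (almost all probability on 11,12) should not be compared with the illustrative 50/30/20 split given to the jury.

```python
from itertools import combinations_with_replacement
from math import exp

# The worked example from [122]: peaks at 10, 11 and 12, with the 12 peak
# twice the height of the others (relative units, not real RFU values).
OBSERVED = {10: 1.0, 11: 1.0, 12: 2.0}
KNOWN = (10, 12)   # genotype assumed for the first of two contributors
ALLELES = sorted(OBSERVED)
SIGMA = 0.2        # hypothetical peak-height noise level
WEIGHTS = [i / 100 for i in range(1, 100)]  # grid of first contributor's share

def expected_heights(other: tuple[int, int], w: float) -> dict[int, float]:
    """Split each contributor's share of the total signal equally between
    their two allele copies (a homozygote puts both copies on one peak)."""
    total = sum(OBSERVED.values())
    heights = {a: 0.0 for a in ALLELES}
    for allele in KNOWN:
        heights[allele] += w * total / 2
    for allele in other:
        heights[allele] += (1 - w) * total / 2
    return heights

def likelihood(other: tuple[int, int]) -> float:
    """Gaussian peak-height fit, averaged over the grid of mixture weights."""
    score = 0.0
    for w in WEIGHTS:
        expected = expected_heights(other, w)
        sq_err = sum((OBSERVED[a] - expected[a]) ** 2 for a in ALLELES)
        score += exp(-sq_err / (2 * SIGMA ** 2))
    return score / len(WEIGHTS)

def genotype_posterior() -> dict[tuple[int, int], float]:
    """Uniform prior over every pair drawn from the observed alleles."""
    candidates = list(combinations_with_replacement(ALLELES, 2))
    scores = {g: likelihood(g) for g in candidates}
    z = sum(scores.values())
    return {g: s / z for g, s in scores.items()}

for genotype, p in sorted(genotype_posterior().items(), key=lambda kv: -kv[1]):
    print(genotype, f"{p:.4f}")
```

With an equal mixture, 10,12 plus 11,12 reproduces the 1:1:2 peak pattern exactly, which is why that candidate dominates here.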
123. Dr Perlin’s explanation to the jury of the next steps in the process of “inferring” genotypes utilised three slides. The jury was first shown the following two slides which set out a portion of the EPG for Stain 91 at a particular locus known as “FGA”:[282]
Slide 6[283]
Slide 7[284]
124. The figures 19, 21, 22, 24 and 25 represent the STR (ie, number of repeats) yielded by the sample at this locus.[285] The y‑axis measures the RFU of the relevant STR peak. As noted, it is in effect a measure of the quantity of the DNA mixture that has that STR score.[286] With the smaller peaks, Dr Perlin, referring to slide 7, explained that human analysts will apply a RFU threshold before they will regard such a peak as recording the presence of an allele with a particular STR.[287] Dr Perlin explained that the analysis undertaken by TrueAllele, indicated by slide 6, uses all the data, including very small peaks, “as potential events that come from alleles”,[288] although the size of the peak affects the probability attributed to its presence because it is considering the “degree” to which an allele is present.[289] Dr Perlin also stated that TrueAllele uses the pattern of highs and lows in the peak heights in inferring genotypes, as well as variations in the peaks.[290]
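The contrast in [124] between a human analyst's RFU threshold and TrueAllele's use of all the data can be illustrated with a minimal sketch. The allele numbers are those on slides 6 and 7, but the peak heights and the 50 RFU threshold are invented, and the proportional weighting is only a stand-in for TrueAllele's much richer model, which [124] says also uses the pattern of peak heights and their variation.

```python
# Hypothetical peak heights (RFU) at one locus, using the allele numbers
# shown on slides 6 and 7; the heights themselves are invented.
PEAKS = {19: 1200.0, 21: 950.0, 22: 40.0, 24: 610.0, 25: 25.0}
ANALYTICAL_THRESHOLD = 50.0  # hypothetical; each laboratory validates its own

def called_alleles(peaks: dict[int, float], threshold: float) -> list[int]:
    """Threshold-based reading: peaks below the threshold are disregarded."""
    return sorted(a for a, rfu in peaks.items() if rfu >= threshold)

def signal_shares(peaks: dict[int, float]) -> dict[int, float]:
    """All-data reading: every peak is kept, in proportion to its height."""
    total = sum(peaks.values())
    return {a: rfu / total for a, rfu in peaks.items()}

print(called_alleles(PEAKS, ANALYTICAL_THRESHOLD))  # [19, 21, 24]
print({a: round(s, 3) for a, s in signal_shares(PEAKS).items()})
```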
125. Dr Perlin then explained the operation of TrueAllele by reference to the following slide:
Slide 8[291]
126. This slide is a graphical representation of one combination of allele pairs that are generated by TrueAllele to explain the results for the DNA mixture at locus FGA (ie, “inferred” allele pairs). Dr Perlin told the jury that in this example TrueAllele has been given an assumption of three contributors[292] and, as this combination of allele pairs has a result that is “very similar to the underlying data”, it “confers [a] higher probability to the different genotypes [on] this proposed pattern”.[293] This process of generating combinations of allele pairs also includes variations in the level of contribution from each such pair to the entire DNA sample, bearing in mind the RFU level for each STR score; eg, for the third person’s allele pair depicted in this slide “there may be a 10% contributor instead of a 50% contributor”.[294] However, the other possible allele combinations are not excluded by TrueAllele. Instead, the higher probabilities are attributed to the allele pairs noted above and all the other allele pair possibilities are assigned relatively low probabilities.[295]
127. Dr Perlin explained that this process is repeated as follows:[296]
“... What the computer does is it takes a look at all the different possibilities of closely‑fitting patterns like this one, and 100,000 other patterns that denote it as well, and from that it gives greater probability to the genotypes that participate in better explanations, lower probability to those genotype values that are not part of better explanations and, when it's done for every area or locus and for each of the contributors ‑ three in this case ‑ it assigns probability to each of the possibilities and what it does is it works out what the probabilities are for each genotype at each area and for each contributor.”
128. In this extract Dr Perlin referred to “100,000 other patterns”. He later explained that TrueAllele ran “100,000 or 200,000 different possibilities for each locus”.[297]
129. A simplified example of the result of this process just for the locus “FGA” was then presented to the jury in the following slide:
Slide 9[298]
130. Dr Perlin explained that this simplified probability distribution for locus “FGA” concerned the first of the three assumed contributors for the DNA sample being considered.[299] Thus, for that contributor there would be a 35% probability that it would have an allele pair “19,21”. The probabilities in this slide total 99%. The remaining 1% is spread between all the other possible allele pair combinations that could occur at this locus. Dr Perlin told the jury that the “probability is concentrated largely” on the “three possible genotypes, the 19,21, the 19,24 and the 21,24” and the “other 95 or so possibilities have lower probabilities than these genotype possibilities”.[300]
131. As it is of some significance to the complaints made on appeal, it is appropriate to note Dr Perlin’s explanation of the process undertaken by TrueAllele at this point:[301]
“Q. Therefore, it [TrueAllele] goes through creating these possibilities for each of the three [assumed] contributors in respect of the particular data; is that right? A. Yes, so ‘for FGA’ there’d be a genotype of probabilities like this for the blue contributor, that’s the major; there’d be a separate genotype for the orange contributor, that’s the middle genotype; and for the third genotype, that’s the minor one in green, there would be a listing with probabilities, like what we’re seeing here on slide 9, there would be a different genotype for the minor contributor.” (emphasis added)
132. The references to colours in this answer are to those set out in slide 8 (see [125]). The significance of this answer is that Dr Perlin is clearly describing that each inferred genotype for each assumed contributor that is created by TrueAllele is a set of probabilities of the presence of a particular allele pair at each locus; ie, a probability distribution. In his evidence on the voir dire before Johnson J, Dr Perlin explained that with “probabilistic genotyping ... you end up with a probability distribution for each of the separated contributors” and not an “exact” or “definite genotype” which is not “needed to compute a match statistic”.[302]
133. Having generated the probability distribution for each locus without regard to any reference sample, TrueAllele then compares that distribution to the DNA profile of the reference sample (eg, Min or Terry Lin) to generate a “match statistic”. This was explained to the jury using the following slide:
Slide 10[303]
134. The blue bars on the left of each pair in this chart represent the calculated probability that each allele pair formed part of the genotype of this inferred contributor to the analysed mixture. The brown bars on the right of each pair “represent the genotype of a random person ... selected from the population”[304] and, in particular, the probability of that allele pair’s occurrence in the general population (or in this case the Australian Asian Population).[305] Dr Perlin explained that there would be about “100 of those [allele pairs], even though only four are shown”, representing all the possible allele pair combinations at that locus.[306] The circled pair of bars concerns the allele pair “19,24” which is the allele pair possessed by Terry Lin’s DNA at this locus;[307] ie, the allele pair at this locus of the reference sample. The figure of 2.5% represents the probability of the random occurrence of that allele pair in the Australian Asian Population (as derived from the Population or Frequency Database for the Australian Asian Population).[308]
135. From these figures the “match statistic” is derived. Dr Perlin explained that “what the [math] lets us do, essentially, is ignore the other allele pairs and simply divide the probability of the genotype after we've seen the data, the probability of an evidence match, the blue bar height, by the chance of a coincidental match, the brown bar height”.[309] In the case of Terry Lin, at this locus this meant dividing 34% by 2.5% which the slide suggests is 13 (but is in fact 13.6). This is the match statistic for this locus. This process is then repeated at all 20 loci with the match statistics multiplied by one another.
136. A total match statistic that exceeds one provides “inclusionary support” for the reference sample while a match statistic that is less than one provides “exclusionary support”.[310] The extent to which the total match statistic exceeds or is lower than one is a measure of the strength of that support.[311] Hence, a total match statistic of one in a million is relatively strong exclusionary support and a match statistic of a million is relatively strong inclusionary support.[312] In the above example, the allele pair “24,24” would produce a match statistic at that locus of less than 1 (as the right hand bar exceeds the left hand bar), which would tend to exclude someone with that allele pair. However, any overall assessment would be based on the total match statistic derived from multiplying the statistics for all 20 loci.[313] Dr Perlin stated that it was not uncommon to obtain different results for different loci.[314]
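The arithmetic in [135] and [136] can be reproduced from the slide figures. In the sketch below, the FGA probabilities and the 2.5% frequency come from [130] and [134] (the 30% for 21,24 is inferred from the stated 99% total); the function names and the three further per-locus statistics are hypothetical, supplied only to show that per-locus statistics multiply and that the product's position above or below one gives the direction of support.

```python
from math import log10, prod

# Slide 9 / slide 10 figures for locus FGA and the first inferred contributor.
# Only the three concentrated genotypes are listed; the remaining 1% is omitted.
INFERRED_FGA = {(19, 21): 0.35, (19, 24): 0.34, (21, 24): 0.30}
# Only the one population frequency given in [134] (Australian Asian, 19,24).
POPULATION_FGA = {(19, 24): 0.025}

def locus_match_statistic(inferred: dict, population: dict, reference_pair: tuple[int, int]) -> float:
    """Blue bar divided by brown bar for the reference person's allele pair.
    Other pairs, even more probable ones such as 19,21, do not enter."""
    pair = tuple(sorted(reference_pair))
    return inferred[pair] / population[pair]

def combined_match_statistic(locus_statistics: list[float]) -> float:
    """Loci are treated as independent, so per-locus statistics multiply."""
    return prod(locus_statistics)

def support(statistic: float) -> str:
    if statistic > 1:
        return "inclusionary"
    if statistic < 1:
        return "exclusionary"
    return "neutral"

fga = locus_match_statistic(INFERRED_FGA, POPULATION_FGA, (24, 19))
print(round(fga, 1))                          # 13.6

# Hypothetical statistics for three further loci, to show the combination step.
combined = combined_match_statistic([fga, 4.0, 0.5, 20.0])
print(round(combined, 1), support(combined))  # 544.0 inclusionary
print(f"log10 = {log10(combined):.2f}")       # log10 = 2.74
```

The 0.5 entry shows a single locus pointing the other way, consistent with Dr Perlin's remark that different loci can give different results, while the product remains inclusionary.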
137. At this point, it is appropriate to explain the underlying rationale for the comparison between the probability of occurrence of the reference sample’s allele pair in the relevant mixture and the probability of its occurrence in the general population. The heading to slide 10 asked “How much more does the victim match the evidence than a random person?” (see [133]). Before the jury Dr Perlin expanded on this as follows:[315]
“Q. So in terms of that 13, does that represent how much more Terry matches the evidence than a random person does, in accordance with your heading on the slide? A. Yes, there are several ways to look at the math. There are a number of different formulas. They all give the same number. In the end it is a match statistic. One can look at the math as a relationship between the degree of match of a victim's evidence and a random person, or one can look at ratios of genotype probabilities. They give you the same number. They are different ways of understanding the same concept. Q. Does the concept of likelihood ratio come in, in terms of being another way of describing essentially a match statistic? A. Yes, there's a mathematical rearrangement that can be done with formulas. I've written on it so other scientists using Bayes' theorem, which is a way of bringing likelihoods in and probabilities, and rearranging, in which there's an equivalent form where this same number can be written algebraically as a likelihood ratio, the support of the evidence under one hypothesis divided by the support for evidence conditioned on a different hypothesis.” (emphasis added)
138. The first answer in the above extract clarifies that the match statistic involves a comparison between the “degree” of the match between, in this case, a victim’s DNA and the evidentiary sample, with the degree of a match with a random person. This answer is of particular significance because of the complaint made on appeal that Dr Perlin’s methodology was flawed because, inter alia, it ignores the existence of allele pairs with higher probabilities at a particular locus than the reference sample, such as “19,21” which had a 35% chance of contributing, compared to “19,24” which had a 34% chance of contributing to the DNA mixture. At this point it suffices to note that Dr Perlin clearly told the jury that the other allele pairs were ignored at this point of the analysis because the relevant comparison was between the probability of occurrence of the reference sample and the probability of occurrence of a random person.
139. As stated by Dr Perlin, this reasoning deploys “Bayes' Theorem” which is a means of describing the probability of an event based on prior knowledge of conditions that might be related to the event. The theorem holds that the odds of a hypothesis given certain evidence (the “posterior odds”) equals the odds of the hypothesis without that evidence (the “prior odds”) multiplied by the ratio between the probability of that further evidence if the hypothesis is true and the probability of that further evidence if the hypothesis is not true (ie, the likelihood ratio) (see D Hodgson, “A Lawyer looks at Bayes Theorem” (2002) 76 ALJR 109 at 109 to 110; cited by Spigelman CJ in R v Galli [2001] NSWCCA 504; (2001) 127 A Crim R 493; “Galli” at [55]). The application of Bayes’ Theorem by juries to non‑statistical evidence or a combination of statistical and non-statistical evidence has been discussed but generally deprecated (Hodgson supra; R v Denis Adams [1996] 2 Cr App R 467; R v Denis Adams (No 2) [1998] EWCA Crim 2364; [1998] 1 Cr App R 377).
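In symbols, the odds form of Bayes’ Theorem described in [139] can be written as follows. The labels are notation introduced here for illustration: E is the DNA evidence, H_p is the hypothesis that the reference person contributed to the sample, and H_d is the hypothesis that an unrelated person from the relevant population did instead. The middle term is the likelihood ratio which, on Dr Perlin’s evidence at [137], is algebraically equivalent to the match statistic.

```latex
\underbrace{\frac{P(H_p \mid E)}{P(H_d \mid E)}}_{\text{posterior odds}}
  = \underbrace{\frac{P(E \mid H_p)}{P(E \mid H_d)}}_{\text{likelihood ratio}}
  \times
  \underbrace{\frac{P(H_p)}{P(H_d)}}_{\text{prior odds}}
```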
140. The underlying logic behind the application of Bayes’ Theorem by TrueAllele in this context can be illustrated by a simplified analogy based on an explanation that Dr Perlin provided in his evidence before this Court.[316] Assume that TrueAllele determines the probability that an inferred contributor to a DNA mixture had a particular allele pair is 30%, that that allele pair corresponded with a person having blonde hair and that a reference DNA sample of a victim also contained that allele pair. (Generally, however, nuclear DNA testing does not concern genes that affect a person’s appearance or other characteristics.) According to Dr Perlin, that would not provide information about the likelihood that the victim’s DNA was present in the mixture unless it was also known what the incidence of that allele pair is in the relevant population. If, for example, all the relevant events and persons concerned the Scandinavian states, and assuming that the incidence of this allele pair resulting in blonde hair in the Scandinavian states was 60%, then the TrueAllele type analysis at this locus would yield a match statistic of ½ (30/100 divided by 60/100). Such a figure would tend to exclude that victim as a contributor to the sample (although any assessment of that would require a consideration of all allele pairs). On the other hand, if all the relevant events occurred somewhere else in the world where the incidence of this allele pair at this locus resulting in blonde hair was say, 3%, then that would yield a match statistic of 10 (30/100 divided by 3/100) and that circumstance would tend to support a conclusion that the victim was a contributor to the sample. In the Scandinavian states example, there is a lower chance that the inferred contributor has blonde hair compared to the general population and that circumstance tends to exclude a victim with blonde hair as having contributed to the sample. In terms of Bayes’ Theorem, the prior odds decrease by a factor of 2.
In the other example, there is a higher chance that the inferred contributor has blonde hair compared to the general population and that circumstance tends to include a victim with blonde hair as having contributed to the sample. In terms of Bayes’ Theorem, the prior odds increase by a factor of 10.
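The two limbs of the blonde hair analogy in [140] reduce to a division followed by a multiplication of the odds. The sketch below uses only the 30%, 60% and 3% figures from the analogy; the function names are hypothetical, and the prior odds of 1 used in the example are an arbitrary placeholder chosen solely to show the scaling.

```python
def match_statistic(p_inferred: float, p_population: float) -> float:
    """Probability the inferred contributor has the allele pair / its population incidence."""
    return p_inferred / p_population


def update_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' Theorem: posterior odds = likelihood ratio x prior odds."""
    return likelihood_ratio * prior_odds


scandinavia = match_statistic(0.30, 0.60)  # one half: tends to exclude
elsewhere = match_statistic(0.30, 0.03)    # ten: tends to include

prior = 1.0  # placeholder prior odds, used only to show the factor applied
print(update_odds(prior, scandinavia))  # prior odds halved
print(update_odds(prior, elsewhere))    # prior odds multiplied by 10
```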
Reproducible Results
141. Dr Perlin explained that, as the first step in this process involves the running of hundreds of thousands of simulations, then it follows that a rerunning of TrueAllele will not produce an identical probability distribution even if the same number of contributors is assumed and the same number of simulations is performed. Instead, one should expect “to get a slightly different result”.[317] He said that where a number of runs are performed, he would report a “typical value”.[318] In this case, “computer runs [were undertaken] assuming three, four or five contributors”, but Dr Perlin chose to report “the consistent, or more conservative numbers that [were] found on the three assumed contributor runs”.[319] However, as indicated below, he gave evidence before the jury of results for four and five contributor runs as well in respect of each person.
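The observation that a rerun gives “a slightly different result” is a general property of simulation-based estimates rather than something peculiar to TrueAllele. The sketch below does not reproduce TrueAllele’s model; it simply estimates a known probability (0.34, borrowed from the blue bar in [134]) by random sampling under several seeds and reports the median as one possible “typical value”. The use of the median, the seeds and the run size are all assumptions made for illustration.

```python
import random
import statistics


def estimate_probability(true_p: float, n_samples: int, seed: int) -> float:
    """Monte Carlo estimate of a probability: fraction of uniform draws below true_p."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples) if rng.random() < true_p)
    return hits / n_samples


# Five "runs" of the same simulation, differing only in their random seed
runs = [estimate_probability(0.34, 100_000, seed) for seed in range(5)]
typical = statistics.median(runs)  # one way of summarising repeated runs
print(runs)
print(typical)
```

Each run lands close to 0.34 but not exactly on it, and no two runs need agree, which is the behaviour Dr Perlin described.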
TrueAllele’s Determination of Contribution Levels
142. As noted, after being given an assumption about the number of contributors in a sample, TrueAllele then determines a DNA profile of each such contributor in the form of a probability distribution for each possible allele pair at each locus (the so called “inferred contributor”). In doing so, it also determines a percentage contribution that each such inferred contributor made to the DNA sample based on the heights of the various peaks on the EPG. Dr Perlin told the jury that “as the computer is determining the different variables that it’s solving for, one of those variables is the relative quantities of the amounts of DNA”.[320] Dr Perlin told the jury that there is a “strong correlation” between the calculated contribution percentage for an inferred contributor and the size of the match statistic.[321]
Match Statistics for Stain 91 and Min, Henry and Terry Lin
143. The jury was then shown a slide that Dr Perlin explained set out the match statistic at each of the 20 loci for Terry Lin, assuming there were three contributors to the DNA found in Stain 91.[322] The jury was then shown the following slide which set out the match statistic for Terry Lin derived from multiplying those 20 match statistics:[323]
“Is the victim in the evidence? A match between the garage floor and Terry Lin is; 50.4 quadrillion times more probable than a coincidental match to an unrelated Asian person 80.8 quintillion times more probable than a coincidental match to an unrelated Caucasian person.”
144. The logarithmic equivalent to the first of these figures is 16.70 (ie, 10^16.7). Dr Perlin told the jury that if four contributors were assumed then the result is “essentially the same”, being 16.86 (ie, 10^16.86 compared to 10^16.7).[324] Similarly, with five assumed contributors the match statistic was 10^13.11.[325] Dr Perlin said that each of these figures was associated with what TrueAllele determined was the major component of the mixture, specifically 59% when three contributors were assumed, 55% when four contributors were assumed and 31% when five contributors were assumed.[326]
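Because the results are reported both as plain numbers and as base-10 logarithms, it may help to see the conversion. The sketch takes the figures on the slide in [143] and the logarithmic values in [144]; the helper name `fold_difference` is introduced here for illustration. It shows, for example, that 10^16.86 is only about 1.4 times 10^16.7, which is consistent with Dr Perlin describing the four-contributor result as “essentially the same”.

```python
import math

# Terry Lin, three assumed contributors, as stated on the slide
asian = 50.4e15      # "50.4 quadrillion"
caucasian = 80.8e18  # "80.8 quintillion"


def fold_difference(log10_a: float, log10_b: float) -> float:
    """How many times larger statistic a is than statistic b, given their log10 values."""
    return 10 ** (log10_a - log10_b)


print(f"{math.log10(asian):.2f}")      # 16.70
print(f"{math.log10(caucasian):.2f}")  # 19.91
print(round(fold_difference(16.86, 16.70), 2))  # four vs three assumed contributors
print(round(fold_difference(16.70, 13.11)))     # three vs five assumed contributors
```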
145. Dr Perlin told the jury that, from his knowledge of the various validation studies that have been performed, the instances of “false positives” (ie, an inclusionary match statistic for someone whose DNA is not present[327]) decrease as the match statistics increase. He said that once they “reach a match statistic of 1,000 or three zeros we rarely, if ever in the studies, will see a false positive match even if millions of comparisons are examined”.[328] He was then asked:[329]
“Q. By the time you are getting to a match statistic of 1000 you’re not seeing false positives, is that what you’re saying? A. Yes. Q. So bearing that in mind, in respect of the testing that was done for Terry Lin and item 550, and the statistics that you have outlined, what is your view as to whether or not he is included in item 550? A. With match statistics ranging from 10 trillion to 10 quadrillion, it would be extremely unlikely, in fact, a chance of less than 1 in 10 trillion, that he wouldn't be present. Q. ‘He wouldn't be present’. So does that equate to being very strong statistical support for him being present? A. Yes, 1 in 10 trillion or 1 in 10 quadrillion is a very small number of a chance of a false positive.” (emphasis added)
146. The second answer in this extract is the subject of criticism by the appellant and Professor Gill on the basis that it evinces the “prosecutor’s fallacy”. This criticism is addressed below.[330]
147. In relation to Henry Lin, the jury was shown a slide similar to the one above which stated a “match between the garage floor and Henry Lin is ... 2.21 billion times more probable than a coincidental match to an unrelated Asian person” (assuming three contributors).[331] He was then asked:[332]
“HER HONOUR Q. So again, converting that, what is the way of expressing that statistically when the question is whether Henry Lin is not in the mix? You've said, so far as Terry is concerned, that statistically there is less than 1 in 10 million chance that Terry Lin is not in the mix. How does one express it for Henry Lin? A. I believe for Terry, the number would have been 1 over 10 trillion. HER HONOUR Q. Yes? A. Whereas for Henry Lin, the chance that he is actually not there, but we're getting a number this large, would be less than 1 in a billion; actually less than 1 in 2 billion; it's 1 over the match statistic.” (emphasis added)
148. Again, these answers were criticised on the basis that they are said to involve the prosecutor’s fallacy. As explained below, Dr Perlin defended these answers on the basis that mathematically he was correct in attributing the maximum likelihood of seeing this match statistic for a person who was not in fact a contributor as being the inverse of the match statistic.
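The arithmetic behind Dr Perlin’s position in [148] is usually stated as a bound: for a likelihood ratio, the probability that a non-contributor produces a ratio of at least t is no more than 1/t. This follows from Markov’s inequality, because the expected likelihood ratio for a non-contributor is at most 1. The direction matters: the bound concerns the chance of the statistic given non-contribution, not the chance of non-contribution given the statistic, and that distinction is the one on which the prosecutor’s fallacy criticism turns. The sketch checks the bound exactly on a small distribution whose four outcomes and probabilities are invented for illustration.

```python
from fractions import Fraction

# Hypothetical evidence outcomes with exact probabilities under each hypothesis
p_given_hd = [Fraction(70, 100), Fraction(20, 100), Fraction(9, 100), Fraction(1, 100)]
p_given_hp = [Fraction(1, 100), Fraction(9, 100), Fraction(20, 100), Fraction(70, 100)]


def likelihood_ratios(hp, hd):
    """Likelihood ratio P(E | Hp) / P(E | Hd) for each outcome."""
    return [a / b for a, b in zip(hp, hd)]


def tail_prob_under_hd(hp, hd, t):
    """P(LR >= t | Hd): the chance a non-contributor yields a ratio of at least t."""
    return sum(b for a, b in zip(hp, hd) if a / b >= t)


lrs = likelihood_ratios(p_given_hp, p_given_hd)
expected_lr_hd = sum(lr * b for lr, b in zip(lrs, p_given_hd))  # equals 1 here

for t in (2, 10, 70):
    tail = tail_prob_under_hd(p_given_hp, p_given_hd, t)
    print(t, tail, tail <= Fraction(1, t))
```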
149. Dr Perlin then provided the figures based on four assumed contributors. Dr Perlin referred to two “runs” of TrueAllele for four assumed contributors which produced match statistics for Henry Lin of 10^6.88 and 10^8.96 and a run of TrueAllele that assumed five contributors where the match statistics were “in the billions”.[333] In relation to the percentage contribution, Dr Perlin explained:[334]
“Q. Can you tell us what, if anything, the testing showed in relation to whether Henry Lin would be a major contributor or a minor contributor? A. Yes. In the reported three‑contributor run, he is in together with Terry's genotype, in the 59% component; in the four contributor run, he is in the 43, or 55% component; and in the five contributor run, where his match statistic is in the billions, there he is a 31% contributor, though a different 31% contributor than Terry.
That's assuming five contributors, they are separated out and each is one of the major 31% contributors, though separate 31% contributors, different genotypes.
Q. So overall, does that suggest that he is present as a major contributor within item 550? A. Yes, it does.” (emphasis added)
150. The jury was then shown a slide referable to Min Lin which stated that a “match between the garage floor and Min Lin is ... 226 thousand times more probable than a coincidental match to an unrelated Asian person” (assuming three contributors).[335] Dr Perlin explained that one four contributor run of TrueAllele yielded a match statistic for Min Lin “at the level of” 10^8.72, another four contributor run yielded a match statistic “in the, nine zeros” and a five contributor run yielded a “match statistic [that] had 11 zeros in it”, being 10^11.43.[336] Dr Perlin explained that his contribution was assessed as 30% when three contributors were assumed, “around 30% out of four contributors ... and, when assuming five contributors for that separation, his percentage was 26%”.[337] Dr Perlin was asked:[338]
“Q. So, again, in terms of the issue of him not being present within the mixture, how would you express that result? A. Looking over all the results, I would say that the chance of having a match statistic this high, but Min not being present is in the order of 1 in a billion. Q. And you told us earlier in your evidence about what you've done, in terms of testing and the concept of false positives. Does that figure that you've just outlined really demonstrate just how far beyond the false positive number that you've given of the thousand, potentially, are relevant to the relevant statistical number, in this instance? A. Yes. It's based on looking at all the match statistics.”
Match Statistics for Irene and Lily Lin
151. The jury was then shown a slide stating that a “match between the garage floor and Yun Bin Lin [ie, Irene Lin] is ... 28.5 thousand times more probable than a coincidental match to an unrelated Asian person” (assuming three contributors).[339] Dr Perlin said that with four assumed contributors the relevant match statistic was 10^4.37 and with five contributors it was 10^5.23.[340] In relation to whether she was present as a major or minor contributor, Dr Perlin stated that, with three assumed contributors, “she is present in a 7% component, along ‑ in a female fraction”, with the results “getting a better separation with four, in one result she was at 11%, another at 8%. And with five unknown contributors ... she is present at 11%”.[341] It was put to Dr Perlin in cross‑examination that the result for Irene Lin was “miserably unreliable”, a proposition he rejected.[342]
152. Bearing in mind the match statistics associated with Irene Lin, Dr Perlin was reminded of his earlier evidence of having seen “false positives”. He was then asked:[343]
“Q. You mentioned before that by the time you got to 1,000 you weren't really seeing false positives based on your studies; is that correct? A. Yes, they would occasionally appear maybe at the level of 1 in a million, but yes. Q. What about at the level of 10,000 in terms of a statistic? A. I don't think we've seen false positives at that level.”
153. The jury was then shown a slide referable to Lily Lin which stated that a “match between the garage floor and Yun Li Lin [ie, Lily Lin] is: 289 times more probable than a coincidental match to an unrelated Asian person” (assuming three contributors).[344] Dr Perlin told the jury that for four assumed contributors her match statistic was 10^4.45 on one run and 10^4.04 on the other and for five assumed contributors it was 10^3.58.[345] In relation to the level of her contribution, Dr Perlin explained that “she [was] following the same genotype as Irene in each case”, that is, a minor contributor.[346]
154. When it came to expressing an opinion on the presence of Lily Lin’s DNA in the DNA taken from Stain 91, Dr Perlin was circumspect. He observed that “based on validation studies, it’s more probable that [her result] could be a false positive than with the higher statistics”.[347] In cross‑examination he agreed that he was “not able to say whether or not Lily is present”.[348]
Brenda Lin, Shadowing and the Table of Match Statistics
155. Dr Perlin also referred to a “notion ... in family studies” which he called “shadowing”,[349] whereby an “individual is not present [in the DNA sample], but they are giving an apparently strong match statistic because of the close genetic relationship between the true contributor and the relative”.[350] He was asked as follows:[351]
“Q. So what you're saying in this case is, that looking at the results, that that comes to mind in respect of Lily because she is present when Irene is present in relation to a particular inferred genotype and that she is present at a lower match statistic; is that correct? A. Correct, she is present. It's more of a female fraction. She is only present when her sister is present, and the degree to which there's roughly equal amounts of their contribution, and they are both there as opposed to their having similar genotypes, and one sister shadowing the other with one truly in there, and the other one not, is hard to determine with certainty. Q. In terms of the example that you've given, what we're working off in terms of, you're saying Irene can be truly in there, but Lily could not be, but is only returning those match statistics because of Irene; is that what you're saying in simple terms? A. Yes, that is a possibility. That's a good possibility as well as that they are both in there, that's another possibility. HER HONOUR Q. But there's no possibility is there, on the work done by the computer that neither of them [ie, Irene and Lily Lin] are there? A. No, not based on what I'm seeing. Q. So it's one or other or both of the related women or the related females? A. Correct, and more likely if it were only one, it would be Irene as opposed to Lily.” (emphasis added)
156. The emphasised answer in this passage was criticised by Professor Gill on the basis that it somehow displayed the prosecutor’s fallacy. This criticism is also addressed below.[352] At this point it suffices to observe that Dr Perlin’s answer about there being no possibility that neither woman’s DNA was in Stain 91 was given in the context of his having discussed documented error rates, provided the jury with match statistics for each of them, and then discussed the inter-relationship between the match statistics of Irene and Lily Lin and the possibility of shadowing between them.
157. Dr Perlin expanded on the concept of “shadowing” stating:[353]
“If you have an individual and you obtain a match statistic, say it’s at the level of a trillion or 12 zeros, what you’ll often observe is if you take the same evidence genotype and compare it against a close relative, like a father, a child, or a sibling, you may get a match statistic that is a million, maybe six zeros less, just because they are relatives and not because they are actually present. So that would be to the same contributor that you’ve separated out, if you see a positive match statistic that’s indicating [inclusion],[354] but it’s a lot less than the main match. Studies show that that’s often shadowing, that individual is not present, but they are giving an apparently strong match statistic because of the close genetic relationship between the true contributor and the relative.”
158. As noted, one matter raised by the appellant as potentially defeating the incriminating effect of a conclusion that a stain containing DNA from multiple victims was found in his garage was the possibility that it also contained Brenda Lin’s DNA. Accordingly, Dr Perlin addressed TrueAllele’s assessment of the probability that Stain 91 contained DNA from her. The jury was shown a slide referable to an assumption of three contributors stating that a “match between the garage floor and Brenda Lin is: 69.3 more times more probable than a coincidental match to an unrelated Asian person”.[355]
159. Dr Perlin was then asked:[356]
“Q. Can you tell us what happened in relation to the statistics for Brenda Lin when you did the runs on the basis of the number of assumed contributors, being four, and then in relation to the assumed contributor number being five? A. Yes, what happened is that the ‑ assuming four contributors, the number increased to four zeros or tens of thousands, staying within that female separated genotype, the 11% fraction of the two other women. When we ‑ let me check ‑ when the computer, the other run with four unknown contributors was at the level of thousands, it was at 3.16, again within ‑ actually I'm looking for the largest number here. Give me one second. So with four contributors, the numbers were similar, they were all in the thousands and all in fraction ‑ separations that involved other relatives, brothers or ‑ so the numbers went in the thousands with some shadowing relatives. And then with five contributors, the number went up to the tens of thousands with a 4.91, and there again was shadowing for brothers and her father. So—" ... Q. In terms of, from what you've just said, is it correct then that in respect of Brenda Lin, you're saying that in relation to her statistics, that she was either shadowing her mother, Lily, and her aunt Irene, or she was shadowing her father, Min, and her two brothers, Terry and Henry? A. Yes, now that I'm marking this correctly on this table, 3.16. Yes, but with the largest match statistic, she is always shadowing the male fraction with Terry, Henry and Min. Q. And in terms of that largest match statistic, she is only shadowing Terry, Henry and Min; is the match statistics of Terry, Henry and Min much greater than her, suggesting that she is the one that's shadowing in terms of only present, because of there ‑ the presence of her father and two brothers? A. Yes. 
So for example, considering the first question you asked about three contributors, her match statistic is 1.84 or 69, whereas in that same genotype that's being compared, that one of three genotypes, the match statistic to Henry is six zeros, 6.22 in the millions. To Terry, it's hundreds of millions, 8.34. And to Min, it's again hundreds of millions, 8.77. So she is in a fraction that is comprised of her brothers and her father with a much lower match statistic that's a male fraction. Q. So what does that suggest in relation to that particular statistic? A. That suggests based on the statistics alone and the shadowing and the values in other relatives, that she is not actually there, that her match statistic is appreciably lower than those of her siblings and her father, and she is genetically being carried along for the ride, if you will, because of allele sharing and producing some match statistic, but one much less one than those for relatives for the same separated component. Q. That's what the statistics would suggest to you based on your expert opinion. Can you positively exclude the presence of Brenda ... As a contributor to item 550 overall? A. No, it's just very unlikely.” (emphasis added)
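The comparison Dr Perlin drew in [159] for the three-contributor run can be set out numerically. The log10 values below are the ones he read out for the same inferred contributor (Brenda 1.84, Henry 6.22, Terry 8.34, Min 8.77); the function name, and the choice to express the comparison as a gap in orders of magnitude below the strongest match, are illustrative. No cut-off is applied, because the evidence does not fix one: whether a gap indicates shadowing was a matter of expert opinion informed by validation studies and, here, by gender.

```python
# log10 match statistics read out for one inferred contributor (three assumed contributors)
same_component = {"Brenda": 1.84, "Henry": 6.22, "Terry": 8.34, "Min": 8.77}


def gaps_below_strongest(log_stats):
    """Orders of magnitude by which each person's statistic falls below the strongest match."""
    top = max(log_stats.values())
    return {name: round(top - value, 2) for name, value in log_stats.items()}


print(gaps_below_strongest(same_component))
```

On these figures Brenda Lin’s statistic sits almost seven orders of magnitude below Min Lin’s, which is the pattern Dr Perlin associated with shadowing.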
160. Dr Perlin was asked whether shadowing explained the high match statistics for Min, Henry and Terry Lin. Dr Perlin said it did not, especially having regard to the results yielded for them by TrueAllele when five contributors were assumed. He said that by that stage “they are more nicely separated out”, that is, with five assumed contributors there is a better correspondence between each inferred contributor and each of Min, Henry and Terry Lin, rather than more than one of them having results that reflect a single inferred contributor.[357]
161. The topic of shadowing in relation to Brenda Lin was taken up with Dr Perlin in cross‑examination as follows:[358]
“Q. The person Brenda, are you able to say whether or not Brenda is in the mixture, being item 550? A. I would say she is not in there. Q. Is it not the case that she may be there or might not be there, but you cannot say? A. You asked me if I thought she was in there. She's travelling along with male DNA and based on the match statistics alone, it's possible she is there, but I think it is more likely than not that she is not there, considering who she is shadowing. If you have one woman shadowing her sister, that's one situation. If you have a woman shadowing male relatives, that's a different situation. That's additional information, which is gender, in addition to the statistic being low. ... Q. There is a deficiency of evidence, Dr Perlin, to say that Brenda is not present in that mixture, that's what I raise with you? A. Based on the match statistic alone, the number is fairly inconclusive. You asked about the error rate. The error rate would be about 1 in 70. If there were no genetic factors, but there are genetic factors which is, she shares genetic contents with her relatives, Terry, Henry and Min, and the genotype that's producing that match statistic of 69 is the one that's following male DNA. So there is additional information beyond the match statistic alone, having to do with shadowing and tracking male relatives.”
162. In re‑examination, Dr Perlin was again asked about shadowing and was asked about the results of a five contributor run as follows:[359]
“... if there is a blood relative present, it is often the case that you will see a match statistic that’s positive to someone who isn’t there, but a much higher [match] statistic to an individual who actually is there ... ... So in that row of this contributor for the computer run, where there were five assumed unknown contributors, just rounding numbers without decimal places, I could give more, but the statistic to Min had 11 zeros, to Terry, had 11 zeros, and to Henry, had 9 zeros, whereas the statistic to Brenda had 4 zeros. And that difference between 9 and 4 is at the level of 5 zeros difference and between 11 and 4 is at 7 zeros difference. That’s what we see in the studies. Moreover, the presence of three males with very strong match statistics, as well as looking at a genotype – this is a male component – so the fact that there is higher match statistics at the level of about 6 zeros higher is consistent with the validation studies. The fact that it is a male component that’s giving strong statistics to three males, indicates from those validation studies that seeing a statistic to Brenda of 4.91, that’s in the order of 6 zeroes, or a million-fold less, suggest she is not there.”
163. It is necessary to set out these passages because they relate to the issue raised at the trial on behalf of the appellant, namely the possible presence of Brenda Lin’s DNA in Stain 91, and were the subject of much debate on appeal. So far as the jury is concerned, the effect of Dr Perlin’s evidence was that for three contributors her match statistic was very low, for four contributors the match statistic was in the thousands or “tens of thousands” and was higher for five contributors; however, Brenda Lin was shadowing her father and brothers. The bold answer (“yes”) in [159] was ambiguous as to whether Dr Perlin was also stating that, for some match statistics, Brenda Lin was shadowing her mother and aunt. The balance of his evidence refers to her shadowing her father, and especially her brothers, with whom she shares a significant number of allele pairs. In cross‑examination, Dr Perlin was pressed for his opinion as to whether Brenda Lin’s DNA was present in the mixture and gave it.
164. At this point it is necessary to identify the source of the match statistic figures referred to by Dr Perlin as it assists in resolving the issues on appeal. During the voir dire before Johnson J in 2014, the table which is Annexure 1 to this judgment was attached to one of Dr Perlin’s reports and was the subject of cross‑examination. It sets out the match statistics obtained for each of the relevant TrueAllele runs that were undertaken with variations in the number of assumed contributors and the number of simulations undertaken in each run. It seems that Dr Perlin was referring to that document in his evidence set out at [159] (ie, “this table” and “I’m looking for the largest number here”).
165. Of relevance to considering the above evidence are the following rows taken from Annexure 1:
166. The second column in this table indicates which run of TrueAllele generated the results in the balance of the entries in the row. Thus, the first row concerns a run of TrueAllele based on an assumption of three contributors and using 100,000 simulations. The third column indicates which of the inferred contributors for each run the results in the balance of the row relate to. Hence, row 19 concerns the second inferred contributor. The fourth column indicates the number of assumed contributors for that run. Hence, that column for row 25 indicates that for that run five contributors were assumed. The weight column indicates the calculated percentage contribution of that inferred contributor. Thus, row 13 indicates that inferred contributor 4 was calculated to contribute 11% of the total amount of DNA derived from Stain 91. The next column is the standard deviation for the calculated weight contribution. The column headed KL is a reference to “Kullback-Leibler”. This refers to a statistic that, according to evidence given by Dr Perlin at the voir dire hearing, “measures to what extent one probability distribution diverges from another probability distribution”.[360] The remaining entries in each row indicate the logarithm (base 10) of the match statistics calculated for each of the named persons for that inferred contributor for that run of TrueAllele. The blank entries indicate that the logarithm of the match statistic for that person was negative; ie, the match statistic was less than one and thus exclusionary.
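For readers unfamiliar with the KL column, the Kullback-Leibler divergence of a distribution P from a distribution Q is the sum over outcomes of p × log(p/q); it is zero only when the two distributions are identical and grows as they diverge. The sketch below uses natural logarithms and two invented distributions over four allele pairs. Which distributions TrueAllele compares, and the logarithm base it reports, are not stated in the evidence extracted here, so nothing in the sketch should be read as reproducing the KL figures in Annexure 1.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) = sum of p_i * ln(p_i / q_i), skipping outcomes where p_i is 0."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same outcomes")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf  # P puts weight on an outcome Q rules out
        total += pi * math.log(pi / qi)
    return total


# Hypothetical distributions over four allele pairs
prior_like = [0.25, 0.25, 0.25, 0.25]
sharpened = [0.70, 0.20, 0.05, 0.05]
print(kl_divergence(sharpened, prior_like))
print(kl_divergence(prior_like, prior_like))  # 0.0
```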
167. The match statistic for three contributors for Brenda Lin that was provided to the jury (ie, 69.3) was derived from row 4 of this table (ie, 10^1.84). The figure of 11% that Dr Perlin quoted as the calculated contribution to a four person sample in his evidence set out at [159] above was a reference to row 13. Dr Perlin’s reference to Brenda Lin’s match statistic of 3.16 for another four contributor run was taken from row 19. His reference to a match statistic of 4.91 for a five contributor run was taken from row 25.[361] In the above extract, Dr Perlin repeatedly referred to Brenda Lin shadowing her father and brothers, which he illustrated by, inter alia, the entries for rows 1, 4 and 25 where Brenda Lin’s match statistic was always greatly exceeded by one or more of the match statistics for her father and brothers.
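Converting the table entries back from logarithms confirms how they line up with the figures given orally. The sketch uses only Brenda Lin’s entries referred to in [167]; the helper name is illustrative. 10^1.84 is about 69.2, consistent with the 69.3 on the slide once the logarithm is rounded to two decimal places, 10^3.16 is in the thousands and 10^4.91 is in the tens of thousands, as Dr Perlin described.

```python
# log10 entries for Brenda Lin referred to in [167], keyed by Annexure 1 row
brenda_rows = {4: 1.84, 19: 3.16, 25: 4.91}


def to_match_statistic(log10_value: float) -> float:
    """Undo the base-10 logarithm used in the table."""
    return 10 ** log10_value


for row, log_value in brenda_rows.items():
    print(row, round(to_match_statistic(log_value)))
```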
168. In the extract from his re‑examination set out at [162], Dr Perlin referred to row 25 and the match statistics for Min, Henry and Terry Lin.[362] Dr Perlin told the jury that the “fact that ... a male component, that’s giving strong statistics to three males, indicates from those validation studies that ... a [match] statistic to Brenda of 4.91, that’s in the order of 6 zeroes, or a million fold less, suggests that she is not there”.[363] With row 19, Brenda Lin’s match statistic was exceeded by each of Lily and Irene Lin’s match statistics.[364] In cross‑examination at the 2016 trial, Dr Perlin was taken to row 18, which contains a match statistic of 3.69 for Brenda Lin and of 7.05, 3.39 and 5.01 for Min, Henry and Terry Lin respectively. Dr Perlin described that as Brenda Lin shadowing her three male relatives.[365]
The Appellant’s DNA
169. Dr Perlin told the jury that there was “no statistical support for a match” between the appellant and the contributors to the DNA in Stain 91.[366]
Number of Contributors
170. Dr Perlin told the jury that “[w]hen we initially look[ed] at the data and examin[ed] it under the assumption that people were not related, it looks like there’s three people present”.[367] However, once it was recognised that they are relatives who share alleles “we went on and tested, assuming four contributors, [and] assuming five contributors”.[368] His opinion on how many contributors were present was “at least four people” and “possibly five”.[369] Dr Perlin stated that if there is allele sharing between contributors then “[y]ou may end up with lower match statistics than you might otherwise”.[370]
Validation of TrueAllele
171. Dr Perlin stated that he did not make the source code of TrueAllele publicly available, describing it as a “trade secret”.[371] He said it would take one person approximately eight and a half years to read it all.[372] In any event, Dr Perlin said that the mathematical function of TrueAllele had been expanded on in various published papers and that systems such as TrueAllele can be validated “by testing the executable software ... on real data ... by testing technology on actual inputs, not by reading source code”.[373]
172. Dr Perlin described two types of validation tests, one by the manufacturer and one by an individual laboratory that adopts it.[374] Consistent with Mr Walton’s evidence, he said that in NSW, TrueAllele had been validated for mixtures involving four contributors and was in the process of being validated for five contributors.[375] He identified six laboratories in the USA that had validated TrueAllele for five contributors or more.[376]
173. Dr Perlin also told the jury that a Master’s student at Duquesne University had conducted a study of TrueAllele using two, three, four and five person mixtures from the same family and it had produced inclusionary match statistics (although he could not recall whether “all five were pulled out”).[377] In cross‑examination, he said he facilitated that student’s use of TrueAllele and was a member of her “Thesis committee”, but she was supervised by another faculty member.[378]
174. In cross‑examination it was suggested that TrueAllele was not appropriately validated:[379]
“Q. The first issue, Dr Perlin, with respect, is that TrueAllele is not valid in respect of the interpretation of complex mixtures exceeding three contributors? A. Based on a published peer reviewed studies, as well as other studies done, both collaboratively, and independently by crime labs and other groups, TrueAllele has been tested on ten unknown contributors by one laboratory, six unknown contributors by another crime laboratory, and a wholly independent study, five unknown contributors by several laboratories; one was published in a peer review journal. A number of studies with four. So the basis for your statement doesn't seem to be scientifically founded. Q. Can I suggest this to you, Dr Perlin, that your method, as exemplified in TrueAllele, has not been validated anywhere in the world in respect of complex mixed samples involving related individuals in respect of contributors more than three in number? A. That would be incorrect. There was a study done independently by the New York State Police, Jay Choma, that looked at four contributor mixtures, and looked at shadowing of relatives. There was an independent study by the Virginia Department of Forensic Science that looked at four contributors and relatives. There is also the study from the Duquesne Master's student, ..., which looked at up to five mixtures of five related individuals. That's just off the top of my head. People do look at this and they study it.”
175. In re‑examination, Dr Perlin was taken to a list of 33 validation studies of TrueAllele that were published in the period from 2004 to October 2016.[380] Dr Perlin nominated various papers in the list which he stated support his answers about the validation of TrueAllele. The list was marked for identification.[381] A revised version of the document which delineated between the various topics and nominated studies that were independent of TrueAllele was later tendered without objection as Exhibit GD.[382] It will be necessary to return to some of the papers listed in Exhibit GD later, but at this point it suffices to note that Exhibit GD lists the papers as addressing various topics, namely, “[d]egradation where 3 or more contributors assumed” (six papers), “[a]llele drop out” (five papers), “[a]llele overlap with three or more contributors” (10 papers), “[m]ixed contributors with more than three contributors and up to five contributors where kinship specifically addressed” (four papers) and “[p]apers referred to in the Report to the President nominated by Dr Perlin as inconsistent with the findings of the report” (two papers).[383]
Other Aspects of the Cross‑Examination of Dr Perlin
176. It was suggested to Dr Perlin in cross‑examination that TrueAllele’s match statistics were affected by degradation of the DNA. Dr Perlin agreed that a comparison of the results of the Profiler Plus test undertaken in 2010 and the PowerPlex 21 test in 2013 suggested that there had been some degradation in the data, but nevertheless the data was “of sufficient quality to render a good interpretation”.[384] He said that TrueAllele takes account of degradation as it models the variation in the peak heights at each locus.[385] Dr Perlin told the jury that he did not see evidence of any “significant differential degradation”, that is, degradation of a particular contributor.[386] He was also cross‑examined on the “President’s Report” referred to above.[387]
Further Evidence from Mr Walton
177. The Crown recalled Mr Walton to address three issues that arose during Dr Perlin’s evidence. First, he discussed the quantity of DNA that was available to be tested.[388] Mr Walton stated that “[w]e obtained a strong profile”.[389] Second, he described the process by which DNA strands degrade over time, being that “as cells die” the DNA strands “break down into shorter fragments”,[390] and “differential degradation” is where the DNA contributions to a mixture degrade at different rates.[391] With this sample, Mr Walton said that it was affected by “inhibition”, being the effect of the chemicals used in the extraction process on DNA testing,[392] and “probably degradation ... as well”,[393] but they could not determine whether there was differential degradation.[394] Third, Mr Walton confirmed that when DAL validated TrueAllele for DNA mixtures involving three and four contributors, the testing included mixtures where the minor component was less than 20% and in some cases less than 10%.[395] In cross‑examination, he said that the validation process included testing of contributors who were related and comparisons to persons who were not contributors but were related to contributors.[396]