1
00:00:00,000 --> 00:00:11,933
JEFFREY KOPP: So, we have one more talk before lunchtime, and it’s by Dr. Steve Rich, who’s a professor at the University of Virginia and
2
00:00:11,933 --> 00:00:20,799
Director of Public Health Genomics at the university. He has studied the genetics of Type I diabetes for a good number of years, is a member
3
00:00:20,800 --> 00:00:28,333
of the National Academy of Sciences, and his website reads in part, “His work is centered on understanding the genetic epidemiology of complex
4
00:00:28,333 --> 00:00:37,033
human disease, including genes contributing to atherosclerosis, stroke, and intermediate risk factors,” and we’ve asked him to talk about study
5
00:00:37,033 --> 00:00:43,299
designs for whole exome discovery. Thank you.
6
00:00:43,300 --> 00:00:50,300
STEPHEN RICH: Thanks to Jeffrey and Robbie for inviting me. One correction: I’m not a member of the National Academy of Sciences, but I’d be
7
00:00:50,300 --> 00:01:06,900
happy to join if they asked. Also, I’d like to thank John Sedor and Barry Freedman for coming up with this terrible title, because when it comes time
8
00:01:06,900 --> 00:01:17,233
to think about an overview of whole genome study designs, we really don’t know anything about this, per se, so up front I thought I’d just go ahead
9
00:01:17,233 --> 00:01:34,499
and clarify the situation. So, what you’ll hear from me is pretty much my view as of February 11th—I think today is February 11th—and February 12th I
10
00:01:34,500 --> 00:01:48,133
may have a different view, so just to let you know what you’re in for. This is a typical slide that I’ve used many times—somewhat fancier
11
00:01:48,133 --> 00:01:58,633
than it used to be, given modern technology, which is always something we learn; technology drives a lot of the applications—but it does point
12
00:01:58,633 --> 00:02:09,866
out a couple of things that I think we need to recognize: what we think of in kidney disease is likely to be a combination of multiple
13
00:02:09,866 --> 00:02:21,099
genes and multiple environments, and just as this intersection of Trait 1 and Trait 2 might be some aspect of kidney morphology, some aspect of
14
00:02:21,100 --> 00:02:30,666
hypertension, some aspect of other types of risk factors, to come into chronic kidney disease, which is what you see at the bottom, there are
15
00:02:30,666 --> 00:02:41,099
multiple genes that contribute. As Kári actually said, Gene 1 could contribute a great deal to some aspect of hypertension. Gene 1 could also
16
00:02:41,100 --> 00:02:50,466
contribute to some aspect of kidney morphology. So you have this combination of pleiotropic effects of genes, some effects being larger on
17
00:02:50,466 --> 00:02:59,466
one trait than another trait. Obviously, traits can be correlated. So, the question is whether this trait in terms of hypertension is correlated
18
00:02:59,466 --> 00:03:11,299
genetically with the variation in kidney morphology. You have another gene that may affect kidney morphology, as well as this gene.
19
00:03:11,300 --> 00:03:20,766
You also have the effects of environments. So, it’s really a combination of things that we talk about in terms of genetic effects, environmental
20
00:03:20,766 --> 00:03:32,832
effects, and at the end of the day, can you tease any of this apart? One thing that I’ve always recognized, in not only disease
21
00:03:32,833 --> 00:03:47,233
traits but phenotypic traits in general: If you have someone holding a gun to your head and they say, “Okay, tell me, what portion of the variation in
22
00:03:47,233 --> 00:03:57,999
disease risk or proportion of variance in the phenotype is due to genes? If you get it wrong, I’ll blow your head off,” answer “50%.” You know,
23
00:03:58,000 --> 00:04:07,566
pretty much 50% is…you can find an estimate of 50% someplace and if you look at the standard errors on these estimates, they’ll be covering
24
00:04:07,566 --> 00:04:20,566
50%. So, one of the things that was quite clear is that there’s familial aggregation for just about everything. Back in the ’80s I worked in
25
00:04:20,566 --> 00:04:31,799
Minnesota with Mike Mauer and Mike Steffes to look at kidney morphology in individuals who had Type I diabetes undergoing a
26
00:04:31,800 --> 00:04:41,566
kidney transplant. They had individuals who had kidney biopsies, and so you looked at the kidney morphology not only of the individual who was
27
00:04:41,566 --> 00:04:50,866
going to have the transplant but also of the donors and other people in the family who were being looked at in terms of kidney function, and
28
00:04:50,866 --> 00:05:01,732
my assumption going in was that something important like a kidney would not have variation in things like mesangial matrix or various aspects of
29
00:05:01,733 --> 00:05:13,133
the structure, but we actually saw quite a bit of variation from family to family and that, in fact, members of the same family tended to cluster, so
30
00:05:13,133 --> 00:05:22,799
you could just sort of make a line going up, and you saw clustering by sibships. So, it suggested that even something as important as, as Kári said,
31
00:05:22,800 --> 00:05:32,766
you know, the organ that makes piss shows variation, and that variation is heritable. So, I think it’s important to
32
00:05:32,766 --> 00:05:39,599
recognize that almost everything we see may have some heritable component and almost everything we look at will have an environmental
33
00:05:39,600 --> 00:05:49,833
component. So overall, what are the steps for disease gene identification? The first question: is there a genetic component? As a geneticist I
34
00:05:49,833 --> 00:06:01,466
would say, of course. You know, why bother asking the question? If I’m a non-believer I’d have to see some evidence but in general you see
35
00:06:01,466 --> 00:06:18,832
some aggregation in families. Back in, again, the early ’90s we had a neurology fellow from Iceland who was looking at cerebral aneurysms, and in
36
00:06:18,833 --> 00:06:27,966
the Icelandic pedigrees of all the Icelandic individuals they had information on going back generations, there was something like 13 people
37
00:06:27,966 --> 00:06:36,732
with cerebral aneurysms. Those 13 people fell into a couple of families. So even though in a large population you may not have very many
38
00:06:36,733 --> 00:06:45,166
people, if you can show that these people then fall into a family or a lineage, then that suggests there might be something genetic involved. One of
39
00:06:45,166 --> 00:06:55,832
the real issues is trying to decide on the appropriate study design, and this is not an easy question. I think we have the ability to generate
40
00:06:55,833 --> 00:07:07,499
genetic data at an unprecedented rate—we are swimming in genetic data at this stage—and so the questions really that are critical are: what’s
41
00:07:07,500 --> 00:07:14,766
the appropriate design, and even more critical, what’s the appropriate phenotype? And this is where I think we have to recognize as
42
00:07:14,766 --> 00:07:26,732
geneticists and as epidemiologists, you know, people in general, that the information that our clinical researchers and our colleagues in clinical
43
00:07:26,733 --> 00:07:34,133
research have is invaluable, and that without that information we really are going to be spinning our wheels in terms of trying to identify the
44
00:07:34,133 --> 00:07:43,499
appropriate genetic contribution to the appropriate phenotypes. As for the molecular analysis, you may have heard the phrase “$1,000 genome, $100,000
45
00:07:43,500 --> 00:07:56,166
analysis.” That’s probably an understatement, because you also have to throw in the $250,000 informatics bill. So, eventually we
46
00:07:56,166 --> 00:08:05,032
get to functional analysis, and I put that in parentheses for a reason: I don’t know what to do, because functional analysis requires
47
00:08:05,033 --> 00:08:14,799
the appropriate tissue, the appropriate targets, the appropriate measurements. People in this audience know more than I about how one can
48
00:08:14,800 --> 00:08:24,133
utilize the information from sequencing, RNA sequencing, or other types of sequencing procedures to really build up an integration of the
49
00:08:24,133 --> 00:08:33,333
genomic and the transcriptomic data, and this whole area of systems genomics and systems biology is something that’s very critical in the
50
00:08:33,333 --> 00:08:40,333
future, and we need to really recognize these aren’t separate entities, that they need to be integrated to really get an understanding of
51
00:08:40,333 --> 00:08:50,399
where we’re going, to understand chronic kidney disease. So, this is a classic—well, I think it’s a classic—a classic slide to think about patient
52
00:08:50,400 --> 00:09:03,533
ascertainment and of all the ways we can ascertain individuals. I break it down into three types: sib-pairs, families or pedigrees, and
53
00:09:03,533 --> 00:09:12,033
cases and controls, and this is just based upon disease but it can also be based on extreme phenotypes or anything else you might think of. I’ll
54
00:09:12,033 --> 00:09:22,533
just go through these one at a time, first with the case-control study. Case-control studies are obviously one of the easy things to do and I put
55
00:09:22,533 --> 00:09:34,133
“easy” in quotes for a reason. You essentially have the idea that you’re getting unrelated cases and controls, or actually, I like to think
56
00:09:34,133 --> 00:09:44,533
of it as a case comparison group. Case-control is one type of comparison group. I’m not even certain what a control is in many
57
00:09:44,533 --> 00:09:57,699
instances…maybe not even “case” is a good term. One can think of individuals at the extremes as case groups and non-case groups. Obviously, the idea
58
00:09:57,700 --> 00:10:12,100
of a case-control comparison has been used in this NHLBI Exome Sequencing Project, but if you think about case comparison groups you’re
59
00:10:12,100 --> 00:10:19,866
assuming that they’re unrelated. One thing I’ve learned from years of ascertaining cases and controls, there is this issue of this underlying
60
00:10:19,866 --> 00:10:30,166
genetic predisposition to volunteer for studies, and if you generate enough groups of individuals who volunteer, you’ll find out that Nancy
61
00:10:30,166 --> 00:10:39,332
volunteered for this study, and it was such a wonderful experience, and she talked about it so much to all her friends and colleagues and her family, that
62
00:10:39,333 --> 00:10:48,966
her sister, June, also volunteered for this study. And even though you think that you’ve got unrelateds, you may have sisters, you may have
63
00:10:48,966 --> 00:10:59,699
cousins, and fortunately we have genetic methods to tease a lot of that apart. So, when we started the Exome Sequencing Project—this is the
64
00:10:59,700 --> 00:11:11,466
project that you’ve heard about once or twice—this is, again, an NHLBI study. I’m leading the HeartGO effort, which includes six cohorts:
65
00:11:11,466 --> 00:11:21,699
ARIC (Atherosclerosis Risk in Communities); the Cardiovascular Health Study (CHS); CARDIA (Coronary Artery Risk Development in Young Adults);
66
00:11:21,700 --> 00:11:32,333
Framingham; the Jackson Heart Study; and the Multi-Ethnic Study of Atherosclerosis, or MESA. The project also has two sequencing centers, a lung component, and
67
00:11:32,333 --> 00:11:42,799
a Women’s Health Initiative component. But this was really getting started about three or four years ago, at the time when exome
68
00:11:42,800 --> 00:11:55,400
sequencing was fairly new and we had no clue what the appropriate study design was, and so a lot of the discussion was centered upon: do you
69
00:11:55,400 --> 00:12:06,133
take a couple of people in extremes, do you take cohorts, do you take families, do you take large numbers of people? So, the standard design was
70
00:12:06,133 --> 00:12:19,766
sampling in each tail and I think Linda Kao mentioned this quite well, but essentially the idea is you have individuals with very low LDL, those
71
00:12:19,766 --> 00:12:33,466
with very high LDL, or early-onset MI or controls, and you perform exome sequencing in each of these sets, thinking that you’re essentially taking
72
00:12:33,466 --> 00:12:44,532
this LDL distribution at the very extremes, say below the 1st percentile and above the 99th percentile, to
73
00:12:44,533 --> 00:12:53,166
Mendelize things, to try to get those extremes so that you would pick up these genes of major effect. Of course when you talk about LDL, you
74
00:12:53,166 --> 00:13:00,032
also have to think: what are the factors that contribute to these high and low LDLs, including what drugs they’re
75
00:13:00,033 --> 00:13:10,266
taking, and trying to adjust for those things. For the early-onset MI, you have the situation where you’re trying to identify those with myocardial
76
00:13:10,266 --> 00:13:21,732
infarction at a very young age, so this would be, say, earlier than 55 in women and 45 in men. What is the comparison group here: people in the
77
00:13:21,733 --> 00:13:33,033
general population, or are these people who have all the risk factors but don’t have an MI? Well, leading risk factors are something like age. I like to
78
00:13:33,033 --> 00:13:41,433
think of this control group as Winston Churchill. So, it’s someone who’s old, fat, smokes, runs around, does everything bad but
79
00:13:41,433 --> 00:13:54,499
doesn’t have an MI until very late. Again, it may be something personality-driven, so if there are genes that are involved with being old,
80
00:13:54,500 --> 00:14:06,300
fat, and running around and smoking cigars, then we may find genes for that, too. CHARGE-S was also mentioned by Linda. It’s a different design.
81
00:14:06,300 --> 00:14:16,366
It’s a design that’s like a case-cohort design. And so the CHARGE-S sampling design, again, is that there’s a cohort random sample and there are
82
00:14:16,366 --> 00:14:25,699
specific case groups, so there was a CRP group, a blood pressure group, and a kidney group—and this was expanded by Linda for her own
83
00:14:25,700 --> 00:14:33,666
studies—and then other types of groups. And so, this is very much like the Wellcome Trust Case Control Consortium that was reported from the
84
00:14:33,666 --> 00:14:47,666
initial GWAS several years ago, where they used a common set of controls for all the different case groups. So, this is yet another idea of unrelateds.
85
00:14:47,666 --> 00:14:57,966
Now, as Linda pointed out, there were basically three cohorts involved that contribute to this, and so that’s another sort of complexity that needs to
86
00:14:57,966 --> 00:15:07,166
be added on to how one thinks about the analysis of any type of genetic data. Take ARIC, which is Atherosclerosis Risk in
87
00:15:07,166 --> 00:15:16,866
Communities: the ascertainment was over several communities, with an age of entry into the cohort of about 45 to 65 years of
88
00:15:16,866 --> 00:15:27,732
age. The Cardiovascular Health Study, or CHS, recruited in four different communities, not necessarily the same communities as ARIC, and
89
00:15:27,733 --> 00:15:37,599
entry was at 65 years of age or older. And then you have Framingham, which we all know and love; Framingham, Massachusetts, is not
90
00:15:37,600 --> 00:15:46,666
necessarily representative of anything but Framingham, Massachusetts, and the participants are almost all white. So it’s a very mixed group here, so
91
00:15:46,666 --> 00:15:56,532
even though it looks pretty homogeneous, there’s a lot of heterogeneity that underlies this that needs to be taken into account. So, what are my
92
00:15:56,533 --> 00:16:05,799
thoughts on this “easy” design? It is relatively easy to collect samples and data—that’s the one great thing—because as someone who’s
93
00:16:05,800 --> 00:16:16,733
collected family data in the past, your family as a unit of inference is only as good as having the actual family members, and if you’ve ever tried to
94
00:16:16,733 --> 00:16:26,999
get blood out of people, you can go to a door, knock on it, say, “We’re here to get the blood from the family,” and you hear the back door slam
95
00:16:27,000 --> 00:16:36,000
as the father runs across the hills because the fathers hate giving blood. As a father, I hate giving blood; I don’t know about other fathers in
96
00:16:36,000 --> 00:16:45,666
the audience. Moms will give blood, kids will give blood because their moms tell them to do it, the dads are out of town, you know. So in general,
97
00:16:45,666 --> 00:16:57,866
sampling individuals is a lot easier than sampling families or relateds. It’s also, you know, you get samples and you use that data for sequencing.
98
00:16:57,866 --> 00:17:07,799
It’s relatively easy to find “affected” or the “extreme” phenotype. It’s actually quite…you can think about early age of onset, severity of the
99
00:17:07,800 --> 00:17:15,333
phenotype, some high recurrence risk. I think it’s relatively difficult to define the comparison group, and so it takes a lot more thinking about how
100
00:17:15,333 --> 00:17:24,066
you’re going to compare those severe phenotypes. And then, analysis of the sequence data, I think, becomes much more difficult
101
00:17:24,066 --> 00:17:36,466
because now you have to decide, in individuals who are unrelated, whether something is a sequencing artifact or a truly
102
00:17:36,466 --> 00:17:47,932
rare variant, and the issues of quality control and formal analysis are somewhat easier for cases, controls, and families, but still you don’t have this
103
00:17:47,933 --> 00:17:57,066
ability to track a variant from parent to child, which, in my view of the world, is a good way to know whether something’s a spurious
104
00:17:57,066 --> 00:18:06,999
variant or not: you see it in a child and in a parent. So, this is where some of the thoughts are on the “easy” design. Let’s go to sib-pairs:
105
00:18:07,000 --> 00:18:21,533
sib-pairs may not have either parent available, and both sibs can be affected or only one, so you have concordant sib-pairs or discordant sib-pairs. So, we move from unrelated
106
00:18:21,533 --> 00:18:29,466
to sibships. This is a bit more difficult. The concept is you focus on families in which you have a common disease or a phenotype that
107
00:18:29,466 --> 00:18:39,866
might segregate in an apparent Mendelian way. It’s familial, it’s a complementary alternative to the case comparison. The goals of case-control
108
00:18:39,866 --> 00:18:50,866
design or family sib-pair design are the same: you’re basically trying to find genes and also trying to utilize sequence data to look for fairly
109
00:18:50,866 --> 00:19:02,866
highly penetrant variants, whether they’re coding or noncoding. Hopefully, you can identify familial subsets. So, in our original concordant sib-pair
110
00:19:02,866 --> 00:19:13,199
you obviously have clustering in a family. You get this issue of whether the clustering is truly genetic or whether there’s a cause and then a
111
00:19:13,200 --> 00:19:22,733
spurious genetic effect or some spurious environmental effect that makes an affected sib-pair look truly genetic? Or is it one cause
112
00:19:22,733 --> 00:19:34,933
and then a separate cause for the affected sib? They’re harder to collect and, again, sometimes sporadic events do mimic genetic events. Here’s
113
00:19:34,933 --> 00:19:43,733
the example from the group at the University of Washington that was published in Nature Genetics that has shown that exome sequencing
114
00:19:43,733 --> 00:19:57,533
can identify single gene Mendelian disorders and I think it’s instructive for a couple of reasons. The way this was performed…and this was
115
00:19:57,533 --> 00:20:10,599
published in 2010, so again, let’s think. This is 2012 so it’s only been a couple of years or really just a year or more from the time that we started
116
00:20:10,600 --> 00:20:22,800
with exome sequencing and discovery of a Mendelian disorder. The idea is that you have a couple of kindreds and really only a couple of
117
00:20:22,800 --> 00:20:32,666
individuals that would allow you to then do filtering of variants, and then after filtering of variants you basically come up with the right
118
00:20:32,666 --> 00:20:42,666
result. This is wonderful; this will work for everything, right? Well, not quite, but you know, this was a landmark paper because it showed
119
00:20:42,666 --> 00:20:56,932
exome sequencing in a single gene disorder will allow you to find a gene. The filtering helped tremendously. So you basically sequenced, you
120
00:20:56,933 --> 00:21:06,933
found rare variants, you then looked to see if those variants were in dbSNP and you could then apply filters to reduce the known variants to
121
00:21:06,933 --> 00:21:15,399
come down to a small number that were novel. You could then look across a couple of pedigrees for families and then you could find, okay, well
122
00:21:15,400 --> 00:21:23,466
this is great; you found it. Well, one of the problems is, as we continue to sequence we continue to find a lot of variants, we put those in
123
00:21:23,466 --> 00:21:33,866
dbSNP, so now just because you see something that’s in dbSNP, should we throw it out? Well, as you continue to sequence, you get more and
124
00:21:33,866 --> 00:21:41,332
more rare variants, you’ll get more and more things deposited in the databases, so I don’t think you can use that strategy anymore. Again, the
125
00:21:41,333 --> 00:21:49,999
focus was on Mendelian, not complex, phenotypes. I think kidney disease is complex. Certainly, my friend, Barry Freedman, tells me it’s
126
00:21:50,000 --> 00:21:59,600
complex and anything Barry says is correct, so that must mean that we’re going to be looking for more than single genes. Although the disease is
127
00:21:59,600 --> 00:22:13,533
common in a family, the sib-pairs could provide a modest reduction in the portion of the genome that needs to be considered, because if you think about
128
00:22:13,533 --> 00:22:19,466
it, we can start narrowing down. Well, if you expect there’s a quarter of the genome that’s shared between affected siblings, then you’re
129
00:22:19,466 --> 00:22:31,266
throwing out a certain portion of the genome, but we’re not throwing out a lot. And so if you look at this report from the New England Journal from
130
00:22:31,266 --> 00:22:43,866
Kiran Musunuru and Sekar Kathiresan, identifying ANGPTL3 mutations in familial combined hypolipidemia, again, focused on siblings but then
131
00:22:43,866 --> 00:22:56,999
doing a lot more of the sequencing and bioinformatics work, it narrowed down specific gene candidates. So, one of the things is looking
132
00:22:57,000 --> 00:23:10,200
at pedigrees: large families. And so if you move from sibships to pedigrees, these are more difficult to ascertain and collect, and it’s harder to get clinical data.
133
00:23:10,200 --> 00:23:20,133
One of the issues that we’ve seen is that if you have large pedigrees, that means you must have individuals who’ve been followed over
134
00:23:20,133 --> 00:23:32,666
generations, effectively, and in fact some of the definitions of the phenotypes or diseases may change over time; that’s clearly true in the
135
00:23:32,666 --> 00:23:41,199
psychiatric disorders. There’s also clear evidence that what may be…I mean, I’m sort of a baseball fanatic, so I like to look at old-time
136
00:23:41,200 --> 00:23:50,466
baseball players’ histories. There’s always the person who, you know, fell off the bridge crossing Niagara Falls because he was drunk.
137
00:23:50,466 --> 00:23:59,866
That’s probably not genetic, except there’s some tendency to drink, alcoholism and so forth, but if you look at how baseball players died in the
138
00:23:59,866 --> 00:24:12,399
1920s, 1930s, and 1900s you’ll see diagnoses that look really strange. And so, when you think about assembling pedigrees and you start looking
139
00:24:12,400 --> 00:24:23,333
at diagnoses for cause of death or what individuals had in the early 1900s, they’re totally different terms than we see now. Are they the
140
00:24:23,333 --> 00:24:32,066
same disease? Are the same diagnostic tools used? How were they treated? If you think about gene-environment interactions—which someone
141
00:24:32,066 --> 00:24:43,332
brought up before—we looked at this in terms of lung cancer, and when…I think it was World War I when they started giving cigarettes to men who
142
00:24:43,333 --> 00:24:56,799
were fighting overseas. And so, before World War I, people who had lung cancer had parents who didn’t smoke. Guess what? They looked
143
00:24:56,800 --> 00:25:08,933
recessive. After that, parents smoked, typically men; they looked autosomal dominant. So it depends on the environment, in a way, of how
144
00:25:08,933 --> 00:25:18,099
you actually interpret the exposures. So, the definition of phenotypes may change. Treatments clearly change. So, when we look at lipid levels
145
00:25:18,100 --> 00:25:28,966
and look at lipid levels in cohorts now versus the Cardiovascular Health Study, remember, these are people who were aged 65 at entry. They
146
00:25:28,966 --> 00:25:37,732
weren’t using statins at that point. So, there’s a lot of information that needs to be tabulated and this is where the clinical information and the
147
00:25:37,733 --> 00:25:46,199
interpretation of the clinical information becomes critical. You can think about the idea of pedigree-specific genes. We heard, I guess, a comment or
148
00:25:46,200 --> 00:25:58,133
a question earlier for Kári Stefánsson. Well, whatever you found in Iceland, is it only good for Iceland? Well maybe, but I think that’s okay
149
00:25:58,133 --> 00:26:07,733
because, in fact, that might even uncover a novel pathway that we hadn’t thought about before, but often it’s not a problem. In fact, the
150
00:26:07,733 --> 00:26:19,733
TCF7L2 gene, which is actually the strongest genetic risk factor for Type II diabetes, with an odds ratio of about 1.4, was found in
151
00:26:19,733 --> 00:26:30,399
Iceland. So, there’s a lot of those genes that have been identified in isolated populations that are not just restricted to the isolated population; it’s really
152
00:26:30,400 --> 00:26:44,933
part of a normal pathway of causal relationships. If you go to the most used reference these days, which is Wikipedia, and you type in
153
00:26:44,933 --> 00:26:56,466
“genetics”—this is a conference on genetics and chronic kidney disease or kidney disease—you’ll get this: “from the Ancient Greek”…which they
154
00:26:56,466 --> 00:27:03,599
actually had some Greek symbols here but I couldn’t figure out how to do this on my laptop at the time…but it’s “genetikos,” and from that,
155
00:27:03,600 --> 00:27:14,000
genesis or origin. So, genetics is a “discipline of biology that’s a science of genes, heredity, and variation in living organisms.” So the idea behind
156
00:27:14,000 --> 00:27:24,566
genetics is looking at transmission of a gene or a variant from parent to child along with the phenotype from parent to child. So in a sense, what we’re
157
00:27:24,566 --> 00:27:33,099
getting at here in using sib-pairs and families is really classic genetics, as opposed to looking at cases and controls, which are unrelated, and
158
00:27:33,100 --> 00:27:42,500
therefore there’s no real sharing in those individuals. So, the utility of the pedigree is to increase our ability to really sort out
159
00:27:42,500 --> 00:27:50,566
sequencing artifacts from really true variants. You can track the genotype with a phenotype from parent to child and to other relationships.
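The parent-to-child tracking mentioned here can be sketched as a Mendelian-consistency check at a single biallelic site in a parent-offspring trio. This is a minimal illustration under stated assumptions, not the speaker’s or the Exome Sequencing Project’s actual pipeline: genotypes are assumed to be coded as alternate-allele counts (0, 1, 2), and the function names are hypothetical.

```python
from __future__ import annotations

from itertools import product

# Hypothetical helpers for illustration only. Genotypes at one biallelic site
# are assumed to be alternate-allele counts: 0 = ref/ref, 1 = ref/alt, 2 = alt/alt.


def transmissible_alleles(genotype: int) -> set[int]:
    """Alleles (0 = ref, 1 = alt) that a parent with this genotype can pass on."""
    if genotype not in (0, 1, 2):
        raise ValueError(f"genotype must be 0, 1, or 2, got {genotype}")
    if genotype == 0:
        return {0}
    if genotype == 2:
        return {1}
    return {0, 1}


def mendelian_consistent(mother: int, father: int, child: int) -> bool:
    """True if the child's genotype can be formed from one allele of each parent."""
    possible = {
        m + f
        for m, f in product(transmissible_alleles(mother), transmissible_alleles(father))
    }
    return child in possible


def classify_child_call(mother: int, father: int, child: int) -> str:
    """Rough triage of a child's genotype call based on parent-to-child transmission."""
    if not mendelian_consistent(mother, father, child):
        # Neither parent could have transmitted this genotype: re-examine the call
        # as a possible sequencing artifact (or a true de novo event).
        return "inconsistent: possible artifact or de novo"
    if child > 0 and (mother > 0 or father > 0):
        # The alternate allele is seen in the child and in at least one parent.
        return "transmitted: seen in child and parent"
    return "consistent"


if __name__ == "__main__":
    # Rare alt allele called in the child and in the mother.
    print(classify_child_call(mother=1, father=0, child=1))  # prints "transmitted: seen in child and parent"
    # Heterozygous child call with two homozygous-reference parents.
    print(classify_child_call(mother=0, father=0, child=1))  # prints "inconsistent: possible artifact or de novo"
```

Because genotype quality and read depth are ignored here, a call that passes this check can still be wrong; in practice a filter like this would only be a first pass.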
160
00:27:50,566 --> 00:28:00,532
The individual pedigrees, if they’re sufficiently large, can provide a lot of power to identify causal genes, and then there’s also the potential
161
00:28:00,533 --> 00:28:08,599
for longitudinal data (we heard someone mention longitudinal data earlier), risk factor data, and related phenotypes. Unfortunately, there’s a
162
00:28:08,600 --> 00:28:19,666
limited number of these types of large pedigrees. Iceland is a large pedigree, essentially. We have Framingham with longitudinal studies with a lot of
163
00:28:19,666 --> 00:28:31,999
families there, a lot of the religious isolates like the Amish and Hutterites and things like that, but it’s actually hard to find those specifically enriched
164
00:28:32,000 --> 00:28:44,533
for kidney phenotypes, so again, narrowing within phenotypes. One of the things that we did accomplish with the Exome Sequencing Project:
165
00:28:44,533 --> 00:28:53,166
out of the 7,000 or so exomes that have been sequenced, we allocated a certain number for pedigree studies, and this is one of those
166
00:28:53,166 --> 00:29:04,232
pedigree studies. So, you could essentially think about sending this person’s and this person’s DNA to have exome sequencing and then, because it’s
167
00:29:04,233 --> 00:29:17,233
now 1/16th sharing instead of 1/4th, you can restrict the search space for variants that contribute to the specific disease. This is a Joslin
168
00:29:17,233 --> 00:29:32,033
pedigree that we published with Damian Fogarty back in 2000, looking at diabetic nephropathy, and there are some pedigrees like this out there. I
169
00:29:32,033 --> 00:29:39,966
know that a number of you in the audience have collected pedigrees like that. I know Barry Freedman has been collecting families. I know
170
00:29:39,966 --> 00:29:48,766
that a number of you have families, sort of the family-in-the-freezer study, in a way. You have the samples and you have the data, and
171
00:29:48,766 --> 00:29:59,699
now it’s time to go in and collect the DNA and have it sequenced. I think these are valuable resources that, quite frankly, the NIDDK should
172
00:29:59,700 --> 00:30:15,633
use. Here’s a situation, from Ray Hershberger, of familial dilated cardiomyopathy, another of the pedigrees that had been individually
173
00:30:15,633 --> 00:30:25,266
sequenced by the Exome Sequencing Project. You can’t really see it, but there’s a proband here, and I think we allocated three or
174
00:30:25,266 --> 00:30:35,132
four. I think there are 10 individuals who had exome sequencing. One thing I would certainly recommend, and it’s something that we do
175
00:30:35,133 --> 00:30:49,733
ourselves, is performing a 2.5 million or 5 million SNP GWAS on every single individual in a family. Not only can you then utilize that for
176
00:30:49,733 --> 00:30:59,666
imputation, but it actually helps the quality control when you also sequence individuals. From this same type of kindred you wonder, though:
177
00:30:59,666 --> 00:31:12,266
there are no medical records available; is there some effect of family history? So, you can have problems in terms of the available data. So again,
178
00:31:12,266 --> 00:31:22,399
it points to the fact that you really need to have extensive record-keeping, being able to clearly define the phenotypes, and with this type of small
179
00:31:22,400 --> 00:31:33,800
investment of maybe 10 exomes, Ray was able to identify a gene called BAG3, which contributes to familial dilated cardiomyopathy, and it turns out
180
00:31:33,800 --> 00:31:44,533
there’s a mouse model that has been looked at for peripheral artery disease, and in that linkage region for peripheral artery disease is Bag3. So,
181
00:31:44,533 --> 00:31:55,666
there’s a large amount of supporting information to suggest that exome sequencing identified a gene that’s actually been validated in other model
182
00:31:55,666 --> 00:32:10,532
systems. I know that I’m keeping you from lunch, and I’m not certain what’s worse: keeping you from lunch or having you listen to more of my talk.
183
00:32:10,533 --> 00:32:21,466
But the idea here is that I think we really need to recognize that there is familial aggregation of kidney disease and kidney phenotypes. It’s also
184
00:32:21,466 --> 00:32:32,399
incredibly important to work with the clinical investigators who really know the phenotype to help you design the study and know what to go
185
00:32:32,400 --> 00:32:41,733
for. I think it’s time for whole genome sequencing. I mean, it’s here. It’s Pandora’s Box; it’s open. People are performing whole genome sequencing
186
00:32:41,733 --> 00:32:55,466
in Type II diabetes, Type I diabetes; it’s time to do it in kidney disease. There’s no reason why you shouldn’t; the price is coming down. It’s not free,
187
00:32:55,466 --> 00:33:08,399
but nothing is free, but if the price is getting…you know, you see $1,000, probably $2,500 a genome is probably the more appropriate cost at
188
00:33:08,400 --> 00:33:19,333
this point. But it’s not going to be long. So, I think you need to invest in this ability. Whole genome sequencing, as Kári mentioned, is a commodity. I
189
00:33:19,333 --> 00:33:32,199
would not want to have it done in my lab. I’m old enough to remember the first time you ran a genome-wide linkage scan with RFLPs it was
190
00:33:32,200 --> 00:33:41,533
really exciting; the 10th time was a bore. The first time you ran a genome-wide scan with SNPs it was really exciting; the 10th time it was a bore.
191
00:33:41,533 --> 00:33:50,999
You know, the first time you run these high-density SNP chips it’s really exciting. After a while, everything becomes a bore, and if it’s this
192
00:33:51,000 --> 00:34:01,733
much investment in technology, my approach is to outsource it and then hire as many great bioinformaticians and computational biologists and
193
00:34:01,733 --> 00:34:12,599
statistical geneticists as you can. I think also, this whole area is dependent upon rapid development, not just of sequencing technology
194
00:34:12,600 --> 00:34:21,266
but of computational analytic methods. You have to be able to handle the data and analyze the data. There’s also this transition. We have a
195
00:34:21,266 --> 00:34:33,732
history of thinking about variants and SNP-based analyses and really we should be focused on genes and pathways. There’s a critical demand
196
00:34:33,733 --> 00:34:47,133
on phenotyping and I think we have pedigrees, families, sibships and case-comparisons. I think the pedigree approach is probably a high-risk,
197
00:34:47,133 --> 00:34:56,733
high-impact approach because you may find something more quickly if you have access to these pedigrees. I think in the long run you’ll need
198
00:34:56,733 --> 00:35:08,199
all the tools in your tool box to really address this. So, with that, this is sort of my view on designs of whole genome approaches and I’d be happy to
199
00:35:08,200 --> 00:35:38,600
take any questions. Thank you. It’s 12:23, not that anyone’s counting. Wait, someone’s here first. I was keeping track; he beat you by three
200
00:35:38,600 --> 00:35:46,533
seconds. Why don’t you go first? GEORGIA DUNSTON: Very good. Thank you. That
201
00:35:46,533 --> 00:35:56,033
was…I totally agree with your assessment of where we are and I also want to say, since this is the last session, I have found this whole
202
00:35:56,033 --> 00:36:11,699
session totally fascinating. With regard to study design I just want to make the comment that I think that using your first slide of the multiple genes
203
00:36:11,700 --> 00:36:26,900
going to individual traits, perhaps if you do a slide tomorrow there might be some consideration of a single gene going to the multiple traits to reflect
204
00:36:26,900 --> 00:36:36,800
what we’re finding about, especially with the common variants in terms of regulation and understanding the pleiotropic effect of single
205
00:36:36,800 --> 00:36:46,733
genes is probably going to be important here. And in that regard, though, I too…I’ve gotten here several times so let me say I’m Georgia Dunston
206
00:36:46,733 --> 00:36:49,466
and I’m at Howard University. STEPHEN RICH: And I’ve known you for,
207
00:36:49,466 --> 00:36:51,732
like, 40 years.
208
00:36:51,733 --> 00:36:59,533
GEORGIA DUNSTON: And that’s why I say that, to be at this—and I’m a human geneticist—to be at this point in the science where the genome is
209
00:36:59,533 --> 00:37:11,533
really, in my mind, trying to give us new paradigms for looking at biology, if you will, that one of the challenges right now is that the way to
210
00:37:11,533 --> 00:37:20,633
the genomes in terms of study designs, populations, and perhaps the way to the genome was one route that we can track steps to the
211
00:37:20,633 --> 00:37:31,799
genome, but it’s almost now like we’ve landed on the genome and it’s like being in the forest and there are so many pathways back out to the
212
00:37:31,800 --> 00:37:44,733
phenotype or the clinical entity, that we’re trying to decide what’s the most efficient way. And also in defense of…all biology is genetic, ultimately. I
213
00:37:44,733 --> 00:37:54,466
mean, that’s not a discipline point-of-view, my point being that we’re making the distinction between heritable from one generation to the
214
00:37:54,466 --> 00:38:04,432
next in terms of the population or the individual, where now we’re clearly dealing with cellular inheritance or changes that occur in the genome
215
00:38:04,433 --> 00:38:14,766
that are transmitted from cell to cell. So, maybe a study design that we haven’t begun to really appreciate is the individual being both the case
216
00:38:14,766 --> 00:38:26,566
and the control, where we now have the technology to look at the normal cell and the altered cell and really now trying to see the
217
00:38:26,566 --> 00:38:39,099
genes that are expressed from that same genome that distinguishes, quote, “the case,” at the cellular level from the control, if you will, and
218
00:38:39,100 --> 00:38:47,900
then use that, also, as an approach to now back-track to the gene you might want to look at in your population.
219
00:38:47,900 --> 00:38:56,066
STEPHEN RICH: Yeah. Thanks, Georgia. Actually, Georgia makes a couple of interesting points. Obviously, now you can do single cell
220
00:38:56,066 --> 00:39:06,232
sequencing and I think that’s where a lot of this is going, to see if you can delineate not only from looking at the RNA-seq, for example, in a cell
221
00:39:06,233 --> 00:39:16,666
from CD4+ T cells and another cell from CD8+ T cells, another cell from a liver, another cell from adipose. That landscape is going to be just
222
00:39:16,666 --> 00:39:27,566
phenomenal to transit. The one thing you were saying about the different pathways is also a Tolkien theme, so if you remember from Lord of
223
00:39:27,566 --> 00:39:37,466
the Rings when Frodo and Sam are trying to get to Mordor and almost every path they took, they wound up at the same place. They couldn’t figure
224
00:39:37,466 --> 00:39:47,732
out the best pathway of getting from one spot into the gates of Mordor. Of course, Gollum is the one that helped them and I don’t think we want to
225
00:39:47,733 --> 00:39:57,299
use Gollum as the way to get us from which path to take that’s most effective to understand the genetic basis of kidney disease. But it is the case
226
00:39:57,300 --> 00:40:04,400
that there are many pathways and we don’t know exactly which is the right one, but that shouldn’t stop us; we should just move forward.
227
00:40:04,400 --> 00:40:08,200
Thank you. Yes? GEORGE NELSON: A very nice talk. Thank you.
228
00:40:08,200 --> 00:40:20,000
George Nelson, SAIC, Frederick, but more relevantly part of Jeffrey Kopp’s collaboration here. You showed the picture of comparing
229
00:40:20,000 --> 00:40:29,333
extremes of a distribution without perhaps specifically saying that’s what you wanted to base the study on and it was shown earlier, also,
230
00:40:29,333 --> 00:40:39,033
and six or eight years ago when this was suggested in an AIDS study, I vehemently objected—I’m not so sure now—but I feel
231
00:40:39,033 --> 00:40:48,533
honor-bound to make the same point just because I made it so vehemently before, that the factors that influence being on one extreme, you know,
232
00:40:48,533 --> 00:40:55,699
it’s not necessarily the absence of those factors that puts you on the other extreme. What I argued for then was that it was essential to have some
233
00:40:55,700 --> 00:41:01,466
comparison group in the middle to compare both with.
234
00:41:01,466 --> 00:41:09,166
STEPHEN RICH: Absolutely. You know, one of the things that we did in the Exome Sequencing Project was, number one, we didn’t know if this
235
00:41:09,166 --> 00:41:17,299
would work; number two, we didn’t know what the sample size requirements were to make it work if it’s feasible; and number three, we knew
236
00:41:17,300 --> 00:41:26,100
that the cohorts that were contributing samples actually had huge numbers of people in the middle, and that’s why we also had this deeply
237
00:41:26,100 --> 00:41:36,300
phenotyped reference group. I personally have seen studies get really complicated when you have multiple control groups. You know, if you
238
00:41:36,300 --> 00:41:44,066
have a case group, one control group you get a significant result, the other control group you don’t get the same significant result, and the one
239
00:41:44,066 --> 00:41:53,499
thing that I can say is that from our exome sequencing work in our now close to 7,000 exomes, we came together with several other
240
00:41:53,500 --> 00:42:05,566
groups who were performing exome sequencing and developed this exome chip so that the main objective was to use that exome chip in all the
241
00:42:05,566 --> 00:42:14,466
cohort members that we could to try to get around this question now of this extreme, because my view, similar to yours now, is that,
242
00:42:14,466 --> 00:42:25,066
you know, you can have very high LDL caused by one set of genes, you could have very low LDL caused by another set of genes, and if you
243
00:42:25,066 --> 00:42:36,466
compare them, you know, what are you going to get? And so, I think in some ways, this is another reason I’m not really happy with an extreme
244
00:42:36,466 --> 00:42:47,299
design when we start thinking about any type of sequencing or any type of study. I like families. It seems to me that if you have families, you can
245
00:42:47,300 --> 00:42:56,566
track transmission of genotype with phenotype. That gives you much more power to know that these rare variants that you’re seeing are real
246
00:42:56,566 --> 00:43:01,466
and that they may actually have an association. Did I answer your question?
247
00:43:01,466 --> 00:43:05,532
GEORGE NELSON: Yes, very good. STEPHEN RICH: Okay.
248
00:43:10,266 --> 00:43:18,432
MALE: So, I have a question about your last slide and I’m taking this contrary view on purpose, so don’t take it personally, but you said that the time
249
00:43:18,433 --> 00:43:26,399
is right to do it now but the other argument could be made to wait, right? Because the costs are dropping, and people like you are figuring out how
250
00:43:26,400 --> 00:43:32,833
to do this properly. So, is there a counter-argument?
251
00:43:32,833 --> 00:43:33,866
STEPHEN RICH: I guess the… MALE: Now, I know it’s not going to be widely
252
00:43:33,866 --> 00:43:44,032
popular at this meeting but I think we do have to raise…because the question, you know, we’ve clearly put money into this, so I’m not taking that
253
00:43:44,033 --> 00:43:48,866
view, but I think it’s an important one to discuss. STEPHEN RICH: No, actually, I think it’s important
254
00:43:48,866 --> 00:44:00,266
to discuss this. It’s one of those situations where the technology is evolving. One can say that the difference between $2,500 per genome and
255
00:44:00,266 --> 00:44:10,766
$1,000 a genome is, you know, once you get to 1,000 people, that difference becomes like real money and when an institute doesn’t have any
256
00:44:10,766 --> 00:44:20,166
money to begin with, real money means something. At the same time, I think that we know enough about how to evaluate the sequence
257
00:44:20,166 --> 00:44:33,699
data, control the sequence data, issues of quality control that we had no idea about 2-3 years ago, we now understand much more deeply. My
258
00:44:33,700 --> 00:44:46,700
personal view is that it’s being done by us, by some others…Europe…there are a number of places performing whole genome sequencing.
259
00:44:46,700 --> 00:44:54,066
Rotterdam is doing a lot of whole genome sequencing. There’s a lot of places doing whole genome sequencing. I’m more concerned about
260
00:44:54,066 --> 00:45:03,399
just the ability to find the genetic cause for some of these diseases that we can then move into a sort of translational mode so we can get more
261
00:45:03,400 --> 00:45:13,233
into the clinic and actually get to treating people. So, that’s sort of my elevator speech of why one should do it now as opposed to waiting,
262
00:45:13,233 --> 00:45:17,066
assuming that you have the money to do it. MALE: The other more scientific question is: the
263
00:45:17,066 --> 00:45:26,899
discussion seems to be dichotomizing rare diseases…I’m sorry, rare traits versus common traits. Are there analytical strategies to combine
264
00:45:26,900 --> 00:45:32,533
this to see if they’re landing on the same pathways to make some sense of those Manhattan plots where things are not quite at
265
00:45:32,533 --> 00:45:37,933
statistical threshold, but look, quote “interesting” if you find a rare variant nearby or on a pathway?
266
00:45:37,933 --> 00:45:49,033
STEPHEN RICH: I’m going to leave it to Suzanne Leal to tell us all about the analytic things and why she knows how to deal with that, or if she doesn’t
267
00:45:49,033 --> 00:45:57,466
know how to deal with it, she’ll think of a reason between now and her talk, to deal with it. I guess the key thing for me is that when you perform a
268
00:45:57,466 --> 00:46:07,866
GWAS and you identify SNPs from a GWAS, those SNPs are chosen to provide information across a genome at certain distances, and
269
00:46:07,866 --> 00:46:15,332
whether that SNP actually is the critical causal variant or whether that’s just in linkage disequilibrium or something else, that’s really
270
00:46:15,333 --> 00:46:29,633
important. My belief is that these are in LD with something that’s important. And so, when you do fine mapping…so, my other hat is the Type I
271
00:29:29,633 --> 00:29:42,466
Diabetes Genetics Consortium, also an NIDDK project, it’s…I personally believe it’s the best NIDDK project, but nonetheless, we identified
272
00:46:42,466 --> 00:46:54,566
over 40 loci for Type I diabetes risk, but remember, it’s a locus, it’s not a gene, and the average number of genes within each locus
273
00:46:54,566 --> 00:47:09,899
was 7, and so you had to then fine-map across the genome—we did that with an immunochip—and now we’ve identified in 40 loci the strongest
274
00:47:09,900 --> 00:47:19,700
candidate gene in each one of those, and a couple of loci—there’s 2 or 3 still—and in a certain case, the variants lined up between
275
00:47:19,700 --> 00:47:29,666
genes suggesting something really important in regulation. So, I think part of this missing heritability argument and some of the things that
276
00:47:29,666 --> 00:47:35,099
you’re talking about relate not to the fact that GWAS tells us something but it doesn’t account for very much, is that we haven’t found the right
277
00:47:35,100 --> 00:47:45,700
gene. And so now, we’re finding right genes, we’re getting much more of the total heritability explained by this, and so now it’s time to really
278
00:47:45,700 --> 00:47:57,566
look beyond that and I do think that the movement from genome to transcriptome and the integration of that information will tell us that there are some
279
00:47:57,566 --> 00:48:05,832
rare variants that point to one part of a pathway. There’s some common variants, perhaps, in the same gene that point to a different pathway or
280
00:48:05,833 --> 00:48:23,266
even a different part of the same pathway. This is all new stuff; it’s sort of an exciting time to be a geneticist, basically.
281
00:48:23,266 --> 00:48:28,166
STEPHANIE MALIA FULLERTON: Hi. Thank you very much for your presentation. Malia Fullerton, Center for Genomics and Health Care Quality at
282
00:48:28,166 --> 00:48:36,166
the University of Washington. Just first, to comment and then a question. I mean, I think it’s fascinating for those of us in the bioethics
283
00:48:36,166 --> 00:48:42,799
community who have been sort of watching the evolution of human genetics over the last 15 years or the move from pedigrees to population-
284
00:48:42,800 --> 00:48:50,400
based cohort investigations now back to pedigrees, and I think it’s a function of the changing technologies and your presentation
285
00:48:50,400 --> 00:49:01,500
was really great from that point-of-view. On the question of using previously ascertained pedigrees, I and others are really concerned that
286
00:49:01,500 --> 00:49:09,200
we do not leave out the interests of underrepresented minorities as we do work, and I’m just wondering about, in terms of these
287
00:49:09,200 --> 00:49:16,400
complex pedigrees, particularly in the context of kidney disease, are there many from ethnic minority communities, or are we going to be
288
00:49:16,400 --> 00:49:22,900
starting already at a disadvantage for those communities if we begin with previously ascertained pedigrees?
289
00:49:22,900 --> 00:49:32,500
STEPHEN RICH: Thank you. That’s a great question. Just one comment…more on the ELSI side. When we were putting together the Heart
290
00:49:32,500 --> 00:49:41,900
GO component of the Exome Sequencing Project, we were concerned about identifying rare variants in coding regions of genes that had
291
00:49:41,900 --> 00:49:50,800
known clinical significance that were actionable, for example, and what do you do with that information? We’re researchers. The sequencing
292
00:49:50,800 --> 00:50:03,100
sites are not CLIA-certified labs, but as cohort leaders we felt that we had to do something because we have a relationship with our
293
00:50:03,100 --> 00:50:15,700
cohorts. And so, we built into our particular study a medical genetics counseling component so that whenever there was a potential variant that’s
294
00:50:15,700 --> 00:50:26,100
clinically actionable and identified, we would sort of have this system where the genetic counselor would be notified, the counselor would notify the
295
00:50:26,100 --> 00:50:40,700
source of the cohort where the person was a participant, the cohort would then call in that person who had identified themselves as saying,
296
00:50:40,700 --> 00:50:48,100
“If anything comes up, I want to find out about it.” The counselor would then have a discussion with that person, there would be a new blood
297
00:50:48,100 --> 00:50:55,400
sample drawn, that sample would be taken to a CLIA-certified lab for sequencing to confirm the variant, the counselor would be brought back in
298
00:50:55,400 --> 00:51:05,200
and counsel that person, and then that person could make a decision of what to do. It was wonderful in concept; in practice, almost every
299
00:51:05,200 --> 00:51:15,800
university IRB failed to actually address that. So, I think there’s a major ELSI component to this that we haven’t discussed.
300
00:51:15,800 --> 00:51:20,400
STEPHANIE MALIA FULLERTON: That doesn’t surprise me and we are going to talk about return of results tomorrow morning.
301
00:51:20,400 --> 00:51:23,800
STEPHEN RICH: And so I’m just setting you up for this.
302
00:51:23,800 --> 00:51:26,300
STEPHANIE MALIA FULLERTON: Thank you. Yeah.
303
00:51:26,300 --> 00:51:32,900
STEPHEN RICH: But I think I would have to suggest…you know, Barry Freedman, perhaps, would know as well as anyone, if there are
304
00:51:32,900 --> 00:51:36,800
minority pedigrees for kidney disease. BARRY FREEDMAN: So actually, since this is an
305
00:51:36,800 --> 00:51:44,200
NIDDK meeting and Robbie, this gets back to your question about: Is it time? I mean, one of the big costs is collecting these families, collecting these
306
00:51:44,200 --> 00:51:50,800
individuals, phenotyping them, and many studies in the nephrology community are outcome studies. But the FIND was actually a genetic
307
00:51:50,800 --> 00:51:59,000
study built with a family design with a severe phenotype—not population-based—and there’s 10,000 people in FIND of four ethnic groups.
308
00:51:59,000 --> 00:52:06,000
There’s European…actually, European Americans are the minority there—they’re the fewest—African-Americans, Hispanic Americans, and
309
00:52:06,000 --> 00:52:13,800
Mexican Americans. And, you know, Makias, by the way, who is sitting next to me, has all these tissue banks and gene expression work in
310
00:52:13,800 --> 00:52:21,300
multiple ethnic groups as well. So, I think that there is, because whites are the lowest risk group for kidney disease, there are minority
311
00:52:21,300 --> 00:52:25,700
populations represented for diabetic nephropathy. STEPHEN RICH: I think the other question, Barry,
312
00:52:25,700 --> 00:52:30,300
is: Are there pedigrees, specifically, available as opposed to just individuals?
313
00:52:30,300 --> 00:52:36,800
BARRY FREEDMAN: So yeah, the FIND Study is a family investigation of nephropathy in diabetes and it’s an affected sib-pair study or a discordant
314
00:52:36,800 --> 00:52:44,500
sib-pair study, but multiple additional siblings, including some without diabetes or nephropathy. Parents, where available, were recruited. The
315
00:52:44,500 --> 00:52:51,400
Mexican American and the Pima families are very large; they’re extended. There are cousins and there are things like that. You know, there are
316
00:52:51,400 --> 00:53:00,500
studies like Irish families that have albuminuria, GFR, those kinds of things, but the FIND was to find severe diabetic end-stage renal disease.
317
00:53:00,500 --> 00:53:02,100
STEPHEN RICH: Thank you, Barry. STEPHANIE MALIA FULLERTON: Thank you.
318
00:53:02,100 --> 00:53:05,600
STEPHEN RICH: Yes? GREG LENNON: Hello, I’m Greg Lennon from
319
00:53:05,600 --> 00:53:15,100
SNPedia and I thank you for mentioning both phase studies and SNP chips at the same time, and I’m curious if you’ve had a chance yet to look
320
00:53:15,100 --> 00:53:21,500
at the importance and have reviewed or developed a view on the importance of using phase data versus unphased.
321
00:53:21,500 --> 00:53:32,700
STEPHEN RICH: I mean, we’ve typically…I mean, to us, phase data is critically important. I’ll just say “yes.” The other point is…
322
00:53:32,700 --> 00:53:40,600
GREG LENNON: I’m curious about the evidence. What’s backing that up? I think we all believe that, but what have you seen that backs it up?
323
00:53:40,600 --> 00:53:49,100
STEPHEN RICH: You know, again, in our families that we’ve been working on, you can basically sort through sections of the genome that really
324
00:53:49,100 --> 00:54:00,800
can be transmitted from one individual to another, and having that phased information gives you much more leverage of deciding what is really
325
00:54:00,800 --> 00:54:09,000
contributing to the phenotype that you’re working with. The other thing I just wanted to mention is that there is a study called the Collaborative
326
00:54:09,000 --> 00:54:23,500
Cross. I don’t know if any of you know about this. They’ve taken eight inbred mouse lines, and because it’s actually hard to identify genes in all
327
00:54:23,500 --> 00:54:31,200
these inbred mouse lines because there are stretches of homozygosity, they are now breaking these up on purpose. So, they’re
328
00:54:31,200 --> 00:54:41,900
basically making mouse strains look like people, and so again, but using the phase information that you get from the breeding, you can actually track
329
00:54:41,900 --> 00:54:52,000
things much better and identify specific genes related to phenotypes. So in a mouse world, they’re breaking apart all this homozygosity and
330
00:54:52,000 --> 00:55:02,200
getting into new crosses. The advantage that people working in mice have is they can actually make, say, this mouse and this mouse get mated.
331
00:55:02,200 --> 00:55:13,333
It’s hard to do that with people, so that’s why the imprint comes in, and so, having the ability to do the phasing in that sense is very helpful.
332
00:55:19,000 --> 00:55:20,700
Thank you.
Date Last Updated: 9/18/2012