Whole Genome Approaches to Complex Kidney Disease
February 11-12, 2012 Conference Videos

Overview of Whole Genome Study Designs to Find Genes Contributing to Complex Diseases
Steve Rich, University of Virginia

Video Transcript


1
00:00:00,000 --> 00:00:11,933
JEFFREY KOPP: So, we have one more talk before lunch time and it’s by Dr. Steve Rich, who’s professor at the University of Virginia and

2
00:00:11,933 --> 00:00:20,799
Director of Public Health Genomics at the university. He studied the genetics of diabetes Type I for a good number of years, is a member

3
00:00:20,800 --> 00:00:28,333
of the National Academy of Sciences, and his website reads in part, “His work is centered on understanding the genetic epidemiology of complex

4
00:00:28,333 --> 00:00:37,033
human disease, including genes contributing to atherosclerosis, stroke, and intermediate risk factors,” and we’ve asked him to talk about study

5
00:00:37,033 --> 00:00:43,299
designs for a whole exome of discovery. Thank you.

6
00:00:43,300 --> 00:00:50,300
STEPHEN RICH: Thanks to Jeffrey and Robbie for inviting me. One correction: I’m not a member of the National Academy of Sciences, but I’d be

7
00:00:50,300 --> 00:01:06,900
happy to join if they asked. Also, I’d like to thank John Sedor and Barry Freedman for coming up with this terrible title, because when it comes time

8
00:01:06,900 --> 00:01:17,233
to think about overview of whole genome study designs, we really don’t know anything about this, per se, so up front I thought I’d just go ahead

9
00:01:17,233 --> 00:01:34,499
and clarify the situation. So, what you’ll hear from me is pretty much my view as of February 11th—I think today is February 11th—and February 12th I

10
00:01:34,500 --> 00:01:48,133
may have a different view, so just to let you know what you’re in for. This is a typical slide that I’ve used many times—somewhat fancier

11
00:01:48,133 --> 00:01:58,633
than it used to be, given modern technology, which is always something we learn; technology drives a lot of the applications—but it does point

12
00:01:58,633 --> 00:02:09,866
out a couple of things I think that we need to recognize, in that, what we think of in kidney disease is likely to be a combination of multiple

13
00:02:09,866 --> 00:02:21,099
genes and multiple environments, and just as this intersection of Trait 1 and Trait 2 might be some aspect of kidney morphology, some aspect of

14
00:02:21,100 --> 00:02:30,666
hypertension, some aspect of other types of risk factors, to come into chronic kidney disease, which is what you see at the bottom, there are

15
00:02:30,666 --> 00:02:41,099
multiple genes that contribute, as actually Kári said, Gene 1 contributes a great deal to some aspect of hypertension. Gene 1 could also

16
00:02:41,100 --> 00:02:50,466
contribute to some aspect of kidney morphology. So you have this combination of pleiotropic effects of genes, some effects being larger on

17
00:02:50,466 --> 00:02:59,466
one trait than another trait. Obviously, traits can be correlated. So, the question is whether this trait in terms of hypertension is correlated

18
00:02:59,466 --> 00:03:11,299
genetically with the variation in kidney morphology. You have another gene that may affect kidney morphology, as well as this gene.

19
00:03:11,300 --> 00:03:20,766
You also have the effects of environments. So, it’s really a combination of things that we talk about in terms of genetic effects, environmental

20
00:03:20,766 --> 00:03:32,832
effects, and at the end of the day, can you tease any of this apart? One of the things that I’ve always recognized in most of not only disease

21
00:03:32,833 --> 00:03:47,233
traits but phenotypic traits in general: If you have someone holding a gun to your head and they say, “Okay, tell me, what portion of the variation in

22
00:03:47,233 --> 00:03:57,999
disease risk or proportion of variance in the phenotype is due to genes? If you get it wrong, I’ll blow your head off,” answer “50%.” You know,

23
00:03:58,000 --> 00:04:07,566
pretty much 50% is…you can find an estimate of 50% someplace and if you look at the standard errors on these estimates, they’ll be covering

24
00:04:07,566 --> 00:04:20,566
50%. So, one of the things that was quite clear is that there’s familial aggregation for about everything. Back in the ’80s I worked with

25
00:04:20,566 --> 00:04:31,799
Minnesota, worked with Mike Mauer and Mike Steffes, to look at kidney morphology related to individuals who had Type I diabetes undergoing a

26
00:04:31,800 --> 00:04:41,566
kidney transplant. They had individuals who had kidney biopsies, and so you looked at the kidney morphology not only of the individual who was

27
00:04:41,566 --> 00:04:50,866
going to have the transplant but also of the donors and other people in the family who were being looked at in terms of kidney function, and

28
00:04:50,866 --> 00:05:01,732
my assumption going in was that something important like a kidney would not have variation in things like mesangial matrix or various aspects of

29
00:05:01,733 --> 00:05:13,133
the structure, but we actually saw quite a bit of variation from family to family and that, in fact, members of the same family tended to cluster, so

30
00:05:13,133 --> 00:05:22,799
you could just sort of make a line going up and you saw clustering by sibships. So, it suggested that even something as important, as Kári said,

31
00:05:22,800 --> 00:05:32,766
you know, that would create piss, would essentially allow you to see that there is variation and that’s heritable. So, I think it’s important to

32
00:05:32,766 --> 00:05:39,599
recognize that almost everything we see may have some heritable component and almost everything we look at will have an environmental

33
00:05:39,600 --> 00:05:49,833
component. So overall, what are the steps for disease gene identification? The first question: is there a genetic component? As a geneticist I

34
00:05:49,833 --> 00:06:01,466
would say, of course. You know, why bother asking the question? If I’m a non-believer I’d have to see some evidence but in general you see

35
00:06:01,466 --> 00:06:18,832
some aggregation in families. Back in, again, the early 90s we had a neurology fellow from Iceland who was looking at cerebral aneurysms and in

36
00:06:18,833 --> 00:06:27,966
the Icelandic pedigrees of all the Icelandic individuals they had information on going back generations, there was something like 13 people

37
00:06:27,966 --> 00:06:36,732
with cerebral aneurysms. Those 13 people fell into a couple of families. So even though in a large population you may not have very many

38
00:06:36,733 --> 00:06:45,166
people, if you can show that these people then fall into a family or a lineage, then that suggests there might be something genetic involved. One of

39
00:06:45,166 --> 00:06:55,832
the real issues is trying to decide on the appropriate study design, and this is not an easy question. I think we have the ability to generate

40
00:06:55,833 --> 00:07:07,499
genetic data at an unprecedented rate—we are swimming in genetic data at this stage—and so the questions really that are critical are: what’s

41
00:07:07,500 --> 00:07:14,766
the appropriate design, and even more critical, what’s the appropriate phenotype? And this is where I think we have to recognize as

42
00:07:14,766 --> 00:07:26,732
geneticists and as epidemiologists, you know, people in general, that the information that our clinical researchers and our colleagues in clinical

43
00:07:26,733 --> 00:07:34,133
research have is invaluable, and that without that information we really are going to be spinning our wheels in terms of trying to identify the

44
00:07:34,133 --> 00:07:43,499
appropriate genetic contribution to the appropriate phenotypes. The molecular analysis, you may have heard the term “$1,000 genome, $100,000

45
00:07:43,500 --> 00:07:56,166
analysis.” This is pretty much probably an understatement because you also have to throw in the $250,000 informatics bill. So, eventually we

46
00:07:56,166 --> 00:08:05,032
get to functional analysis and I put that in parentheses for a reason: I don’t know what to do because functional analysis not only requires

47
00:08:05,033 --> 00:08:14,799
the appropriate tissue, the appropriate targets, the appropriate measurements. People in this audience know more than I about how one can

48
00:08:14,800 --> 00:08:24,133
utilize the information from sequencing, RNA sequencing, or other types of sequencing procedures to really build up an integration of the

49
00:08:24,133 --> 00:08:33,333
genomic and the transcriptomic data, and this whole area of systems genomics and systems biology is something that’s very critical in the

50
00:08:33,333 --> 00:08:40,333
future, and we need to really recognize these aren’t separate entities, that they need to be integrated to really get an understanding of

51
00:08:40,333 --> 00:08:50,399
where we’re going, to understand chronic kidney disease. So, this is a classic—well, I think it’s a classic—a classic slide to think about patient

52
00:08:50,400 --> 00:09:03,533
ascertainment and of all the ways we can ascertain individuals. I break it down into three types: sib-pairs, families or pedigrees, and

53
00:09:03,533 --> 00:09:12,033
cases and controls, and this is just based upon disease but it can also be based on extreme phenotypes or anything else you might think of. I’ll

54
00:09:12,033 --> 00:09:22,533
just go through these one at a time, first with the case-control study. Case-control studies are obviously one of the easy things to do and I put

55
00:09:22,533 --> 00:09:34,133
“easy” in quotes for a reason. You have essentially the idea that you’re getting unrelated cases and controls or some, actually, I like to think

56
00:09:34,133 --> 00:09:44,533
of it as a case comparison group. Case-control is one type of comparison group. I’m not even certain of what a control is in many cases, in any

57
00:09:44,533 --> 00:09:57,699
instances…maybe “non-case” is a good term. One can think of extremes of individuals being case groups and non-case groups. Obviously, the idea

58
00:09:57,700 --> 00:10:12,100
of a case-control comparison has been used in this NHLBI Exome Sequencing Project, but if you think about case comparison groups you’re

59
00:10:12,100 --> 00:10:19,866
assuming that they’re unrelated. One thing I’ve learned from years of ascertaining cases and controls, there is this issue of this underlying

60
00:10:19,866 --> 00:10:30,166
genetic predisposition to volunteer for studies, and if you generate enough groups of individuals who volunteer, you’ll find out that Nancy

61
00:10:30,166 --> 00:10:39,332
volunteered for this study and then it was such a wonderful experience, then she talked about it to all her friends and colleagues and her family, that

62
00:10:39,333 --> 00:10:48,966
her sister, June, also volunteered for this study. And even though you think that you’ve got unrelateds, you may have sisters, you may have

63
00:10:48,966 --> 00:10:59,699
cousins, and fortunately we have genetic methods to tease a lot of that apart. So, when we started the Exome Sequencing Project—this is the

64
00:10:59,700 --> 00:11:11,466
project that you’ve heard about once or twice—this is, again, an NHLBI study. I’m leading the Heart GO effort which includes six cohorts:

65
00:11:11,466 --> 00:11:21,699
ARIC, Atherosclerosis Risk in Communities; Cardiovascular Health Study, CHS; CARDIA, Coronary Artery Risk Development in Young Adults;

66
00:11:21,700 --> 00:11:32,333
Framingham; Jackson Heart Study; and the Multi-Ethnic Study of Atherosclerosis, or MESA. The project also has two sequencing centers, a lung component, a

67
00:11:32,333 --> 00:11:42,799
lung segment, and a Women’s Health Initiative. But this was really getting started about three years ago or four years ago at the time when exome

68
00:11:42,800 --> 00:11:55,400
sequencing was fairly new and we had no clue of what’s the appropriate study design, and so a lot of the discussion was centered upon: do you

69
00:11:55,400 --> 00:12:06,133
take a couple of people in extremes, do you take cohorts, do you take families, do you take large numbers of people? So, the standard design was

70
00:12:06,133 --> 00:12:19,766
sampling in each tail and I think Linda Kao mentioned this quite well, but essentially the idea is you have individuals with very low LDL, those

71
00:12:19,766 --> 00:12:33,466
with very high LDL, or early-onset MI or controls, and you perform exome sequencing in each of these sets, thinking that you’re essentially treating

72
00:12:33,466 --> 00:12:44,532
this LDL distribution at the very extremes—and we’re saying, like, less than 1% or it’s like less than 1% and greater than 99th percentile—to

73
00:12:44,533 --> 00:12:53,166
Mendelize things, to try to get those extremes so that you would pick up these genes of major effect. Of course when you talk about LDL, you

74
00:12:53,166 --> 00:13:00,032
also have to think: what are the factors that contribute to these high and low LDLs, and then including that, what are the drugs that they’re

75
00:13:00,033 --> 00:13:10,266
taking, and trying to adjust for those things. For the early-onset MI, you have the situation where you’re trying to identify those with myocardial

76
00:13:10,266 --> 00:13:21,732
infarction at a very young age, so this would be, say, earlier than 55 in women and 45 in men. What is the comparison group here: people in the

77
00:13:21,733 --> 00:13:33,033
general population, or are these people who have all the risk factors but don’t have an MI? Well, leading risk factors are something like age. I like to

78
00:13:33,033 --> 00:13:41,433
think of this control group that’s like Winston Churchill. So, it’s someone who’s old, fat, smokes, runs around, does everything bad but

79
00:13:41,433 --> 00:13:54,499
doesn’t have an MI until very late. Again, it may be something personality-driven, so if we have risk, if there’s genes that are involved with being old,

80
00:13:54,500 --> 00:14:06,300
fat, and running around and smoking cigars, then we may find genes for that, too. CHARGE-S was mentioned, also by Linda. It’s a different design.

81
00:14:06,300 --> 00:14:16,366
It’s a design that’s like a case cohort design. And so the CHARGE-S sampling design, again, is there’s a cohort random sample and there’s

82
00:14:16,366 --> 00:14:25,699
specific case groups, so there was a CRP group, a blood pressure group, and a kidney group—and this was expanded by Linda for her own

83
00:14:25,700 --> 00:14:33,666
studies—and then other types of groups. And so, this is very much like the Wellcome Trust Case Control Consortium that was reported from the

84
00:14:33,666 --> 00:14:47,666
initial GWAS several years ago where they used a group of shared controls for all the different cases. So, this is yet another idea of unrelateds.

85
00:14:47,666 --> 00:14:57,966
Now as Linda pointed out, there were basically three cohorts involved that contribute to this, and so that’s another sort of complexity that needs to

86
00:14:57,966 --> 00:15:07,166
be added on to how one thinks about the analysis of any type of genetic data. When you have ARIC, which is Atherosclerosis Risk in

87
00:15:07,166 --> 00:15:16,866
Communities, where the ascertainment was over several communities of an age of entry into the cohort of about 45 years of age to 65 years of

88
00:15:16,866 --> 00:15:27,732
age. The Cardiovascular Health Study, or CHS, recruited at four different communities, not necessarily the same communities as ARIC, they

89
00:15:27,733 --> 00:15:37,599
were entry at 65 years of age or older, and then you have Framingham, which as we all know and love, Framingham, Massachusetts is not

90
00:15:37,600 --> 00:15:46,666
necessarily representative of anything but Framingham, Massachusetts, and they’re almost all whites. So it’s a very mixed group here, so

91
00:15:46,666 --> 00:15:56,532
even though it looks pretty homogeneous, there’s a lot of heterogeneity that underlies this that needs to be taken into account. So, what are my

92
00:15:56,533 --> 00:16:05,799
thoughts on this “easy” design? It is relatively easy to collect samples and data—that’s the one great thing—because as someone who’s

93
00:16:05,800 --> 00:16:16,733
collected family data in the past, your family as a unit of inference is only as good as having the actual family members, and if you’ve ever tried to

94
00:16:16,733 --> 00:16:26,999
get blood out of people, you can go to a door, knock on a door, say, “We’re here to get the blood from the family,” and you hear the back door slam

95
00:16:27,000 --> 00:16:36,000
as the father runs across the hills because the fathers hate giving blood. As a father, I hate giving blood; I don’t know about other fathers in

96
00:16:36,000 --> 00:16:45,666
the audience. Moms will give blood, kids will give blood because their moms tell them to do it, the dads are out of town, you know. So in general,

97
00:16:45,666 --> 00:16:57,866
sampling individuals is a lot easier than sampling families or relateds. It’s also, you know, you get samples and you use that data for sequencing.

98
00:16:57,866 --> 00:17:07,799
It’s relatively easy to find “affected” or the “extreme” phenotype. It’s actually quite…you can think about early age of onset, severity of the

99
00:17:07,800 --> 00:17:15,333
phenotype, some high recurrence risk. I think it’s relatively difficult to define the comparison group, and so it takes a lot more thinking about how

100
00:17:15,333 --> 00:17:24,066
you’re going to compare those severe phenotypes. And then, analysis of the sequence data, I think, becomes much more difficult

101
00:17:24,066 --> 00:17:36,466
because now you have to define what’s a sequencing artifact from individuals who are unrelated and whether that’s an artifact or a truly

102
00:17:36,466 --> 00:17:47,932
rare variant, and the issues of quality control and formal analysis are somewhat easier for cases and controls than for families, but still you don’t have this

103
00:17:47,933 --> 00:17:57,066
ability to track a variant from parent to child, which in my sense of the world, is that this is a good way to know if something’s a spurious

104
00:17:57,066 --> 00:18:06,999
variant or not is if you see it in a child and in a parent. So, this is where some of the thoughts are on the “easy” design. Let’s go to sib-pairs:

105
00:18:07,000 --> 00:18:21,533
sib-pairs may not have either parent or can be both affected, so it’s concordant sib-pairs or discordant sib-pairs. So, we move from unrelated

106
00:18:21,533 --> 00:18:29,466
to sibships. This is a bit more difficult. The concept is you focus on families in which you have a common disease or a phenotype that

107
00:18:29,466 --> 00:18:39,866
might segregate in an apparent Mendelian way. It’s familial, it’s a complementary alternative to the case comparison. The goals of case-control

108
00:18:39,866 --> 00:18:50,866
design or family sib-pair design are the same: you’re basically trying to find genes and also trying to utilize sequence data to look for fairly

109
00:18:50,866 --> 00:19:02,866
highly penetrant variants, whether they’re coding or noncoding. Hopefully, you can identify familial subsets. So, in our original concordant sib-pair

110
00:19:02,866 --> 00:19:13,199
you obviously have clustering in a family. You get this issue of whether the clustering is truly genetic or whether there’s a cause and then a

111
00:19:13,200 --> 00:19:22,733
spurious genetic effect or some spurious environmental effect that makes it look like an affected sib-pair is truly genetic or is it one cause

112
00:19:22,733 --> 00:19:34,933
and then a separate cause for the affected sib? They’re harder to collect and, again, sometimes sporadic events do mimic genetic events. Here’s

113
00:19:34,933 --> 00:19:43,733
the example from the group at University of Washington that was published in Nature Genetics that has shown that exome sequencing

114
00:19:43,733 --> 00:19:57,533
can identify single gene Mendelian disorders and I think it’s instructive for a couple of reasons. The way this was performed…and this was

115
00:19:57,533 --> 00:20:10,599
published in 2010, so again, let’s think. This is 2012 so it’s only been a couple of years or really just a year or more from the time that we started

116
00:20:10,600 --> 00:20:22,800
with exome sequencing and discovery of a Mendelian disorder. The idea is that you have a couple of kindreds and really only a couple of

117
00:20:22,800 --> 00:20:32,666
individuals that would allow you to then do filtering of variants, and then after filtering of variants you basically come up with the right

118
00:20:32,666 --> 00:20:42,666
result. This is wonderful; this will work for everything, right? Well, not quite, but you know, this was a landmark paper because it showed

119
00:20:42,666 --> 00:20:56,932
exome sequencing in a single gene disorder will allow you to find a gene. The filtering helped tremendously. So you basically sequenced, you

120
00:20:56,933 --> 00:21:06,933
found rare variants, you then looked to see if those variants were in dbSNP and you could then apply filters to reduce the known variants to

121
00:21:06,933 --> 00:21:15,399
come down to a small number that were novel. You could then look across a couple of pedigrees for families and then you could find, okay, well

122
00:21:15,400 --> 00:21:23,466
this is great; you found it. Well, one of the problems is, as we continue to sequence we continue to find a lot of variants, we put those in

123
00:21:23,466 --> 00:21:33,866
dbSNP, so now just because you see something that’s in dbSNP, should we throw it out? Well, as you continue to sequence, you get more and

124
00:21:33,866 --> 00:21:41,332
more rare variants, you’ll get more and more things deposited in the databases, so I don’t think you can use that strategy anymore. Again, the

125
00:21:41,333 --> 00:21:49,999
focus was on Mendelian, not complex, phenotypes. I think kidney disease is complex. Certainly, my friend, Barry Freedman, tells me it’s

126
00:21:50,000 --> 00:21:59,600
complex and anything Barry says is correct, so that must mean that we’re going to be looking for more than single genes. Although the disease is

127
00:21:59,600 --> 00:22:13,533
common in a family, the sib-pairs could provide a minor but some reduction in the genome that could be considered, because if you think about

128
00:22:13,533 --> 00:22:19,466
it, we can start narrowing down. Well, if you expect there’s a quarter of the genome that’s shared between affected siblings, then you’re

129
00:22:19,466 --> 00:22:31,266
throwing out a certain portion of the genome, but we’re not throwing out a lot. And so if you look at this report from the New England Journal from

130
00:22:31,266 --> 00:22:43,866
Kiran Musunuru and Sekar Kathiresan, identifying ANGPTL3 mutations and familial combined hypolipidemia, again, focused on siblings but then

131
00:22:43,866 --> 00:22:56,999
doing a lot more of the sequencing and bioinformatics work, it narrowed down specific gene candidates. So, one of the things is looking

132
00:22:57,000 --> 00:23:10,200
at pedigrees; large families. And so if you move from sibships to pedigrees, these are more difficult to ascertain, collect, and get clinical data.

133
00:23:10,200 --> 00:23:20,133
One of the issues that we’ve seen is that if you have large pedigrees, that means you must have individuals who’ve been followed over

134
00:23:20,133 --> 00:23:32,666
generations, effectively, and in fact some of the definitions of the phenotypes or diseases may change over time; that’s clearly true in the

135
00:23:32,666 --> 00:23:41,199
psychiatric disorders. There’s also clear evidence that what may be…I mean, I’m sort of a baseball fanatic, so I like to look at old-time

136
00:23:41,200 --> 00:23:50,466
baseball players’ histories. There’s always the person who, you know, fell off the bridge crossing Niagara Falls because he was drunk.

137
00:23:50,466 --> 00:23:59,866
That’s probably not genetic, except there’s some tendency to drink, alcoholism and so forth, but if you look at how baseball players died in the

138
00:23:59,866 --> 00:24:12,399
1920s, 1930s, and 1900s you’ll see diagnoses that look really strange. And so, when you think about assembling pedigrees and you start looking

139
00:24:12,400 --> 00:24:23,333
at diagnoses for cause of death or what individuals had in the early 1900s, they’re totally different terms than we see now. Are they the

140
00:24:23,333 --> 00:24:32,066
same disease? Are the same diagnostic tools used? How were they treated? If you think about gene-environment interactions—which someone

141
00:24:32,066 --> 00:24:43,332
brought up before—we looked at this in terms of lung cancer, and when…I think it was World War I when they started giving cigarettes to men who

142
00:24:43,333 --> 00:24:56,799
were fighting overseas. And so, before World War I, people who had lung cancer had parents who didn’t smoke. Guess what? They looked

143
00:24:56,800 --> 00:25:08,933
recessive. After that, parents smoked, typically men; they looked autosomal dominant. So it depends on the environment, in a way, of how

144
00:25:08,933 --> 00:25:18,099
you actually interpret the exposures. So, the definition of phenotypes may change. Treatments clearly change. So, when we look at lipid levels

145
00:25:18,100 --> 00:25:28,966
and look at lipid levels in cohorts now versus Cardiovascular Health Study, remember, these are people who were aged 65 at entry. They

146
00:25:28,966 --> 00:25:37,732
weren’t using statins at that point. So, there’s a lot of information that needs to be tabulated and this is where the clinical information and the

147
00:25:37,733 --> 00:25:46,199
interpretation of the clinical information becomes critical. You can think about the idea of pedigree-specific genes. We heard, I guess, a comment or

148
00:25:46,200 --> 00:25:58,133
a question earlier for Kári Stefánsson. Well, whatever you found in Iceland, is it only good for Iceland? Well maybe, but I think that’s okay

149
00:25:58,133 --> 00:26:07,733
because, in fact, that might even uncover a novel pathway that we hadn’t thought about before, but oftentimes it’s usually not a problem. In fact, the

150
00:26:07,733 --> 00:26:19,733
TCF7L2 gene that was identified as the best—it’s actually the strongest genetic risk factor for Type II diabetes, odds ratio of about 1.4—was found in

151
00:26:19,733 --> 00:26:30,399
Iceland. So, there’s a lot of those genes that have been identified in isolated populations that are not just restricted to the isolated population; it’s really

152
00:26:30,400 --> 00:26:44,933
part of a normal pathway of causal relationships. If you go to the most used reference these days, which is Wikipedia, and you type in

153
00:26:44,933 --> 00:26:56,466
“genetics”—this is a conference on genetics and chronic kidney disease or kidney disease—you’ll get this: “from the Ancient Greek”…which they

154
00:26:56,466 --> 00:27:03,599
actually had some Greek symbols here but I couldn’t figure out how to do this on my laptop at the time…but it’s “genetikos,” and from that,

155
00:27:03,600 --> 00:27:14,000
genesis or origin. So, genetics is a “discipline of biology that’s a science of genes, heredity, and variation in living organisms.” So the idea behind

156
00:27:14,000 --> 00:27:24,566
genetics is looking at transmission of a gene or a variant from parent to child along with the phenotype from parent to child. So in a sense, what we’re

157
00:27:24,566 --> 00:27:33,099
getting at here in using sib-pairs in families is really classic genetics, as opposed to looking at cases and controls which are unrelated, and

158
00:27:33,100 --> 00:27:42,500
therefore there’s no real sharing in those individuals. So, the utility of the pedigree is to increase the ability to really sort out some

159
00:27:42,500 --> 00:27:50,566
sequencing artifacts from really true variants. You can track the genotype with a phenotype from parent to child and to other relationships.
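
The parent-to-child tracking described here can be illustrated with a small sketch. It assumes genotypes are coded as alternate-allele counts (0, 1, or 2) for a father, mother, and child at each variant; the function names and the example variant IDs are hypothetical and are not taken from the talk or from any particular pipeline. A call that violates Mendelian transmission is only a candidate artifact, since a true de novo mutation would look the same.

```python
from itertools import product

def alleles(gt):
    """Expand an alt-allele count (0, 1, 2) into a pair of alleles (0 = ref, 1 = alt)."""
    return {0: (0, 0), 1: (0, 1), 2: (1, 1)}[gt]

def mendelian_consistent(father, mother, child):
    """True if the child's genotype can be formed from one allele of each parent."""
    possible = {f + m for f, m in product(alleles(father), alleles(mother))}
    return child in possible

def flag_suspect_variants(trio_calls):
    """Return IDs of variants whose trio genotypes violate Mendelian transmission.

    trio_calls maps a variant ID to a (father, mother, child) tuple of alt-allele counts.
    """
    return [vid for vid, (f, m, c) in trio_calls.items()
            if not mendelian_consistent(f, m, c)]

# Hypothetical calls for one trio (illustrative values only).
calls = {
    "var_a": (0, 1, 1),  # child inherits the alt allele from the mother
    "var_b": (0, 0, 1),  # alt allele seen in the child but in neither parent
    "var_c": (1, 1, 2),  # child inherits one alt allele from each parent
}
suspects = flag_suspect_variants(calls)
```

In this sketch only `var_b` is flagged, because neither parent carries the allele the child appears to have. With unrelated cases and controls there is no equivalent check, which is the contrast the speaker draws.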

160
00:27:50,566 --> 00:28:00,532
The individual pedigrees, if they’re sufficiently large, can provide a lot of power to identify causal genes, and then there’s also the potential

161
00:28:00,533 --> 00:28:08,599
for longitudinal data (we heard someone mention longitudinal data earlier), risk factor data and related phenotypes. Unfortunately, there’s a

162
00:28:08,600 --> 00:28:19,666
limited number of these types of large pedigrees. Iceland is a large pedigree, essentially. We have Framingham with longitudinal studies with a lot of

163
00:28:19,666 --> 00:28:31,999
families there, a lot of the religious isolates like the Amish and Hutterites and things like that, but it’s actually hard to find those specifically enriched

164
00:28:32,000 --> 00:28:44,533
for kidney phenotypes, so again, narrowing within phenotypes. One of the things that we did accomplish with the Exome Sequencing Project

165
00:28:44,533 --> 00:28:53,166
out of the 7,000 or so exomes that have been sequenced, we allocated a certain number for pedigree studies, and this is one of those

166
00:28:53,166 --> 00:29:04,232
pedigree studies. So, you essentially could think about sending this person and this person’s DNA to have exome sequencing and then, because it’s

167
00:29:04,233 --> 00:29:17,233
now 1/16th sharing instead of 1/4th, you can restrict the search space for variants that contribute to the specific disease. This is a Joslin

168
00:29:17,233 --> 00:29:32,033
pedigree that we published with Damian Fogarty back in 2000, looking at diabetic nephropathy, and there are some pedigrees like this out there. I

169
00:29:32,033 --> 00:29:39,966
know that a number of you in the audience have collected pedigrees like that. I know Barry Freedman has been collecting families. I know

170
00:29:39,966 --> 00:29:48,766
that a various number of you have families, sort of the family in the freezer study, in a way. You have the samples and you have the data, and

171
00:29:48,766 --> 00:29:59,699
now it’s time to go in and collect the DNA and have it sequenced. I think these are valuable resources that, quite frankly, the NIDDK should

172
00:29:59,700 --> 00:30:15,633
use. Here’s a situation—this is from Ray Hershberger—of familial dilated cardiomyopathy, another of the pedigrees that had been individual-

173
00:30:15,633 --> 00:30:25,266
sequenced by the Exome Sequencing Project, and you can’t really see it but there’s some proband here but I think we allocated three or

174
00:30:25,266 --> 00:30:35,132
four. I think there’s 10 individuals who had exome sequencing. One thing I would certainly recommend—and it’s something that we do

175
00:30:35,133 --> 00:30:49,733
ourselves—is that I recommend performing 2.5 million or 5 million SNP GWAS on every single individual in a family. Not only can you then utilize that for

176
00:30:49,733 --> 00:30:59,666
imputation, but it actually helps the quality control when you also sequence individuals. From this same type of kindred you wonder, though:

177
00:30:59,666 --> 00:31:12,266
there’s no medical records available, is there some effect of family history? So, you can have problems in terms of the available data. So again,

178
00:31:12,266 --> 00:31:22,399
it points to the fact that you really need to have extensive record-keeping, being able to clearly define the phenotypes, and with this type of small

179
00:31:22,400 --> 00:31:33,800
investment of maybe 10 exomes, Ray was able to identify a gene called Bag3 which contributed to familial dilated cardiomyopathy, and it turns out

180
00:31:33,800 --> 00:31:44,533
there’s a mouse model that has been looked at for peripheral artery disease, and in that linkage region for peripheral artery disease is Bag3. So,

181
00:31:44,533 --> 00:31:55,666
there’s a large amount of supporting information to suggest that exome sequencing identified a gene that’s actually been validated in other model

182
00:31:55,666 --> 00:32:10,532
systems. I know that I’m keeping you from lunch, which I’m not certain what’s worse: keeping you from lunch or having you listen to more of my talk.

183
00:32:10,533 --> 00:32:21,466
But the idea here is that I think we really need to recognize that there is familial aggregation of kidney disease and kidney phenotypes. It’s also

184
00:32:21,466 --> 00:32:32,399
incredibly important to work with the clinical investigators who really know the phenotype to help you design the study and know what to go

185
00:32:32,400 --> 00:32:41,733
for. I think it’s time for whole genome sequencing. I mean, it’s here. It’s Pandora’s Box; it’s open. People are performing whole genome sequencing

186
00:32:41,733 --> 00:32:55,466
in Type II diabetes, Type I diabetes; it’s time to do it in kidney disease. There’s no reason why you shouldn’t; the price is coming down. It’s not free,

187
00:32:55,466 --> 00:33:08,399
but nothing is free, but if the price is getting…you know, you see $1,000, probably $2,500 a genome is probably the more appropriate cost at

188
00:33:08,400 --> 00:33:19,333
this point. But it’s not going to be long. So, I think you need to invest in this ability. Whole genome sequencing, as Kári mentioned, is a commodity. I

189
00:33:19,333 --> 00:33:32,199
would not want to have it done in my lab. I’m old enough to remember the first time you run a genome-wide linkage scan with RFLPs it was

190
00:33:32,200 --> 00:33:41,533
really exciting; the 10th time was a bore. The first time you run a genome-wide scan with SNPs it was really exciting; the 10th time it was a bore.

191
00:33:41,533 --> 00:33:50,999
You know, the first time you run these high-density SNP chips it’s really exciting. After a while, everything becomes a bore, and if it’s this

192
00:33:51,000 --> 00:34:01,733
much investment in technology, my approach is to outsource it and then hire as many great bioinformaticians and computational biologists and

193
00:34:01,733 --> 00:34:12,599
statistical geneticists as you can. I think also, this whole area is dependent upon rapid development, not just of sequencing technology

194
00:34:12,600 --> 00:34:21,266
but of computational analytic methods. You have to be able to handle the data and analyze the data. There’s also this transition. We have a

195
00:34:21,266 --> 00:34:33,732
history of thinking about variants and SNP-based analyses and really we should be focused on genes and pathways. There’s a critical demand

196
00:34:33,733 --> 00:34:47,133
on phenotyping and I think we have pedigrees, families, sibships and case-comparisons. I think the pedigree approach is probably a high-risk,

197
00:34:47,133 --> 00:34:56,733
high-impact approach because you may find something more quickly if you have access to these pedigrees. I think in the long run you’ll need

198
00:34:56,733 --> 00:35:08,199
all the tools in your tool box to really address this. So, with that, this is sort of my view on designs of whole genome approaches and I’d be happy to

199
00:35:08,200 --> 00:35:38,600
take any questions. Thank you. It’s 12:23, not that anyone’s counting. Wait, someone’s here first. I was keeping track; he beat you by three

200
00:35:38,600 --> 00:35:46,533
seconds. Why don’t you go first?
GEORGIA DUNSTON: Very good. Thank you. That

201
00:35:46,533 --> 00:35:56,033
was…I totally agree with your assessment of where we are and I also want to say, since this is the last session, I have found this whole

202
00:35:56,033 --> 00:36:11,699
session totally fascinating. With regard to study design I just want to make the comment that I think that using your first slide of the multiple genes

203
00:36:11,700 --> 00:36:26,900
going to individual traits, perhaps if you do a slide tomorrow there might be some consideration of a single gene going to the multiple traits to reflect

204
00:36:26,900 --> 00:36:36,800
what we’re finding about, especially with the common variants in terms of regulation and understanding the pleiotropic effect of single

205
00:36:36,800 --> 00:36:46,733
genes is probably going to be important here. And in that regard, though, I too…I’ve gotten here several times so let me say I’m Georgia Dunston

206
00:36:46,733 --> 00:36:49,466
and I’m at Howard University.
STEPHEN RICH: And I’ve known you for,

207
00:36:49,466 --> 00:36:51,732
like, 40 years.

208
00:36:51,733 --> 00:36:59,533
GEORGIA DUNSTON: And that’s why I say that, to be at this—and I’m a human geneticist—to be at this point in the science where the genome is

209
00:36:59,533 --> 00:37:11,533
really, in my mind, trying to give us new paradigms for looking at biology, if you will, that one of the challenges right now is that the way to

210
00:37:11,533 --> 00:37:20,633
the genomes in terms of study designs, populations, and perhaps the way to the genome was one route that we can track steps to the

211
00:37:20,633 --> 00:37:31,799
genome, but it’s almost now like we’ve landed on the genome and it’s like being in the forest and there are so many pathways back out to the

212
00:37:31,800 --> 00:37:44,733
phenotype or the clinical entity, that we’re trying to decide what’s the most efficient way. And also in defense of…all biology is genetic, ultimately. I

213
00:37:44,733 --> 00:37:54,466
mean, that’s not a discipline point-of-view, my point being that we’re making the distinction between heritable from one generation to the

214
00:37:54,466 --> 00:38:04,432
next in terms of the population or the individual, where now we’re clearly dealing with cellular inheritance or changes that occur in the genome

215
00:38:04,433 --> 00:38:14,766
that are transmitted from cell to cell. So, maybe a study design that we haven’t begun to really appreciate is the individual being both the case

216
00:38:14,766 --> 00:38:26,566
and the control, where we now have the technology to look at the normal cell and the altered cell and really now trying to see the

217
00:38:26,566 --> 00:38:39,099
genes that are expressed from that same genome that distinguishes, quote, “the case,” at the cellular level from the control, if you will, and

218
00:38:39,100 --> 00:38:47,900
then use that, also, as an approach to now back-track to the gene you might want to look at in your population.

219
00:38:47,900 --> 00:38:56,066
STEPHEN RICH: Yeah. Thanks, Georgia. Actually, Georgia makes a couple of interesting points. Obviously, now you can do single cell

220
00:38:56,066 --> 00:39:06,232
sequencing and I think that’s where a lot of this is going, to see if you can delineate not only from looking at the RNA-seq, for example, in a cell

221
00:39:06,233 --> 00:39:16,666
from CD4+ T cells and another cell from CD8+ T cells, another cell from a liver, another cell from adipose. That landscape is going to be just

222
00:39:16,666 --> 00:39:27,566
phenomenal to transit. The one thing you were saying about the different pathways is also a Tolkien theme, so if you remember from Lord of

223
00:39:27,566 --> 00:39:37,466
the Rings when Frodo and Sam are trying to get to Mordor and almost every path they took, they wound up at the same place. They couldn’t figure

224
00:39:37,466 --> 00:39:47,732
out the best pathway of getting from one spot into the gates of Mordor. Of course, Gollum is the one that helped them and I don’t think we want to

225
00:39:47,733 --> 00:39:57,299
use Gollum as the way to get us from which path to take that’s most effective to understand the genetic basis of kidney disease. But it is a case

226
00:39:57,300 --> 00:40:04,400
that there are many pathways and we don’t know exactly which is the right one, but that shouldn’t stop us; we should just move forward.

227
00:40:04,400 --> 00:40:08,200
Thank you. Yes?
GEORGE NELSON: A very nice talk. Thank you.

228
00:40:08,200 --> 00:40:20,000
George Nelson, SAIC, Frederick, but more relevantly part of Jeffrey Kopp’s collaboration here. You showed the picture of comparing

229
00:40:20,000 --> 00:40:29,333
extremes of a distribution without perhaps specifically saying that’s what you wanted to base the study on and it was shown earlier, also,

230
00:40:29,333 --> 00:40:39,033
and six or eight years ago when this was suggested in an AIDS study, I vehemently objected—I’m not so sure now—but I feel

231
00:40:39,033 --> 00:40:48,533
honor-bound to make the same point just because I made it so vehemently before, that the factors that influence being on one extreme, you know,

232
00:40:48,533 --> 00:40:55,699
it’s not necessarily the absence of those factors that puts you on the other extreme. What I argued for then was that it was essential to have some

233
00:40:55,700 --> 00:41:01,466
comparison group in the middle to compare both with.

234
00:41:01,466 --> 00:41:09,166
STEPHEN RICH: Absolutely. You know, one of the things that we did in the Exome Sequencing Project was, number one, we didn’t know if this

235
00:41:09,166 --> 00:41:17,299
would work; number two, we didn’t know what the sample size requirements were to make it work if it’s feasible; and number three, we knew

236
00:41:17,300 --> 00:41:26,100
that the cohorts that were contributing samples actually had huge numbers of people in the middle, and that’s why we also had this deeply

237
00:41:26,100 --> 00:41:36,300
phenotyped reference group. I personally have seen studies get really complicated when you have multiple control groups. You know, if you

238
00:41:36,300 --> 00:41:44,066
have a case group, one control group you get a significant result, the other control group you don’t get the same significant result, and the one

239
00:41:44,066 --> 00:41:53,499
thing that I can say is that from our exome sequencing work in our now close to 7,000 exomes, we came together with several other

240
00:41:53,500 --> 00:42:05,566
groups who were performing exome sequencing and developed this exome chip so that the main objective was to use that exome chip in all the

241
00:42:05,566 --> 00:42:14,466
cohort members that we could to try to get around this question now of this extreme, because my view, similar to yours now, is that,

242
00:42:14,466 --> 00:42:25,066
you know, you can have very high LDL caused by one set of genes, you could have very low LDL caused by another set of genes, and if you

243
00:42:25,066 --> 00:42:36,466
compare them, you know, what are you going to get? And so, I think in some ways, this is another reason I’m not really happy with an extreme

244
00:42:36,466 --> 00:42:47,299
design when we start thinking about any type of sequencing or any type of study. I like families. It seems to me that if you have families, you can

245
00:42:47,300 --> 00:42:56,566
track transmission of genotype with phenotype. That gives you much more power to know that these rare variants that you’re seeing are real

246
00:42:56,566 --> 00:43:01,466
and that they may actually have an association. Did I answer your question?

247
00:43:01,466 --> 00:43:05,532
GEORGE NELSON: Yes, very good.
STEPHEN RICH: Okay.

248
00:43:10,266 --> 00:43:18,432
MALE: So, I have a question about your last slide and I’m taking this contrary view on purpose, so don’t take it personally, but you said that the time

249
00:43:18,433 --> 00:43:26,399
is right to do it now but the other argument could be made to wait, right? Because the costs are dropping, the people like you are figuring out how

250
00:43:26,400 --> 00:43:32,833
to do this properly. So, is there a counter-argument?

251
00:43:32,833 --> 00:43:33,866
STEPHEN RICH: I guess the…
MALE: Now, I know it’s not going to be widely

252
00:43:33,866 --> 00:43:44,032
popular at this meeting but I think we do have to raise…because the question, you know, we’ve clearly put money into this, so I’m not taking that

253
00:43:44,033 --> 00:43:48,866
view, but I think it’s an important one to discuss.
STEPHEN RICH: No, actually, I think it’s important

254
00:43:48,866 --> 00:44:00,266
to discuss this. It’s one of those situations where the technology is evolving. One can say that the difference between $2,500 per genome and

255
00:44:00,266 --> 00:44:10,766
$1,000 a genome is, you know, once you get to 1,000 people, that difference becomes like real money and when an institute doesn’t have any

256
00:44:10,766 --> 00:44:20,166
money to begin with, real money means something. At the same time, I think that we know enough about how to evaluate the sequence

257
00:44:20,166 --> 00:44:33,699
data, control the sequence data, issues of quality control that we had no idea about 2-3 years ago, we now understand much more deeply. My

258
00:44:33,700 --> 00:44:46,700
personal view is that it’s being done by us, by some others…Europe…there are a number of places performing whole genome sequencing.

259
00:44:46,700 --> 00:44:54,066
Rotterdam is doing a lot of whole genome sequencing. There’s a lot of places doing whole genome sequencing. I’m more concerned about

260
00:44:54,066 --> 00:45:03,399
just the ability to find the genetic cause for some of these diseases that we can then move into a sort of translational mode so we can get more

261
00:45:03,400 --> 00:45:13,233
into the clinic and get to actually treating people. So, that’s sort of my elevator speech of why one should do it now as opposed to waiting,

262
00:45:13,233 --> 00:45:17,066
assuming that you have the money to do it.
MALE: The other more scientific question is: the

263
00:45:17,066 --> 00:45:26,899
discussion seems to be dichotomizing rare diseases…I’m sorry, rare traits versus common traits. Are there analytical strategies to combine

264
00:45:26,900 --> 00:45:32,533
this to see if they’re landing on the same pathways to make some sense of those Manhattan plots where things are not quite at

265
00:45:32,533 --> 00:45:37,933
statistical threshold, but look, quote “interesting” if you find a rare variant nearby or on a pathway?

266
00:45:37,933 --> 00:45:49,033
STEPHEN RICH: I’m going to leave Suzanne Leal to tell us all about the analytic things and why she knows how to deal with that, or if she doesn’t

267
00:45:49,033 --> 00:45:57,466
know how to deal with it, she’ll think of a reason between now and her talk, to deal with it. I guess the key thing for me is that when you perform a

268
00:45:57,466 --> 00:46:07,866
GWAS and you identify SNPs from a GWAS, those SNPs are chosen to provide information across a genome at certain distances, and

269
00:46:07,866 --> 00:46:15,332
whether that SNP actually is the critical causal variant or whether that’s just in linkage disequilibrium or something else, that’s really

270
00:46:15,333 --> 00:46:29,633
important. My belief is that these are in LD with something that’s important. And so, when you do find mapping…so, on my other hat it’s Type I

271
00:46:29,633 --> 00:46:42,466
Diabetes Genetics Consortium, also an NIDDK project, it’s…I personally believe it’s the best NIDDK project, but nonetheless, we identified

272
00:46:42,466 --> 00:46:54,566
over 40 loci for Type I diabetes risk, but remember, it’s a locus, it’s not a gene, so the average number of genes within each locus

273
00:46:54,566 --> 00:47:09,899
was 7, and so you had to then fine-map across the genome—we did that with an immunochip—and now we’ve identified in 40 loci the strongest

274
00:47:09,900 --> 00:47:19,700
candidate gene in each one of those, and a couple of loci—there’s 2 or 3 still—and in a certain case, the variants lined up between

275
00:47:19,700 --> 00:47:29,666
genes suggesting something really important in regulation. So, I think part of this missing heritability argument and some of the things that

276
00:47:29,666 --> 00:47:35,099
you’re talking about relate not to the fact that GWAS tells us something but it doesn’t account for very much, is that we haven’t found the right

277
00:47:35,100 --> 00:47:45,700
gene. And so now, we’re finding right genes, we’re getting much more of the total heritability explained by this, and so now it’s time to really

278
00:47:45,700 --> 00:47:57,566
look beyond that and I do think that the movement from genome to transcriptome and the integration of that information will tell us that there are some

279
00:47:57,566 --> 00:48:05,832
rare variants that point to one part of a pathway. There’s some common variants, perhaps, in the same gene that point to a different pathway or

280
00:48:05,833 --> 00:48:23,266
even a different part of the same pathway. This is all new stuff, so it’s sort of an exciting time to be a geneticist, basically.

281
00:48:23,266 --> 00:48:28,166
STEPHANIE MALIA FULLERTON: Hi. Thank you very much for your presentation. Malia Fullerton, Center for Genomics and Healthcare Equality at

282
00:48:28,166 --> 00:48:36,166
the University of Washington. Just first, to comment and then a question. I mean, I think it’s fascinating for those of us in the bioethics

283
00:48:36,166 --> 00:48:42,799
community who have been sort of watching the evolution of human genetics over the last 15 years or the move from pedigrees to population-

284
00:48:42,800 --> 00:48:50,400
based cohort investigations now back to pedigrees, and I think it’s a function of the changing technologies and your presentation

285
00:48:50,400 --> 00:49:01,500
was really great from that point-of-view. On the question of using previously ascertained pedigrees, I and others are really concerned that

286
00:49:01,500 --> 00:49:09,200
we do not leave out the interests of underrepresented minorities as we do work, and I’m just wondering about, in terms of these

287
00:49:09,200 --> 00:49:16,400
complex pedigrees, particularly in the context of kidney disease, are there many from ethnic minority communities, or are we going to be

288
00:49:16,400 --> 00:49:22,900
starting already at a disadvantage for those communities if we begin with previously ascertained pedigrees?

289
00:49:22,900 --> 00:49:32,500
STEPHEN RICH: Thank you. That’s a great question. Just one comment…more on the ELSI side. When we were putting together the Heart

290
00:49:32,500 --> 00:49:41,900
GO component of the Exome Sequencing Project, we were concerned about identifying rare variants in coding regions of genes that had

291
00:49:41,900 --> 00:49:50,800
known clinical significance that were actionable, for example, and what do you do with that information? We’re researchers. The sequencing

292
00:49:50,800 --> 00:50:03,100
sites are not CLIA-certified labs, but as cohort leaders we felt that we had to do something because we have a relationship with our

293
00:50:03,100 --> 00:50:15,700
cohorts. And so, we built into our particular study a medical genetics counseling component so that whenever there was a potential variant that’s

294
00:50:15,700 --> 00:50:26,100
clinically actionable and identified, we would sort of have this system where the genetic counselor would be notified, the counselor would notify the

295
00:50:26,100 --> 00:50:40,700
source of the cohort where the person was a participant, the cohort would then call in that person who had identified themselves as saying,

296
00:50:40,700 --> 00:50:48,100
“If anything comes up, I want to find out about it.” The counselor would then have a discussion with that person, there would be a new blood

297
00:50:48,100 --> 00:50:55,400
sample drawn, that sample would be taken to a CLIA-certified lab for sequencing to confirm the variant, the counselor would be brought back in

298
00:50:55,400 --> 00:51:05,200
and counsel that person, and then that person could make a decision of what to do. It was wonderful in concept; in practice, almost every

299
00:51:05,200 --> 00:51:15,800
university IRB failed to actually address that. So, I think there’s a major ELSI component to this that we haven’t discussed.

300
00:51:15,800 --> 00:51:20,400
STEPHANIE MALIA FULLERTON: That doesn’t surprise me and we are going to talk about return of results tomorrow morning.

301
00:51:20,400 --> 00:51:23,800
STEPHEN RICH: And so I’m just setting you up for this.

302
00:51:23,800 --> 00:51:26,300
STEPHANIE MALIA FULLERTON: Thank you. Yeah.

303
00:51:26,300 --> 00:51:32,900
STEPHEN RICH: But I think I would have to suggest…you know, Barry Freedman, perhaps, would know as well as anyone, if there are

304
00:51:32,900 --> 00:51:36,800
minority pedigrees for kidney disease.
BARRY FREEDMAN: So actually, since this is an

305
00:51:36,800 --> 00:51:44,200
NIDDK meeting and Robbie, this gets back to your question about: Is it time? I mean, one of the big costs is collecting these families, collecting these

306
00:51:44,200 --> 00:51:50,800
individuals, phenotyping them, and many studies in the nephrology community are outcome studies. But the FIND was actually a genetic

307
00:51:50,800 --> 00:51:59,000
study built with a family design with a severe phenotype—not population-based—and there’s 10,000 people in FIND of four ethnic groups.

308
00:51:59,000 --> 00:52:06,000
There’s European…actually, European Americans are the minority there—they’re the fewest—African-Americans, Hispanic Americans, and

309
00:52:06,000 --> 00:52:13,800
Mexican Americans. And, you know, Matthias, by the way, who is sitting next to me, has all these tissue banks and gene expression work in

310
00:52:13,800 --> 00:52:21,300
multiple ethnic groups as well. So, I think that there is, because whites are the lowest risk group for kidney disease, there are minority

311
00:52:21,300 --> 00:52:25,700
populations represented for diabetic nephropathy.
STEPHEN RICH: I think the other question, Barry,

312
00:52:25,700 --> 00:52:30,300
is: Are there pedigrees, specifically, available as opposed to just individuals?

313
00:52:30,300 --> 00:52:36,800
BARRY FREEDMAN: So yeah, the FIND Study is a family investigation of nephropathy in diabetes and it’s an affected sib-pair study or a discordant

314
00:52:36,800 --> 00:52:44,500
sib-pair study, but multiple additional siblings, including some without diabetes or nephropathy. Parents, where available, were recruited. The

315
00:52:44,500 --> 00:52:51,400
Mexican American and the Pima families are very large; they’re extended. There are cousins and there are things like that. You know, there are

316
00:52:51,400 --> 00:53:00,500
studies like Irish families that have albuminuria, GFR, those kind of things, but the FIND was to find severe diabetic end-stage renal disease.

317
00:53:00,500 --> 00:53:02,100
STEPHEN RICH: Thank you, Barry.
STEPHANIE MALIA FULLERTON: Thank you.

318
00:53:02,100 --> 00:53:05,600
STEPHEN RICH: Yes?
GREG LENNON: Hello, I’m Greg Lennon from

319
00:53:05,600 --> 00:53:15,100
SNPedia and I thank you for mentioning both phase studies and SNP chips at the same time, and I’m curious if you’ve had a chance yet to look

320
00:53:15,100 --> 00:53:21,500
at the importance and have reviewed or developed a view on the importance of using phased data versus unphased.

321
00:53:21,500 --> 00:53:32,700
STEPHEN RICH: I mean, we’ve typically…I mean, to us, phased data is critically important. I’ll just say “yes.” The other point is…

322
00:53:32,700 --> 00:53:40,600
GREG LENNON: I’m curious about the evidence and what’s backing up that? I think we all believe that, but what have you seen that backs that up?

323
00:53:40,600 --> 00:53:49,100
STEPHEN RICH: You know, again, in our families that we’ve been working on, you can basically sort through sections of the genome that really

324
00:53:49,100 --> 00:54:00,800
can be transmitted from one individual to another, and having that phased information gives you much more leverage of deciding what is really

325
00:54:00,800 --> 00:54:09,000
contributing to the phenotype that you’re working with. The other thing I just wanted to mention is that there is a study called the Collaborative

326
00:54:09,000 --> 00:54:23,500
Cross. I don’t know if any of you know about this. They’ve taken eight inbred mouse lines, and because it’s actually hard to identify genes in all

327
00:54:23,500 --> 00:54:31,200
these inbred mouse lines because there are stretches of homozygosity, they are now breaking these up on purpose. So, they’re

328
00:54:31,200 --> 00:54:41,900
basically making mouse strains look like people, and so again, but using the phase information that you get from the breeding, you can actually track

329
00:54:41,900 --> 00:54:52,000
things much better and identify specific genes related to phenotypes. So in a mouse world, they’re breaking apart all this homozygosity and

330
00:54:52,000 --> 00:55:02,200
getting into new crosses. The advantage that people working in mice have is they can actually make, say, this mouse and this mouse get mated.

331
00:55:02,200 --> 00:55:13,333
It’s hard to do that with people, so that’s why the imprint comes in, and so, having the ability to do the phasing in that sense is very helpful.

332
00:55:19,000 --> 00:55:20,700
Thank you.




Date Last Updated: 9/18/2012
