Whole Genome Approaches to Complex Kidney Disease
February 11-12, 2012 Conference Videos

U.S. Studies and Repositories: Opportunity and Challenge
Linda Kao, The Johns Hopkins University

Video Transcript

1
00:00:00,200 --> 00:00:06,233
JEFFREY KOPP: Our next speaker is Linda Kao, who is Associate Professor at Johns Hopkins School of Public Health where she works on the

2
00:00:06,233 --> 00:00:14,066
genetics of diabetes and uses various approaches, including admixture mapping. We’ve asked her to give the first of two talks about the

3
00:00:14,066 --> 00:00:20,499
role of cohorts and what’s available and what we should be looking for. We heard a comment from the last speaker about the importance of

4
00:00:20,500 --> 00:00:28,066
trios; that will be interesting to hear. Linda has worked with numerous cohorts, including ARIC and FIND, and many of these very large

5
00:00:28,066 --> 00:00:36,099
GWAS consortia, so she’s obviously a very collaborative person, and as I say, she’ll be talking about the kinds of U.S. cohorts available and

6
00:00:36,100 --> 00:00:41,200
what we should be looking for.

LINDA KAO: Thanks for the opportunity to speak

7
00:00:41,200 --> 00:00:52,933
today. It’s a terrible spot to have to go after the keynote speaker, though; this is not nearly as exciting. So, the goal of my presentation today is

8
00:00:52,933 --> 00:01:02,366
really to share with you our experience transitioning from GWAS consortium into now sequence studies to follow up our GWAS

9
00:01:02,366 --> 00:01:13,132
signals, and then give a few examples of large sequencing projects that I am personally familiar with because I’m working with them, as there are

10
00:01:13,133 --> 00:01:22,999
many sequencing projects ongoing, but to end with some common challenges that I think are present in many of these sequence projects if

11
00:01:23,000 --> 00:01:33,633
we want to use them for our own purpose; so, I think setting up the challenges for the speakers for the rest of the day. So, I’ll first start with just a

12
00:01:33,633 --> 00:01:42,533
few slides on our experience and our lessons learned from our GWAS. So, Caroline Fox and I co-lead a CKDGen consortium where we have

13
00:01:42,533 --> 00:01:54,899
primarily used large population-based studies to identify common variants for renal traits. At present we have about 45 studies—we might be

14
00:01:54,900 --> 00:02:06,500
closer to 50 now—with over 130,000 samples with GWAS data that’s been imputed to HapMap. Our outcomes of interest have been the, quote,

15
00:02:06,500 --> 00:02:16,366
“dirtier and messier outcomes,” but also easier to obtain in large population-based studies. We use GFR estimated from creatinine, GFR estimated

16
00:02:16,366 --> 00:02:28,132
from cystatin C, urinary albumin-to-creatinine ratio, and CKD, a dichotomized outcome defined by eGFR. So, why focus on early CKD? They

17
00:02:28,133 --> 00:02:38,433
definitely represent a large at-risk group in whom prevention is feasible. We can possibly study disease initiation. There’s less confounding by

18
00:02:38,433 --> 00:02:53,566
metabolic and vascular derangement that’s present in Stages 4 and 5 of CKD, and 3 of our 4 traits are quantitative, so greater statistical

19
00:02:53,566 --> 00:03:02,866
power. Very importantly, the samples are readily available in large numbers, which ended up being very important for GWA studies in an outbred

20
00:03:02,866 --> 00:03:11,999
population, unlike in Iceland. So, what we learned also, in terms of exposure from GWAS, is that we were looking at common SNPs, every study

21
00:03:12,000 --> 00:03:21,700
would genotype their own chip—a commercial chip—but their arrays are definitely not comprehensive. So then, everybody imputed

22
00:03:21,700 --> 00:03:30,166
to this common set of HapMap SNPs so that we would have 2.5 million SNPs per participant to increase information and decrease

23
00:03:30,166 --> 00:03:40,799
misclassification of exposure. The general paradigm is then every study would conduct their GWA study, they would upload to some central

24
00:03:40,800 --> 00:03:50,200
shared space. The CKDGen analysis group would then meta-analyze those results. We would then take our top findings and replicate

25
00:03:50,200 --> 00:04:05,200
them in other studies and essentially keep going in circles, and it’s been fruitful. It’s helped us identify…today we have about 26 eGFR loci that

26
00:04:05,200 --> 00:04:14,233
have been discovered and replicated and what we learned is collaboration is important. Large sample sizes are needed to achieve the right

27
00:04:14,233 --> 00:04:23,866
balance of type I and type II errors. We also learned that we can increase our statistical power by imputation of SNPs to reduce the

28
00:04:23,866 --> 00:04:32,399
measurement error of the exposure. We also learned that you can increase power by increasing sample size, even though increasing

29
00:04:32,400 --> 00:04:40,333
sample size at the same time also introduces phenotype measurement error and heterogeneity, and so this is a concept that I’m going to come

30
00:04:40,333 --> 00:04:52,666
back to at the end. So, larger sample sizes lead to discovery of more loci as we embark on our next meta-analysis, but larger sample sizes will be

31
00:04:52,666 --> 00:05:04,099
finite, and as Dr. Stefánsson pointed out, there’s got to be a more efficient and cheaper way of studying than to have to sequence 200,000

32
00:05:04,100 --> 00:05:16,400
people. So the question for me has been: does this GWAS paradigm still hold for studies of sequence data? So, our experience going from

33
00:05:16,400 --> 00:05:27,833
CKDGen GWAS results now to sequencing have primarily come from our experiences with another consortium called the CHARGE-Sequencing

34
00:05:27,833 --> 00:05:38,733
Consortium, CHARGE standing for Cohorts for Heart and Aging Research in Genomic Epidemiology. The CHARGE-S cohorts consist of the Framingham

35
00:05:38,733 --> 00:05:48,999
Heart Study; the Cardiovascular Health Study; AGES from Iceland; Rotterdam; and then ARIC, the Atherosclerosis Risk in Communities Study. So, all

36
00:05:49,000 --> 00:06:00,100
five studies are population-based with common high-priority disease phenotypes. The objective of CHARGE-S, the sequencing component of this

37
00:06:00,100 --> 00:06:09,266
consortium, is to leverage existing population, laboratory, and computational resources to identify susceptibility genes underlying selected

38
00:06:09,266 --> 00:06:19,732
well-replicated GWAS findings for heart, lung, and blood diseases. And so, the two aims for the study, for the grant, were to follow up GWA signals

39
00:06:19,733 --> 00:06:27,433
to localize the functional variant, and that’s being done through the targeted sequencing arm of CHARGE-S, and then to evaluate whether rare

40
00:06:27,433 --> 00:06:37,166
variants may also contribute to the trait, which is done through the exome sequencing part of CHARGE-S. The sampling design of CHARGE-S is

41
00:06:37,166 --> 00:06:46,599
a case cohort design and it’s a multiple case cohort design. So, the total number sampled…sorry…the total number sequenced for targeted

42
00:06:46,600 --> 00:06:56,066
sequencing will be about 4,800 European Americans from the cohorts in this consortium. Out of these 4,800 there’s going to be about

43
00:06:56,066 --> 00:07:06,832
2,000 cohort random samples, that is, individuals selected irrespective of their outcomes, along with 14 different case groups. Each

44
00:07:06,833 --> 00:07:19,866
case group could be…for example, one case group is hyperglycemia but non-diabetic, so each case group has 200 cases that they can select.
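
As a rough sketch of the multiple case-cohort idea described above (a random subcohort drawn irrespective of outcome, plus extreme-phenotype case groups), the toy Python below draws both from a made-up participant table. The column names, sizes, and thresholds are illustrative only, not the actual CHARGE-S selection rules.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    cohort = pd.DataFrame({
        "id": np.arange(10_000),
        "glucose": rng.normal(100, 15, 10_000),   # stand-in quantitative trait
        "diabetic": rng.random(10_000) < 0.10,    # stand-in diagnosis flag
    })

    # Random subcohort: selected irrespective of any outcome (the shared comparison group).
    subcohort = cohort.sample(n=2_000, random_state=1)

    # One case group: an extreme tail of a quantitative trait restricted by a clinical
    # criterion, mirroring the "hyperglycemia but non-diabetic" example (200 cases).
    hyperglycemic_cases = cohort.loc[~cohort["diabetic"]].nlargest(200, "glucose")

    print(len(subcohort), len(hyperglycemic_cases))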

45
00:07:19,866 --> 00:07:34,799
The targeted sequencing case groups are represented here and then what’s shown in red are the targets—the region that they’re targeting

46
00:07:34,800 --> 00:07:47,133
to sequence. And so you can see, most of these phenotypes or many of these phenotypes are related to chronic kidney disease. In terms of the

47
00:07:47,133 --> 00:07:58,233
sampling design, most working groups have chosen to do an extreme tail of a quantitative trait—extremely high insulin, extremely high glucose,

48
00:07:58,233 --> 00:08:06,766
extremely high blood pressure—and then in terms of gene targets, many have chosen boundaries of their original GWA signal. Some

49
00:08:06,766 --> 00:08:15,399
have chosen to fine map whereas others have chosen to focus on specific, known gene regions, and then many groups have used

50
00:08:15,400 --> 00:08:25,166
evolutionary conservation to help them decide which regions they want to prioritize for sequencing. So, in parallel in CHARGE-S there is

51
00:08:25,166 --> 00:08:36,199
also an exome sequencing component, and Caroline Fox and I obtained funding from NIDDK to do a larger selection of exome sequencing within

52
00:08:36,200 --> 00:08:47,333
the CHARGE-S setting. So, in total there will be about 2,400 samples with exome data. There will be 1,000 random cohort controls, so the study

53
00:08:47,333 --> 00:08:57,566
design, again, is the same as the targeted sequencing—one common control group that’s unselected for any phenotype with multiple case

54
00:08:57,566 --> 00:09:10,032
groups. So in this case, we were able to choose 400 CKD cases, and the idea is to have these samples go through SOLiD sequencing and validation;

55
00:09:10,033 --> 00:09:21,399
we’ll perform association analysis, and then ultimately validate back to the entire cohorts of the cohort studies in CHARGE-S. Our CKD case

56
00:09:21,400 --> 00:09:30,333
selection—and this is where we had to think, given that we had a limited number of people that we could sequence—the cases came from ARIC,

57
00:09:30,333 --> 00:09:39,366
Cardiovascular Health Study, and Framingham Heart Study. All three of us took the same approach in that we ranked eGFR estimated from

58
00:09:39,366 --> 00:09:47,466
creatinine from lowest to the highest and picked from the lowest to the highest, but we also used a slightly different approach in refining our

59
00:09:47,466 --> 00:09:58,032
phenotype. For example, in the ARIC study, we have eGFR from 3 visits over 12 years, so we decided to use our longitudinal data to help inform

60
00:09:58,033 --> 00:10:09,766
us of the best cases to sequence. So, we imposed an extra criterion: the individual can’t revert back to an eGFR greater than 60 at the

61
00:10:09,766 --> 00:10:23,132
next study visit. I know in Framingham they had incorporated medical adjudication record data into their selection of cases. So, I think the message

62
00:10:23,133 --> 00:10:32,399
here is that we really…given the limited resource, we really tried to use a different approach than what we did with the GWAS. With the GWAS we

63
00:10:32,400 --> 00:10:41,000
took the lowest common denominator phenotype and had as many samples as we could; with something like sequence data, we’re really

64
00:10:41,000 --> 00:10:53,600
trying to define a phenotype that’s more specific, and then they had to have prior GWAS data and pass GWAS QC. The sequencing is being

65
00:10:53,600 --> 00:11:05,133
done at Baylor College of Medicine, the platform is SOLiD, data collection will be completed in the spring of 2012, and then data will be posted to dbGaP, and

66
00:11:05,133 --> 00:11:13,333
for those of you who are not familiar with dbGaP, I think this is an important resource for people to know about. dbGaP is the database of genotype

67
00:11:13,333 --> 00:11:22,133
and phenotype. It archives and distributes results of studies, including GWAS and sequence studies that have investigated the interaction of

68
00:11:22,133 --> 00:11:32,999
genotype and phenotype; it’s housed at NCBI, and you need to submit an application to gain access to these controlled-access data. The other project that we’re

69
00:11:33,000 --> 00:11:44,300
familiar with is another large-scale project, the NHLBI Exome Sequencing Project, or the NHLBI GO ESP, and I’ll only speak very briefly about this

70
00:11:44,300 --> 00:11:55,800
because one of the PIs of this study, Dr. Rich, is going to talk a lot more about the study design. The goal of this study was really to examine the

71
00:11:55,800 --> 00:12:13,166
exome and examine how variants in the exome would contribute to diseases of interest for heart, lung, blood phenotypes. The ESP has three

72
00:12:13,166 --> 00:12:20,832
cohort groups, there are two sequencing centers, and there are other components, including CHARGE-S that’s working with ESP. My

73
00:12:20,833 --> 00:12:30,733
primary purpose for showing ESP here, one, is to contrast the difference in study design. If you remember from CHARGE-S, the investigators

74
00:12:30,733 --> 00:12:44,366
decided on a case cohort design with this common control group, whereas in ESP some of the phenotype working groups have chosen to

75
00:12:44,366 --> 00:12:53,132
study the two tails, as you had heard about earlier. So in this case, it would be early-onset MI as cases and then controls would be those with

76
00:12:53,133 --> 00:13:05,766
high Framingham risk score with no MI. There are other phenotypes, other case selections, included in ESP, including lung, Type II diabetes,

77
00:13:05,766 --> 00:13:22,099
blood pressure, ischemic stroke, and then they will also have this equally phenotyped reference group of about 1,000. Data from ESP is available

78
00:13:22,100 --> 00:13:32,000
on the ESP website, and the current release, ESP5400 (I actually had to do a bit of web research myself here), is taken from

79
00:13:32,000 --> 00:13:44,966
about 5,400 samples from ESP cohorts and the future release will include up to 7,000 exomes from the entire project. I think a neat resource that

80
00:13:44,966 --> 00:13:56,332
the ESP is providing is this Exome Variant Server where, if you go to their website, you can just enter the gene of interest for you and then it will

81
00:13:56,333 --> 00:14:09,099
output all the exonic variants that have been identified in the ESP samples. So, those are just two large sequencing efforts that we’re working

82
00:14:09,100 --> 00:14:21,166
with in CKDGen, but there are many additional large sequencing efforts. And so, what I did was I went to NIH RePORTER and I just typed in key

83
00:14:21,166 --> 00:14:32,999
word “exome,” and for example, Mike Boehnke has a study where, in 3,000 Type II diabetes cases and control samples, they will be performing next

84
00:14:33,000 --> 00:14:43,100
generation, high-density SNP arrays—so about 5 million SNPs—but also low-pass whole genome sequencing. And then there is a different grant,

85
00:14:43,100 --> 00:14:54,466
David Altshuler’s grant, where it’s using multi-ethnic cohorts, each one in the thousands of people, where they’ll be performing deep

86
00:14:54,466 --> 00:15:03,799
resequencing of the exome to identify rare causal variants. And then, obviously, there are many exome projects that are looking at Mendelian

87
00:15:03,800 --> 00:15:14,966
forms of kidney diseases. And then lastly, one other resource is definitely the 1000 Genomes Project, where we have whole genome data on

88
00:15:14,966 --> 00:15:23,866
individuals from multiple populations. The samples were not selected for a particular phenotype, so that would also mean that rare variants that are

89
00:15:23,866 --> 00:15:34,966
enriched in disease populations are likely not to be found in this data set, but the data set can be used as a reference panel to impute existing data

90
00:15:34,966 --> 00:15:44,632
sets with GWAS and maybe this imputed data set could serve as a potential control. So with all the different projects that are ongoing, I think the

91
00:15:44,633 --> 00:15:54,499
end users—users who want to use these projects for an alternative purpose—face the same challenges. I think the first one and the most

92
00:15:54,500 --> 00:16:03,566
important one is that different studies have different sampling schemes. So for example, studies in CHARGE-S where the different case

93
00:16:03,566 --> 00:16:13,466
groups were essentially phenotypes that are related to chronic kidney disease, how will that influence…that will likely introduce a bias into the

94
00:16:13,466 --> 00:16:22,999
spectrum of alleles that are identified? Then there’s this idea that many people are sampling on extremes. Again, sampling on extremes is not a huge

95
00:16:23,000 --> 00:16:36,266
problem. Statistically, the association estimate that you obtain from these studies will be biased because you’re sampling from the extremely high

96
00:16:36,266 --> 00:16:45,399
and the extremely low, but I think that’s a problem that could be resolved statistically or it could be resolved eventually by genotyping in large

97
00:16:45,400 --> 00:16:55,200
population cohorts. Collaboration with original investigators, I think, would be very important. Really getting back to this point of how did they

98
00:16:55,200 --> 00:17:04,600
initially sample? How did they initially come up with the sampling scheme, which really will have a great deal of influence on interpretation of your

99
00:17:04,600 --> 00:17:13,800
findings? And then, different platforms are used by different studies and this is a problem that we saw with GWA studies, too, where if the cases

100
00:17:13,800 --> 00:17:24,066
and controls come from two different studies, then confounding can result, and this is in addition to matching cases and controls on genetic

101
00:17:24,066 --> 00:17:35,066
background. And then lastly, statistical power is going to be limited in both the association studies in the sequence set and the follow-up, because

102
00:17:35,066 --> 00:17:44,799
even if you were able to follow up in 20,000 people, rare variants are still not going to be very common in 20,000 people. So, you can increase

103
00:17:44,800 --> 00:17:54,666
sample size to combat this problem, but again, that might be limited. You can use sequence data to impute your own data set, and that would

104
00:17:54,666 --> 00:18:05,132
minimize the measurement error of genotypes that you have, but imputation accuracy for rare variants still needs to be improved in outbred

105
00:18:05,133 --> 00:18:15,133
populations. The last thing that you could think about is minimizing misclassification of outcomes and exposures by considering multiple outcome

106
00:18:15,133 --> 00:18:25,799
measures, and that’s something that I think is well known to epidemiologists but maybe not so commonly thought about by others. So, we know

107
00:18:25,800 --> 00:18:34,133
measurement error exists for genes, for environmental factors, and for outcome measures, so regardless of what the data

108
00:18:34,133 --> 00:18:41,466
collection method is, there’s always going to be a true value for each one of those and then there’s going to be some discrepancy between what you

109
00:18:41,466 --> 00:18:50,332
actually collect and what the true value is. And so, those measures are…that difference between the true value and what you actually

110
00:18:50,333 --> 00:18:59,733
measure is expressed as reliability and validity. Do you have an outcome that can be reproducibly measured? Do you have an outcome that comes

111
00:18:59,733 --> 00:19:10,666
close to the actual true value, so it is valid? And so I think this problem—the idea that if you have a measurement error in your outcome or in your

112
00:19:10,666 --> 00:19:20,399
exposure—will lead to loss in power, or the other way to think about it is, if you have measurement error it’s going to lead to a greater sample size

113
00:19:20,400 --> 00:19:29,400
requirement to achieve the same power. Again, that’s also not very new. What’s shown on this graph here on the X axis is the number of cases

114
00:19:29,400 --> 00:19:40,466
required, on the Y axis is power. The blue line here would be a study where you have no measurement error in either the exposure or the

115
00:19:40,466 --> 00:19:53,866
outcome. The yellow and the green lines are where, one, you have introduced measurement error in the genetic assessment, and the other

116
00:19:53,866 --> 00:20:00,732
one is where you have introduced measurement error in the outcome assessment, and then the red line is where you’ve introduced measurement

117
00:20:00,733 --> 00:20:10,733
error in both the exposure and the outcome. And so, it’s obvious that, one, with increasing sample size you always get increased power, so that’s

118
00:20:10,733 --> 00:20:19,933
good confirmation, but at any given sample size every time you introduce some sort of measurement error you essentially end up with

119
00:20:19,933 --> 00:20:28,533
less power; there’s a decrease of statistical power. And so, one of the things that we’ve been thinking a great deal about is, with eGFR we

120
00:20:28,533 --> 00:20:39,599
know there is definitely measurement error in eGFR. First of all, it is estimated based on a biomarker, so there is going to be some

121
00:20:39,600 --> 00:20:48,533
imprecision; there’s going to be some validity issues, too, depending on the population that you’re studying. And so, we’ve been thinking a lot

122
00:20:48,533 --> 00:20:58,533
about how do we make that outcome better? So, this is work done by my doctoral student, Adrienne Tin. On the X axis here it shows the

123
00:20:58,533 --> 00:21:06,866
number of outcome measures, going from just having 1 measure of an outcome to having 10 measures of an outcome, and on the Y axis is

124
00:21:06,866 --> 00:21:21,399
equivalent sample size. So, if we look at this black line here, the reference sample size would be where we have 1 eGFR creatinine, so the

125
00:21:21,400 --> 00:21:32,200
sample size requirement for that study to achieve 80% power is fixed at 1 here, and the next dot here at 2 indicates that if we just add 1 more

126
00:21:32,200 --> 00:21:44,266
measure of eGFR creatinine into this scenario, then we’ve essentially added…we have introduced an increase of 20% in our sample

127
00:21:44,266 --> 00:21:55,266
size. So, the equivalent sample size here is roughly 1.2. If we can measure eGFR creatinine 3 times it gets us up to 1.3. So, it’s like as if you

128
00:21:55,266 --> 00:22:08,999
have recruited an additional 30% of participants. And so, this increase plateaus after a certain point but what’s reassuring is that it is definitely

129
00:22:09,000 --> 00:22:17,266
going to be cheaper to measure another creatinine than to recruit another 3,000 people. And so, she did it for different scenarios where,

130
00:22:17,266 --> 00:22:27,066
instead of looking at multiple measures of the same outcome—so in this case it would be 10 measures of eGFR creatinine—she also varied

131
00:22:27,066 --> 00:22:39,799
the outcomes so that it could be 3 measures of creatinine and maybe the addition of 2 other biomarkers to estimate GFR. And so, that is

132
00:22:39,800 --> 00:22:50,066
where this concept of residual correlation between the outcomes is introduced. And so, what you can see here is…taking this purple line as an example

133
00:22:50,066 --> 00:23:00,599
where you can see a very steep gain in equivalent sample size, and that’s because the residual correlation of the outcomes is low—it’s

134
00:23:00,600 --> 00:23:10,200
only at .5—and so that’s a situation where one marker might be eGFR creatinine, another marker might be eGFR cystatin C, another marker might

135
00:23:10,200 --> 00:23:19,933
be BTP. So, these are different markers that are correlated with each other but not perfectly correlated with each other, and so there is just

136
00:23:19,933 --> 00:23:30,399
enough correlation of about .5 to allow them to pick up different components of kidney function. And so, I think this is a good way to think about it,

137
00:23:30,400 --> 00:23:37,333
especially with studies that have already been done and there’s already sequence data. It’s a good way to think about how you can use

138
00:23:37,333 --> 00:23:49,733
multiple phenotypes to really come up with a more defined outcome for your analysis. So, just to summarize, there are many opportunities that

139
00:23:49,733 --> 00:23:58,733
exist for leveraging existing sequence data sets. There are many sequence data sets out there to advance the field of complex kidney disease

140
00:23:58,733 --> 00:24:10,466
genetics. I think caution has to exist with respect to many things: with respect to how the original study was sampled; the sequencing platform; the

141
00:24:10,466 --> 00:24:21,799
limited sample size; and then I think it would benefit investigators a great deal to think really about how to incorporate additional outcome

142
00:24:21,800 --> 00:24:35,266
measures to get a more precise kidney phenotype, given that there is going to be limited power in terms of sample size. And then, I just

143
00:24:35,266 --> 00:24:55,732
want to acknowledge Caroline Fox, my co-leader in the work in these consortia, and Adrienne Tin, my doctoral student. Okay? Thanks.
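
A minimal sketch of the equivalent-sample-size idea from the talk, assuming a simple classical measurement-error model (each outcome measure equals the true value plus independent error), which may not be the exact model used in the analysis described: averaging k measures whose pairwise correlation is rho multiplies the effective sample size by k / (1 + (k - 1) * rho).

    def equivalent_sample_size_multiplier(k: int, rho: float) -> float:
        """Gain from averaging k outcome measures whose pairwise correlation is rho."""
        return k / (1 + (k - 1) * rho)

    # With rho around 0.67, a second measure gives roughly a 20% gain and a third
    # roughly 30%, consistent with the figures quoted above; with rho = 0.5 (less
    # correlated biomarkers such as creatinine, cystatin C, and BTP) the gain is steeper.
    for k in (1, 2, 3, 10):
        print(k,
              round(equivalent_sample_size_multiplier(k, rho=0.67), 2),
              round(equivalent_sample_size_multiplier(k, rho=0.5), 2))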

144
00:24:55,733 --> 00:25:08,733
ANDRE SHAW: Yeah, Andre Shaw from Washington University. I just kind of wanted to go back to this issue about expanding GWA studies

145
00:25:08,733 --> 00:25:15,766
to using sequencing in the search for the functional variants. The assumption here, it seems to me, is that the

146
00:25:15,766 --> 00:25:25,166
GWAS hit will be associated with a functional coding sequence variant when it seems like another philosophy would be that the GWAS hit is

147
00:25:25,166 --> 00:25:29,366
really a regulatory element.

LINDA KAO: Yeah.

148
00:25:29,366 --> 00:25:34,432
ANDRE SHAW: Either something known or unknown, so that that is not going to actually get us anything.

149
00:25:34,433 --> 00:25:43,099
LINDA KAO: Definitely, and I think that’s why, in CHARGE-S, there ended up being two parallel arms. With the targeted sequencing, I think very

150
00:25:43,100 --> 00:25:51,233
much the focus there is, if this is the GWA signal, I’m only going to sequence the region that surrounds that signal, irrespective of where the

151
00:25:51,233 --> 00:25:59,366
gene is, even though some investigators still placed preference in regions that they wanted to sequence. And then, there’s this whole other

152
00:25:59,366 --> 00:26:08,699
separate hypothesis that the exome…I agree. I think whatever is found in the exome, the prior expectation can’t be that they can account for the

153
00:26:08,700 --> 00:26:14,366
GWA signal that was never in an exon to begin with.
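
For reference, the targeted-sequencing arm’s “region that surrounds the signal” idea amounts to pulling all variants within a fixed window of a GWAS index SNP. The toy sketch below does this on a made-up variant table; the positions, IDs, and the 250 kb window are chosen purely for illustration and are not actual CKDGen or CHARGE-S targets.

    import pandas as pd

    variants = pd.DataFrame({
        "chrom": ["2", "2", "2", "2", "7"],
        "pos":   [26_500_000, 27_700_000, 27_741_000, 27_960_000, 15_000_000],
        "id":    ["varA", "varB", "varC", "varD", "varE"],   # placeholder IDs
    })

    index_chrom, index_pos = "2", 27_741_000   # hypothetical GWAS index SNP
    window = 250_000                           # arbitrary +/- 250 kb target region

    in_target_region = variants[
        (variants["chrom"] == index_chrom)
        & variants["pos"].between(index_pos - window, index_pos + window)
    ]
    print(in_target_region)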

154
00:26:14,366 --> 00:26:17,966
ANDRE SHAW: And what are people finding doing this?

155
00:26:17,966 --> 00:26:26,966
LINDA KAO: Well, Dr. Rich may have more examples, but personally, we haven’t been so lucky as Iceland. We haven’t found…following

156
00:26:26,966 --> 00:26:39,899
our top signals, we’re not seeing signals in exons, definitely, that account for our GWA signal.

157
00:26:39,900 --> 00:26:50,300
FEMALE: I want to thank you for this talk, too, and it’s an excellent follow up to Dr. Stefánsson’s talk. The question that I have…maybe if Dr.

158
00:26:50,300 --> 00:27:00,900
Stefánsson is still here. I wonder, how generalizable are the findings from Iceland to European populations? Has there been any kind

159
00:27:00,900 --> 00:27:08,366
of assessment? I don’t know if he’s still here.

LINDA KAO: He’s there. I think earlier he had said

160
00:27:08,366 --> 00:27:21,699
that most of the findings from deCODE genetics have been replicated in other populations of European ancestry.

161
00:27:21,700 --> 00:27:29,700
FEMALE: Okay, because the question I’m just wondering, even in the strategies that you’ve presented…there may be publications, I’m just not

162
00:27:29,700 --> 00:27:39,700
familiar with them. But because of the numbers that you have of European samples, to what extent, if we follow this idea of the power of the

163
00:27:39,700 --> 00:27:51,633
greater the homogeneity of the population for mining the genetic information to then test in the larger group, to what extent have you looked at

164
00:27:51,633 --> 00:28:06,433
increasing the magnification, shall I say power, of your population by using ancestry markers of European populations to enrich for a more

165
00:28:06,433 --> 00:28:12,899
homogenous group to do the initial search?

LINDA KAO: I think that’s a great point. At least

166
00:28:12,900 --> 00:28:23,633
with CHARGE-S, that was not part of the consideration in sampling individuals; the sampling was done on phenotypes. But as we have been

167
00:28:23,633 --> 00:28:32,033
thinking about this more, definitely with our CKD Working Group as we do our association analysis, I think even within the European

168
00:28:32,033 --> 00:28:43,833
Americans we will work off this idea of creating the…making the chromosomes more homogenous when we actually do the disease versus

169
00:28:43,833 --> 00:28:56,133
non-disease comparison. So, it would be ideal if they’re homogenous with respect to everywhere else but the actual locus for the disease.
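
One common way to approximate that kind of background homogeneity in an association analysis (an assumption on my part; the CKD working group may use a different method) is to compute ancestry principal components from the genotype matrix and adjust for them, alongside the test SNP, in the regression. A toy sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    n_people, n_snps = 500, 2_000
    genotypes = rng.integers(0, 3, size=(n_people, n_snps)).astype(float)  # toy 0/1/2 allele counts

    # Standardize each SNP, then take the top principal components of the genotype matrix;
    # including these PCs as covariates keeps the disease versus non-disease comparison
    # roughly homogeneous everywhere except the locus being tested.
    G = (genotypes - genotypes.mean(axis=0)) / (genotypes.std(axis=0) + 1e-8)
    U, S, _ = np.linalg.svd(G, full_matrices=False)
    ancestry_pcs = U[:, :10] * S[:10]   # one row of 10 PC scores per participant
    print(ancestry_pcs.shape)           # (500, 10)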

170
00:28:56,133 --> 00:29:02,799
MALE: In the spirit of the conference, Linda, we’re talking about complex diseases and approaches. I wonder if you can comment on the

171
00:29:02,800 --> 00:29:12,600
use of eGFR and simple things like proteinuria, which are associated with all the scads of renal diseases that we looked at. We had a lot of

172
00:29:12,600 --> 00:29:22,533
trouble establishing a genetic association with phenotype in diabetic nephropathy, which is a disease of low GFR and high proteinuria. How

173
00:29:22,533 --> 00:29:31,933
much are we losing by making a gemisch of diseases that are all tracking a common endpoint or a common progressive path, and how much

174
00:29:31,933 --> 00:29:38,533
are we gaining from that approach?

LINDA KAO: So, I think we’re gaining because

175
00:29:38,533 --> 00:29:48,666
we definitely are finding proteins that are shedding light on different pathophysiology of kidney disease. We’re definitely losing, because

176
00:29:48,666 --> 00:29:59,199
you’re right, it is a dirty phenotype, and I think that’s why it’s, at least for us as we think about our next round of analysis, we’re sort of coming

177
00:29:59,200 --> 00:30:08,533
back in a circle. We started off with a really dirty phenotype in hundreds of thousands of people and that really just pointed us to regions that are

178
00:30:08,533 --> 00:30:21,133
common, probably across many different subtypes of kidney disease phenotype, and as we now get to a much more refined look of the

179
00:30:21,133 --> 00:30:30,133
genome with sequence data, I think we’re going to have to now focus on getting our phenotype data much more specific.

180
00:30:30,133 --> 00:30:38,399
MALE: Do you have the ability in ARIC and [---] and all these things to delineate what the renal phenotype is rather than just a GFR phenotype?

181
00:30:38,400 --> 00:30:42,933
LINDA KAO: Yes.

MALE: Yeah, I think we know that it’s associated

182
00:30:42,933 --> 00:30:46,666
with progressive disease, but I don’t know how we translate that knowledge. So, do we have an opportunity to look closely at these parts?

183
00:30:46,666 --> 00:30:59,999
LINDA KAO: Yeah. In ARIC we’re just about to start an ancillary study validating our CKD cases, so we hope that can improve our outcome a bit,

184
00:31:00,000 --> 00:31:02,933
although we’re not going to know for another year.

185
00:31:02,933 --> 00:31:12,066
MALE: This is building on something that Dr. Stefánsson talked about where he showed this beautiful example of Alzheimer’s disease where

186
00:31:12,066 --> 00:31:19,799
there was this stable cohort and a progressive cohort. I’m simplifying it a bit, but what if we start with a clinical question that we get asked all the

187
00:31:19,800 --> 00:31:31,233
time—what is the meaning of a GFR of 59 in an elderly person?—and look in some of the cohorts that have elderly patients and look at progression.

188
00:31:31,233 --> 00:31:43,033
Who’s progressing and who’s staying stable? I mean, it’s a clinically relevant question that the renal community will acknowledge as a really

189
00:31:43,033 --> 00:31:44,766
open question and see if there is a genetic component to that.

190
00:31:44,766 --> 00:31:52,299
LINDA KAO: I completely agree with you. So, we have a GWAS now of progressive kidney phenotypes. The sample size is a little smaller

191
00:31:52,300 --> 00:32:01,500
because not all the CKDGen cohorts are longitudinal, so that, hopefully…we’re in the replication stage now. There are a few hits that

192
00:32:01,500 --> 00:32:09,033
we’re not sure of but they need to be replicated. So hopefully, in the next six months, we’ll know something.

193
00:32:09,033 --> 00:32:14,833
MALE: Well, I would say it’s not so much a GWAS, but maybe that’s an appropriate rare variant search question.

194
00:32:14,833 --> 00:32:24,299
LINDA KAO: Mmm. So, that’s the other R01 that’s under review now. It has the same…it is exactly that idea.

195
00:32:24,300 --> 00:32:33,366
MALE: Linda, to bring you back to the same concept here, we’ve been recruiting from dialysis clinics for 20 years and we get all the Type II, the

196
00:32:33,366 --> 00:32:38,366
diabetic nephropathies in whites and blacks, and we thought the non-diabetic nephropathies in blacks were more consistent with a common

197
00:32:38,366 --> 00:32:43,299
disease, which in fact it proved to be, but the white patients in the dialysis unit we never recruited because there are so many diverse

198
00:32:43,300 --> 00:32:52,500
causes from coronary disease with emboli to FS, you know, you name it, and I think the question Robbie asked of the GFR 59, which [---] may help

199
00:32:52,500 --> 00:33:00,300
with, but I think it’s critical to validate these in dialysis patients because that’s the ultimate meaning of a fallen GFR. I’m not sure a GFR going

200
00:33:00,300 --> 00:33:07,300
from 70 to 60 is important to somebody ending up on dialysis, although clearly there are related factors in what you saw with the brain stuff. You

201
00:33:07,300 --> 00:33:13,500
know, a heart disease gene might give you a drop in GFR because you got cathed and you got dye, but it’s not really a nephropathy gene. I think

202
00:33:13,500 --> 00:33:17,500
ESRD is a great replication.

LINDA KAO: Yeah. I mean, I agree. I think that’s

203
00:33:17,500 --> 00:33:27,433
ultimately an important question, but I think to not take away the importance of the GFR findings—because one could also imagine this multi-step

204
00:33:27,433 --> 00:33:37,099
progression to disease, right?—that you have to somehow have these variants that influence the GFR to begin with that would get you to, like, 61

205
00:33:37,100 --> 00:33:46,300
or 59, and then you have another variant that really takes you down this big decline.

206
00:33:46,300 --> 00:33:52,766
JEFFREY KOPP: Jeffrey Kopp. Just to make a point explicitly that the nephrologists in the room are thinking about and I think Rob knows part of

207
00:33:52,766 --> 00:33:59,832
the answer to the question he raised is, there’s this heat map that Joe Coresh and Andrew Levey have published that looks at GFR versus

208
00:33:59,833 --> 00:34:09,433
albuminuria, and the more you have bad things of both, the worse-off you do. So clearly, if your study groups included albuminuria, then we’re

209
00:34:09,433 --> 00:34:17,433
most worried about the person with a GFR below 30, then 30-60 but with higher levels of proteinuria in a graded fashion, but I think part of

210
00:34:17,433 --> 00:34:22,366
the problem is you don’t necessarily have that data in all of these cohorts.

211
00:34:22,366 --> 00:34:32,966
LINDA KAO: Right, and so that actually is another project that’s ongoing now, where with the bigger sample size we’re able to separate people into

212
00:34:32,966 --> 00:34:45,766
these distinct groups: the low GFR-high UACR, the high GFR-low UACR. So, I agree. I think it really comes back to this idea of you might start

213
00:34:45,766 --> 00:34:50,899
big but I think right now we are definitely trying to refine the phenotypes as much as we can.
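
As a small illustration of that kind of cross-classification, the sketch below groups toy participants by eGFR and UACR using common clinical cut points (eGFR below 60 mL/min/1.73 m^2 and UACR of 30 mg/g or more); the thresholds and column names are assumptions, not necessarily those used in the project described.

    import pandas as pd

    participants = pd.DataFrame({
        "egfr": [45, 72, 95, 55, 88],    # mL/min/1.73 m^2 (toy values)
        "uacr": [250, 10, 40, 15, 400],  # mg/g (toy values)
    })

    low_gfr = participants["egfr"] < 60
    high_uacr = participants["uacr"] >= 30
    participants["group"] = (
        low_gfr.map({True: "low eGFR", False: "preserved eGFR"})
        + " / "
        + high_uacr.map({True: "high UACR", False: "low UACR"})
    )
    print(participants["group"].value_counts())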

214
00:34:50,900 --> 00:34:59,966
MALE: So, this is more of a theoretical question because I don’t think we have the data sets to answer it yet, but we’d like to put in a plug for

215
00:34:59,966 --> 00:35:08,232
acute kidney injury. I think there’s a tendency to assume that we’re talking about smooth progression, when the data from AASK, and

216
00:35:08,233 --> 00:35:16,899
what’s beginning to show from [---] also, is that it’s anything but smooth progression in most people. So, is there something else? It could be

217
00:35:16,900 --> 00:35:24,633
environment, it could be AKI, it could be some other things that we’re just missing in our conceptual models that, when we try and do the

218
00:35:24,633 --> 00:35:28,966
genetics, that’s why the genetics haven’t worked out yet. I think that’s part of the dark matter.

219
00:35:28,966 --> 00:35:42,166
LINDA KAO: Yeah, that’s actually part of our validation study now in ARIC, too, to collect that particular phenotype. Okay? Thank you.




Date Last Updated: 9/18/2012
