Whole Genome Approaches to Complex Kidney Disease
February 11-12, 2012 Conference Videos

U.S. Studies and Repositories: Opportunity and Challenge
Linda Kao, The Johns Hopkins University

Video Transcript

00:00:00,200 --> 00:00:06,233
JEFFREY KOPP: Our next speaker is Linda Kao, who is Associate Professor at Johns Hopkins School of Public Health where she works on the

00:00:06,233 --> 00:00:14,066
genetics of diabetes and uses various approaches, including admixture mapping. We’ve asked her to give the first of two talks about the

00:00:14,066 --> 00:00:20,499
role of cohorts and what’s available and what we should be looking for. We heard a comment from the last speaker about the importance of

00:00:20,500 --> 00:00:28,066
trios, which will be interesting to hear, and Linda has worked with numerous cohorts including ARIC and FIND and many of these very large

00:00:28,066 --> 00:00:36,099
GWAS consortia, so she’s obviously a very collaborative person, and as I say, she’ll be talking about the kinds of U.S. cohorts available and

00:00:36,100 --> 00:00:41,200
what we should be looking for.

LINDA KAO: Thanks for the opportunity to speak

00:00:41,200 --> 00:00:52,933
today. It’s a terrible spot to have to go after the keynote speaker, though; this is not nearly as exciting. So, the goal of my presentation today is

00:00:52,933 --> 00:01:02,366
really to share with you our experience transitioning from a GWAS consortium into sequence studies to follow up our GWAS

00:01:02,366 --> 00:01:13,132
signals, and then give a few examples of large sequencing projects that I am personally familiar with because I’m working with them, as there are

00:01:13,133 --> 00:01:22,999
many sequencing projects ongoing, and to end with some common challenges that I think are present in many of these sequence projects if

00:01:23,000 --> 00:01:33,633
we want to use them for our own purposes; so, I think, setting up the challenges for the speakers for the rest of the day. So, I’ll first start with just a

00:01:33,633 --> 00:01:42,533
few slides on our experience and our lessons learned from our GWAS. So, Caroline Fox and I co-lead the CKDGen consortium, where we have

00:01:42,533 --> 00:01:54,899
primarily used large population-based studies to identify common variants for renal traits. At present we have about 45 studies—we might be

00:01:54,900 --> 00:02:06,500
closer to 50 now—with over 130,000 samples with GWAS data that’s been imputed to HapMap. Our outcomes of interest have been the, quote,

00:02:06,500 --> 00:02:16,366
“dirtier and messier outcomes,” but also easier to obtain in large population-based studies. We use GFR estimated from creatinine, GFR estimated

00:02:16,366 --> 00:02:28,132
from cystatin C, urinary albumin-to-creatinine ratio, and CKD as a dichotomized outcome defined by eGFR. So, why focus on early CKD? These traits

00:02:28,133 --> 00:02:38,433
definitely represent a large at-risk group in whom prevention is feasible. We can possibly study disease initiation. There’s less confounding by

00:02:38,433 --> 00:02:53,566
metabolic and vascular derangement that’s present in Stages 4 and 5 of CKD, and 3 of our 4 traits are quantitative, so greater statistical

00:02:53,566 --> 00:03:02,866
power. Very importantly, the samples are readily available in large numbers, which ended up being very important for GWA studies in an outbred

00:03:02,866 --> 00:03:11,999
population, unlike in Iceland. So, what we also learned, in terms of exposure from GWAS, is that we were looking at common SNPs: every study

00:03:12,000 --> 00:03:21,700
would genotype on their own commercial chip, but those arrays are definitely not comprehensive. So, every study then imputed

00:03:21,700 --> 00:03:30,166
to this common set of HapMap SNPs so that we would have 2.5 million SNPs per participant, to increase information and decrease

00:03:30,166 --> 00:03:40,799
misclassification of exposure. The general paradigm, then, is every study would conduct their GWA study and upload the results to some central

00:03:40,800 --> 00:03:50,200
shared space. The CKDGen analysis group would then meta-analyze those results. We would then take our top findings and replicate

00:03:50,200 --> 00:04:05,200
them in other studies and essentially keep going in circles, and it’s been fruitful. It’s helped us identify…today we have about 26 eGFR loci that

00:04:05,200 --> 00:04:14,233
have been discovered and replicated, and what we learned is that collaboration is important. Large sample sizes are needed to achieve the right

00:04:14,233 --> 00:04:23,866
balance of type I and type II errors. We also learned that we can increase our statistical power by imputation of SNPs to reduce the

00:04:23,866 --> 00:04:32,399
measurement error of the exposure. We also learned that you can increase power by increasing sample size, even though increasing

00:04:32,400 --> 00:04:40,333
sample size at the same time also introduces phenotype measurement error and heterogeneity, and so this is a concept that I’m going to come

00:04:40,333 --> 00:04:52,666
back to at the end. So, larger sample sizes lead to discovery of more loci as we embark on our next meta-analysis, but sample sizes are ultimately

00:04:52,666 --> 00:05:04,099
finite, and as Dr. Stefánsson pointed out, there’s got to be a more efficient and cheaper way of studying than to have to sequence 200,000

00:05:04,100 --> 00:05:16,400
people. So the question for me has been: does this GWAS paradigm still hold for studies of sequence data? So, our experience going from

00:05:16,400 --> 00:05:27,833
CKDGen GWAS results now to sequencing have primarily come from our experiences with another consortium called the CHARGE-Sequencing

00:05:27,833 --> 00:05:38,733
Consortium, CHARGE standing for Cohorts for Heart and Aging Research in Genomic Epidemiology. The CHARGE-S cohorts consist of the Framingham

00:05:38,733 --> 00:05:48,999
Heart Study; Cardiovascular Health Study; AGES from Iceland; Rotterdam; and then ARIC, the Atherosclerosis Risk in Communities Study. So, all

00:05:49,000 --> 00:06:00,100
five studies are population-based with common high-priority disease phenotypes. The objective of CHARGE-S, the sequencing component of this

00:06:00,100 --> 00:06:09,266
consortium, is to leverage existing population, laboratory, and computational resources to identify susceptibility genes underlying selected

00:06:09,266 --> 00:06:19,732
well-replicated GWAS findings for heart, lung, and blood diseases. And so, the two aims of the grant were to follow up GWA signals

00:06:19,733 --> 00:06:27,433
to localize the functional variant, and that’s being done through the targeted sequencing arm of CHARGE-S, and then to evaluate whether rare

00:06:27,433 --> 00:06:37,166
variants may also contribute to the trait, which is done through the exome sequencing part of CHARGE-S. The sampling design of CHARGE-S is

00:06:37,166 --> 00:06:46,599
a case-cohort design, and it’s a multiple case-cohort design. So, the total number sampled…sorry…the total number sequenced for targeted

00:06:46,600 --> 00:06:56,066
sequencing will be about 4,800 European Americans from the cohorts in this consortium. Out of these 4,800 there’s going to be about

00:06:56,066 --> 00:07:06,832
2,000 cohort random samples, so these are individuals selected irrespective of their outcomes, along with 14 different case groups. Each

00:07:06,833 --> 00:07:19,866
case group could be…for example, one case group is hyperglycemic but non-diabetic individuals, and each case group has 200 cases that they can select.
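The multiple case-cohort selection described here can be sketched in code. This is an illustrative toy example with synthetic data and hypothetical field names (`glucose`, `diabetic`), not the actual CHARGE-S selection procedure:

```python
import random

random.seed(0)

# Synthetic cohort; "glucose" and "diabetic" are hypothetical fields
# standing in for real phenotype data.
cohort = [{"id": i,
           "glucose": random.gauss(100, 25),
           "diabetic": random.random() < 0.10}
          for i in range(20000)]

# Random subcohort: individuals selected irrespective of any outcome.
subcohort = random.sample(cohort, 2000)

# One of several case groups: hyperglycemic but non-diabetic,
# capped at 200 cases, taken from the extreme tail of the trait.
eligible = [p for p in cohort if p["glucose"] > 125 and not p["diabetic"]]
eligible.sort(key=lambda p: p["glucose"], reverse=True)
case_group = eligible[:200]
```

Each of the 14 case groups would be drawn the same way against its own criteria, all sharing the one random subcohort as the common comparison group.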

00:07:19,866 --> 00:07:34,799
The targeted sequencing case groups are represented here, and then what’s shown in red are the targets—the regions that they’re targeting

00:07:34,800 --> 00:07:47,133
to sequence. And so you can see, most of these phenotypes or many of these phenotypes are related to chronic kidney disease. In terms of the

00:07:47,133 --> 00:07:58,233
sampling design, most working groups have chosen to do an extreme tail of a quantitative trait—extremely high insulin, extremely high glucose,

00:07:58,233 --> 00:08:06,766
extremely high blood pressure—and then in terms of gene targets, many have chosen boundaries of their original GWA signal. Some

00:08:06,766 --> 00:08:15,399
have chosen to fine-map, whereas others have chosen to focus on specific, known gene regions, and then many groups have used

00:08:15,400 --> 00:08:25,166
evolutionary conservation to help them decide which regions they want to prioritize for sequencing. So, in parallel in CHARGE-S there is

00:08:25,166 --> 00:08:36,199
also an exome sequencing component, and Caroline Fox and I obtained funding from NIDDK to do a larger selection of exome sequencing within

00:08:36,200 --> 00:08:47,333
the CHARGE-S setting. So, in total there will be about 2,400 samples with exome data. There will be 1,000 random cohort controls, so the study

00:08:47,333 --> 00:08:57,566
design, again, is the same as the targeted sequencing—one common control group that’s unselected for any phenotype with multiple case

00:08:57,566 --> 00:09:10,032
groups. So in this case, we were able to choose 400 CKD cases, and the idea is to have these samples go through SOLiD sequencing, validate,

00:09:10,033 --> 00:09:21,399
we’ll perform association analysis, and then ultimately validate back in the full cohorts of the CHARGE-S studies. Our CKD case

00:09:21,400 --> 00:09:30,333
selection—and this is where we had to think, given that we had a limited number of people that we could sequence—the cases came from ARIC,

00:09:30,333 --> 00:09:39,366
Cardiovascular Health Study, and Framingham Heart Study. All three of us took the same approach in that we ranked eGFR estimated from

00:09:39,366 --> 00:09:47,466
creatinine and picked from the lowest to the highest, but each study also used a slightly different approach in refining its

00:09:47,466 --> 00:09:58,032
phenotype. For example, in the ARIC study, we have eGFR from 3 visits over 12 years, so we decided to use our longitudinal data to help inform

00:09:58,033 --> 00:10:09,766
us of the best cases to sequence. So, we imposed an extra criterion: an individual could not revert back to an eGFR greater than 60 at the

00:10:09,766 --> 00:10:23,132
next study visit. I know in Framingham they had incorporated adjudicated medical record data into their selection of cases. So, I think the message

00:10:23,133 --> 00:10:32,399
here is that we really…given the limited resource, we really tried to use a different approach than what we did with the GWAS. With the GWAS we

00:10:32,400 --> 00:10:41,000
took the lowest-common-denominator phenotype and had as many samples as we could; with something like sequencing, we’re really

00:10:41,000 --> 00:10:53,600
trying to define a phenotype that’s more specific. The cases also had to have prior GWAS data and pass GWAS QC. The sequencing is being

00:10:53,600 --> 00:11:05,133
done at Baylor College of Medicine, the platform is SOLiD, data collection will be completed in the spring of 2012, and then data will be posted to dbGaP, and

00:11:05,133 --> 00:11:13,333
for those of you who are not familiar with dbGaP, I think this is an important resource for people to know about. dbGaP is the database of genotype

00:11:13,333 --> 00:11:22,133
and phenotype, hosted by NCBI. It archives and distributes results of studies, including GWAS and sequence studies, that have investigated the interaction of

00:11:22,133 --> 00:11:32,999
genotype and phenotype, and you need to submit an application to gain access to these access-controlled data. The other project that we’re

00:11:33,000 --> 00:11:44,300
familiar with is another large-scale project, the NHLBI Exome Sequencing Project, or the NHLBI GO ESP, and I’ll only speak very briefly about this

00:11:44,300 --> 00:11:55,800
because one of the PIs of this study, Dr. Rich, is going to talk a lot more about the study design. The goal of this study was really to examine the

00:11:55,800 --> 00:12:13,166
exome, examining how variants in the exome contribute to diseases of interest for heart, lung, and blood phenotypes. The ESP has three

00:12:13,166 --> 00:12:20,832
cohort groups, there are two sequencing centers, and there are other components, including CHARGE-S, that are working with ESP. My

00:12:20,833 --> 00:12:30,733
primary purpose for showing ESP here, one, is to contrast the difference in study design. If you remember from CHARGE-S, the investigators

00:12:30,733 --> 00:12:44,366
decided on a case-cohort design with this common control group, whereas in ESP some of the phenotype working groups have chosen to

00:12:44,366 --> 00:12:53,132
study the two tails, as you had heard about earlier. So in this case, it would be early-onset MI as cases and then controls would be those with

00:12:53,133 --> 00:13:05,766
high Framingham risk score with no MI. There are other phenotypes, other case selections, included in ESP, including lung, Type II diabetes,

00:13:05,766 --> 00:13:22,099
blood pressure, ischemic stroke, and then they will also have this equally phenotyped reference group of about 1,000. Data from ESP is available

00:13:22,100 --> 00:13:32,000
on the ESP website, and for the current release, ESP5400, I actually had to do a bit of web research myself. It’s taken from

00:13:32,000 --> 00:13:44,966
about 5,400 samples from ESP cohorts and the future release will include up to 7,000 exomes from the entire project. I think a neat resource that

00:13:44,966 --> 00:13:56,332
the ESP is providing is this Exome Variant Server where, if you go to their website, you can just enter your gene of interest and then it will

00:13:56,333 --> 00:14:09,099
output back all the exonic variants that have been identified in the ESP samples. So, those are just two large sequencing efforts that we’re working

00:14:09,100 --> 00:14:21,166
with in CKDGen, but there are many additional large sequencing efforts. And so, what I did was I went to NIH RePORTER and I just typed in key

00:14:21,166 --> 00:14:32,999
word “exome,” and for example, Mike Boehnke has a study where, in 3,000 Type II diabetes case and control samples, they will be running next

00:14:33,000 --> 00:14:43,100
generation high-density SNP arrays—so about 5 million SNPs—but also low-pass whole genome sequencing. And then there is a different grant,

00:14:43,100 --> 00:14:54,466
David Altshuler’s grant, which uses multi-ethnic cohorts, each with thousands of people, where they’ll be performing deep

00:14:54,466 --> 00:15:03,799
resequencing of the exome to identify rare causal variants. And then, obviously, there are many exome projects that are looking at Mendelian

00:15:03,800 --> 00:15:14,966
forms of kidney diseases. And then lastly, one other resource is definitely the 1000 Genomes Project, where we have whole genome data on

00:15:14,966 --> 00:15:23,866
individuals from multiple populations. The samples were not selected for a particular phenotype, so that would also mean that rare variants that are

00:15:23,866 --> 00:15:34,966
enriched in disease populations are likely not to be found in this data set, but the data set can be used as a reference panel to impute existing data

00:15:34,966 --> 00:15:44,632
sets that have GWAS data, and maybe this imputed data set could serve as a potential control. So, with all the different projects that are ongoing, I think the

00:15:44,633 --> 00:15:54,499
end users—users who want to use these projects for an alternative purpose—face the same challenges. I think the first one and the most

00:15:54,500 --> 00:16:03,566
important one is that different studies have different sampling schemes. So for example, studies in CHARGE-S where the different case

00:16:03,566 --> 00:16:13,466
groups were essentially phenotypes that are related to chronic kidney disease, how will that influence…that will likely introduce a bias into the

00:16:13,466 --> 00:16:22,999
spectrum of alleles that are identified? And then there’s this idea that many people are sampling on extremes. Now, sampling on extremes is not a huge

00:16:23,000 --> 00:16:36,266
problem. Statistically, the association estimate that you obtain from these studies will be biased because you’re sampling from the extremely high

00:16:36,266 --> 00:16:45,399
and the extremely low, but I think that’s a problem that could be resolved statistically or it could be resolved eventually by genotyping in large

00:16:45,400 --> 00:16:55,200
population cohorts. Collaboration with original investigators, I think, would be very important. Really getting back to this point of how did they

00:16:55,200 --> 00:17:04,600
initially sample? How did they initially come up with the sampling scheme, which really will have a great deal of influence on interpretation of your

00:17:04,600 --> 00:17:13,800
findings? And then, different platforms are used by different studies and this is a problem that we saw with GWA studies, too, where if the cases

00:17:13,800 --> 00:17:24,066
and controls come from two different studies, then confounding can result, and this is in addition to the issue of matching cases to controls on genetic

00:17:24,066 --> 00:17:35,066
background. And then lastly, statistical power is going to be limited in both the association studies in the sequenced set and in the follow-up, because

00:17:35,066 --> 00:17:44,799
even if you were able to follow up in 20,000 people, rare variants are still not going to be very common in 20,000 people. So, you can increase

00:17:44,800 --> 00:17:54,666
sample size to combat this problem, but again, that might be limited. You can use sequence data to impute your own data set, and that would

00:17:54,666 --> 00:18:05,132
minimize the measurement error of genotypes that you have, but imputation accuracy for rare variants still needs to be improved in outbred

00:18:05,133 --> 00:18:15,133
populations. The last thing that you could think about is minimizing misclassification of outcomes and exposures by considering multiple outcome

00:18:15,133 --> 00:18:25,799
measures, and that’s something that I think is well known to epidemiologists but maybe not so commonly thought about by others. So, we know

00:18:25,800 --> 00:18:34,133
measurement error exists for genes, for environmental factors, and for outcome measures, so regardless of what the data

00:18:34,133 --> 00:18:41,466
collection method is, there’s always going to be a true value for each one of those and then there’s going to be some discrepancy between what you

00:18:41,466 --> 00:18:50,332
actually collect and what the true value is. And so, those measures are…that difference between the true value and what you actually

00:18:50,333 --> 00:18:59,733
measure is expressed as reliability and validity. Do you have an outcome that can be reproducibly measured? Do you have an outcome that comes

00:18:59,733 --> 00:19:10,666
close to the actual true value, so it is valid? And so, I think the problem is that measurement error in your outcome or in your

00:19:10,666 --> 00:19:20,399
exposure will lead to a loss in power; or, the other way to think about it, measurement error is going to lead to a greater sample size

00:19:20,400 --> 00:19:29,400
requirement to achieve the same power. Again, that’s also not very new. What’s shown on this graph here on the X axis is the number of cases

00:19:29,400 --> 00:19:40,466
required, on the Y axis is power. The blue line here would be a study where you have no measurement error in either the exposure or the

00:19:40,466 --> 00:19:53,866
outcome. The yellow and the green lines are where, one, you have introduced measurement error in the genetic assessment, and the other

00:19:53,866 --> 00:20:00,732
one is where you have introduced measurement error in the outcome assessment, and then the red line is where you’ve introduced measurement

00:20:00,733 --> 00:20:10,733
error in both the exposure and the outcome. And so, it’s obvious that, one, with increasing sample size you always get increased power, so that’s

00:20:10,733 --> 00:20:19,933
good confirmation, but at any given sample size every time you introduce some sort of measurement error you essentially end up with

00:20:19,933 --> 00:20:28,533
less power; there’s a decrease in statistical power. And so, one of the things that we’ve been thinking a great deal about is that, with eGFR, we

00:20:28,533 --> 00:20:39,599
know there is definitely measurement error in eGFR. First of all, it is estimated based on a biomarker, so there is going to be some

00:20:39,600 --> 00:20:48,533
imprecision; there are going to be some validity issues, too, depending on the population that you’re studying. And so, we’ve been thinking a lot

00:20:48,533 --> 00:20:58,533
about how we make that outcome better. So, this is work done by my doctoral student, Adrienne Tin. On the X axis here it shows the

00:20:58,533 --> 00:21:06,866
number of outcome measures, going from just having 1 measure of an outcome to having 10 measures of an outcome, and on the Y axis is

00:21:06,866 --> 00:21:21,399
equivalent sample size. So, if we look at this black line here, the reference sample size would be where we have 1 eGFR creatinine measure, so the

00:21:21,400 --> 00:21:32,200
sample size requirement for that study to achieve 80% power is fixed at 1 here, and the next dot here at 2 indicates that if we just add 1 more

00:21:32,200 --> 00:21:44,266
measure of eGFR creatinine into this scenario, then we’ve essentially gained the equivalent of a 20% increase in our sample

00:21:44,266 --> 00:21:55,266
size. So, the equivalent sample size here is roughly 1.2. If we can measure eGFR creatinine 3 times, it gets us up to 1.3. So, it’s as if you

00:21:55,266 --> 00:22:08,999
have recruited an additional 30% of participants. And so, this increase plateaus after a certain point but what’s reassuring is that it is definitely

00:22:09,000 --> 00:22:17,266
going to be cheaper to measure another creatinine than to recruit another 3,000 people. And so, she did it for different scenarios where,

00:22:17,266 --> 00:22:27,066
instead of looking at multiple measures of the same outcome—so in this case it would be 10 measures of eGFR creatinine—she also varied

00:22:27,066 --> 00:22:39,799
the outcomes so that it could be 3 measures of creatinine and maybe the addition of 2 other biomarkers to estimate GFR. And so, that

00:22:39,800 --> 00:22:50,066
introduces this concept of residual correlation between the outcomes. And so, what you can see here is…taking this purple line as an example,

00:22:50,066 --> 00:23:00,599
where you can see a very steep gain in equivalent sample size, and that’s because the residual correlation of the outcomes is low—it’s

00:23:00,600 --> 00:23:10,200
only at .5—and so that’s a situation where one marker might be eGFR creatinine, another marker might be eGFR cystatin C, another marker might

00:23:10,200 --> 00:23:19,933
be BTP. So, these are different markers that are correlated with each other but not perfectly correlated with each other, and so there is just

00:23:19,933 --> 00:23:30,399
enough independence, a correlation of about .5, to allow them to pick up different components of kidney function. And so, I think this is a good way to think about it,

00:23:30,400 --> 00:23:37,333
especially with studies that have already been done and there’s already sequence data. It’s a good way to think about how you can use

00:23:37,333 --> 00:23:49,733
multiple phenotypes to really come up with a more defined outcome for your analysis. So, just to summarize, there are many opportunities that

00:23:49,733 --> 00:23:58,733
exist for leveraging existing sequence data sets. There are many sequence data sets out there to advance the field of complex kidney disease

00:23:58,733 --> 00:24:10,466
genetics. I think caution has to exist with respect to many things: with respect to how the original study was sampled; the sequencing platform; the

00:24:10,466 --> 00:24:21,799
limited sample size; and then I think it would benefit investigators a great deal to think really about how to incorporate additional outcome

00:24:21,800 --> 00:24:35,266
measures to get a more precise kidney phenotype, given that there is going to be limited power in terms of sample size. And then, I just

00:24:35,266 --> 00:24:55,732
want to acknowledge Caroline Fox, my co-leader in the work in these consortia, and Adrienne Tin, my doctoral student. Okay? Thanks.
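The equivalent-sample-size idea from the slides above can be approximated with a classical reliability argument. As a back-of-the-envelope sketch (not the actual analysis from this work): treat the k outcome measures as parallel tests whose pairwise residual correlation is rho; the Spearman-Brown formula then gives the reliability of their mean, and to first order statistical power scales with reliability, so the equivalent sample size does too. The rho values below are illustrative assumptions chosen to roughly match the gains quoted in the talk, not numbers taken from the slides:

```python
def equivalent_n_multiplier(k: int, rho: float) -> float:
    """Power-equivalent sample-size multiplier from averaging k
    measures of an outcome, relative to a single measure, assuming
    parallel measures with pairwise residual correlation rho.

    Reliability of the mean (Spearman-Brown): k*rho / (1 + (k-1)*rho).
    Dividing by the single-measure reliability rho gives the gain.
    """
    return k / (1 + (k - 1) * rho)

# Repeated creatinine measures with fairly high residual correlation
# give modest gains that plateau as k grows...
gains = [equivalent_n_multiplier(k, 0.65) for k in (1, 2, 3, 10)]

# ...while combining less-correlated biomarkers (say creatinine,
# cystatin C, BTP at roughly .5) climbs more steeply.
steeper = equivalent_n_multiplier(3, 0.5)   # -> 1.5
```

With rho = 0.65, a second measure yields a multiplier of about 1.21 and a third about 1.30, in line with the roughly 20% and 30% gains described; the plateau falls out of the formula, since the multiplier can never exceed 1/rho.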

00:24:55,733 --> 00:25:08,733
ANDRE SHAW: Yeah, Andre Shaw from Washington University. I just kind of wanted to go back to this issue about expanding GWA studies

00:25:08,733 --> 00:25:15,766
to using sequencing in search of the functional variants. The assumption here, it seems to me, is that the

00:25:15,766 --> 00:25:25,166
GWAS hit will be associated with a functional coding sequence variant when it seems like another philosophy would be that the GWAS hit is

00:25:25,166 --> 00:25:29,366
really a regulatory element.

00:25:29,366 --> 00:25:34,432
ANDRE SHAW: Either something known or unknown, so that is not going to actually get us anything.

00:25:34,433 --> 00:25:43,099
LINDA KAO: Definitely, and I think that’s why, in CHARGE-S, there ended up being two parallel arms. With the targeted sequencing, I think very

00:25:43,100 --> 00:25:51,233
much the focus there is, if this is the GWA signal, I’m only going to sequence the region that surrounds that signal, irrespective of where the

00:25:51,233 --> 00:25:59,366
gene is, even though some investigators still placed preference in regions that they wanted to sequence. And then, there’s this whole other

00:25:59,366 --> 00:26:08,699
separate hypothesis that the exome…I agree. I think whatever is found in the exome, the prior expectation can’t be that they can account for the

00:26:08,700 --> 00:26:14,366
GWA signal that was never in an exon to begin with.

00:26:14,366 --> 00:26:17,966
ANDRE SHAW: And what are people finding doing this?

00:26:17,966 --> 00:26:26,966
LINDA KAO: Well, Dr. Rich may have more examples, but personally, we haven’t been as lucky as Iceland. We haven’t found…following

00:26:26,966 --> 00:26:39,899
our top signals, we’re not seeing signals in exons, definitely, that account for our GWA signal.

00:26:39,900 --> 00:26:50,300
FEMALE: I want to thank you for this talk, too; it’s an excellent follow-up to Dr.

00:26:50,300 --> 00:27:00,900
Stefánsson is still here. I wonder, how generalizable are the findings from Iceland to European populations? Has there been any kind

00:27:00,900 --> 00:27:08,366
of assessment? I don’t know if he’s still here.

LINDA KAO: He’s there. I think earlier he had said

00:27:08,366 --> 00:27:21,699
that most of the findings from deCODE genetics have been replicated in other populations of European ancestry.

00:27:21,700 --> 00:27:29,700
FEMALE: Okay, because the question I’m just wondering, even in the strategies that you’ve presented…there may be publications, I’m just not

00:27:29,700 --> 00:27:39,700
familiar with them. But because of the numbers that you have of European samples, to what extent, if we follow this idea of the power of the

00:27:39,700 --> 00:27:51,633
greater the homogeneity of the population for mining the genetic information to then test in the larger group, to what extent have you looked at

00:27:51,633 --> 00:28:06,433
increasing the magnification, shall I say power, of your population by using ancestry markers of European populations to enrich for a more

00:28:06,433 --> 00:28:12,899
homogeneous group to do the initial search?

LINDA KAO: I think that’s a great point. At least

00:28:12,900 --> 00:28:23,633
with CHARGE-S, that was not part of the consideration in sampling individuals; the sampling was done on phenotypes. But as we have been

00:28:23,633 --> 00:28:32,033
thinking about this more, definitely with our CKD Working Group as we do our association analysis, I think even within the European

00:28:32,033 --> 00:28:43,833
Americans we will work off this idea of creating the…making the chromosomes more homogeneous when we actually do the disease versus

00:28:43,833 --> 00:28:56,133
non-disease comparison. So, it would be ideal if they’re homogeneous with respect to everywhere else but the actual locus for the disease.
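One generic way to operationalize making the comparison group more homogeneous is a principal-component filter on genotypes: compute the top genetic PCs and restrict the analysis to samples in a tight cluster around the centroid. This is a simple global-ancestry sketch using synthetic genotypes; it is a stand-in for, not a description of, the haplotype-level approach being discussed, and all sample and SNP counts are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic genotypes: 500 samples x 1,000 SNPs (0/1/2 allele counts).
G = rng.binomial(2, 0.3, size=(500, 1000)).astype(float)

# Standardize SNP columns, then take top principal components via SVD.
G -= G.mean(axis=0)
sd = G.std(axis=0)
G /= np.where(sd > 0, sd, 1.0)            # guard against monomorphic SNPs
U, S, Vt = np.linalg.svd(G, full_matrices=False)
pcs = U[:, :2] * S[:2]                    # top-2 PC coordinates per sample

# Keep the most homogeneous 80%: samples closest to the PC centroid.
dist = np.linalg.norm(pcs - pcs.mean(axis=0), axis=1)
keep = dist <= np.quantile(dist, 0.8)
homogeneous_idx = np.flatnonzero(keep)
```

In a real analysis the PCs would come from LD-pruned genotypes and the cutoff would be chosen by inspecting the PC plot rather than by a fixed quantile.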

00:28:56,133 --> 00:29:02,799
MALE: In the spirit of the conference, Linda, we’re talking about complex diseases and approaches. I wonder if you can comment on the

00:29:02,800 --> 00:29:12,600
use of eGFR and simple things like proteinuria, which are associated with all the scads of renal diseases that we looked at. We had a lot of

00:29:12,600 --> 00:29:22,533
trouble establishing a genetic association with phenotype in diabetic nephropathy, which is a disease of low GFR and high proteinuria. How

00:29:22,533 --> 00:29:31,933
much are we losing by making a gemisch of diseases that are all tracking a common endpoint or a common progressive path, and how much

00:29:31,933 --> 00:29:38,533
are we gaining from that approach?

LINDA KAO: So, I think we’re gaining because

00:29:38,533 --> 00:29:48,666
we definitely are finding proteins that are shedding light on different pathophysiology of kidney disease. We’re definitely losing, because

00:29:48,666 --> 00:29:59,199
you’re right, it is a dirty phenotype, and I think that’s why, at least for us as we think about our next round of analysis, we’re sort of coming

00:29:59,200 --> 00:30:08,533
back in a circle. We started off with a really dirty phenotype in hundreds of thousands of people and that really just pointed us to regions that are

00:30:08,533 --> 00:30:21,133
common, probably across many different subtypes of kidney disease phenotype, and as we now get to a much more refined look at the

00:30:21,133 --> 00:30:30,133
genome with sequence data, I think we’re going to have to now focus on getting our phenotype data much more specific.

00:30:30,133 --> 00:30:38,399
MALE: Do you have the ability in ARIC and [---] and all these things to delineate what the renal phenotype is rather than just a GFR phenotype?

00:30:38,400 --> 00:30:42,933
MALE: Yeah, I think we know that it’s associated

00:30:42,933 --> 00:30:46,666
with progressive disease, but I don’t know how we translate that knowledge. So, do we have an opportunity to look closely at these parts?

00:30:46,666 --> 00:30:59,999
LINDA KAO: Yeah. In ARIC we’re just about to start an ancillary study validating our CKD cases, so we hope that can improve our outcome a bit,

00:31:00,000 --> 00:31:02,933
although we’re not going to know for another year.

00:31:02,933 --> 00:31:12,066
MALE: This is building on something that Dr. Stefánsson talked about where he showed this beautiful example of Alzheimer’s disease where

00:31:12,066 --> 00:31:19,799
there was this stable cohort and a progressive cohort. I’m simplifying it a bit, but what if we start with a clinical question that we get asked all the

00:31:19,800 --> 00:31:31,233
time—what is the meaning of a GFR of 59 in an elderly person?—and look in some of the cohorts that have elderly patients and look at progression.

00:31:31,233 --> 00:31:43,033
Who’s progressing and who’s staying stable? I mean, it’s a clinically relevant question that the renal community will acknowledge as a really

00:31:43,033 --> 00:31:44,766
open question—and see if there is a genetic component to that.

00:31:44,766 --> 00:31:52,299
LINDA KAO: I completely agree with you. So, we have a GWAS now of progressive kidney phenotypes. The sample size is a little smaller

00:31:52,300 --> 00:32:01,500
because not all the CKDGen cohorts are longitudinal, so that, hopefully…we’re in the replication stage now. There are a few hits that

00:32:01,500 --> 00:32:09,033
we’re not sure of but they need to be replicated. So hopefully, in the next six months, we’ll know something.

00:32:09,033 --> 00:32:14,833
MALE: Well, I would say it’s not so much a GWAS, but maybe that’s an appropriate rare variant search question.

00:32:14,833 --> 00:32:24,299
LINDA KAO: Mmm. So, that’s the other R01 that’s under review now. It has the same…it is exactly that idea.

00:32:24,300 --> 00:32:33,366
MALE: Linda, to bring you back to the same concept here, we’ve been recruiting from dialysis clinics for 20 years and we get all the Type II, the

00:32:33,366 --> 00:32:38,366
diabetic nephropathies in whites and blacks, and we thought the non-diabetic nephropathies in blacks were more consistent with a common

00:32:38,366 --> 00:32:43,299
disease, which in fact it proved to be, but the white patients in the dialysis unit we never recruited because there are so many diverse

00:32:43,300 --> 00:32:52,500
causes from coronary disease with emboli to FS, you know, you name it, and I think the question Robbie asked of the GFR 59, which [---] may help

00:32:52,500 --> 00:33:00,300
with, but I think it’s critical to validate these in dialysis patients because that’s the ultimate meaning of a falling GFR. I’m not sure a GFR going

00:33:00,300 --> 00:33:07,300
from 70 to 60 is important to somebody ending up on dialysis, although clearly there are related factors in what you saw with the brain stuff. You

00:33:07,300 --> 00:33:13,500
know, a heart disease gene might give you a drop in GFR because you got cathed and you got dye, but it’s not really a nephropathy gene. I think

00:33:13,500 --> 00:33:17,500
ESRD is a great replication.

LINDA KAO: Yeah. I mean, I agree. I think that’s

00:33:17,500 --> 00:33:27,433
ultimately an important question, but I think to not take away the importance of the GFR findings—because one could also imagine this multi-step

00:33:27,433 --> 00:33:37,099
progression to disease, right?—that you have to somehow have these variants that influence the GFR to begin with that would get you to, like, 61

00:33:37,100 --> 00:33:46,300
or 59, and then you have another variant that really takes you down this big decline.

00:33:46,300 --> 00:33:52,766
JEFFREY KOPP: Jeffrey Kopp. Just to make a point explicitly that the nephrologists in the room are thinking about and I think Rob knows part of

00:33:52,766 --> 00:33:59,832
the answer to the question he raised is, there’s this heat map that Joe Coresh and Andrew Levey have published that looks at GFR versus

00:33:59,833 --> 00:34:09,433
albuminuria, and the more you have bad things of both, the worse off you do. So clearly, if your study groups included albuminuria, then we’re

00:34:09,433 --> 00:34:17,433
most worried about the person with a GFR below 30, then 30-60 but with higher levels of proteinuria in a graded fashion, but I think part of

00:34:17,433 --> 00:34:22,366
the problem is you don’t necessarily have that data in all of these cohorts.

00:34:22,366 --> 00:34:32,966
LINDA KAO: Right, and so that actually is another project that’s ongoing now, where with the bigger sample size we’re able to separate people into

00:34:32,966 --> 00:34:45,766
these distinct groups: the low GFR-high UACR, the high GFR-low UACR. So, I agree. I think it really comes back to this idea of you might start

00:34:45,766 --> 00:34:50,899
big but I think right now we are definitely trying to refine the phenotypes as much as we can.

00:34:50,900 --> 00:34:59,966
MALE: So, this is more of a theoretical question because I don’t think we have the data sets to answer it yet, but we’d like to put in a plug for

00:34:59,966 --> 00:35:08,232
acute kidney injury. I think there’s a tendency to assume that we’re talking about smooth progression, when the data from AASK, and data

00:35:08,233 --> 00:35:16,899
beginning to emerge from [---] as well, show that it’s anything but smooth progression in most people. So, is there something else? It could be

00:35:16,900 --> 00:35:24,633
environment, it could be AKI, it could be some other things that we’re just missing in our conceptual models that, when we try and do the

00:35:24,633 --> 00:35:28,966
genetics, that’s why the genetics haven’t worked out yet. I think that’s part of the dark matter.

00:35:28,966 --> 00:35:42,166
LINDA KAO: Yeah, that’s actually part of our validation study now in ARIC, too, to collect that particular phenotype. Okay? Thank you.

Date Last Updated: 9/18/2012

General Inquiries may be addressed to:
Office of Communications and Public Liaison
Building 31, Rm 9A06
31 Center Drive, MSC 2560
Bethesda, MD 20892-2560
Phone: 301.496.3583