DRF 1: Introduction to Fall Series/Health Data Sciences

(upbeat music) – Too early. (audience laughs) Premature. – No, now you’re up, now you’re up. – Now I’m up. I’m on, ready? How ’bout now? Maybe, yes? (audience muttering) Yes, sort of, kinda? Good. Well, welcome. Today is a memorable day, as it always is. It’s sort of a marker of change. If you haven’t noticed, there’s a change of season. If you happen to go outside and you’re shivering or you’re in conference rooms here and you’re shivering, welcome, welcome to the fall, coming to the South relatively rapidly. Little prayer for all those who sort of survived, at least, Hurricane Irma. Hopefully we survive the next series of hurricanes that come down this season. But on a happier note, it’s also the time and thanks to Emily O’Brien, you’ll hear about a change in Research Forum. As you can all see, for those who got here early, you got a meal. It’s a good encouragement and if you answer questions during the session here, I have a meal up front… (audience laughs) That we will offer up to the best question asked during this session. So, just be prepared. I set it aside just for you guys.

In all seriousness, I’m glad you all came today. Doc mumbled in the front, in a usual, I’ve-only-been-here-for-45-years response, we’ll get this for the next three sessions and then we’re not quite sure. Another response came that it’s really good when you come to give a talk and there’s a great audience. In part, that’s because I’m giving my talk with the equivalent of Springsteen and Beyonce with Emily and Michael. I’ll have you guess which is Springsteen, which is Beyonce, sorry. Not going there. Okay. Keep on point, Peterson. So… Let’s go with a simple welcome and the mission statement that we all know. You guys hear this every time I give my talk, there’s a reason for that, it’s because it’s what drives this organization from the bottom, up, and through the place. It is all about both discovering new knowledge and then sharing it. And sharing it is exactly what we should be doing in these conferences.

Think about it in another way and people often argue about the DCRI, that we should be more focused. It’s a problem in my general life, it’s a problem for our organization, or it’s an opportunity in both. The way I see it, there is so much phenomenal science to be done that we really shouldn’t focus in on just one part of it. Should we be just translation or should we be just trials? Or should we be just outcomes or should we be just community work? Or should we do just policy? All these are important. And no matter what part of that organization you’re dealing with, to me, it’s important and it’s phenomenal. And we need all that. We need that spectrum, we need all the parts of the body. We can’t just worry about heart, we’ve gotta worry about all those other minor organs that are associated with it.

Without that, we are not gonna do our mission, which is to make patients healthier and keep them alive longer. That’s what we’re all here for. But that’s a challenge, right? Because now, topics are not necessarily the ones that I only think about, like, I deal in this smaller spectrum. We have to worry about and care about all of the science that’s being done here and then try to convey the messages that we’ve learned in each of these areas to all of our other colleagues here. Sometimes that’s a challenge because you look at it and you’re like, well, the topic tomorrow is GI and who cares about GI? Sorry, GI guys.

Or the topic is in first-in-man studies? Well, I do large outcome studies, I don’t care about those guys. Or the topic’s in data. Nobody cares about data, Michael. (audience laughs) Sorry. But think of this, and Emily gave me the greatest line and I’ma steal it blatantly today. Emily talked about this when she introduced it to the faculty conference and she also gave it to our executive team. She describes it as, think of this as the New York Times. Now for those who spend their Sundays like I often do, you get up, you get a cup of coffee, you pull out the New York Times, and you start reading articles. Some of them, when you start to read the first line, you’re like, wait, it’s an article about Cambodian refugees. And you’re like, sorry, it’s not part of my life. But then you start to read through the article and you find yourself mesmerized by the quality of the writing that’s there. And the messages that they start to convey. And by the end of the article, you get something out of it, that maybe made you either feel better about your life or made you think about things that you should be doing or could be doing.

It somehow changed you and impacted you and you’re better for the fact that you actually read it. You can take this to lots of other places in your lives and lots of other things that might be meaningful to you. But we want Research Forum to be that for you, that you come down here and maybe you didn’t think you’d care that the topic is GI, or somebody doing community health, or somebody doing a new app development, or whatever it is, but you should be mesmerized in part by the quality of the research and the talk being given.

You can get their passion. You should have an opportunity to exchange with them, and Emily will talk about that, and feel engaged in the process itself. And by you being there, the process is made better, both for you but also for them. You’re in the audience and you’re actually engaged. And then you get something out of it, that maybe you get unexpected learnings or an unexpected challenge, or you see how this person overcame something that they had in their lives, you get motivated to do something different and you learn something very technical. All of them should be conveyed, hopefully, in these research conferences moving forward. And I’ll not steal Emily’s thunder, but I will turn you over to the idea that we need to learn more every day about how to do better. And it could be very specific therapeutic areas, it could be methodologic areas, it could be motivational areas for our lives. But any of them and all of them are important and we should get them from these conferences. This research conference itself though is tied in with a larger message that we need to get out.

Lots of great science is being done by lots of individuals across 1,200 people that are part of this organization. And all of it needs various forums through which to be communicated. We have a much better website and other efforts that Susan has done. We’ve got an annual report that’s just about to be launched for our, now, third year of revamp, it looks pretty solid. But there are messages that we need to get out, both internally and externally, that haven’t been coordinated too well in the past. So, Research Forum, now the name of the conference you’re in, is part of a larger effort with Think Tanks that we run in D.C. that are both policy and regulatory efforts. We hold a series of national meetings. And then, even the publications, tweets, national reports, websites, all of them are means by which we, hopefully, will both tie ourselves into one DCRI and ultimately show ourselves to the rest of the world and all the great science we’re doing.

So with that, I will turn it over to Emily who will explain a little bit more about this Research Forum. – Am I on? – Yes, you’re on. – Great, well, thanks Eric and thanks so much to all of you for being here today. I will say that what you’re seeing here now and what you’ll see in terms of the plans for the upcoming year, they’re really the culmination of a lot of hard work from a lot of members of the communications team, as well as the steering committee that we’ve put together, and all of you who we’ve spoken with individually and who completed the faculty survey that we sent out. So, thank you so much for your feedback. We hope that that continues over the coming year and we really look forward to hearing what you think about the changes that we’re gonna be implementing. So, when I took over as Medical Director of Research Conference about a year and a half ago, one thing I noticed is that there wasn’t a really clear mission for what we’re trying to do here.

Many of you go to lots of different conferences every week and we sorta thought about how to differentiate ourselves and really make sure that the mission was clear so that you would know what you would be getting out of your attendance every week. And actually after talking with faculty and the executive team and others, it came down to two objectives, which I think are really simple and straightforward. The first thing that we’re really trying to accomplish is to share research findings that can then be implemented in patient care to improve outcomes. And that’s a really key piece of implementation science but also a key piece of what we do in terms of translating research findings into practice. And then second, our objective is to share lessons learned. So, we all know that some projects go more smoothly than others, so, really taking the opportunity to build on the experiences that we all have and share those with each other so that we can do research in a better, more efficient, way in the future is gonna be key. And we know that this is an hour of your time every week and an hour that can sometimes be very difficult to carve out from week to week.

And we wanted to get a really clear sense of what we want you to get out of this. Of course, understanding new research findings and how these can impact care, as well as the practical ideas that I mentioned. But also, new connections both with each other and with external speakers who come in as visiting professors are gonna be key. And mostly, we want you to feel that this is a good use of your time and that you’ve learned something important and innovative that you can then apply in your own work. The other piece that we wanted to focus on was what we’re calling the curriculum, and for those of you who’ve been to Research Conference in the past, you know that there’s a wide variety of different talks that are given every year. And we really wanted to be a lot more intentional in how we structured this. In order to do this, we asked faculty what they thought the most important topics were, and then summarized those into five key areas that we’ll be focusing on in this first year.

And what I will say is we’re gonna try this out, it’s a first step, and if there are other areas that you think are important for us to cover, certainly let us know and we can incorporate those. But essentially, we’ll be having some sessions and a series on methods masters, so we have a talk on machine learning later this fall as well as a debate on doing comparative effectiveness research in observational data sets. So that’ll be really interesting. Everyone who’s a part of DCRI knows that sometimes we can be a little bit siloed and it’s hard to see how our work connects to Duke more broadly, so we’ll be bringing in some speakers from within Duke who are important collaborators, but who do work that may not be all that visible or apparent to the folks in this institute. So, we wanna foster those connections and build new collaborations through that series. Visiting professors, again, to promote networking and collaborations with folks external to Duke. Transformative research projects are gonna be the bulk of what we cover over the coming year, those are the high impact, high visibility projects like ADAPTABLE or Baseline that many of you have heard of but, if you’re not directly involved as a team member, may not know the specifics.

So, we really wanna make sure that the lessons learned and innovation from those studies are shared. And then innovation and emerging therapeutic areas, or as Eric would say, the minor organs of the body, so this is the pulmonology or endocrinology, the growing therapeutic areas that are super important and are doing a lot of cool stuff, a lot of innovative stuff, that we wanna make sure others are aware of and can learn from. Okay, so, another key message that we heard from faculty and staff is that there need to be more opportunities for audience engagement. So, we want this to be an interactive forum more than anything else rather than just a 45-minute talk with 15-minute Q&A. One of the ways that we’re gonna try this is through Menti. How many folks have used Menti? Okay, so it’ll be a learning experience. I’ve used it and if I can figure it out, it should be totally fine. If you pull out your iPhone and go to the web browser and just enter the website here, www.menti.com, we’ll have a text code available that you then enter that will bring you to that day’s session.

And that’s where speakers can use this functionality to pose questions to the audience. We’ll get back to this in a second but I wanted to mention a couple of other opportunities for engagement. We’ll still be asking folks who attend in person, if you have questions during the Q&A, to raise your hand and ask them, but if you can’t attend in the lower-level lecture hall and are attending remotely, we also encourage you to submit questions via the YouTube chat functionality, which is on the right-hand side of the screen you’ll see on the live stream.

We’ll have a moderator here who will collate those questions and then pass them along as we have time to the speaker, so that’s one way that you can engage with the speaker and the session if you can’t attend in person. All right, so we’re gonna do a little, oh, wow, okay, so you guys are already on top of it. So… The first Menti question, again, just to test: which Research Forum series are you most excited about? Methods Masters, I don’t know, there’s probably some sort of data science rigging going on with that by some of the statisticians. But okay, so that’s good. So, we’ll have a couple of those this fall and a few in the spring and can certainly expand as interest grows. So… Looks like it’s working, great. So, with that, I am going to introduce the Beyonce of biostatistics, Michael Pencina. (audience laughs) It’s a new appointment but it’s well deserved. Michael comes to us, he received his Master’s degree in mathematics in Warsaw and then came to Boston University for his PhD in statistics.

Director of Biostatistics here and Professor of Biostatistics and Bioinformatics, has over 250 peer-reviewed publications and really is just super fun to work with, especially for a data scientist. So, with that I will turn it over to him. – Thank you so much, Emily. Thank you to all of you who are here, especially to those who didn’t get lunch and still stayed. I really appreciate it. So, this talk and presentation about data science is much more about all of you here and the work that we’ve all been doing. I think there is a growing realization of the importance of this quantitative field. And through the examples and what I will be presenting, it’s really a great description of the joint work that’s happening at DCRI. So, I see this as a great celebration for all of us of the importance and the cutting edge of the work we are doing. So, a lot of data everywhere, who would’ve thought? Even when I was getting my doctorate in statistics, I went into it because it seemed to fit my math interests and I wanted to do something applied with the mathematics, and statistics seemed applied enough; I had no idea where this field would be in the short years that followed.

So, we have all the data, right? 90% of the data has been created in the last two years. We have all these opportunities from genetics to EHR, to administrative sources, to trials, registries, and open data from industry, that’s ahead of us. And we need to make sense of it and create good research. And here’s one of my favorite quotes. Anything can be done, how do you know that it’s done correctly? And I think the focus on quality is what I envision to be the focus of the future. Right now, everybody is very busy building data repositories. When you think about it, a lot of money, effort, and energy is invested in gathering, harmonizing, getting the data together.

And that’s phenomenal, without it, we wouldn’t be moving forward. But I think less focus, as of yet, is paid to the idea of how to analyze that data in a principled way, following the canons and following the principles of good research. And I think that’s one of the hallmarks of what we do here at DCRI. When I came here four years ago, I was impressed by the quality of the work that’s happening here, the dedication and motivation to get things right.

The fact that we are not afraid to say to an investigator who tells us, well, they’ve done it this way, and we say, it’s still incorrect. Doesn’t matter, we’re working with a sponsor right now and we’re falling behind another group because we don’t trust the quality of the data and we say, we’re sorry, we’re not going to publish results based on data that we don’t believe. So, that’s the trust in the data. The other side is the methodology. The methodological advancement and development which describes the work that’s happening here. A lot of good work can be done with the right methods in place, not just repeating and doing what others have done, but innovating and leading with methodology in the key areas. So, how do we do it? The first is asking the right question and I think that’s a fundamental and essential part of the work and what I see as the DCRI difference. We are not jumping to questions that are already given to us, we’re not searching through data and finding what’s the little incremental innovation that can happen, we really start with the true clinical question underlying the problem at hand.

And it is a collaboration with our clinical leaders, with the operational leaders and the faculty and staff at DCRI. But getting the question right is fundamental. When we were doing a communications workshop, thinking, okay, what’s DCRI stats data science all about? What’s the true difference? It’s really not as much the execution of the work, but it’s the innovation at a different stage, taking the problem and translating it into a set of answerable pieces. So asking the right question is fundamental. We went with a tagline, the decision is in the question. If you get the question right, you will arrive at the right decision. Then after you have the question, you find the right data. And again, we’re changing the paradigm here. The majority of research happens based on the principle, I have this data, what questions can I answer with that? We’re changing it to, I have the question that’s clinically meaningful and relevant, what data is best to answer that question? Where can we pull it? That’s why we partner with so many groups to bring the data to the question and not the other way around.

Then there is the methodology. You have to have the right methodology and not every question will be answerable with the data that’s available. We may not be able to answer the question with the data that we have, or have access to, or the data may not exist. Some things you can answer only with a randomized experiment despite the huge desire to apply observational methods. On the other hand, if you do observational methods correctly, you can come close to the results of trials in certain circumstances. And then there is the impact. It’s the translation of the results, it’s the scientific publication, which again needs to be only the start and not the end of the process. We want to publish the results of our research, that’s part of our academic mission; in our government contracts it’s a given, they expect it. With industry contracts, sometimes they expect it, sometimes we tell them, that’s what we do. And we will publish the results, whether they’re favorable or not favorable to the sponsor. And then we need to implement it in clinical practice. And I think a huge opportunity is emerging right now at Duke with a lot of focus on implementation and using the Duke data, and combining it with the research that’s happening at DCRI and other entities at Duke, to drive true practice change and build a learning health system.

This is built on DCRI history, right? DCRI is portrayed as this three-legged stool: you need each of them for success, the clinical, the statistical, and the operational working together. This is what DCRI founders, like Kerry Lee who is in the audience, worked toward and envisioned going forward with Rob Califf and others. And I think this is coming to the forefront in a magnificent way. And again, this is around asking the right question, working together on a meaningful research project. Then there are all these numerous data sources that are available to us, and a lot of the focus and attention of our groups is not as much to be another data repository group, but to develop expertise working with all these sources so that when a new question comes about, we’re able to say, yes, this is the data that we want to bring into it. So here are several components of data science that we might want to focus on, right? So, you see the content expertise, so, again, the tight partnership with the clinicians and research staff. Then we have methodologies, which need to be supported by rigorous research on how to do the research.

Then there is the analysis, the execution which has to happen. And then there is education that happens on multiple levels. It’s educating through classroom teaching, but it’s also educating the sponsors about the research questions, and educating each other, the statistician educating the clinician and the clinician educating the statistician, and the staff member, and so on. And then there is the implementation that we need to have. And for methodologies, in the area of big data, I envision it really around specialization. I think we have therapeutic areas in medicine, then we have therapeutic areas within cardiology, right? If you’re a heart failure expert, you’re not an expert in another field unless you study, say, prevention. So, people go deeper and deeper. The same is true in statistics and in the development of methods.

For big data, I identified five areas of current critical need. And what I know, for sure, is this falls short of the full list. But the things we have are listed here. And what we are doing, we’re starting programs which lead to the development of that methodology. So people who are interested in these areas are getting together and talking about research methods and novel applications, and they’re also a source for others who can come and ask them questions and engage in research projects.

So, we have our program on comparative effectiveness methodology that Laine is leading; it started last fall and is already very successful, gaining national recognition and known internally, and people are coming to them with questions and research projects. I, myself, you’ll find me out of my depth, saying, yeah, go talk to the program people, they can really help design the study for you. Then Goldstein started a program on methods for electronic health records data, how to use it, what are the pitfalls. And he has a group of, I think, 40 people who are meeting regularly, with twice as many interested across the campus. So, huge interest levels in this area. I’m talking to Larry Carin, Vice Provost on the campus side, and Ricardo and David, our new hires in machine learning, about starting a new program in machine learning, which is storming the biomedical research arena, and tremendous opportunities sit there. And I think we can do very well with the focus on applications in health.

Then pragmatic clinical trials, a lot of you are involved in them. We’ve recently acquired Rishi, who is our expert in cluster randomization. I don’t know how many cluster randomized trials, Rishi, you’re working on right now, but I think it’s going into huge numbers. Kevin Anstrom is leading the pragmatic trials initiatives. How do we do it more efficiently? It’s the design stage, using Bayesian and other methods, but it’s also going through the entire process and asking questions: can we do monitoring that’s based on mathematical principles rather than more labor-intensive effort? And there is predictive modeling, the Center for Predictive Medicine, right? And a lot of the work is really prediction; I kind of laugh that all I know how to do is build prediction models. But a lot of the research is really focused on either what’s the risk, or how do you differentiate the risk benefit based on different treatments? And then arrive at impact.
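As a toy illustration of the kind of risk modeling mentioned here (this is not DCRI code; the predictors, coefficients, and data below are all simulated for the sketch), a logistic model can be fit to patient data and used to produce individual risk estimates:

```python
import numpy as np

# Simulated "patients": age and LDL cholesterol drive a binary outcome.
# All numbers here are invented purely for illustration.
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(60, 10, n)
ldl = rng.normal(130, 30, n)
true_logit = -10 + 0.08 * age + 0.025 * ldl
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

# Fit logistic regression by plain gradient ascent on the log-likelihood,
# with an intercept and standardized predictors.
X = np.column_stack([np.ones(n),
                     (age - age.mean()) / age.std(),
                     (ldl - ldl.mean()) / ldl.std()])
beta = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += 0.5 * (X.T @ (y - p)) / n

risk = 1 / (1 + np.exp(-X @ beta))  # predicted probability per patient
```

The same fitted model that answers "what's the risk?" can be refit within treatment arms to compare predicted benefit across treatments, which is the second question the talk raises.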

Just one example, ARISTOTLE, which I think is a great one. We ran this big, mega trial, analyzed the results, published the results, so that’s one set of things. But that doesn’t drive the implementation and the use of the treatment. The results have shown that apixaban is a really good treatment and it’s fairly safe. So then our teams of CpN, formerly sigma, have led analyses; I think there are 40 papers already published or in the works based on the ARISTOTLE database. Finding all the other pieces and components, showing more and more evidence where the treatment will work, where it may not work, what are the underpinnings that drive that? A little tidbit around implementation.

Our faculty members have authored 20 or more sets of clinical guidelines. That speaks to the involvement and the partnerships. What data science does is then translated, through clinical expertise, into something that really changes everyday practice. So, a few examples from our own work. Here is a project that I’ve done with the Center for Predictive Medicine and with Ann Marie, sponsored externally. The sponsor came to us and said, well, we have a treatment, a novel lipid treatment, and we’re worried: in the statin era, do lipids still matter? People are raising the question, maybe statins have solved it entirely? So, going into asking the right question, we said, well, we could really investigate it, we could look at the pre- and post-statin eras and see what’s the impact. What’s the association of LDL cholesterol, or non-HDL cholesterol, with the outcome of coronary heart disease? And I said, well, I’ve spent 10 years in Framingham before coming here, we can get this data from BioLINCC. The government is nicely sharing it freely, so we can actually do it. And they said, fine, do it. We told them what it would cost, they funded the research, we acquired the data.

It took tremendous effort of our programming and statistical team to make this data usable; that was really at the heart of the project, getting it ready and available. But we did. And then we investigated and found out that, in general, the impact of lipids is still there. The hazard ratio is a little lower but not significantly, the population attributable risk is still there, and we ended up with a research letter published in JAMA as part of it.

And now, we really know that lipids still do matter, so the effort to work on new treatments is worthwhile. A follow-up project to that is a testament to how the world is changing. I worked, how many years ago, 10, maybe more, with colleagues in Framingham on the REACH registry. And the plan was to create a model for secondary prevention of cardiovascular disease.

And we created a model, we sent it to journals for consideration, and the message was, who needs a risk prediction model in secondary prevention? They are all supposed to be on statin treatment anyway, this is not interesting. So, we got published finally but without much fanfare. Well, now, with new treatments available, and people either not responding to statins or their lipid levels not being driven down enough, there is huge interest, ’cause the new treatments are very expensive. You can’t give them to everybody. So, we need to prioritize, and prioritization based on risk, with risk prediction coming in, makes a lot of sense.

So, something that 10 years ago was discarded comes back here with full force and we’re engaged in the project using the pooled cohorts, BioLINCC data, to answer it. Another cutting-edge project, which I’m stealing from Ben, relates to translating clinical trial results to local populations. So, as we all know, the populations enrolled in clinical trials are limited, right? Only a fraction of people on whom the subsequent treatments and therapies are being used are part of clinical trial populations. But there is tremendous interest in the entire population who might be eligible for the treatment. So, interesting methodological advancements take the trial data, explore the correlation structure and the interactions that exist, and then superimpose it on the local population. Think about the Duke EHR, right? You have a PCSK9 inhibitor that’s extremely expensive, and you want to talk to the health system: well, would you be interested in using it if we can come and set up some kind of cost-sharing agreement, and we can show you that it will truly lower the cost to the health system and reduce the events? So, it can be done, the mathematics can be figured out: you can actually take a trial, reweight the trial population, and apply it to the health system under certain conditions; it’s not always possible and won’t always work as nicely as I’m trying to make it look.

But in many instances, it actually can be done, and then you can get a result and say, yes, if you applied ARISTOTLE to the Duke EHR, those are the effects on ischemic events versus bleeding. And that’s the methodology we’re developing as part of one of the sponsored projects. Another of Ben’s projects had to do with predicting no-shows for the clinic. So, we are cooperating with Duke Health Technology Solutions and the health system, trying to help them address problems that they have. And one of the problems they identified is that in their clinics, people are not coming to the appointments. Well, maybe from the patient’s perspective I’m not too concerned about that particular problem, it doesn’t seem to be affecting my health, but when I think about it, well, that’s why I may not get an appointment: because there are people who are not showing up, taking my place. So, it is a relevant problem. So, the first task was, can we use the EHR data that the health system routinely collects to predict who is not going to come for their appointments? And it turned out, a little bit to our surprise, that we can do a fairly good job predicting who will not show up for the appointment.
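To make the operational side concrete, here is a hypothetical sketch (the appointment IDs, risk scores, and function name are all invented, not from the actual project) of how predictions from such a no-show model might be turned into a work list for a live outreach caller with limited capacity:

```python
# Hypothetical predicted no-show probabilities for today's appointments,
# as a fitted model like the one described might produce them.
predicted_no_show = {
    "appt_001": 0.82, "appt_002": 0.15, "appt_003": 0.64,
    "appt_004": 0.07, "appt_005": 0.91, "appt_006": 0.33,
}

def flag_for_outreach(predictions, capacity):
    """Return the appointments a live caller should target first:
    highest predicted no-show risk, limited by caller capacity."""
    ranked = sorted(predictions, key=predictions.get, reverse=True)
    return ranked[:capacity]

call_list = flag_for_outreach(predicted_no_show, capacity=2)
# call_list -> ["appt_005", "appt_001"], the two riskiest appointments
```

Ranking by predicted risk is one plausible way to spend a scarce intervention; whether it helps is exactly what the cluster-randomized trial described next was designed to test.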

It’s people who didn’t show up in the past, it might be people who are small children, there are all sorts of factors that are going into it. But we created a model with a pretty good performance, but we didn’t stop there. The health system said, well, let’s try to do something about it to see if we can implement an intervention and the intervention they tried was, instead of getting the automated reminders, they said, we’re gonna have a live, dedicated person who will be trying to call you, make sure they speak to you, and see if they can ascertain if you’re going to come or not come to the appointment. So, Ben and team designed a cluster randomized trial to test the intervention against the standard reminders within the health system to see what’s going to happen to the appointments. Turns out that the intervention that they came up with is not that effective, so they are thinking about alternative strategies like double booking and other crazy things. But that’s an example of direct implementation of the work to the needs of the health system. So, here is an example from the work from the Center for Comparative Effectiveness and Laine’s work.

I’ll describe it the way she hates me to describe it, which is we said, well, you can’t always conduct a randomized trial. Could we have a crystal ball that will enable us to predict the results of a randomized trial using observational data? Well, for scientific accuracy and fairness, the answer was no. (audience laughs) But under certain circumstances, right? So, that’s where the mathematics comes in. Under a sufficient set of assumptions, you can come up with scenarios where you can come close to that. And there are a number of layers and methodologies that go into it. One has to do with overlap weights, which is a collaboration with Fan Li, who is one of our co-leaders of the center; she is campus faculty with whom we developed a very productive collaboration.

She came up with overlap weights on the theoretical level; we’re now expanding it and applying it to the data we have at the DCRI. What do overlap weights do? I think many of you are familiar with propensity scores, and when you do propensity score weighting, you’re weighting the tails of your distribution more. So it’s the more unique cases, which are usually underrepresented, that get more weight, so you have a more representative view of the general population. Now, what do clinical trials do? The fundamental principle of a clinical trial is the principle of equipoise, right? You’re ethically not allowed to randomize unless you have equipoise, unless you don’t know which treatment works better. So maybe it would make sense to put more weight in the middle, on the groups that are more similar, where there is more equipoise as to how things will turn out. And when you do that, you come up with a method that, based on the preliminary data that I’ve seen, comes much closer to the results of what clinical trials are showing.
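As a toy numerical sketch of that contrast (simulated data, and the propensity score is taken as known here, which it never is in practice): inverse-probability weights explode in the tails of the propensity distribution, while overlap weights, defined as 1 - e(x) for treated patients and e(x) for controls, are bounded and concentrate on the region of equipoise:

```python
import numpy as np

# Toy comparison of IPTW vs overlap weights on simulated data.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(0.0, 1.0, n)                 # a single confounder
e = 1.0 / (1.0 + np.exp(-1.5 * x))          # propensity score (known here)
z = (rng.random(n) < e).astype(float)       # treatment assignment

# Inverse-probability weights: 1/e for treated, 1/(1-e) for controls.
# They blow up wherever one arm is rare (the tails of the propensity).
iptw = z / e + (1.0 - z) / (1.0 - e)

# Overlap weights: 1-e for treated, e for controls. Bounded by 1, and
# largest where e is near 0.5, the region of clinical equipoise.
ow = z * (1.0 - e) + (1.0 - z) * e

print("max IPTW weight:   ", round(float(iptw.max()), 1))
print("max overlap weight:", round(float(ow.max()), 3))
```

The population implicitly targeted by the overlap weights is the one in which treatment assignment was closest to a coin flip, which is one intuition for why estimates under this weighting can track randomized results more closely.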

The other component is that you can achieve that through doing the right matching. You’re probably familiar with the famous hormone replacement therapy conundrum, where all the observational studies led people to believe that it would lead to risk reduction. Then they conducted a randomized clinical trial, and it showed that it actually is harmful for women. If you had applied the right methodology, you could have predicted the trial results quite accurately, being on the correct side of the hazard ratios.

And I think the science would have been different. So that’s a really good example of how correct methodologies can spare us from spurious findings. And of course, it’s not gonna work in all circumstances; you have to have the populations aligned in a certain way for the observational methods to work, and you have to have the right quality of data. And when that happens, when the stars align, you can actually come reasonably close to predicting what the trial would show. So, here is another example of a project we’ve undertaken as part of the health data science center challenge.

I think a lot of people here either are or have been involved in clinical events adjudication. So, the picture on the left: last year, I had the pleasure of working with a very talented cardiology fellow. And we had a really good collaboration, but every now and then, she would get very busy and become nonresponsive to the work that was happening. And it turned out, she was sitting there adjudicating, or reviewing, cases for the CEC. So, we have the picture here (laughing). Neha, you wanna present yourself? (all laughing) So, we said, well, maybe there is a better way. Last fall, you might have heard about the progress that Google Translate has made in machine translation.

Now, if you look at translating a document from English to Chinese, it comes really close to what a human being would have done. So much so that they say the progress in the last two years with machine learning applied to translation has been as much as or more than the entire progress of the field from inception up to that point. When you think about translation and the application of machine learning, well, what is translation? It’s reading a lot of text, learning what it means, and then doing the same in another language. Think about reviewing cases: well, it’s a lot of material that you are reading, and in some ways, you don’t have to translate it into Chinese, you need a simple diagnosis. Is it myocardial infarction or not? So this is an example of a project that lends itself very nicely to a machine learning application. So, instead of having fellows work on that rather than doing exciting research projects with me, we’re gonna train the machines to do the work, right? Amazing things are happening, and Ricardo’s in the audience, we can talk to him more, where we’re reading scans of PDFs and then they are read and translated.
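As a toy sketch of that framing (the case narratives below are fabricated one-liners, and the classifier is a tiny naive-Bayes-style bag-of-words model, far simpler than anything the team would actually deploy), adjudication becomes a binary text classification problem:

```python
import math
from collections import Counter

# Fabricated training "narratives": 1 = myocardial infarction, 0 = not.
train = [
    ("chest pain with troponin elevation and st segment changes", 1),
    ("elevated troponin and ischemic ecg changes after chest pain", 1),
    ("routine follow up visit no acute complaints", 0),
    ("ankle sprain treated with rest and ice", 0),
]

# Count word frequencies per class.
counts = {0: Counter(), 1: Counter()}
for text, label in train:
    counts[label].update(text.split())
vocab = set(counts[0]) | set(counts[1])

def log_score(text, label):
    # Naive Bayes word likelihoods with add-one smoothing.
    total = sum(counts[label].values()) + len(vocab)
    return sum(math.log((counts[label][w] + 1) / total) for w in text.split())

def classify(text):
    return 1 if log_score(text, 1) > log_score(text, 0) else 0

print(classify("troponin elevation with chest pain"))  # → 1 (MI-like)
print(classify("routine visit no complaints"))         # → 0 (non-event)
```

The shape of the problem, a pile of documents in and one label out, is exactly what makes adjudication a natural target for text models.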

And then, we hopefully will have an algorithm, and maybe it will not answer all the cases; we still need the humans and we still need the experts. But there are the boring ones, where it’s very obvious whether it’s an event or not, which I think the machines would be perfectly equipped to take care of for us. So, this last project is part of the larger initiative that is happening at Duke, and we’ve been involved with Larry Carin, who is one of the key leaders spearheading that initiative.

And when Rob Califf came back from the FDA, he graciously agreed to lead that, and the notion here is that we have so much data within the Duke health system that’s not being used for everyday patient care. There are pockets here and there in their projects, but what we want to do is instill a culture of data science across the whole enterprise operation at Duke, within the health system and the school.

And so, there are several layers to how it needs to happen. We have the six pilot projects, of which the CEC project that I outlined is one; it’s actually funded partially through DCRI Innovations. There is another project funded through the DCRI Innovations fund looking at decision support tools for primary care and endocrinologists: looking at people on different treatments for glucose lowering, and then applying a machine learning algorithm to see, based on the history of people like the one who will be presented to the doctor, can we look at their past history and identify the treatment that has the highest probability of bringing the glucose down? And there are several others. But it’s also very much about community building, making team science a reality, getting the clinicians connected with the data scientists, statisticians, project leaders, and research staff. Very much like the model that the DCRI is operating under. The DCRI CPM is an integral part of this initiative, where we have students from campus and the school of medicine quantitative sciences embedded with our staff on the fourth floor, working on these projects.

So there is this fun interaction, and I know it’s popular among many of you, and the students are getting great experience. I think we had 35 students apply for the six slots available, so we’ll have to do it again and again. But they are supporting these projects. So, those are the pilots, but the notion is, hopefully with our CTSA grant being funded and a match from the health system, that this will be a growing initiative that goes forward with the implementation. Then there is the educational component related to it, and the methodological courses that I mentioned earlier are very much part of this initiative, ’cause it needs to be substantiated by strong methods development. And I’ll stop here and welcome questions. (audience clapping) – So, I’ll start off. I guess, tough questions. So, let’s start by first congratulating you for a wonderful talk. I would say that what I was struck by most is that the more things change, the more they stay the same, and partly how things have changed over time. So partly, I think it’s a great example that we start off this new series highlighting methods and statistics as one of the core things of our institution.

It’s a phenomenal message. It goes back to the days of Kerry Lee, almost 40 years ago: the DCRI gets founded on the idea that if we put smart statisticians and clinicians together, they would ask the right questions, and then they developed whole fields. I mean, we don’t remember it now, but logistic regression and Cox modeling and C-statistics and all of that work, that was developed here; those seminal papers were written here. And their seminal applications to medicine were done here, the development of prediction tools and their application to human practice, here. And we often forget that that happened, because that was a long time ago; many of you weren’t even born.

But the reality is, in my lifetime, that all happened and played out. And I think we should honor the fact that we made those huge contributions to medicine and then build on the shoulders of our founders. I think about how different this is: Kerry would probably laugh at the fact that the slides are not only in color and beautiful, but there’s not one formula anywhere, unlike when he gave a talk on methods. Okay… No grief. But it is interesting to see how, in part, things are the same and yet they are very different in certain other ways. I think on that last one, I’ll just pick up off of it to give you your first tough question.

We’ve been fighting this in medicine for a long time, or even in quantitative methods for a long time: the idea that at some point in time, we, the human part of this, become obsolete. In Kerry’s time, it was the idea, could you replace doctors? Because you could do formulas and prediction things, maybe we didn’t need doctors, and they actually had competitions and wrote beautiful papers about the predictive abilities of doctors versus the predictive abilities of quantitative formulas. And that was really a first thing. But now we’re getting back at those guys who were trying to replace us, because rather than replacing the doctors this time, and rather than replacing Neha as a reviewer of records, we’re gonna now replace Michael. (audience laughs) We now have machine learning and these people, we just push a button. Who knows my red button on my desk? You just push the button, easy button, answers come out and– – But Michael comes.

If you replace me… (everyone laughing) – So, the question is, why do we still need statisticians in the era of machine learning? – Well… (audience laughs) – Tough questions, good answers, come on. – What I think we need is a fusion of machine learning and biostatistics. I think both fields are very powerful, and they can achieve a lot on their own. I don’t think you would ask machine learning experts to design a trial for you. Or in the example of our diabetes treatments, we had really sophisticated machine learning algorithms, but we went to Laine and Fan and the team and asked them to help us with the longitudinal matching to put it together. So, I think the real power is in combining these two fields. Yes, you take machine learning algorithms; it turns out that when you do clinical trials translation, it’s not standard epi standardization but a machine learning algorithm that performs the best.

So you can use the best of both fields and really fuse them together. We have a Big Data to Knowledge grant funded where we’re looking into expanding a machine learning algorithm, which worked on binary and count data, to now handle time-to-event variables and continuous predictors. That’s an example where the two fields coming together, combined with the clinical expertise, offer the strongest potential. So I wouldn’t say it’s either/or; I think together we can do much more. – Great. Before I turn it over to the audience for your first questions, I will give a question back to the audience. So, for my lunch… The question is, what is the association between the three speakers who gave you talks today: myself, Michael, and Emily? Now two of the three are rock stars, we know that part, we’ve got that down.

(audience laughs) Anybody? – They’re educators. – Good choice, yeah. You could come up with lots of combinations that kind of fit us. No, the actual answer is, when I became the Director of the DCRI, these were my first two hires. Just for a small world. So, not a bad start. (audience clapping) Okay, questions from the audience. Tough question number one. – So, this may be a simple way of phrasing it, but can machines ask the right questions? – That’s a really difficult question and a very loaded one. (audience laughs) I’m looking at Ricardo in the back; he’s the machine learning expert. Actually, there is a fair amount of disagreement, so much so that within machine learning, there is supervised learning and unsupervised learning. In supervised learning, you get to ask the question. In unsupervised learning, you feed the data to the machine and it finds patterns, answers, questions, everything in one.

I’m probably more on the conservative side of things. I like to keep the ability to ask questions and have some control, rather than seeing things coming out without knowing what’s happening. But I would say, even with that, you would need the content expertise to review what’s coming out and replicate it, since part of science is replicating the experiment. So, in some ways, yes, to the extent that coming up with answers means asking questions, machines can do it. Whether I want to use it or not, I would rather have the human being make that decision. – Okay, a real quick one from somebody who tweeted in, YouTubed… How do you resolve the tension between tried-and-true clinical research analytics and the need to create new methods? How do we know what’s just new and shiny versus an actual advancement? – I think, again, it’s a fusion, and I would say it really depends on what you need to answer.

I think the best research questions for biostatistics, machine learning, any field, come from practical applications. There are a lot of great degree programs where people take an existing methodology and tweak it because the mathematics is just right for the tweak, and they publish it in a journal, where it sits for a hundred years like a good wine, and somebody may or may not discover it and use it in the future. I think a more direct path is taking the questions that we’re facing every day, like the issues we were having in mediation analysis, where there isn’t a method to do a time-dependent risk factor mediation analysis. And we need to answer that question to know which lipid marker is the most important, right? So, it’s a question at hand that needs to be answered. So, I think the answer to that tension is: when you don’t have an existing method for an existing clinical research question, you need to go and develop one, and you need innovation.

If it already exists, don’t make little tweaks just because they’re elegant in some ways when they might be a waste of time; apply the tried and true. – Frank. – So, a follow-up question to that. So, one of the go-to phrases we have for problems in analysis: statisticians, when they wanna solve a problem and the methods aren’t there, try to redo the method. But actually, the trick to data science is, if you actually did a better job extracting or ascertaining the data in the first place, some of the old methods would work just fine. So, how do you get people to focus on the data? Don’t just jump to changing the methodology; actually, fundamentally change the substrate that you’re operating on, because a lot of the methods that exist today in signal detection wouldn’t exist if I had a numerator and a denominator; I could just do it on a calculator.

So, how do you use this data science center to get statisticians and other people to focus on actually fixing the data as opposed to the mathematics? – Right, well, so, I think that’s another big and important part. Rob Califf likes to talk about that: our janitors, the people who will really put the data in a usable format. And I think that part and the methods development are really close.

I think bringing data to the question, part of it is knowing where the data exist. But what Ben is doing in the EHR center, acknowledging its inherent limitations, and understanding what’s happening in the different projects we’ve been doing in CPM with outcomes and so on, is exactly that. We have the informed presence bias that we talk about, right? Who are the young women between ages 20 and 40 who interact with the health system? Well, they either have really serious chronic conditions or they’re pregnant, right? It’s not representative of the general population, and you’re trying to make it so, and then what are the reasons for the data that’s missing, and so on? And so, I think it needs to be constant feedback. I don’t believe that people can clean the data in isolation from the use cases for which it will be needed.

– Final comment? Kerry? – Well. I would like to compliment you, Eric, and Michael on this presentation. This is a forward-looking conference today. And there are so many things happening in this area that it’s all very exciting. I wish I were 50 years younger and… (audience laughs) – Don’t we all. – Could really continue to be an integral part of what’s happening at the DCRI. One thing that I would comment on: there’s a lot of discussion here in this presentation today about data. Finding the right data and so forth, and being able to operate on the data using a variety of exciting new methods that have come out in recent years, all of this machine learning activity, for example.

But I think we should not lose sight of the fact that a vitally important part of the research we do is designing the studies. And you’ve mentioned that, but it really deserves emphasis. Careful study design means that we start out with the questions we want to answer, and then design the studies that will address those questions and enable us, through the quality and quantity of the data generated in those research studies, to definitively answer those questions, and hopefully have a valuable impact on patient care going forward. So, study design and the methods, and the evolving methods for designing studies, cluster randomized trials, for example, have been mentioned here. All of these new things are very important, I think, for us as the DCRI to be on the cutting edge of, not only the analysis of data, but designing studies that will really answer the important questions.

– Well, thank you very much for a good, fitting end to this first conference. We look forward to seeing you next week. Bernard Gersh, right? – So, we have actually– – There we go. – So, we’ll have a visiting professor, Bernard Gersh from Mayo, who will be talking to us about risk stratification in atrial fibrillation, and that will be moderated by our own John Piccini, so I encourage you to get here, and get here early so you can get lunch. And thank you so much for coming. (audience clapping) (upbeat music)
