Yale Day of Data Poster Session
September 18, 2015
A Machine Learning Approach to Post-market Surveillance of Medical Devices
Jonathan Bates, Yale University
Post-market surveillance is a collection of processes and activities used by product manufacturers and regulators, such as the U.S. Food and Drug Administration (FDA), to monitor the safety and effectiveness of medical devices once they are available for use “on the market.” These activities are designed to generate information that identifies poorly performing devices and other safety problems, accurately characterizes real-world device performance and clinical outcomes, and facilitates the development of new devices or new uses for existing devices. Typically, a device is monitored by comparing adverse events in the exposed population with those in a matched unexposed population. This research considers the use of machine learning, in particular clustering algorithms, to match patients at baseline within a framework for the post-market surveillance of medical devices.
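One way to picture clustering-based baseline matching is sketched below. This is a minimal illustration under stated assumptions, not the author's method: it assumes each patient is a vector of numeric baseline covariates, clusters all patients with plain k-means (farthest-first initialisation), and then greedily pairs each exposed patient with the nearest unused unexposed patient in the same cluster. The function names `kmeans_labels` and `match_within_clusters` are hypothetical.

```python
import math

def kmeans_labels(points, k, n_iter=50):
    """Lloyd's k-means on lists of floats; returns one cluster label per point."""
    # Farthest-first initialisation: start from the first point, then repeatedly
    # add the point farthest from all centers chosen so far.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def match_within_clusters(baseline, exposed, k):
    """Pair each exposed patient with the nearest unused unexposed patient
    in the same baseline cluster (greedy 1:1 matching without replacement)."""
    labels = kmeans_labels(baseline, k)
    available = {j for j, is_exp in enumerate(exposed) if not is_exp}
    pairs = {}
    for i, is_exp in enumerate(exposed):
        if not is_exp:
            continue
        candidates = [j for j in available if labels[j] == labels[i]]
        if candidates:  # exposed patients with no same-cluster control stay unmatched
            best = min(candidates, key=lambda j: math.dist(baseline[i], baseline[j]))
            pairs[i] = best
            available.remove(best)
    return pairs
```

Restricting candidates to the same cluster means adverse-event rates are compared only between patients with similar baseline profiles; the choice of k and of covariates would drive how strict that matching is.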
A Nonlinear Filter for Markov Chains and its Effect on Diffusion Maps
Stefan Steinerberger, Yale University
Diffusion maps are a modern mathematical tool for finding structure in large data sets. We present a new filtering technique that, assuming errors in the data are intrinsically random, isolates and filters those errors and thereby boosts the efficiency of diffusion maps. Applications include data sets from medicine (the Cleveland Heart Disease data set and the Wisconsin Breast Cancer data set) and engineering (the Ionosphere data set).
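For readers unfamiliar with diffusion maps, the standard construction is sketched below. The abstract does not describe the nonlinear filter itself, so this shows only the baseline diffusion-map step that such a filter would act on: a Gaussian kernel with a user-chosen bandwidth `eps` defines a Markov chain, and its leading nontrivial eigenvectors, scaled by their eigenvalues raised to the diffusion time `t`, give the embedding. The function name and parameters are illustrative assumptions.

```python
import numpy as np

def diffusion_map(X, eps, n_components=2, t=1):
    """Standard diffusion-map embedding of the rows of X (no filtering)."""
    # Pairwise squared Euclidean distances between rows of X.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / eps)                  # Gaussian affinity kernel
    d = K.sum(axis=1)                      # degree of each point
    # S = D^{-1/2} K D^{-1/2} is symmetric and has the same eigenvalues
    # as the Markov matrix P = D^{-1} K, so eigh can be used.
    S = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]        # sort eigenpairs in descending order
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(d)[:, None]      # right eigenvectors of P
    # Drop the trivial constant eigenvector (eigenvalue 1), scale by lambda^t.
    return psi[:, 1:n_components + 1] * evals[1:n_components + 1] ** t
```

On data with two well-separated groups, the first diffusion coordinate takes opposite signs on the two groups, which is the kind of structure that random errors in the data can blur.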
Crowdsourcing Global Wastewater Data
Don Mosteller, Environmental Performance Index, Yale Center for Environmental Law & Policy
No time to waste: Crowdsourcing global wastewater treatment data
Worldwide, over 80 percent of wastewater is discharged into water bodies without undergoing treatment, severely impairing human well-being and ecosystem vitality along the way. National performance on wastewater treatment is difficult to quantify and poorly understood because of a lack of common definitions, poor data collection standards, and limited historical data. To address this, the Yale Environmental Performance Index (EPI), a research group that produces a biennial ranking of country-level environmental performance, developed a first-of-its-kind national wastewater treatment indicator.[1] The indicator assesses wastewater treatment performance for 183 countries, but data gaps and quality issues remain. The Yale EPI seeks to refine and improve its database through a first-of-its-kind effort to crowdsource updates and feedback using an interactive map of wastewater treatment performance.[2] The crowdsourcing effort is targeted at water experts and decision-makers around the world and aims to:
The Yale EPI’s wastewater treatment indicator can help improve understanding of the topic and refine the signal sent to policymakers about proper management.
Keywords: wastewater, crowdsourcing, map, treatment, ecosystem, indicator, EPI, national, global, database
[1] Malik, Omar A., et al. 2015. “A global indicator of wastewater treatment to inform the Sustainable Development Goals (SDGs).” Environmental Science & Policy 48: 172-185.
[2] Torres Quintanilla, Diego, Peter Hirsch, and Samuel Cohen. Wastewater Treatment Map. Environmental Performance Index, Yale Center for Environmental Law & Policy, 6 July 2015. Web. 26 Aug. 2015.
Stephanie M. Noble, Yale University
OBJECTIVE: Neurosurgery is potentially curative in chronic epilepsy but can only be offered to patients if the surgical risk to language is known. Clinical functional magnetic resonance imaging (fMRI) is an ideal noninvasive method for localizing language cortex but has yet to be validated for this purpose. We recently presented a novel method for localizing language cortex; here we present a preliminary evaluation of its validity. We hypothesized that language regions identified using this method would demonstrate stronger functional connectivity than a randomly generated set of proximal networks.
METHOD: fMRI data were collected from sixteen temporal lobe patients (12 left) being evaluated for epilepsy surgery at UCLA (mean age 38.9 years [SD 11.4]; 6 female; by Wada testing, 14 left language dominant, 1 right, 1 mixed). Language maps were generated using a recently standardized method that relies on a conjunction of language tasks (e.g., visual object naming, auditory naming, reading) to identify known language regions (Broca’s area; inferior and superior Wernicke’s areas; angular gyrus; Basal Temporal Language Area; Exner’s area; and Supplementary Speech Area). With activations defined as network nodes, mean network connectivity was compared via permutation tests against two alternatives: (i) fully random and (ii) proximal random networks. Mean network connectivity was measured in independently acquired motor fMRI datasets (9 foot, 16 hand, 14 tongue; 39 in total).
FINDINGS: 77% (30/39) of clinician-derived language networks exhibited mean connectivity greater than fully random networks (p < 0.05). Similarly, 69% (27/39) exhibited mean connectivity greater than proximal random networks (p < 0.05). Further analysis of networks that did not pass the permutation test suggests that their low connectivity may be driven not by uniformly low connectivity across all nodes but by individual nodes that may not actually belong to the network.
CONCLUSIONS: This study provides preliminary evidence for the validity of a novel, clinician-based approach to mapping language cortex before surgery. It complements our recent work showing that this method is reliable, and it supports a proposed study comparing fMRI language maps produced with this technique against the results of direct stimulation mapping.
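The permutation test described in METHOD can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a precomputed node-by-node connectivity matrix `conn` (for example, correlations between regional time series), defines mean network connectivity as the average pairwise value among a network's nodes, and implements only the fully random null by drawing same-size node sets uniformly. The function names and the add-one p-value correction are illustrative choices. A proximal random null would instead restrict sampling to nodes spatially near the original ones, which requires node coordinates not shown here.

```python
import itertools
import random

def mean_connectivity(conn, nodes):
    """Average pairwise connectivity among a set of node indices (needs >= 2 nodes)."""
    pairs = list(itertools.combinations(nodes, 2))
    if not pairs:
        raise ValueError("a network needs at least two nodes")
    return sum(conn[i][j] for i, j in pairs) / len(pairs)

def permutation_p_value(conn, network, n_perm=1000, seed=0):
    """One-sided p-value: is the network's mean connectivity higher than that
    of random node sets of the same size? (fully random null only)"""
    rng = random.Random(seed)
    observed = mean_connectivity(conn, network)
    all_nodes = range(len(conn))
    hits = 0
    for _ in range(n_perm):
        null_nodes = rng.sample(all_nodes, len(network))
        if mean_connectivity(conn, null_nodes) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

A network "passes" at p < 0.05 when fewer than about 5% of random node sets are at least as strongly connected; averaging over node pairs also makes it easy to see how a single weakly connected node can pull a network below that threshold.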
K-mer Analysis on Developmental and Housekeeping Enhancer Peaks
Yunsi Yang, Yale University
The regulation of gene expression involves interaction between transcriptional enhancers and core promoters. However, what separates developmental from housekeeping gene regulation remains unknown. Here, we present a method to detect whether different core promoters exhibit specificity for certain enhancers in massively parallel enhancer-detection assays. We use k-mers of various lengths (3 to 8 bp) as sequence features and compare k-mer frequencies between developmental and housekeeping enhancers. This method shows promoter specificity of enhancers in D. melanogaster.
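Counting and comparing k-mer frequencies between two enhancer sets can be sketched as below. This is a minimal illustration, not the author's method: the log2 frequency ratio with a small pseudocount is an assumed comparison statistic, k-mers containing `N` are skipped, reverse complements are not merged, and no significance testing is shown. The function names are hypothetical.

```python
import math
from collections import Counter

def kmer_frequencies(seqs, k):
    """Relative frequency of each length-k substring across a set of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if "N" not in kmer:            # skip windows with ambiguous bases
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: c / total for kmer, c in counts.items()}

def log2_enrichment(freq_a, freq_b, pseudo=1e-6):
    """log2(freq in set A / freq in set B) for every k-mer seen in either set."""
    kmers = set(freq_a) | set(freq_b)
    return {km: math.log2((freq_a.get(km, 0.0) + pseudo) / (freq_b.get(km, 0.0) + pseudo))
            for km in kmers}
```

Running this for each k from 3 to 8 on developmental versus housekeeping enhancer sequences gives, per k, a ranking of k-mers enriched in one class relative to the other.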
Network Analysis of the Sardex Community Currency
Georgios Iosifidis, Department of Electrical Engineering and YINS, Yale University
We present a transaction dataset and preliminary analysis results for Sardex, a complementary currency (CC) circulating in Sardinia, Italy. Sardex is currently considered one of the most successful CCs in Europe: it continues to grow in transaction volume and membership, and it has already been replicated in 8 other Italian regions. We model Sardex as a transaction network and study its basic properties.
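Modeling transactions as a network and computing a few basic properties might look like the sketch below. The abstract does not describe the dataset schema or which properties were studied, so the `(payer, payee, amount)` record format and the chosen statistics (size, density, reciprocity, in/out strength) are assumptions for illustration; the function names are hypothetical.

```python
from collections import defaultdict

def build_network(transactions):
    """Aggregate (payer, payee, amount) records into a directed weighted graph."""
    weights = defaultdict(float)           # (payer, payee) -> total amount exchanged
    for payer, payee, amount in transactions:
        weights[(payer, payee)] += amount
    return dict(weights)

def basic_properties(weights):
    """Size, density, reciprocity and node strengths of a directed weighted graph."""
    nodes = {u for edge in weights for u in edge}
    n, m = len(nodes), len(weights)
    out_strength = defaultdict(float)      # total amount paid by each member
    in_strength = defaultdict(float)       # total amount received by each member
    for (u, v), w in weights.items():
        out_strength[u] += w
        in_strength[v] += w
    density = m / (n * (n - 1)) if n > 1 else 0.0
    # Fraction of directed edges whose reverse edge also exists.
    reciprocity = sum(1 for (u, v) in weights if (v, u) in weights) / m if m else 0.0
    return {"nodes": n, "edges": m, "density": density, "reciprocity": reciprocity,
            "out_strength": dict(out_strength), "in_strength": dict(in_strength)}
```

Reciprocity is a natural first statistic for a mutual-credit currency, since it shows how often members both buy from and sell to each other.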
Using data to guide strategy: enhancing donor engagement at Yale University
Deepti Pradhan, Yale University
With an increasing number of avenues for philanthropy available to charitably inclined citizens, university development offices are looking for new ways to identify and engage donors for consistent giving. To establish proof of principle for a new approach, we analyzed large amounts of giving data captured by various entities at Yale. We will present the development of predictive models for two types of giving to Yale. One model estimates the likelihood of donating to Yale through selected types of charitable contributions, including charitable gift annuities; a second model estimates alumni participation in 50th reunion gift campaigns. Identifying, preparing, curating, and analyzing the data for these models required input and collaboration from multiple groups across the University. The results illustrate the complexities of incorporating statistical analysis into giving pathways that have traditionally relied on personal connections to identify and engage alumni and affiliates. Yale Development’s predictive analysis efforts differ significantly from the big data analyses undertaken in typical research projects in pharma and other sectors, yet share the common goal of informing future strategy. Our analyses will shed light on current trends in higher education fundraising; the scope of information collected and maintained by Yale’s Office of Development; how that data is used and protected; and some of the characteristics unique to Yale’s best fundraising prospects.