Data Science in Action

Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. Data is perhaps the most important asset of any organization, as it allows business leaders and officials to make decisions based on facts, statistics and trends. As a result of the steep growth in the scope of data for businesses and government, data science has become more important than ever, and well-trained data scientists are in high demand in the job market.

The Data Science program at Marshall University prepares students for the job market with practical learning and hands-on instruction. Both undergraduate and graduate students in the program are given the opportunity to work on projects and perform research, in conjunction with faculty, on real-world data science problems. These projects give students technical expertise in computational modeling, data collection and integration, data storage and retrieval, data processing, analytical techniques and visualization.

Dr. Haroon Malik, an Assistant Professor in the Department of Computer Science and Electrical Engineering as well as a Distinguished Artists and Scholars Awardee, is an expert in the performance analytics of Ultra-Large-Scale Systems (ULSS). Working with industrial researchers at BlackBerry, Google, Amazon and Facebook, he has analyzed terabytes of high-velocity data generated at run time by large software projects for performance analysis. Techniques and tools developed by Dr. Malik have been used on a daily basis to help ULSS practitioners detect performance deviations among load tests. He has published more than 70 highly cited papers and received numerous grants worth more than $600,000, including a recent grant from NASA.
He has also worked closely with students who have received various scholarships and published papers, including one recent student who received the “Best Paper Award” at the International Conference on Emerging Data and Industry 4.0. Dr. Malik is currently working with data science students researching intelligent transportation systems and the other projects described below:

Mining YouTube Data. The videos on YouTube have become a treasure trove of data. However, accessing YouTube's massive volume of data is a challenge. Tian Zeifeng researches YouTube analytics to find a methodology for systematically and continuously mining and storing the metadata of over a billion YouTube videos. Please find more at: https://mds.marshall.edu/etd/1129/

Flood Detection Using a Wireless Internet of Things (IoT) Network. Funded by a NASA Undergraduate Research Grant, a team of three students, Patrick Shinn, Brandon Duke, and Christopher Roach, implemented an inexpensive yet effective flood detection system using IoT devices. The system provides real-time data on the current state of a rising body of water and serves as an early warning system for flooding to save lives and property. It supports six sensor nodes per base station and two types of sensor node, and it provides a web application for viewing live data.

Mining Safety Analysis. Coal mining carries inherent risk. In West Virginia, these risks have produced countless accidents resulting in lost lives. A master’s thesis student, Olivia Milam, has been awarded a NASA graduate fellowship for her work on MSHA’s big data, which aims to uncover new patterns and rules regarding mine safety and to understand the relationships behind patterns of violations. Her work tests whether previously recorded violations in a given mine mean that the mine will be more or less prone to serious accidents in the near future.

What do people complain about in drone apps?
Kanimozhi Kalaichelvan, a master’s thesis student, conducted a large-scale empirical study of reviews of UAV apps available on the Google Play Store. The study covered 1,825 UAV mobile apps across twenty-five categories, with 162,250 reviews. Please find more details at https://mds.marshall.edu/etd/1262/

Dr. Sanghoon Lee, an Assistant Professor in the Department of Computer Science and Electrical Engineering, was a Brain & Behavior fellow at the Neuroscience Institute and a Second Century Initiative Presidential Fellow at Georgia State University. Dr. Lee was also recently highlighted as an Arctic Code Vault Contributor on GitHub and has served on an NSF grant panel. He is currently a member of the editorial board of the Computational Biology and Bioinformatics journal as well as of the reviewer boards of the Machine Learning and Knowledge Extraction journal and the Journal of Imaging. Dr. Lee is a member of the American Association for Cancer Research and the Digital Pathology Association, a TPC member of ACM Research in Adaptive and Convergent Systems and a TPC member of the IEEE Annual Computing and Communication Workshop and Conference. He has served as a reviewer for numerous conferences and journals, such as the Journal of Imaging, Symmetry, Nature Scientific Reports, ACM, and IEEE.

Dr. Lee’s primary research interest is in analyzing quantitative data to find patterns or significant evidence in the real world. He has worked with students researching and investigating different types of data, such as text, image, and video data, in a multidisciplinary sphere that includes computer science and biomedical informatics. Dr. Lee is currently working with undergraduate and graduate students researching a robust quantitative method to see how tumor cells interact with the tumor microenvironment, how the interaction may lead to tumor progression and whether there are techniques that can be exploited as a strategy for cancer therapy. As part of the research, Dr. Lee and his students in the Data Science program are developing a machine learning software tool to predict cancer regions in whole slide images and investigating how quantitative phenotypic information from digital pathology images can contribute to the field of biomedical informatics with large and complex datasets.