eScience Seminar with Clark Gaylord (VTTI); Wednesday, November 6th, 4:00 PM, Sieg Hall, Room 233

Please join the eScience Institute on Wednesday, November 6, at 4:00 PM in *SIEG HALL Room 233*. Refreshments will be provided.

*Clark Gaylord (Virginia Tech Transportation Institute):*

Data-intensive Scientific Workflow and “Big Data” in Transportation Research

The Strategic Highway Research Program (SHRP2) Naturalistic Driving Study is a cornerstone of transportation safety research, led by the Virginia Tech Transportation Institute (VTTI). VTTI researchers pioneered the naturalistic driving study methodology, and previous VTTI efforts, such as the National Highway Traffic Safety Administration (NHTSA) "100 Car" study, have made ground-breaking contributions to the field of transportation safety. The SHRP2 study observes over 3,000 participants in their normal day-to-day driving to understand how the driver interacts with and adapts to the vehicle, traffic environment, roadway characteristics, traffic control devices, and the environment. The study concludes data collection in December 2013, resulting in a repository of over 1.5 PB of heterogeneous data, with an expected useful life of over 20 years.

In this talk, we will discuss the data challenges of naturalistic driving studies and peta-scale data-intensive science. These data in various ways satisfy the "volume, velocity, and variety" we often associate with "Big Data", while at the same time being gathered in a rather "data collection hostile" environment. This presents some unique challenges not only of scale but also of data management and quality. The infrastructure to manage and analyze these data is as varied as the data, with peta-scale cluster file systems, parallel databases, and compute clusters. Mr. Gaylord will describe various aspects of these challenges and how they are addressed, from VTTI's data center architecture to data models, and will share some "lessons learned". The design of VTTI's scalable "agent-based" workflow engine will also be described in some detail.


Mr. Clark Gaylord is the chief information officer for the Virginia Tech Transportation Institute (VTTI) and director of VTTI's data center operations. He is the principal architect of VTTI's "Scientific Data Warehouse", integrating high-performance computing, parallel database, and peta-scale file system technologies to enable VTTI's data-intensive scientific research. Since 2008, Mr. Gaylord has led VTTI's strategic direction for information technology, data center infrastructure, and "Big Data" data management and analysis.

Mr. Gaylord has been at Virginia Tech in various capacities for over twenty years and has held several roles of IT leadership. Prior to joining VTTI, he was IT Operations Lead with the Virginia Bioinformatics Institute and Lead Research Engineer with Virginia Tech's Telecommunications Auxiliary.

VTTI was the recipient of CIO Magazine's "CIO 100" award in 2012 for the effective use of large-scale data-intensive and high-performance computing infrastructure.