Please join the eScience Institute Wednesday, November 6, 4:00 pm in *SIEG HALL Room 233*. Refreshments will be provided.
*Clark Gaylord (Virginia Tech Transportation Institute):*
Data-intensive Scientific Workflow and “Big Data” in Transportation Research
The Strategic Highway Research Program (SHRP2) Naturalistic Driving Study
is a cornerstone of transportation safety research, led by the Virginia
Tech Transportation Institute (VTTI). VTTI researchers pioneered the
naturalistic driving study methodology, and earlier VTTI efforts, such as
the National Highway Traffic Safety Administration (NHTSA) "100-Car"
study, have made ground-breaking contributions to the field of
transportation safety. The SHRP2 study observes over 3,000 participants in
their normal day-to-day driving to understand how the driver interacts with
and adapts to the vehicle, traffic environment, roadway characteristics,
traffic control devices, and the environment. The study concludes data
collection in December 2013, resulting in a repository of over 1.5 PB of
heterogeneous data with an expected useful life of more than 20 years.
In this talk, we will discuss the data challenges of naturalistic driving
studies and peta-scale data-intensive science. These data in various ways
satisfy the "volume, velocity, and variety" we often associate with "Big
Data", while at the same time being gathered in a rather "data collection
hostile" environment. This presents some unique challenges, not only of
scale but also of data management and quality. The infrastructure to manage
and analyze these data is as varied as the data themselves, with peta-scale
cluster file systems, parallel databases, and compute clusters. Mr. Gaylord will describe
various aspects of these challenges and how they are addressed, from VTTI's
data center architecture to data models, as well as sharing some "lessons
learned". The design of VTTI's scalable "agent-based" workflow engine will
also be described in some detail.
Mr. Clark Gaylord is the chief information officer for the Virginia Tech
Transportation Institute (VTTI) and director of VTTI's data center
operations. He is the principal architect of VTTI's "Scientific Data
Warehouse", integrating high-performance computing, parallel database, and
peta-scale file system technologies to enable VTTI's data-intensive
scientific research. Since 2008, Mr. Gaylord has led VTTI's strategic
direction for information technology, data center infrastructure, and "Big
Data" management and analysis.
Mr. Gaylord has been at Virginia Tech in various capacities for over twenty
years and has held several IT leadership roles. Prior to joining VTTI,
he was IT Operations Lead with the Virginia Bioinformatics Institute and
Lead Research Engineer with Virginia Tech's Telecommunications Auxiliary.
VTTI was the recipient of CIO Magazine's "CIO 100" award in 2012 for the
effective use of large-scale, data-intensive, and high-performance computing