History of the Initiative
The Data Intensive Computing Initiative (DICI, 2006–2010) focused on creating tools and capabilities to address the data-overload challenge in bioinformatics, energy, and cyber analytics.
PNNL researchers and scientists examined this challenge and developed analytic algorithms, software architectures, and hybrid hardware architectures in support of the initiative's mission: creating a world in which large amounts of data do not impede human understanding. Since their inception, these tools have helped researchers and analysts both inside and outside PNNL advance scientific discovery.
The Challenge: Big Data
Technology advances have made data storage relatively inexpensive and bandwidth abundant, resulting in voluminous datasets from modeling and simulation, high-throughput instruments, and system sensors. Such data stores exist in a diverse range of application domains, including scientific research (e.g., bioinformatics, climate change), national security (e.g., cyber security, ports of entry), environment (e.g., carbon management, subsurface science), and energy (e.g., power grid management). As technology advances, the list grows. The challenge of extracting valuable knowledge from massive datasets is made all the more daunting by multiple data types, numerous sources, and varying scales, not to mention the ultimate goal of achieving it in near real time. To dissect the problem, the science and technology drivers can be grouped into three primary categories:
- Managing the explosion of data
- Extracting knowledge from massive datasets
- Reducing data to facilitate human understanding and response
PNNL worked aggressively to solve this big-data challenge through data intensive computing.
Data Intensive Computing
Data Intensive Computing (DIC) is concerned with capturing, managing, analyzing, and understanding data at volumes and rates that push the frontiers of current technologies. Addressing the demands of ever-growing data volume and complexity requires fundamental advances in software, hardware, and algorithm development. Effective solution technologies must also scale to handle rising data rates while still delivering timely, effective analysis results.