The Initiative

The Data Intensive Computing Initiative (DICI), led by Deb Gracio from 2006 to 2010, focused on creating tools and capabilities to address the data overload challenge in the fields of bioinformatics, energy, and cyber analytics. Through analytic algorithms, software architectures, and hybrid hardware architectures, DICI delivered more than ten new tools and capabilities that feed into the mission of creating a world in which large amounts of data do not impede human understanding. Since these tools were built, researchers and analysts both inside and outside the laboratory have been able to move toward better scientific discoveries.

History of the Initiative

From 2006 to 2010, PNNL researchers and scientists examined the data overload challenge in bioinformatics, energy, and cyber analytics and responded with analytic algorithms, software architectures, and hybrid hardware architectures. Since the inception of these tools, researchers and analysts both inside and outside PNNL have been able to move toward better scientific discoveries.

The Challenge: Big Data

The big-data challenge: transform terabytes and petabytes of streaming data into information that enables vital discoveries and timely decisions.

Technology advances have made data storage relatively inexpensive and bandwidth abundant, resulting in voluminous datasets from modeling and simulation, high-throughput instruments, and system sensors. Such data stores exist in a diverse range of application domains, including scientific research (e.g., bioinformatics, climate change), national security (e.g., cyber security, ports of entry), environment (e.g., carbon management, subsurface science), and energy (e.g., power grid management). As technology advances, the list grows. The challenge of extracting valuable knowledge from massive datasets is made all the more daunting by multiple types of data, numerous sources, and various scales, not to mention the ultimate goal of achieving it in near real time. To dissect the problem, the science and technology drivers can be grouped into three primary categories:

  1. Managing the explosion of data
  2. Extracting knowledge from massive datasets
  3. Reducing data to facilitate human understanding and response
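
A toy sketch may help make the third driver concrete. The Python snippet below is illustrative only, not DICI software; the simulated sensor stream and the particular statistics it keeps are assumptions. It reduces a high-volume stream to a handful of summary figures in a single pass, so the raw data never needs to be stored for a human to act on it.

 import math
 import random

 def sensor_stream(n):
     """Stand-in for a high-rate data source emitting n readings."""
     for _ in range(n):
         yield random.gauss(20.0, 5.0)

 def streaming_summary(readings):
     """One-pass (Welford) summary: the raw stream is never stored,
     only a fixed-size running state, however large the input grows."""
     count, mean, m2 = 0, 0.0, 0.0
     lo, hi = math.inf, -math.inf
     for x in readings:
         count += 1
         delta = x - mean
         mean += delta / count
         m2 += delta * (x - mean)
         lo, hi = min(lo, x), max(hi, x)
     std = math.sqrt(m2 / count) if count else 0.0
     return {"count": count, "mean": mean, "std": std, "min": lo, "max": hi}

 if __name__ == "__main__":
     # A million readings reduced to five numbers an analyst can act on.
     print(streaming_summary(sensor_stream(1_000_000)))

Because the running state is fixed-size, the same loop works whether the stream holds a thousand readings or a billion.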

Transformational Solution

Aggressive work to solve this big-data challenge through data intensive computing.

Data Intensive Computing

Data Intensive Computing (DIC) is concerned with capturing, managing, analyzing, and understanding data at volumes and rates that push the frontiers of current technologies. Addressing the demands of ever-growing data volume and complexity requires epochal advances in software, hardware, and algorithm development. Effective solution technologies must scale to handle amplified data rates while still delivering timely, effective analysis results.
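
As a hedged illustration of that scaling requirement, the sketch below partitions a synthetic dataset and analyzes the partitions in parallel before merging the partial results. The event-log layout, chunk size, and worker pool are assumptions for exposition, not a description of DICI technology.

 from collections import Counter
 from multiprocessing import Pool

 def analyze_chunk(lines):
     """Map step: per-partition analysis (here, counting event types)."""
     counts = Counter()
     for line in lines:
         counts[line.split(",", 1)[0]] += 1
     return counts

 def merge(partials):
     """Reduce step: combine the partial results into one answer."""
     total = Counter()
     for part in partials:
         total.update(part)
     return total

 def chunked(seq, size):
     """Split the work into fixed-size partitions for the worker pool."""
     for i in range(0, len(seq), size):
         yield seq[i:i + size]

 if __name__ == "__main__":
     # Synthetic stand-in for a large event log; a real pipeline would
     # stream partitions from disk or the network instead.
     lines = [("login" if i % 3 else "error") + ",host%d" % (i % 7)
              for i in range(100_000)]
     with Pool() as pool:
         partials = pool.map(analyze_chunk, chunked(lines, 10_000))
     print(merge(partials).most_common(3))

The split/analyze/merge shape is what lets an analysis keep pace with growing data rates: adding workers raises throughput without changing the per-partition logic.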
