U.S. Department of Energy

Pacific Northwest National Laboratory



The primary challenge facing Data Intensive Computing (DIC) research is rapidly extracting valuable knowledge from massive datasets. To address this challenge, PNNL has purchased a Netezza TwinFin Data Warehouse Appliance. Adding the Netezza TwinFin to the DIC instrumentation portfolio gives researchers and collaborators the ability to run complex queries on large datasets in a parallel database, resulting in significant performance improvement. With the Netezza at PNNL, researchers can run these queries at more than two orders of magnitude beyond what was previously possible. The goal is to further scientific discovery and near-real-time knowledge extraction for predicting outcomes, identifying trends, and enabling time-critical decision-making.

The Netezza TwinFin is a purpose-built, standards-based data appliance that architecturally integrates database, server, and storage into a single, easy-to-manage system. “This appliance allows us to ask complex questions about our data that would be very difficult in a typical relational database environment,” says PNNL scientist Bryan Olsen. The Netezza TwinFin can be applied to the suite of national problems that PNNL researchers are addressing in energy, national security, and the environment.

We have applied this database technology to the following domains:

Cyber Analytics

  • Traffic Circle – Implemented a version of the network flow visualization software that uses the Netezza database. This implementation allows users to visualize and interact with larger datasets without long wait times between queries. The database improves the overall analytical process by allowing more interactive, ad hoc analysis of large datasets.
  • CLIQUE – Configured this application to use Netezza to supply the data for the user interface. Using Netezza provides much faster load times with much larger datasets. The application allows analysts to identify behavioral anomalies based on group historical behavior.
  • TRIAD Census – Implemented a JDBC database connection on the Cray XMT, allowing researchers to perform complex triadic analysis on network traffic in a flexible, dynamic fashion. Researchers are able to perform more ad hoc analysis on the data when it is stored in a database. A high-performance parallel database is a good complement to the Cray XMT.
  • OLAP – Implemented an Online Analytical Processing (OLAP) data cube using Netezza and a network traffic dataset. OLAP cubes provide a summary view of an entire dataset with the ability to interact and drill down to details within the data. Cubes provide a dynamic ad hoc analytic environment.
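
The roll-up and drill-down behavior described in the OLAP item can be illustrated with a minimal sketch. It is not the PNNL implementation: it assumes a hypothetical `flows` table with made-up columns and sample rows, and it uses Python's built-in SQLite in place of Netezza so that it runs anywhere. A summary query aggregates over one dimension, and a drill-down query filters on one value of that dimension and aggregates over a finer one.

```python
import sqlite3

# Hypothetical network flow records: (src_ip, dst_port, protocol, bytes).
# The schema and values are illustrative assumptions, not PNNL data.
FLOWS = [
    ("10.0.0.1", 80, "tcp", 1200),
    ("10.0.0.1", 443, "tcp", 800),
    ("10.0.0.2", 53, "udp", 100),
    ("10.0.0.2", 53, "udp", 150),
    ("10.0.0.3", 443, "tcp", 500),
]


def build_db(rows):
    """Load the sample rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flows "
        "(src_ip TEXT, dst_port INTEGER, protocol TEXT, bytes INTEGER)"
    )
    conn.executemany("INSERT INTO flows VALUES (?, ?, ?, ?)", rows)
    return conn


def roll_up(conn):
    """Summary view: total bytes per protocol across the whole dataset."""
    cur = conn.execute(
        "SELECT protocol, SUM(bytes) FROM flows "
        "GROUP BY protocol ORDER BY protocol"
    )
    return cur.fetchall()


def drill_down(conn, protocol):
    """Detail view: total bytes per destination port within one protocol."""
    cur = conn.execute(
        "SELECT dst_port, SUM(bytes) FROM flows "
        "WHERE protocol = ? GROUP BY dst_port ORDER BY dst_port",
        (protocol,),
    )
    return cur.fetchall()


conn = build_db(FLOWS)
print(roll_up(conn))            # [('tcp', 2500), ('udp', 250)]
print(drill_down(conn, "tcp"))  # [(80, 1200), (443, 1300)]
```

A production cube would typically precompute such aggregates across many dimensions; the sketch only shows the two query shapes an analyst moves between.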

Power Grid

  • Compiled a custom R algorithm on the Netezza. This allows researchers to apply their algorithms to datasets in a flexible, ad hoc fashion. Compiling the R code in the database allows it to execute in parallel, which improves performance. Netezza's R language support lets researchers write their own algorithms in a familiar environment, with the added benefit of compiling and running them in a parallel database environment.

The Netezza TwinFin data warehouse appliance can be applied to the suite of national problems that PNNL researchers are addressing in bioinformatics, energy, national security, and the environment.

Article Title: Netezza Database Advances the Speed of PNNL Scientific Discovery

Article Added: 2010/08/18

Category(s): Hybrid Architectures, Cyber Security

Last Update: 13 July 2011 | Pacific Northwest National Laboratory