U.S. Department of Energy

Pacific Northwest National Laboratory





More than ever, technological advancements are producing massive amounts of data from high-throughput instrumentation, sophisticated system sensors, and modeling and simulation programs. This deluge of complex, high-volume data burdens scientists and analysts working to make sense of the information and its relationship to intricate problems. Building analytical software systems that can process this data in a timely fashion presents challenges in many ways, including:

  * Capturing and integrating high-throughput data from its source.
  * Integrating multiple algorithms for fusing and analyzing data in real time.
  * Managing diverse data formats and distributed data sources.
  * Integrating distributed, heterogeneous software and hardware systems into a single application.

In our data-intensive research program, scientists at Pacific Northwest National Laboratory (PNNL) are working to create new technologies to solve these challenges. At the core of these emerging technologies is the Middleware for Data-Intensive Computing (MeDICi) Integration Framework, an integration middleware platform designed to solve data analysis and processing needs of scientists across many domains, and in a fashion that is scalable, easily modified, and robust to multiple languages, protocols, and hardware platforms.


MeDICi is designed for building complex, high-performance analytical applications, typically comprising a pipeline of software components. Each component in the pipeline performs some analysis on incoming data and transfers the results to the next step(s) in the pipeline. The MeDICi framework enables software codes written in any language to be wrapped as MeDICi components, which can then be plugged together using the core framework to create applications. The framework automatically takes care of difficult tasks such as multi-threading, message buffering, distributed communications, and load balancing. This makes it simple to integrate separate codes, which were not designed to work together, into complex applications that operate as a data analysis pipeline.

MeDICi was designed to address many of the difficult aspects of building analytical applications, namely:

  * Pipeline creation: MeDICi makes it easy to transfer data as it moves from one application to another, turning a set of distributed heterogeneous components into an integrated pipeline.
  * Handling large data: MeDICi offers features that give pipeline designers choices on how to pass data through pipelines to maximize the performance of the applications.
  * Component libraries: MeDICi enables analytical codes written in any language and running on any platform to be plugged into a MeDICi pipeline by writing a few lines of code, without changing the analysis code itself. The codes, in fact, are oblivious to the MeDICi framework, making it easy to combine existing components in an application.
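The pipeline model described above can be illustrated with a minimal Python sketch. This is not the MeDICi API; the names `run_stage` and `run_pipeline` are hypothetical, and the sketch only assumes the general idea that plain analysis functions, unaware of any framework, are chained by a core that handles threading and message buffering for them.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the data stream


def run_stage(func, inbox, outbox):
    """Apply func to each message from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(stages, inputs):
    """Chain stages with bounded queues, one worker thread per stage."""
    # Bounded queues act as message buffers and apply back-pressure.
    queues = [queue.Queue(maxsize=8) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]
    for worker in workers:
        worker.start()

    def feed():
        for item in inputs:
            queues[0].put(item)
        queues[0].put(_STOP)

    # Feed from a separate thread so the caller can drain results concurrently.
    feeder = threading.Thread(target=feed)
    feeder.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        results.append(item)

    feeder.join()
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    # Two existing functions that know nothing about the pipeline:
    # split text into tokens, then count the tokens.
    print(run_pipeline([str.split, len], ["a b c", "d e"]))
```

The stage functions here are ordinary callables, which mirrors the point that wrapped codes stay oblivious to the framework. Distributed communication and load balancing, which MeDICi is said to handle, are outside the scope of this sketch.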


MeDICi provides a faster, more efficient, and less expensive way to build analytical pipelines. The platform provides a scalable, flexible, and extensible development and run-time framework for ease of use in many application domains.


MeDICi is being applied to a variety of research projects at PNNL, including:

  * Bioinformatics Resource Manager: Data-intensive pipelines that analyze large biological data sets are executed and managed on a 32-node cluster using MeDICi.
  * Power Grid Failure Analysis: A high-performance MeDICi pipeline sends data from a Cray multithreaded supercomputer to a conventional supercomputer cluster for simulation. The results, on the order of hundreds of megabytes, are returned by the pipeline to the Cray for detailed analysis.
  * Text Analysis: MeDICi integrates a set of components that perform advanced semantic text analysis into a high-throughput processing pipeline.

MeDICi Features:

  1. Simple: The MeDICi framework automatically handles many of the complex architectural issues that must be addressed when building high-performance software pipelines.
  2. Robust: MeDICi is built on proven standards-based integration, workflow, and provenance technologies.
  3. Flexible: MeDICi supports multiple languages, communication protocols, and hardware platforms.
  4. Efficient: MeDICi improves performance by passing large data by reference.
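
Passing large data by reference can be sketched as follows. This is an illustration of the general technique only, not MeDICi's implementation: it assumes producer and consumer share a filesystem, and the function names `produce` and `consume` and the message fields `ref` and `size` are hypothetical.

```python
import os
import tempfile


def produce(payload: bytes, directory: str) -> dict:
    """Store the payload once and return a small message that references it."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".bin")
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
    # Only the path and size travel through the pipeline, not the payload.
    return {"ref": path, "size": len(payload)}


def consume(message: dict) -> int:
    """Resolve the reference and run a trivial analysis (sum of byte values)."""
    with open(message["ref"], "rb") as handle:
        data = handle.read()
    return sum(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared_dir:
        msg = produce(b"\x01" * 1_000_000, shared_dir)
        print(msg["size"], consume(msg))
```

The message stays a few dozen bytes regardless of payload size, so intermediate pipeline stages that do not need the data never copy it.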

Article Title: MeDICi

Article Added: 2010/09/08

Category(s): Science, Software Architectures

Last Update: 13 July 2011 | Pacific Northwest National Laboratory