MeDICi is designed for building complex, high-performance analytical applications, typically comprising a pipeline of software components. Each component in the pipeline performs some analysis on incoming data and transfers the results to the next step(s) in the pipeline. The MeDICi framework enables software codes written in any language to be wrapped as MeDICi components, which can then be plugged together using the core framework to create applications. The framework automatically takes care of difficult tasks such as multi-threading, message buffering, distributed communications, and load balancing. This makes it simple to integrate separate codes, which were not designed to work together, into complex applications that operate as a data analysis pipeline.

MeDICi was designed to address many of the difficult aspects of building analytical applications, namely:

- Pipeline creation: MeDICi makes it easy to transfer data as it moves from one application to another, turning a set of distributed heterogeneous components into an integrated pipeline.
- Handling large data: MeDICi offers features that give pipeline designers choices on how to pass data through pipelines to maximize the performance of the applications.
- Component libraries: MeDICi enables analytical codes written in any language and running on any platform to be plugged into a MeDICi pipeline by writing a few lines of code, without changing the analysis code itself. The codes are, in fact, oblivious to the MeDICi framework, making it easy to combine existing components in an application.
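The pipeline pattern described above can be illustrated with a minimal sketch. This is not the MeDICi API: the names `run_stage` and `run_pipeline` are hypothetical, and the example assumes every component is a plain Python function running in a single process. It only shows the general idea of independent components connected by buffered message queues, with the threading handled by the surrounding harness rather than by the components themselves.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the data stream


def run_stage(func, inbox, outbox):
    """Apply func to every message from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(stages, inputs, buffer_size=8):
    """Chain stages with bounded queues, using one worker thread per stage.

    Illustrative only: there is no error handling, so a stage that raises
    an exception would stall the pipeline.
    """
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]
    for worker in workers:
        worker.start()

    def feed():
        # Feed from a separate thread so that a full buffer blocks the
        # feeder and not the code that drains the results.
        for item in inputs:
            queues[0].put(item)
        queues[0].put(_STOP)

    feeder = threading.Thread(target=feed)
    feeder.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        results.append(item)

    feeder.join()
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    # Three unrelated functions combined without modifying any of them.
    print(run_pipeline([str.strip, str.upper, len], ["  a b ", "hello"]))
```

The stage functions know nothing about the queues or threads around them, which mirrors the point that wrapped codes remain oblivious to the framework.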
MeDICi provides a faster, more efficient, and less expensive way to build analytical pipelines. The platform provides a scalable, flexible, and extensible development and run-time framework for ease of use in many application domains.
MeDICi is being applied to a variety of research projects at PNNL, including:

* Bioinformatics Resource Manager: Data-intensive pipelines that analyze large biological data sets are executed and managed on a 32-node cluster using MeDICi.
* Power Grid Failure Analysis: A high-performance MeDICi pipeline sends data from a Cray multithreaded supercomputer to a conventional supercomputer cluster for simulation. The simulation results, on the order of hundreds of megabytes, are returned by the pipeline to the Cray for detailed analysis.
* Text Analysis: MeDICi integrates a set of components that perform advanced semantic text analysis into a high-throughput processing pipeline.

- Simple: The MeDICi framework automatically handles many of the complex architectural issues that must be addressed when building high-performance software pipelines.
- Robust: MeDICi is built on proven standards-based integration, workflow, and provenance technologies.
- Flexible: MeDICi supports multiple languages, communication protocols, and hardware platforms.
- Efficient: MeDICi improves performance by passing large data by reference.
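
Passing large data by reference can be sketched as follows. This is an illustration of the general technique, not MeDICi's actual mechanism: the functions `produce` and `consume` and the message fields `path` and `size` are hypothetical, and the sketch assumes both components can reach the same file system. The large payload is written once, and only a small message describing its location travels between components.

```python
import os
import tempfile


def produce(payload: bytes, directory: str) -> dict:
    """Store the payload once and return a small reference message."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".bin")
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
    # Only this small dictionary is passed on, not the payload itself.
    return {"path": path, "size": len(payload)}


def consume(message: dict) -> int:
    """Resolve the reference and sum the byte values, reading in chunks."""
    total = 0
    with open(message["path"], "rb") as handle:
        while chunk := handle.read(1 << 20):  # 1 MiB at a time
            total += sum(chunk)
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared_dir:
        message = produce(b"\x01" * 5_000_000, shared_dir)
        print(message["size"], consume(message))
```

The message stays a few dozen bytes regardless of payload size, so intermediate buffers and transports never have to copy the data itself.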