U.S. Department of Energy

Pacific Northwest National Laboratory

Proteomics Pipeline


In the field of proteomics, the ever-growing volume of data has long presented a challenge to the analysis of protein sequences. At PNNL, researchers developed a new approach to protein analysis, a method they call Next Generation Proteomics (NGP). With NGP, researchers have been able to meet the challenge of large datasets by combining several of the lab's data intensive computing tools and capabilities.

This Proteomics Pipeline features tools that manage, compress, and organize data, culminating in a visualization that simplifies analysis. Together, these tools identify peptide features and the relationships among them, and ultimately present the results in a hierarchical view that scientists can easily navigate.

The tools and capabilities included in the Pipeline:

  • ScalaBLAST: runs protein comparisons on multiple processors at once, allowing near-real-time visualization of protein relationships.
  • Peptide and feature identification: high-performance computing is used to analyze large sets of mass spectra and detect the features and functions of specific proteins.
  • Visualization: once analysis and identification are complete, users can interact with a hierarchical visualization of the results, manipulating and navigating them to readily form hypotheses from the information.
  • Middleware for Data Intensive Computing (MeDICi): to handle data transfers between ScalaBLAST, peptide and feature identification, and visualization, the MeDICi framework creates a pipeline that quickly moves data from one tool to the next.
  • Smart Instrument Control (SIC): SIC enables intelligent data gathering by performing on-the-fly spectrum analysis and pattern matching against known signatures. SIC uses ion mobility spectrometry time-of-flight (IMS-TOF) instrumentation to gather more accurate measurements and field-programmable gate array (FPGA) acquisition to process the high-speed streaming data.
  • Data compression: with the deluge of incoming data, this algorithm reduces data size while speeding up storage and extraction, making datasets more accessible to users.
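The staged data flow described above can be sketched as a minimal pipeline. This is an illustrative assumption only: the function names, the toy comma-separated spectrum encoding, the use of zlib for compression, and the placeholder feature logic are all stand-ins, not the actual MeDICi, ScalaBLAST, or SIC interfaces.

```python
import zlib

# Hypothetical stand-ins for the pipeline stages. In the real system,
# MeDICi moves data between separate high-performance applications;
# here each stage is a plain function and "transfer" is a compressed blob.

def compress_spectra(spectra):
    """Compression stage: pack a batch of spectra into a compact payload."""
    payload = "\n".join(",".join(f"{mz:.4f}" for mz in s) for s in spectra)
    return zlib.compress(payload.encode())

def decompress_spectra(blob):
    """Reverse of compress_spectra: recover the list of spectra."""
    lines = zlib.decompress(blob).decode().splitlines()
    return [[float(x) for x in line.split(",")] for line in lines]

def identify_features(spectra):
    """Placeholder feature-identification stage: for each spectrum,
    report the peak count and the most intense (base) peak value."""
    return [{"peaks": len(s), "base_peak": max(s)} for s in spectra]

def run_pipeline(spectra):
    """Chain the stages: compress -> simulated transfer -> decompress -> identify."""
    blob = compress_spectra(spectra)   # payload handed between stages
    received = decompress_spectra(blob)
    return identify_features(received)

# Example: two small synthetic spectra pass through the whole pipeline.
results = run_pipeline([[100.0, 200.5], [50.25, 75.0, 300.125]])
```

The design point of the sketch is simply that data is serialized and compressed once, moved as an opaque payload, and only unpacked by the stage that needs it, mirroring how middleware decouples the producing and consuming tools.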

With the ability to easily compress and visualize large amounts of data, scientists will be able to study the relationships between proteins and their environment and create solutions to some of today's most pressing problems: biological warfare, disease outbreaks, and environmental cleanup.

Article Title: Proteomics Pipeline

Article Added: 2010/08/21

Category(s): Bioinformatics

Last Update: 13 July 2011 | Pacific Northwest National Laboratory