December 2009

Nathaniel Beagley

Nathaniel Beagley, a research scientist at Pacific Northwest National Laboratory (PNNL), delivered the presentation "Increasing the Efficiency of Data Storage and Analysis Using Indexed Compression" at the 5th Annual IEEE International Conference on e-Science, held December 9-11, 2009, at Oxford University in Oxford, England. In his presentation, Beagley addressed the problems posed by massive data volumes and the resulting inability to capture the full resolution of data from ion mobility spectrometry (IMS) mass spectrometry instrumentation. To address this, Beagley and his PNNL research team created an indexed compression storage algorithm and methodology. The algorithm offers size and speed benefits over conventional data compression approaches, and it also allows a specific segment of the data to be extracted and decompressed without decompressing the entire data volume. In less than one second, 240 MB of data can be captured, compressed, and stored to disk. Overall, the approach reduces data storage size by up to 60 percent.
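The announcement does not describe the PNNL algorithm in detail, but the general idea behind indexed compression can be sketched: compress the data in independent blocks and keep an index of where each compressed block starts, so that any single block can be read and decompressed on its own. The sketch below is a minimal illustration, not the UIMF implementation. It assumes zlib as the codec and one record (for example, one frame) per block, and the function names `build_indexed_archive` and `read_record` are hypothetical.

```python
import zlib
from typing import List, Tuple

# Each index entry is (byte offset into the archive, compressed length).
Index = List[Tuple[int, int]]


def build_indexed_archive(records: List[bytes], level: int = 6) -> Tuple[bytes, Index]:
    """Compress each record independently and remember where its chunk lives."""
    blob = bytearray()
    index: Index = []
    for record in records:
        chunk = zlib.compress(record, level)
        index.append((len(blob), len(chunk)))  # offset before appending
        blob.extend(chunk)
    return bytes(blob), index


def read_record(blob: bytes, index: Index, i: int) -> bytes:
    """Decompress only record i; no other chunk is touched."""
    offset, length = index[i]
    return zlib.decompress(blob[offset:offset + length])


if __name__ == "__main__":
    # Hypothetical sparse "frames": mostly zero intensities with a few peaks.
    frames = [bytes(4000) + bytes([i, 255 - i]) * 50 for i in range(10)]
    blob, index = build_indexed_archive(frames)
    assert read_record(blob, index, 7) == frames[7]
    print(f"raw: {sum(map(len, frames))} bytes, compressed: {len(blob)} bytes")
```

Compressing each block separately usually costs a little compression ratio compared with compressing the whole stream at once, in exchange for random access to any block. That trade-off is what makes extracting one segment cheap.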

The algorithm is currently used in the Unified Ion Mobility Frame (UIMF) data format, the data storage format for PNNL's Next-Generation Proteomics Platform. With this platform, PNNL researchers are increasing throughput by a factor of at least 10 without sacrificing sensitivity. According to Beagley, the potential applications of the algorithm are far-reaching, because many other data sets share characteristics with IMS data.

The PNNL research team that supported Beagley in developing the compression algorithm and methodology includes Chad Scherrer, Yan Shi, Brian Clowers, Anuj Shah, William Danielson, and Gordon A. Anderson. Operational testing of the algorithm was performed at the William R. Wiley Environmental Molecular Sciences Laboratory, a U.S. Department of Energy national scientific user facility located at PNNL in Richland, Washington.

