Protein microarrays such as the ProtoArray® (Life Technologies, Carlsbad, California, USA) are used in autoimmune antibody screening studies to discover biomarker panels. ProtoArray data are often analysed with Prospector, the software provided by the ProtoArray vendor, because it offers an advantageous feature ranking approach (“M score”). However, Prospector provides no functionality for multivariate feature selection, classification, normalization across manufacturing batches, or computational validation of biomarker candidates.

Therefore, we have adopted Prospector’s M score approach and implemented a new R package, Protein Array Analyzer (PAA), that provides these features as part of a complete data analysis pipeline. Besides ProtoArray data, PAA is suitable for any other single-color microarray data stored in the GenePix® results (gpr) file format. After optional data pre-processing and M score-based feature pre-selection, a multivariate feature selection is performed. For this purpose, a backward elimination (wrapper) approach has been implemented (“gene shaving”, using random forests to evaluate feature subgroups). To validate the performance of the selected protein features, a test set is classified. Furthermore, various plots and result files can be generated to summarize the results of the analysis.
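
The wrapper idea described above can be illustrated with a minimal, self-contained Python sketch. It is not PAA's implementation and does not use PAA's API: all function names are hypothetical, the data are synthetic, the M score-based pre-selection step is omitted, and a simple nearest-centroid classifier scored by leave-one-out accuracy stands in for the random forests that PAA uses to evaluate feature subgroups. Only the overall scheme follows the text: start from all candidate features, repeatedly drop the feature whose removal hurts least, keep the best subset seen, and then classify a held-out test set with it.

```python
import random
from statistics import mean


def centroid_predict(train_X, train_y, x, feats):
    """Nearest-centroid prediction restricted to the feature indices in feats."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [mean(r[f] for r in rows) for f in feats]
        dist = sum((x[f] - c) ** 2 for f, c in zip(feats, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of the stand-in classifier on a feature subset."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += centroid_predict(train_X, train_y, X[i], feats) == y[i]
    return hits / len(X)


def backward_elimination(X, y, feats, min_size=1):
    """Drop one feature per round; return the best-scoring subset seen."""
    current = list(feats)
    best_feats, best_score = list(current), loo_accuracy(X, y, current)
    while len(current) > min_size:
        # Evaluate every subset obtained by removing a single feature.
        candidates = [[g for g in current if g != f] for f in current]
        scores = [loo_accuracy(X, y, c) for c in candidates]
        top = max(range(len(candidates)), key=scores.__getitem__)
        current = candidates[top]
        # ">=" prefers the smaller panel when accuracy ties.
        if scores[top] >= best_score:
            best_feats, best_score = list(current), scores[top]
    return best_feats, best_score


def make_samples(n_per_group, rng):
    """Synthetic data: features 0 and 1 separate the groups, 2 to 5 are noise."""
    X, y = [], []
    for label, shift in (("control", 0.0), ("disease", 3.0)):
        for _ in range(n_per_group):
            row = [rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)]
            row += [rng.gauss(0.0, 1.0) for _ in range(4)]
            X.append(row)
            y.append(label)
    return X, y


rng = random.Random(0)
train_X, train_y = make_samples(15, rng)
test_X, test_y = make_samples(10, rng)

# Feature selection uses the training set only.
panel, cv_score = backward_elimination(train_X, train_y, range(6))

# Validation: classify the untouched test set with the selected panel.
hits = sum(
    centroid_predict(train_X, train_y, x, panel) == label
    for x, label in zip(test_X, test_y)
)
test_acc = hits / len(test_X)

print("selected features:", panel)
print("leave-one-out accuracy on training set:", cv_score)
print("test set accuracy:", test_acc)
```

Keeping the test set out of the selection loop is the point of the final step: the accuracy reported for the selected panel is then not biased by the search that produced it.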

We propose the new R package PAA for protein microarray data analysis. PAA has been used to successfully analyse several ProtoArray data sets (e.g. “Parkinson”, “Alzheimer”, “Amyotrophic Lateral Sclerosis”), which demonstrates its suitability for this task. PAA has since become the default tool for protein microarray analysis at our facility. The first publicly available version will be released in the coming months.


Link to Bioconductor

Poster (PDF, 1.4 MB): Proteomic Forum 2013, Berlin, Germany


PAA feature selection workflow

PAA new workflow