Report for: Sequence Based Prediction of DNA-Binding Proteins Based on Hybrid Feature Selection Using Random Forest and Gaussian Naïve Bayes

Title	Sequence Based Prediction of DNA-Binding Proteins Based on Hybrid Feature Selection Using Random Forest and Gaussian Naïve Bayes
Published in	PLOS ONE, January 2014
DOI	10.1371/journal.pone.0086703
Pubmed ID	24475169
Authors	Wangchao Lou, Xiaoqing Wang, Fan Chen, Yixiao Chen, Bo Jiang, Hua Zhang
Abstract	DNA-binding proteins play vital roles in gene regulation, so an efficient method for identifying them is highly desirable and would advance our understanding of protein function. In this study, we propose a new method for predicting DNA-binding proteins that ranks features using random forest and then applies wrapper-based feature selection with a forward best-first search strategy. The features comprise information from the primary sequence, predicted secondary structure, predicted relative solvent accessibility, and the position-specific scoring matrix. The proposed method, called DBPPred, uses Gaussian naïve Bayes as the underlying classifier because it outperformed five other classifiers: decision tree, logistic regression, k-nearest neighbor, support vector machine with polynomial kernel, and support vector machine with radial basis function kernel. DBPPred yields the highest average accuracy of 0.791 and average MCC of 0.583 in five-fold cross-validation with ten runs on the training benchmark dataset PDB594. In blind tests on the independent dataset PDB186, the model trained on the entire PDB594 dataset was compared with five existing methods (iDNA-Prot, DNA-Prot, DNAbinder, DNABIND and DBD-Threader); DBPPred yielded the highest accuracy of 0.769, MCC of 0.538, and AUC of 0.790. Independent tests of DBPPred on a large non-DNA-binding protein dataset and two RNA-binding protein datasets also showed improved or comparable quality relative to the relevant prediction methods. Moreover, for the majority of the features selected by the proposed method, the mean feature values differ statistically significantly between DNA-binding and non-DNA-binding proteins. All of the experimental results indicate that DBPPred can serve as an alternative predictor for large-scale determination of DNA-binding proteins.
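
The abstract's pipeline (rank the features, then wrap a forward search around a Gaussian naïve Bayes classifier) can be illustrated with a minimal sketch. This is not the authors' DBPPred code. It assumes the random forest ranking is already available as a hypothetical `ranked_features` list, replaces best-first search with a simpler greedy forward pass, uses a single five-fold split rather than ten runs, and runs on synthetic toy data instead of PDB594. All function names are illustrative.

```python
import math
import random
from statistics import mean, pvariance

def fit_gnb(X, y):
    """Estimate per-class prior, feature means and variances (Gaussian naive Bayes)."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        model[label] = (
            len(rows) / len(X),
            [mean(c) for c in cols],
            [pvariance(c) + 1e-9 for c in cols],  # small floor avoids zero variance
        )
    return model

def predict_gnb(model, x):
    """Return the class with the highest log posterior for sample x."""
    def log_post(prior, mus, variances):
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, mus, variances)
        )
    return max(model, key=lambda label: log_post(*model[label]))

def cv_accuracy(X, y, features, k=5):
    """k-fold cross-validated accuracy using only the given feature columns."""
    Xs = [[row[f] for f in features] for row in X]
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(Xs)) if i % k != fold]
        test = [i for i in range(len(Xs)) if i % k == fold]
        model = fit_gnb([Xs[i] for i in train], [y[i] for i in train])
        correct += sum(predict_gnb(model, Xs[i]) == y[i] for i in test)
    return correct / len(Xs)

def forward_select(X, y, ranked_features):
    """Greedy forward pass over a pre-ranked list (assumed to come from a
    random forest importance ranking); keep a feature only if CV accuracy improves."""
    selected, best = [], 0.0
    for f in ranked_features:
        score = cv_accuracy(X, y, selected + [f])
        if score > best:
            selected, best = selected + [f], score
    return selected, best

# Synthetic toy data: feature 0 separates the classes, features 1 and 2 are noise.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[10.0 * label + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]

selected, best = forward_select(X, y, ranked_features=[0, 1, 2])
print(selected, round(best, 3))
```

The wrapper scores each candidate subset with the same classifier that is used for the final prediction, which is what distinguishes it from a filter that ranks features independently of the classifier.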


X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output.

Geographical breakdown

Country	Count	As %
United States	1	100%

Demographic breakdown

Type	Count	As %
Members of the public	1	100%

Mendeley readers

The data shown below were compiled from readership statistics for 114 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
United Kingdom	2	2%
Israel	1	<1%
India	1	<1%
Argentina	1	<1%
Unknown	109	96%

Demographic breakdown

Readers by professional status	Count	As %
Student > Ph. D. Student	20	18%
Student > Master	16	14%
Student > Bachelor	14	12%
Researcher	13	11%
Professor > Associate Professor	6	5%
Other	15	13%
Unknown	30	26%

Readers by discipline	Count	As %
Computer Science	30	26%
Agricultural and Biological Sciences	12	11%
Biochemistry, Genetics and Molecular Biology	12	11%
Engineering	8	7%
Medicine and Dentistry	5	4%
Other	12	11%
Unknown	35	31%
