Bioinformatics and computational biology research is fundamental to our understanding of complex biological systems, impacting the science and technology of fields ranging from agricultural and environmental sciences to pharmaceutical and medical sciences. It has been one of the fastest-developing research fields of the last two decades. High-throughput biological data that provide information at the molecular and genetic levels are being generated rapidly. Almost all
2012, Issue 06, v.17, pp. 607-608
Identifying hierarchically related entities is a critical step towards constructing bio-networks in the field of biomedical text mining. To this end, we adopt a mapping-based approach by first mapping bio-entities to terms in an established ontology, Medical Subject Headings (MeSH). We then utilize the hierarchical relationships available in MeSH to recognize hierarchically related entities. Specifically, we present two approaches to map biomedical entities identified using the Unified Medical Language System (UMLS) Metathesaurus to MeSH terms. The first approach utilizes a special feature provided by the MetaMap algorithm, whereas the other employs approximate phrase-based matching to map entities directly to MeSH terms. The two approaches deliver comparable results, with accuracies of 72% and 75%, respectively, based on two evaluation datasets. A thorough error analysis shows that the two approaches share only around 10% of their errors, indicating that they are complementary.
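The approximate phrase-based matching idea can be illustrated with a simple token-overlap score. This is only a minimal sketch: the abstract does not say how matches are scored, so the Jaccard similarity, the `threshold` value, the function names, and the tiny `TERMS` list are all assumptions made for illustration, not the authors' method or the real MeSH vocabulary.

```python
def tokenize(phrase):
    """Lowercase a phrase and split it into a set of word tokens."""
    return set(phrase.lower().replace(",", " ").split())


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_term(entity, terms, threshold=0.5):
    """Return (term, score) for the best-scoring term, or None if below threshold."""
    entity_tokens = tokenize(entity)
    scored = [(jaccard(entity_tokens, tokenize(t)), t) for t in terms]
    score, term = max(scored)
    return (term, score) if score >= threshold else None


# Hypothetical term list, for illustration only.
TERMS = ["Breast Neoplasms", "Lung Neoplasms", "Neoplasms"]
match = best_term("lung neoplasms, malignant", TERMS)
```

Once an entity is mapped to a term, the hierarchical relationships of the ontology could then be looked up for that term; that step is omitted here.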
2012, Issue 06, v.17, pp. 609-618
- Piyaphol Phoungphol;
Imbalanced data is a common and serious problem in many biomedical classification tasks. It biases the training of classifiers and results in lower prediction accuracy for minority classes. This problem has attracted considerable research interest in the past decade. Unfortunately, most research efforts concentrate only on two-class problems. In this paper, we study a new method of formulating a multiclass Support Vector Machine (SVM) problem for imbalanced biomedical data to improve classification performance. The proposed method applies a cost-sensitive approach and a ramp loss function to the Crammer and Singer multiclass SVM formulation. Experimental results on multiple biomedical datasets show that the proposed solution can effectively address the problem when the datasets are noisy and highly imbalanced.
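To make the ingredients named in this abstract concrete, the sketch below computes a cost-weighted ramp loss on a Crammer-Singer style margin for a single example. It is a hedged illustration only: the abstract does not give the exact objective, so the clipping level `cap`, the per-class `class_costs`, and all function names are assumptions rather than the paper's formulation.

```python
def multiclass_margin(scores, true_class):
    """Crammer-Singer margin: true-class score minus the best competing score."""
    rival = max(s for k, s in enumerate(scores) if k != true_class)
    return scores[true_class] - rival


def ramp_loss(margin, cap=2.0):
    """Hinge loss max(0, 1 - margin), clipped at `cap` so outliers stay bounded."""
    return min(cap, max(0.0, 1.0 - margin))


def cost_sensitive_ramp_loss(scores, true_class, class_costs, cap=2.0):
    """Scale the ramp loss by a (hypothetical) cost for the example's true class."""
    margin = multiclass_margin(scores, true_class)
    return class_costs[true_class] * ramp_loss(margin, cap)
```

Because the loss is clipped, a badly misclassified (possibly noisy) example contributes at most `cap` times its class cost, while a larger cost for a minority class raises the penalty for errors on that class.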
2012, Issue 06, v.17, pp. 619-628
- Ken D. Nguyen;
There are many web-based multiple sequence alignment services accessible around the world. However, many researchers working on biological sequence analysis still struggle with multiple sequence alignment software that is inefficient, has unfriendly user interfaces, and offers limited capability. In this study, we provide a comprehensive survey of regional and continental facilities that provide web-based alignment services. We also analyze and identify much-needed services that are not available through these existing service providers. We then implement a web-based model to address these needs. Our web-based multiple sequence alignment server, SeqAna, provides a unique set of services that none of the surveyed facilities offer. For example, SeqAna provides a multiple sequence alignment scoring and ranking service. This service, the only one of its kind, allows SeqAna's users to perform multiple sequence alignment with several alignment tools and rank the resulting alignments in order of quality. With this service, SeqAna's users can identify which alignment tools are more appropriate for their specific set of sequences. In addition, SeqAna's users can customize a small alignment sample as a reference for SeqAna to automatically identify the best tool to align their large set of sequences.
2012, Issue 06, v.17, pp. 629-637
- Dale Schuurmans;
Protein phosphorylation/dephosphorylation is the central mechanism of post-translational modification which regulates cellular responses and phenotypes. Due to the efficiency and resource constraints of in vivo methods for identifying phosphorylation sites, there is a strong motivation to predict potential phosphorylation sites computationally. In this work, we propose a unique set of features to represent the peptides surrounding the amino acid sites of interest and use a feature-selection support vector machine to predict whether serine/threonine sites are potentially phosphorylable, as well as to select important features that may lead to phosphorylation. Experimental results indicate that the new features and the prediction method predict protein phosphorylation sites more effectively than existing state-of-the-art methods. The features selected by our prediction model provide biological insights into in vivo phosphorylation.
2012, Issue 06, v.17, pp. 638-644
- Wooyoung Kim;
The prediction of essential proteins, the minimal set required for a living cell to support cellular life, is an important task for understanding the cellular processes of an organism. Fast progress in high-throughput technologies and the production of large amounts of data enable the discovery of essential proteins at the system level by analyzing Protein-Protein Interaction (PPI) networks in place of biological or chemical experiments. Furthermore, additional gene-level annotation information, such as Gene Ontology (GO) terms, helps to detect essential proteins with higher accuracy. Various centrality algorithms have been used to determine essential proteins in a PPI network, and, recently, motif centrality GO, which is based on network motifs and GO terms, was found to work best in detecting essential proteins in a PPI network of baker's yeast, Saccharomyces cerevisiae, compared to other centrality algorithms. However, each centrality algorithm contributes to the detection of essential proteins with different properties, which makes integrating them a logical next step. In this paper, we construct a new feature space, named CENT-ING-GO, consisting of various centrality measures and GO terms, and provide a computational approach to predict essential proteins with various machine learning techniques. The experimental results show that the CENT-ING-GO feature space improves performance over the INT-GO feature space used in previous work by Acencio and Lemke in 2009. We also demonstrate that pruning a PPI network with informative GO terms can improve the prediction performance further.
2012, Issue 06, v.17, pp. 645-658
A gene selection algorithm was developed using Multiple Principal Component Analysis with Sparsity (MSPCA). The MSPCA algorithm analyzes normal and disease gene expression samples and sets component loadings to zero if they are smaller than a threshold, yielding sparse solutions. Next, genes with zero loadings across all samples (both normal and disease) are removed before extracting feature genes. Feature genes are genes that contribute differentially to variations in normal and disease samples and, thus, can be used for classification. The MSPCA is applied to three microarray datasets to select feature genes, and a linear support vector machine is used to evaluate its performance. Comparison with several previous gene selection results shows that the MSPCA gene selection algorithm has good classification accuracy and model stability.
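The thresholding and gene-removal steps described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the MSPCA implementation: the loadings are taken as already computed, "smaller than a threshold" is interpreted as smaller in absolute value, and the matrix layout (one row per gene, one column per component) and function names are hypothetical.

```python
def sparsify(loadings, threshold):
    """Set loadings whose magnitude is below the threshold to zero."""
    return [[v if abs(v) >= threshold else 0.0 for v in row] for row in loadings]


def surviving_genes(normal, disease, threshold):
    """Indices of genes that keep at least one nonzero loading in either group.

    Each matrix has one row per gene and one column per component.
    """
    sparse_normal = sparsify(normal, threshold)
    sparse_disease = sparsify(disease, threshold)
    return [
        g
        for g, (n_row, d_row) in enumerate(zip(sparse_normal, sparse_disease))
        if any(n_row) or any(d_row)
    ]
```

Genes whose loadings are zeroed in both the normal and the disease matrices are dropped; the remaining genes are the candidates from which feature genes would be extracted.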
2012, Issue 06, v.17, pp. 659-665
There have been many skewed cancer gene expression datasets in the post-genomic era. Extracting differentially expressed genes or constructing decision rules from these skewed datasets with traditional algorithms seriously degrades performance on the minority class, leading to inaccurate diagnosis in clinical trials. This paper presents a skewed gene selection algorithm that introduces a weighted metric into the gene selection procedure. The extracted genes are paired as decision rules to distinguish both classes, and these decision rules are then integrated into an ensemble learning framework by majority voting to recognize test examples, thus avoiding tedious data normalization and classifier construction. Mining and integrating a few reliable decision rules gave higher or at least comparable classification performance relative to many traditional class imbalance learning algorithms on four benchmark imbalanced cancer gene expression datasets.
2012, Issue 06, v.17, pp. 666-673
Evidence shows that biological systems are composed of separable functional modules. Identifying protein complexes is essential for understanding the principles of cellular functions. Many methods have been proposed to mine protein complexes from protein-protein interaction networks. However, the performance of these algorithms is limited because experimentally detected protein-protein interactions are incomplete and noisy. This paper presents an analysis of the topological properties of protein complexes to show that although proteins from the same complex are more highly connected than proteins from different complexes, many protein complexes are not very dense (density 0.8). A method is then given to mine protein complexes that are relatively dense (density 0.4). In the first step, a topological property is used to identify proteins that are probably in the same complex. Then, a possible boundary is calculated for the protein complex based on a minimum vertex cut. The final complex is formed by the proteins within the boundary. The method is validated on a yeast protein-protein interaction network. The results show that this method performs better in terms of sensitivity and specificity than other methods. The functional consistency is also good.
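The density figures in this abstract refer to how many of the possible interactions inside a complex are actually present. A minimal sketch of the standard undirected graph density, 2|E| / (|V|(|V| - 1)), is given below; the protein names in the usage note are hypothetical, and the topological property and minimum vertex cut steps of the method are not reproduced here.

```python
def density(nodes, edges):
    """Density of the subgraph induced by `nodes`: 2|E| / (|V| (|V| - 1))."""
    members = set(nodes)
    n = len(members)
    if n < 2:
        return 0.0
    # Keep each undirected edge once, ignoring self-loops and outside nodes.
    internal = {
        frozenset((u, v))
        for u, v in edges
        if u in members and v in members and u != v
    }
    return 2 * len(internal) / (n * (n - 1))
```

For example, with hypothetical interactions `[("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]`, the candidate complex `{"A", "B", "C"}` is fully connected, while adding `"D"` lowers the density because `"D"` interacts with only one member.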
2012, Issue 06, v.17, pp. 674-681
The flowering time of Arabidopsis is sensitive to climate variability, with lighting conditions being a major determinant of the flowering time. Long days induce early flowering, while short days induce late flowering or even no flowering. This study investigates the intrinsic mechanisms for Arabidopsis flowering in different lighting conditions using mutual information networks and logic networks. The structure parameters of the mutual information networks show that the average degree and the average core clearly distinguish these networks. A method is then given to find the key structural genes in the mutual information networks and the logic networks, respectively. Ten genes are found that may promote flowering and three that may restrain it. The sensitivity of this method in finding the genes that promote flowering is 80%, while its sensitivity in finding the genes that restrain flowering is 100%.
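A mutual information network links genes whose expression profiles share information. The sketch below shows one simple way such a network could be built, assuming expression values have already been discretized and that an edge is drawn whenever the mutual information reaches a chosen `threshold`; the discretization, the threshold, and the function names are assumptions, since the abstract does not describe how the networks were constructed.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mi_network(profiles, threshold):
    """Edges between gene pairs whose mutual information is at least `threshold`."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if mutual_information(profiles[g1], profiles[g2]) >= threshold
    ]
```

Structural parameters such as the average degree can then be computed from the resulting edge list and compared across lighting conditions.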
2012, Issue 06, v.17, pp. 682-690
- Junilda Spirollari;Shawn Xiong Wang;Jason T.L. Wang;
We present in this paper an ab initio method, named KnotFold, for RNA H-type pseudoknot prediction. Our method employs an ensemble of RNA folding tools and a filtering heuristic to generate a set of pseudoknot-free stems, and then predicts pseudoknots by utilizing a search technique with a pseudo-probability scoring scheme. Experimental results show that KnotFold achieves higher sensitivity than existing methods. The KnotFold package with documentation is freely available at http://bioinformatics.njit.edu/KnotFold
2012, Issue 06, v.17, pp. 691-700
2012, Issue 06, v.17, pp. 701-704
Tsinghua Science and Technology (Tsinghua Sci Technol), an academic journal sponsored by Tsinghua University, is published bimonthly. This journal aims at presenting up-to-date scientific achievements with high creativity and great significance in computer and electronic engineering. Contributions from all over the world are welcome. Tsinghua Sci Technol is indexed by IEEE Xplore,
2012, Issue 06, v.17, p. 705