- Zebang Shen;Binbin Yong;Gaofeng Zhang;Rui Zhou;Qingguo Zhou;
As a subfield of Multimedia Information Retrieval (MIR), Singer IDentification (SID) is still in the research phase. On the one hand, SID cannot easily achieve high accuracy because the singing voice is difficult to model and is always disturbed by background instrumental music. On the other hand, the performance of conventional machine learning methods is limited by the scale of the training dataset. This study proposes a new deep learning approach based on Long Short-Term Memory (LSTM) and Mel-Frequency Cepstral Coefficient (MFCC) features to identify the singer of a song in large datasets. The results indicate that LSTM can be used to build a representation of the relationships between different MFCC frames. The experimental results show that the proposed method achieves better accuracy for Chinese SID on the MIR-1K dataset than traditional approaches.
2019, Issue 04, Vol. 24, pp. 371-378
- Yanxia Lv;Sancheng Peng;Ying Yuan;Cong Wang;Pengfei Yin;Jiemin Liu;Cuirong Wang;
By combining multiple weak learners to handle concept drift in big data stream classification, ensemble learning can achieve better generalization performance than a single learner. In this paper, we present an efficient classifier using the online bagging ensemble method for big data stream learning. In this classifier, we introduce an efficient online resampling mechanism on the training instances and use a robust coding method based on error-correcting output codes, in order to reduce the effects of correlations between the classifiers and increase the diversity of the ensemble. A dynamic updating model based on classification performance is adopted to reduce unnecessary updating operations and improve the efficiency of learning. We implement a parallel version of the proposed classifier, EoBag, which runs faster than the serial version, and results indicate that its classification performance is almost the same as that of the serial one. Finally, we compare the classification performance and resource usage with other state-of-the-art algorithms using artificial and real data sets, respectively. Results show that the proposed algorithm obtains better accuracy and more feasible resource usage for the classification of big data streams.
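The abstract does not spell out its resampling mechanism, so the following is only a minimal sketch of the classic online bagging idea, assuming each base learner sees every arriving instance k times with k drawn from a Poisson(1) distribution. The names `CentroidLearner` and `OnlineBagging` are hypothetical, the base learner is a toy nearest-centroid model, and the error-correcting output codes and dynamic updating described above are not modeled.

```python
import math
import random
from collections import Counter


def poisson1(rng):
    """Draw k ~ Poisson(1) using Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


class CentroidLearner:
    """Toy incremental base learner (an assumption): nearest class centroid."""

    def __init__(self):
        self.sums = {}            # label -> per-feature running sums
        self.counts = Counter()   # label -> number of updates

    def learn(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict(self, x):
        if not self.counts:
            return None

        def sq_dist(label):
            centroid = [s / self.counts[label] for s in self.sums[label]]
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.counts, key=sq_dist)


class OnlineBagging:
    """Each learner is updated Poisson(1) times per instance; predict by vote."""

    def __init__(self, n_learners=10, seed=0):
        self.rng = random.Random(seed)
        self.learners = [CentroidLearner() for _ in range(n_learners)]

    def learn(self, x, y):
        for learner in self.learners:
            for _ in range(poisson1(self.rng)):
                learner.learn(x, y)

    def predict(self, x):
        votes = Counter(
            p for p in (l.predict(x) for l in self.learners) if p is not None
        )
        return votes.most_common(1)[0][0] if votes else None
```

Because every instance is processed once and then discarded, the ensemble never needs to store the stream, which is what makes this style of bagging usable on unbounded data.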
2019, Issue 04, Vol. 24, pp. 379-388
- Chao Tan;Genlin Ji;
In the fields of machine learning and data mining, label learning is a nascent area of research, and within this paradigm, there is much room for improving multi-label manifold learning algorithms for high-dimensional data. Thus far, researchers have experimented with mapping relationships from the feature space to the traditional logical label space (using neighbors in the label space, for example, to predict logical label vectors from the feature space's manifold structure). Here we combine the feature manifold's and label space's local topological structures to reconstruct the label manifold. To achieve this, we use a nonlinear manifold learning algorithm to transform the local topological structure from the feature space to the label space. Our algorithm adopts a regularized least-squares kernel method to realize the reconstruction process, employing an optimization function to find the best solution. Extensive experiments show that our algorithm significantly improves multi-label manifold learning in terms of learning accuracy and time complexity.
2019, Issue 04, Vol. 24, pp. 389-399
- Fang Dong;Xiaolin Guo;Pengcheng Zhou;Dian Shen;
With the continuous enrichment of cloud services, an increasing number of applications are being deployed in data centers. These emerging applications are often communication-intensive and data-parallel, and their performance is closely related to the underlying network. Because of their distributed nature, the applications consist of tasks that involve a collection of parallel flows. Traditional techniques that optimize flow-level metrics are agnostic to task-level requirements, leading to poor application-level performance. In this paper, we address the heterogeneous task-level requirements of applications and propose task-aware flow scheduling. First, we model tasks' sensitivity to their completion time by utilities. Second, on the basis of Nash bargaining theory, we establish a flow scheduling model with heterogeneous utility characteristics, and analyze it using the Lagrange multiplier method and the KKT conditions. Third, we propose two utility-aware bandwidth allocation algorithms with different practical constraints. Finally, we present Tasch, a system that enables tasks to maintain high utilities and guarantees the fairness of utilities. To demonstrate the feasibility of our system, we conduct comprehensive evaluations with a real-world traffic trace. Communication stages complete up to 1.4× faster on average, task utilities increase up to 2.26×, and the fairness of tasks improves up to 8.66× using Tasch in comparison to per-flow mechanisms.
2019, Issue 04, Vol. 24, pp. 400-411
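To illustrate the kind of Lagrange/KKT analysis the Tasch abstract refers to, here is a worked single-link example. It is a sketch under assumptions that are not taken from the paper: n tasks share one link of capacity C, task i has a linear utility in its bandwidth x_i, and its disagreement point corresponds to a minimum bandwidth m_i, with C larger than the sum of the m_i. Under those assumptions the Nash bargaining objective reduces to a sum of logarithms:

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i=1}^{n} \log\bigl(x_i - m_i\bigr)
\quad \text{s.t.}\quad \sum_{i=1}^{n} x_i \le C,\; x_i > m_i, \\
\mathcal{L}(x,\lambda) &= \sum_{i=1}^{n} \log\bigl(x_i - m_i\bigr)
  - \lambda\Bigl(\sum_{i=1}^{n} x_i - C\Bigr), \\
\frac{\partial \mathcal{L}}{\partial x_i} &= \frac{1}{x_i - m_i} - \lambda = 0
  \;\Longrightarrow\; x_i = m_i + \frac{1}{\lambda}, \\
\lambda\Bigl(\sum_{i=1}^{n} x_i - C\Bigr) &= 0,\ \lambda > 0
  \;\Longrightarrow\; x_i^{*} = m_i + \frac{C - \sum_{j=1}^{n} m_j}{n}.
\end{aligned}
```

In this simplified setting each task receives its minimum bandwidth plus an equal share of the surplus capacity; heterogeneous (nonlinear) utilities would change the stationarity condition and hence the split.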
- Jinzhi Liao;Jiuyang Tang;Xiang Zhao;
As a supplement to traditional education, online courses offer people, regardless of their age, gender, or profession, the chance to access state-of-the-art knowledge. Nonetheless, although a large number of students begin online courses, quite a few of them drop out midway, and information on this is vital for course organizers to improve their curriculum outlines. In this work, in order to predict the drop-out rate precisely, we propose a combined method, MOOP, which consists of a global tensor and a local tensor to express all available feature aspects. Specifically, the global tensor structure is proposed to model the data of the online courses, while a local tensor is clustered to capture the inner connection of courses. Drop-out prediction is then achieved by adopting a high-accuracy low-rank tensor completion method, equipped with a pigeon-inspired algorithm to optimize the parameters. The proposed method is empirically evaluated on real-world Massive Open Online Courses (MOOC) data, and is demonstrated to offer remarkable superiority over alternatives in terms of efficiency and accuracy.
2019, Issue 04, Vol. 24, pp. 412-422
- Qing Sun;Ji Wu;Wenge Rong;Wenbo Liu;
In programming courses, the traditional assessment approach tends to evaluate student performance by scoring one or more project-level summative assignments. This approach no longer meets the requirements of a quality programming language education. Based on an upgraded peer code review model, we propose a formative assessment approach to assess the learning of computer programming languages, and develop an online assessment system (OOCourse) to implement this approach. Peer code review and inspection is an effective way to ensure the high quality of a program by systematically checking the source code. Though it is commonly applied in industrial and open-source software development, it is rarely taught and practiced in undergraduate-level programming courses. We conduct a case study using the formative assessment method in a sophomore-level Object-Oriented Design and Construction course with more than 240 students. We use Moodle (an online learning system) and some relevant plugins to conduct peer code review. We also conduct data mining on the running data from the peer assessment activities. The case study shows that formative assessment based on peer code review gradually improved the programming ability of students in the undergraduate class.
2019, Issue 04, Vol. 24, pp. 423-434
- Hans Yuan;Paul Cao;
As computer science enrollments continue to surge, assessments that involve student collaboration may play a more critical role in improving student learning. We provide a review on some of the most commonly adopted collaborative assessments in computer science, including pair programming, collaborative exams, and group projects. Existing research on these assessment formats is categorized and compared. We also discuss potential future research topics on the aforementioned collaborative assessment formats.
2019, Issue 04, Vol. 24, pp. 435-445
- Xiang Chen;Min Li;Ruiqing Zheng;Siyu Zhao;Jianxin Wang;Fang-Xiang Wu;Yaohang Li;
Inferring the structure of Gene Regulatory Networks (GRNs) from gene expression data has been a challenging problem in systems biology. It is critical to identify complicated regulatory relationships among genes for understanding regulatory mechanisms in cells. Various methods based on information theory have been developed to infer GRNs. However, these methods introduce many redundant regulatory relationships in the network inference process due to external noise in the original data, topology sparseness in the network structure, and non-linear dependency among genes. In particular, as the network size increases, the performance of these methods decreases dramatically. In this paper, a novel network structure inference method named Loc-PCA-CMI is proposed that first identifies local overlapped gene clusters, and then infers the local network structure for each cluster by a Path Consistency Algorithm based on Conditional Mutual Information (PCA-CMI). The final GRN structure, expressed as dependences among genes, is obtained as an ensemble of the local network structures. Loc-PCA-CMI was evaluated on DREAM3 knock-out datasets, and its performance was compared to other information theory-based network inference methods including ARACNE, MRNET, PCA-CMI, and PCA-PMI. Experimental results demonstrate that Loc-PCA-CMI outperforms the other four methods on the DREAM3 datasets, especially for networks of size 50 and 100.
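The conditional mutual information test at the heart of PCA-CMI-style methods can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes jointly Gaussian expression values, handles only a single conditioning gene (first-order CMI via the partial correlation), and the function names are hypothetical. In a path-consistency procedure, an edge between two genes would be dropped when such a score falls below a chosen threshold.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(x, y):
    """Mutual information I(X;Y) in nats under a Gaussian assumption."""
    r = pearson(x, y)
    return -0.5 * math.log(1.0 - r * r)


def gaussian_cmi(x, y, z):
    """First-order CMI I(X;Y|Z) in nats via the partial correlation."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    partial = (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    return -0.5 * math.log(1.0 - partial * partial)


if __name__ == "__main__":
    # Synthetic example: gene z regulates both x and y, so x and y look
    # dependent until z is conditioned on.
    rng = random.Random(0)
    z = [rng.gauss(0, 1) for _ in range(2000)]
    x = [v + rng.gauss(0, 0.5) for v in z]
    y = [v + rng.gauss(0, 0.5) for v in z]
    print("I(x;y)   =", gaussian_mi(x, y))
    print("I(x;y|z) =", gaussian_cmi(x, y, z))
```

In the synthetic example the unconditional score is clearly positive while the conditional score is close to zero, which is the pattern used to discard an indirect (redundant) x-y edge.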
2019, Issue 04, Vol. 24, pp. 446-454
- Shengbing Pei;Jihong Guan;Shuigeng Zhou;
Functional networks are extracted from resting-state functional magnetic resonance imaging data to explore biomarkers for distinguishing brain disorders in disease diagnosis. Previous works have primarily focused on using a single Resting-State Network (RSN) with various techniques. Here, we apply fusion analysis of RSNs to capture biomarkers that combine the complementary information among the RSNs. Experiments are carried out on three groups of subjects, i.e., Cognition Normal (CN), Early Mild Cognitive Impairment (EMCI), and Alzheimer's Disease (AD) groups, which correspond to the three progressing stages of AD; each group contains 18 subjects. First, we apply group Independent Component Analysis (ICA) to extract the Default Mode Network (DMN) and Dorsal Attention Network (DAN) for each subject group. Then, using the common DMN and DAN as templates for each group, we employ individual ICA to extract the DMN and DAN for each subject. Finally, we fuse the DMNs and DANs to explore the biomarkers. The results show that (1) the templates generated by group ICA allow individual ICA to extract the RSN for each subject effectively; (2) the RSNs combined with fusion analysis yield more informative biomarkers than those obtained without fusion analysis; (3) the regions of the DMN and DAN that differ most are identified between CN and EMCI and between EMCI and AD. For the DMN, the difference in the medial prefrontal cortex between EMCI and AD is smaller than that between CN and EMCI, whereas that in the posterior cingulate between EMCI and AD is larger. As for the DAN, the difference in the intraparietal sulcus between EMCI and AD is smaller than that between CN and EMCI; (4) extracting the DMN and DAN for each subject via the back reconstruction of group ICA is invalid.
2019, Issue 04, Vol. 24, pp. 456-467
- Feihao Wu;Juan Chen;Yong Dong;Wenxu Zheng;Xiaodong Pan;Yuan Yuan;Zhixin Ou;Yuyang Sun;
Component overclocking, which includes processor overclocking and memory overclocking, is an effective approach to speed up the components of a system and realize higher program performance. However, overclocking unavoidably increases power consumption. Our goal is to optimally improve the performance of scientific computing applications without increasing the total power consumption of a processor-memory system. We built a processor-memory energy efficiency model for multicore-based systems, which coordinates the performance and power of processor and memory. Our model exploits performance boost opportunities for a processor-memory system by adopting processor overclocking, processor Dynamic Voltage and Frequency Scaling (DVFS), memory active ratio adjustment, and memory overclocking, according to different scientific applications. This model also provides a total power control method by considering the same four factors. We propose a processor and memory Coordination-based holistic Energy-Efficient (CEE) algorithm, which achieves performance improvement without increasing the total power consumption. The experimental results show an average performance improvement of 9.3% across all 14 benchmarks, while the total power consumption does not increase. The maximal performance improvement was 13.1%, for the dedup benchmark. Our experiments validate the effectiveness of our holistic energy-efficient model and technology.
2019, Issue 04, Vol. 24, pp. 468-483
- Yudong Qin;Deke Guo;Guoming Tang;Bangbang Ren;
To satisfy the ever-increasing bandwidth demand of modern data centers, researchers have proposed hybrid Data Center Networks (DCNs), which employ high-bandwidth Optical Circuit Switches (OCSs) to compensate for Electrical Packet Switches (EPSs). Existing designs, such as Helios and c-Through, mainly focus on reconfiguring optical devices to meet the estimated traffic requirements. However, these designs face two major challenges in their OCS-based networks, namely, the complex control mechanism and cabling problems. To address these challenges, we propose TIO, a hybrid DCN that employs Visible Light Communication (VLC) instead of a wired OCS design to connect racks. TIO seamlessly integrates the wireless VLC-based Jellyfish and the wired EPS-based Fat Tree and combines their opposite yet complementary characteristics: wireless VLC direct connections versus wired electrical packet switching, and random-graph versus Clos topology properties. To further exploit the merits of TIO, we design a hybrid routing scheme and a congestion-aware flow scheduling method. Comprehensive evaluations indicate that TIO outperforms both Jellyfish and Fat Tree in topology properties and network performance, and that the flow scheduling method also clearly improves performance.
2019, Issue 04, Vol. 24, pp. 484-496
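For readers unfamiliar with the random-graph side of the TIO design, the following sketch builds a Jellyfish-style random switch topology and measures its mean shortest-path length. It is an illustration under assumptions, not TIO itself: the function names are hypothetical, only the basic "randomly join two non-adjacent switches with free ports" step is implemented (the link-swap step for leftover ports is omitted), and neither VLC links nor the Fat Tree layer are modeled.

```python
import random
from collections import deque


def jellyfish(n_switches, ports, seed=0):
    """Randomly connect switches until no two non-adjacent switches
    with free ports remain. Returns an adjacency dict of sets."""
    rng = random.Random(seed)
    adj = {s: set() for s in range(n_switches)}
    while True:
        open_nodes = [s for s in adj if len(adj[s]) < ports]
        candidates = [
            (a, b)
            for i, a in enumerate(open_nodes)
            for b in open_nodes[i + 1:]
            if b not in adj[a]
        ]
        if not candidates:
            return adj
        a, b = rng.choice(candidates)
        adj[a].add(b)
        adj[b].add(a)


def mean_path_length(adj):
    """Average hop count over all reachable ordered switch pairs (BFS)."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


if __name__ == "__main__":
    topo = jellyfish(n_switches=20, ports=4, seed=0)
    print("mean path length:", mean_path_length(topo))
```

Mean path length is one of the topology properties on which random graphs are usually compared with Clos-style designs such as Fat Tree.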