- Waseem Ahmed;Yongwei Wu;
Over the past decade, there has been a paradigm shift leading consumers and enterprises to the adoption of cloud computing services. Even though most cases are still in the early stages of transition, there has been a steady increase in the implementation of the pay-as-you-go or pay-as-you-grow models offered by cloud providers. Whether applied as an extension of virtual infrastructure, software, or platform as a service, many users are still challenged by the estimation of adequate resource allocation and the wide variations in pricing. Customers require a simple method of predicting future demand in terms of the number of nodes to be allocated in the cloud environment. In this paper, we review and discuss existing methodologies for estimating the demand for cloud nodes and their corresponding pricing policies. Based on our review, we propose a novel approach using the Hidden Markov Model to estimate the acquisition of cloud nodes.
2014, Issue 01, v.19, pp. 1-12
- Yu Gu;Dongsheng Wang;Chuanyi Liu;
With the rapidly growing popularity of the cloud computing paradigm, disaster recovery using cloud resources has become an attractive approach. This paper presents a practical multi-cloud based disaster recovery service model: DR-Cloud. With DR-Cloud, resources of multiple cloud service providers can be utilized cooperatively by the disaster recovery service provider. A simple and unified interface is exposed to the customers of DR-Cloud to accommodate the heterogeneity of the cloud service providers involved in the disaster recovery service, and the internal processes between clouds are invisible to the customers. DR-Cloud proposes multiple optimization scheduling strategies to balance the disaster recovery objectives, such as high data reliability, low backup cost, and short recovery time, which are also transparent to the customers. Different data scheduling strategies based on DR-Cloud are suitable for different kinds of data disaster recovery scenarios. Experimental results show that the DR-Cloud model can cooperate effectively with cloud service providers with various parameters, while its data scheduling strategies can achieve their optimization objectives efficiently and are widely applicable.
2014, Issue 01, v.19, pp. 13-23
- Nan Zhu;Xue Liu;Jie Liu;Yu Hua;
Distributed data processing systems are becoming one of the most important components for data-intensive computational tasks in the enterprise software infrastructure. Deploying and operating such systems incurs large costs, including hardware costs to build clusters and energy costs to run them. To make these systems sustainable and scalable, power management has become an important research problem. In this paper, we take Hadoop as an example to illustrate the power peak problem, which causes power inefficiency, and provide an in-depth analysis to explain issues with existing system designs. We propose a novel power capping module in the Hadoop scheduler to mitigate power peaks. Extensive simulation studies show that our proposed solution can effectively smooth the power consumption curve and mitigate temporary power peaks for Hadoop clusters.
2014, Issue 01, v.19, pp. 24-32
- Yang Liu;Bin Wu;Hongxu Wang;Pengjiang Ma;
The design and implementation of a scalable parallel mining system targeted at big graph analysis has proven to be challenging. In this study, we propose BSP-based Parallel Graph Mining (BPGM), a parallel data mining system for analyzing big graph data that is built on the Bulk Synchronous Parallel (BSP) computing model. This system has four sets of parallel graph mining algorithms programmed in the BSP parallel model and a well-designed workflow engine optimized for cloud computing to invoke these algorithms. Experimental results show that the graph mining algorithm components in BPGM are efficient and perform better than the big cloud-based parallel data miner and BC-BSP.
2014, Issue 01, v.19, pp. 33-38
- Yaxiong Zhao;Jie Wu;Cong Liu;
The buzzword "big data" refers to large-scale distributed data processing applications that operate on exceptionally large amounts of data. Google's MapReduce and Apache's Hadoop, its open-source implementation, are the de facto software systems for big-data applications. One observation about the MapReduce framework is that it generates a large amount of intermediate data. This abundant information is thrown away after the tasks finish, because MapReduce is unable to utilize it. In this paper, we propose Dache, a data-aware cache framework for big-data applications. In Dache, tasks submit their intermediate results to the cache manager. A task queries the cache manager before executing the actual computing work. A novel cache description scheme and a cache request and reply protocol are designed. We implement Dache by extending Hadoop. Testbed experiment results demonstrate that Dache significantly improves the completion time of MapReduce jobs.
2014, Issue 01, v.19, pp. 39-50
- Yaxiong Zhao;Jie Wu;Cong Liu;
Data Center Networks (DCNs) are the fundamental infrastructure for cloud computing. Driven by the massive parallel computing tasks in cloud computing, one-to-many data dissemination has become one of the most important traffic patterns in DCNs. Many architectures and protocols have been proposed to meet this demand. However, these proposals either require complicated configurations on switches and servers, or cannot deliver optimal performance. In this paper, we propose peer-assisted data dissemination for DCNs. This approach utilizes the rich physical connections with high bandwidths and multi-path connections to facilitate efficient one-to-many data dissemination. We prove that an optimal P2P data dissemination schedule exists for FatTree, a specially designed DCN architecture. We then present a theoretical analysis of this algorithm in the general multi-rooted tree topology, a widely used DCN architecture. Additionally, we explore the performance of an intuitive line structure for data dissemination. Our analysis and experimental results show that this simple structure is able to deliver performance comparable to that of the optimal algorithm. Since DCN applications rely heavily on virtualization to achieve optimal resource sharing, we present a general implementation method for the proposed algorithms, which aims to mitigate the impact of the potentially high churn rate of the virtual machines.
2014, Issue 01, v.19, pp. 51-64
- Xiaolin Xu;Hai Jin;Song Wu;Lixiang Tang;Yihong Wang;
To satisfy the rapid growth of cloud technologies, a large number of web applications have been developed and deployed, and these applications are being run in clouds. Due to the scalability provided by clouds, a single web application may be concurrently visited by several millions or billions of users. Thus, the testing and performance evaluation of these applications are increasingly important. User-model-based evaluations can significantly reduce the manual work required, and can enable us to determine the performance of applications under real runtime environments. Hence, they have become one of the most popular evaluation methods in both industry and academia. Significant efforts have focused on building different kinds of models by mining web access logs, such as Markov models and the Customer Behavior Model Graph (CBMG). This paper proposes a new kind of model, named the User Representation Model Graph (URMG), which is built based on CBMG. It uses an algorithm to refine CBMG and optimizes the evaluation execution process. Based on this model, an automatic testing and evaluation system for web applications is designed, implemented, and deployed in our test cloud, which is able to execute all of the analysis and testing operations using only web access logs. In our system, the error rate caused by random access to applications in the execution phase is also reduced, and the results show that the error rate of the evaluation that depends on URMG is 50% less than that of the evaluation that depends on CBMG.
2014, Issue 01, v.19, pp. 65-75
- Jie Chen;Shu Zhao;Yanping Zhang;
The concept of deep learning has been applied to many domains, but the definition of a suitable problem depth has not been sufficiently explored. In this study, we propose a new Hierarchical Covering Algorithm (HCA) method to determine the levels of a hierarchical structure based on the Covering Algorithm (CA). The CA constructs neural networks based on samples' own characteristics, and can effectively handle multi-category classification and large-scale data. Further, we abstract characters based on the CA to automatically embody the feature of a deep structure. We apply CA to construct hidden nodes at the lower level, and define a fuzzy equivalence relation R on upper spaces to form a hierarchical architecture based on fuzzy quotient space theory. The covering tree arises naturally from R. HCA experiments performed on the MNIST dataset show that the covering tree embodies the deep architecture of the problem, and the effects of a deep structure are shown to be better than having a single level.
2014, Issue 01, v.19, pp. 76-81
- Zhen Chen;Wenyu Dong;Hang Li;Peng Zhang;Xinming Chen;Junwei Cao;
A data center is an infrastructure that supports Internet service. Cloud computing is rapidly changing the face of the Internet service infrastructure, enabling even small organizations to quickly build Web and mobile applications for millions of users by taking advantage of the scale and flexibility of shared physical infrastructures provided by cloud computing. In this scenario, multiple tenants save their data and applications in shared data centers, blurring the network boundaries between each tenant in the cloud. In addition, different tenants have different security requirements, so different security policies are necessary for different tenants. Network virtualization is used to meet a diverse set of tenant-specific requirements with the underlying physical network, enabling multi-tenant data centers to automatically address a large and diverse set of tenant requirements. In this paper, we propose the system implementation of vCNSMS, a collaborative network security prototype system used in a multi-tenant data center. We demonstrate vCNSMS with a centralized collaborative scheme and deep packet inspection with an open source UTM system. A security-level-based protection policy is proposed for simplifying the security rule management for vCNSMS. Different security levels have different packet inspection schemes and are enforced with different security plugins. A smart packet verdict scheme is also integrated into vCNSMS for intelligent flow processing to protect against possible network attacks inside a data center network.
2014, Issue 01, v.19, pp. 82-94
- Wenliang Huang;Zhen Chen;Wenyu Dong;Hang Li;Bin Cao;Junwei Cao;
China Unicom, the largest WCDMA 3G operator in China, faces the demands of the historic Mobile Internet Explosion, that is, the surge of Mobile Internet Traffic from mobile terminals. According to the internal statistics of China Unicom, mobile user traffic has increased rapidly with a Compound Annual Growth Rate (CAGR) of 135%. China Unicom currently stores more than 2 trillion records per month, with a data volume of over 525 TB, and the highest data volume has reached a peak of 5 PB. Since October 2009, China Unicom has been developing a home-brewed big data storage and analysis platform based on the open source Hadoop Distributed File System (HDFS), as it has a long-term strategy to make full use of this Big Data. All Mobile Internet Traffic is well served using this big data platform. Currently, the writing speed has reached 1 390 000 records per second, and the record retrieval time in a table that contains trillions of records is less than 100 ms. To take advantage of this opportunity to be a Big Data Operator, China Unicom has developed new functions and multiple innovations to solve the space and time constraint challenges presented in data processing. In this paper, we introduce our big data platform in detail. Based on this big data platform, China Unicom is building an industry ecosystem around Mobile Internet Big Data, and considers that a telecom-operator-centric ecosystem can be formed that is critical to prosperity in the modern communications business.
2014, Issue 01, v.19, pp. 95-101
Manuscripts are invited for a Special Issue on Parameterized Complexity to be published in Tsinghua Science and Technology. The field of parameterized complexity is a vibrant research area and has witnessed tremendous growth in the last two decades. It has become a central area in theoretical computer science, with applications to bioinformatics, artificial intelligence, and many other areas. The Special Issue invites regular and survey papers pertaining to parameterized complexity, for example, design and analysis of parameterized and kernelization algorithms, complexity and lower bounds, and connections between parameterized complexity and approximation. Other topics in parameterized complexity are also welcome.
2014, Issue 01, v.19, p. 102
The publication of Tsinghua Science and Technology was started in 1996. Since then, it has been an international academic journal sponsored by Tsinghua University and published bimonthly. This journal aims at presenting state-of-the-art scientific achievements in computer science and other IT fields, and is currently indexed by EI and other abstracting indices. The journal is available in the IEEE Xplore Digital Library with an open access model: http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=5971803.
2014, Issue 01, v.19, p. 103
The publication of Tsinghua Science and Technology was started in 1996. It is an international academic journal sponsored by Tsinghua University and is published bimonthly. This journal aims at presenting up-to-date scientific achievements in computer science, electronic engineering, and other IT fields. It is indexed by EI and other abstracting indexes. Since 2012, the journal has been included in the IEEE Xplore Digital Library, and all papers published there are freely downloadable.
2014, Issue 01, v.19, p. 104
Tsinghua Science and Technology (Tsinghua Sci Technol), an academic journal sponsored by Tsinghua University, is published bimonthly. This journal aims at presenting up-to-date scientific achievements with high creativity and great significance in computer and electronic engineering. Contributions from all over the world are welcome. Tsinghua Sci Technol is indexed by IEEE Xplore, Engineering Index (Ei, USA), INSPEC, SA, Cambridge Abstract, and other abstracting indexes. Manuscripts are selected for publication according to the editorial assessment of their suitability and evaluation from independent reviewers. Papers are usually sent to two or more reviewers, including one reviewer from outside China. Editorial staff will edit accepted papers to improve accuracy and clarity, and shorten them if necessary.
2014, Issue 01, v.19, p. 105