- You Liang;Yuqing Lan;
The microservices architecture has been proposed to overcome the drawbacks of the traditional monolithic architecture. Scalability is one of the most attractive features of microservices. Scaling in the microservices architecture requires the scaling of specified services only, rather than the entire application. Scaling services can be achieved by deploying the same service multiple times on different physical machines. However, problems with load balancing may arise. Most existing load-balancing solutions for microservices focus on individual tasks and ignore dependencies between these tasks. In the present paper, we propose TCLBM, a task chain-based load balancing algorithm for microservices. When an Application Programming Interface (API) request is received, TCLBM chooses target services for all tasks of this API call and achieves load balancing by evaluating the system resource usage of each service instance. TCLBM reduces the API response time by reducing data transmissions between physical machines. We use three heuristic algorithms, namely, Particle Swarm Optimization (PSO), Simulated Annealing (SA), and Genetic Algorithm (GA), to implement TCLBM, and comparison results reveal that GA performs best. Our findings show that TCLBM achieves load balancing among service instances and reduces API response times by up to 10% compared with existing methods.
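To make the task-chain idea concrete, here is a minimal sketch. It is not the authors' TCLBM implementation: the cost function, its weights, the sample data, and all names (`Instance`, `chain_cost`, `best_assignment`) are illustrative assumptions, and an exhaustive search stands in for the PSO, SA, and GA heuristics named in the abstract. It only shows why choosing instances for the whole chain at once can keep consecutive tasks on one machine, whereas picking the least-loaded instance per task would not.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Instance:
    service: str   # microservice this instance runs
    machine: str   # physical machine hosting the instance
    load: float    # current resource usage in [0, 1] (hypothetical metric)


def chain_cost(assignment, transfer_cost=1.0, load_weight=1.0):
    """Assumed cost: cross-machine hops along the chain plus summed instance load."""
    hops = sum(a.machine != b.machine for a, b in zip(assignment, assignment[1:]))
    load = sum(inst.load for inst in assignment)
    return transfer_cost * hops + load_weight * load


def best_assignment(chain, instances, **weights):
    """Pick one instance per task by exhaustive search (fine for tiny chains only)."""
    candidates = [[i for i in instances if i.service == s] for s in chain]
    return min(product(*candidates), key=lambda a: chain_cost(a, **weights))


instances = [
    Instance("auth", "m1", 0.2),
    Instance("auth", "m2", 0.1),
    Instance("orders", "m1", 0.3),
    Instance("orders", "m2", 0.6),
    Instance("billing", "m1", 0.4),
    Instance("billing", "m3", 0.1),
]
chain = ["auth", "orders", "billing"]

chain_aware = best_assignment(chain, instances)
per_task = best_assignment(chain, instances, transfer_cost=0.0)
print([i.machine for i in chain_aware])  # chain-aware choice
print([i.machine for i in per_task])     # load-only choice, ignores data transfer
```

With these made-up numbers, the chain-aware choice keeps all three tasks on `m1`, while the load-only choice spreads them over three machines. A real system would replace the exhaustive search with a heuristic once chains and instance counts grow.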
2021, Issue 03, Vol. 26, pp. 251-258
- Pingchuan Ma;Bo Jiang;Zhigang Lu;Ning Li;Zhengwei Jiang;
Network texts have become important carriers of cybersecurity information on the Internet. These texts include the latest security events such as vulnerability exploitations, attack discoveries, advanced persistent threats, and so on. Extracting cybersecurity entities from these unstructured texts is a critical and fundamental task in many cybersecurity applications. However, most Named Entity Recognition (NER) models are suitable only for general fields, and there has been little research focusing on cybersecurity entity extraction in the security domain. To this end, in this paper, we propose a novel cybersecurity entity identification model based on Bidirectional Long Short-Term Memory with Conditional Random Fields (Bi-LSTM with CRF) to extract security-related concepts and entities from unstructured text. This model, which we have named XBiLSTM-CRF, consists of a word-embedding layer, a bidirectional LSTM layer, and a CRF layer, and concatenates X input with bidirectional LSTM output. Via extensive experiments on an open-source dataset containing an office security bulletin, security blogs, and the Common Vulnerabilities and Exposures list, we demonstrate that XBiLSTM-CRF achieves better cybersecurity entity extraction than state-of-the-art models.
2021, Issue 03, Vol. 26, pp. 259-265
- Yan Jiang;Wei Liu;Xuanhua Shi;Weizhong Qiang;
Docker, as a mainstream container solution, adopts the Copy-on-Write (CoW) mechanism in its storage drivers. This mechanism satisfies the need of different containers to share the same image. However, when a single container performs operations such as modification of an image file, a duplicate is created in the upper read-write layer, which contributes to the runtime overhead. When the accessed image file is fairly large, this additional overhead becomes non-negligible. Here we present the concept of Dynamic Prefetching Strategy Optimization (DPSO), which optimizes the CoW mechanism for a Docker container on the basis of the dynamic prefetching strategy. At the beginning of the container life cycle, DPSO pre-copies up the image files that are most likely to be copied up later to eliminate the overhead caused by performing this operation during application runtime. The experimental results show that DPSO has an average prefetch accuracy of greater than 78% in complex scenarios and could effectively eliminate the overhead caused by the CoW mechanism.
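The copy-up cost and the effect of prefetching can be illustrated with a toy model. This is a sketch under stated assumptions, not Docker's storage-driver code and not the DPSO implementation: `LayeredStore`, `prefetch_accuracy`, the file names, and the sizes are all hypothetical, the list of files to prefetch is hard-coded because the abstract does not say how DPSO predicts it, and "accuracy" is defined here as the share of prefetched files that are later written, which may differ from the paper's definition.

```python
class LayeredStore:
    """Toy model: a shared read-only image layer under a per-container writable layer."""

    def __init__(self, image_files):
        self.lower = dict(image_files)  # file name -> size in bytes (image layer)
        self.upper = {}                 # files already present in the writable layer
        self.runtime_copy_bytes = 0     # copy-up work paid while the application runs

    def prefetch(self, names):
        """Copy files up at container start, before the application runs."""
        for name in names:
            if name in self.lower and name not in self.upper:
                self.upper[name] = self.lower[name]

    def write(self, name):
        """Modify a file; the first write to an image file triggers a copy-up."""
        if name not in self.upper:
            size = self.lower.get(name, 0)  # a brand-new file has nothing to copy
            self.upper[name] = size
            self.runtime_copy_bytes += size


def prefetch_accuracy(prefetched, written):
    """Share of prefetched files that were later written (assumed definition)."""
    prefetched = set(prefetched)
    return len(prefetched & set(written)) / len(prefetched) if prefetched else 0.0


image = {"db.bin": 500, "app.conf": 2, "lib.so": 80}
writes = ["db.bin", "app.conf", "db.bin"]
predicted = ["db.bin", "lib.so"]  # hard-coded guess standing in for a real predictor

plain = LayeredStore(image)
for name in writes:
    plain.write(name)

tuned = LayeredStore(image)
tuned.prefetch(predicted)
for name in writes:
    tuned.write(name)

print(plain.runtime_copy_bytes, tuned.runtime_copy_bytes)
print(prefetch_accuracy(predicted, writes))
```

In this made-up run, prefetching moves the large `db.bin` copy out of the runtime path, while the wrongly predicted `lib.so` costs start-up work that is never used. That trade-off is why prefetch accuracy matters.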
2021, Issue 03, Vol. 26, pp. 266-274
- Qinchen Cao;Weilin Zhang;Yonghua Zhu;
The cartoon animation industry has developed into a huge industrial chain with a large potential market involving games, digital entertainment, and other industries. However, due to the coarse-grained classification of cartoon materials, cartoon animators can hardly find relevant materials during the process of creation. The polar emotions of cartoon materials are an important reference for creators as they can help them easily obtain the pictures they need. Some methods for obtaining the emotions of cartoon pictures have been proposed, but most of these focus on expression recognition. Meanwhile, other emotion recognition methods are not ideal for use on cartoon materials. We propose a deep learning-based method to classify the polar emotions of cartoon pictures in the "Moe" drawing style. According to the expression features of the cartoon characters in this drawing style, we recognize the facial expressions of cartoon characters and extract the scene and facial features of the cartoon images. Then, we correct the emotions of the pictures obtained by the expression recognition according to the scene features. Finally, we obtain the polar emotions of the corresponding picture. We designed a dataset and performed verification tests on it, achieving 81.9% experimental accuracy. The experimental results show that our method is competitive.
2021, Issue 03, Vol. 26, pp. 275-286
- Zhang Yang;Aiqing Zhang;Zeyao Mo;
The Distributed Shared Memory (DSM) architecture is widely used in today's computer design to mitigate the ever-widening processing-memory gap, and it inevitably exhibits Non-Uniform Memory Access (NUMA) to shared-memory parallel applications. Failure to adapt to the NUMA effect can significantly degrade application performance, especially on today's manycore platforms with tens to hundreds of cores. However, traditional approaches such as first-touch and memory policy fall short in false page-sharing, fragmentation, or ease of use. In this paper, we propose a partitioned shared-memory approach that allows multithreaded applications to achieve full NUMA-awareness with only minor code changes and develop an accompanying NUMA-aware heap manager which eliminates false page-sharing and minimizes fragmentation. Experiments on a 256-core cc-NUMA computing node show that the proposed approach helps applications to adapt to NUMA with only minor code changes and improves the performance of typical multithreaded scientific applications by up to 4.3-fold with the increased use of cores.
2021, Issue 03, Vol. 26, pp. 287-295
- Jianjiang Li;Jie Lin;Panpan Du;Kai Zhang;Jie Wu;
The radiation damage effect of key structural materials is one of the main research subjects of the numerical reactor. From the perspective of experimental safety and feasibility, Molecular Dynamics (MD) in the materials field is an ideal method for simulating the radiation damage of structural materials. The Crystal-MD represents a massive parallel MD simulation software based on the key material characteristics of reactors. Compared with the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) and ITAP Molecular Dynamics (IMD) software, the Crystal-MD reduces the memory required for software operation to a certain extent, but it is very time-consuming. Moreover, the calculation results of the Crystal-MD have large deviations, and there are also some problems, such as memory limitation and frequent communication during its migration and optimization. In this paper, in order to solve the above problems, the memory access mode of the Crystal-MD software is studied. Based on the memory access mode, a memory access optimization strategy is proposed for the unique architecture of China's supercomputer Sunway TaihuLight. The proposed optimization strategy is verified by experiments, and the experimental results show that the running speed of the Crystal-MD is increased significantly by using the proposed optimization strategy.
2021, Issue 03, Vol. 26, pp. 296-308
- Jianjiang Li;Baixue Ji;Yun Yang;Peng Wei;Jie Wu;
The Kinetic Monte Carlo (KMC) method is one of the commonly used methods for simulating radiation damage of materials. Our team has developed a parallel KMC software named Crystal-KMC, which supports the Embedded Atom Method (EAM) potential energy and utilizes the Message Passing Interface (MPI) technology to simulate the vacancy transition of the Copper (Cu) element under neutron radiation. To make better use of the computing power of modern supercomputers, we develop a parallel efficiency optimization model for the Crystal-KMC on Tianhe-2, to achieve a larger simulation of the damage process of materials under an irradiation environment. First, we analyze the performance bottleneck of the Crystal-KMC software and use the MIC offload statement to run the key modules of the software on the MIC coprocessor. We then use OpenMP to develop parallel optimization for the Crystal-KMC, combined with existing MPI inter-process communication optimization, finally achieving hybrid parallel optimization. The experimental results show that in the single-node CPU and MIC collaborative parallel mode, the speedup of the calculation hotspot reaches 30.1, and the speedup of the overall software reaches 7.43.
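The two speedup figures can be related through Amdahl's law. The sketch below is a back-of-the-envelope reading, not a result from the paper: it assumes that only the hotspot is accelerated and the rest of the run time is unchanged, which the abstract does not state. The function names are illustrative. Under that assumption, with hotspot speedup $s$ and overall speedup $S$, the hotspot's share $f$ of the original run time satisfies $S = 1 / ((1 - f) + f / s)$.

```python
def amdahl_speedup(fraction, hotspot_speedup):
    """Overall speedup when a `fraction` of run time is sped up by `hotspot_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / hotspot_speedup)


def hotspot_fraction(hotspot_speedup, overall_speedup):
    """Invert Amdahl's law: share of original run time spent in the hotspot."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / hotspot_speedup)


# Figures quoted in the abstract: hotspot speedup 30.1, overall speedup 7.43.
f = hotspot_fraction(30.1, 7.43)
print(f"implied hotspot share of original run time: {f:.3f}")
print(f"upper bound on overall speedup for that share: {1.0 / (1.0 - f):.2f}")
```

Under this assumption the hotspot would account for roughly 90% of the original run time, so the remaining serial part, rather than the hotspot itself, would limit further overall gains.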
2021, Issue 03, Vol. 26, pp. 309-321
- Xiang Fei;Youhui Zhang;Weimin Zheng;
Resistive Random Access Memory (ReRAM)-based neural network accelerators have the potential to surpass their digital counterparts in computational efficiency and performance. However, the design of these accelerators faces a number of challenges, including imperfections of the ReRAM device and the large amount of calculation required to simulate those imperfections accurately. We present XB-SIM, a simulation framework for ReRAM-crossbar-based Convolutional Neural Network (CNN) accelerators. XB-SIM can be flexibly configured to simulate the accelerator's structure and clock-driven behaviors at the architecture level. This framework also includes a ReRAM-aware Neural Network (NN) training algorithm and a CNN-oriented mapper to train an NN and map it onto the simulated design efficiently. The behavior of the simulator has been verified by the corresponding circuit simulation of a real chip. Furthermore, a batch processing mode for the massive calculations that are required to mimic the behavior of ReRAM-crossbar circuits is proposed to fully exploit the computational concurrency of the mapping strategy. On CPU/GPGPU, this batch processing mode can improve the simulation speed by up to 5.02× or 34.29×, respectively. Within this framework, comprehensive architectural exploration and end-to-end evaluation have been achieved, which provide some insights for systemic optimization.
2021, Issue 03, Vol. 26, pp. 322-334
- Jianqiang Huang;Wei Xue;Haodong Bian;Wenxin Yan;Xiaoying Wang;Wenguang Chen;
Despite efficient parallelism in the solution of physical parameterization in the Global/Regional Assimilation and Prediction System (GRAPES), the Helmholtz equation in the dynamic core can hardly achieve sufficient parallelism as resolution increases, owing to the large amount of communication and irregular access in the solving process. Optimizing the Helmholtz equation solution for better performance and higher efficiency has therefore become an urgent task. An optimization scheme for the parallel solution of the Helmholtz equation is proposed in this paper. Specifically, the geometrical multigrid optimization strategy is designed by taking advantage of the data anisotropy of grid points near the pole and the isotropy of those near the equator in the Helmholtz equation, and the Incomplete LU (ILU) decomposition preconditioner is adopted to speed up the convergence of the improved Generalized Conjugate Residual (GCR) method, which effectively reduces the number of iterations and the computation time. The overall solving performance of the Helmholtz equation is improved by thread-level parallelism, vectorization, and reuse of data in the cache. The experimental results show that the proposed optimization scheme can effectively eliminate the bottleneck of the Helmholtz equation as regards the solving speed. In tests on a 10-node two-way server, the solution of the Helmholtz equation is accelerated by 100× compared with the original serial version, with one-third fewer iterations.
2021, Issue 03, Vol. 26, pp. 335-346
- Wenjie Liu;Ke Zhou;Ping Huang;Tianming Yang;Xubin He;
DRAM-based memory suffers from increasing row buffer conflicts, which cause significant performance degradation and power consumption. As memory capacity increases, the overheads of row buffer conflicts grow worse because bitlines become longer, which results in high row activation and precharge latencies. In this work, we propose a practical approach called Row Buffer Cache (RBC) to mitigate row buffer conflict overheads efficiently. At the core of our proposed RBC architecture, the rows with good spatial locality are cached and protected, so that they are not interrupted by accesses to rows with poor locality. Such an RBC architecture significantly reduces the performance and energy overheads caused by row activation and precharge, and thus improves overall system performance and energy efficiency. We evaluate the RBC architecture using SPEC CPU2006 on a DDR4 memory compared to a commodity baseline memory system. Results show that RBC improves the overall performance by up to 2.24× (16.1% on average) and reduces the memory energy by up to 68.2% (23.6% on average) for single-core simulations. For multi-core simulations, RBC increases the overall performance by up to 1.55× (17% on average) and reduces memory energy consumption by up to 35.4% (21.3% on average).
2021, Issue 03, Vol. 26, pp. 347-360
- Ruibo Wang;Kai Lu;Juan Chen;Wenzhe Zhang;Jinwen Li;Yuan Yuan;Pingjing Lu;Libo Huang;Shengguo Li;Xiaokang Fan;
Facing the challenges of next-generation exascale computing, the National University of Defense Technology has developed a prototype system to explore opportunities, solutions, and limits toward the next-generation Tianhe system. This paper briefly introduces the prototype system, which is deployed at the National Supercomputer Center in Tianjin and has a theoretical peak performance of 3.15 Pflops. The system has 512 compute nodes, each with three proprietary CPUs called Matrix-2000+. The system memory is 98.3 TB, and the storage is 1.4 PB in total.
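The headline figures can be broken down per node and per CPU with simple arithmetic. The sketch below rests on assumptions the abstract does not state: that the peak performance comes entirely from the Matrix-2000+ CPUs, that memory is spread evenly across nodes, and that TB means decimal terabytes. The derived values are therefore rough estimates, not specifications.

```python
PEAK_FLOPS = 3.15e15    # 3.15 Pflops theoretical peak (from the abstract)
NODES = 512             # compute nodes (from the abstract)
CPUS_PER_NODE = 3       # Matrix-2000+ CPUs per node (from the abstract)
MEMORY_BYTES = 98.3e12  # 98.3 TB, assuming decimal terabytes

cpus = NODES * CPUS_PER_NODE
flops_per_cpu = PEAK_FLOPS / cpus          # assumes all peak comes from the CPUs
flops_per_node = PEAK_FLOPS / NODES
memory_per_node_gb = MEMORY_BYTES / NODES / 1e9  # assumes an even split

print(f"CPUs in total:       {cpus}")
print(f"Peak per CPU:        {flops_per_cpu / 1e12:.2f} Tflops")
print(f"Peak per node:       {flops_per_node / 1e12:.2f} Tflops")
print(f"Memory per node:     {memory_per_node_gb:.0f} GB")
```

Under these assumptions the system has 1536 CPUs at about 2 Tflops each, and about 192 GB of memory per node.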
2021, Issue 03, Vol. 26, pp. 361-369
- Juan Chen;Xinxin Qi;Feihao Wu;Jianbin Fang;Yong Dong;Yuan Yuan;Zheng Wang;Keqin Li;
Achieving faster performance without increasing power and energy consumption for computing systems is an outstanding challenge. This paper develops a novel resource allocation scheme for memory-bound applications running on High-Performance Computing (HPC) clusters, aiming to improve application performance without breaching peak power constraints and total energy consumption. Our scheme estimates how the number of processor cores and the CPU frequency setting affect the application performance. It then uses the estimate to provide additional compute nodes to memory-bound applications if it is profitable to do so. We implement and apply our algorithm to 12 representative benchmarks from the NAS parallel benchmark and HPC Challenge (HPCC) benchmark suites and evaluate it on a representative HPC cluster. Experimental results show that our approach can effectively mitigate memory contention to improve application performance, and it achieves this without significantly increasing the peak power and overall energy consumption. Our approach obtains on average 12.69% performance improvement over the default resource allocation strategy, but uses 7.06% less total power, which translates into 17.77% energy savings.
2021, Issue 03, Vol. 26, pp. 370-383