MIX-RS: A Multi-Indexing System Based on HDFS for Remote Sensing Data Storage
Jiashu Wu; Jingpan Xiong; Hao Dai; Yang Wang; Chengzhong Xu
Abstract:
A large volume of Remote Sensing (RS) data has been generated with the deployment of satellite technologies. The data facilitate research in ecological monitoring, land management, desertification, etc. The characteristics of RS data (e.g., enormous volume, large single-file size, and demanding fault-tolerance requirements) make the Hadoop Distributed File System (HDFS) an ideal choice for RS data storage, as it is efficient, scalable, and equipped with a data replication mechanism for failure resilience. To use RS data, one of the most important techniques is geospatial indexing. However, the large data volume makes geospatial indices time-consuming to construct and leverage. Considering that most modern geospatial data centres are equipped with HDFS-based big data processing infrastructures, deploying multiple geospatial indices becomes a natural way to optimise efficacy. Moreover, because of the reliability introduced by high-quality hardware and the infrequently modified nature of RS data, the use of multi-indexing will not cause large overhead. Therefore, we design a framework called Multi-IndeXing-RS (MIX-RS) that unifies the multi-indexing mechanism on top of the HDFS with data replication enabled for both fault tolerance and geospatial indexing efficiency. Given the fault tolerance provided by the HDFS, RS data are structurally stored inside for faster geospatial indexing. Additionally, multi-indexing enhances efficiency. The proposed technique naturally sits on top of the HDFS to form a holistic framework without incurring severe overhead or sophisticated system implementation efforts. The MIX-RS framework is implemented and evaluated using real remote sensing data provided by the Chinese Academy of Sciences, demonstrating excellent geospatial indexing performance.
Keywords:
Funding: Supported in part by the Key-Area Research and Development Program of Guangdong Province (No. 2020B010164002) and the Fundamental Research Foundation of Shenzhen Technology and Innovation Council (No. KCXFZ20201221173613035).