Load Shedding Methods for Big Data Stream with Sparsity
Received: 2019-02-21  Revised: 2020-01-13
DOI: 10.11908/j.issn.0253-374x.19054     Manuscript No.:    CLC Number: TP338
 
Abstract
      Improving the accuracy of load shedding while preserving real-time performance is an important problem in big data stream processing. Sparsity is a widespread feature of big data streams, so we study load shedding for sparse big data streams in two typical scenarios. For ordinary uniform workloads, we model the stream in a high-dimensional space, use an elastic distance to scale the distances between data items, and propose a load shedding method based on eccentricity. For anomaly detection workloads, we analyze the characteristics of the stream, describe the dynamic processing of the data with a preprocessing automaton, and propose a load shedding method based on equivalence class partitioning. The partition relies on a combined similarity that accounts for both data similarity and processing behavior similarity. Repeated experiments show that the two proposed methods markedly improve shedding accuracy compared with conventional load shedding methods.
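The abstract does not give the formulas behind the combined similarity or the partitioning rule, so the following is only an illustrative sketch under stated assumptions, not the paper's method. It assumes cosine similarity for sparse data vectors, Jaccard similarity over sets of processing steps as a stand-in for processing behavior similarity, a hypothetical weight `alpha` to combine the two, and a greedy threshold grouping in place of the equivalence class partition. All names (`combined_similarity`, `shed`, `threshold`, `alpha`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {index: value} dicts."""
    dot = sum(val * v.get(i, 0.0) for i, val in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def jaccard(a, b):
    """Jaccard similarity of two sets of processing steps."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def combined_similarity(x, y, alpha=0.5):
    # Assumption: a weighted sum; the paper's actual combination is not stated.
    data_sim = cosine(x["vec"], y["vec"])
    behavior_sim = jaccard(x["steps"], y["steps"])
    return alpha * data_sim + (1 - alpha) * behavior_sim


def shed(stream, threshold=0.8, alpha=0.5):
    """Keep one representative per group; shed items close to a kept one."""
    kept = []
    for item in stream:
        if all(combined_similarity(item, rep, alpha) < threshold for rep in kept):
            kept.append(item)
    return kept


stream = [
    {"vec": {0: 1.0, 5: 2.0}, "steps": {"parse", "filter"}},
    {"vec": {0: 1.0, 5: 2.1}, "steps": {"parse", "filter"}},  # near-duplicate
    {"vec": {3: 4.0}, "steps": {"parse", "alert"}},
]
kept = shed(stream)
print(len(kept), "of", len(stream), "items kept")
```

Note that a similarity threshold is not transitive, so the greedy grouping here only approximates a true equivalence relation; it is meant to show how data and behavior similarity can jointly decide which items are redundant enough to shed.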
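Copyright: Journal of Tongji University (Natural Science)
Supervised by: Ministry of Education  Sponsored by: Tongji University
Address: 1239 Siping Road, Shanghai  Postal code: 200092  Tel: 021-65982344  E-mail: zrxb@tongji.edu.cn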