Topic Discovery Method for Stock Bar Forums Based on the Integration of Frequent Itemsets and Latent Semantic Analysis
Received: 2018-05-01  Revised: 2019-02-26
DOI:10.11908/j.issn.0253-374x.2019.04.019     Manuscript No.:    CLC number: TP391
 
Abstract
      To discover topics in stock forums, this paper proposes a short-text clustering framework that combines frequent itemsets with latent semantics (STC_FL). Knowledge acquisition based on HowNet first yields a concept vector space, from which important frequent itemsets are mined and filtered. A method that combines statistics with latent semantics then clusters these important frequent itemsets self-adaptively. Finally, the TSC SN (text soft classifying based on similarity threshold and non-overlapping) algorithm is proposed, and a parameter-tuning strategy is used to select and control the soft clustering of texts. An empirical analysis of real stock bar forum data shows that the STC_FL framework and the TSC SN algorithm can fully exploit the latent semantic information of the texts and effectively reduce the dimensionality of the feature space, which enables deeper information mining and topic classification of short texts.
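The abstract's first and last stages (mining frequent itemsets, then soft-assigning texts under a similarity threshold) can be illustrated with a minimal Python sketch. This is not the authors' STC_FL or TSC SN implementation: the HowNet concept mapping, the latent semantic analysis step, and the non-overlapping control are all omitted, and the function names, toy tokens, support count, and coverage threshold are assumptions chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=2):
    """Count every token set of size 1..max_size and keep the frequent ones.

    Brute-force counting, adequate only for short texts (hypothetical helper).
    """
    counts = Counter()
    for tokens in docs:
        unique = sorted(set(tokens))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def soft_assign(docs, itemsets, threshold):
    """Put each text into every cluster whose itemset it covers well enough.

    Coverage = share of the itemset's tokens found in the text. A text may
    land in several clusters, which is what makes the assignment "soft".
    """
    clusters = {s: [] for s in itemsets}
    for i, tokens in enumerate(docs):
        words = set(tokens)
        for s in itemsets:
            if len(words & s) / len(s) >= threshold:
                clusters[s].append(i)
    return clusters


# Toy, already-tokenized posts (invented data, not from the paper).
docs = [
    ["market", "rally", "bank"],
    ["bank", "dividend"],
    ["market", "rally"],
    ["dividend", "bank", "rally"],
]

freq = frequent_itemsets(docs, min_support=2)
# Use the frequent pairs as candidate topics (an assumption of this sketch).
topics = [s for s in freq if len(s) == 2]
clusters = soft_assign(docs, topics, threshold=1.0)
```

With these toy posts, post 0 falls under both the {market, rally} and {bank, rally} topics, which shows the overlap that a soft clustering scheme has to control through its threshold.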

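Copyright: Journal of Tongji University (Natural Science)
Supervised by: Ministry of Education  Sponsored by: Tongji University
Address: 1239 Siping Road, Shanghai  Postal code: 200092  Tel: 021-65982344  E-mail: zrxb@tongji.edu.cn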