Application of Clustering and Classification Methods in the Analysis of Financial Fraud at Listed Companies

Updated: 2024-01-15
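Abstract (translated from Chinese): This paper applies data mining methods to the financial data of listed companies to analyze financial fraud. Guided by financial fraud theory, 28 indicators are selected across six aspects: profitability, asset structure, efficiency, cash flow, liquidity, and growth. A correlation-based feature selection algorithm (CFS) is used to filter the indicators. The resulting feature subsets are clustered with the K-means and EM (expectation-maximization) algorithms. The clustering shows that fraudulent companies generally have a main business revenue growth rate below the mean of the other clusters, a quick ratio far above the mean of the other clusters, and net assets per share above the mean of the other clusters. Training samples for the classifier are then chosen according to the clustering results, which improves the accuracy of fraud identification for listed companies: accuracy reaches 85.7% on the training samples and 79.6% on the test samples.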
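The correlation-based feature selection step can be illustrated with a minimal sketch of the usual CFS merit score, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. This is not the paper's implementation: using the absolute Pearson correlation as the correlation measure is a simplifying assumption, and the names `pearson`, `cfs_merit`, `growth`, `noise`, and `labels` as well as the toy values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(vx * vy)

def cfs_merit(columns, labels):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(columns)
    # Mean absolute correlation between each feature and the class label.
    r_cf = sum(abs(pearson(c, labels)) for c in columns) / k
    if k == 1:
        return r_cf
    # Mean absolute correlation over all distinct feature pairs.
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

# Made-up toy values (not the paper's data): 1 = fraud, 0 = non-fraud.
labels = [1, 1, 0, 0]
growth = [0.1, 0.2, 0.8, 0.9]   # hypothetical revenue growth rates
noise = [1, 0, 1, 0]            # a feature unrelated to the label

print(cfs_merit([growth], labels))
print(cfs_merit([growth, noise], labels))
```

The score rewards subsets whose features correlate with the class but not with each other, so adding an uninformative or redundant indicator does not raise it. A search strategy then compares candidate subsets by this score.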

Abstract (English): Fraudulent financial statements have a serious impact on the capital market, the securities market, and investors. Preventing firms from issuing fraudulent financial statements is a necessary and meaningful task, and improving the ability to identify such statements is an effective way to address it. However, the means of fraud these firms use are becoming increasingly diverse, so fraudulent financial statements are increasingly difficult to identify. As data mining technology matures and becomes more widely applied, data mining techniques have become an alternative way to analyze financial statements, because they can exploit the abundant hidden information in the data. Clustering can be used to analyze the statements, compare results across different settings, and explain the results in terms of their practical meaning. Classification can be used to estimate the probability of fraud. This paper uses data mining techniques to analyze financial statements and covers the following aspects:

1. Data Preparation
The data used in this paper were downloaded from the Wind website. Our team collected balance sheets, income statements, cash flow statements, interim reports, quarterly reports, financial summaries, and daily stock data. These data exist as Excel files; each firm has many kinds of reports, and each report has a different format. We built the indicator database with a VB data integration program and an indicator computation program, and built the database of financial fraud characteristics by collecting data from the website of C. Indicator data were prepared with a VB extraction program, which makes extracting sample data convenient.

2. Financial Indicator Selection
We chose 28 indicators covering profitability, asset structure, efficiency, cash flow, liquidity, and growth. Indicator selection was guided by the following principles. First, the indicators should be consistent with the theory of fraudulent financial statements.
Second, the selected indicators must reflect all aspects of a firm. Third, the availability of the selected indicators must be considered. After indicator selection, we applied a correlation-based feature selection algorithm with a genetic algorithm as the search strategy, and finally obtained three feature subsets that perform well.

3. Cluster Analysis of Financial Data
The empirical analysis in this paper focuses on the detection ability of clustering and on the effect of applying the clustering results to the choice of training data. The clusters show clear characteristics. In the cluster associated with fraud, the quick ratio is generally much higher than the average of the other clusters, the main business income growth rate is lower than in the other clusters, and net assets per share are higher than in the other clusters. Financial reporting fraud is usually associated with companies in financial difficulty; to cover up the difficulty, such companies are more likely to commit fraud. This explains why main business growth is relatively low in this cluster. Earnings (surplus) minus cash flow plays a very important role in accounting fraud, and some accrual-based accounting fraud is associated with a high level of it; a positive value of earnings minus cash flow is a signal of potential fraud. Moreover, fraud companies have much lower free cash than non-fraud companies. Fraud companies usually issue more interest-bearing securities and have higher financial leverage, larger accounts receivable balances, higher sales growth, and higher market returns relative to their assets and market value. However, the absolute values of their assets and sales are usually smaller. This explains why this cluster's liquid ratio is far higher than that of the other clusters.

4. Comparison of Classifiers Trained on Different Training Data
We trained neural network classifiers on different training data. In the first setting, the training data were drawn by random sampling; in the second, the non-fraud samples were chosen based on the clustering results. We used WEKA's multi-feedback neural network as the classifier.
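The abstract does not state the exact rule used to choose non-fraud training samples from the clustering results. The sketch below shows one hypothetical rule, purely as an assumption: keep every fraud sample, and keep a non-fraud sample only if it falls in a cluster where the share of known fraud cases is at most a threshold. The function names, the `max_fraud_share` parameter, and the toy labels are all illustrative.

```python
from collections import Counter

def fraud_share_by_cluster(cluster_ids, is_fraud):
    """Fraction of known fraud samples in each cluster."""
    total = Counter(cluster_ids)
    fraud = Counter(c for c, f in zip(cluster_ids, is_fraud) if f)
    return {c: fraud[c] / total[c] for c in total}

def select_training_indices(cluster_ids, is_fraud, max_fraud_share=0.5):
    """Hypothetical rule: keep all fraud samples, and keep non-fraud
    samples only from clusters that are not dominated by fraud cases."""
    share = fraud_share_by_cluster(cluster_ids, is_fraud)
    keep = []
    for i, (c, f) in enumerate(zip(cluster_ids, is_fraud)):
        if f or share[c] <= max_fraud_share:
            keep.append(i)
    return keep

# Made-up example: cluster 0 is fraud-dominated (2 of 3), cluster 1 is not (1 of 4).
cluster_ids = [0, 0, 0, 1, 1, 1, 1]
is_fraud = [True, True, False, False, False, False, True]
selected = select_training_indices(cluster_ids, is_fraud)
print(selected)
```

Under this assumed rule, a non-fraud firm that clusters together with known fraud firms is treated as an ambiguous example and is left out of the training set, while the randomly sampled baseline would keep it.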
When the classifier was trained on samples chosen from the clustering results, classification improved noticeably: the correct identification rate on the test samples rose from 73.5% to 79.6%. Based on this empirical analysis, we see the ability of clustering to detect structure in unknown data, especially when applied to fraudulent financial statements. Clustering can guide the choice of training data, thereby improving the classification model and its recognition accuracy.

The process of finding models in data through data mining depends heavily on the data. Sometimes the data have so many complex structures that no meaningful pattern can be found even with the best algorithm, and sometimes many features offset each other. Financial statement data have a complex structure. Clustering provides a way to analyze complex data structures by decomposing competing signals. Clustering is an undirected knowledge discovery tool: automatic cluster detection only detects structure already present in the data, without considering any specific target variable and without distinguishing between independent and dependent variables. A clustering algorithm searches for groups of records, called clusters, with the aim of finding similarity among them. Finally, we examine whether the groups of similar records correspond to something meaningful in reality.

We analyze the patterns of fraud companies using clustering and choose training data according to the clustering results, thereby increasing the rate of correct identification of financial fraud.
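The idea of grouping records by similarity and then comparing cluster means can be sketched with a small K-means (Lloyd's algorithm) example. This is a minimal illustration, not the paper's procedure: the two features, the made-up values, and the choice of initial centers are assumptions, the EM algorithm is not shown, and real indicators would normally be standardized before distances are computed.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm; `centroids` are the initial cluster centers."""
    centroids = [list(c) for c in centroids]
    assign = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_assign = [
            min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_assign == assign:
            break  # assignments are stable, so the algorithm has converged
        assign = new_assign
        # Update step: move each center to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids

# Made-up records: (quick ratio, main business revenue growth rate).
points = [(2.5, 0.02), (2.8, 0.01), (2.6, 0.03),
          (1.0, 0.20), (1.1, 0.25), (0.9, 0.22)]
assign, centers = kmeans(points, centroids=[points[0], points[3]])

for j, center in enumerate(centers):
    print(f"cluster {j}: mean quick ratio={center[0]:.2f}, mean growth={center[1]:.2f}")
```

Comparing the per-cluster means is what allows a statement such as "this cluster has a higher quick ratio and lower revenue growth than the others"; whether such a cluster corresponds to fraud still has to be checked against known cases.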
Keywords: financial fraud; clustering; classification