Statistical techniques for fraud detection, prevention, and evaluation

This video was recorded at the NATO Advanced Study Institute on Mining Massive Data Sets for Security. The talk begins by setting the context: fraud is defined and its breadth outlined; figures are given showing how significant fraud is; and different areas of fraud are examined, including health care fraud, banking fraud, and scientific fraud.

The particular data analytic challenges of banking fraud are described and illustrated in detail. These include the fact that the classes are highly unbalanced (typically no more than 1 in 1,000 transactions is fraudulent), that class labels may often be incorrect, that there will typically be delays in discovering the true labels, that transaction arrival times are random, that the data are dynamic, and, perhaps most challenging of all, that the distributions are reactive, changing in response to the implementation of fraud detection systems. The role of mechanistic and empirical models in tackling these problems is described. Both have been widely used, and both have a contribution to make.

Banking data, and in particular banking fraud data, are examined in detail. Raw credit card transaction data have 70-80 variables per transaction, and this can be multiplied many-fold for behavioural data, as in fraud detection problems. Questions arise as to how to aggregate the data: should one try to classify individual transactions, or should activity records be constructed? A fundamental aspect of any predictive problem in data analysis is the choice of an appropriate criterion for estimation and performance assessment. In the case of fraud, one needs, in particular, to combine classification accuracy with timeliness of classification. This means that standard measures of classification performance, such as error rate, AUC, the KS statistic, and information value, are not sufficient. Suitable measures and performance curves are described which combine these aspects and which are now being adopted by the industry.

Various statistical approaches ('statistical' used here in John Chambers's sense of 'greater statistics') have been developed for fraud detection problems, and some are described and illustrated using data from some of the banks which have been collaborating with us. In particular, we look at supervised classification and anomaly detection methods. Finally, in the context of banking fraud, some of the deeper but very important conceptual issues are outlined, including the economic imperative, whether fraud is now becoming 'acceptable', and what exactly we learn from empirical comparisons.

Scientific fraud is contrasted with banking fraud. They have rather different drivers. In particular, financial gain is generally irrelevant to scientific fraud, which makes it an unusual kind of fraud, although, of course, the impact can be even more serious. Several examples are given, from a range of disciplines. The role of data analytic tools in detecting scientific fraud, and the nature of such tools, is described.

Keywords: videolectures, ocwc, oec

Disciplines:

Business

More about this material

Material Type: Presentation
Date Added to MERLOT: February 10, 2015
Date Modified in MERLOT: October 19, 2017
Author: David J. Hand, Department of Mathematics, Imperial College London
Submitter: The Open Education Consortium
Primary Audience: College General Ed, College Lower Division, College Upper Division
Technical Format: Video

Mobile Compatibility: Not specified at this time
Language: English
Cost Involved: No
Source Code Available: No
Creative Commons: This work is licensed under an Attribution-NonCommercial-NoDerivs 3.0 United States license