Every year the retail industry loses billions of dollars to fraud in the US alone. To complicate matters, research in the field has been obstructed by the sensitive nature of transactional data. To facilitate future studies, researchers in the BigData profile at BTH set out to develop a simulator that generates synthetic transactional data without compromising sensitive information.
Retail fraud is common, and it is the number of scams, rather than the size of the individual transactions, that creates the costly losses. A woman working at a petrol station was recently sentenced for repeatedly stealing money from the cash register and trying to cover up her crimes by registering false returns. Similarly, a 30-year-old man was caught trying to return ski boots he had snatched from the store shelf moments earlier. These stories are just two examples from the news of fraud occurring in retail stores.
Using real transactional data compromises privacy
Historically, these frauds have been combatted through measures such as return policies, but lately other solutions have been developed in which transactions and related activity are monitored and suspicious behaviour is flagged for further investigation. However, fraud is usually detected only after a rigorous inventory, which stores do not perform often. There is also currently a lack of public research in the area. The main reason for this is the sensitive nature of the data. Publishing real financial transaction data would seriously compromise the privacy of customers and companies alike. Anonymization techniques are often not considered sufficiently effective, and the risk of leakage is hard to calculate.
Creating a simulator
Edgar Alonso Lopez-Rojas, Dan Gorton and Stefan Axelsson set out to create realistic fraud research data by developing a simulator primed by real data, which would enable sharing data with the research community without exposing potentially sensitive information. The model they came up with is based on historical transaction data, comprising several hundred million records, provided by one of the largest Scandinavian shoe retailers. The simulator is built on the concept of Multi-Agent-Based Simulation (MABS): it simulates the normal operation of a shoe store and includes simulation of different fraud scenarios. The simulator is called RetSim (short for Retail Simulator) and allows us to generate synthetic transactional data that can be publicly shared and studied without leaking business-sensitive information, while still preserving the important characteristics of the data. The idea is that we could develop methods that help identify important, and perhaps systematic, losses earlier. Another benefit of the simulator is that it allows researchers to measure the cost of fraud, since we keep track of the fraudulent behaviour. There are also applications for managers, including estimating the cost of different expected fraud scenarios. This helps managers take a closer look at the potential loss of profit and make informed decisions about investments in fraud detection.
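To give a feel for the multi-agent idea, the following is a minimal Python sketch and not RetSim itself. It assumes a toy store in which customer agents buy from salesperson agents, and in which one salesperson occasionally registers a false return, as in the petrol station case above. The class and function names, the price range and the probabilities are all hypothetical, chosen only for illustration. Because the simulator creates every transaction itself, each record carries a ground-truth fraud label, which is what makes it possible to measure the cost of fraud.

```python
import random
from dataclasses import dataclass


@dataclass
class Transaction:
    step: int
    salesperson_id: int
    customer_id: int   # -1 when no real customer was involved
    amount: float      # negative for returns/refunds
    is_fraud: bool     # ground-truth label, known only to the simulator


class Salesperson:
    """Hypothetical salesperson agent; fraud_prob is an assumed parameter."""

    def __init__(self, agent_id, fraud_prob=0.0):
        self.agent_id = agent_id
        self.fraud_prob = fraud_prob

    def serve(self, step, customer_id, rng):
        # Legitimate sale at an assumed price between 20 and 150.
        price = round(rng.uniform(20.0, 150.0), 2)
        return Transaction(step, self.agent_id, customer_id, price, False)

    def maybe_defraud(self, step, rng):
        # With probability fraud_prob, register a false return and keep the cash.
        if rng.random() < self.fraud_prob:
            refund = round(rng.uniform(20.0, 150.0), 2)
            return Transaction(step, self.agent_id, -1, -refund, True)
        return None


def simulate(steps, n_customers, staff, seed=0):
    """Run the toy store for a number of steps and return the transaction log."""
    rng = random.Random(seed)  # seeded so that a data set can be regenerated
    log = []
    for step in range(steps):
        for customer_id in range(n_customers):
            if rng.random() < 0.3:  # assumed chance that a customer buys
                seller = rng.choice(staff)
                log.append(seller.serve(step, customer_id, rng))
        for seller in staff:
            tx = seller.maybe_defraud(step, rng)
            if tx is not None:
                log.append(tx)
    return log


def fraud_cost(log):
    """Total money lost to transactions labelled as fraudulent."""
    return -sum(tx.amount for tx in log if tx.is_fraud)


staff = [Salesperson(0), Salesperson(1, fraud_prob=0.05)]
log = simulate(steps=100, n_customers=20, staff=staff, seed=42)
print(len(log), "transactions, fraud cost:", round(fraud_cost(log), 2))
```

A detection method under test would see only the transaction fields, while the `is_fraud` label is held back to score its results. Changing `fraud_prob` is one simple way to compare the expected cost of different fraud scenarios in this toy setting.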
A general model can be applied in other domains
RetSim is intended to be used for developing and testing fraud scenarios in a shoe retail store, while keeping business-sensitive and private personal information about customers secret from competitors and others. However, because the model focuses on the relationship between salesperson and customer, it should be applicable in other retail settings. The model was created to be general enough to apply in other domains, such as online financial services or any number of systems dominated by the handling of many small transactions.
Synthetic data facilitates further research in retail fraud
The purpose of the simulator is to generate transaction data sets that can be used for research into fraud detection. Synthetic data sets generated by RetSim can help academia, companies and governmental agencies test their methods and compare their performance under similar conditions on the same data set.
Several improvements and additions to the current model are already being planned. The intention is to make RetSim available to the research community together with standard data sets to facilitate research in the area and help fight fraud.