The Business Solutions Series is a compilation of solutions to various business challenges that I have encountered throughout my professional journey.
Context
The goal of this project was to protect acquiring banks, the banks that provide credit card terminals to businesses, from nefarious merchants: merchants who use their online terminals to sell products and services that are prohibited by the terms of their contract with the bank (e.g., counterfeit goods, illegal substances).
Problem
These fraudulent merchants operated by submitting bank applications that misrepresented the nature of their business in order to get approved. Once approved, they used the online terminals on their websites to sell prohibited products.
To crack down on this practice, a team of analysts went through all the merchant applications to verify that each listed website was not selling any prohibited products or services (and flagged those that were). In the early days it was easy to detect thousands of these bogus applications just by visiting the listed websites. But as soon as the merchants realized they were being monitored, they started creating professional-looking front websites that pretended to sell allowed products, while their real intention was, once approved, to use the online terminal on an undisclosed website that broke the terms.
Objective
To prevent nefarious merchants from using credit card terminals to sell prohibited products.
Solution
From the early days we had a rich labeled dataset of websites (several tens of thousands) that sold prohibited products. Each example consisted of a text document (the website's text, extracted with a crawler) and the labels our analysts had assigned when flagging it for prohibited products.
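To make the shape of that dataset concrete, here is a minimal sketch of what a single training record might have looked like. The field names and the label value are hypothetical, invented for illustration; they are not the real schema.

```python
# Illustrative shape of one labeled training example. Field names and the
# label value are hypothetical stand-ins, not the actual schema.
training_example = {
    "url": "https://example-storefront.test/",
    "text": "Buy replica designer handbags at wholesale prices ...",  # crawler-extracted page text
    "labels": ["counterfeit_goods"],  # analyst-assigned prohibited-product labels
}
```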
We were able to feed this data directly to a managed AutoML tool that automatically preprocessed all the text into 1-grams and 2-grams (all single words and all pairs of consecutive words; for example, "designer handbags" yields the 1-grams "designer" and "handbags" plus the 2-gram "designer handbags") and learned embeddings for all of these n-grams. It was also able to find the best neural network architecture and hyperparameters to train a very accurate multi-class classification model.

Once we had this model, we piggybacked on an open source project called Common Crawl, which periodically crawls the whole internet and saves the crawled data into public cloud storage for anyone to read. We then spun up a managed Apache Beam job to process the Common Crawl data and run each crawled website's text through our model, roughly as sketched below. Lastly, we dropped all the URLs with their predictions into a data warehouse for the analysts to query later.
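A minimal sketch of what such a Beam job could look like, assuming two hypothetical helpers that do not come from the original project: iter_wet_records() (a stand-in for a real Common Crawl WET-file parser) and score_page() (a stand-in for a call to the trained classifier). The file path is illustrative, and WriteToText stands in for the actual warehouse sink.

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def iter_wet_records(wet_path):
    # Hypothetical parser: the real job would download and parse the WET
    # archive at wet_path; here we yield one dummy (url, text) record so
    # the sketch runs end to end.
    yield ("https://example-storefront.test/", "replica designer handbags cheap ...")

def score_page(record):
    # Hypothetical wrapper around the trained classifier; returns a fixed
    # placeholder score instead of a real model prediction.
    url, text = record
    return {"url": url, "scores": {"counterfeit_goods": 0.97}}

with beam.Pipeline(options=PipelineOptions()) as pipeline:
    (pipeline
     | "ListWetFiles" >> beam.Create(["s3://commoncrawl/path/to/file.warc.wet.gz"])  # illustrative path
     | "ParsePages"   >> beam.FlatMap(iter_wet_records)
     | "ScorePages"   >> beam.Map(score_page)
     | "ToJson"       >> beam.Map(json.dumps)
     | "WriteScores"  >> beam.io.WriteToText("scored_urls"))  # stand-in for the warehouse sink
```

On a managed runner the same pipeline shape fans out across all the crawl files, which is what made scoring the whole Common Crawl snapshot practical.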
We then armed our analysts with credit cards that were set up to always be declined by the issuing bank; this let us identify the merchants and acquiring banks on the other side of each attempted transaction. Using the scores in the warehouse (prioritizing the websites with the highest scores for prohibited products), our analysts were able to quickly find thousands of online stores that were selling prohibited products and taking credit cards without their bank's knowledge. We then attempted purchases on those sites to find which specific merchants were using their online terminals on websites they hadn't listed on their applications. These merchants were later shut down for breaking the terms.
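As a rough sketch, pulling a prioritized work queue from the warehouse could look like the following. The table and column names are hypothetical, and sqlite3 stands in for the real warehouse client so the example is self-contained:

```python
import sqlite3  # stand-in for the real data warehouse client

# Hypothetical schema: one row per (url, label) with the model's score.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE website_scores (url TEXT, label TEXT, score REAL)")
conn.executemany(
    "INSERT INTO website_scores VALUES (?, ?, ?)",
    [("https://shady-store.test/", "counterfeit_goods", 0.97),
     ("https://ok-store.test/", "counterfeit_goods", 0.12)],
)

# Highest-scoring sites first, so analysts review the riskiest ones.
TOP_SITES = """
    SELECT url, score
    FROM website_scores
    WHERE label = 'counterfeit_goods'
    ORDER BY score DESC
    LIMIT 500
"""
for url, score in conn.execute(TOP_SITES):
    print(f"{score:.2f}  {url}")
```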
Impact
The main takeaway from this story is that because we already had a very rich training dataset and the Common Crawl data at our disposal, we were able to easily leverage a set of managed services to produce very good business results. We only needed the initial model to have a low false positive rate among the top-scored websites, and the AutoML tool managed that very well. The model's scores were then used to rank websites for analyst prioritization, which allowed the analysts to focus almost exclusively on nefarious merchants and to rapidly process and shut down thousands of them.
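That requirement, a low false positive rate among the top-scored websites, is essentially precision at the top of the ranking. A minimal sketch of that metric, with made-up scores and labels:

```python
# Precision among the top-k scored websites: the fraction of the k
# highest-scoring sites that analysts confirm as truly prohibited.
# All scores and labels below are made up for illustration.

def precision_at_k(scored_urls, confirmed_bad, k):
    """scored_urls: list of (url, score); confirmed_bad: set of bad URLs."""
    top_k = sorted(scored_urls, key=lambda x: x[1], reverse=True)[:k]
    hits = sum(1 for url, _ in top_k if url in confirmed_bad)
    return hits / k

scored = [("a.test", 0.99), ("b.test", 0.95), ("c.test", 0.40), ("d.test", 0.10)]
bad = {"a.test", "b.test"}
print(precision_at_k(scored, bad, k=2))  # -> 1.0: no false positives in the top 2
```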