More Data, Fewer Problems?

How one computer science professor is using data to fight development aid fraud.

Feb 8, 2017

Chenxi Wang, Ph.D.

The work of lifting countries out of poverty is never easy. But Rozier thought he could make it far more effective.

Rozier is a computer science professor at Iowa State who saw a big problem at the World Bank. Every year the Bank issues development loans across the world. Some are earmarked for projects that help countries or regions improve infrastructure and basic living standards, such as access to clean water and medicine. But each project is complex and requires many international suppliers to provide the goods and services that make it a reality. Unsurprisingly, the supplier selection process is competitive. In an ideal world, the contract would go to the company that offers the best combination of price and quality, and the development project would then get underway.

But all too often, that’s not how it works. The Bank sees a non-trivial amount of fraud in the supplier bid and selection process. From setting up fake companies to collusion and price fixing, fraud in this process can lead to the embezzlement or siphoning of funds. That could mean no road, no well, or no access to medicine for the intended country or region.

In 2014, Rozier started a research project with Data Science for Social Good (DSSG) to “fight data with data.” His project focused on helping the Bank identify fraud scenarios by building an automatic reasoning system that studies data patterns and identifies what he calls “data integrity attacks,” which correlate strongly with potential fraud.

Rozier sees this as an attacker-vs-defender problem. The attackers — the fraudsters — aim to inject the system with bad data or subvert the decision process of the system. The defender’s task is to spot the bad data and/or the patterns of subversions.

An example of a data integrity attack, Rozier says, is collusion: multiple entities work together to artificially inflate their bids on a loan so that a moderately high bid looks competitive. In some cases, a single fraudster submits nearly all of the bids, sometimes over 90% of them, through fake companies. In others, a supplier colludes with other companies to submit artificially inflated bids, which then set the bar for what counts as “reasonably priced.”
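The "single fraudster behind most of the bids" pattern can be illustrated with a small sketch. It is not Rozier's system. It assumes each bid record carries some attribute that fake companies might share, here a hypothetical `phone` field, and it flags contracts where one value of that attribute accounts for at least 90% of the bids. The field names, contract IDs, and sample data are all invented for illustration.

```python
from collections import Counter


def dominant_share(bids, key="phone"):
    """Return (value, share) for the most common value of `key` among the bids."""
    counts = Counter(bid[key] for bid in bids)
    value, count = counts.most_common(1)[0]
    return value, count / len(bids)


def flag_concentrated(contracts, threshold=0.9, key="phone"):
    """Map contract ID -> (value, share) where one value reaches the threshold."""
    flagged = {}
    for contract_id, bids in contracts.items():
        if not bids:
            continue  # nothing to measure on a contract with no bids
        value, share = dominant_share(bids, key)
        if share >= threshold:
            flagged[contract_id] = (value, share)
    return flagged


# Invented sample data: nine of the ten bids on C-1 share one phone number.
contracts = {
    "C-1": [{"bidder": f"Shell Co {i}", "phone": "555-0100"} for i in range(9)]
    + [{"bidder": "Honest Ltd", "phone": "555-0199"}],
    "C-2": [
        {"bidder": "A", "phone": "555-0101"},
        {"bidder": "B", "phone": "555-0102"},
        {"bidder": "C", "phone": "555-0103"},
    ],
}
print(flag_concentrated(contracts))
```

A real pipeline would need more than one linking attribute, since a careful fraudster would vary the obvious ones. The threshold here simply mirrors the 90% figure quoted above.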

Another commonly seen fraud tactic involves the World Bank’s “debarred” list, which names entities prohibited from submitting bids because of past violations or dubious practices. To get around the list, a debarred company may obfuscate its identity by taking on a new name that resembles a well-recognized, legitimate organization (think “PricewaterhoseCoopers” for “PricewaterhouseCoopers”) or by changing its name slightly to confuse automated identification algorithms (think “Amce inc” for “Acme Inc”).
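As a rough illustration of how near-identical names can be caught, the sketch below scores a bidder's name against a reference list using the standard-library `difflib.SequenceMatcher`. This is plain character-level similarity, not the semantic and syntactic clustering Rozier's work uses. The function names, the 0.85 threshold, and the reference list are assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def near_matches(bidder: str, reference: list[str], threshold: float = 0.85):
    """Return (listed_name, score) pairs at or above the threshold, best first."""
    scored = [(listed, similarity(bidder, listed)) for listed in reference]
    hits = [(listed, score) for listed, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical reference list built from the two names used in the article.
reference = ["Acme Inc", "PricewaterhouseCoopers"]
print(near_matches("Amce inc", reference))
print(near_matches("PricewaterhoseCoopers", reference))
```

Both lookalike names score above the threshold against their originals, while an unrelated name returns an empty list. The threshold is a tuning choice: set it too low and legitimate companies with similar names get flagged, set it too high and a slightly more creative misspelling slips through.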

And here, we get back to “fighting data with data.” In this situation, Rozier saw that the defender, in this case the World Bank, had fundamentally more data than a typical fraudster. The defender can leverage this fact to fight data fraud. For instance, the World Bank has the entire history of all bids submitted, both winning and losing ones. The fraudster may only have a partial view of that history. In one case, Rozier’s study identified a number of fake bids that stood out because they followed a model of what bids looked like 10 years ago. “The bidding process has changed significantly in the last 5 years,” Rozier said. “These bids had the wrong parameters and patterns, and it was clear that they were manufactured bids.”

Rozier had discovered the key to fighting these attacks: “You need a superior data model than your adversaries.”

Understanding how to preserve data integrity, and to build a superior data model to fight data fraud, is becoming ever more crucial. We rely on the integrity of financial records, election results, news, and medical records, to name just a handful of sources. If these data and the algorithms that process them become compromised, the very foundation of our society may be at risk. “The question is: how vulnerable is your system to data manipulation, gaming, and other forms of data integrity attacks, and what will you do about it?” Rozier asks.

It’s a question that’s just beginning to be answered. Data science for fraud and security is a relatively young field. Rozier’s work, which uses semantic and syntactic clustering to resolve name conflicts, greatly improved on previous results obtained with open-source tools like OpenRefine when tested on a World Bank data set.

In addition, recent advances in deep learning applied in conjunction with data science, such as those seen in Google’s successful AI-driven Go match against the world’s best Go player, are “incredibly encouraging,” Rozier said. “The same deep learning techniques can be applied to fraud detection: We can label the data, learn something about the governing dynamics, refine the model, iterate, and eventually build sound analysis.”

Rozier now aims to apply deep learning principles to tackle data integrity. He believes that data science can help to effectively eliminate supplier fraud. “The hope is more projects can go forward unimpeded, and we’ll see more clean water, better infrastructure, and improved healthcare for more regions sooner rather than later.”
