Big Data Infra for Enterprises – Fraud Detection


This Pre-Strata meetup is sponsored by LexisNexis.

This is part of the “Big Data Infra for Enterprises” series of meetups.

Big Data Cloud Inc.
A not-for-profit Organization for Evangelizing & Training around Big Data.
Founded & Operated by Third Eye Consulting Services & Solutions LLC.


6:00 pm – 6:30 pm:
Registration & Mixer

6:30 pm – 7:15 pm:
Large Scale Identity Theft and Fraud to Make Bucket Loads of Easy Money
– Jo Prichard, LexisNexis Risk Solutions

7:30 pm – 8:15 pm:
Real-time Fraud and Spam Detection Using Cassandra, Hadoop and S3
– Mark Kent, Proofpoint

8:15 pm – 9:30 pm:
Q&A Session, Networking, Mixer

Details:

Large Scale Identity Theft and Fraud to Make Bucket Loads of Easy Money
– Jo Prichard, LexisNexis Risk Solutions

Attendees will be shown how to leverage crowdsourcing to commit various types of fraud such as Tax Refund Fraud, Disaster-Relief Fraud and Benefits Fraud (Medicaid Fraud, Medicare Fraud, Welfare Fraud and Section 8 Housing Assistance Fraud). We will look at which opportunities and segments of the population are easy targets for large-scale identity fraud, what insights are gained from this analysis, and what can be done on the ground to narrow the window of opportunity for these types of operations and schemes.

Attendees will also be shown how to detect and combat these various types of fraud. This session will discuss the challenge of resolving identity from billions of identity fragments and why the bigger the data, the better the resolution. The session will outline what this means for businesses and why consumers should care about how best to generate a meaningful public data identity for themselves.

We will look at some of the interesting high-level identity statistics and insights we have gained from large-scale data analysis and exploration using our massive data assets. The discussion will include questions such as: What is the average age at which a person starts to establish an identity footprint (with and without immigrant identities)?

Lastly, we will cover the one surprising Big Data perspective that differentiates fake or synthetic identities from real identities. We’ll give you a hint: it’s the one thing that identity thieves forget to do when they create a fake or synthetic identity, and this one thing helps distinguish real identities from fake ones.


Real-time Fraud and Spam Detection Using Cassandra, Hadoop and S3
– Mark Kent, Staff Software Engineer, Proofpoint

This presentation will focus on Proofpoint’s fraud and spam detection techniques using insights from Big Data, and the challenges of translating those insights into real-time decisions.

The following topics will be covered:

Characteristics of fraud and spam used to confound traditional detection systems.
Big data methods to mine data for real-time fraud and spam anomolytics.
Lessons learned from real-time analysis using Cassandra, Hadoop and S3, as well as Proofpoint’s own open source big data tools.

Light snacks & drinks will be served.