SQL patterns I use to catch transaction fraud
319 points
• 3 days ago
• Article
Link
Fixel Smith 概述了六种实用的 SQL 模式用于检测交易欺诈,指出有效的欺诈检测更依赖于简单的 SQL 查询,而不是复杂的机器学习或专用数据库。这些模式适用于各种有财务活动日志的交易场景,包括信用卡、医保理赔和电子商务。作者在公共部门从事项目完整性工作,强调所有示例均为通用示例,并非基于真实案件。
第一种是速度(velocity)检测,用来识别单个持卡人在短时间窗口内(如 1 分钟、 5 分钟或 1 小时)发生的快速连续交易。短时间内的高频交易可能表明卡片被盗或被测试。采用窗口函数的滑动窗口方法可以更精细地检测出在几分钟内多笔交易的序列。虽然部分合法用户也可能触发这些警报,但维护白名单能有效降低误报率。
"不可能旅行"检测用于发现卡片在地理上相距很远的地点,在不可能的短时间内被使用的情况,通常提示卡片被克隆。通过 haversine 函数计算两笔交易间距离,并将其与速度阈值(例如 600 英里 / 小时)比较,标记那些需超出实际可行速度的交易。该模式的变体可通过调整距离和时间参数来检测本地克隆团伙或区域性 skimmer 活动。
金额异常侧重于与欺诈高度关联的交易金额模式。像 $1.00 或 $5.00 这种小额整值通常是卡片测试的特征,而略低于常见阈值的金额(如 $99.99 或 $499.99)则可能是刻意规避风控触发。这些模式在基于卡片的系统中特别有效,但在福利类项目中效果较差,因为那类系统里的欺诈表现不同。
可疑商户活动通过分析特定商户的唯一持卡人数或总交易量的异常激增来识别受损的销售点。将当前行为与滚动基线对比,有助于区分正常的高流量商户与出现异常激增的商户。以三倍于平均值的激增作为初步调查阈值是个实用起点。
非工作时间检测根据持卡人过去 90 天的消费习惯标记发生在其典型消费时间之外的交易。通过要求在某小时内至少有两笔交易来认定习惯时间段,可以减少单次异常导致的误报。但此方法需要足够的历史交易数据,因此不适用于新账户。
第六种模式不是单独的检测规则,而是一个基于窗口函数的框架,用以创建可重用的原语,如距上次交易的时间、商户是否变更、滚动合计等。这种设定让分析师能将复杂的欺诈规则表达为简单的 SQL 过滤条件,从而显著加快假设验证速度并减少对工程支持的依赖。
将多种模式组合使用可以提升准确率,因为单条规则通常会产生误报。对多项信号进行打分有助于区分真实欺诈与罕见但合法的行为。对刚入门的检测人员,建议从速度检测开始,因为它简单且效果明显。投入窗口函数相关的基础设施能够加速迭代并支持更广泛的模式部署,回报明显。
文章还强调若干注意事项:在遗留系统中用哨兵值代替 NULL 的处理、通过人工审核与反馈循环管理误报、在处理个人身份信息(PII)时确保隐私合规,以及在对数据应用代价较高的窗口函数前先做过滤以控制查询成本。未来可能探讨的主题包括高级窗口函数技术、欺诈团伙检测、仪表板设计与减少告警噪音。
Fixel Smith outlines six practical SQL patterns for detecting transaction fraud, emphasizing that effective fraud detection relies on straightforward SQL queries rather than complex machine learning or specialized databases. These patterns are designed to work across various transaction types, including credit cards, healthcare claims, and e-commerce, as long as financial activity is logged. The author, who works in public-sector program integrity, stresses that all examples are generic and not based on real cases.
The first pattern, velocity, identifies rapid transactions from a single cardholder within short time windows, such as one minute, five minutes, or one hour. High transaction counts in these intervals can indicate card theft or testing. A sliding-window approach using window functions allows for more granular detection, flagging sequences where multiple transactions occur within minutes of each other. While some legitimate users may trigger these alerts, maintaining a whitelist helps reduce false positives.
Impossible travel detects cases where a card is used in geographically distant locations within an implausibly short timeframe, suggesting card cloning. Using the haversine function to calculate distance and comparing it against a speed threshold (e.g., 600 mph), this pattern flags transactions that would require faster-than-possible travel. Variations include detecting local cloning rings or regional skimmer activity by adjusting distance and time parameters.
Amount anomalies focus on transaction values that are disproportionately associated with fraud. Small round-dollar amounts like $1.00 or $5.00 often indicate card testing, while amounts just below common thresholds (e.g., $99.99 or $499.99) suggest deliberate attempts to avoid triggering security checks. These patterns are particularly useful in card-based systems but less so in benefits programs where fraud manifests differently.
Suspicious merchant activity identifies compromised points of sale by analyzing spikes in unique cardholders or total transaction volume at specific merchants. Comparing current activity against a rolling baseline helps distinguish normal high-traffic locations from those experiencing unusual surges. A spike ratio of three times the average is a practical starting point for investigation.
Off-hours detection flags transactions that occur outside a cardholder's typical spending hours, based on 90 days of historical data. By requiring at least two transactions in a given hour to establish a habit, the pattern reduces false positives from one-off anomalies. However, it requires sufficient transaction history and is not applicable to new accounts.
The sixth pattern is not a standalone detection method but a framework using window functions to create reusable primitives like time since last transaction, merchant changes, and rolling totals. This setup allows analysts to express complex fraud rules as simple SQL filters, significantly speeding up hypothesis testing and reducing reliance on engineering support.
Combining multiple patterns increases accuracy, as individual rules often produce false positives. Scoring transactions across several signals helps distinguish genuine fraud from unusual but legitimate behavior. For those new to fraud detection, starting with velocity checks is recommended due to their simplicity and effectiveness. Investing in window-function infrastructure pays off by enabling faster iteration and broader pattern deployment.
The post also highlights important considerations: handling sentinel values instead of NULLs in legacy systems, managing false positives through human review and feedback loops, ensuring privacy compliance when working with PII, and controlling query costs by filtering data before applying expensive window functions. Future topics may include advanced window-function techniques, fraud ring detection, dashboard design, and reducing alert noise.
130 comments • Comments Link
我已准备好接收 Hacker News 讨论帖中的要点。请提供这些要点,我会根据您的指导原则将其整理为简明的总结。 I'm ready to receive the bullet points from the Hacker News discussion thread. Please provide them, and I'll create a concise summary following your guidelines.