Data Scientist

Paulo Azevedo

Fraud Prevention · Risk Analytics · Product Analytics · Data Platforms

Nine years building ML and data systems for fraud, risk and analytics. I came from statistics and signal-processing research, and I still work the same way: find the structure in noisy data, then build a system around it.

+R$1.5M/mo · loss-making loan book made profitable
−30% · fraud-operation cost cut
+18% · payment success-rate lift
227 · countries reached, market mapped
1wk→2h · model backtest (was a full week)
9+ · years in ML, fraud & risk

About

Data scientist with 9+ years building machine-learning systems, fraud-prevention platforms, risk decision engines and analytics infrastructure across fintech, edtech and large-scale digital businesses.

My foundation is statistics (UNICAMP). In practice, that means I'm at home with noisy, high-dimensional problems. I cut through to what actually matters, then build the systems that take it to production.

During my undergraduate research I co-authored three scientific papers in computational aeroacoustics, funded by Boeing, and planned to go on to a master's and a PhD. I couldn't. I needed an income, so I went into industry instead. I don't treat that as a gap. I spent the next decade building real systems and grew from analyst to one of the principal data scientists at a market leader.

These days I focus on leverage: champion-challenger automation, feature stores, graph-based decision engines, data platforms. Infrastructure a whole team can build on, not just one more model.

Experience

Staff Data Scientist
Elevify (current)
May 2026 – Present
Remote

The only data hire at a ~15-person global edtech (online courses sold worldwide). The legacy setup was a PostgreSQL instance on a small VM ("DataFy"), wired directly to the production databases; it couldn't take on more data or support complex analysis. I built a platform from scratch in its place: BigQuery as the warehouse, Cloud Run services ingesting every service database, and the marketing sources (Google Analytics, Search Console, Ads) unified alongside. It is scalable and manageable, with every data source in one place, and I built it end to end, solo, in under a month.

  • Replaced the legacy "DataFy" (Postgres on a VM that couldn't scale) with a BigQuery warehouse, fed by Cloud Run ingestion services pulling every service database
  • Unified the marketing sources in the warehouse (Google Analytics, Search Console and Google Ads), with Metabase dashboards and Slack monitoring alerts
  • Mapped market penetration across all 227 countries: where Elevify already sells, where penetration is strong, and how much of the market is still open · 227 countries
  • Built B2B lead scoring: flagged corporate accounts in the base (by email and professional course usage) so Elevify could offer its Exclusive B2B plan to companies certifying their staff
  • Lifted the payment success rate by 18% by finding the best payment option per country and switching to it, so more attempted payments actually go through · +18% payment success rate
  • Ran seasonality analysis tracking growth against global demand for education; the platform now sits behind ~300k course enrollments a month · ~300k enrollments/mo
BigQuery · Cloud Run · Metabase · Google Analytics · SQL · Python · Slack
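As an illustration of the payment-option analysis above, here is a minimal Python sketch that picks, per country, the payment method with the highest observed success rate. The data layout, the `min_attempts` threshold, the function name and the method names (`pix`, `card`, `wallet`) are assumptions made for the example, not the production logic.

```python
from collections import defaultdict


def best_method_per_country(attempts, min_attempts=2):
    """Return {country: (method, success_rate)} for the best-performing method.

    `attempts` is an iterable of (country, method, succeeded) tuples.
    Methods with fewer than `min_attempts` attempts in a country are ignored,
    so a single lucky payment cannot win. Both are hypothetical choices.
    """
    tally = defaultdict(lambda: [0, 0])  # (country, method) -> [attempts, successes]
    for country, method, succeeded in attempts:
        tally[(country, method)][0] += 1
        tally[(country, method)][1] += int(succeeded)

    best = {}
    for (country, method), (n, ok) in tally.items():
        if n < min_attempts:
            continue
        rate = ok / n
        # Keep the method with the highest success rate; first seen wins ties.
        if country not in best or rate > best[country][1]:
            best[country] = (method, rate)
    return best


# Made-up sample data, for illustration only.
sample_attempts = [
    ("BR", "pix", True), ("BR", "pix", True), ("BR", "pix", True), ("BR", "pix", False),
    ("BR", "card", True), ("BR", "card", False), ("BR", "card", False), ("BR", "card", False),
    ("US", "card", True), ("US", "card", True),
    ("US", "wallet", True), ("US", "wallet", False),
]

print(best_method_per_country(sample_attempts))
```

In practice the comparison would also need enough volume per method and a check that the difference is not noise; the threshold here only hints at that.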
Specialist Data Scientist
RecargaPay
Sep 2025 – Apr 2026
Remote · Brazil

Credit-risk specialist on a four-person team (executive manager, me, a senior analyst and a mid-level analyst) at a mobile-payments fintech competing with Brazil's biggest players. I owned the credit-granting and scoring models for its entire personal-loan portfolio, and rebuilt how those models were built, validated and served.

  • Turned a chronically loss-making personal-loan portfolio profitable. My models replaced the incumbents across the full audience, for an audited gain of about R$1.5M per month in additional profit. · +R$1.5M/mo
  • Built 10+ credit models from scratch, using internal data, Open Finance and the Central Bank's SCR registry, usually combined with a bureau score. · 10+ models
  • Led the methodology to evaluate around 13 credit bureaus (Serasa, Boa Vista and 11 others) and pick the best one to integrate in-house.
  • Reworked the modeling approach: auxiliary models, each capturing the best signal in one dimension, combined by a logistic-regression ensemble. More of the data modeled, fewer inconsistencies, much better performance.
  • Moved the whole modeling stack to Databricks and MLflow (features, training, artifacts). Models with around 400 features were more than MLflow could serve natively, so I built a custom "manual inference" pipeline that separates and optimizes each serving step. · ~400 features
  • Validated the new models with a staged, ROI-backed rollout: 10% of loans, then 30%, then 100% of the audience.
  • Mentored the senior and mid-level analysts, helped scale their work, and made the team's day-to-day routine lighter and calmer.
Databricks · MLflow · Python · SQL · Open Finance · SCR · Credit Risk · Ensembles
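The staged rollout above (10%, then 30%, then 100%) can be sketched as a deterministic assignment: each loan application is hashed into a stable bucket, and the new model serves the buckets below the current stage share, so the 10% group stays inside the 30% group. The hashing scheme, the salt and the function names are assumptions for illustration; the text does not describe how traffic was actually split.

```python
import hashlib

STAGES = (0.10, 0.30, 1.00)  # rollout shares named in the text


def bucket(application_id: str, salt: str = "rollout-demo") -> float:
    """Map an application id to a stable pseudo-random number in [0, 1).

    The salt is a hypothetical value; changing it reshuffles the assignment.
    """
    digest = hashlib.sha256(f"{salt}:{application_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 2**32


def uses_new_model(application_id: str, stage_share: float) -> bool:
    """True if this application is served by the new model at the given stage."""
    return bucket(application_id) < stage_share


# Made-up ids, for illustration only.
ids = [f"loan-{i}" for i in range(10_000)]
for share in STAGES:
    served = sum(uses_new_model(i, share) for i in ids)
    print(f"stage {share:.0%}: {served} of {len(ids)} applications on the new model")
```

Because the bucket depends only on the id, an application never flips between models when a stage widens, which keeps the return comparison between the two groups clean at every step.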
Principal / Specialist Data Scientist
ClearSale / Serasa Experian
Jan 2017 – Aug 2025
São Paulo, Brazil

Almost nine years at Brazil's leader in AI-based fraud prevention, from intern (while finishing my statistics degree at UNICAMP) to one of its principal data scientists. I built the core models behind the company's two main fraud lines, transactional and account-opening, and became the main technical reference for the modeling teams. ClearSale was later acquired by Serasa for R$2 billion.

  • Developed the main models behind ClearSale's two core fraud lines, transactional and account-opening. These were the algorithms running the company's flagship products.
  • My last anti-fraud model at ClearSale backed a major global marketplace's launch in Brazil. They were entering the market for the first time, with no local operation and no fraud history to learn from, so I built a custom model almost entirely in the dark, using only the limited data their integration could send. The launch was a success, and the model made their Brazilian anti-fraud operation viable. · cold start
  • Built a weekly engine that captured emerging fraud patterns and combined models by fraud dimension per client, raising approval rates while keeping chargebacks (CBK) under control. Brazil's largest e-commerce companies ran on it.
  • Created a fraud-pattern detector based on a Cartesian-product grouping, with its own statistics and score per group, to surface fraud that traditional models missed. In one test, with about 100 validation SMS messages, I caught R$50k of fraud in a single pattern in one day. · R$50k
  • Generated automatic decision rules from random forests and surfaced the most relevant ones, routing about 25% more transactions through the automated path and taking load off the more expensive manual operation. To run it where the data lived, I implemented the random forest itself in SQL and worked with Business Analytics and Operations to land the new flow. · +25%
  • Built a trustworthiness and first-party-fraud score that flagged risky orders even when we could reach and verify the real owner of the transaction. Running inside the secure authentication flow, it eliminated about 75% of the chargebacks hiding in what looked like the safe zone. It was unconventional, built on an uncommon target variable, and it paid off. · −75% CBK
  • Built the company's feature platform: I migrated every model and feature into the data warehouse, so teams could reuse features, backtest and deploy models, and browse a catalog. Backtests that used to take a week ran in two hours. Every modeling team in the company used it, and it later evolved to use AI for feature generation and discovery. · 1 week → 2h
  • Built the phone-ranking model for the telemarketing operation, picking the best number to reach a customer whose purchase was being verified. It cut the operation's cost by 30%+, helped control chargebacks and lift approvals, and held up so well that a later review found almost nothing to improve. · −30%
  • Mentored everyone across two ~5-person modeling squads and set the technical direction; the teams' work depended on my reviews and guidance.
  • Helped launch features for the new credit and insurance operations, and built the first models for the international expansion, later handed off to a dedicated team.
Python · R · SQL · Machine Learning · Fraud Detection · Credit Risk · Feature Store · Backtesting · Spark · Azure
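One possible reading of the Cartesian-product grouping behind the fraud-pattern detector: take every combination of a few categorical attributes, group transactions by the resulting value tuples, and score each group by how far its fraud rate sits above the overall rate. The sketch below follows that reading. The attribute names, the `lift` score, the `min_count` cutoff and the function name are all assumptions for illustration, not the detector's actual statistics.

```python
from collections import defaultdict
from itertools import combinations


def pattern_scores(rows, attributes, label="is_fraud", max_order=2, min_count=3):
    """Score every value combination of up to `max_order` attributes.

    Each group gets its own count, fraud rate and lift (group fraud rate
    divided by the overall fraud rate). Groups are returned best first.
    """
    base_rate = sum(r[label] for r in rows) / len(rows)
    results = []
    for order in range(1, max_order + 1):
        for attrs in combinations(attributes, order):
            stats = defaultdict(lambda: [0, 0])  # value tuple -> [count, frauds]
            for r in rows:
                key = tuple(r[a] for a in attrs)
                stats[key][0] += 1
                stats[key][1] += r[label]
            for key, (count, frauds) in stats.items():
                if count < min_count:  # skip groups too small to trust
                    continue
                rate = frauds / count
                results.append({
                    "pattern": dict(zip(attrs, key)),
                    "count": count,
                    "fraud_rate": rate,
                    "lift": rate / base_rate if base_rate else 0.0,
                })
    results.sort(key=lambda g: (g["lift"], g["count"]), reverse=True)
    return results


# Made-up sample: fraud concentrates where device "A" meets city "X".
sample_rows = (
    [{"device": "A", "city": "X", "is_fraud": 1}] * 3
    + [{"device": "A", "city": "Y", "is_fraud": 0}] * 3
    + [{"device": "B", "city": "X", "is_fraud": 0}] * 3
    + [{"device": "B", "city": "Y", "is_fraud": 0}] * 3
)

top = pattern_scores(sample_rows, ["device", "city"])[0]
print(top)
```

Neither attribute alone isolates the fraud in this sample; only the crossed group does, which is the kind of pattern a single-variable view tends to miss.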

Scientific Research

Undergraduate research at UNICAMP (2014–2016), advised by Prof. William R. Wolf and funded by Boeing through FAPESP/CNPq. The subject was aircraft landing-gear noise; the methods were statistics, signal processing and dimensionality reduction, which is essentially data science.

Why this is data science

  • POD (Proper Orthogonal Decomposition) is PCA/SVD, applied to turbulent pressure fields over a ~200,000-element surface across up to 20,100 time snapshots.
  • Real dimensionality reduction: ~150–200 modes reconstruct nearly all the energy of a 16,000+ dimensional signal. Feature extraction and compression at scale.
  • Performance engineering: a 3D wideband Fast Multipole Method (oct-tree) to accelerate the heaviest computations, cutting total simulation runtime from ~2 weeks to ~2 hours.
  • Signal processing & validation: PSD, FFT, windowing, statistical treatment of sources, validated against wind-tunnel experiments.
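The POD/PCA link in the first two points can be written out briefly. This is a standard statement of the method, assuming the snapshots are mean-subtracted and stored as columns of a matrix; the symbols X, n, m and r are notation introduced here for illustration, not taken from the papers.

```latex
% Snapshot matrix: n surface points by m time snapshots (mean removed)
X = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \end{bmatrix} \in \mathbb{R}^{n \times m}

% Thin SVD: the columns u_i of U are the POD modes
X = U \Sigma V^{\top}, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots), \quad
\sigma_1 \ge \sigma_2 \ge \cdots \ge 0

% Rank-r reconstruction from the first r modes
X_r = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top}

% Fraction of the total energy captured by the first r modes
E(r) = \frac{\sum_{i=1}^{r} \sigma_i^{2}}{\sum_{i=1}^{\min(n,m)} \sigma_i^{2}}
```

With centered snapshots, X Xᵀ/(m − 1) is the sample covariance, so the POD modes are exactly the principal components. Saying that a few hundred modes reconstruct nearly all the energy means E(r) is already close to 1 at that r.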
AIAA 2016 · American Institute of Aeronautics and Astronautics

Noise Prediction of the LAGOON Landing Gear Using Acoustic Analogy and Proper Orthogonal Decomposition

The most complete of the series. Far-field noise prediction for the AIRBUS-ONERA LAGOON landing gear via the Ffowcs Williams & Hawkings acoustic analogy, on both solid and porous surfaces, accelerated by a 3D wideband fast multipole method.

POD-based reduced-order reconstruction of the turbulent pressure field; per-component noise breakdown (axle, rim, strut, wheels); solid vs. porous FWH surface comparison.

POD / PCA-SVD · FWH acoustic analogy · Fast Multipole Method · Reduced-order modeling
ENCIT 2016 · 16th Brazilian Congress of Thermal Sciences and Engineering, Vitória, ES

A Proper Orthogonal Decomposition Analysis of a Realistic Landing Gear with Applications to Noise Prediction

Focused on the POD machinery itself: extracting the most energetic modes of the noise sources, with an implementation optimized for maximum performance in processing and storage.

Energy-band analysis of POD modes and a smoothing strategy for the spectra over time using the autocorrelation matrix.

POD / PCA-SVD · Performance optimization · Spectral analysis
COBEM 2015 · 23rd ABCM Int. Congress of Mechanical Engineering, Rio de Janeiro, RJ

Numerical Prediction of the Noise Generated by a Realistic Landing Gear

The foundational paper: large-eddy-simulation flow data feeding FWH far-field predictions, with POD introduced to extract the relevant noise sources and a 3D fast multipole method to accelerate them.

Established the full fast noise-prediction framework and the statistical treatment correlating unsteady loads to noise radiation.

LES / CFD · FWH acoustic analogy · POD / PCA-SVD

Beyond the terminal

I'm from the interior of São Paulo, born in Divinolândia and raised in Sumaré and Nova Odessa. For the last ~5 years I've been in Ubatuba: I went remote, moved to the coast, and stayed.

⚖️ What I value

Justice, integrity and discipline. They're what drive me, and what keep me at peace.

🏄 Surf

Surfed as a teenager, stopped when I lived far from the coast, came back to it as an adult. Post-pandemic and remote, I moved to the coast and settled in Ubatuba, for the waves and the calmer pace.

🏎️ iRacing

Lifelong motorsport obsessive. On iRacing I trained my way to 4000 iRating, about the top 3% worldwide, coached by Gustavo Ariel of Team Redline, the sim-racing team Max Verstappen co-owns. Championships are next.

🎸 Guitar

A one-song guitarist, and the song is Under the Bridge. I play at home, just for myself.

🐾 The home crew

A house of rescues: three cats (Maya and her daughter Vênus, plus Maria, whom I drove home from Guarulhos the same day a friend offered her) and Marô, a dog who pulled through distemper.

Skills & Stack

Modeling & ML

Predictive Modeling · Fraud Detection · Credit Risk · Supervised + Unsupervised · Deep Learning · Reduced-order modeling (POD/PCA-SVD) · Champion-Challenger / AutoML

Data & Platform

BigQuery · Dataform · Databricks · Spark · Feature Store · Analytics Engineering · Real-time scoring · MLOps

Languages & Tools

Python · R · SQL · Git · Linux · Azure · DevOps

Foundations

Statistics · Signal processing (FFT/PSD) · Experiment design & A/B testing · Product Analytics · Graph-based decisioning · Technical mentorship

Education & Certifications

  • B.Sc. Statistics, University of Campinas (UNICAMP)
  • Undergraduate Researcher, Boeing-sponsored aeroacoustics, UNICAMP (2014–2016)
  • Mining Complex Data, UNICAMP (2024)
  • Agentic AI, DeepLearning.AI (2026)
  • Deep Learning Specialization, DeepLearning.AI
  • Machine Learning, Stanford / Coursera
  • Introduction to MPI, CENAPAD (high-performance computing)

Let's talk

Open to senior data-science challenges in fraud, risk and ML platforms. Remote, anywhere.

Ubatuba, São Paulo, Brazil