ML Engineer & Data Scientist
Hadiqa Basit
I build production ML and AI systems that make decisions at scale.
30,000 clients matched daily across 1,500+ sales agents using a reinforcement learning system designed from scratch
200,000+ product combinations with automated pricing decisions, replacing a once-a-year manual process with daily recommendations
Thousands of investor conversations handled weekly by a WhatsApp AI sales agent built end-to-end across multiple markets
Hi there! I'm a data scientist / ML engineer at Dubizzle Labs, where I design and ship ML/AI systems that run in production every day. My work spans reinforcement learning, LLM-powered agents, NLP, and ML-driven pricing, always with a focus on solving real business problems and measuring what changed. I've taken projects from first research to production deployment, built services from the ground up, and presented directly to stakeholders up to the CEO.
What I Do
I work at the intersection of machine learning and software engineering. Most of my projects start the same way: a business problem that can no longer be solved with rules or dashboards, or that has become too tedious to solve that way. I figure out the right ML/AI approach. Sometimes that means adapting a research paper, sometimes it means building something from scratch because nothing fits. Then I build and ship the production service around it.
I'm most experienced with reinforcement learning for decision systems, LLM-based agents and tool-use architectures, NLP (entity extraction, semantic search), and ML systems that need to be reliable at scale. I care about the full picture: the algorithm, the API, the monitoring, and whether the business actually got what it needed.
Selected Work
Projects taken from research to production.
Intelligent Client-to-Agent Allocation
Reinforcement Learning · Dubizzle Labs · Live across MENA and Pakistan
The company needed a way to intelligently match incoming clients to sales agents. The existing system was a hand-coded scoring policy that concentrated clients on a small set of agents and was painful to change.
I pulled production data and found that it had been falsified in places (agents were gaming metrics), which ruled out supervised learning. After several failed prototypes and a deep research phase, I designed a contextual multi-armed bandit system, adapting the DiagUCB formulation into a hybrid of UCB and Thompson sampling. The system clusters clients by behavioural scores (evaluated by an LLM), runs three parallel bandits (one per cluster), and produces a final match score weighted by the client's cluster distribution.
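As a rough illustration only, and not the production implementation, the cluster-weighted scoring could be sketched as below. It assumes a simple non-contextual Beta-Bernoulli arm per agent rather than the adapted DiagUCB formulation, and the names `AgentArm`, `ClusterBandit`, `match_score`, and the 50/50 `mix` between the UCB and Thompson terms are all hypothetical.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentArm:
    """Beta-Bernoulli statistics for one agent inside one cluster's bandit."""
    successes: float = 1.0  # Beta prior alpha (assumed uniform prior)
    failures: float = 1.0   # Beta prior beta

    @property
    def pulls(self) -> float:
        return self.successes + self.failures

    def thompson_sample(self, rng: random.Random) -> float:
        # Draw a plausible success rate from the posterior.
        return rng.betavariate(self.successes, self.failures)

    def ucb(self, total_pulls: float, c: float = 1.0) -> float:
        # Posterior mean plus an exploration bonus that shrinks with more pulls.
        mean = self.successes / self.pulls
        bonus = c * math.sqrt(math.log(max(total_pulls, 2.0)) / self.pulls)
        return mean + bonus


@dataclass
class ClusterBandit:
    """One bandit per client cluster, holding an arm for every agent."""
    arms: Dict[str, AgentArm] = field(default_factory=dict)

    def score(self, agent_id: str, rng: random.Random, mix: float = 0.5) -> float:
        arm = self.arms.setdefault(agent_id, AgentArm())
        total = sum(a.pulls for a in self.arms.values())
        # Hypothetical hybrid: blend the UCB score with a Thompson draw.
        return mix * arm.ucb(total) + (1.0 - mix) * arm.thompson_sample(rng)

    def update(self, agent_id: str, reward: float) -> None:
        # reward is assumed to lie in [0, 1].
        arm = self.arms.setdefault(agent_id, AgentArm())
        arm.successes += reward
        arm.failures += 1.0 - reward


def match_score(bandits: List[ClusterBandit], cluster_probs: List[float],
                agent_id: str, rng: random.Random) -> float:
    """Weight each cluster bandit's score by the client's cluster membership."""
    return sum(p * b.score(agent_id, rng) for b, p in zip(bandits, cluster_probs))
```

With three `ClusterBandit` instances and a client whose cluster distribution is, say, `[0.7, 0.2, 0.1]`, `match_score` returns one number per candidate agent, and the highest-scoring agent would receive the client.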
I also built the full production service: FastAPI, async SQLAlchemy on MySQL, Celery + Redis for async jobs, New Relic for observability, and Dockerized deploys via GitHub Actions. After launch, I diagnosed that reward events were too rare for meaningful reinforcement, so I decomposed the reward signal into finer-grained events, lifting the reward event rate from 21% to 84%.
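The reward decomposition can be pictured with a small sketch. The event names and weights below are hypothetical placeholders, not the events used in production; the point is only that crediting intermediate events makes far more matches produce a non-zero learning signal.

```python
from typing import Dict, Iterable, List

# Hypothetical intermediate events and partial rewards (assumed values).
EVENT_REWARDS: Dict[str, float] = {
    "client_contacted": 0.1,
    "client_replied": 0.2,
    "meeting_scheduled": 0.5,
    "deal_closed": 1.0,
}


def shaped_reward(events: Iterable[str]) -> float:
    """Return the largest reward among the observed events, or 0.0 if none."""
    return max((EVENT_REWARDS.get(e, 0.0) for e in events), default=0.0)


def reward_event_rate(matches: List[List[str]]) -> float:
    """Fraction of matches that produced any non-zero reward."""
    if not matches:
        return 0.0
    rewarded = sum(1 for events in matches if shaped_reward(events) > 0)
    return rewarded / len(matches)
```

If only `deal_closed` counted, most matches would contribute nothing to learning; adding the earlier events raises the share of matches that carry a signal.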
Live in production, matching ~30,000 clients across 1,500+ agents daily. The POC presentation drove direct CEO recognition to engineering leadership.
AI Dynamic Pricing
ML Systems · Dubizzle Labs · POC with controlled rollout proposed
An e-commerce marketplace monetizes through ad-visibility packages. Pricing was a once-a-year, top-down exercise. With 200,000+ category × attribute combinations, that approach structurally cannot capture demand differences at the attribute level.
A model that produces price recommendations daily at the lowest meaningful unit (leaf category combined with item attributes) using live demand signals. The system is deliberately conservative: it tests small price increases at the margin, measures the net effect (revenue gained vs. churn), and stops or reverts when upside disappears.
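The test-measure-revert loop can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the 5% step size, the `PriceTestResult` fields, and the zero threshold are hypothetical and do not describe the actual model.

```python
from dataclasses import dataclass


@dataclass
class PriceTestResult:
    """Outcome of one small price test (all fields are assumed inputs)."""
    revenue_gained: float          # extra revenue from buyers who paid the higher price
    revenue_lost_to_churn: float   # revenue lost from buyers who left


def propose_price(current_price: float, step_pct: float = 0.05) -> float:
    """Test a small increase at the margin (step size is an assumption)."""
    return round(current_price * (1.0 + step_pct), 2)


def next_action(result: PriceTestResult, min_net_uplift: float = 0.0) -> str:
    """Keep the increase only while the net effect stays positive."""
    net = result.revenue_gained - result.revenue_lost_to_churn
    return "keep" if net > min_net_uplift else "revert"
```

A test that gains 100 in revenue while losing 40 to churn would be kept; one that gains 30 while losing 50 would be reverted.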
In the pilot category, only 26 of 21,427 candidate cases received a recommended price change, a 0.12% recommendation rate. The system surfaces only high-confidence opportunities, not blanket increases.
December back-test produced ~$170K monthly revenue uplift. Six-month projection: ~$1.15M cumulative uplift (95% CI: $646K–$1.65M).
Multi-Tenant WhatsApp AI Sales Agent
LLM Agents · Dubizzle Labs · Live in production, Saudi real estate market
The business needed an AI agent that could autonomously engage potential real estate investors over WhatsApp: qualify them, present relevant projects, and book consultation meetings, all without human intervention.
I built the agent end-to-end on a multi-tenant, multi-agent platform. The architecture uses LangChain's tool-use agent pattern (GPT-4o-mini) with custom tools: a real estate projects search API, a CRM conversation submission endpoint, currency conversion, and timezone-aware meeting scheduling.
I engineered bidirectional state tracking between the LLM and the server. For meeting bookings, I built a dual-safeguard system so that if the LLM skips the CRM submission, the server catches it and fires the submission anyway. I also built the multi-tenant WebSocket layer for the WhatsApp Business API with automatic token refresh and reconnect handling.
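A minimal sketch of the dual-safeguard idea, assuming the server records every tool call made during a turn: the tool names `schedule_meeting` and `submit_to_crm`, and the `TurnState` structure, are hypothetical stand-ins rather than the production identifiers.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TurnState:
    """Server-side record of what the LLM did during one conversation turn."""
    meeting_booked: bool = False
    crm_submitted: bool = False
    tool_calls: List[str] = field(default_factory=list)


def record_tool_call(state: TurnState, tool_name: str) -> None:
    """Update server state whenever the agent invokes a tool."""
    state.tool_calls.append(tool_name)
    if tool_name == "schedule_meeting":
        state.meeting_booked = True
    elif tool_name == "submit_to_crm":
        state.crm_submitted = True


def enforce_crm_submission(state: TurnState, submit: Callable[[], None]) -> bool:
    """Fire the CRM submission if the LLM booked a meeting but skipped it.

    Returns True when the server-side fallback had to run.
    """
    if state.meeting_booked and not state.crm_submitted:
        submit()
        state.crm_submitted = True
        return True
    return False
```

Running the check after every turn makes the safeguard idempotent: once the submission is recorded, later calls do nothing.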
Live in production, handling thousands of investor conversations weekly across multiple markets. Agent behaviour is fully DB-driven, changeable without redeployment.
Project Name Extraction from Call Transcripts
NLP / NER · Dubizzle Labs · Shipped to production
An AI transcription pipeline processed Urdu sales calls via Whisper. The project-name extraction field was only 37% accurate due to background noise, mid-call cut-offs, and Urdu pronunciation issues.
A custom spaCy NER model trained on the company's own call transcripts, wrapped as a FastAPI service. On top, I added a fuzzy matching layer fed by an uploadable list of canonical project names. The fuzzy matcher catches close matches deterministically, and the NER model handles anything it misses.
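The fuzzy-first, NER-fallback flow might look like the sketch below. It uses Python's standard `difflib` as a stand-in for whatever fuzzy matcher the service uses, takes the NER model as a plain callable so the example stays self-contained, and the 0.85 cutoff and the project names in the usage note are made up.

```python
import difflib
from typing import Callable, List, Optional


def fuzzy_project_match(transcript: str, canonical_names: List[str],
                        cutoff: float = 0.85) -> Optional[str]:
    """Compare every word n-gram in the transcript against the canonical list."""
    lowered = {name.lower(): name for name in canonical_names}
    words = transcript.lower().split()
    max_len = max((len(name.split()) for name in lowered), default=0)
    best_name, best_ratio = None, 0.0
    for size in range(1, max_len + 1):
        for start in range(len(words) - size + 1):
            candidate = " ".join(words[start:start + size])
            hits = difflib.get_close_matches(candidate, list(lowered), n=1, cutoff=cutoff)
            if hits:
                ratio = difflib.SequenceMatcher(None, candidate, hits[0]).ratio()
                if ratio > best_ratio:
                    best_name, best_ratio = lowered[hits[0]], ratio
    return best_name


def extract_project_name(transcript: str, canonical_names: List[str],
                         ner: Callable[[str], Optional[str]]) -> Optional[str]:
    """Try the deterministic fuzzy matcher first, then fall back to the NER model."""
    match = fuzzy_project_match(transcript, canonical_names)
    return match if match is not None else ner(transcript)
```

For a transcript containing the misspelling "green vally heights" and a canonical list that includes "Green Valley Heights", the fuzzy layer resolves the name without calling the model; transcripts with no close match are passed to the `ner` callable.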
Lifted project-name extraction accuracy from 37% to 84%, enabling per-project cost-of-calling reporting that stakeholders previously didn't have.
Semantic Search Engine for Restaurant Recommendations
Semantic Search · Contract role · Shipped to production
Users typed natural-language queries like "cozy ramen spot" or "cheap brunch place", and the system needed to retrieve semantically relevant restaurants, not just keyword matches.
SBERT embeddings represent restaurants and queries in a shared semantic space, stored in pgvector with HNSW indexing for sub-linear approximate nearest-neighbor retrieval at scale. I built an NLP-based noun-pair extraction step that pulls salient (modifier, head) pairs from free-text queries so the embedding step prioritises semantically meaningful tokens over filler words.
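To make the retrieval step concrete, here is an exact brute-force version of the ranking that an HNSW index approximates at scale. The three-dimensional vectors in the usage note are toy values, not SBERT embeddings, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], index: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Rank every stored item by similarity to the query and keep the best k."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Given an index such as `{"ramen_bar": [0.9, 0.1, 0.0], "brunch_cafe": [0.1, 0.9, 0.0]}` and a query vector close to the first entry, `top_k` ranks `ramen_bar` first. The brute-force scan is linear in the number of restaurants, which is the cost that approximate indexing avoids.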
Faster retrieval and materially better precision on natural-language queries compared to baseline keyword search. Deployed to production.
Stack
Technologies I work with regularly.
ML & AI
Backend & APIs
Infrastructure & DevOps
Other
Publication
Peer-reviewed research.
A Novel Poisson–Weibull Model for Stress–Strength Reliability Analysis in Industrial Systems
Bayesian and Classical Approaches
Axioms (MDPI) · Vol. 14, Issue 9 · August 2025
Co-authored a peer-reviewed paper introducing the Poisson–Weibull Distribution (PWD), a novel three-parameter probability model for reliability analysis of industrial systems with parallel components. Built both classical (MLE) and Bayesian (MCMC) parameter estimation, with Monte Carlo simulation studies validating estimator behaviour across varied sample sizes and parameter configurations.
Cited by Liu et al. (Mathematics, 2026) in their work on Weibull parameter estimation under censored data.
About
I'm Hadiqa, a data scientist and ML engineer based in Pakistan, currently at Dubizzle Labs. I've been here since late 2023, starting as a data analyst intern and now owning production ML systems, mentoring junior engineers, and presenting directly to leadership.
What I care about most is building things that actually work in production and measurably change outcomes. Every project I've shipped has gone from research and prototyping through to a live service handling real traffic.
I also tinker with open-source models locally (Qwen is my current go-to for local inference) and have a peer-reviewed publication in statistical modelling and reliability theory. I'm always looking for hard problems that push me to learn something new.
Outside of work, I'm building a freelance practice in ML/AI engineering, taking on projects where I can bring the same production-first mindset to new domains.