The full catalog.
Flagship projects and industry collaborations — each broken out by architecture so the system tells the story instead of a pitch.
SecureAdvisor — AI-Powered Security Incident Response
Built for Certis Group — an AI-powered security incident management platform that fuses live CCTV, access control logs, and manual panic triggers to coordinate real-time ground officer dispatch across a 3-app system.
On the detection side, CCTV frames are processed by YOLOv8n at a 0.7 confidence threshold with per-camera polygon zone validation. A 120s sliding event window correlates signals across all three input streams — camera, door, and manual — with 30s duplicate suppression per zone, classifying fused events into 7 incident types before escalating to the Advisory App.
The Advisory App calls GPT-4o with structured event context and returns a natural language incident report: threat severity, recommended officer action, and urgency level — all delivered to the Operator dashboard in under 2 seconds end-to-end.
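The fusion-and-suppression logic described above can be sketched as a small event correlator. This is a hypothetical illustration, not the production SecureAdvisor code — the class and field names are invented; only the 120s window and 30s suppression constants come from the spec:

```python
import time
from collections import defaultdict

WINDOW_S = 120   # sliding fusion window (from the spec)
SUPPRESS_S = 30  # per-zone duplicate suppression (from the spec)

class EventFuser:
    """Correlates camera/door/manual signals inside a sliding window.

    Hypothetical sketch — structure is illustrative, not the
    production SecureAdvisor implementation.
    """
    def __init__(self, now=time.monotonic):
        self.now = now                        # injectable clock for testing
        self.events = []                      # (t, source, zone) tuples
        self.last_alert = defaultdict(lambda: float("-inf"))

    def add(self, source, zone):
        t = self.now()
        # drop signals that fell out of the 120s window
        self.events = [e for e in self.events if t - e[0] <= WINDOW_S]
        self.events.append((t, source, zone))
        # suppress duplicate alerts for the same zone within 30s
        if t - self.last_alert[zone] < SUPPRESS_S:
            return None
        self.last_alert[zone] = t
        # fuse: which sources fired in this zone inside the window?
        sources = {s for (_, s, z) in self.events if z == zone}
        return {"zone": zone, "sources": sorted(sources)}
```

Note that suppression gates the *alert*, not the signal: suppressed signals still enter the window, so a later alert sees the full multi-source context.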
Three Architectural Commitments
Real-time async pipeline
Detection streams events independently — the advisory call never blocks the next frame. FastAPI async endpoints handle concurrent events without queuing.
Strict LLM boundary
GPT-4o only generates the advisory text. All routing, deduplication, zone validation, and incident classification are deterministic Python — no LLM in the critical path.
Three-app isolation
Detection, Advisory, and Operator are independent services. Advisory pushes to Operator via WebSocket; the Operator App never calls the Detection App directly.
Production Outcomes
- < 2s end-to-end: from raw CCTV frame → YOLOv8n detection → rule engine → GPT-4o advisory → Operator dashboard alert.
- 7 incident types classified deterministically: intrusion, loitering, tailgating, panic trigger, fire, unauthorized access, after-hours presence.
- 120s sliding multi-source fusion window with 30s duplicate suppression — prevents alert flooding from repeated detections.
- Live working demo delivered at NAISC 2026 for Certis Group, covering the full flow from detection to dispatch recommendation.
3-app system — Command Centre, Ground Officer App, Demo Trigger. FastAPI backend, WebSocket events, YOLOv8n + OpenCV detection.
MakanMap — Real-Time Crowd Level Forecasting
Built for Aires Applied Technology — a real-time crowd level forecasting system for food court locations. Operators view predicted crowd density for any location at any future time and run live What-If scenario analysis.
The core model is a Gradient Boosting Regressor trained on ~50K rows of historical visitor count data per location, achieving 93.2% R² on the holdout set. The feature set includes hour-of-day, day-of-week, week-of-year, is_public_holiday, rolling 7-day average visitor count, location_type, and weather category.
The system runs on two Apache Airflow DAGs: one daily retraining + validation pipeline and one hourly inference pipeline that writes predictions to Supabase for the React dashboard to consume in real time.
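The retraining step can be sketched end-to-end with scikit-learn. This is a minimal illustration on synthetic data — the feature names mirror the set listed above, but the data, peak-hour effect, and hyperparameters are invented, not Aires' pipeline:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the historical visitor-count table.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "hour_of_day": rng.integers(0, 24, n),
    "day_of_week": rng.integers(0, 7, n),
    "week_of_year": rng.integers(1, 53, n),
    "is_public_holiday": rng.integers(0, 2, n),
    "rolling_7d_avg": rng.normal(200, 40, n),
    "location_type": rng.integers(0, 4, n),      # assumed label-encoded
    "weather_category": rng.integers(0, 3, n),   # assumed label-encoded
})
# Invented ground truth: the rolling trend scaled up at meal-time peaks.
peak = np.isin(df["hour_of_day"], [12, 13, 18, 19]).astype(float)
y = df["rolling_7d_avg"] * (1 + 0.5 * peak) + rng.normal(0, 10, n)

X_tr, X_te, y_tr, y_te = train_test_split(df, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
r2 = model.score(X_te, y_te)  # holdout R²
```

In the real pipeline this fit runs inside the daily DAG, and the resulting artifact — not the training data — is what the hourly inference DAG and the What-If endpoint load.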
Three Architectural Commitments
In-memory What-If engine
Model is loaded once at FastAPI startup and lives in process memory. Each What-If query is a single inference call on a modified feature vector — no DB call, no retraining. Response time < 100ms.
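The What-If call pattern is: copy the feature vector, patch the fields the operator changed, run one inference. A sketch of that shape — `CrowdModel` here is a hand-written stand-in for the pickled Gradient Boosting artifact, with invented peak/holiday multipliers:

```python
class CrowdModel:
    """Stand-in for the regressor kept in process memory."""
    def predict_one(self, row: dict) -> float:
        base = row["rolling_7d_avg"]
        if row["hour_of_day"] in (12, 13, 18, 19):
            base *= 1.5          # invented meal-time peak effect
        if row["is_public_holiday"]:
            base *= 1.2          # invented holiday effect
        return base

# Loaded once at startup in the real system (FastAPI startup hook);
# every request reuses this object — no DB call, no retrain.
MODEL = CrowdModel()

def what_if(base_row: dict, **overrides) -> float:
    # one scenario = copy the vector, patch it, one inference call
    row = {**base_row, **overrides}
    return MODEL.predict_one(row)
```

Because each scenario is a pure in-memory `predict` on a patched dict, sweeping 100 candidate time slots is just 100 function calls.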
Two-DAG Airflow pipeline
DAG 1 (daily): pull sensor data → clean → feature engineer → retrain → validate → write artifact to S3. DAG 2 (hourly): fetch latest data → inference → write predictions to Supabase.
Tabular-first model selection
Gradient Boosting chosen over XGBoost and LightGBM after cross-validation on holdout R². Neural nets were excluded — the ~50K row dataset is too small for deep learning to generalise reliably.
Production Outcomes
- 93.2% R² on holdout set across all test locations — Gradient Boosting outperformed XGBoost and LightGBM after cross-validation.
- Rolling 7-day average visitor count was the highest-importance feature, capturing recent location-specific trends better than calendar signals alone.
- What-If scenario recompute runs < 100ms in-memory — operators can sweep across times, days, and holiday flags without any backend latency.
- Deployed with Docker + GitHub Actions CI/CD; model artifacts versioned to S3 with each daily DAG run.
Gradient Boosting model, Airflow orchestration, FastAPI prediction service, Supabase storage, MLflow tracking.
Real Estate Valuation AI — USA Property Predictor
Built for the SIM Data Analytics Club — an end-to-end ML system for USA residential property price prediction, trained on ~1M Zillow records across all 50 US states, with a GPT-4o advisory layer that turns a model output into a natural language valuation report.
XGBoost handles the regression task: it natively manages the high null rate in Zillow data (older listings frequently omit features), trains fast on 1M rows, and produces feature importance out of the box. It outperformed Random Forest and LightGBM on RMSE after cross-validation.
Location encoding was the central challenge: 50 states × hundreds of cities × thousands of zip codes creates extreme cardinality. Target Encoding captures location price signal in a single numeric feature per column, with cross-validation folds to prevent target leakage.
Three Architectural Commitments
Target Encoding for location
One-hot on 50 states × cities × zip codes would produce tens of thousands of sparse columns. Target Encoding collapses each to one numeric feature (mean price per category) with CV folds to prevent leakage.
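Out-of-fold target encoding can be written in a few lines: each row's encoding comes only from folds that exclude it, so a row's own price never leaks into its feature. A minimal sketch with an invented helper name and toy prices — not the project's exact code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

def target_encode_cv(df, col, target, n_splits=5, seed=0):
    """Out-of-fold mean-target encoding for one categorical column.

    Hypothetical helper: each row is encoded with the category's mean
    target computed on the *other* folds, preventing target leakage.
    """
    out = pd.Series(index=df.index, dtype=float)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in kf.split(df):
        fit = df.iloc[fit_idx]
        means = fit.groupby(col)[target].mean()      # mean price per category
        fold_mean = fit[target].mean()               # fallback for unseen cats
        out.iloc[enc_idx] = (
            df.iloc[enc_idx][col].map(means).fillna(fold_mean).to_numpy()
        )
    return out

# Toy data — invented zip codes and prices, for illustration only.
df = pd.DataFrame({
    "zip":   ["94103", "94103", "10001", "10001", "10001", "60601"],
    "price": [900_000, 950_000, 1_200_000, 1_100_000, 1_150_000, 400_000],
})
df["zip_te"] = target_encode_cv(df, "zip", "price", n_splits=3)
```

The same helper applied to `state`, `city`, and `zip` yields the three numeric location features described below, in place of tens of thousands of one-hot columns.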
XGBoost for tabular scale
Handles nulls natively — critical for Zillow data where older listings frequently omit features. Trains fast on 1M rows and provides feature importance rankings without post-hoc SHAP computation.
GPT-4o as decision layer
After the model outputs a price, GPT-4o receives: predicted price, listing price, and median zip-code price. It generates a 3-paragraph advisory: valuation verdict, key drivers, buyer/seller guidance.
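The grounding contract is easiest to see in the prompt construction: every figure the LLM can cite is injected upstream. A sketch of that message-building step, with illustrative field names and wording — the actual system prompt is not reproduced here:

```python
def build_advisory_messages(predicted: float, listing: float, zip_median: float):
    """Build a grounded chat payload: all numbers are injected,
    so the model has a reference anchor for every claim."""
    context = (
        f"Predicted price: ${predicted:,.0f}\n"
        f"Listing price: ${listing:,.0f}\n"
        f"Median price in this zip code: ${zip_median:,.0f}"
    )
    return [
        {"role": "system", "content": (
            "You are a real-estate advisor. Write a 3-paragraph report: "
            "valuation verdict, key drivers, buyer/seller guidance. "
            "Use ONLY the figures provided; do not invent numbers."
        )},
        {"role": "user", "content": context},
    ]

messages = build_advisory_messages(512_000, 549_000, 498_000)
```

The list is the standard chat-completions message format; swapping in the real model output is the only runtime variable.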
Production Outcomes
- Trained on ~1M Zillow residential records spanning all 50 US states — XGBoost outperformed Random Forest and LightGBM on RMSE after cross-validation.
- Target Encoding reduced location features from tens of thousands of one-hot columns to 3 numeric features (state, city, zip) while preserving the dominant price signal.
- GPT-4o advisory prompt is grounded: predicted price, listing price, and median zip-code price are injected — the model cannot fabricate a valuation without a reference anchor.
- Gradio interface exposes property input form, predicted price band, and GPT-4o advisory text — deployable as a standalone web app without any frontend framework.
XGBoost regression, Target Encoding, GPT-4o advisory layer, ~1M Zillow records, Gradio web interface.
Astrindo Digital Approval Chatbot
Built during my internship at Astrindo Senayasa (Jakarta, Apr–Jun 2025) — an internal enterprise chatbot that lets non-technical employees query live business data across Marketing, HR, Finance, Purchasing, and Service departments using plain language.
The system runs a two-stage NLU pipeline: GPT-4o-mini first classifies the user's intent and extracts structured entities (year, month, specialist name, city, brand) at temperature=0, returning strict JSON. The PHP backend then dispatches to a domain-specific feature handler that executes deterministic MySQL queries and formats the response.
The LLM also generates a ChatGPT-style sidebar title per conversation. A hard-banned list of generic titles ("General Chat", "Greeting", "Quick Question") forces the model to produce specific, intent-driven labels — with a regex-based Indonesian fallback if the output is invalid.
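The ban-then-fallback logic above is a small guard function. The production version is PHP; this Python sketch shows the shape, with an invented default title and an illustrative regex:

```python
import re

# Banned generic titles, as described above (lowercased for comparison).
BANNED = {"general chat", "greeting", "quick question"}

def fallback_title(llm_title: str, first_message: str) -> str:
    """Accept the LLM title unless it is empty or banned; otherwise
    derive a title from the user's first message (regex fallback)."""
    title = (llm_title or "").strip()
    if title and title.lower() not in BANNED:
        return title
    # keep the first few content words of the user's opening message
    words = re.findall(r"\w+", first_message)
    return " ".join(words[:5]).title() or "Percakapan Baru"
```

The word-extraction regex works unchanged for Indonesian and English input, which is what makes a single fallback path sufficient for the bilingual deployment.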
Three Architectural Commitments
Two-stage NLU pipeline
GPT-4o-mini classifies intent + extracts entities at temperature=0 and returns strict JSON. The PHP dispatcher routes to the correct feature handler based on the intent field — never on free text.
Strict LLM boundary
The LLM never touches the database. It returns { intent, title, entities }. All SQL queries, aggregations, and number formatting are in deterministic PHP handlers — hallucinated data is structurally impossible.
Domain-scoped intent list
12 intents across 5 business domains, explicitly enumerated in the NLU prompt. Anything outside the domain is routed to smalltalk → GPT-4o-mini fallback chat, keeping business data queries separate from general conversation.
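The dispatch pattern across these three commitments — strict JSON in, routing on the `intent` field only, smalltalk as the catch-all — looks like this. The production dispatcher is PHP; this is a Python sketch with invented intent and handler names:

```python
import json

def handle_marketing_sales(entities: dict) -> str:
    # stand-in for a handler that runs parameterised MySQL queries
    return f"sales report for {entities.get('year', 'all years')}"

def handle_smalltalk(entities: dict) -> str:
    # stand-in for the GPT-4o-mini fallback chat path
    return "smalltalk fallback"

HANDLERS = {
    "marketing_sales": handle_marketing_sales,
    # ...one entry per domain intent; 12 in the real system
}

def dispatch(nlu_json: str) -> str:
    parsed = json.loads(nlu_json)  # strict JSON from the NLU stage
    # route on the structured intent field only — never on free text;
    # anything outside the enumerated list falls through to smalltalk
    handler = HANDLERS.get(parsed["intent"], handle_smalltalk)
    return handler(parsed.get("entities", {}))
```

Because the handler table is a closed map, the LLM cannot route a request anywhere the backend has not explicitly defined.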
Production Outcomes
- 12 intents across 5 departments — Marketing, HR, Finance, Purchasing, Service — each with a dedicated PHP feature handler and parameterised MySQL queries.
- Zero hallucinated numbers: the LLM returns only intent + entities; all figures come from live MySQL queries against Astrindo's Digital Approval database.
- ChatGPT-style conversation titles generated per session by the NLU call, with a hard-banned generic-title list and regex-based Indonesian language fallback.
- Deployed internally on Apache/XAMPP during internship. Supports bilingual input (Indonesian and English) with intent detection stable across both.
Two-stage NLU pipeline, 12 domain intents, MySQL feature handlers, ChatGPT-style title generation, bilingual support.
More on GitHub.
Tooling, prototypes, and in-progress work live at one address — the curated story is above, the full archive is a click away.