Independent project. A production hybrid retrieval system combining BM25 and dense vector search (FAISS) over 207,000 paragraph-aware chunks from 1,203 sermons, with Reciprocal Rank Fusion, conditional Qwen3 neural reranking, vLLM-served embeddings, LiteLLM-proxied text generation, and SSE streaming. Full-stack delivery: Next.js frontend on Cloudflare, LLM API on Cloud Run.
Deployed at branhamsermons.ai with zero marketing. Organic adoption has reached 12,960 requests across 59 countries in 30 days.
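For readers unfamiliar with the fusion step named above, Reciprocal Rank Fusion can be sketched in a few lines. This is a minimal illustration, not the production code: the function name, the chunk IDs, and the smoothing constant k=60 (a commonly used default) are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into a single ranking.

    Each chunk scores 1 / (k + rank) in every list it appears in (rank is
    1-based) and the scores are summed. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical top-3 results from the two retrievers.
bm25_hits = ["c3", "c1", "c7"]
dense_hits = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Chunks ranked well by both retrievers rise to the top, while a chunk seen by only one retriever still keeps a place in the fused list.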
Primary architect of the template-generation engine that compiles user-defined natural-language trading strategies into deployable A2A (agent-to-agent) pipelines on Google's A2A protocol. Built the Modifier and Drafter services into production-ready components and designed the execution engine, which runs on live capital and auto-provisions MCP servers and utility agents. Manual agent deployment time dropped from ~40 hours to under 8 hours per configuration (~80% reduction).
Published and maintained the traia-iatp Inter-Agent Transfer Protocol across 111 versioned releases, plus traia-tools; established CI/CD, release notes, and documentation for both.
Primary AI engineer on a feature that had stalled without a production path. Architected the full AI search and chat system over a 64,000-entry Web3/crypto project database: semantic search, project-similarity discovery by user-chosen metrics, multi-turn conversation, persistent chat history, and the production endpoints powering all of it. Mentored two junior engineers and led code reviews across the team.
Independent research programme covering Node2Vec, GCN, GraphSAGE, GAT, TransE, HGT, RGCN, Correct & Smooth, and generative graph models (GraphRNN, GCPN). Applied to MAG, OGB, and FB15k benchmarks spanning node classification, graph classification, link prediction, knowledge-graph traversal, and graph generation. This work is the foundation for my focus on GNNs over relational databases.
Built GPT-2 and LLaMA2-variant models from scratch up to 124M parameters. Optimised training with flash attention, mixed precision (TF32/BF16), torch.compile, and Distributed Data Parallel across an 8-GPU box, achieving a 10x speedup and a 40% GPU memory reduction vs. the unoptimised baseline. Also replicated core LLM components in pure C and CUDA to deepen systems-level understanding. Fine-tuning with SFT, DPO, and LoRA yielded up to 25% BLEU improvement.
Robust AI agent integrating RAG pipelines, web scraping, and browsing with dynamic tool routing via LangChain. Master agent orchestrating sub-agents through CrewAI for complex reasoning tasks. Custom LlamaIndex RAG application for targeted retrieval — partial-text summarisation, full-document summarisation, and structured fact extraction. Chainlit / Streamlit frontends; HTML/CSS site for backend selection.
POC RAG solution for a mental-health startup using LangChain and Streamlit, allowing therapists to chat with their therapy-session recordings.
Ongoing project: containerizing a machine-learning API, storing the image in Amazon Elastic Container Registry (ECR), and deploying it on Amazon Elastic Container Service (ECS).
Built a text summarization app, a command-line tool, and a text generation API, with continuous integration on Hugging Face Spaces.
Developed a Vision Transformer application for classifying food types, with a Gradio web interface. The interactive app is available on Hugging Face.
Built a sentiment-analysis application based on the RoBERTa model, served through a Flask API.
Developed a FastAPI application that collects data and stores it in a database.
Created a repository of training notebooks and scripts for ML, MLOps, and data science.
Wrote a filtering algorithm for wind-turbine SCADA data that removes outliers and faulty readings.