AVAILABLE FOR NEW PROJECTS

Matt Derya

Data Scientist with 15+ years of pharmaceutical expertise and 6+ years building production GenAI/LLM solutions — delivering 90% cost savings, 40% productivity gains, and 99% model accuracy across clinical and commercial environments.

LangChain & RAG, Agentic AI, CNN / LSTM, Docker & MLOps, IQVIA Analytics, AWS SageMaker, Pharmacokinetics
15+ Years in Pharma
6+ Years AI/ML
99% Model Accuracy
90% Cost Savings
Technical Skills

Tools & Expertise

🧠 ML / AI
TensorFlow, PyTorch, Keras, Hugging Face, LangChain, LangGraph, RAG, Agentic AI, OpenAI, Claude, CNN, LSTM, XGBoost, NLP, Transfer Learning, Computer Vision
💻 Programming & Tools
Python, SQL, R, Pandas, NumPy, Scikit-learn, FastAPI, Git, Docker, SHAP, Streamlit, Power BI, Tableau, JIRA
☁️ Deployment
AWS SageMaker, AWS EC2, CI/CD, Hugging Face Spaces, Streamlit Cloud, Railway
💊 Pharma Commercial Data
IQVIA LAAD, Xponent, Xponent PlanTrak, DDD, Veeva, Claims/EMR, Omnichannel, Next Best Engagement
🔬 Pharma Domain
Pharmacokinetics, Drug-Drug Interactions, ADME, FDA Regulatory, Oncology, Immunology, Diabetes, Obesity, Cardiometabolic, Data Governance
About Me

Where Drug Science Meets Artificial Intelligence

I started my career as a clinical pharmacist, spending years understanding how drugs behave in the body — pharmacokinetics, drug interactions, ADME. That domain knowledge turned out to be my greatest ML asset.

Today I build production-grade AI systems for pharmaceutical R&D and commercial operations: LLM agents that query clinical databases, CNN models that analyze cell images for Phase II trials, and RAG pipelines powered by LangChain and LangGraph for enterprise Agentic AI workflows.

My MLOps & DevOps stack spans Docker, CI/CD pipelines, AWS SageMaker, FastAPI, and end-to-end model deployment and monitoring — ensuring every model moves cleanly from experiment to production. I work fluently across Python, SQL, and R, with deep experience in IQVIA LAAD, Xponent, Veeva, and omnichannel pharma commercial data.

At IMEBRANDS I'm delivering AI-powered demand forecasting, recommendation engines, and CLV models — proving pharma AI skills translate powerfully to commercial analytics. The result? AI that actually understands the science — not just the data.

🤖 LLM/RAG & Agentic AI

Built HIPAA-compliant LLM agents (LangChain/LangGraph/RAG) for enterprise Agentic AI workflows and Next Best Engagement analytics. Deployed to 50+ stakeholders with 40%+ productivity gains.

🐳 MLOps & DevOps

Extensive hands-on MLOps experience with Docker, CI/CD pipelines, Git, AWS SageMaker, and FastAPI for real-time model inference. End-to-end lifecycle management from data ingestion to production monitoring.

📊 IQVIA & Pharma Commercial Data

8+ years of daily work with IQVIA LAAD, Xponent, Xponent PlanTrak, DDD, Veeva, and Claims/EMR data, shaping pricing decisions, sales force strategy, and physician targeting at the executive level.

🚀 Oncology / Immunology AI

Drove NLP/Transformer pipelines for adverse event analysis across Oncology and Immunology programs. Experience also spans Diabetes, Obesity, and Cardiometabolic therapeutic areas.

🏛️ Data Governance & Compliance

Managed the ML lifecycle to ensure FDA regulatory compliance, data governance standards, and human-in-the-loop oversight across all production deployments. Contributed to 10+ EU regulatory submissions.

Portfolio

Featured Projects

Production-grade AI/ML projects spanning pharma R&D, GenAI engineering, and commercial analytics.

🐙 GitHub
Clinical Trial Cell Classification (CNN)

ResNet/EfficientNet deep learning models for Phase II clinical trial cellular analysis. Reduced manual review from months to minutes, achieving 99% accuracy and 90% cost savings per trial cycle.

PyTorch, CNN, ResNet, AWS, Clinical Trials
View on GitHub →
🐙 GitHub
HIPAA-Compliant LLM Agent (RAG)

LangChain/LangGraph agentic AI system for enterprise pharma data querying. Multi-step reasoning over clinical databases with full regulatory compliance. Deployed to 50+ users with 40% productivity gains.

LangChain, RAG, LangGraph, HIPAA, Agentic AI
View on GitHub →
🤗 Hugging Face
Pharma Sales Forecasting (LSTM)

LSTM time series models incorporating PK/ADME domain features for pharmaceutical sales forecasting and customer churn prediction. Achieved R² above 0.90, enabling data-driven strategic planning.

TensorFlow, LSTM, Pharmacokinetics, Time Series
View on GitHub →
🤗 Hugging Face
Drug Interaction Prediction (XGBoost)

ML model that predicts interaction risk from drug-drug interaction (DDI) and ADME features. Domain-enriched feature engineering, bridging AI with pharmaceutical science, delivered 25–30% accuracy gains.

XGBoost, Scikit-learn, ADME, DDI
View on GitHub →
🚀 Streamlit
Adverse Event NLP Pipeline

Transformer-based NLP pipeline extracting structured clinical signals from 10,000+ adverse event reports and scientific literature for Oncology/Immunology programs. Automated alert systems for medical affairs teams.

Hugging Face, Transformers, NLP, Oncology
View on GitHub →
🚂 Railway
Demand Forecasting & CLV Models

LSTM, XGBoost & Ensemble forecasting for inventory optimization at IMEBRANDS. RFM/Cohort-based Customer Lifetime Value models with A/B testing and Sentiment Analysis for marketing strategy.

LSTM, XGBoost, CLV, Ensemble, RFM
View on GitHub →
Live Deployments

Production AI Systems

7 live deployments across Hugging Face, Streamlit, and Railway.

🤗 Hugging Face · Cell Classification CNN
🤗 Hugging Face · Drug Interaction XGBoost
🤗 Hugging Face · Adverse Event NLP
🤗 Hugging Face · Pharma RAG LLM Agent
🚀 Streamlit · Pharma Drug Forecasting
🚀 Streamlit · PowerBI Drug Analysis
🚂 Railway · Physician Churn Engine
Experience

Professional Journey

Data Scientist
01/2025 – Present
IMEBRANDS · Illinois, USA
  • Built RAG-based chatbots and Agentic AI assistants connected to company SQL databases, eliminating manual market reporting and reducing reporting time by 60%.
  • Replaced static Excel forecasting with LSTM, XGBoost, and Ensemble models for time series revenue and demand forecasting; implemented stock and demand optimization, significantly reducing excess inventory costs.
  • Developed FastAPI-based APIs for real-time model inference and integration; deployed ML models using Docker containerization with CI/CD pipelines for automated delivery.
  • Designed Recommendation Systems, RFM/Cohort-based CLV models, and Computer Vision pipelines for catalog automation; conducted A/B tests and Sentiment Analysis to optimize marketing ROI.
  • Built and deployed applications on AWS SageMaker; implemented Python-based RAG pipelines using LangChain for AI-driven insights from SQL data sources.
Senior Data Scientist
08/2019 – 01/2025
Mentor R&D · Gaithersburg, MD
  • Automated months-long manual microscopy for Phase II clinical trials using CNN models (ResNet, EfficientNet, VGG), achieving 99% accuracy and 90% cost savings per trial cycle.
  • Architected HIPAA-compliant LLM agents using LangChain / LangGraph / RAG for enterprise Agentic AI workflows and Next Best Engagement analytics, improving team productivity by 40%.
  • Drove commercial analytics by integrating IQVIA LAAD, Xponent, and Xponent PlanTrak datasets into physician segmentation, targeting, and churn models — delivering real-time insights to 50+ stakeholders via Power BI / Tableau / Streamlit dashboards on AWS SageMaker.
  • Bridged pharma science and AI by engineering drug-drug interactions, ADME, and pharmacokinetics as predictive features using Scikit-learn / XGBoost / LSTM Ensemble, delivering 25–30% model performance improvements.
  • Built containerized ML applications using Docker; managed end-to-end MLOps including model deployment, monitoring, and updates via CI/CD pipelines and Git.
  • Developed NLP / Transformer (BioBERT) pipelines for adverse event analysis in Oncology / Immunology programs; mentored junior data scientists and led cross-functional AI initiatives.
Data Analyst & Owner
11/2015 – 07/2019
SG Health · Trenton, NJ
  • Reduced pharmacy waste by 40% and optimized stock levels by building inventory forecasting models and sales analytics dashboards, replacing manual stock management and driving consistent revenue growth.
  • Built Python/Pandas-based forecasting pipelines, Tableau and Power BI dashboards, and SQL queries for prescription trend analysis and patient demographic segmentation, enabling data-driven operational decisions.
  • Automated reporting workflows and integrated BI dashboards with backend Python pipelines; developed Python scripts for data cleaning, transformation, and reporting automation.
Data Analyst & Product Manager
05/2008 – 11/2015
Octa Pharma · Ankara, Turkey
  • Captured 75% market share and became market leader for Human Albumin, Immunoglobulin, and Factor products by leading data-driven competitive analysis, pricing optimization, and physician engagement across Turkey and EU markets.
  • Embedded IMS Health Xponent and IMS DDD (now IQVIA) as the analytical backbone of all commercial operations — owning weekly, monthly, quarterly, and annual reporting covering market share, competitive intelligence, and portfolio performance.
  • Replaced manual Excel reporting with Tableau, SQL-based dashboards, ETL pipelines, and LSTM/Ensemble forecasting models, enabling real-time pricing decisions and market performance tracking across multiple EU country markets.
  • Contributed to 10+ European regulatory submissions including stability, bioequivalence, and CMC documentation across multiple countries.
Contact

Let's Build Something Together

Have a question or want to collaborate? I'd love to hear from you!

Contact Information

📍
Location
Princeton, NJ · USA
✉️
Email
📱
Phone
+1 929 840 4971
🕐
Availability
Monday – Friday, 9:00 – 18:00