AI protein generation lakehouse pipeline using PySpark and MLflow

This project is an end-to-end protein generation lakehouse pipeline built with PySpark, MLflow, and a reinforcement-learning-inspired loop. The system generates synthetic protein sequences, scores and ranks them as they move through a Bronze → Silver → Gold architecture, and iteratively optimizes the top candidates, with each iteration tracked as an MLflow experiment. The project is designed to be scalable and Databricks-ready, with future extensions toward distributed diffusion/flow-based protein generation workflows.
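
The flow described above can be sketched in plain Python to show how the stages fit together. This is a minimal illustration under stated assumptions, not the project's actual code: in-memory lists stand in for Spark DataFrames or Delta tables, a simple mutate-and-select loop stands in for the reinforcement-learning-inspired optimizer, and the `history` list stands in for metrics that would be logged with MLflow (for example via `mlflow.log_metric`). All function names and the hydrophobicity-based `score` are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
HYDROPHOBIC = set("AILMFVW")          # assumption: toy scoring criterion only


def generate(rng, n, length):
    """Bronze: raw, unvalidated synthetic sequences."""
    return [
        {"sequence": "".join(rng.choice(AMINO_ACIDS) for _ in range(length))}
        for _ in range(n)
    ]


def score(seq):
    """Hypothetical score: fraction of hydrophobic residues, in [0, 1]."""
    return sum(c in HYDROPHOBIC for c in seq) / len(seq)


def to_silver(bronze):
    """Silver: drop empty, invalid, or duplicate sequences, then score."""
    seen, rows = set(), []
    for row in bronze:
        seq = row["sequence"]
        if seq and seq not in seen and set(seq) <= set(AMINO_ACIDS):
            seen.add(seq)
            rows.append({"sequence": seq, "score": score(seq)})
    return rows


def to_gold(silver, k):
    """Gold: the top-k candidates ranked by score, best first."""
    return sorted(silver, key=lambda r: r["score"], reverse=True)[:k]


def mutate(rng, seq):
    """Replace one randomly chosen residue with a random amino acid."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(AMINO_ACIDS) + seq[i + 1:]


def run_pipeline(rounds=5, n=50, length=20, k=10, seed=0):
    """Run generate -> score -> rank, then refine the top candidates."""
    rng = random.Random(seed)
    bronze = generate(rng, n, length)
    gold, history = [], []
    for _ in range(rounds):
        # Carry the previous gold set forward so the best score never regresses.
        carried = [{"sequence": r["sequence"]} for r in gold]
        gold = to_gold(to_silver(bronze + carried), k)
        history.append(gold[0]["score"])  # stand-in for an MLflow metric
        # The next round's bronze layer is built from mutated gold candidates.
        bronze = [
            {"sequence": mutate(rng, r["sequence"])}
            for r in gold
            for _ in range(max(1, n // k))
        ]
    return gold, history


if __name__ == "__main__":
    gold, history = run_pipeline()
    print("best score per round:", history)
    print("top candidate:", gold[0]["sequence"])
```

Because the previous gold set is re-scored alongside each new batch, the best score recorded per round can only stay flat or improve, which makes the per-round metric easy to read when comparing tracked runs.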