Live courses (2+ hours) are not included in subscriptions

Contact Us

  • (212) 226-4149
  • hello@nobledesktop.com

Data Science Student Capstone Project Examples

  • Python Machine Learning
  • Python for AI
  • Python Data Visualization
  • LeBron’s Scoring Prediction
  • Diabetes Risk Prediction with CDC Data
  • NYC Subway Traffic Classification

Python Machine Learning Capstone

Choose your own structured dataset (e.g., housing prices, customer churn, or loan default) to build a machine learning pipeline from scratch, including data cleaning, feature engineering, model selection, and performance evaluation. Put together a presentation highlighting your process, tools, and insights. 
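As a rough illustration of the pipeline stages described above, the sketch below runs cleaning, preprocessing, model selection, and evaluation with scikit-learn. The churn-style dataset is synthetic, and the column names (`tenure_months`, `monthly_charge`, `plan`, `churned`) and the two candidate models are assumptions chosen for the example, not part of the capstone brief.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic, hypothetical churn data; swap in your own dataset here.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 72, n).astype(float),
    "monthly_charge": rng.normal(70, 20, n),
    "plan": rng.choice(["basic", "plus", "premium"], n),
})
# Made-up target: short tenure and high charges raise the churn score.
score = 0.03 * df["monthly_charge"] - 0.05 * df["tenure_months"] + rng.normal(0, 1, n)
df["churned"] = (score > score.median()).astype(int)
# Blank out a few values so the cleaning step has something to do.
df.loc[rng.choice(n, 20, replace=False), "monthly_charge"] = np.nan

X, y = df.drop(columns="churned"), df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Cleaning and feature preparation: impute and scale numbers, one-hot encode categories.
numeric = Pipeline([("impute", SimpleImputer(strategy="median")), ("scale", StandardScaler())])
preprocess = ColumnTransformer([
    ("num", numeric, ["tenure_months", "monthly_charge"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
])

# Model selection: compare candidates by 5-fold cross-validated accuracy on the training set.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
cv_scores = {}
for name, model in candidates.items():
    pipe = Pipeline([("preprocess", preprocess), ("model", model)])
    cv_scores[name] = cross_val_score(pipe, X_train, y_train, cv=5).mean()

# Performance evaluation: refit the winner and score it once on the held-out test set.
best_name = max(cv_scores, key=cv_scores.get)
best_pipe = Pipeline([("preprocess", preprocess), ("model", candidates[best_name])])
best_pipe.fit(X_train, y_train)
test_accuracy = best_pipe.score(X_test, y_test)
print(cv_scores, best_name, round(test_accuracy, 3))
```

Keeping preprocessing inside the `Pipeline` means the imputer and scaler are fit only on training folds, which avoids leaking test information into the model.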

Learn more about the Python machine learning capstone project deliverables.

Python for AI Capstone

Capstone Project I: Build an AI chat assistant for a live website that answers users’ questions about the site’s products and services.

Capstone Project II:  Create a web app that allows users to upload images of personal collections—such as vintage books, vinyl records, rare sneakers, collectible cards, or antiques—and uses AI to identify the item, generate descriptive metadata, and log it in a searchable session history.

Learn more about the Python for AI capstone project deliverables.

Python Data Visualization Capstone

Analyze global CO₂ emissions alongside GDP and population data. You’ll clean, explore, and visualize the data, then build an interactive dashboard in Dash. Your final presentation should highlight key patterns, tools used, and insights discovered.
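One possible shape for this project is sketched below: a pandas helper that merges the three tables and derives per-capita columns, plus a small Dash app with a year dropdown. The column names (`country`, `year`, `co2_mt`, `gdp_usd`, `population`), the units, and the sample numbers are assumptions made up for illustration; the capstone's actual data files may be organized differently.

```python
import pandas as pd


def prepare_emissions(co2: pd.DataFrame, gdp: pd.DataFrame, pop: pd.DataFrame) -> pd.DataFrame:
    """Merge the three tables on country/year and add per-capita columns."""
    merged = co2.merge(gdp, on=["country", "year"]).merge(pop, on=["country", "year"])
    merged = merged.dropna(subset=["co2_mt", "gdp_usd", "population"])
    return merged.assign(
        # co2_mt is assumed to be in million tonnes; convert to tonnes per person.
        co2_t_per_capita=merged["co2_mt"] * 1_000_000 / merged["population"],
        gdp_per_capita=merged["gdp_usd"] / merged["population"],
    )


def build_app(df: pd.DataFrame):
    """Return a Dash app with a year dropdown driving a GDP vs. CO2 scatter plot."""
    # Imported here so the data-prep helper works without Dash installed.
    import plotly.express as px
    from dash import Dash, Input, Output, dcc, html

    years = sorted(int(y) for y in df["year"].unique())
    app = Dash(__name__)
    app.layout = html.Div([
        dcc.Dropdown(
            id="year",
            options=[{"label": str(y), "value": y} for y in years],
            value=years[-1],
            clearable=False,
        ),
        dcc.Graph(id="scatter"),
    ])

    @app.callback(Output("scatter", "figure"), Input("year", "value"))
    def update(year):
        subset = df[df["year"] == year]
        return px.scatter(
            subset, x="gdp_per_capita", y="co2_t_per_capita",
            size="population", hover_name="country", log_x=True,
        )

    return app


# Tiny made-up sample to show the expected table shapes.
co2 = pd.DataFrame({"country": ["A", "B"], "year": [2020, 2020], "co2_mt": [10.0, 3.0]})
gdp = pd.DataFrame({"country": ["A", "B"], "year": [2020, 2020], "gdp_usd": [40e9, 6e9]})
pop = pd.DataFrame({"country": ["A", "B"], "year": [2020, 2020], "population": [2_000_000, 1_500_000]})
tidy = prepare_emissions(co2, gdp, pop)
print(tidy)
```

To view the dashboard locally, call `build_app(tidy).run(debug=True)` and open the address Dash prints in the terminal.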

Learn more about the Python data visualization capstone project deliverables.

LeBron’s Scoring: Points‑Per‑Game Prediction

Student Project: Pratik B.

This project aimed to predict how many points LeBron James would score in a game using a combination of his personal stats and the defensive statistics of his opponents. Game data was scraped from the NBA website and Basketball Reference, cleaned, and merged into a single dataset. The student explored multiple regression models from scikit-learn, ultimately finding that the RANSAC regressor performed best by filtering out outliers. The project identified a negative correlation between LeBron’s scoring and opposing team defense metrics like DEFRTG and DREB%.

Student data science project visualizing a regression model predicting LeBron James’ points scored based on personal performance metrics and opposing team defensive statistics

Concepts Covered:

  • Web scraping with Selenium and CSV import
  • Feature engineering (e.g., transforming minutes played)
  • Exploratory analysis and correlation thinking
  • Model selection using regression models from sklearn
  • Train/test split for evaluation
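The write-up does not say how minutes played was transformed. One common case, assumed here purely for illustration, is that box scores list minutes as an `MM:SS` string that must become a number before a regression model can use it. The helper name `minutes_to_float` is hypothetical.

```python
def minutes_to_float(minutes_played: str) -> float:
    """Convert an 'MM:SS' minutes-played string to decimal minutes."""
    minutes, seconds = minutes_played.split(":")
    return int(minutes) + int(seconds) / 60


print(minutes_to_float("36:30"))  # 36.5
```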

Output & Findings:

  • The RANSAC regressor yielded the best results, showing a strong negative correlation between LeBron’s points and the opposing team’s defensive stats.
  • Recognized that combining variables that push the prediction in opposite directions (e.g., minutes vs. defense) may have confused some models.
  • Suggested future improvements such as selecting more logically consistent features.
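To illustrate why a RANSAC regressor can help when a few games are outliers, the sketch below compares it with ordinary least squares using scikit-learn. Everything about the data is an assumption: the features, the coefficients (0.7 and -0.5, chosen arbitrarily), and the injected outliers are synthetic and do not reproduce the student's dataset or results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 200
# Hypothetical features: minutes played and one opponent defensive metric.
minutes = rng.uniform(28, 42, n)
opp_def_metric = rng.uniform(105, 118, n)
points = 0.7 * minutes - 0.5 * (opp_def_metric - 110) + rng.normal(0, 2, n)

# Inject a handful of unusually low-scoring games as outliers.
outlier_idx = rng.choice(n, 15, replace=False)
points[outlier_idx] -= rng.uniform(15, 25, 15)

X = np.column_stack([minutes, opp_def_metric])
X_train, X_test, y_train, y_test = train_test_split(
    X, points, test_size=0.25, random_state=0
)

# Ordinary least squares uses every training row, outliers included.
ols = LinearRegression().fit(X_train, y_train)
# RANSAC repeatedly fits on small random subsets, keeps the fit with the most
# inliers, and refits on those inliers only (LinearRegression is its default model).
ransac = RANSACRegressor(residual_threshold=6.0, random_state=0).fit(X_train, y_train)

n_flagged = int((~ransac.inlier_mask_).sum())
print("OLS coefficients:   ", ols.coef_)
print("RANSAC coefficients:", ransac.estimator_.coef_)
print("Training rows flagged as outliers:", n_flagged)
```

The `residual_threshold` value controls how far a point may sit from the fitted line before it is treated as an outlier; 6.0 is a guess suited to this synthetic noise level and would need tuning on real game data.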

Diabetes Risk Prediction: Insights from CDC Health Data

Student Project: Mariely D.

Using a publicly available CDC survey dataset from Kaggle, this project explored factors associated with diabetes and attempted to predict diabetes status. The analysis was driven by clear hypotheses regarding age, BMI, cholesterol, physical activity, and gender. Visual comparisons between diabetic and non-diabetic individuals highlighted meaningful differences in health outcomes and lifestyle. A logistic regression model achieved 74% accuracy in predicting diabetes, and the project concluded that BMI and other health metrics are strong indicators of diabetic risk.

Student data science project analyzing CDC survey data to predict diabetes risk using health metrics such as BMI, cholesterol, physical activity, and age, with visual comparisons between diabetic and non-diabetic groups

Concepts Covered:

  • Exploratory Data Analysis (EDA) and Hypothesis Framing
  • Correlation analysis between features (e.g., BMI, cholesterol) and diabetes status
  • Visualizations: bar charts comparing diabetics vs. non-diabetics
  • Predictive modeling using logistic regression
  • Confusion matrix and accuracy assessment

Output & Findings:

  • Key correlates of diabetes included high blood pressure, BMI, poor general health, and difficulty walking.
  • The logistic regression model achieved 74% accuracy.
  • Provided strong visual and narrative support for health-related insights, reinforcing the role of lifestyle and physical health in diabetes risk.
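The modeling and evaluation steps listed above can be sketched with scikit-learn as follows. The data is synthetic: the three features (`bmi`, `age`, `high_bp`) and the formula used to generate the labels are assumptions for illustration, so the accuracy printed here has no relation to the 74% reported for the CDC dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n = 1000
# Hypothetical health features.
bmi = rng.normal(28, 5, n)
age = rng.uniform(18, 80, n)
high_bp = rng.integers(0, 2, n)
# Made-up risk formula used only to generate labels for this sketch.
logit = 0.12 * (bmi - 28) + 0.03 * (age - 50) + 0.9 * high_bp - 0.5
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
X = np.column_stack([bmi, age, high_bp])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()
accuracy = accuracy_score(y_test, y_pred)
print(cm)
print(f"accuracy = {accuracy:.2f}")
```

Accuracy is the share of correct predictions, `(tn + tp) / cm.sum()`; the confusion matrix additionally shows whether the errors are mostly missed cases (`fn`) or false alarms (`fp`).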

NYC Subway Traffic Patterns: Weekday vs. Weekend Classification

Student Project: Amit J. & Valentina P.

This project analyzed subway turnstile data from 10 stations across New York City to examine commuter flow patterns throughout the week. The primary objective was to distinguish weekday vs. weekend traffic and understand how passenger volume varied by station and time. A regression-based classification model was applied and reportedly achieved 83% accuracy in identifying weekday patterns. The study aimed to support strategic planning for workforce allocation in the MTA system, particularly around rush hour and location-based trends.

Student data analysis project examining New York City subway turnstile data to compare weekday and weekend commuter patterns across multiple stations and time periods

Concepts Covered:

  • Real-world time series data analysis
  • Classification based on temporal and geographic features
  • Regional sampling (Uptown, Midtown, Downtown)
  • Application of a regression-based model for weekday/weekend classification
  • Interpretation of accuracy and use of confusion matrix

Output & Findings:

  • The model (type unspecified) reportedly achieved 83% accuracy in distinguishing weekdays from weekends.
  • Noted significant differences in commuter flow between weekdays and weekends.
  • Proposed that this analysis could inform MTA scheduling and workforce planning, especially around rush hour.
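Since the write-up does not specify the model, the sketch below only illustrates the surrounding mechanics: deriving a weekday/weekend label from each date and scoring a classifier with a confusion matrix and accuracy. The daily entry counts are made-up numbers for one hypothetical station, and the simple threshold rule is a stand-in baseline, not the students' model.

```python
from collections import Counter
from datetime import date, timedelta

# Made-up daily turnstile entries for two weeks, starting on a Monday.
start = date(2024, 3, 4)
entries = [41200, 43800, 44100, 43000, 39500, 22400, 18900,
           40800, 44500, 43900, 42700, 27000, 23100, 19400]
records = [(start + timedelta(days=i), count) for i, count in enumerate(entries)]


def is_weekend(day: date) -> bool:
    """True label: Saturday (5) or Sunday (6)."""
    return day.weekday() >= 5


def predict_weekend(count: int, threshold: int = 30000) -> bool:
    """Baseline rule (an assumption): low ridership suggests a weekend."""
    return count < threshold


# Confusion matrix as counts keyed by (actual_weekend, predicted_weekend).
confusion = Counter((is_weekend(day), predict_weekend(count)) for day, count in records)
correct = confusion[(True, True)] + confusion[(False, False)]
accuracy = correct / len(records)

print(dict(confusion))
print(f"accuracy = {accuracy:.3f}")  # 13 of 14 days correct: 0.929
```

The one error in this sample is a quiet Friday predicted as a weekend, the kind of case a confusion matrix makes visible and a single accuracy figure hides.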
Creative Live

Organization

CreativeLive hosts live, instructor-led courses through trusted partners. Courses are taught in real time and span a range of creative subjects, including photography, design, video, marketing, and data. Each class provides an interactive online learning experience led by industry professionals, with opportunities for live engagement.

Contact Us

Office Hours:
9am–6pm EST, Mon–Fri

© Copyright 2026 CreativeLive

Policies