Production-quality prompts for data scientists and analysts. Each prompt includes your schema, constraints, and expected output format — the difference between code that runs and code that needs three rounds of fixes.
Works with Claude, ChatGPT, Gemini, or any LLM. Copy, fill in the brackets, paste.
Customize for your specific dataset, libraries, or workflow.
The best data science prompts share four traits: (1) They specify your data schema upfront (column names, types, and sample rows) so the AI has what it needs to write correct code on the first pass. (2) They ask for brief explanations alongside code, for example as inline comments, so you understand and can maintain what gets generated. (3) They constrain the output format: when you only need code, "return only a Python function, no prose" avoids lengthy explanations you have to strip. (4) They reference real libraries (pandas, scikit-learn, seaborn) rather than abstract concepts, keeping suggestions grounded in runnable code. Vague prompts like "analyze my data" produce generic answers; schema-specific prompts produce code that is much closer to copy-paste-ready.
For Python data analysis, always include: (1) Your DataFrame schema: paste df.dtypes and df.head(3) output, or describe columns explicitly. (2) Your exact goal: "calculate 30-day rolling average of column X grouped by column Y" is far more useful than "analyze the trend". (3) Constraints: "use only pandas and numpy, no additional libraries" prevents dependency creep. (4) Expected output format: "return a DataFrame with columns [a, b, c]" vs "print to console". With this context, the model can write working pandas/numpy/sklearn code instead of pseudocode.
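As a rough illustration of the four items above, the sketch below assembles them into a single prompt string. The helper name build_analysis_prompt, the sales DataFrame, and its column names are all hypothetical and exist only for this example; adapt the wording to your own workflow.

```python
import pandas as pd


def build_analysis_prompt(df, goal, constraints, output_format, n_rows=3):
    """Assemble schema, goal, constraints, and output format into one prompt.

    Hypothetical helper for illustration; not part of any library.
    """
    schema = df.dtypes.to_string()        # column names and dtypes
    sample = df.head(n_rows).to_string()  # a few sample rows
    return (
        "DataFrame schema (df.dtypes):\n"
        f"{schema}\n\n"
        f"Sample rows (df.head({n_rows})):\n"
        f"{sample}\n\n"
        f"Goal: {goal}\n"
        f"Constraints: {constraints}\n"
        f"Expected output: {output_format}\n"
    )


# Made-up example data; replace with your own DataFrame.
sales = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
        "region": ["north", "south", "north"],
        "revenue": [120.0, 95.5, 130.25],
    }
)

prompt = build_analysis_prompt(
    sales,
    goal="calculate 30-day rolling average of revenue grouped by region",
    constraints="use only pandas and numpy, no additional libraries",
    output_format="return a DataFrame with columns [date, region, revenue_30d_avg]",
)
print(prompt)
```

Paste the printed text into the model of your choice. If the sample rows contain sensitive values, describe the columns by hand instead of pasting df.head(3).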
AI can interpret statistical output and model results well when you provide the raw numbers. Paste your model metrics, confusion matrix, or statistical test output and ask specific questions: "This logistic regression has AUC=0.72 and precision=0.61 on a 90/10 class split. Is this good? What are the top 3 ways to improve it?" or "My residual plot shows a fan shape. What does this indicate and how do I fix it?" The key is giving the AI concrete output to reason about rather than asking abstract questions about techniques.
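One way to prepare concrete numbers for such a question is to derive the headline metrics and the class balance from a confusion matrix before pasting them. The sketch below uses plain Python; the function name and the counts are invented for illustration, chosen to match a 90/10 class split with precision 0.61.

```python
def summarize_confusion_matrix(tp, fp, fn, tn):
    """Compute basic metrics plus the class balance needed to judge them."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    prevalence = (tp + fn) / total  # share of actual positives
    accuracy = (tp + tn) / total
    # Accuracy obtained by always predicting the majority class
    majority_baseline = max(prevalence, 1 - prevalence)
    return {
        "precision": precision,
        "recall": recall,
        "prevalence": prevalence,
        "accuracy": accuracy,
        "majority_baseline": majority_baseline,
    }


# Made-up counts: 1,000 rows, 100 positives (a 90/10 split).
summary = summarize_confusion_matrix(tp=61, fp=39, fn=39, tn=861)

lines = [f"{name}: {value:.3f}" for name, value in summary.items()]
metrics_text = "\n".join(lines)
print(metrics_text)
```

Including the prevalence and the majority-class baseline gives the model the context it needs: a precision of 0.61 means something different when only 10% of rows are positive than it would on a balanced dataset.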
The most effective EDA prompts: (1) Schema-first audit: "Here is my DataFrame schema: [paste dtypes]. What data quality issues should I check for before analysis? Give me the pandas code to check each one." (2) Targeted visualization: "I have columns X (continuous) and Y (categorical) with Z rows. Write seaborn code for the 3 most informative plots for understanding their relationship." (3) Hypothesis generation: "Here are summary statistics for my dataset: [paste]. List 5 hypotheses worth testing, ordered by likely business impact." EDA prompts that include actual data stats give the model something concrete to reason about, so they tend to produce more specific, usable answers than vague requests.
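For reference, the kind of code a schema-first audit prompt might return looks roughly like the sketch below. It is a minimal example, not an exhaustive audit; the audit_dataframe helper and the sample columns are hypothetical.

```python
import pandas as pd


def audit_dataframe(df):
    """Return one row per column with basic data quality indicators."""
    report = pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
            "n_unique": df.nunique(dropna=True),
        }
    )
    # Columns with a single distinct value carry no information
    report["constant"] = report["n_unique"] <= 1
    return report


# Made-up example data with a duplicate row, missing values,
# and a constant column.
df = pd.DataFrame(
    {
        "customer_id": [1, 2, 2, 4],
        "plan": ["basic", "pro", "pro", None],
        "monthly_spend": [20.0, 55.0, 55.0, float("nan")],
        "country": ["US", "US", "US", "US"],
    }
)

report = audit_dataframe(df)
n_duplicates = int(df.duplicated().sum())  # fully identical rows

print(report)
print("Duplicate rows:", n_duplicates)
```

The printed report also works as input for the next prompt: paste it and ask which issues matter most for your analysis goal.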
For SQL improvement, always paste: (1) Your current query, (2) The table schemas (CREATE TABLE statements or column names + types), (3) What the query is supposed to return, and (4) What is wrong or slow about it. Then ask specifically: "Rewrite this query to avoid the correlated subquery in the WHERE clause" or "This query scans 50M rows — what indexes or query rewrites would reduce that?" Generic "make my SQL better" prompts miss the context needed for meaningful improvements. Include EXPLAIN output if you have it.
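The four items above can be gathered programmatically. The sketch below assumes SQLite, accessed through Python's built-in sqlite3 module, purely so the example is self-contained; the tables, the query, and the problem description are invented. Other databases expose schemas and plans differently (for example through their own EXPLAIN syntax), so treat this as a pattern rather than a recipe.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        total REAL,
        created_at TEXT
    );
    """
)

query = """
SELECT c.name, SUM(o.total) AS revenue
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
WHERE c.country = 'DE'
GROUP BY c.name
"""

# (2) Table schemas: SQLite stores the CREATE TABLE text in sqlite_master.
schemas = [
    row[0]
    for row in conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# EXPLAIN output: the last column of each plan row is the readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

prompt = "\n\n".join(
    [
        "Current query:\n" + query.strip(),
        "Table schemas:\n" + "\n".join(schemas),
        "Expected result: total revenue per customer name for German customers.",
        "Problem: the query is slow on large tables.",
        "EXPLAIN QUERY PLAN output:\n" + "\n".join(plan),
        "Question: what indexes or query rewrites would reduce the rows scanned?",
    ]
)
print(prompt)
conn.close()
```

The exact plan text varies by SQLite version and by which indexes exist, which is the reason to paste the real output rather than describe it from memory.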