THE DATA SCIENCE INTERVIEW BOOK

Log

The journey of the book so far

Last updated 2 months ago

Month of Feb, 2025

  • MLE calculation error updated in the Probability Distribution page

Month of Oct, 2024

  • Pyspark page added

Month of Sep, 2024

  • Pyspark cheat sheet added

Month of Feb, 2024

  • Clustering page updated

  • Questions added in the Clustering page

Older Updates

Month of Oct, 2023

  • Problems page added in Power BI

  • Cheatsheets pushed to the end of the contents

  • Index added in SQL

  • Temp datasets updated with Table Variable and Temp tables

  • Performance Tuning added in SQL

  • Study reference materials added for Python programming questions

  • New Problems added in Python and SQL

Month of Sep, 2023

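  • Windows Function updated.

  • A/B test page (WIP) added.

  • SVR and OLS added in Regression page.

  • Algorithm overview page updated with use cases.

  • Optimizers and Optimization Criterion updated in Model Building Overview.

  • Algorithms from scratch page added.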

  • Classification metrics and Naive Bayes added in classification.

  • Many new SQL and Python problems added.

  • Confidence Interval added in Central Limit Theorem.

  • Coding added in Python.

  • Common hypothesis tests added.

  • Neural Network section updated

Month of August, 2023

  • Vector Database added in the LLM section.

  • Categorical Encoding added in the data section.

  • Probability Distribution cheat sheet added.

  • Algorithm overview page updated.

  • Bias Variance tradeoff updated.

  • Linear Regression page updated.

  • LLM Section updated to Generative AI.

  • Clustering (WIP) section added.

  • QUALIFY added in Windows functions page.

  • We are now on Instagram, TikTok and YouTube. Please do follow; we will start uploading content soon.

  • Python in Excel Page added.

  • Many Python Questions added.

  • Transformer page added.

Month of May, 2023

  • Enabled GitBook's Lens feature in search, which allows users to ask a question and get answers from the content of the book itself. This feature is experimental, is supported by OpenAI, and can be changed or removed at any moment.

  • Work on Power BI section started under the Business Intelligence section.

  • Dark mode and Light mode toggle enabled.

Month of January, 2023

  • R Basics cheat sheet added

  • Python Theoretical Question section updated

  • Mathematical Motivation Page added

Month of November, 2022

  • Group vs Window added

  • Git added in the new ML Ops section

  • Platform migration for the book

  • Cheat Sheet section added

  • A ⚠️ sign beside a page indicates that work is pending on it

  • Added questions to Bias/Variance

  • Python Theoretical section added --> TBA in BOOK

Month of October, 2022

  • More questions added to the Time Series Section

  • Bias/Variance Tradeoff added

  • Ensemble learning section updated in Decision Tree

  • MAP vs MLE added in Probability Basics

  • Basic Overview page added in the Algorithm section

Month of September, 2022

  • As per suggestions by users, a PDF of the book has been made available as a paid extra. It can be purchased from here

  • Big O notation section added

  • Anomaly detection and Time Series sections extensively updated

  • Probability: solutions updated for the [FACEBOOK] N Dice, [SPOTIFY] MLE of Uniform Distribution, and Bernoulli trial generator problems

  • Business Scenarios section updated

Month of August, 2022

  • Behavioral - Management section added

  • New interview questions added

Month of July, 2022

  • Data sampling section added under data

Month of June, 2022

  • We are back after a break; keep checking for new content

  • Machine Learning Framework section added and TensorFlow moved into it

  • PyCaret added to Machine Learning Framework section

Month of March, 2022

  • Hyperparameter optimization section completed

  • Had an extremely busy last few weeks and the next few months are going to be packed too

  • Story Telling section added

  • Quick guide to Visualization added

Month of February, 2022

  • Added problems in Python, SQL, Probability

  • Excel section updated

  • Data section has been moved into a new and broader section called Model Building

  • To keep the table of contents clean, collapsible headers are used in the Model Building section

  • Hyperparameter optimization section added

Month of January, 2022

  • Neural Network section added

  • Added new problems in the Probability section

  • Added cartoons in a few sections

  • Outlier section added

Month of December, 2021

  • NLP section updated

  • Got our first bug reported by a reader 😍

Month of November, 2021

  • NLP section updated

  • Missing values section added

  • Formatting changes in the Statistics section

  • Took a short break; was obsessively working on this 😌

  • New sections added: Tree based approaches and Industry Application

  • Decided to make this page a little more interesting

  • Launched our LinkedIn page. Do follow us on LinkedIn; we have some interesting plans for it in the near future

  • Added support for dark theme, but 🤯 had to remove it as it was breaking a lot of other stuff. Will wait for official support

  • Added new problems in Probability, Python, Regression, SQL

  • Added Temporary Datasets and Time page in SQL covering CTEs

  • Regression section extensively updated

Month of October, 2021

  • Major updates to the SQL section

  • TensorFlow, Excel, Data Sections added

  • Added new problems in Probability, Python, SQL, Business Case

  • Cleaned up the formatting issues

  • Added this change log section

  • Added Generative vs Discriminative Models section

  • Completed Hypothesis Testing
