THE DATA SCIENCE INTERVIEW BOOK

Feature Store

Many machine learning and AI models work best on summaries of raw data called features. These features structure information into a form that makes it easier to train algorithms.

A simple feature might involve transforming a raw date into a weekday-or-weekend indicator, which might be a better predictor of behavior than the raw date itself. Other kinds of features can be more complex and require intricate calculations across many data streams. A feature store provides a place to organize the most commonly used features so they can be reused across projects rather than rebuilt from scratch each time they're needed.

A feature store can increase automation, improve productivity by promoting sharing and reuse, reduce technical debt in software code, ensure consistency in calculations and provide governance, auditability and lineage for regulatory compliance, according to David Sweenor, senior director of product marketing at data science tools company Alteryx. However, a feature store isn't ideal for every company. Smaller ones may struggle with the overhead required to create and maintain a feature store. Companies may also struggle with reusing features across departments.

What are the benefits of a feature store?

A feature, as it relates to data science, is any variable that can be used for analytics. Simple examples include name, age, sex, zip code and amount. These raw variables are transformed through a process known as feature engineering to yield better predictions. For example, a date could be transformed into a day of the week, a day of the year or a holiday flag.
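As a minimal sketch of the date example, the function below derives a few calendar features using only the Python standard library. The function name `date_features`, the feature names and the tiny `HOLIDAYS` set are assumptions made for illustration; a real holiday calendar would come from a maintained source.

```python
from datetime import date

# Hypothetical holiday calendar, for illustration only.
HOLIDAYS = {date(2024, 1, 1), date(2024, 12, 25)}


def date_features(d: date) -> dict:
    """Derive simple calendar features from a raw date."""
    return {
        "day_of_week": d.weekday(),            # Monday = 0 ... Sunday = 6
        "is_weekend": d.weekday() >= 5,        # Saturday or Sunday
        "day_of_year": d.timetuple().tm_yday,  # 1 ... 365 (366 in leap years)
        "is_holiday": d in HOLIDAYS,
    }


christmas = date_features(date(2024, 12, 25))
saturday = date_features(date(2024, 1, 6))
```

A model can then consume `christmas["is_holiday"]` or `saturday["is_weekend"]` instead of the raw date value.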

A feature store enables a data scientist to define a transformation once rather than having each data scientist recreate the same features repeatedly. This ensures consistency, since every model uses exactly the same transformation. It also reduces the need to duplicate the same logic across codebases. If a company decides to change a complex feature, a feature store lets it make the change once and propagate it to every model that uses the feature. Otherwise, someone would have to manually edit each of those models.
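The "define once, change once" idea can be sketched with a toy in-memory registry. This is not the API of any particular feature store product; `FeatureRegistry`, `register`, `compute` and the two model input functions are hypothetical names chosen for this example.

```python
from datetime import date
from typing import Any, Callable, Dict


class FeatureRegistry:
    """Toy registry that maps a feature name to one shared transformation."""

    def __init__(self) -> None:
        self._features: Dict[str, Callable[[Any], Any]] = {}

    def register(self, name: str, fn: Callable[[Any], Any]) -> None:
        # Registering an existing name replaces its definition.
        self._features[name] = fn

    def compute(self, name: str, raw: Any) -> Any:
        return self._features[name](raw)


registry = FeatureRegistry()
registry.register("is_weekend", lambda d: d.weekday() >= 5)


# Two hypothetical models that look the feature up instead of re-implementing it.
def churn_model_inputs(d: date) -> dict:
    return {"is_weekend": registry.compute("is_weekend", d)}


def demand_model_inputs(d: date) -> dict:
    return {"is_weekend": registry.compute("is_weekend", d)}


friday = date(2024, 1, 5)
before = (churn_model_inputs(friday), demand_model_inputs(friday))

# Change the definition once (here: count Friday as part of the weekend).
registry.register("is_weekend", lambda d: d.weekday() >= 4)
after = (churn_model_inputs(friday), demand_model_inputs(friday))
```

Both models pick up the new definition without any edit to their own code, which is the consistency benefit described above.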

Because processing the raw data is very expensive and the data change slowly, it makes sense to compute the features once every hour or day and store them in a feature store, where hundreds of teams can use them in machine learning (ML) models to solve their business problems.
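One way to picture periodic computation is a cached value that is recomputed only after a refresh interval has elapsed. The sketch below assumes a single process and an in-memory cache; `MaterializedFeature`, the order values and the injected fake clock are illustrative, and a production system would more likely use a scheduler and a shared store.

```python
import time
from typing import Any, Callable, Optional


class MaterializedFeature:
    """Caches an expensive feature computation and refreshes it on an interval."""

    def __init__(self, compute: Callable[[], Any], refresh_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._compute = compute
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._value: Any = None
        self._computed_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        stale = (self._computed_at is None
                 or now - self._computed_at >= self._refresh_seconds)
        if stale:
            # Recompute only when there is no value yet or the interval has passed.
            self._value = self._compute()
            self._computed_at = now
        return self._value


calls = {"count": 0}


def average_order_value() -> float:
    """Stand-in for an expensive aggregation over raw data (made-up numbers)."""
    calls["count"] += 1
    orders = [20.0, 35.0, 50.0]
    return sum(orders) / len(orders)


fake_now = [0.0]  # controllable clock so the example does not need to sleep
feature = MaterializedFeature(average_order_value, refresh_seconds=3600,
                              clock=lambda: fake_now[0])

first = feature.get()    # computed on first read
second = feature.get()   # served from the cache
calls_within_hour = calls["count"]

fake_now[0] = 3600.0     # one hour later
third = feature.get()    # interval elapsed, so the value is recomputed
calls_after_hour = calls["count"]
```

Every consumer that reads within the hour gets the stored value, so the expensive computation runs once per interval rather than once per request.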


Last updated 1 year ago
