
Machine Learning Design Patterns



Machine Learning Design Patterns - Description
The design patterns in this book capture best practices and solutions to recurring problems in machine learning. The authors, three Google engineers, catalog proven methods to help data scientists tackle common problems throughout the ML process. These design patterns codify the experience of hundreds of experts into straightforward, approachable advice.

In this book, you will find detailed explanations of 30 patterns for data and problem representation, operationalization, repeatability, reproducibility, flexibility, explainability, and fairness. Each pattern includes a description of the problem, a variety of potential solutions, and recommendations for choosing the best technique for your situation.

You'll learn how to:
- Identify and mitigate common challenges when training, evaluating, and deploying ML models
- Represent data for different ML model types, including embeddings, feature crosses, and more
- Choose the right model type for specific problems
- Build a robust training loop that uses checkpoints, distribution strategy, and hyperparameter tuning
- Deploy scalable ML systems that you can retrain and update to reflect new data
- Interpret model predictions for stakeholders and ensure models are treating users fairly

(A short illustrative code sketch of two of these techniques appears after the table of contents below.)

Table of Contents:
Preface
Who Is This Book For?
What's Not in the Book
Code Samples
Conventions Used in This Book
O'Reilly Online Learning
How to Contact Us
Acknowledgments
1. The Need for Machine Learning Design Patterns
What Are Design Patterns?
How to Use This Book
Machine Learning Terminology
Models and Frameworks
Data and Feature Engineering
The Machine Learning Process
Data and Model Tooling
Roles
Common Challenges in Machine Learning
Data Quality
Reproducibility
Data Drift
Scale
Multiple Objectives
Summary
2. Data Representation Design Patterns
Simple Data Representations
Numerical Inputs
Why scaling is desirable
Linear scaling
Nonlinear transformations
Array of numbers
Categorical Inputs
One-hot encoding
Array of categorical variables
Design Pattern 1: Hashed Feature
Problem
Solution
Why It Works
Out-of-vocabulary input
High cardinality
Cold start
Trade-Offs and Alternatives
Bucket collision
Skew
Aggregate feature
Hyperparameter tuning
Cryptographic hash
Order of operations
Empty hash buckets
Design Pattern 2: Embeddings
Problem
Solution
Text embeddings
Image embeddings
Why It Works
Trade-Offs and Alternatives
Choosing the embedding dimension
Autoencoders
Context language models
Embeddings in a data warehouse
Design Pattern 3: Feature Cross
Problem
Solution
Feature cross in BigQuery ML
Feature crosses in TensorFlow
Why It Works
Trade-Offs and Alternatives
Handling numerical features
Handling high cardinality
Need for regularization
Design Pattern 4: Multimodal Input
Problem
Solution
Trade-Offs and Alternatives
Tabular data multiple ways
Multimodal representation of text
Text data multiple ways
Extracting tabular features from text
Multimodal representation of images
Images as pixel values
Images as tiled structures
Combining different image representations
Using images with metadata
Multimodal feature representations and model interpretability
Summary
3. Problem Representation Design Patterns
Design Pattern 5: Reframing
Problem
Solution
Why It Works
Capturing uncertainty
Changing the objective
Trade-Offs and Alternatives
Bucketized outputs
Other ways of capturing uncertainty
Precision of predictions
Restricting the prediction range
Label bias
Multitask learning
Design Pattern 6: Multilabel
Problem
Solution
Trade-Offs and Alternatives
Sigmoid output for models with two classes
Which loss function should we use?
Parsing sigmoid results
Dataset considerations
Inputs with overlapping labels
One versus rest
Design Pattern 7: Ensembles
Problem
Solution
Bagging
Boosting
Stacking
Why It Works
Bagging
Boosting
Stacking
Trade-Offs and Alternatives
Increased training and design time
Dropout as bagging
Decreased model interpretability
Choosing the right tool for the problem
Other ensemble methods
Design Pattern 8: Cascade
Problem
Solution
Trade-Offs and Alternatives
Deterministic inputs
Single model
Internal consistency
Pre-trained models
Reframing instead of Cascade
Regression in rare situations
Design Pattern 9: Neutral Class
Problem
Solution
Why It Works
Synthetic data
In the real world
Trade-Offs and Alternatives
When human experts disagree
Customer satisfaction
As a way to improve embeddings
Reframing with neutral class
Design Pattern 10: Rebalancing
Problem
Solution
Choosing an evaluation metric
Downsampling
Weighted classes
Upsampling
Trade-Offs and Alternatives
Reframing and Cascade
Anomaly detection
Number of minority class examples available
Combining different techniques
Choosing a model architecture
Importance of explainability
Summary
4. Model Training Patterns
Typical Training Loop
Stochastic Gradient Descent
Keras Training Loop
Training Design Patterns
Design Pattern 11: Useful Overfitting
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Interpolation and chaos theory
Monte Carlo methods
Data-driven discretizations
Unbounded domains
Distilling knowledge of neural network
Overfitting a batch
Design Pattern 12: Checkpoints
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Early stopping
Checkpoint selection
Regularization
Two splits
Fine-tuning
Redefining an epoch
Steps per epoch
Retraining with more data
Virtual epochs
Design Pattern 13: Transfer Learning
Problem
Solution
Bottleneck layer
Implementing transfer learning
Pre-trained embeddings
Why It Works
Trade-Offs and Alternatives
Fine-tuning versus feature extraction
Focus on image and text models
Embeddings of words versus sentences
Design Pattern 14: Distribution Strategy
Problem
Solution
Synchronous training
Asynchronous training
Why It Works
Trade-Offs and Alternatives
Model parallelism
ASICs for better performance at lower cost
Choosing a batch size
Minimizing I/O waits
Design Pattern 15: Hyperparameter Tuning
Problem
Manual tuning
Grid search and combinatorial explosion
Solution
Why It Works
Nonlinear optimization
Bayesian optimization
Trade-Offs and Alternatives
Fully managed hyperparameter tuning
Genetic algorithms
Summary
5. Design Patterns for Resilient Serving
Design Pattern 16: Stateless Serving Function
Problem
Solution
Model export
Inference in Python
Create web endpoint
Why It Works
Autoscaling
Fully managed
Language-neutral
Powerful ecosystem
Trade-Offs and Alternatives
Custom serving function
Multiple signatures
Online prediction
Prediction library
Design Pattern 17: Batch Serving
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Batch and stream pipelines
Cached results of batch serving
Lambda architecture
Design Pattern 18: Continued Model Evaluation
Problem
Solution
Concept
Deploying the model
Saving predictions
Capturing ground truth
Evaluating model performance
Continuous evaluation
Why It Works
Trade-Offs and Alternatives
Triggers for retraining
Scheduled retraining
Data validation with TFX
Estimating retraining interval
Design Pattern 19: Two-Phase Predictions
Problem
Solution
Phase 1: Building the offline model
Phase 2: Building the cloud model
Trade-Offs and Alternatives
Standalone single-phase model
Offline support for specific use cases
Handling many predictions in near real time
Continuous evaluation for offline models
Design Pattern 20: Keyed Predictions
Problem
Solution
How to pass through keys in Keras
Adding keyed prediction capability to an existing model
Trade-Offs and Alternatives
Asynchronous serving
Continuous evaluation
Summary
6. Reproducibility Design Patterns
Design Pattern 21: Transform
Problem
Solution
Trade-Offs and Alternatives
Transformations in TensorFlow and Keras
Efficient transformations with tf.transform
Text and image transformations
Alternate pattern approaches
Design Pattern 22: Repeatable Splitting
Problem
Solution
Trade-Offs and Alternatives
Single query
Random split
Split on multiple columns
Repeatable sampling
Sequential split
Stratified split
Unstructured data
Design Pattern 23: Bridged Schema
Problem
Solution
Bridged schema
Probabilistic method
Static method
Augmented data
Trade-Offs and Alternatives
Union schema
Cascade method
Handling new features
Handling precision increases
Design Pattern 24: Windowed Inference
Problem
Solution
Trade-Offs and Alternatives
Reduce computational overhead
Per element versus over a time interval
High-throughput data streams
Streaming SQL
Sequence models
Stateful features
Batching prediction requests
Design Pattern 25: Workflow Pipeline
Problem
Solution
Building the TFX pipeline
Running the pipeline on Cloud AI Platform
Why It Works
Trade-Offs and Alternatives
Creating custom components
Integrating CI/CD with pipelines
Apache Airflow and Kubeflow Pipelines
Development versus production pipelines
Lineage tracking in ML pipelines
Design Pattern 26: Feature Store
Problem
Solution
Feast
Adding feature data to Feast
Creating a FeatureSet
Adding entities and features to the FeatureSet
Registering the FeatureSet
Ingesting feature data into the FeatureSet
Retrieving data from Feast
Batch serving
Online serving
Why It Works
Trade-Offs and Alternatives
Alternative implementations
Transform design pattern
Design Pattern 27: Model Versioning
Problem
Solution
Types of model users
Model versioning with a managed service
Trade-Offs and Alternatives
Other serverless versioning tools
TensorFlow Serving
Multiple serving functions
New models versus new model versions
Summary
7. Responsible AI
Design Pattern 28: Heuristic Benchmark
Problem
Solution
Trade-Offs and Alternatives
Development check
Human experts
Utility value
Design Pattern 29: Explainable Predictions
Problem
Solution
Model baseline
SHAP
Explanations from deployed models
Trade-Offs and Alternatives
Data selection bias
Counterfactual analysis and example-based explanations
Limitations of explanations
Design Pattern 30: Fairness Lens
Problem
Solution
Before training
After training
Trade-Offs and Alternatives
Fairness Indicators
Automating data evaluation
Allow and disallow lists
Data augmentation
Model Cards
Fairness versus explainability
Summary
8. Connected Patterns
Patterns Reference
Pattern Interactions
Patterns Within ML Projects
ML Life Cycle
Discovery
Development
Deployment
AI Readiness
Tactical phase: Manual development
Strategic phase: Utilizing pipelines
Transformational phase: Fully automated processes
Common Patterns by Use Case and Data Type
Natural Language Understanding
Computer Vision
Predictive Analytics
Recommendation Systems
Fraud and Anomaly Detection
Index
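As a quick taste of the material, here is a minimal sketch of two techniques highlighted in the description above: an embedding layer for a categorical input and checkpointing during training. This is illustrative only and is not code from the book; the vocabulary size, embedding dimension, toy data, and checkpoint path are assumed values chosen for the example.

```python
# Illustrative sketch (not from the book): an Embedding layer for a
# categorical input plus per-epoch checkpoints in TensorFlow/Keras.
# VOCAB_SIZE, EMBED_DIM, the toy data, and the checkpoint path are
# assumptions made for this example.
import os
import numpy as np
import tensorflow as tf

VOCAB_SIZE = 1000  # assumed vocabulary size of the categorical feature
EMBED_DIM = 8      # assumed embedding dimension

# Toy data: one integer-encoded categorical feature and a binary label.
x = np.random.randint(0, VOCAB_SIZE, size=(256, 1))
y = np.random.randint(0, 2, size=(256, 1))

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Checkpoints: save weights after every epoch so training can be resumed.
os.makedirs("checkpoints", exist_ok=True)
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath="checkpoints/epoch-{epoch:02d}.weights.h5",
    save_weights_only=True,
)
model.fit(x, y, epochs=3, batch_size=32, callbacks=[checkpoint_cb])
```

The book treats these ideas in far more depth (Design Pattern 2: Embeddings and Design Pattern 12: Checkpoints); the sketch only shows the basic shape of the APIs involved.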