

Elasticsearch: The Definitive Guide (e-book)



Elasticsearch: The Definitive Guide (e-book) - Najlepsze oferty
Elasticsearch: The Definitive Guide (e-book) - Opis
Whether you need full-text search or real-time analytics of structured data-or both-the Elasticsearch distributed search engine is an ideal way to put your data to work. This practical guide not only shows you how to search, analyze, and explore data with Elasticsearch, but also helps you deal with the complexities of human language, geolocation, and relationships.If you...re a newcomer to both search and distributed systems, you...ll quickly learn how to integrate Elasticsearch into your application. More experienced users will pick up lots of advanced techniques. Throughout the book, you...ll follow a problem-based approach to learn why, when, and how to use Elasticsearch features.Understand how Elasticsearch interprets data in your documentsIndex and query your data to take advantage of search concepts such as relevance and word proximityHandle human language through the effective use of analyzers and queriesSummarize and group data to show overall trends, with aggregations and analyticsUse geo-points and geo-shapes-Elasticsearch...s approaches to geolocationModel your data to take advantage of Elasticsearch...s horizontal scalabilityLearn how to configure and monitor your cluster in production Spis treści:Foreword
Preface
Who Should Read This Book
Why We Wrote This Book
Elasticsearch Version
How to Read This Book
Navigating This Book
Online Resources
Conventions Used in This Book
Using Code Examples
Safari Books Online
How to Contact Us
Acknowledgments
I. Getting Started
1. You Know, for (...) więcej Search
Installing Elasticsearch
Installing Marvel
Running Elasticsearch
Viewing Marvel and Sense
Talking to Elasticsearch
Java API
RESTful API with JSON over HTTP
Document Oriented
JSON
Finding Your Feet
Lets Build an Employee Directory
Indexing Employee Documents
Retrieving a Document
Search Lite
Search with Query DSL
More-Complicated Searches
Full-Text Search
Phrase Search
Highlighting Our Searches
Analytics
Tutorial Conclusion
Distributed Nature
Next Steps
2. Life Inside a Cluster
An Empty Cluster
Cluster Health
Add an Index
Add Failover
Scale Horizontally
Then Scale Some More
Coping with Failure
3. Data In, Data Out
What Is a Document?
Document Metadata
_index
_type
_id
Other Metadata
Indexing a Document
Using Our Own ID
Autogenerating IDs
Retrieving a Document
Retrieving Part of a Document
Checking Whether a Document Exists
Updating a Whole Document
Creating a New Document
Deleting a Document
Dealing with Conflicts
Optimistic Concurrency Control
Using Versions from an External System
Partial Updates to Documents
Using Scripts to Make Partial Updates
Updating a Document That May Not Yet Exist
Updates and Conflicts
Retrieving Multiple Documents
Cheaper in Bulk
Dont Repeat Yourself
How Big Is Too Big?
4. Distributed Document Store
Routing a Document to a Shard
How Primary and Replica Shards Interact
Creating, Indexing, and Deleting a Document
Retrieving a Document
Partial Updates to a Document
Multidocument Patterns
Why the Funny Format?
5. SearchingThe Basic Tools
The Empty Search
hits
took
shards
timeout
Multi-index, Multitype
Pagination
Search Lite
The _all Field
More Complicated Queries
6. Mapping and Analysis
Exact Values Versus Full Text
Inverted Index
Analysis and Analyzers
Built-in Analyzers
When Analyzers Are Used
Testing Analyzers
Specifying Analyzers
Mapping
Core Simple Field Types
Viewing the Mapping
Customizing Field Mappings
index
analyzer
Updating a Mapping
Testing the Mapping
Complex Core Field Types
Multivalue Fields
Empty Fields
Multilevel Objects
Mapping for Inner Objects
How Inner Objects are Indexed
Arrays of Inner Objects
7. Full-Body Search
Empty Search
Query DSL
Structure of a Query Clause
Combining Multiple Clauses
Queries and Filters
Performance Differences
When to Use Which
Most Important Queries and Filters
term Filter
terms Filter
range Filter
exists and missing Filters
bool Filter
match_all Query
match Query
multi_match Query
bool Query
Combining Queries with Filters
Filtering a Query
Just a Filter
A Query as a Filter
Validating Queries
Understanding Errors
Understanding Queries
8. Sorting and Relevance
Sorting
Sorting by Field Values
Multilevel Sorting
Sorting on Multivalue Fields
String Sorting and Multifields
What Is Relevance?
Understanding the Score
Understanding Why a Document Matched
Fielddata
9. Distributed Search Execution
Query Phase
Fetch Phase
Search Options
preference
timeout
routing
search_type
scan and scroll
10. Index Management
Creating an Index
Deleting an Index
Index Settings
Configuring Analyzers
Custom Analyzers
Creating a Custom Analyzer
Types and Mappings
How Lucene Sees Documents
How Types Are Implemented
Avoiding Type Gotchas
The Root Object
Properties
Metadata: _source Field
Metadata: _all Field
Metadata: Document Identity
Dynamic Mapping
Customizing Dynamic Mapping
date_detection
dynamic_templates
Default Mapping
Reindexing Your Data
Index Aliases and Zero Downtime
11. Inside a Shard
Making Text Searchable
Immutability
Dynamically Updatable Indices
Deletes and Updates
Near Real-Time Search
refresh API
Making Changes Persistent
flush API
Segment Merging
optimize API
II. Search in Depth
12. Structured Search
Finding Exact Values
term Filter with Numbers
term Filter with Text
Internal Filter Operation
Combining Filters
Bool Filter
Nesting Boolean Filters
Finding Multiple Exact Values
Contains, but Does Not Equal
Equals Exactly
Ranges
Ranges on Dates
Ranges on Strings
Dealing with Null Values
exists Filter
missing Filter
exists/missing on Objects
All About Caching
Independent Filter Caching
Controlling Caching
Filter Order
13. Full-Text Search
Term-Based Versus Full-Text
The match Query
Index Some Data
A Single-Word Query
Multiword Queries
Improving Precision
Controlling Precision
Combining Queries
Score Calculation
Controlling Precision
How match Uses bool
Boosting Query Clauses
Controlling Analysis
Default Analyzers
Configuring Analyzers in Practice
Use index settings, not config files
Keep it simple
Relevance Is Broken!
14. Multifield Search
Multiple Query Strings
Prioritizing Clauses
Single Query String
Know Your Data
Best Fields
dis_max Query
Tuning Best Fields Queries
tie_breaker
multi_match Query
Using Wildcards in Field Names
Boosting Individual Fields
Most Fields
Multifield Mapping
Cross-fields Entity Search
A Naive Approach
Problems with the most_fields Approach
Field-Centric Queries
Problem 1: Matching the Same Word in Multiple Fields
Problem 2: Trimming the Long Tail
Problem 3: Term Frequencies
Solution
Custom _all Fields
cross-fields Queries
Per-Field Boosting
Exact-Value Fields
15. Proximity Matching
Phrase Matching
Term Positions
What Is a Phrase
Mixing It Up
Multivalue Fields
Closer Is Better
Proximity for Relevance
Improving Performance
Rescoring Results
Finding Associated Words
Producing Shingles
Multifields
Searching for Shingles
Performance
16. Partial Matching
Postcodes and Structured Data
prefix Query
wildcard and regexp Queries
Query-Time Search-as-You-Type
Index-Time Optimizations
Ngrams for Partial Matching
Index-Time Search-as-You-Type
Preparing the Index
Querying the Field
Edge n-grams and Postcodes
Ngrams for Compound Words
17. Controlling Relevance
Theory Behind Relevance Scoring
Boolean Model
Term Frequency/Inverse Document Frequency (TF/IDF)
Term frequency
Inverse document frequency
Field-length norm
Putting it together
Vector Space Model
Lucenes Practical Scoring Function
Query Normalization Factor
Query Coordination
Index-Time Field-Level Boosting
Query-Time Boosting
Boosting an Index
t.getBoost()
Manipulating Relevance with Query Structure
Not Quite Not
boosting Query
Ignoring TF/IDF
constant_score Query
function_score Query
Boosting by Popularity
modifier
factor
boost_mode
max_boost
Boosting Filtered Subsets
filter Versus query
functions
score_mode
Random Scoring
The Closer, The Better
Understanding the price Clause
Scoring with Scripts
Pluggable Similarity Algorithms
Okapi BM25
Term-frequency saturation
Field-length normalization
Tuning BM25
Changing Similarities
Configuring BM25
Relevance Tuning Is the Last 10%
III. Dealing with Human Language
18. Getting Started with Languages
Using Language Analyzers
Configuring Language Analyzers
Pitfalls of Mixing Languages
At Index Time
Incorrect stemming
Incorrect inverse document frequencies
At Query Time
Identifying Language
One Language per Document
Foreign Words
One Language per Field
Mixed-Language Fields
Split into Separate Fields
Analyze Multiple Times
Use n-grams
19. Identifying Words
standard Analyzer
standard Tokenizer
Installing the ICU Plug-in
icu_tokenizer
Tidying Up Input Text
Tokenizing HTML
Tidying Up Punctuation
20. Normalizing Tokens
In That Case
You Have an Accent
Retaining Meaning
Living in a Unicode World
Unicode Case Folding
Unicode Character Folding
Sorting and Collations
Case-Insensitive Sorting
Differences Between Languages
Unicode Collation Algorithm
Unicode Sorting
Specifying a Language
Multiple sort orders
Customizing Collations
21. Reducing Words to Their Root Form
Algorithmic Stemmers
Using an Algorithmic Stemmer
Dictionary Stemmers
Hunspell Stemmer
Installing a Dictionary
Per-Language Settings
Creating a Hunspell Token Filter
Hunspell Dictionary Format
Choosing a Stemmer
Stemmer Performance
Stemmer Quality
Stemmer Degree
Making a Choice
Controlling Stemming
Preventing Stemming
Customizing Stemming
Stemming in situ
Is Stemming in situ a Good Idea
22. Stopwords: Performance Versus Precision
Pros and Cons of Stopwords
Using Stopwords
Stopwords and the Standard Analyzer
Maintaining Positions
Specifying Stopwords
Using the stop Token Filter
Updating Stopwords
Stopwords and Performance
and Operator
minimum_should_match
Divide and Conquer
Controlling Precision
Only High-Frequency Terms
More Control with Common Terms
Stopwords and Phrase Queries
Positions Data
Index Options
Stopwords
common_grams Token Filter
At Index Time
Unigram Queries
Bigram Phrase Queries
Two-Word Phrases
Stopwords and Relevance
23. Synonyms
Using Synonyms
Formatting Synonyms
Expand or contract
Simple Expansion
Simple Contraction
Genre Expansion
Synonyms and The Analysis Chain
Case-Sensitive Synonyms
Multiword Synonyms and Phrase Queries
Use Simple Contraction for Phrase Queries
Synonyms and the query_string Query
Symbol Synonyms
24. Typoes and Mispelings
Fuzziness
Fuzzy Query
Improving Performance
Fuzzy match Query
Scoring Fuzziness
Phonetic Matching
IV. Aggregations
25. High-Level Concepts
Buckets
Metrics
Combining the Two
26. Aggregation Test-Drive
Adding a Metric to the Mix
Buckets Inside Buckets
One Final Modification
27. Building Bar Charts
28. Looking at Time
Returning Empty Buckets
Extended Example
The Skys the Limit
29. Scoping Aggregations
Global Bucket
30. Filtering Queries and Aggregations
Filtered Query
Filter Bucket
Post Filter
Recap
31. Sorting Multivalue Buckets
Intrinsic Sorts
Sorting by a Metric
Sorting Based on Deep Metrics
32. Approximate Aggregations
Finding Distinct Counts
Understanding the Trade-offs
Optimizing for Speed
Calculating Percentiles
Percentile Metric
Percentile Ranks
Understanding the Trade-offs
33. Significant Terms
significant_terms Demo
Recommending Based on Popularity
Recommending Based on Statistics
34. Controlling Memory Use and Latency
Fielddata
Aggregations and Analysis
High-Cardinality Memory Implications
Limiting Memory Usage
Fielddata Size
Monitoring fielddata
Circuit Breaker
Fielddata Filtering
Doc Values
Enabling Doc Values
Preloading Fielddata
Eagerly Loading Fielddata
Global Ordinals
Building global ordinals
Eager global ordinals
Index Warmers
Preventing Combinatorial Explosions
Depth-First Versus Breadth-First
35. Closing Thoughts
V. Geolocation
36. Geo-Points
Lat/Lon Formats
Filtering by Geo-Point
geo_bounding_box Filter
Optimizing Bounding Boxes
geo_distance Filter
Faster Geo-Distance Calculations
geo_distance_range Filter
Caching geo-filters
Reducing Memory Usage
Sorting by Distance
Scoring by Distance
37. Geohashes
Mapping Geohashes
geohash_cell Filter
38. Geo-aggregations
geo_distance Aggregation
geohash_grid Aggregation
geo_bounds Aggregation
39. Geo-shapes
Mapping geo-shapes
precision
distance_error_pct
Indexing geo-shapes
Querying geo-shapes
Querying with Indexed Shapes
Geo-shape Filters and Caching
VI. Modeling Your Data
40. Handling Relationships
Application-side Joins
Denormalizing Your Data
Field Collapsing
Denormalization and Concurrency
Renaming Files and Directories
Solving Concurrency Issues
Global Locking
Document Locking
Tree Locking
41. Nested Objects
Nested Object Mapping
Querying a Nested Object
Sorting by Nested Fields
Nested Aggregations
reverse_nested Aggregation
When to Use Nested Objects
42. Parent-Child Relationship
Parent-Child Mapping
Indexing Parents and Children
Finding Parents by Their Children
min_children and max_children
Finding Children by Their Parents
Children Aggregation
Grandparents and Grandchildren
Practical Considerations
Memory Use
Global Ordinals and Latency
Multigenerations and Concluding Thoughts
43. Designing for Scale
The Unit of Scale
Shard Overallocation
Kagillion Shards
Capacity Planning
Replica Shards
Balancing Load with Replicas
Multiple Indices
Time-Based Data
Index per Time Frame
Index Templates
Retiring Data
Migrate Old Indices
Optimize Indices
Closing Old Indices
Archiving Old Indices
User-Based Data
Shared Index
Faking Index per User with Aliases
One Big User
Scale Is Not Infinite
VII. Administration, Monitoring, and Deployment
44. Monitoring
Marvel for Monitoring
Cluster Health
Drilling Deeper: Finding Problematic Indices
Blocking for Status Changes
Monitoring Individual Nodes
indices Section
OS and Process Sections
JVM Section
Threadpool Section
FS and Network Sections
Circuit Breaker
Cluster Stats
Index Stats
Pending Tasks
cat API
45. Production Deployment
Hardware
Memory
CPUs
Disks
Network
General Considerations
Java Virtual Machine
Transport Client Versus Node Client
Configuration Management
Important Configuration Changes
Assign Names
Paths
Minimum Master Nodes
Recovery Settings
Prefer Unicast over Multicast
Dont Touch These Settings!
Garbage Collector
Threadpools
Heap: Sizing and Swapping
Give Half Your Memory to Lucene
Dont Cross 32 GB!
Swapping Is the Death of Performance
File Descriptors and MMap
Revisit This List Before Production
46. Post-Deployment
Changing Settings Dynamically
Logging
Slowlog
Indexing Performance Tips
Test Performance Scientifically
Using and Sizing Bulk Requests
Storage
Segments and Merging
Other
Rolling Restarts
Backing Up Your Cluster
Creating the Repository
Snapshotting All Open Indices
Snapshotting Particular Indices
Listing Information About Snapshots
Deleting Snapshots
Monitoring Snapshot Progress
Canceling a Snapshot
Restoring from a Snapshot
Monitoring Restore Operations
Canceling a Restore
Clusters Are Living, Breathing Creatures
Index mniej
Elasticsearch: The Definitive Guide (e-book) - Opinie i recenzje
Na liście znajdują się opinie, które zostały zweryfikowane (potwierdzone zakupem) i oznaczone są one zielonym znakiem Zaufanych Opinii. Opinie niezweryfikowane nie posiadają wskazanego oznaczenia.