Search engine and text processing functions

This module builds a full-text search index from plain-text documents. It provides a standard tokenizer and an English Porter stemmer, and supports separate text zones, per-zone weights, and custom ranking functions. Search query results include contextual information.

The search index is backed by a simpljs (IndexedDB) database instance, and individual entries can be re-indexed at any time.



Search engine and utilities: stem is the Porter stemmer; tokenize is a word-segmentation function.


stem takes a token (as produced by tokenize) and returns a stemmed term for use in indexing and in query parsing.
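To illustrate the stem(token) -> term interface, here is a toy suffix-stripping sketch. It is not the Porter algorithm (which applies several ordered rule phases with measure conditions); only the call shape and the idea of suffix rules are taken from the description above.

```javascript
// Toy stemmer sketch: same interface as stem(token) -> term, but only a
// handful of suffix rules instead of the full Porter algorithm.
function toyStem(token) {
  const word = token.toLowerCase();
  // Ordered (suffix, replacement) rules, longest suffix first.
  const rules = [
    ["sses", "ss"],
    ["ies", "i"],
    ["ing", ""],
    ["ed", ""],
    ["s", ""],
  ];
  for (const [suffix, replacement] of rules) {
    // Require a short remaining stem so words like "is" survive intact.
    if (word.endsWith(suffix) && word.length - suffix.length >= 2) {
      return word.slice(0, word.length - suffix.length) + replacement;
    }
  }
  return word;
}

// toyStem("indexing") -> "index", toyStem("caresses") -> "caress"
```

Because stemming is applied on both sides, "indexing" in a document and "indexed" in a query reduce to the same term and therefore match.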


tokenize takes a string of text and returns an array of token-index pairs.
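A minimal sketch of the tokenize(text) -> pairs interface. The assumption here (not stated above) is that the index in each pair is the token's character offset in the input string; the word-boundary rules are also simplified.

```javascript
// Toy tokenizer sketch: returns [token, index] pairs, where index is
// assumed to be the token's character offset in the input string.
function toyTokenize(text) {
  const pairs = [];
  const re = /[A-Za-z0-9]+/g; // alphanumeric runs; punctuation splits tokens
  let match;
  while ((match = re.exec(text)) !== null) {
    pairs.push([match[0].toLowerCase(), match.index]);
  }
  return pairs;
}

// toyTokenize("Full-text search") -> [["full", 0], ["text", 5], ["search", 10]]
```

Keeping the character offset alongside each token is what lets query results carry contextual information: the engine can slice the original text around a matched token to build a snippet.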


An index is built by parsing the text in a document object or string. If a document (identified by its id) is already indexed, it is purged from the index before being re-indexed. Document objects must be JSON-serializable. Search results can be filtered with the filter function and ordered by descending score with the score function. iterate invokes the each callback once per document in the index, stopping as soon as each returns true.
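The behavior described above can be sketched with a self-contained in-memory index (the real module persists to a database instead). The names add, search, and iterate and their exact signatures are assumptions for illustration; the purge-before-re-index, filter, score, and each semantics follow the text.

```javascript
// In-memory sketch of the indexing behavior: purge on re-index,
// filter + descending-score ordering, and early-exit iteration.
function createIndex() {
  const docs = new Map();     // id -> document object or string
  const postings = new Map(); // term -> Map(id -> occurrence count)

  function tokenize(text) {
    return text.toLowerCase().match(/[a-z0-9]+/g) || [];
  }

  return {
    // Re-indexing a document with a known id first purges its old terms.
    add(id, doc) {
      if (docs.has(id)) {
        for (const byDoc of postings.values()) byDoc.delete(id);
      }
      docs.set(id, doc);
      const text = typeof doc === "string" ? doc : doc.text;
      for (const term of tokenize(text)) {
        if (!postings.has(term)) postings.set(term, new Map());
        const byDoc = postings.get(term);
        byDoc.set(id, (byDoc.get(id) || 0) + 1);
      }
    },
    // Single-term queries only in this sketch. filter drops hits;
    // score orders the surviving result set in descending order.
    search(query, { filter = () => true, score = (hit) => hit.count } = {}) {
      const byDoc = postings.get(query.toLowerCase()) || new Map();
      return [...byDoc.entries()]
        .map(([id, count]) => ({ id, count, doc: docs.get(id) }))
        .filter(filter)
        .sort((a, b) => score(b) - score(a));
    },
    // iterate calls `each` per document and stops when it returns true.
    iterate(each) {
      for (const [id, doc] of docs.entries()) {
        if (each(id, doc)) return;
      }
    },
  };
}
```

Note how add accepts either a plain string or an object with a text property, matching the two document forms the engine supports when no zones are declared.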


database is an opened database instance used to store index data. If specified, zones configures indexing for the corresponding fields of document objects passed to the search engine; if no zones are declared, the engine indexes documents as plain strings or as objects with a 'text' string property. The stem and tokenize functions default to the provided Porter stemmer and tokenizer and can be overridden per zone. The weight of each zone is used by the default scoring function when ordering the result set.
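A sketch of how zone weights might feed the default score. The zone names (title, body), the option shape, and the weighted-sum formula are all assumptions for illustration; the source only states that zone weights are used by the default scoring function.

```javascript
// Hypothetical zones declaration: each key names a document field,
// and its weight biases the default score toward matches in that zone.
const zones = {
  title: { weight: 2.0 }, // matches in titles count double
  body:  { weight: 1.0 },
};

// Assumed default scoring: a weighted sum of per-zone match counts.
// Unknown zones fall back to weight 1.0.
function scoreHit(countsByZone) {
  let score = 0;
  for (const [zone, count] of Object.entries(countsByZone)) {
    score += (zones[zone]?.weight || 1.0) * count;
  }
  return score;
}

// scoreHit({ title: 1, body: 3 }) -> 2.0*1 + 1.0*3 = 5
```

Under this scheme a single title match outranks a single body match, which is the usual reason to declare zones at all; per-zone stem and tokenize overrides would plug in at indexing time the same way the weights plug in at scoring time.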