

Using synonyms is undoubtedly one of the most important techniques in a search engineer's tool belt. While novices sometimes underestimate their importance, almost no real-life search system can work without them. At the same time, the complexities and subtleties arising from their use are sometimes underestimated, even by advanced users. Synonym filters are part of the analysis process that converts input text into searchable terms. They are relatively easy to get started with, but their use can be quite varied and requires a deeper understanding of the underlying concepts before they can be applied successfully in a real-world scenario.

There have been some recent improvements around analysis in Elasticsearch. The most notable is probably the functionality that allows for reloading search-time analyzers, which in turn enables search-time synonyms to be changed and reloaded. In addition to presenting this new API, this blog will answer some common questions around using synonyms and point out some frequent caveats around their use.

To understand the usefulness and flexibility of synonyms, let’s take a quick look at how most of today’s search engines work internally. Documents and queries are analyzed and reduced to their smallest units, often called tokens, which are essentially abstract symbols. The matching process when searching relies on simple string comparison of these tokens, which is the reason why even small spelling mistakes (“hous”) or the use of a plural of a word (“houses”) in a query won’t match a document containing only the singular (“house”). Things like stemmers or fuzzy queries address some of the most common of these problems, but they don’t bridge the gap between related concepts and ideas or between slightly different vocabulary usage in the documents and queries. The word “synonym” itself comes from the Greek prefix σύν (syn, “together”) and ὄνομα (ónoma, “name”).
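To make the token-matching problem concrete, here is a minimal Python sketch of a toy matcher. It is not how Elasticsearch is implemented; the `analyze` and `matches` helpers and the synonym map are hypothetical names invented for this illustration. It only shows why exact token comparison misses “hous”, “houses”, and related words such as “home”, and how expanding query tokens with synonyms can close that last gap.

```python
import re


def analyze(text, synonyms=None):
    """Toy analyzer (hypothetical): lowercase, split into word tokens,
    and optionally add the synonyms listed for each token."""
    tokens = re.findall(r"\w+", text.lower())
    if synonyms:
        expanded = []
        for token in tokens:
            expanded.append(token)
            expanded.extend(synonyms.get(token, []))
        tokens = expanded
    return tokens


def matches(query, document, synonyms=None):
    """Return True if any query token equals a document token exactly."""
    doc_tokens = set(analyze(document))
    query_tokens = analyze(query, synonyms)
    return any(token in doc_tokens for token in query_tokens)


doc = "A small house by the lake"

print(matches("house", doc))   # exact token match
print(matches("houses", doc))  # plural is a different token
print(matches("hous", doc))    # misspelling is a different token
print(matches("home", doc))    # related concept, different token

# Expanding the query at search time with an assumed synonym rule
print(matches("home", doc, synonyms={"home": ["house"]}))
```

In this sketch synonyms are applied only to the query, loosely mirroring the idea of search-time synonyms: the indexed document tokens stay untouched, so the synonym map can change without reprocessing the documents.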
