Elasticsearch Explain API

Do you want to understand why your documents got those scores?

Documents

Let’s go through the Explain API with a set of example documents. In my case I am going to use a small list of movie quotes.

POST /_bulk
{ "index" : { "_index" : "movie_quotes" } }
{ "title" : "The Incredibles", "quote": "Never look back, darling. It distracts from the now" }
{ "index" : { "_index" : "movie_quotes" } }
{ "title" : "The Lion King", "quote": "Oh yes, the past can hurt. But, you can either run from it or learn from it" }
{ "index" : { "_index" : "movie_quotes" } }
{ "title" : "Toy Story", "quote": "To infinity and beyond" }
{ "index" : { "_index" : "movie_quotes" } }
{ "title" : "Ratatouille", "quote": "You must not let anyone define your limits because of where you come from" }
{ "index" : { "_index" : "movie_quotes" } }
{ "title" : "Lilo and Stitch", "quote": "Ohana means family, family means nobody gets left behind. Or forgotten" }

BM25

BM25 is the default scoring algorithm in Elasticsearch. For a single query term it computes the score as follows:

score = \frac{boost \times freq \times log(1 + \frac{(N - n + 0.5)}{(n + 0.5)})}{(freq + k1 \times (1 - b + b \times \frac{dl}{avgdl}))}

Here freq is the number of occurrences of the term within the field, N is the total number of documents with the field, n is the number of documents containing the term, dl is the length of the field in terms, avgdl is the average length of the field across documents, k1 is the term saturation parameter, and b is the length normalization parameter.
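
The explain output below reports this same calculation factored as score = boost \times idf \times tf, where:

idf = log(1 + \frac{(N - n + 0.5)}{(n + 0.5)})

tf = \frac{freq}{(freq + k1 \times (1 - b + b \times \frac{dl}{avgdl}))}

In the examples that follow you will see the defaults: boost = 2.2, k1 = 1.2 and b = 0.75.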

Example 1: Fields with fewer terms are more important.

GET /movie_quotes/_search
{
  "explain": true,
  "query": {
    "match": {
      "quote": "the"
    }
  }
}

Response

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.94581884,
    "hits": [
      {
        "_shard": "[movie_quotes][0]",
        "_node": "M9Dx5c1BTk6ehVVSCiJAvQ",
        "_index": "movie_quotes",
        "_id": "LMpi64YBn8MlrX4RQHf9",
        "_score": 0.94581884,
        "_source": {
          "title": "The Incredibles",
          "quote": "Never look back, darling. It distracts from the now"
        },
        "_explanation": {
          "value": 0.94581884,
          "description": "weight(quote:the in 0) [PerFieldSimilarity], result of:",
          "details": [
            {
              "value": 0.94581884,
              "description": "score(freq=1.0), computed as boost * idf * tf from:",
              "details": [
                {
                  "value": 2.2,
                  "description": "boost",
                  "details": []
                },
                {
                  "value": 0.87546873,
                  "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                  "details": [
                    {
                      "value": 2,
                      "description": "n, number of documents containing term",
                      "details": []
                    },
                    {
                      "value": 5,
                      "description": "N, total number of documents with field",
                      "details": []
                    }
                  ]
                },
                {
                  "value": 0.4910714,
                  "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                  "details": [
                    {
                      "value": 1,
                      "description": "freq, occurrences of term within document",
                      "details": []
                    },
                    {
                      "value": 1.2,
                      "description": "k1, term saturation parameter",
                      "details": []
                    },
                    {
                      "value": 0.75,
                      "description": "b, length normalization parameter",
                      "details": []
                    },
                    {
                      "value": 9,
                      "description": "dl, length of field",
                      "details": []
                    },
                    {
                      "value": 11,
                      "description": "avgdl, average length of field",
                      "details": []
                    }
                  ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard": "[movie_quotes][0]",
        "_node": "M9Dx5c1BTk6ehVVSCiJAvQ",
        "_index": "movie_quotes",
        "_id": "Lcpi64YBn8MlrX4RQHf9",
        "_score": 0.71575475,
        "_source": {
          "title": "The Lion King",
          "quote": "Oh yes, the past can hurt. But, you can either run from it or learn from it"
        },
        "_explanation": {
          "value": 0.71575475,
          "description": "weight(quote:the in 1) [PerFieldSimilarity], result of:",
          "details": [
            {
              "value": 0.71575475,
              "description": "score(freq=1.0), computed as boost * idf * tf from:",
              "details": [
                {
                  "value": 2.2,
                  "description": "boost",
                  "details": []
                },
                {
                  "value": 0.87546873,
                  "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                  "details": [
                    {
                      "value": 2,
                      "description": "n, number of documents containing term",
                      "details": []
                    },
                    {
                      "value": 5,
                      "description": "N, total number of documents with field",
                      "details": []
                    }
                  ]
                },
                {
                  "value": 0.3716216,
                  "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                  "details": [
                    {
                      "value": 1,
                      "description": "freq, occurrences of term within document",
                      "details": []
                    },
                    {
                      "value": 1.2,
                      "description": "k1, term saturation parameter",
                      "details": []
                    },
                    {
                      "value": 0.75,
                      "description": "b, length normalization parameter",
                      "details": []
                    },
                    {
                      "value": 17,
                      "description": "dl, length of field",
                      "details": []
                    },
                    {
                      "value": 11,
                      "description": "avgdl, average length of field",
                      "details": []
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}

Detailed Description

For the document that came first, The Incredibles, we get the score 0.94581884, calculated as follows:

0.94581884 = \frac{2.2 \times 1 \times log(1 + \frac{(5 - 2 + 0.5)}{(2 + 0.5)})}{(1 + 1.2 \times (1 - 0.75 + 0.75 \times \frac{9}{11}))}

For the second document, The Lion King, we get 0.71575475, calculated as follows:

0.71575475 = \frac{2.2 \times 1 \times log(1 + \frac{(5 - 2 + 0.5)}{(2 + 0.5)})}{(1 + 1.2 \times (1 - 0.75 + 0.75 \times \frac{17}{11}))}

For this example the two calculations are almost identical; the only difference is that the second document's field is longer (17 terms against 9). The algorithm is designed so that a document whose matching field contains more terms in total scores lower. The reasoning behind this: the shorter the field that contains the term, the larger the share of the field the term occupies, so the match is assumed to be more valuable.
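
As a side note, setting "explain": true on a search is not the only way in: Elasticsearch also has a dedicated Explain endpoint that returns the same _explanation tree for a single document, which is useful when you already know which document you want to inspect. A sketch, reusing the auto-generated ID of The Incredibles from the response above (your IDs will differ):

GET /movie_quotes/_explain/LMpi64YBn8MlrX4RQHf9
{
  "query": {
    "match": {
      "quote": "the"
    }
  }
}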

Example 2: Higher frequency of a term in a field is more important.

GET /movie_quotes/_search
{
  "explain": true,
  "query": {
    "match": {
      "quote": "you"
    }
  }
}

Detailed Description

Skipping the full output this time: for the document that came first, Ratatouille, we get the score 1.1180129, calculated as follows:

1.1180129 = \frac{2.2 \times 2 \times log(1 + \frac{(5 - 2 + 0.5)}{(2 + 0.5)})}{(2 + 1.2 \times (1 - 0.75 + 0.75 \times \frac{14}{11}))}

For the second document, The Lion King, we get 0.71575475, calculated as follows:

0.71575475 = \frac{2.2 \times 1 \times log(1 + \frac{(5 - 2 + 0.5)}{(2 + 0.5)})}{(1 + 1.2 \times (1 - 0.75 + 0.75 \times \frac{17}{11}))}

For this example the equation is very similar again, but the documents have very different scores, with a bigger gap than in the shorter-field example. The differences now are in frequency and field length: the first document has a shorter field and a higher term frequency. Field length is important, but frequency is more important, up to a point (see the next example). We also see that the second document, The Lion King, gets exactly the same score as it did in the previous example; the circumstances are identical, the same frequency and the same field length, in fact it is the same document.
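
To make the gap concrete: the idf term, log(1 + \frac{(5 - 2 + 0.5)}{(2 + 0.5)}) \approx 0.8755, is identical for both documents, because "you", like "the", appears in two of the five documents. The difference comes entirely from the tf values:

tf_{Ratatouille} = \frac{2}{(2 + 1.2 \times (1 - 0.75 + 0.75 \times \frac{14}{11}))} \approx 0.5805

tf_{Lion King} = \frac{1}{(1 + 1.2 \times (1 - 0.75 + 0.75 \times \frac{17}{11}))} \approx 0.3716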

Example 3: Messing with the algorithm

If the term frequency is so important, can I just make sure my document always comes to the top by repeating the same term over and over?

POST /_bulk
{ "index" : { "_index" : "movie_quotes" } }
{ "title" : "Movie 1", "quote": "Movie movie movie movie." }
{ "index" : { "_index" : "movie_quotes" } }
{ "title" : "Movie 2", "quote": "Movie movie movie movie movie movie movie movie." }

GET /movie_quotes/_search
{
  "explain": true,
  "query": {
    "match": {
      "quote": "movie"
    }
  }
}

Description

For Movie 2 we get 2.2614799 and for Movie 1 we get 2.1889362. These two scores are now very similar, even though Movie 2 has double the term frequency of Movie 1. The reason is that freq appears in both the numerator and the denominator of the tf term: as the frequency of a term increases, the score rises quickly at first, but once the frequency gets too high, each extra occurrence contributes less and less. This is the term saturation controlled by the k1 parameter.
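
Working the formula through for the enlarged corpus makes the saturation visible. There are now N = 7 documents totalling 67 terms in the quote field, so avgdl \approx 9.57, and "movie" appears in n = 2 documents, giving idf = log(1 + \frac{(7 - 2 + 0.5)}{(2 + 0.5)}) \approx 1.1632 for both. The only difference is in tf:

tf_{Movie 1} = \frac{4}{(4 + 1.2 \times (1 - 0.75 + 0.75 \times \frac{4}{9.57}))} \approx 0.8554

tf_{Movie 2} = \frac{8}{(8 + 1.2 \times (1 - 0.75 + 0.75 \times \frac{8}{9.57}))} \approx 0.8838

Doubling the frequency from 4 to 8 moves tf only from about 0.855 to 0.884.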

Conclusion

These are short examples with a simple match query, and we have not yet seen the interplay of every possible condition, but even so this is a good starting point for really getting to grips with the scores our documents get, and for understanding how to start tuning our documents to take advantage of this algorithm. It is worth stressing at this point that the exact value of the score is irrelevant; the relative score is the only thing useful for ordering. Some further examples to try out on your own:
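
One thing to experiment with: the k1 and b parameters are not fixed, they can be overridden per field with a custom similarity. A minimal sketch, assuming a new index movie_quotes_tuned and a similarity name my_bm25 that are both made up for this example; setting b to 0 switches length normalization off entirely:

PUT /movie_quotes_tuned
{
  "settings": {
    "index": {
      "similarity": {
        "my_bm25": { "type": "BM25", "k1": 1.2, "b": 0 }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "quote": { "type": "text", "similarity": "my_bm25" }
    }
  }
}

With b set to 0 the dl / avgdl part of the formula drops out, so re-running Example 1 against this index should give The Incredibles and The Lion King identical scores, since field length was the only thing separating them.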