Friday, June 24, 2011

Error in the default importance level weights for full-text index mappings

The title of this post might seem somewhat cryptic if you haven’t worked with managed properties in FAST for SharePoint and mapped them to different importance levels, but I will walk you through it before getting to the error.

First of all, the relevance score for a search hit in FAST for SharePoint is built up of many different values, and one part of the score is how important the field containing the query match is. For example, if your query matches words in the title, the hit is rated higher than if it matched words in the body text of the document.

When you create a managed property which holds textual content, you can set how important the field is on a scale from level 1 to 7, as seen in the image below. You can get to this screen via

Central Admin –> FAST Query SSA –> FAST Search Administration –> Managed properties –> <click a property> –> <scroll to the bottom of the page>

[Image: managed property settings page showing the importance level (1-7) setting]

Using PowerShell, we can list the weights behind the different importance levels.

PS> $rankprofile = Get-FASTSearchMetadataRankProfile default
PS> $content = $rankprofile.GetFullTextIndexRanks() | Where-Object -FilterScript {$_.FullTextIndexReference.Name -eq "content"}
PS> $content.GetImportanceLevelWeight(1)
30
PS> $content.GetImportanceLevelWeight(2)
10
PS> $content.GetImportanceLevelWeight(3)
20
PS> $content.GetImportanceLevelWeight(4)
30
PS> $content.GetImportanceLevelWeight(5)
40
PS> $content.GetImportanceLevelWeight(6)
50
PS> $content.GetImportanceLevelWeight(7)
60


As you can see, Level 1 has a weight of 30, the same as Level 4 and higher than Levels 2 and 3. This is where the error is: Level 1 should carry the lowest weight.
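Rather than calling GetImportanceLevelWeight seven times by hand, a small PowerShell sketch can read all the levels and report every pair where a lower level is weighted the same as or higher than a higher level. It reuses only the calls shown above; the function name Find-ImportanceLevelInversions is hypothetical, and when the FS4SP cmdlets are not available the script falls back to the default values listed above.

```powershell
# Hypothetical helper: returns one line for every pair of levels where the
# lower level has a weight >= the higher level. $Weights[0] is level 1.
function Find-ImportanceLevelInversions {
    param([int[]]$Weights)
    for ($low = 0; $low -lt $Weights.Length; $low++) {
        for ($high = $low + 1; $high -lt $Weights.Length; $high++) {
            if ($Weights[$low] -ge $Weights[$high]) {
                "Level $($low + 1) ($($Weights[$low])) >= Level $($high + 1) ($($Weights[$high]))"
            }
        }
    }
}

if (Get-Command Get-FASTSearchMetadataRankProfile -ErrorAction SilentlyContinue) {
    # On an FS4SP server: read the live weights for the "content" full-text index
    $rankprofile = Get-FASTSearchMetadataRankProfile default
    $content = $rankprofile.GetFullTextIndexRanks() |
        Where-Object -FilterScript { $_.FullTextIndexReference.Name -eq "content" }
    $weights = 1..7 | ForEach-Object { $content.GetImportanceLevelWeight($_) }
} else {
    # Elsewhere: fall back to the default weights listed in this post
    $weights = 30, 10, 20, 30, 40, 50, 60
}

Find-ImportanceLevelInversions -Weights $weights
```

With the default weights, this reports three pairs: Level 1 against Levels 2, 3 and 4, which is exactly the error described here. A correctly ordered profile produces no output.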

To rule out any magic going on behind the scenes, I conducted a test. First I created three crawled properties, each mapped to its own managed property (madcowone, madcowtwo and madcowfive) at importance levels 1, 2 and 5, as circled in red in the first image. Then I created three documents with identical content: levelone.txt was indexed into the madcowone field, leveltwo.txt into madcowtwo and levelfive.txt into madcowfive. I also set the freshness weight to zero to rule out the time factor in ranking.

When I executed a search against these three documents, they all got the same rank score of 39. My FS4SP Query Logger tool showed that they did get different context scores, but since the total scores were equal, the results came back in random order, as you can see in the output (shortened for clarity):

[Image: FS4SP Query Logger output (shortened), with context scores and matching levels highlighted]
I have highlighted the context score and the level where we got a hit, which corresponds to the name of the document. The context scores are low because the context weight counts for little compared to the other factors in the static rank.

Next I changed the context weight from the default of 50 to 200. Executing the same query, I now got the results sorted in the “correct” order: levelfive.txt, levelone.txt and leveltwo.txt.

[Image: query results after raising the context weight to 200]

This shows that a Level 1 field clearly ranks above Levels 2 and 3, which is most likely an error in the product. You probably want to change the importance level weights in your deployments to match the expected behavior, where a higher level always gives a higher weight.
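A minimal sketch of such a change for the "content" full-text index follows. It assumes the rank object exposes SetImportanceLevelWeight(level, weight) and Update() as counterparts to the GetImportanceLevelWeight call used above; these are assumptions, so check them first with `$content | Get-Member`. The new Level 1 weight of 5 is a hypothetical choice: any value below the Level 2 weight restores the order.

```powershell
# Sketch only: SetImportanceLevelWeight(level, weight) and Update() are ASSUMED
# methods on the full-text index rank object; verify with: $content | Get-Member
$rankprofile = Get-FASTSearchMetadataRankProfile default
$content = $rankprofile.GetFullTextIndexRanks() |
    Where-Object -FilterScript { $_.FullTextIndexReference.Name -eq "content" }

# Hypothetical value: anything below the Level 2 weight (10 by default) works
$newLevelOneWeight = 5
$content.SetImportanceLevelWeight(1, $newLevelOneWeight)
$content.Update()   # assumed to persist the change to the rank profile

# Print all seven levels to confirm the new ordering
1..7 | ForEach-Object { "Level {0}: {1}" -f $_, $content.GetImportanceLevelWeight($_) }
```

As noted in the comments, the new weights only take effect after a recrawl, because the importance levels are part of the static rank in the full-text index.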

6 comments:

  1. Hi Mikael

    A great find! :)

    After changing the weight for level 1, do I have to recrawl or run "nctrl.exe reset", or is the weight change applied immediately?

    Regards,

    Chris

  2. You have to recrawl, as the importance levels are part of the static rank in the full-text index (I just verified it).

    It's easily checked with http://fs4splogger.codeplex.com/, an invaluable tool when working on rank tuning.

  3. Thank you, Mikael. It works a fine treat! :) I have downloaded your query log analyser tool; in fact, this is how I debugged the original issue.

    1. If you have ideas for the tool, feel free to log them on the codeplex site or at my blog :)

  4. This is a great post, Mikael! I have one question. I am using the query logger from codeplex (a great tool, by the way), and I do not see my managed properties in the log, although they are mapped to an importance level. However, I do see default properties like "Title". Am I missing something? Thanks in advance.

    1. Hi,
      Glad you like the tool :) Could you output the settings of one of the custom managed properties for reference (e.g. via PowerShell)?

      Thanks,
      Mikael
