Monday, January 17, 2011

How to “spy” the data in a custom pipeline extensibility stage with FS4SP

In the old FAST (ESP), a much-used stage during development was the “Spy” stage. This stage dumps a log file of all current attributes and the values assigned to them at that point in the content processing pipeline.

Fortunately for us, this stage still exists in FS4SP, and it might help you when testing and debugging your crawling.

To enable the spy stage, first stop the FAST configserver:

nctrl stop configserver

Second, open up %FASTSEARCH%\etc\pipelineconfig.xml

Typically you want to add the spy stage directly before or after the custom extensibility stage. In my example I added it before.
[Image: pipelineconfig.xml with the spy stage added before the custom extensibility stage]

After the edit, save the file and start the configserver again.

nctrl start configserver

If you watch the %FASTSEARCH%\var\log folder during indexing, you will see a file named spy.txt appear. It contains all the fields currently available to you.
[Image: contents of spy.txt]

The file is overwritten for each processed item, so it only contains information from the latest document. As long as you index using a single document processor, it is still a valuable tool during development for checking that your custom stage receives the data you expect.
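
Because spy.txt only ever holds the latest document, it can help to keep a copy of each version while you test. Below is a minimal sketch of that idea in Python, not part of FS4SP itself: it polls the file and copies it to a snapshot folder whenever its modification time changes. The function names, the snapshot folder and the polling interval are my own assumptions, and polling can miss documents that are processed faster than the interval or catch a file that is only partly written.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def snapshot_if_changed(spy_file: Path, out_dir: Path,
                        last_mtime: Optional[int]) -> Optional[int]:
    """Copy spy_file into out_dir if its modification time differs from last_mtime.

    Returns the modification time (in nanoseconds) that was seen, or
    last_mtime unchanged if the file does not exist or was not modified.
    """
    try:
        mtime = spy_file.stat().st_mtime_ns
    except FileNotFoundError:
        return last_mtime  # nothing has been written by the spy stage yet
    if mtime == last_mtime:
        return last_mtime  # same version as the last snapshot
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one snapshot per observed modification time.
    shutil.copy2(spy_file, out_dir / f"spy_{mtime}.txt")
    return mtime


def watch(spy_file: Path, out_dir: Path, interval: float = 0.2,
          max_polls: Optional[int] = None) -> None:
    """Poll spy_file every `interval` seconds; run forever if max_polls is None."""
    last_mtime = None
    polls = 0
    while max_polls is None or polls < max_polls:
        last_mtime = snapshot_if_changed(spy_file, out_dir, last_mtime)
        polls += 1
        time.sleep(interval)
```

On the FAST server you could start it with something like watch(Path(os.environ["FASTSEARCH"]) / "var" / "log" / "spy.txt", Path("spy_snapshots")) after importing os, and stop it with Ctrl+C once indexing is done.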

3 comments:

  1. Hello Mikael, we are migrating from ESP 5.3 to FS4SP. Your blog is wonderful. Bookmarked. I have a question: can we have multiple pipelines in FS4SP like in ESP? And can we add custom stages, like an equivalent to the Python stage in ESP?

    Freddie

  2. Hello freddiemaize, and thanks for the kind words :)

    You can have multiple pipelines and collections, but this is not supported or recommended. Remember that collections are merely a mental model with an extra property called meta.collection, as all content is stored in the same big bucket.

    There is one default pipeline which should be left untouched. This pipeline has an extensibility point where you can hook in your own modules, typically written in C#. If you only want the module to work on certain item types, you simply add your conditions in the module (if data.Contains("author") do logic).

    You should take a look at http://msdn.microsoft.com/en-us/library/ff795821.aspx which explains your options for custom item processing.

    That said, as an ESP developer you might find it cumbersome to do modifications like you are used to, but I have yet to find a scenario which is not solvable in a supported manner when moving over to FS4SP.

    Any questions you might have on how to accomplish the migration can be asked at the FS4SP forum (http://social.technet.microsoft.com/Forums/en-US/fastsharepoint/)

  3. Mikael, thanks for the beautiful reply. Never expected it, I should say. I will be sure to participate in the forum with my future questions.

    >>as an ESP developer you might find it cumbersome to do modifications like you are used to

    This certainly gives me hope!!

    I'm adding your blog to my blog roll :)
