Wednesday, March 2, 2011

Prototyping pipeline stages in PowerShell

The de facto way of creating a custom pipeline stage in FAST for SharePoint is to build an executable that reads an input XML file and writes an output XML file. This usually means having Visual Studio available, and recompiling and redeploying the executable every time you make a change that you want to test in a real pipeline.

PowerShell comes to the rescue. Since PowerShell is installed on all FAST servers, you can write the stage as a PowerShell script instead, and all you need is Notepad. This gives you the flexibility to try things out without recompiling. The cost is execution speed, so you might want to port the code to, for example, C# once you are done testing.
See my post “How To: Debug and log FAST Search pipeline extensibility stages in Visual Studio” for how to do this in C#, and for how to create your own property set with PowerShell commands.
To register the script in pipelineextensibility.xml, prefix the path to the .ps1 file with the path to the PowerShell executable, so that PowerShell.exe runs the script with the input and output file paths as arguments. Make sure to keep the <PipelineExtensibility> element around your <Run> element(s).
<PipelineExtensibility>
  <Run command="C:\Windows\System32\WindowsPowerShell\v1.0\PowerShell.exe C:\FASTSearch\pipelinemodules\concept.ps1 %(input)s %(output)s">
    <Input>
      <CrawledProperty propertySet="48385C54-CDFC-4E84-8117-C95B3CF8911C" varType="31" propertyName="docvector"/>
    </Input>
    <Output>
      <CrawledProperty propertySet="fa585f53-2679-48d9-976d-9ce62e7e19b7" varType="31" propertyName="concepts"/>
    </Output>
  </Run>
</PipelineExtensibility>

In PowerShell you can instantiate any .NET object, and in this script I use the XmlDocument class to create the result file. The script is straightforward: it reads the input XML file, selects the crawled property called “docvector”, and writes its value to a new crawled property called “concepts”. Typically you would do more work than just copying a value, but it shows the concept.

I’m not a PowerShell expert, so there might be better ways of doing some parts of this script. Note that I read and write the XML files using UTF-8 encoding. This is crucial, as UTF-8 is what the pipeline works with.

concept.ps1
function CreateXml
{
    param ([string]$set, [string]$name, [int]$type, $value)

    # Build <Document><CrawledProperty propertySet=".." propertyName=".." varType="..">value</CrawledProperty></Document>
    $resultXml = New-Object xml
    $doc = $resultXml.CreateElement("Document")

    $crawledProperty = $resultXml.CreateElement("CrawledProperty")
    $propSet = $resultXml.CreateAttribute("propertySet")
    $propSet.innerText = $set
    $propName = $resultXml.CreateAttribute("propertyName")
    $propName.innerText = $name
    $varType = $resultXml.CreateAttribute("varType")
    $varType.innerText = $type

    # Redirect to $null so the nodes returned by Append/AppendChild
    # do not end up in the function's output
    $crawledProperty.Attributes.Append($propSet) > $null
    $crawledProperty.Attributes.Append($propName) > $null
    $crawledProperty.Attributes.Append($varType) > $null

    $crawledProperty.innerText = $value

    $doc.AppendChild($crawledProperty) > $null
    $resultXml.AppendChild($doc) > $null

    # Insert <?xml version="1.0" encoding="UTF-8"?> in front of the root element
    $xmlDecl = $resultXml.CreateXmlDeclaration("1.0", "UTF-8", "")
    $el = $resultXml.psbase.DocumentElement
    $resultXml.InsertBefore($xmlDecl, $el) > $null

    return $resultXml
}

function DoWork
{
    param ([string]$inputFile, [string]$outputFile)

    $propertyGroupIn = "48385c54-cdfc-4e84-8117-c95b3cf8911c" # FAST internal property set
    $propertyNameIn = "docvector" # property name
    $dataTypeIn = 31 # VT_LPWSTR (Unicode string)

    $propertyGroupOut = "fa585f53-2679-48d9-976d-9ce62e7e19b7" # custom property set
    $propertyNameOut = "concepts" # property name
    $dataTypeOut = 31 # VT_LPWSTR (Unicode string)

    # Read the input file as UTF-8 and pick out the docvector crawled property
    # (-eq is case-insensitive for strings, so GUID casing does not matter)
    $xmldata = [xml](Get-Content $inputFile -Encoding UTF8)
    $node = $xmldata.Document.CrawledProperty | Where-Object { $_.propertySet -eq $propertyGroupIn -and $_.propertyName -eq $propertyNameIn -and $_.varType -eq $dataTypeIn }
    $data = $node.innerText

    # do your custom modification on $data here

    # Write the result file as UTF-8
    $resultXml = CreateXml $propertyGroupOut $propertyNameOut $dataTypeOut $data
    $resultXml.OuterXml | Out-File $outputFile -Encoding UTF8
}

# The <Run> command passes the input and output file paths as the first and second arguments
DoWork $args[0] $args[1]
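To try the script outside the pipeline, you need an input file shaped like the one the stage receives. The sketch below assumes the shape the script itself reads (a <Document> root with <CrawledProperty> children carrying propertySet, propertyName and varType attributes). The helper name New-SampleInput, the temp file names and the "sample value" text are made up for illustration; they are not real docvector data.

```powershell
# Hypothetical helper for running concept.ps1 by hand: writes an input file with
# the same shape the script reads (a <Document> root element containing a
# docvector <CrawledProperty>), encoded as UTF-8.
function New-SampleInput
{
    param ([string]$Path, [string]$Value)

    $xml = New-Object xml
    $doc = $xml.CreateElement("Document")

    $cp = $xml.CreateElement("CrawledProperty")
    $cp.SetAttribute("propertySet", "48385c54-cdfc-4e84-8117-c95b3cf8911c")
    $cp.SetAttribute("propertyName", "docvector")
    $cp.SetAttribute("varType", "31")
    $cp.InnerText = $Value

    $doc.AppendChild($cp) > $null
    $xml.AppendChild($doc) > $null

    # Same approach as concept.ps1: serialize with OuterXml and write as UTF-8
    $xml.OuterXml | Out-File $Path -Encoding UTF8
}

# Made-up file names in the temp folder and a placeholder value
$tempDir = [System.IO.Path]::GetTempPath()
$inFile  = Join-Path $tempDir "concept-in.xml"
$outFile = Join-Path $tempDir "concept-out.xml"
New-SampleInput -Path $inFile -Value "sample value"
```

With the sample file in place you can invoke the stage the way the <Run> command does, passing the input and output paths, for example `& C:\FASTSearch\pipelinemodules\concept.ps1 $inFile $outFile`, and then open `$outFile` to check that the concepts property was written.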

6 comments:

  1. Mikael, how did you figure out the property set GUID for docvector?

  2. I used a spy stage as per http://techmikael.blogspot.com/2011/01/how-to-spy-raw-data-and-available.html and examined the output.

  3. Hi Mikael, how would you handle it if more than one multi-valued crawled property needs to be processed?

    1. Do you mean how to merge several multi-valued crawled properties (cps) into one managed property (mp), or how to handle several separate mappings?

      If the former, concatenate all the values using the Unicode character 0x2029 (paragraph separator) as the separator, and make sure the managed property allows multiple values.

      If the latter, output several crawled properties and map each of them to its managed property counterpart.
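A minimal sketch of the merge described in this reply, assuming you already have the individual values from each crawled property as strings. Join-MultiValue is a hypothetical helper name (not part of any FAST API), and the client names are made-up sample data.

```powershell
# Hypothetical helper: merge the values of several crawled properties into one
# multi-valued string, using U+2029 (PARAGRAPH SEPARATOR) between the values.
function Join-MultiValue
{
    param ([string[]]$Values)

    $separator = [string][char]0x2029
    # Skip null or empty entries so the managed property gets no blank values
    $nonEmpty = @($Values | Where-Object { -not [string]::IsNullOrEmpty($_) })
    return ($nonEmpty -join $separator)
}

# Values taken from two crawled properties (made-up sample data)
$fromFirstCp  = @("Client A", "Client B")
$fromSecondCp = @("Client C", "")
$merged = Join-MultiValue ($fromFirstCp + $fromSecondCp)

# Splitting on the separator again gives back the three non-empty values
$merged.Split([char]0x2029)
```

The merged string would then be written as the value of the single output crawled property (for example as the $value argument to CreateXml in concept.ps1), mapped to a managed property that allows multiple values.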

  4. Hi,

    can you please let us know how to handle the same in SharePoint 2013, so that multi-valued properties are displayed with individual counts?


    For example, it is returning:

    Client A
    Client A; Client C
    Client B; Client A
    Client C
    I would like it to return:

    Client A
    Client B
    Client C

    1. Hi,
      My research so far suggests that you have to create a content enrichment stage (web service) where you split the value and assign it back as a multi-valued property (List).
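A minimal sketch of the splitting step, using the sample values from the question. It is written in PowerShell only to stay consistent with the post; a SharePoint 2013 content enrichment stage is a web service, so in practice this logic would live inside that service, which returns the values as a multi-valued (List) property. The ';' delimiter is an assumption based on the sample data, and Split-ClientList is a hypothetical helper name.

```powershell
# Sample values from the question above
$rows = @("Client A", "Client A; Client C", "Client B; Client A", "Client C")

# Hypothetical helper: split one stored value on ';',
# trim surrounding whitespace and drop empty entries
function Split-ClientList
{
    param ([string]$Value)
    @($Value -split ';' | ForEach-Object { $_.Trim() } | Where-Object { $_ -ne '' })
}

# Once every client is stored as a separate value, each one is counted on its own
$counts = $rows | ForEach-Object { Split-ClientList $_ } | Group-Object | Sort-Object Name
$counts | Select-Object Name, Count
```

With the values split this way, Client A is counted three times, Client B once and Client C twice, instead of each combined string being counted as a separate entry.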
