To the rescue comes PowerShell. Since all FAST servers have PowerShell installed, you can write the stage as a PowerShell script and use that instead. All you need is Notepad.
See my post “How To: Debug and log FAST Search pipeline extensibility stages in Visual Studio” on how to do this in C#, and also how to create your own property set with PowerShell commands.
In order to register the script in pipelineextensibility.xml we prepend the ps1 file with the PowerShell runtime. Make sure to keep the <PipelineExtensibility>
<Run command="C:\Windows\System32\WindowsPowerShell\v1.0\PowerShell.exe C:\FASTSearch\pipelinemodules\concept.ps1 %(input)s %(output)s">
  <Input>
    <CrawledProperty propertySet="48385C54-CDFC-4E84-8117-C95B3CF8911C" varType="31" propertyName="docvector"/>
  </Input>
  <Output>
    <CrawledProperty propertySet="fa585f53-2679-48d9-976d-9ce62e7e19b7" varType="31" propertyName="concepts"/>
  </Output>
</Run>
I’m not a PowerShell expert, so there may be better ways of doing some parts of this script. Note that I read and write the XML files with UTF-8 encoding. This is crucial, as it is what the pipeline works with.
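As a quick check of the UTF-8 handling, a minimal sketch like the one below writes a small XML file with non-ASCII characters and reads it back using the same `-Encoding UTF8` switches. The temp file name and the sample word are made up for illustration and are not part of the pipeline.

```powershell
# Hypothetical temp file, used only for this check.
$path = Join-Path ([System.IO.Path]::GetTempPath()) "utf8-check.xml"

# Build a word with non-ASCII letters from code points, so the result
# does not depend on how this script file itself is saved.
$word = "Bl" + [char]0x00E5 + "b" + [char]0x00E6 + "r"
$text = "<Document><CrawledProperty>$word</CrawledProperty></Document>"

# Write and read with explicit UTF-8, as the script does.
$text | Out-File $path -Encoding UTF8
$readBack = [xml](Get-Content $path -Encoding UTF8)

# The value read back should match the original word.
$readBack.Document.CrawledProperty -eq $word
```

If the last line prints False, the encoding was lost somewhere between the write and the read.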
concept.ps1
# Builds the output document: a <Document> element holding one <CrawledProperty>.
function CreateXml()
{
    param ([string]$set, [string]$name, [int]$type, $value)

    $resultXml = New-Object xml
    $doc = $resultXml.CreateElement("Document")
    $crawledProperty = $resultXml.CreateElement("CrawledProperty")

    $propSet = $resultXml.CreateAttribute("propertySet")
    $propSet.innerText = $set
    $propName = $resultXml.CreateAttribute("propertyName")
    $propName.innerText = $name
    $varType = $resultXml.CreateAttribute("varType")
    $varType.innerText = $type

    $crawledProperty.Attributes.Append($propSet) > $null
    $crawledProperty.Attributes.Append($propName) > $null
    $crawledProperty.Attributes.Append($varType) > $null
    $crawledProperty.innerText = $value

    $doc.AppendChild($crawledProperty) > $null
    $resultXml.AppendChild($doc) > $null

    # Put the XML declaration in front of the root element
    $xmlDecl = $resultXml.CreateXmlDeclaration("1.0", "UTF-8", "")
    $el = $resultXml.psbase.DocumentElement
    $resultXml.InsertBefore($xmlDecl, $el) > $null

    return $resultXml
}
# Reads the input file, picks out the docvector property and writes it to the output file.
function DoWork()
{
    param ([string]$inputFile, [string]$outputFile)

    $propertyGroupIn = "48385c54-cdfc-4e84-8117-c95b3cf8911c" # FAST internal group
    $propertyNameIn = "docvector" # property name
    $dataTypeIn = 31 # string (VT_LPWSTR)
    $propertyGroupOut = "fa585f53-2679-48d9-976d-9ce62e7e19b7" # Custom group
    $propertyNameOut = "concepts" # property name
    $dataTypeOut = 31 # string (VT_LPWSTR)

    $xmldata = [xml](Get-Content $inputFile -Encoding UTF8)
    $node = $xmldata.Document.CrawledProperty | Where-Object {
        $_.propertySet -eq $propertyGroupIn -and
        $_.propertyName -eq $propertyNameIn -and
        $_.varType -eq $dataTypeIn
    }
    $data = $node.innerText

    # do your custom modification on $data here

    $resultXml = CreateXml $propertyGroupOut $propertyNameOut $dataTypeOut $data
    $resultXml.OuterXml | Out-File $outputFile -Encoding UTF8
}

# pass input and output file paths as arguments
DoWork $args[0] $args[1]
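
To try the script outside the pipeline, something along these lines should work as a rough sketch. It writes a sample input file in the shape the script reads (a Document element with CrawledProperty children) and then calls concept.ps1 the same way the Run command does. The temp file names and the docvector value are placeholders I made up. The real docvector content is whatever the pipeline emits.

```powershell
# Hypothetical temp files standing in for %(input)s and %(output)s.
$tempDir    = [System.IO.Path]::GetTempPath()
$inputFile  = Join-Path $tempDir "concept-input.xml"
$outputFile = Join-Path $tempDir "concept-output.xml"

# Sample input; the property value is a placeholder, not real docvector data.
$sample = @'
<?xml version="1.0" encoding="UTF-8"?>
<Document>
  <CrawledProperty propertySet="48385c54-cdfc-4e84-8117-c95b3cf8911c" propertyName="docvector" varType="31">sample docvector value</CrawledProperty>
</Document>
'@
$sample | Out-File $inputFile -Encoding UTF8

# Assumes the script is at the path registered in pipelineextensibility.xml.
$scriptPath = "C:\FASTSearch\pipelinemodules\concept.ps1"
if (Test-Path $scriptPath) {
    & $scriptPath $inputFile $outputFile
    Get-Content $outputFile -Encoding UTF8
} else {
    Write-Warning "concept.ps1 not found at $scriptPath"
}
```

If everything is wired up, the output file should contain a single CrawledProperty named concepts carrying the same value.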



Mikael, how did you figure out the property set GUID for docvector?
I used a spy stage as per http://techmikael.blogspot.com/2011/01/how-to-spy-raw-data-and-available.html and examined the output.
Hi Mikael, how would you handle it if more than one multi-valued crawled property needs to be processed?
Do you mean how to merge several multi-valued crawled properties (CPs) into one managed property (MP), or several mappings?
If the former, you would concatenate all values using the Unicode character U+2029 (paragraph separator) as the separator, and make sure the MP allows multiple values.
If the latter, you would output several CPs and map each one to its MP counterpart.
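
For the first case, joining the values with the 0x2029 separator could look roughly like this in PowerShell. The value list is invented for illustration. In the script, the merged string would be what you pass on as the output property value.

```powershell
# Hypothetical values collected from several crawled properties.
$values = @("concept one", "concept two", "concept three")

# U+2029 (paragraph separator) as the delimiter between the values.
$separator = [string][char]0x2029
$merged = $values -join $separator

# Splitting on the same character gives the individual values back.
$roundTrip = $merged.Split([char]0x2029)
$roundTrip.Count
```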