lucene-solr-user mailing list archives

From Alvaro Cabrerizo <topor...@gmail.com>
Subject Re: Indexing a token to a different field in a custom filter
Date Tue, 12 Nov 2013 08:20:20 GMT
Hi,

Maybe the synonym filter
<http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory>
is the model to look at. You can start by creating a new field type in
your schema that is Stanbol-enhanced. To follow the parallel, in the
case of synonyms we could have this schema:

...
<fieldType name="synonymtext" class="solr.TextField"
positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true" />
  </analyzer>
</fieldType>
...
<field name="id" type="string" indexed="true" stored="true" required="true"
/>
<field name="description" type="synonymtext" indexed="true" stored="true"
multiValued="true" />
...

In the case of stanbol:

...
<fieldType name="stanboltext" class="solr.TextField"
positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!-- add your Stanbol filter parameters as attributes here -->
    <filter class="StanbolFilterFactory" />
  </analyzer>
</fieldType>
...
<field name="id" type="string" indexed="true" stored="true" required="true"
/>
<field name="description" type="stanboltext" indexed="true" stored="true"
multiValued="true" />
...

Thus the StanbolFilterFactory is in charge of connecting to Stanbol and
enhancing the data coming from WhitespaceTokenizerFactory, creating an
output that can be used by other filters.

How do you index your data, then?

Just send your doc:

id:your id
description:the data to be enhanced
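
As a rough sketch, the document above could be posted to Solr's update
handler in the standard XML update format. The field values are the
placeholders from this message, and the handler path (/update) is an
assumption about your setup:

```xml
<!-- Body of a POST to the /update handler (assumed path) -->
<add>
  <doc>
    <field name="id">your id</field>
    <field name="description">the data to be enhanced</field>
  </doc>
</add>
```

A commit (for example a separate <commit/> message) is still needed
before the document becomes searchable.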


Another path you can follow is to imitate the behaviour of CopyField
<http://wiki.apache.org/solr/SchemaXml#Copy_Fields> in a more
sophisticated fashion, i.e. copy, enhance and put in a new field.
Then you can have the following schema:

...
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
  </analyzer>
</fieldType>
...
<field name="id" type="string" indexed="true" stored="true" required="true"
/>
<field name="description" type="text" indexed="true" stored="true"
multiValued="true" />
<field name="enhancedDescription" type="text" indexed="true" stored="true"
multiValued="true" />
<copyEnhanceField source="description" dest="enhancedDescription" />

The copyEnhanceField is now in charge of taking the original field,
sending it to Stanbol, getting the response and writing it to the new
field.
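
Note that copyEnhanceField is not an element Solr ships with. One way
this behaviour might be wired in is as a custom update request processor
in solrconfig.xml. The following is only a sketch: the chain name, the
class com.example.StanbolEnhanceProcessorFactory and its source/dest
parameters are all hypothetical and would have to be written by you;
only the chain mechanism and the Log/Run processors are standard Solr.

```xml
<updateRequestProcessorChain name="stanbol-enhance">
  <!-- hypothetical custom processor: reads "source", calls Stanbol,
       and writes the result to "dest" -->
  <processor class="com.example.StanbolEnhanceProcessorFactory">
    <str name="source">description</str>
    <str name="dest">enhancedDescription</str>
  </processor>
  <!-- standard processors that log and then apply the update -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The chain would then be selected per request with the update.chain
parameter, or configured as a default on the update handler.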

How do you index your data then?

Just send your doc:

id:your id
description:the original data

And you will get in Solr:

id:your id
description:the original data
enhancedDescription:the enhanced data


Regards
