lucene-solr-commits mailing list archives

From Apache Wiki <>
Subject [Solr Wiki] Update of "PreAnalyzedField" by AndrzejBialecki
Date Fri, 11 May 2012 19:16:34 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "PreAnalyzedField" page has been changed by AndrzejBialecki:

New page:
= Using PreAnalyzedField type for integration with external document processing pipelines =

''This field type is available since Solr 4.0.''

The PreAnalyzedField type provides a way to send serialized token streams to Solr, optionally
with independent stored values of a field, and to have this information stored and indexed without
any additional text processing applied in Solr. This is useful if a user wants to submit field
content that has already been processed by an existing external text processing pipeline (e.g.
tokenized, annotated, stemmed, expanded with synonyms), while still using all the rich per-token
attributes that Lucene's TokenStream provides.

== Pluggable serialization ==
The serialization format is pluggable through implementations of the PreAnalyzedParser interface.
Two implementations are provided out of the box:

 * JsonPreAnalyzedParser - as the name suggests, it parses content that uses JSON to represent
the field's content. This is the default parser if the field type is not configured otherwise.
 * SimplePreAnalyzedParser - uses a simple, strict plain-text format, which in some situations
may be easier to create than JSON.

== Configuration options ==
There is only one configuration parameter, `parserImpl`. Its value should be the fully qualified
name of a class that implements the PreAnalyzedParser interface. The default value is
`org.apache.solr.schema.JsonPreAnalyzedParser`.
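
As a minimal sketch, a schema.xml declaration using this parameter might look like the
fragment below. The type name `preanalyzed` and the field name `pre_body` are hypothetical,
chosen only for illustration. The `solr.PreAnalyzedField` shorthand assumes the class lives in
the `org.apache.solr.schema` package, like the default parser named above.

```xml
<!-- Hypothetical type name; parserImpl is shown explicitly although this value is the default. -->
<fieldType name="preanalyzed" class="solr.PreAnalyzedField"
           parserImpl="org.apache.solr.schema.JsonPreAnalyzedParser"/>

<!-- Hypothetical field that accepts serialized token streams from an external pipeline. -->
<field name="pre_body" type="preanalyzed" indexed="true" stored="true"/>
```

Omitting `parserImpl` should give the same result, since JsonPreAnalyzedParser is the default.
To use a different parser, replace the attribute value with the fully qualified name of another
PreAnalyzedParser implementation.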
