lucene-dev mailing list archives

From Jan Høydahl (Commented) (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SOLR-1535) Pre-analyzed field type
Date Fri, 21 Oct 2011 15:56:32 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132768#comment-13132768 ]

Jan Høydahl commented on SOLR-1535:
-----------------------------------

I became aware of this during EuroCon. This is great stuff.
Have you thought about going with <buzzwordAlert>Avro</buzzwordAlert> for the serialization
format? It would make it easier to evolve the serialization format in new versions, and it would
be more compact, especially for binary data (which could be written as raw bytes instead of base64).
The Avro version of the document could also become a new binary serialization format replacing
JavaBin, so that clients other than SolrJ can benefit from binary streaming.
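To put a rough number on the compactness point: base64 turns every 3 bytes into 4 ASCII characters, so binary data such as payloads grows by about a third in a text format, whereas a binary format can write the raw bytes behind a short length prefix. The sketch below is plain Java, not Avro or Solr code; `PayloadSizeSketch`, `encodeBase64` and `encodeBinary` are hypothetical names, and the fixed 4-byte length prefix is a simplification (Avro's binary encoding writes `bytes` as a variable-length zig-zag encoded long followed by the data).

```java
import java.nio.ByteBuffer;
import java.util.Base64;

public class PayloadSizeSketch {
    // Text-style encoding: binary data must be base64-encoded (4 chars per 3 bytes, padded).
    static String encodeBase64(byte[] payload) {
        return Base64.getEncoder().encodeToString(payload);
    }

    // Binary-style encoding: a 4-byte big-endian length prefix followed by the raw bytes.
    // Simplification: Avro writes the length as a variable-length zig-zag encoded long instead.
    static byte[] encodeBinary(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length);
        buf.put(payload);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[300]; // hypothetical 300-byte payload
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        System.out.println("base64: " + encodeBase64(payload).length());
        System.out.println("binary: " + encodeBinary(payload).length);
    }
}
```

For the 300-byte payload this prints `base64: 400` and `binary: 304`.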
                
> Pre-analyzed field type
> -----------------------
>
>                 Key: SOLR-1535
>                 URL: https://issues.apache.org/jira/browse/SOLR-1535
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 1.5
>            Reporter: Andrzej Bialecki 
>             Fix For: 3.5, 4.0
>
>         Attachments: SOLR-1535.patch, preanalyzed.patch, preanalyzed.patch
>
>
> PreAnalyzedFieldType provides functionality to index (and optionally store) content that was
> already processed and split into tokens by some external processing chain. This implementation
> defines a serialization format for sending tokens with any currently supported Attributes
> (e.g. type, posIncr, payload, ...). This data is deserialized into a regular TokenStream that
> is returned by Field.tokenStreamValue() and thus added to the index as index terms, and
> optionally into a stored part that is returned by Field.stringValue() and then added as the
> stored value of the field.
> This field type is useful for integrating Solr with existing text-processing pipelines,
> such as third-party NLP systems.
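As an illustration of the round trip described above (tokens plus attributes serialized by an external chain, then turned back into a token sequence with positions), here is a minimal plain-Java sketch. The tab-separated, one-token-per-line format, the `Tok` class and the method names are hypothetical; this is not the format defined in the attached patches, and the code does not use Lucene's TokenStream API. It only mirrors the idea that absolute positions are the running sum of position increments.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

public class PreAnalyzedSketch {
    // One pre-analyzed token carrying a few of the attributes mentioned in the issue.
    static final class Tok {
        final String term;
        final String type;
        final int posIncr;
        final byte[] payload; // may be empty

        Tok(String term, String type, int posIncr, byte[] payload) {
            this.term = term;
            this.type = type;
            this.posIncr = posIncr;
            this.payload = payload;
        }
    }

    // This sketch does no escaping, so separators are rejected inside text fields.
    private static void checkField(String s) {
        if (s.indexOf('\t') >= 0 || s.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("tab/newline not supported in this sketch: " + s);
        }
    }

    // Hypothetical format: one token per line, tab-separated fields:
    // term, type, position increment, base64-encoded payload.
    static String serialize(List<Tok> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : tokens) {
            checkField(t.term);
            checkField(t.type);
            sb.append(t.term).append('\t')
              .append(t.type).append('\t')
              .append(t.posIncr).append('\t')
              .append(Base64.getEncoder().encodeToString(t.payload))
              .append('\n');
        }
        return sb.toString();
    }

    static List<Tok> deserialize(String data) {
        List<Tok> tokens = new ArrayList<>();
        for (String line : data.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            String[] f = line.split("\t", -1); // -1 keeps a trailing empty payload field
            tokens.add(new Tok(f[0], f[1], Integer.parseInt(f[2]), Base64.getDecoder().decode(f[3])));
        }
        return tokens;
    }

    // Absolute positions as a running sum of increments; the first token with posIncr 1 is at 0.
    static int[] positions(List<Tok> tokens) {
        int[] pos = new int[tokens.size()];
        int p = -1;
        for (int i = 0; i < tokens.size(); i++) {
            p += tokens.get(i).posIncr;
            pos[i] = p;
        }
        return pos;
    }

    public static void main(String[] args) {
        List<Tok> in = new ArrayList<>();
        in.add(new Tok("quick", "word", 1, new byte[0]));
        in.add(new Tok("fast", "SYNONYM", 0, new byte[0])); // stacked on "quick"
        in.add(new Tok("fox", "word", 1, "NN".getBytes(StandardCharsets.UTF_8)));
        String wire = serialize(in);
        System.out.print(wire);
        List<Tok> out = deserialize(wire);
        int[] pos = positions(out);
        for (int i = 0; i < out.size(); i++) {
            System.out.println(out.get(i).term + " @ " + pos[i]);
        }
    }
}
```

Running `main` prints the three serialized lines followed by `quick @ 0`, `fast @ 0` and `fox @ 1`: the synonym with a position increment of 0 lands on the same position as the token before it.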

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

