lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SOLR-810) changes for javabin format
Date Sat, 16 Mar 2013 18:46:13 GMT

     [ https://issues.apache.org/jira/browse/SOLR-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson resolved SOLR-810.
---------------------------------

    Resolution: Won't Fix

SPRING_CLEANING_2013: we can reopen if necessary.
                
> changes for javabin format
> --------------------------
>
>                 Key: SOLR-810
>                 URL: https://issues.apache.org/jira/browse/SOLR-810
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Noble Paul
>
> For storage purposes, javabin can be quite inefficient if we write one document at a
> time, because the field names may be written again for each document.
> javabin can be as efficient as a format like Thrift or Protocol Buffers if we do not
> pay the price of a string per name. We can easily achieve this with a new type, KNOWN_STRING.

> KNOWN_STRING can be like an EXTERN_STRING, except that the strings are preconfigured
> names held in a map of index -> string. The known-string list can probably carry
> a version. The client must be using a newer version of the known-string list than the server.
> An example looks like:
> {code}
> 1:responseHeader
> 2:QTime
> 3:status
> {code}
> A newer version of the string list can add a new string at a new index, but it must never
> change the index of an existing string. This is similar to an IDL file in Thrift or Protocol
> Buffers, but without any of those complexities.
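
The append-only rule above can be checked mechanically. The following is only a sketch under assumptions: the class and method names are hypothetical, the list is modelled as a plain `Map<Integer, String>`, and the fourth entry (`params`) is an invented example string, not something from the issue.

```java
import java.util.Map;

public class KnownStringVersions {
    /**
     * Returns true if {@code newer} keeps every index -> string mapping of
     * {@code older} unchanged. New entries at new indexes are allowed.
     */
    static boolean isCompatible(Map<Integer, String> older, Map<Integer, String> newer) {
        for (Map.Entry<Integer, String> e : older.entrySet()) {
            if (!e.getValue().equals(newer.get(e.getKey()))) {
                return false; // an existing index was removed or reassigned
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Integer, String> v1 = Map.of(1, "responseHeader", 2, "QTime", 3, "status");
        // "params" is a made-up addition for illustration.
        Map<Integer, String> v2 = Map.of(1, "responseHeader", 2, "QTime", 3, "status", 4, "params");
        // Swaps the indexes of QTime and status, which the rule forbids.
        Map<Integer, String> bad = Map.of(1, "responseHeader", 2, "status", 3, "QTime");

        System.out.println(isCompatible(v1, v2));  // true
        System.out.println(isCompatible(v1, bad)); // false
    }
}
```
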
> So when an EXTERN_STRING is written, the writer first looks it up in the KNOWN_STRING map.
> If it is present, it is written as a KNOWN_STRING instead of an EXTERN_STRING. The value
> written is the index.
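
A rough sketch of that write path, to make the saving concrete. Everything here is an assumption for illustration: the tag values, the 2-byte index and length fields, and the class name are hypothetical and do not match the real javabin wire format. Names that are not known are simply written in full, and anything else EXTERN_STRING does is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class KnownStringWriter {
    // Hypothetical tag values for this sketch; not the real javabin tags.
    static final int TAG_KNOWN_STRING = 1;
    static final int TAG_EXTERN_STRING = 2;

    private final Map<String, Integer> knownIndexByName;

    KnownStringWriter(Map<String, Integer> knownIndexByName) {
        this.knownIndexByName = knownIndexByName;
    }

    /** Writes a name as a KNOWN_STRING index if it is preconfigured, else as its bytes. */
    void writeName(String name, DataOutputStream out) throws IOException {
        Integer index = knownIndexByName.get(name);
        if (index != null) {
            out.writeByte(TAG_KNOWN_STRING);
            out.writeShort(index); // only the index goes on the wire
        } else {
            byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
            out.writeByte(TAG_EXTERN_STRING);
            out.writeShort(utf8.length);
            out.write(utf8);
        }
    }

    /** Convenience helper: encodes a single name into a byte array. */
    static byte[] encode(KnownStringWriter writer, String name) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            writer.writeName(name, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        KnownStringWriter writer =
            new KnownStringWriter(Map.of("responseHeader", 1, "QTime", 2, "status", 3));
        System.out.println(encode(writer, "QTime").length);   // 3: tag + 2-byte index
        System.out.println(encode(writer, "myField").length); // 10: tag + 2-byte length + 7 bytes
    }
}
```

With this layout a known name costs 3 bytes regardless of its length, which is where the per-document saving would come from.
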
> Another addition could be a zip string type. This is useful when javabin is used for
> storing data. In storage, the performance cost of serialization/deserialization may not be
> as important as the space itself. The type may also have a minimum size for compression:
> only large strings (say > 2KB?) may need to be compressed.
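
One way the size threshold could work, sketched with `java.util.zip`. The one-byte raw/deflated flag, the class name, and the exact 2048-byte cutoff are assumptions made for this example; the issue only suggests "say > 2KB?" and does not specify an encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ZipStringSketch {
    static final int MIN_COMPRESS_BYTES = 2048; // assumed cutoff, from "say > 2KB?"

    /** Returns a flag byte (0 = raw, 1 = deflated) followed by the payload. */
    static byte[] encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length <= MIN_COMPRESS_BYTES) {
            return withFlag((byte) 0, utf8); // small strings are stored as-is
        }
        Deflater deflater = new Deflater();
        deflater.setInput(utf8);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return withFlag((byte) 1, out.toByteArray());
    }

    /** Reverses encode(): reads the flag byte and inflates the payload if needed. */
    static String decode(byte[] data) {
        if (data[0] == 0) {
            return new String(data, 1, data.length - 1, StandardCharsets.UTF_8);
        }
        Inflater inflater = new Inflater();
        inflater.setInput(data, 1, data.length - 1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && inflater.needsInput()) {
                    throw new IllegalArgumentException("truncated compressed string");
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed string", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    private static byte[] withFlag(byte flag, byte[] payload) {
        byte[] result = new byte[payload.length + 1];
        result[0] = flag;
        System.arraycopy(payload, 0, result, 1, payload.length);
        return result;
    }

    public static void main(String[] args) {
        String small = "status";
        String large = "solr ".repeat(1000); // 5000 bytes, above the cutoff
        System.out.println(encode(small)[0]); // 0: stored raw
        System.out.println(encode(large)[0]); // 1: stored deflated
        System.out.println(decode(encode(large)).equals(large)); // true
    }
}
```
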


