lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 34066] New: - [PATCH] Extension to binary Fields that allows fixed byte buffer
Date Fri, 18 Mar 2005 02:29:12 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=34066>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=34066

           Summary: [PATCH] Extension to binary Fields that allows fixed
                    byte buffer
           Product: Lucene
           Version: CVS Nightly - Specify date in submission
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Index
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: chuck@manawiz.com


This is a very simple patch that supports storing binary values in the index
more efficiently.  A new Field constructor accepts a length argument, allowing a
fixed byte[] to be reused across multiple calls with values of different
sizes.  A companion change to FieldsWriter uses this length when storing and/or
compressing the field.
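To make the proposal concrete, here is a minimal, self-contained sketch of the
idea that does not depend on Lucene. `SketchBinaryField` and `SketchFieldsWriter`
are hypothetical stand-ins for Field and FieldsWriter; the three-argument
constructor only assumes the shape the patch describes (the report does not give
its actual signature), and the length-prefixed output is a simplified format,
not Lucene's on-disk layout.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a Field holding a binary value (not Lucene's class).
final class SketchBinaryField {
    private final String name;
    private final byte[] value; // may be larger than the data it holds
    private final int length;   // number of valid bytes at the start of value

    // Original style: the whole array is the value.
    SketchBinaryField(String name, byte[] value) {
        this(name, value, value.length);
    }

    // Patch style (assumed signature): only the first `length` bytes are the value.
    SketchBinaryField(String name, byte[] value, int length) {
        if (length < 0 || length > value.length) {
            throw new IllegalArgumentException("length out of range: " + length);
        }
        this.name = name;
        this.value = value;
        this.length = length;
    }

    String name() { return name; }
    byte[] binaryValue() { return value; }
    int binaryLength() { return length; }
}

// Hypothetical writer mirroring the FieldsWriter change: honors the length.
final class SketchFieldsWriter {
    static void writeBinary(DataOutputStream out, SketchBinaryField f) throws IOException {
        out.writeInt(f.binaryLength());                    // simplified length prefix
        out.write(f.binaryValue(), 0, f.binaryLength());   // only the valid bytes
    }
}

public class ReusedBufferDemo {
    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[1024]; // allocated once, reused for every document
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(sink);
        for (String body : new String[] {"short", "a somewhat longer body"}) {
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            System.arraycopy(bytes, 0, buffer, 0, bytes.length);
            SketchFieldsWriter.writeBinary(out, new SketchBinaryField("body", buffer, bytes.length));
        }
        out.flush();
        System.out.println(sink.size()); // (4 + 5) + (4 + 22) = 35
    }
}
```

The buffer is never trimmed or copied per document; each field simply records
how many of its bytes are meaningful, and the writer stops there.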

One case remains in Document.  Intentionally, Document provides no direct
accessor to the length of a binary field; only Field does.  This is because
Fields created by FieldsReader never have a specified length, and those are the
usual case for Fields obtained from Document.  Exposing the length only on Field
seems less confusing for most users.

I don't believe this introduces any upward incompatibility (e.g., from Document
returning a byte[] larger than the value it actually holds), since no such
byte[] values can occur without this patch anyway.

The compression case is still inefficient (much copying), but it is hard to see
how Lucene could do much better.  However, the application can do the
compression externally and pass in the reused compression-output buffer as a
binary value (which is what I'm doing).  This yields a substantial allocation
saving when storing large (compressed) document bodies in the Lucene index.
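The external-compression approach might look roughly like the sketch below,
which uses the JDK's java.util.zip.Deflater with one output buffer reused across
documents.  `ReusableCompressor` is a hypothetical helper, not part of Lucene;
the buffer and returned length are what an application would hand to the
length-taking Field constructor the patch proposes (not called here, since the
report does not give its exact signature).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper (not Lucene): compresses into one reusable output buffer.
final class ReusableCompressor {
    private final Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    private byte[] out = new byte[4096]; // reused across calls; grown only when too small

    // Compresses input[0, len) and returns how many bytes of buffer() are valid.
    int compress(byte[] input, int len) {
        deflater.reset();
        deflater.setInput(input, 0, len);
        deflater.finish();
        int n = 0;
        while (!deflater.finished()) {
            if (n == out.length) {
                out = Arrays.copyOf(out, out.length * 2); // rare reallocation
            }
            n += deflater.deflate(out, n, out.length - n);
        }
        return n;
    }

    // May return a different array after a compress() call that had to grow it.
    byte[] buffer() {
        return out;
    }

    void close() {
        deflater.end();
    }
}

public class CompressDemo {
    public static void main(String[] args) throws DataFormatException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("lorem ipsum ");
        }
        byte[] body = sb.toString().getBytes(StandardCharsets.UTF_8);

        ReusableCompressor compressor = new ReusableCompressor();
        int n = compressor.compress(body, body.length);
        // compressor.buffer() plus n would be passed as the binary value and its length.

        // Round trip with Inflater: the first n bytes form a complete deflate stream.
        Inflater inflater = new Inflater();
        inflater.setInput(compressor.buffer(), 0, n);
        byte[] restored = new byte[body.length];
        int m = inflater.inflate(restored);
        inflater.end();
        compressor.close();

        System.out.println(n < body.length && m == body.length && Arrays.equals(body, restored)); // true
    }
}
```

Because the buffer can be reallocated when a value does not fit, a caller should
re-read buffer() after each compress() rather than caching the array reference.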

Two patch files are attached, both created by svn on 3/17/05.

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

