lucene-dev mailing list archives

From "Chuck Williams (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-362) [PATCH] Extension to binary Fields that allows fixed byte buffer
Date Wed, 14 Dec 2005 16:49:46 GMT
     [ http://issues.apache.org/jira/browse/LUCENE-362?page=all ]

Chuck Williams updated LUCENE-362:
----------------------------------

    Attachment: FixedBufferBinaryFields.patch

(Thanks Eric for correcting my mistaken posting to the old issue tracking system)

Better late than never, I hope.  FixedBufferBinaryFields.patch is revised to
apply against the latest source and now includes a test case (extension of 
TestBinaryDocument).  This is my last current local patch to Lucene, so it
would be great if it gets committed.  The value again is to eliminate copying
of large binary values to be stored in the Lucene index.  For a compressed
document, for example, if the documents are read and compressed externally in a
fixed buffer and the compressed buffer is passed in, all copying can be
eliminated.

Chuck


> [PATCH] Extension to binary Fields that allows fixed byte buffer
> ----------------------------------------------------------------
>
>          Key: LUCENE-362
>          URL: http://issues.apache.org/jira/browse/LUCENE-362
>      Project: Lucene - Java
>         Type: Bug
>   Components: Index
>     Versions: CVS Nightly - Specify date in submission
>  Environment: Operating System: All
> Platform: All
>     Reporter: Chuck Williams
>     Assignee: Lucene Developers
>  Attachments: Field-extension.patch, Field-extension.patch, FieldsWriter-extension.patch,
>               FixedBufferBinaryFields.patch
>
> This is a very simple patch that supports storing binary values in the index
> more efficiently.  A new Field constructor accepts a length argument, allowing a
> fixed byte[] to be reused across multiple calls with arguments of different
> sizes.  A companion change to FieldsWriter uses this length when storing and/or
> compressing the field.
> There is one remaining case in Document.  Intentionally, no direct accessor to
> the length of a binary field is provided from Document, only from Field.  This
> is because Fields created by FieldReader will never have a specified length and
> this is the usual case for Fields read from Document.  It seems less confusing for
> most users.
> I don't believe any upward incompatibility is introduced here (e.g., from the
> possibility of getting a larger byte[] than actually holds the value from
> Document), since no such byte[] values are possible without this patch anyway.
> The compression case is still inefficient (much copying), but it is hard to see
> how Lucene can do much better.  However, the application can do the
> compression externally and pass in the reused compression-output buffer as a
> binary value (which is what I'm doing).  This represents a substantial
> allocation savings for storing large document bodies (compressed) into the
> Lucene index.
> Two patch files are attached, both created by svn on 3/17/05.
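
The buffer-plus-length idea described above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not code from the patch:
the class FixedBufferSketch and the methods compressInto and decompress are
hypothetical names, and the sketch uses java.util.zip rather than any Lucene
API.  It compresses documents of different sizes into one reused byte[] and
returns the count of valid bytes, which is the value an application would
pass along with the buffer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only; none of these names come from the patch.
final class FixedBufferSketch {

    // Compresses input[0..inputLength) into out and returns the number of
    // valid bytes written. The caller owns out and reuses it across calls.
    static int compressInto(byte[] input, int inputLength, byte[] out) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input, 0, inputLength);
            deflater.finish();
            int total = 0;
            while (!deflater.finished()) {
                if (total == out.length) {
                    throw new IllegalStateException("fixed buffer too small");
                }
                total += deflater.deflate(out, total, out.length - total);
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    // Inflates only the first length bytes of buf; bytes past length may be
    // stale data left over from an earlier, larger value.
    static byte[] decompress(byte[] buf, int length, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(buf, 0, length);
            byte[] result = new byte[originalLength];
            int total = 0;
            while (total < originalLength && !inflater.finished()) {
                int n = inflater.inflate(result, total, originalLength - total);
                if (n == 0 && inflater.needsInput()) {
                    throw new IllegalStateException("truncated compressed data");
                }
                total += n;
            }
            return result;
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] shared = new byte[4096]; // allocated once, reused for every document
        String[] bodies = { "a long body ".repeat(100), "short body" };
        for (String body : bodies) {
            byte[] raw = body.getBytes(StandardCharsets.UTF_8);
            int length = compressInto(raw, raw.length, shared);
            byte[] back = decompress(shared, length, raw.length);
            System.out.println(length + " valid bytes, round trip ok: "
                    + Arrays.equals(raw, back));
        }
    }
}
```

The point of carrying the length separately is that the second, shorter value
leaves stale bytes from the first one in the tail of the shared buffer, so a
consumer must never treat the whole array as the value.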

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

