lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 34066] - [PATCH] Extension to binary Fields that allows fixed byte buffer
Date Wed, 23 Mar 2005 01:45:50 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=34066>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=34066


------- Additional Comments From chuck@manawiz.com  2005-03-23 02:45 -------
I don't see how the first change (initializing fieldsDataLength to -1 and
testing it in binaryLength()) provides any additional safety.  The patch always
initializes fieldsDataLength in Field() for any binary value, so unless that was
changed the value would never be -1 in binaryLength().  The main risk in this
area of the code is that somebody calls binaryValue() without calling
binaryLength() to get the length, which seems impossible to address.  Another
risk is that somebody calls binaryLength() on a non-binary field -- the current
patch returns null in that case, consistent with binaryValue(), while the
changed version would get an NPE.  I'm probably missing some other case that you
see.
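
For concreteness, here is roughly the shape I have in mind; the member and method
names follow the discussion above, but the bodies are my paraphrase, not the
patch's actual code:

    // Paraphrase only -- my reading of the patch, not its exact code.
    class FieldSketch {
        private Object fieldsData;        // byte[] when the field is binary, String otherwise
        private int fieldsDataLength;     // set by every binary constructor

        FieldSketch(byte[] value, int length) {
            this.fieldsData = value;
            this.fieldsDataLength = length;  // always set here, so a -1 default never reaches binaryLength()
        }

        byte[] binaryValue() {
            return (fieldsData instanceof byte[]) ? (byte[]) fieldsData : null;
        }

        // Current patch, as I read it: null for a non-binary field, same as binaryValue().
        // Testing a -1 sentinel instead would only ever matter on that same non-binary path.
        Integer binaryLength() {
            return (fieldsData instanceof byte[]) ? Integer.valueOf(fieldsDataLength) : null;
        }
    }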

The second change (validating the length passed to Field()) seems an
improvement.  But the biggest risk with this patch is the one I outlined in the
Javadoc, i.e. that somebody passes the same byte array in two different calls to
Field before the use in the first call is consumed (e.g., 2 fields in the same
Document, or the same field in 2 Documents before either is indexed).  I don't
see a good way to protect against that (without a performance hit).
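
To make that failure mode concrete, here is a plain-Java sketch of the aliasing
problem; I've left the Lucene classes out on purpose, since the hazard is just
ordinary byte[] sharing (the names are mine):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Sketch of the shared-buffer hazard described in the Javadoc.
    public class SharedBufferHazard {
        public static void main(String[] args) {
            byte[] shared = new byte[32];                    // stands in for the fixed buffer handed to Field()
            List<byte[]> pending = new ArrayList<byte[]>();  // stands in for not-yet-indexed fields

            fill(shared, "first body");
            pending.add(shared);                             // "field one" keeps a reference, not a copy

            fill(shared, "second body");                     // refilled before field one was consumed
            pending.add(shared);

            // Both entries now read "second body"; the first body is silently gone.
            for (byte[] buf : pending) {
                System.out.println(new String(buf).trim());
            }
        }

        private static void fill(byte[] buf, String body) {
            Arrays.fill(buf, (byte) ' ');
            byte[] src = body.getBytes();
            System.arraycopy(src, 0, buf, 0, src.length);
        }
    }

Any real protection would mean copying the array in Field(), which is exactly the
allocation the patch is trying to avoid.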

The patch is a bit risky due to this last consideration, but the performance
gain is substantial in the particular case where it is necessary to store large
document bodies in Lucene.  I'm indexing with this now and doing zero
allocations by using nio ByteBuffer and CharBuffer views on a fixed byte array
that holds the successive document bodies.  It rips.  I'm doing the compression
outside of Lucene (with the same zlib).  Together, the outside compression and the
patch reduce the 5 allocations per document body to 0, compared to passing the
bodies as text and letting Lucene do the compression (the outside compression is
the biggest part of that).
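
Roughly, the per-document step looks like the sketch below; the buffer sizes, class
names, and the UTF-8 choice are mine and not part of the patch, but it shows where
the allocations disappear:

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetEncoder;
    import java.util.zip.Deflater;

    // Rough shape of my setup: encode and deflate each body into one fixed array.
    public class ZeroAllocBodyPrep {
        private final ByteBuffer encoded = ByteBuffer.allocate(2 * 1024 * 1024);
        private final byte[] fixed = new byte[2 * 1024 * 1024];   // the one array handed to Field()
        private final CharsetEncoder utf8 = Charset.forName("UTF-8").newEncoder();
        private final Deflater deflater = new Deflater();          // same zlib Lucene would use

        /** Encodes and deflates one body into the fixed array; returns the compressed length. */
        public int prepare(CharBuffer body) {
            encoded.clear();
            utf8.reset();
            utf8.encode(body, encoded, true);        // assumes the encoded body fits the buffer
            encoded.flip();

            deflater.reset();
            deflater.setInput(encoded.array(), 0, encoded.limit());
            deflater.finish();
            return deflater.deflate(fixed);          // assumes the compressed body fits in one call
            // 'fixed' plus the returned length is what the patched byte-buffer Field lets me
            // hand over directly, so nothing per-document is allocated.
        }
    }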

I don't object to either change and would be happy to see them if it means this
gets committed so I can eliminate my local patch!

Chuck


-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
