lucene-dev mailing list archives

From "Eks Dev (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1219) support array/offset/ length setters for Field with binary data
Date Tue, 05 Aug 2008 19:54:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620019#action_12620019
] 

Eks Dev commented on LUCENE-1219:
---------------------------------

Great, Mike,
it gets better and better; I saw LUCENE-1340 committed. Thanks to you, Grant, Doug and all
the others who voted for 1349, this happened so quickly. Trust me, these two issues are really
making my life easier. I pushed the decision to add new hardware to some future point (meaning we
save the customer's money now)... a few weeks later would have been too late.

Now all that remains is one nice patch that lets us pass our own byte[] when retrieving
stored fields during search. I was thinking along the lines of what you did in the Analyzers.

We could pull the same trick here, e.g.:

Field Document.getBinaryValue(String FIELD_NAME, Field destination);

Field already has all the access methods (get/set).

The contract would be: if destination == null, a new one is created and returned; if not,
we use the one passed in and return the same object. The method should check whether the byte[] is big
enough; if not, a simple growth policy can be applied. This way we avoid allocating a new byte[] each time
a stored field is fetched.
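
A minimal sketch of the contract described above. It assumes hypothetical stand-in classes (BinaryHolder, StoredDocSketch) rather than Lucene's real Field and Document, and a 1.5x growth policy that this message does not specify:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a binary Field: a reusable byte[] plus the number of valid bytes. */
final class BinaryHolder {
    byte[] bytes = new byte[0];
    int length = 0;

    /** Grow the backing array if it is too small (assumed policy: at least 1.5x). */
    void ensureCapacity(int needed) {
        if (bytes.length < needed) {
            int grown = Math.max(needed, bytes.length + (bytes.length >> 1));
            // Old content is not copied over; the caller overwrites it right away.
            bytes = new byte[grown];
        }
    }
}

/** Hypothetical stand-in for a stored document; this is not Lucene's Document class. */
final class StoredDocSketch {
    private final Map<String, byte[]> storedFields = new HashMap<>();

    void put(String fieldName, byte[] value) {
        storedFields.put(fieldName, value);
    }

    /**
     * If destination is null a new holder is created and returned;
     * otherwise the given holder is filled and the same object is returned.
     */
    BinaryHolder getBinaryValue(String fieldName, BinaryHolder destination) {
        byte[] stored = storedFields.get(fieldName);
        if (stored == null) {
            throw new IllegalArgumentException("no stored field: " + fieldName);
        }
        BinaryHolder target = (destination == null) ? new BinaryHolder() : destination;
        target.ensureCapacity(stored.length);
        System.arraycopy(stored, 0, target.bytes, 0, stored.length);
        target.length = stored.length;
        return target;
    }
}
```

With one holder reused across all hits of a search, the backing array is only reallocated when a longer value than any seen so far shows up. Callers must read only the first `length` bytes, since the array may be larger than the current value.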

I have not looked closely at the code recently, but the last time I looked into it, it seemed quite
simple to do something along these lines. Do you have any ideas on how we could do it better?

Just a simple calculation for my case:
the average hit count is around 200, and for each hit we have to fetch one stored field on which we
do some post-processing, re-scoring and whatnot. Currently we run at most 30 requests/second. With
an average document length of 2K you land at 200 * 30 = 6000 object allocations per second,
totaling 2K * 6000 = 12 MB per second ... only to get the data. I can imagine people with much longer documents
(that would be the typical Lucene use case) where it gets worse. This would simply reduce GC pressure
with a really small amount of work. I am sure it would have nice effects on some other use
cases in Lucene.
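
The arithmetic above can be checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and "2K" is taken to mean 2000 bytes (with 2048 bytes the total would be slightly higher):

```java
/** Hypothetical helper that reproduces the back-of-the-envelope numbers from this message. */
final class AllocationEstimate {
    /** One stored-field fetch (one byte[] allocation) per hit, for every query. */
    static long allocationsPerSecond(int hitsPerQuery, int queriesPerSecond) {
        return (long) hitsPerQuery * queriesPerSecond;
    }

    /** Bytes allocated per second for the fetched stored fields alone. */
    static long bytesPerSecond(long allocationsPerSecond, int avgDocBytes) {
        return allocationsPerSecond * avgDocBytes;
    }

    public static void main(String[] args) {
        long allocs = allocationsPerSecond(200, 30);   // 200 hits, 30 requests/second
        long bytes = bytesPerSecond(allocs, 2000);     // assumed: 2K == 2000 bytes
        System.out.println(allocs + " allocations/s, " + bytes + " bytes/s");
    }
}
```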

Thanks again to all the "workers" behind this great piece of software...
eks

PS: I need to find some time to peek at Paul's work in LUCENE-1345 and my wish list will
be complete, at least for now (at least until you get your magic with the flexi index format done
:)
 

> support array/offset/ length setters for Field with binary data
> ---------------------------------------------------------------
>
>                 Key: LUCENE-1219
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1219
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Eks Dev
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1219.patch, LUCENE-1219.patch, LUCENE-1219.patch, LUCENE-1219.patch,
>                      LUCENE-1219.take2.patch, LUCENE-1219.take3.patch
>
>
> Currently the Field/Fieldable interface supports only compact, zero-based byte arrays. This
> forces end users to create new objects and copy content into them before passing them to Lucene, as
> such fields are often of variable size. Depending on the use case, avoiding this copy can bring
> a far from negligible performance improvement.
> This approach extends the Fieldable interface with 3 new methods:
> getOffset(), getLength() and getBinaryValue() (the last only returns a reference to the array).
>    
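
To illustrate the idea in the issue description, here is a minimal sketch of an (array, offset, length) view over a larger reused buffer. BinarySliceSketch is a hypothetical class that only mirrors the three method names from the description; it is not the Fieldable interface itself:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical (array, offset, length) holder; illustrates the idea only, not Lucene's Field. */
final class BinarySliceSketch {
    private byte[] value = new byte[0];
    private int offset;
    private int length;

    /** Point at a region of an existing array without copying it. */
    void setValue(byte[] value, int offset, int length) {
        if (offset < 0 || length < 0 || offset > value.length - length) {
            throw new IndexOutOfBoundsException("offset=" + offset + ", length=" + length);
        }
        this.value = value;
        this.offset = offset;
        this.length = length;
    }

    byte[] getBinaryValue() { return value; }  // returns the reference, not a copy
    int getOffset() { return offset; }
    int getLength() { return length; }
}

final class BinarySliceDemo {
    public static void main(String[] args) {
        byte[] buffer = new byte[64];  // reused scratch buffer, larger than one payload
        byte[] payload = "hello".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(payload, 0, buffer, 8, payload.length);

        BinarySliceSketch field = new BinarySliceSketch();
        field.setValue(buffer, 8, payload.length);  // no compact, zero-based copy needed

        String roundTrip = new String(field.getBinaryValue(), field.getOffset(),
                field.getLength(), StandardCharsets.US_ASCII);
        System.out.println(roundTrip);
    }
}
```

Consumers have to honor the offset and length instead of assuming the whole array is the value, which is the trade-off for skipping the copy.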

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



