lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <>
Subject [jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays
Date Thu, 02 Apr 2009 20:53:13 GMT


Jason Rutherglen commented on LUCENE-1574:

True, the pool would hold onto spares, but they would expire.
Pooling is mostly useful for the large on-disk segments: their byte
arrays (for BitVectors) are large, and because they contain more docs
they get hit with deletes more often, so the arrays would be
reused fairly often.
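
To make the "spares that expire" idea concrete, here is a minimal sketch of a byte array pool keyed by array length. It is not the code from the LUCENE-1574 patch: the class name ByteArrayPool, its methods, and the idle-time policy are all hypothetical, and it assumes release() is called with non-decreasing timestamps.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical pool of spare byte arrays keyed by length (illustration only). */
class ByteArrayPool {
    /** A pooled array plus the time it was handed back. */
    private static final class Spare {
        final byte[] bytes;
        final long releasedAt;

        Spare(byte[] bytes, long releasedAt) {
            this.bytes = bytes;
            this.releasedAt = releasedAt;
        }
    }

    private final Map<Integer, Deque<Spare>> sparesByLength = new HashMap<>();
    private final long maxIdleMillis;

    ByteArrayPool(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Returns a pooled array of this length if one exists, else a new one.
     *  A reused array still holds old contents; the caller must overwrite it. */
    synchronized byte[] acquire(int length) {
        Deque<Spare> spares = sparesByLength.get(length);
        if (spares != null && !spares.isEmpty()) {
            return spares.pop().bytes;
        }
        return new byte[length];
    }

    /** Hands an array back to the pool, stamped with the caller's clock. */
    synchronized void release(byte[] bytes, long nowMillis) {
        sparesByLength
            .computeIfAbsent(bytes.length, k -> new ArrayDeque<>())
            .push(new Spare(bytes, nowMillis));
    }

    /** Drops spares idle for longer than maxIdleMillis; returns how many. */
    synchronized int expire(long nowMillis) {
        int dropped = 0;
        Iterator<Deque<Spare>> queues = sparesByLength.values().iterator();
        while (queues.hasNext()) {
            Deque<Spare> spares = queues.next();
            // push() adds at the head, so the oldest spares sit at the tail.
            while (!spares.isEmpty()
                    && nowMillis - spares.peekLast().releasedAt > maxIdleMillis) {
                spares.removeLast();
                dropped++;
            }
            if (spares.isEmpty()) {
                queues.remove();
            }
        }
        return dropped;
    }
}
```

Keying by length fits the case described here, since every clone of a given segment needs arrays of the same size, so a released array is an exact fit for the next clone of that segment.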

I'm not knowledgeable enough to say whether the transactional
data structure will be fast enough. We had been using
BTreeSet in Zoie for deleted docs and it's very slow. Binary
search of an int array is faster, albeit still not fast enough. The
multidimensional array approach isn't fast enough for searching
either; we implemented it in Bobo. It's there because
we have a multi-value field cache (which is quite large, because
for each doc we're storing potentially 64 or more values in an
in-place bitset) and a single massive array kills the GC. In some
cases it is faster than a single large array because of the
way Java (or the OS?) moves memory through the CPU.
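
For readers unfamiliar with the two array-based options mentioned above, here is a rough sketch of each. Neither is the Zoie or Bobo code: SortedIntDeletes and PagedIntArray are hypothetical names, the page size is an arbitrary choice, and reading "multidimensional array" as a paged int[][] is an assumption.

```java
import java.util.Arrays;

/** Hypothetical deleted-docs lookup: binary search over a sorted int[]. */
class SortedIntDeletes {
    private final int[] sortedDocIds;

    SortedIntDeletes(int[] docIds) {
        this.sortedDocIds = docIds.clone();
        Arrays.sort(this.sortedDocIds);
    }

    /** O(log n) per lookup, versus O(1) for a bit vector. */
    boolean isDeleted(int docId) {
        return Arrays.binarySearch(sortedDocIds, docId) >= 0;
    }
}

/** Hypothetical paged array: many small pages instead of one huge allocation. */
class PagedIntArray {
    private static final int PAGE_BITS = 10;            // 1024 ints per page (arbitrary)
    private static final int PAGE_SIZE = 1 << PAGE_BITS;
    private static final int PAGE_MASK = PAGE_SIZE - 1;

    private final int[][] pages;
    private final int length;

    PagedIntArray(int length) {
        this.length = length;
        int pageCount = (length + PAGE_SIZE - 1) >>> PAGE_BITS;
        pages = new int[pageCount][];
        for (int i = 0; i < pageCount; i++) {
            // The last page may be shorter than PAGE_SIZE.
            pages[i] = new int[Math.min(PAGE_SIZE, length - (i << PAGE_BITS))];
        }
    }

    int get(int index) {
        return pages[index >>> PAGE_BITS][index & PAGE_MASK];
    }

    void set(int index, int value) {
        pages[index >>> PAGE_BITS][index & PAGE_MASK] = value;
    }

    int length() {
        return length;
    }
}
```

The paged layout costs an extra indirection on every access, which is one plausible reason it is slower for searching, while sparing the collector from having to find or move a single very large contiguous block.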

> PooledSegmentReader, pools SegmentReader underlying byte arrays
> ---------------------------------------------------------------
>                 Key: LUCENE-1574
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>   Original Estimate: 168h
>  Remaining Estimate: 168h
> PooledSegmentReader pools the underlying byte arrays of deleted docs and norms for realtime
> search.  It is designed for use with IndexReader.clone which can create many copies of byte
> arrays, which are of the same length for a given segment.  When pooled they can be reused
> which could save on memory.
> Do we want to benchmark the memory usage comparison of PooledSegmentReader vs GC?  Many
> times GC is enough for these smaller objects.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
