lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4515) Make MemoryIndex more memory efficient
Date Wed, 31 Oct 2012 14:55:12 GMT


Michael McCandless commented on LUCENE-4515:

Cool! You used the same slice idea that we use to hold postings in
RAM in shared byte[]s, but with int[]s instead. This should be a huge
reduction in GC load for MemoryIndex.
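For readers new to the slice idea, here is a minimal sketch of it over int[] blocks. It is not the patch's IntBlockPool: the names (IntSlicePool, Writer, LEVEL_SIZES), the 8192-int block size and the slice-size levels are all assumptions for illustration. Many streams share one list of fixed-size blocks. Each stream is a chain of slices that grow in size, and the last int of each slice stores the address of the next slice, so short streams stay small and no per-term int[] is allocated.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not the IntBlockPool from the patch: many growable
 * int streams share a list of fixed-size int[] blocks. Each stream is a chain
 * of slices; the last int of every slice stores the address of the next one.
 */
final class IntSlicePool {
  static final int BLOCK_SHIFT = 13;
  static final int BLOCK_SIZE = 1 << BLOCK_SHIFT; // 8192 ints per block (assumed)
  static final int BLOCK_MASK = BLOCK_SIZE - 1;
  // Slice size per level, link slot included; later slices grow (assumed).
  static final int[] LEVEL_SIZES = {2, 4, 8, 16, 32, 64, 128, 256};

  private final List<int[]> blocks = new ArrayList<>();
  private int upto = BLOCK_SIZE;    // forces a block allocation on first use
  private int offset = -BLOCK_SIZE; // global address of the current block's slot 0

  /** Reserves size consecutive ints and returns the global address of the first. */
  private int alloc(int size) {
    if (upto + size > BLOCK_SIZE) { // a slice never straddles two blocks
      blocks.add(new int[BLOCK_SIZE]);
      upto = 0;
      offset += BLOCK_SIZE;
    }
    int address = offset + upto;
    upto += size;
    return address;
  }

  private void set(int address, int value) {
    blocks.get(address >>> BLOCK_SHIFT)[address & BLOCK_MASK] = value;
  }

  private int get(int address) {
    return blocks.get(address >>> BLOCK_SHIFT)[address & BLOCK_MASK];
  }

  private static int nextLevel(int level) {
    return Math.min(level + 1, LEVEL_SIZES.length - 1);
  }

  Writer newWriter() {
    return new Writer();
  }

  /** Appends ints to one stream, chaining in a larger slice when one fills up. */
  final class Writer {
    final int start; // address of the first slice; needed to read the stream back
    private int pos; // next address to write
    private int end; // link slot of the current slice
    private int level;
    private int count;

    private Writer() {
      start = alloc(LEVEL_SIZES[0]);
      pos = start;
      end = start + LEVEL_SIZES[0] - 1;
    }

    void write(int value) {
      if (pos == end) { // only the link slot is left: chain a new slice
        level = nextLevel(level);
        int next = alloc(LEVEL_SIZES[level]);
        set(end, next); // forward link only; nothing points back
        pos = next;
        end = next + LEVEL_SIZES[level] - 1;
      }
      set(pos++, value);
      count++;
    }

    int count() {
      return count;
    }
  }

  /** Reads count ints of one stream by following forward links from start. */
  int[] read(int start, int count) {
    int[] out = new int[count];
    int pos = start;
    int level = 0;
    int end = start + LEVEL_SIZES[0] - 1;
    for (int i = 0; i < count; i++) {
      if (pos == end) { // reached a link slot: jump to the next slice
        pos = get(end);
        level = nextLevel(level);
        end = pos + LEVEL_SIZES[level] - 1;
      }
      out[i] = get(pos++);
    }
    return out;
  }
}
```

Because each slice only links forward, read() has to begin at the stream's start address. Given just the current write position, there is no way to walk back to earlier slices, which is why some per-term start has to be kept.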

I agree that DocFieldProcessor.docBoost is unused...

synchronizedAllocator looks unused? I guess you added it after
removing all synchronization from RecyclingByteBlockAllocator ... but
I think we can just add synchronizedAllocator back later if/when we
need it. Separately, can you call out in its javadocs that
RecyclingByteBlockAllocator is not thread safe?

int[] start; // nocommit maybe we can safe the end array and just check freq - need to change the SliceReader for this

I think you need the start ... because if a term used more than one
slice, and slices only link forward to the next one, you won't know
how to read "backwards" to get to the starting slice?

intBlockPool = new IntBlockPool(); // nocommit expose allocator and impl a recycling one

If we do that, we have to make sure the allocator clears each int[]
in getIntBlock() before returning it.
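A minimal sketch of what clearing in getIntBlock() could look like. Only the getIntBlock() name comes from the comment above; the class name, constructor, recycleIntBlocks and numBufferedBlocks are hypothetical and not Lucene's API. New Java arrays are already zero-filled, so only recycled blocks need an explicit Arrays.fill; without it, a reused block would hand the previous owner's ints to code that assumes fresh blocks start at zero. The javadoc also calls out that it is not thread safe.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/**
 * Hypothetical recycling allocator for int[] blocks (illustrative names,
 * not Lucene's API). Not thread safe: callers that share an instance
 * must synchronize externally.
 */
final class RecyclingIntBlockAllocator {
  private final int blockSize;
  private final int maxBufferedBlocks;
  private final ArrayDeque<int[]> freeBlocks = new ArrayDeque<>();

  RecyclingIntBlockAllocator(int blockSize, int maxBufferedBlocks) {
    this.blockSize = blockSize;
    this.maxBufferedBlocks = maxBufferedBlocks;
  }

  /** Returns an all-zero block, reusing a recycled one when possible. */
  int[] getIntBlock() {
    int[] block = freeBlocks.pollFirst();
    if (block == null) {
      return new int[blockSize]; // the JVM zero-fills new arrays
    }
    Arrays.fill(block, 0); // a recycled block still holds the previous user's ints
    return block;
  }

  /** Takes back blocks[start, end) for reuse; blocks beyond the cap are left to GC. */
  void recycleIntBlocks(int[][] blocks, int start, int end) {
    for (int i = start; i < end; i++) {
      if (freeBlocks.size() < maxBufferedBlocks) {
        freeBlocks.addLast(blocks[i]);
      }
      blocks[i] = null; // the caller must stop using recycled blocks
    }
  }

  int numBufferedBlocks() {
    return freeBlocks.size();
  }
}
```

Clearing in getIntBlock() rather than in recycleIntBlocks() avoids zeroing blocks that are recycled but never handed out again; clearing on recycle would also be correct.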

The added MemoryIndex.reset method is sort of ... spooky? Like, do we
really need/want to reuse a MemoryIndex? (I guess this is because we
added passing in an allocator to the ctor, so you want the byte[]s
returned to it ... but that also makes me nervous: should we really
pass in an external allocator...?)

> Make MemoryIndex more memory efficient
> --------------------------------------
>                 Key: LUCENE-4515
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 4.0, 4.1, 5.0
>            Reporter: Simon Willnauer
>             Fix For: 4.1, 5.0
>         Attachments: LUCENE-4515.patch
>
> Currently MemoryIndex uses BytesRef objects to represent terms and holds an int[] per
> term per field to represent postings. For highlighting this creates a ton of objects for each
> search that 1. need to be GCed and 2. can't be reused.
