lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4515) Make MemoryIndex more memory efficient
Date Wed, 31 Oct 2012 14:55:12 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487830#comment-13487830 ]

Michael McCandless commented on LUCENE-4515:
--------------------------------------------

Cool! You used the same slice idea that we use to hold postings in
RAM in shared byte[]s, but with int[]s instead.  This should be a huge
reduction in GC load for MemoryIndex.

I agree that DocFieldProcessor.docBoost is unused...

synchronizedAllocator looks unused?  I guess you added that after
removing all sync from RecyclingByteBlockAllocator ... but I think we
can just add synchronizedAllocator back later if/when we need it?
Separately can you call out that RecyclingByteBlockAllocator is not
thread safe in its javadocs?

{quote}
int[] start; // nocommit maybe we can safe the end array and just check freq - need to change the SliceReader for this
{quote}

I think you need the start ... because if you used more than one slice
you won't know how to read "backwards" to get to the starting slice?
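To illustrate why the start offset matters, here is a minimal sketch of
forward-linked int slices.  It is not the code from the patch:
IntSlicePoolSketch, the fixed SLICE_SIZE, and the single growable int[]
are all assumptions made for the example (the real pool uses blocks and
growing slice levels).  The point is only that the links run forward, so
a reader needs both the start and the end of a term's slice chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of forward-linked int slices; not Lucene's IntBlockPool. */
final class IntSlicePoolSketch {
  // Assumption: fixed-size slices, 3 data slots plus 1 trailing forward-pointer slot.
  private static final int SLICE_SIZE = 4;
  private int[] pool = new int[16];
  private int used = 0;

  /** Allocates an empty slice and returns its start offset. */
  int newSlice() {
    if (used + SLICE_SIZE > pool.length) {
      pool = Arrays.copyOf(pool, pool.length * 2);
    }
    int start = used;
    used += SLICE_SIZE;
    return start;
  }

  /** Writes value at position upto and returns the next write position. */
  int write(int upto, int value) {
    if (isPointerSlot(upto)) {
      // Current slice is full: link it forward to a fresh slice.
      int next = newSlice();
      pool[upto] = next;
      upto = next;
    }
    pool[upto] = value;
    return upto + 1;
  }

  /** Reads all values between start and end by following forward links. */
  List<Integer> read(int start, int end) {
    List<Integer> out = new ArrayList<>();
    int pos = start;
    while (pos != end) {
      if (isPointerSlot(pos)) {
        pos = pool[pos]; // jump to the next slice in the chain
      } else {
        out.add(pool[pos]);
        pos++;
      }
    }
    return out;
  }

  private static boolean isPointerSlot(int offset) {
    return offset % SLICE_SIZE == SLICE_SIZE - 1;
  }
}
```

Two terms can interleave their slices in the same pool; each keeps its
own (start, end) pair.  Given only the end, there is no back pointer to
follow, which is why dropping the start array would not work here.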

{quote}
intBlockPool = new IntBlockPool(); // nocommit expose allocator and impl a recycling one
{quote}

If we do that we have to make sure that allocator clears each int[]
before returning it, in getIntBlock().
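Roughly what I have in mind, as a sketch only: the class name
RecyclingIntAllocatorSketch, the maxBufferedBlocks cap, and the
bufferedBlocks() helper are made up for illustration and are not an
existing Lucene API.  I am assuming the slice-writing code expects unused
space to be zero-filled, so a recycled block has to be zeroed before it
is handed out again.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical recycling allocator sketch. Not thread safe. */
final class RecyclingIntAllocatorSketch {
  private final int blockSize;
  private final int maxBufferedBlocks; // assumption: cap on retained blocks
  private final ArrayDeque<int[]> freeBlocks = new ArrayDeque<>();

  RecyclingIntAllocatorSketch(int blockSize, int maxBufferedBlocks) {
    this.blockSize = blockSize;
    this.maxBufferedBlocks = maxBufferedBlocks;
  }

  /** Returns a zero-filled block, reusing a recycled one when available. */
  int[] getIntBlock() {
    int[] block = freeBlocks.pollFirst();
    if (block == null) {
      return new int[blockSize]; // freshly allocated arrays are already zeroed
    }
    Arrays.fill(block, 0); // clear stale postings from the previous use
    return block;
  }

  /** Takes back blocks[start..end) and nulls the caller's references. */
  void recycleIntBlocks(int[][] blocks, int start, int end) {
    for (int i = start; i < end; i++) {
      if (freeBlocks.size() < maxBufferedBlocks) {
        freeBlocks.addLast(blocks[i]);
      }
      blocks[i] = null;
    }
  }

  /** Number of blocks currently held for reuse. */
  int bufferedBlocks() {
    return freeBlocks.size();
  }
}
```

Clearing in getIntBlock() rather than on recycle means blocks that are
dropped because the cap is reached never pay for the fill.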

The added MemoryIndex.reset method is sort of ... spooky?  Like, do we
really need/want to reuse a MemoryIndex?  (I guess this is because we
added passing in an allocator to the ctor ... so you want the byte[]'s
returned to it ... but that also makes me nervous: should we really
pass in an external allocator...?).

                
> Make MemoryIndex more memory efficient
> --------------------------------------
>
>                 Key: LUCENE-4515
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4515
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 4.0, 4.1, 5.0
>            Reporter: Simon Willnauer
>             Fix For: 4.1, 5.0
>
>         Attachments: LUCENE-4515.patch
>
>
> Currently MemoryIndex uses BytesRef objects to represent terms and holds an int[] per
> term per field to represent postings. For highlighting this creates a ton of objects for each
> search that 1. need to be GCed and 2. can't be reused.
