lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <>
Subject [jira] Commented: (LUCENE-2662) BytesHash
Date Tue, 28 Sep 2010 11:52:33 GMT


Simon Willnauer commented on LUCENE-2662:

> How about renaming key back to ord? And then maybe rename values to
> bytesStart? And in their decls add comments saying they are indexed
> by hash code? And maybe rename addByOffset -> addByBytesStart?
I don't like addByBytesStart; I would like to keep "offset" since it really is an offset into
the pool. How about addByPoolOffset?
The names ord and bytesStart are a good compromise :) Let's go with those.
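To make the naming concrete, here is a minimal sketch of a byte-sequence hash using the agreed names: `ords` is indexed by hash slot and holds an ord, and `bytesStart` holds the offset of each entry in a flat pool. Note that this sketch indexes `bytesStart` by ord, which may differ from the layout in the patch. Everything else (the class name `BytesHashSketch`, the separate `lengths` array, linear probing, returning `-(ord + 1)` for duplicates) is a hypothetical illustration, not the LUCENE-2662 code.

```java
import java.util.Arrays;

// Hypothetical sketch (not Lucene's BytesRefHash): a hash set of byte
// sequences that assigns each distinct entry a dense ord.
//   ords[slot]      -> ord stored in that hash slot, or -1 if the slot is empty
//   bytesStart[ord] -> offset of the entry's bytes in the flat pool
//   lengths[ord]    -> length of the entry (kept separately for simplicity)
class BytesHashSketch {
  private int[] ords;
  private int hashMask;
  private int[] bytesStart = new int[8];
  private int[] lengths = new int[8];
  private byte[] pool = new byte[32];
  private int poolUpto = 0; // next free position in pool
  private int count = 0;    // number of distinct entries, i.e. the next ord

  BytesHashSketch(int capacity) { // capacity must be a power of two
    ords = new int[capacity];
    Arrays.fill(ords, -1);
    hashMask = capacity - 1;
  }

  int size() { return count; }

  byte[] get(int ord) {
    return Arrays.copyOfRange(pool, bytesStart[ord], bytesStart[ord] + lengths[ord]);
  }

  /** Returns the new ord, or -(ord + 1) if the bytes were already present. */
  int add(byte[] bytes) {
    int slot = hash(bytes, 0, bytes.length) & hashMask;
    while (ords[slot] != -1) { // linear probing
      if (sameBytes(ords[slot], bytes)) return -(ords[slot] + 1);
      slot = (slot + 1) & hashMask;
    }
    int ord = count++;
    if (ord == bytesStart.length) { // grow the ord-indexed arrays
      bytesStart = Arrays.copyOf(bytesStart, ord * 2);
      lengths = Arrays.copyOf(lengths, ord * 2);
    }
    while (poolUpto + bytes.length > pool.length) {
      pool = Arrays.copyOf(pool, pool.length * 2);
    }
    System.arraycopy(bytes, 0, pool, poolUpto, bytes.length);
    bytesStart[ord] = poolUpto;
    lengths[ord] = bytes.length;
    poolUpto += bytes.length;
    ords[slot] = ord;
    if (count * 2 > ords.length) rehash(); // keep the load factor at most 0.5
    return ord;
  }

  private boolean sameBytes(int ord, byte[] b) {
    if (lengths[ord] != b.length) return false;
    int start = bytesStart[ord];
    for (int i = 0; i < b.length; i++) {
      if (pool[start + i] != b[i]) return false;
    }
    return true;
  }

  private void rehash() { // double the slot array and reinsert every ord
    int[] newOrds = new int[ords.length * 2];
    Arrays.fill(newOrds, -1);
    int newMask = newOrds.length - 1;
    for (int ord = 0; ord < count; ord++) {
      int slot = hash(pool, bytesStart[ord], lengths[ord]) & newMask;
      while (newOrds[slot] != -1) slot = (slot + 1) & newMask;
      newOrds[slot] = ord;
    }
    ords = newOrds;
    hashMask = newMask;
  }

  private static int hash(byte[] b, int off, int len) {
    int h = 0;
    for (int i = off; i < off + len; i++) h = 31 * h + b[i];
    return h;
  }
}
```

Returning a negative value for an existing entry lets a caller tell "new" from "already seen" with a single lookup, and recover the ord as `-result - 1`.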

> On the nocommit in ByteBlockPool: I think that's fine? It's an
> internal class...
You mean this one: "// nocommit - public arrays are not nice!"?
Yeah, that's more of a style thing; if somebody changes them, it's their own fault for being stupid,
I guess.

> The nocommit in BytesRefHash seems wrong? (I.e., compact is used
> internally)... though maybe we make it private if it's not used

Ah yeah, that's bogus; it's left over from a previous iteration, which was wrong as well. I will remove it.
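For context on the compact method mentioned here, a minimal sketch of what such a compaction step typically does, assuming it moves every used hash slot (ord != -1) to the front of the slot array so the ords can then be iterated or sorted. The class and method names are hypothetical; this is not the patch's code.

```java
import java.util.Arrays;

// Hypothetical sketch: compact a hash-slot array in place. Used slots
// (ord != -1) are moved to the front in slot order, the tail is marked
// empty, and the number of ords is returned.
class CompactSketch {
  static int compact(int[] ords) {
    int upto = 0;
    for (int i = 0; i < ords.length; i++) {
      if (ords[i] != -1) {
        ords[upto++] = ords[i]; // upto <= i, so nothing unread is overwritten
      }
    }
    Arrays.fill(ords, upto, ords.length, -1);
    return upto;
  }
}
```

After compaction the hash slots no longer correspond to hash codes, so a structure like this would need a rehash (or a reset) before further lookups; that is one reason to keep such a method internal.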

> On the "nocommit factor this out!" in I agree, the
> postingsArray.textStarts should go away, right? I.e., it's a
> [wasteful] copy of what the BytesRefHash is already storing?
Yeah, that is the reason for that nocommit. I thought about this a little, and I see two
options:
* We could factor out a superclass from ParallelPostingArray that only has the textStart
int array and the grow and copy methods, and let ParallelPostingArray subclass it.
BytesRefHash would accept this class (I don't have a good name for it yet, so let's call it
TextStartArray for now) and use it internally. BytesRefHash would call grow() when needed,
and all the other code would stay unchanged since PPA is a subclass.
* The other way would be to bind the BytesRefHash to the postings array, which seems odd to
me.

More ideas?
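The first option could look roughly like this. `TextStartArray` is the placeholder name from the comment; `PostingsArraySketch` stands in for ParallelPostingArray, and `TextStartHolder` stands in for the BytesRefHash side. The method names (`newInstance`, `copyTo`, `grow`), the 1.5x growth policy, and the `docFreqs` field are assumptions made for illustration only.

```java
// Hypothetical sketch of option 1: a base class owning only textStarts plus
// grow/copy logic, which a postings array extends with its own parallel arrays.
class TextStartArray {
  final int size;
  final int[] textStarts;

  TextStartArray(int size) {
    this.size = size;
    this.textStarts = new int[size];
  }

  // Subclasses override so grow() returns an instance of their own type.
  TextStartArray newInstance(int size) {
    return new TextStartArray(size);
  }

  // Subclasses override to copy their extra parallel arrays as well.
  void copyTo(TextStartArray to, int numToCopy) {
    System.arraycopy(textStarts, 0, to.textStarts, 0, numToCopy);
  }

  // Returns a larger array (about 1.5x) holding the same contents.
  final TextStartArray grow() {
    int newSize = Math.max(size + 1, size + (size >> 1));
    TextStartArray next = newInstance(newSize);
    copyTo(next, size);
    return next;
  }
}

// Stand-in for ParallelPostingArray: adds one extra per-term array.
class PostingsArraySketch extends TextStartArray {
  final int[] docFreqs;

  PostingsArraySketch(int size) {
    super(size);
    docFreqs = new int[size];
  }

  @Override
  TextStartArray newInstance(int size) {
    return new PostingsArraySketch(size);
  }

  @Override
  void copyTo(TextStartArray to, int numToCopy) {
    super.copyTo(to, numToCopy);
    System.arraycopy(docFreqs, 0, ((PostingsArraySketch) to).docFreqs, 0, numToCopy);
  }
}

// Stand-in for the BytesRefHash side: it only knows TextStartArray and
// grows it on demand, so textStarts is stored once instead of copied.
class TextStartHolder {
  TextStartArray starts;

  TextStartHolder(TextStartArray initial) {
    starts = initial;
  }

  void setTextStart(int ord, int start) {
    while (ord >= starts.size) starts = starts.grow();
    starts.textStarts[ord] = start;
  }
}
```

Because `grow()` dispatches through `newInstance` and `copyTo`, the hash-side code never needs to know about the postings-specific arrays, which is the property the option relies on.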

> Can we impl BytesRefHash.bytesUsed as an AtomicLong (hmm, maybe
> AtomicInt; none of these classes can address > 2GB)? Then the
> pool would add in blockSize every time it binds a new block. That
> method (DW.bytesUsed) is called a lot - at least once on every

I did exactly that in the not-yet-uploaded patch. But I figured it might make more
sense to use that AtomicInt in the allocator as well as in THPF. Or is that what you mean?
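The accounting idea discussed here can be sketched as follows: each pool adds its block size to a shared counter whenever it binds a new block, so reading the total is a single `get()` instead of a walk over all pools. The class `CountingBlockPool`, its 32 KB block size, and its methods are hypothetical; the sketch uses `java.util.concurrent.atomic.AtomicLong`, and `AtomicInteger` (the JDK's name for the "AtomicInt" mentioned above) would work the same way if totals stay below 2GB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a block pool that reports memory to a shared counter
// once per bound block. Each pool is meant to be used by one thread; only
// the counter is shared across threads.
class CountingBlockPool {
  static final int BLOCK_SIZE = 1 << 15; // assumed block size (32 KB)

  private final AtomicLong bytesUsed;    // shared by all pools
  private final List<byte[]> buffers = new ArrayList<>();
  private byte[] current;
  private int upto = BLOCK_SIZE;         // forces a new block on the first write

  CountingBlockPool(AtomicLong bytesUsed) {
    this.bytesUsed = bytesUsed;
  }

  private void nextBuffer() {
    current = new byte[BLOCK_SIZE];
    buffers.add(current);
    upto = 0;
    bytesUsed.addAndGet(BLOCK_SIZE); // account once per bound block
  }

  // Appends one byte, binding a new block when the current one is full.
  void writeByte(byte b) {
    if (upto == BLOCK_SIZE) nextBuffer();
    current[upto++] = b;
  }

  // Drops all blocks and subtracts their bytes from the shared counter.
  void reset() {
    bytesUsed.addAndGet(-(long) BLOCK_SIZE * buffers.size());
    buffers.clear();
    current = null;
    upto = BLOCK_SIZE;
  }
}
```

The cost moves from the frequently called reader to the rare event of binding a block, which is the trade-off suggested in the quoted comment.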

> I'm confused again: when do we use RecyclingByteBlockAllocator
> from a single thread...? I.e., why did the sync need to be
> conditional for this class, again...? It seems like we always
> need it sync'd (both the main pool & per-doc pool need this)? If
> so, we can simplify and make these methods sync'd?

Man, I am sorry. I thought I would use this in a single-threaded environment in LUCENE-2186,
but if so I should change it there when needed; I was getting one step ahead of myself.
I will change it and maybe add a second one if needed. Agree?
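A minimal sketch of the simplified, always-synchronized variant agreed on here: freed blocks are kept on a bounded free list and handed out again instead of allocating new ones, and every method is `synchronized` because the main pool and the per-doc pools may call it from different threads. The class `RecyclingAllocatorSketch` and its method signatures are hypothetical and only loosely modeled on the idea, not on RecyclingByteBlockAllocator's actual code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a block allocator that recycles freed blocks up to a
// limit. All methods are synchronized unconditionally, so a single instance
// can be shared by several pools running on different threads.
class RecyclingAllocatorSketch {
  private final int blockSize;
  private final int maxBufferedBlocks;
  private final Deque<byte[]> freeBlocks = new ArrayDeque<>();

  RecyclingAllocatorSketch(int blockSize, int maxBufferedBlocks) {
    this.blockSize = blockSize;
    this.maxBufferedBlocks = maxBufferedBlocks;
  }

  // Reuses a freed block if one is available (contents are not cleared).
  synchronized byte[] getByteBlock() {
    byte[] block = freeBlocks.pollFirst();
    return block != null ? block : new byte[blockSize];
  }

  // Takes back blocks[start..end); blocks beyond the limit are dropped and
  // left to the garbage collector. The caller's slots are nulled out.
  synchronized void recycleByteBlocks(byte[][] blocks, int start, int end) {
    for (int i = start; i < end; i++) {
      if (freeBlocks.size() < maxBufferedBlocks) {
        freeBlocks.push(blocks[i]);
      }
      blocks[i] = null;
    }
  }

  synchronized int numBufferedBlocks() {
    return freeBlocks.size();
  }
}
```

Making the methods unconditionally synchronized keeps one code path; a separate unsynchronized variant would only pay off if a truly single-threaded caller turns up.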


> BytesHash
> ---------
>                 Key: LUCENE-2662
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch, 4.0
>            Reporter: Jason Rutherglen
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch, 4.0
>         Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch
> This issue will have the BytesHash separated out from LUCENE-2186

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

