lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Exception while loading 2 Billion + Documents in Solr 4.8.0
Date Wed, 11 Feb 2015 13:05:44 GMT
bq: Are there any such structures?

Well, I thought there were, but I've got to admit I can't call any to mind
immediately.

bq: 2b is just the hard limit

Yeah, I'm always a little nervous as to when Moore's Law will make
everything I know about current systems' performance obsolete.

At any rate, I _can_ say with certainty that I have no interest at this
point in exceeding this limit. Of course that may change with
compelling use-cases ;)....

Best,
Erick

On Wed, Feb 11, 2015 at 4:14 AM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
> Erick Erickson [erickerickson@gmail.com] wrote:
>
>> I guess my $0.02 is that you'd have to have strong evidence that extending
>> Lucene to 64 bit is even useful. Or more generally, useful enough to pay the
>> penalty. All the structures that allocate maxDoc id arrays would suddenly
>> require twice the memory for instance,
>
> Are there any such structures? It was my impression that ID-structures in
> Solr were either bitmaps, hashmaps or queues. Anyway, if the number of places
> with full-size ID-arrays is low, there could be dual implementations selected
> by maxDoc.
>
>> plus all the coding effort that could be spent doing other things.
>
> Very true. I agree that at the current stage, > 2b/shard is still a bit too
> special to spend a lot of effort on it.
>
> However, 2b is just the hard limit. As has been discussed before, single
> shards work best in the lower end of the hundreds of millions of documents.
> One reason is that many parts of Lucene work single-threaded on structures
> that scale linearly with document count. Having some hundreds of millions of
> documents (log analysis being the typical case) is not uncommon these days.
> A gradual shift to more multi-thread oriented processing would fit well with
> current trends in hardware as well as use cases. As opposed to the int->long
> switch, there would be little to no penalty for setups with low maxDocs (they
> would just use 1 thread).
>
> - Toke Eskildsen
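
For illustration only, here is a rough Java sketch of the two points in the exchange above: the memory cost of widening a full-size per-document ID array from int to long, and the idea of picking an implementation by maxDoc. Every name in it (DocIdArraySketch, DocIdStore, forMaxDoc, and so on) is hypothetical and not a Lucene or Solr API, and using Integer.MAX_VALUE as the switch-over threshold is an assumption made for the sketch.

```java
public class DocIdArraySketch {

    /** Bytes needed for one int entry per document (4 bytes each). */
    public static long intArrayBytes(long maxDoc) {
        return maxDoc * Integer.BYTES;
    }

    /** Bytes needed for one long entry per document (8 bytes each). */
    public static long longArrayBytes(long maxDoc) {
        return maxDoc * Long.BYTES;
    }

    /** Hypothetical abstraction over a store of document ids. */
    public interface DocIdStore {
        void set(int slot, long docId);
        long get(int slot);
        int bytesPerEntry();
    }

    /** Compact variant: used while every doc id still fits in an int. */
    static final class IntStore implements DocIdStore {
        private final int[] ids;
        IntStore(int size) { ids = new int[size]; }
        // Math.toIntExact throws ArithmeticException if docId overflows an int.
        public void set(int slot, long docId) { ids[slot] = Math.toIntExact(docId); }
        public long get(int slot) { return ids[slot]; }
        public int bytesPerEntry() { return Integer.BYTES; }
    }

    /** Wide variant: twice the memory per entry, but no 2b ceiling. */
    static final class LongStore implements DocIdStore {
        private final long[] ids;
        LongStore(int size) { ids = new long[size]; }
        public void set(int slot, long docId) { ids[slot] = docId; }
        public long get(int slot) { return ids[slot]; }
        public int bytesPerEntry() { return Long.BYTES; }
    }

    /**
     * "Dual implementations selected by maxDoc": small indexes keep the
     * int-based store and pay no penalty; only larger ones get the long store.
     * The threshold here is an assumption, not Lucene's actual limit.
     */
    public static DocIdStore forMaxDoc(long maxDoc, int size) {
        return maxDoc <= Integer.MAX_VALUE ? new IntStore(size) : new LongStore(size);
    }

    public static void main(String[] args) {
        long maxDoc = 2_000_000_000L;
        System.out.println("int[]  bytes: " + intArrayBytes(maxDoc));
        System.out.println("long[] bytes: " + longArrayBytes(maxDoc));
        System.out.println("bytes/entry at 1000 docs: "
                + forMaxDoc(1_000L, 4).bytesPerEntry());
        System.out.println("bytes/entry at 3b docs:   "
                + forMaxDoc(3_000_000_000L, 4).bytesPerEntry());
    }
}
```

At 2 billion documents, one such array goes from about 8 GB to about 16 GB, which is the "twice the memory" penalty mentioned above. The selection by maxDoc avoids that cost for small indexes, at the price of maintaining two code paths.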
