lucene-solr-user mailing list archives

From Peter Sturge <peter.stu...@googlemail.com>
Subject Re: Experience with indexing billions of documents?
Date Fri, 02 Apr 2010 17:30:30 GMT
You can do this today with multiple indexes, replication and distributed
searching.
SolrCloud/clustering will certainly make life easier when it comes to
managing these, but with distributed searches over multiple indexes, you're
limited only by how much hardware you can throw at it.
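
As a rough illustration of the distributed search mentioned above, here is a
minimal Python sketch that builds a query URL using Solr's `shards` request
parameter. The host names, the `text` field and the helper name
`distributed_select_url` are made up for the example; only the `shards`
parameter itself (a comma-separated list of host:port/path entries, without
the http:// prefix) comes from Solr.

```python
from urllib.parse import urlencode


def distributed_select_url(coordinator, shards, query, rows=10):
    """Build a /select URL that asks one Solr instance to fan a query out.

    coordinator: host:port/path of the instance that receives the request
    shards:      list of host:port/path entries, one per index to search
    """
    params = {
        "q": query,
        "rows": rows,
        # Distributed search: comma-separated shard list, no http:// prefix.
        "shards": ",".join(shards),
    }
    # Keep ',', ':' and '/' unescaped so the URL stays readable.
    return "http://" + coordinator + "/select?" + urlencode(params, safe=",:/")


# Hypothetical hosts, each holding one slice of the page index.
shards = ["solr1:8983/solr", "solr2:8983/solr", "solr3:8983/solr"]
url = distributed_select_url(shards[0], shards, "text:whale")
print(url)
```

Any of the instances can act as the coordinator; it merges the per-shard
results into one ranked list, so adding capacity is mostly a matter of adding
entries to the shard list.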


On Fri, Apr 2, 2010 at 6:17 PM, <darren@ontrenet.com> wrote:

> My guess is that you will need to take advantage of Solr 1.5's upcoming
> cloud/cluster renovations and use multiple indexes to comfortably achieve
> those numbers. Hypothetically, in that case, you won't be limited by single
> index docid limitations of Lucene.
>
> > We are currently indexing 5 million books in Solr, scaling up over the
> > next few years to 20 million.  However we are using the entire book as a
> > Solr document.  We are evaluating the possibility of indexing individual
> > pages as there are some use cases where users want the most relevant pages
> > regardless of what book they occur in.  However, we estimate that we are
> > talking about somewhere between 1 and 6 billion pages and have concerns
> > over whether Solr will scale to this level.
> >
> > Does anyone have experience using Solr with 1-6 billion Solr documents?
> >
> > The Lucene file formats document
> > (http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations)
> > mentions a limit of about 2 billion document ids.   I assume this is the
> > Lucene internal document id and would therefore be a per index/per shard
> > limit.  Is this correct?
> >
> >
> > Tom Burton-West.
> >
> >
> >
> >
>
>
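
To put numbers on the per-index limit discussed in the quoted message, here
is a back-of-the-envelope sketch in Python. It assumes the "about 2 billion"
figure is Java's Integer.MAX_VALUE (2**31 - 1), and the 500 million
documents-per-shard target is purely a hypothetical planning number, not a
recommendation from anyone in this thread.

```python
# Assumption: the per-index doc id ceiling is Java's Integer.MAX_VALUE.
LUCENE_MAX_DOCS = 2**31 - 1


def shards_needed(total_docs, docs_per_shard=LUCENE_MAX_DOCS):
    """Smallest number of indexes that keeps each one at or under the cap."""
    if not 0 < docs_per_shard <= LUCENE_MAX_DOCS:
        raise ValueError("docs_per_shard must be between 1 and the doc id limit")
    # Integer ceiling division, avoids floating point rounding.
    return -(-total_docs // docs_per_shard)


low, high = 1_000_000_000, 6_000_000_000  # page estimates from the thread

# Bare minimum imposed by the doc id limit alone.
print(shards_needed(low), shards_needed(high))

# With a hypothetical, more comfortable target of 500M pages per shard.
print(shards_needed(low, 500_000_000), shards_needed(high, 500_000_000))
```

The doc id limit only sets a floor on the shard count; in practice index
size on disk and query latency are likely to push the number higher.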
