lucene-java-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Maximum index file size
Date Fri, 23 Oct 2009 05:39:14 GMT
On Thu, Oct 22, 2009 at 10:29 PM, Hrishikesh Agashe <
hrishikesh_agashe@persistent.co.in> wrote:

> Can I create an index file with very large size, like 1 TB or so? Is there
> any limit on how large index file one can create? Also, will I be able to
> search on this 1 TB index file at all?
>

Leaving aside the question of hardware or JVM limits on monstrous files,
the other question (can you search this index) is easier: if you've got, say,
ten billion documents in one index, and a query that hits even just 0.1% of
them, you'll need to score 10 million hits in the course of that query.  To
do this in under a second, you have only 100 nanoseconds to look at each
document.  If your query hits 1% of your documents, you're down to 10 ns per
document.  I've never tried searching a 1TB index, but I'd say that's
pushing it.
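
The arithmetic above can be sketched in a few lines of Java.  This is only a
back-of-the-envelope helper: the class and method names are made up for
illustration, and it assumes the whole time budget goes to scoring hits.

```java
final class ScoringBudget {
    /**
     * Nanoseconds available per scored hit (hypothetical helper).
     * hitsPerThousand is the hit rate in tenths of a percent: 1 means 0.1%.
     */
    static long nanosPerHit(long totalDocs, long hitsPerThousand, long budgetNanos) {
        long hits = totalDocs * hitsPerThousand / 1000;
        if (hits <= 0) {
            throw new IllegalArgumentException("query must hit at least one document");
        }
        return budgetNanos / hits;
    }
}
```

With ten billion documents and a one-second budget (1,000,000,000 ns),
nanosPerHit(10_000_000_000L, 1, 1_000_000_000L) gives 100, and a hit rate of
10 per thousand (1%) gives 10.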

Is there a reason you can't shard your index, and instead put maybe 20
shards of 50GB (or better - 100 shards of 10GB) each on a variety of
machines, and just merge results?
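
A minimal sketch of the "merge results" step, in plain Java with no Lucene
calls.  The Hit type and mergeTopK method are hypothetical names, and the
sketch assumes each shard has already returned its own top hits and that
scores are comparable across shards (which may not hold if each shard
computes its own term statistics).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ShardMerge {
    /** One scored hit from one shard (illustrative type, not a Lucene class). */
    static final class Hit {
        final int shard;
        final int docId;
        final float score;

        Hit(int shard, int docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Combines the per-shard top hits and keeps the k highest-scoring overall. */
    static List<Hit> mergeTopK(List<List<Hit>> perShardTopK, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shardHits : perShardTopK) {
            all.addAll(shardHits);
        }
        // Highest score first; the sort is stable, so ties keep shard order.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

Each shard only needs to return its own top k, so the merging machine handles
at most (number of shards) x k hits, no matter how large the shards are.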

  -jake
