lucene-java-user mailing list archives

From Danil ŢORIN <>
Subject Re: Max size of index? How do search engines avoid this?
Date Mon, 18 May 2009 10:21:37 GMT
A 2 GB size limit comes from the OS and/or file system, not from the Lucene
index format itself.
Lucene does have a different limit: the number of documents must be less
than 2147483648, because documents are addressed by a 32-bit int.
In practice, though, an index can reach tens or hundreds of GB on disk
long before hitting that document count.

If you are thinking about BIG indexes, you should forget Windows + FAT32.

On Linux I've seen big indexes work fine: around 80M relatively small
documents, about 50 GB on disk,
with reasonable performance (on a pretty cheap machine).

If you need more documents, better performance, etc., you need to
partition your index into
several smaller indexes running on separate hosts, query them in
parallel, and then merge the results into a single result set.

This way of operating is not built into Lucene out of the box, but it is
relatively easy to build a custom wrapper that does it (rough sketch below).
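
Something along these lines, as a minimal local sketch: it fans one query out
to several shard searchers in parallel and re-sorts the combined hits by score.
I'm assuming a recent Lucene API (DirectoryReader/FSDirectory) rather than the
2.x API; the shard paths and the "body" field are made up, and in a real setup
each searcher would be a remote call to another host instead of a local directory.

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ShardedSearchSketch {

    // A hit tagged with the shard it came from, so the caller can still
    // fetch the stored document from the right searcher afterwards.
    static final class ShardHit {
        final int shard;
        final ScoreDoc scoreDoc;
        ShardHit(int shard, ScoreDoc scoreDoc) { this.shard = shard; this.scoreDoc = scoreDoc; }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical shard locations; in practice these live on separate hosts.
        String[] shardPaths = { "/data/shard0", "/data/shard1", "/data/shard2" };

        IndexSearcher[] searchers = new IndexSearcher[shardPaths.length];
        for (int i = 0; i < shardPaths.length; i++) {
            searchers[i] = new IndexSearcher(
                DirectoryReader.open(FSDirectory.open(Paths.get(shardPaths[i]))));
        }

        Query query = new TermQuery(new Term("body", "lucene"));  // field name is an assumption
        int topN = 10;

        // Fan the same query out to every shard in parallel.
        ExecutorService pool = Executors.newFixedThreadPool(searchers.length);
        List<Future<TopDocs>> futures = new ArrayList<>();
        for (IndexSearcher s : searchers) {
            futures.add(pool.submit(() -> s.search(query, topN)));
        }

        // Collect each shard's top hits, re-sort globally by score,
        // and keep the best topN overall.
        List<ShardHit> merged = new ArrayList<>();
        for (int shard = 0; shard < futures.size(); shard++) {
            for (ScoreDoc sd : futures.get(shard).get().scoreDocs) {
                merged.add(new ShardHit(shard, sd));
            }
        }
        merged.sort(Comparator.comparingDouble((ShardHit h) -> h.scoreDoc.score).reversed());

        for (ShardHit hit : merged.subList(0, Math.min(topN, merged.size()))) {
            System.out.printf("shard=%d doc=%d score=%.3f%n",
                hit.shard, hit.scoreDoc.doc, hit.scoreDoc.score);
        }
        pool.shutdown();
    }
}

One caveat with this naive merge: each shard scores hits using its own term
statistics, so scores from different shards are not strictly comparable unless
you distribute those statistics; for many applications it is close enough.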

AFAIK something similar powers Google: each box handles about 10M
docs, and there are thousands of boxes doing the searches in parallel.

On Mon, May 18, 2009 at 12:42, raistlink <> wrote:
> Hi,
> I think I've read that there is a limit for the index, maybe 2 GB on FAT
> machines. If this is right, I'd like good resources (websites or books)
> about programming search engines, to learn the techniques big search
> engines use to search across such huge amounts of data.
> Thanks
