lucene-java-user mailing list archives

From <>
Subject RE: Lucene 4.0 scalability and performance.
Date Mon, 24 Dec 2012 13:30:09 GMT
Thank you

-----Original Message-----
From: Carsten Schnober [] 
Sent: Monday, December 24, 2012 3:25 PM
Subject: Re: Lucene 4.0 scalability and performance.

On 23.12.2012 12:11, wrote:

> This means that we need to index millions of documents with terabytes of content and search
> in them.
> For now we want to define only one indexed field, containing the content of the documents,
> with the ability to search for terms and retrieve the term offsets.
> Has anybody already tested Lucene with terabytes of data?
> Does Lucene have any known limitations related to the number of indexed documents or to
> their size?
> What about search performance on huge data sets?

Hi Vitali,
we've been working on a linguistic search engine based on Lucene 4.0 and have performed a
few tests with large text corpora. There are at least some overlaps in the functionality you
mentioned (term offsets). See
(mainly section 5).

Institut für Deutsche Sprache |
Projekt KorAP                 |
Tel. +49-(0)621-43740789      |
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform

To unsubscribe, e-mail:
For additional commands, e-mail:

