Reply-To: java-user@lucene.apache.org
Subject: RE: Lucene 4.0 scalability and performance.
Date: Mon, 24 Dec 2012 13:30:09 +0000

Thank you

-----Original Message-----
From: Carsten Schnober [mailto:schnober@ids-mannheim.de]
Sent: Monday, December 24, 2012 3:25 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene 4.0 scalability and performance.

On 23.12.2012 12:11, Vitaly_Artemov@McAfee.com wrote:
> This means that we need to index millions of documents with terabytes of content and search in them.
> For now we want to define only one indexed field, containing the content of the documents, with the ability to search for terms and retrieve the term offsets.
> Has anybody already tested Lucene with terabytes of data?
> Does Lucene have any known limitations related to the number of indexed documents or to the size of the indexed documents?
> What about search performance on a huge data set?

Hi Vitali,
we've been working on a linguistic search engine based on Lucene 4.0 and have performed a few tests with large text corpora. There are at least some overlaps in the functionality you mentioned (term offsets). See http://www.oegai.at/konvens2012/proceedings/27_schnober12p/ (mainly section 5).
Carsten

--
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789 | schnober@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org