Subject: Re: a faster way to addDocument and get the ID just added?
From: Simon Willnauer
Reply-To: simon.willnauer@gmail.com
To: java-user@lucene.apache.org
Cc: Ian Lea
Date: Thu, 31 Mar 2011 21:10:01 +0200

Hey Ian,

On Thu, Mar 31, 2011 at 11:32 AM, Ian Lea wrote:
>>> Subject: a faster way to addDocument and get the ID just added?
>
> Might it be possible to come up with a version of
> IndexWriter.addDocument() that returns the docid rather than void?
> Answering that question is way out of my league, but it would
> presumably be quick.
>

With the current trunk I think we could do that, since doc IDs are
assigned in DocumentsWriter and we only have one instance of it, even
though we index into multiple in-memory segments and merge them on
flush.

But (yeah, there must be a but :) we are working on
DocumentsWriterPerThread to exploit extra concurrency for 4.0, where
this doesn't work anymore: we index a segment per thread, and that
segment is not merged on flush but written directly to disk.

With 4.0 we will also have Column Stride Fields, which might help with
this issue: they let you keep your own doc IDs in fast, integrated,
column-based storage. It might not be as fast as using doc IDs
directly, but it should be reasonable, since the values can be accessed
with a low-footprint iterator during scoring.
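
To make the difference concrete, here is a rough sketch of the two
situations described above. It is plain Java, not Lucene code, and
every name in it (DocIdSketch, SharedWriter, PerThreadSegment,
globalId) is made up for illustration. It assumes that a global doc ID
is the segment's doc base plus the segment-local ID, and that the
relative order of per-thread segments is only fixed once they are
flushed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: hypothetical names, not the Lucene implementation.
final class DocIdSketch {

    /** One shared writer: the global ID is known as soon as a document is added. */
    static final class SharedWriter {
        private int nextDocId = 0;

        synchronized int addDocument(String doc) {
            // The document itself is ignored here; only ID assignment is modelled.
            return nextDocId++;
        }
    }

    /** One private segment per thread: only a segment-local ID is known at add time. */
    static final class PerThreadSegment {
        private final List<String> docs = new ArrayList<>();

        int addDocument(String doc) {
            docs.add(doc);
            return docs.size() - 1; // local to this segment
        }

        int size() {
            return docs.size();
        }
    }

    /** Global ID = total size of the segments ordered before this one + local ID. */
    static int globalId(List<PerThreadSegment> flushOrder, int segmentIndex, int localId) {
        int docBase = 0;
        for (int i = 0; i < segmentIndex; i++) {
            docBase += flushOrder.get(i).size();
        }
        return docBase + localId;
    }

    public static void main(String[] args) {
        PerThreadSegment a = new PerThreadSegment();
        PerThreadSegment b = new PerThreadSegment();
        a.addDocument("a0");
        a.addDocument("a1");
        int local = b.addDocument("b0"); // local ID 0 within segment b

        // The same document gets a different global ID depending on flush order.
        System.out.println(globalId(Arrays.asList(a, b), 1, local)); // 2
        System.out.println(globalId(Arrays.asList(b, a), 0, local)); // 0
    }
}
```

With the shared counter, addDocument() could simply return the value it
just assigned. With per-thread segments the value returned at add time
is only meaningful inside that segment.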

Maybe that helps once 4.0 is there.

simon

> --
> Ian.
>
>
> On Thu, Mar 31, 2011 at 6:34 AM, Trejkaz wrote:
>> On Wed, Mar 30, 2011 at 8:21 PM, Simon Willnauer
>> wrote:
>>> Before trunk (and I think
>>> it's in 3.1 also) merge only merged continuous segments so the actual
>>> per-segment ID might change but the global document ID doesn't if you
>>> only add documents. But this should not be considered a feature. In
>>> upcoming versions this does not work anymore since merges can now be
>>> non-continuous.
>>
>> This myth was busted some time ago:
>> https://issues.apache.org/jira/browse/LUCENE-2506?#comment-12935973
>>
>> Summary: selecting segments to merge is decided by MergePolicy, and a
>> MergePolicy which does not upset ordering will remain in existence.
>>
>>> Anyway, I strongly discourage relying on lucene document IDs; you
>>> should not do this at all. Can't you use your own ID mechanism?
>>
>> This has pretty much already been covered in my reply to the previous
>> person that suggested that solution, not to mention in the initial
>> email which started the thread.
>>
>> Summary: the overheads are simply not acceptable.
>>
>> So far the only remotely helpful suggestion I have heard anywhere is
>> to keep two gigantic int[] arrays in memory, mapping the IDs in each
>> direction.  This would work if we had an infinite amount of memory to
>> play with, but unfortunately we don't.  1 billion item indexes are
>> expected to work, and we can't just tell everyone to buy 8 GB more RAM
>> when we update to the next version of our app.  If we were a
>> server-side app, *maybe* we could...
>>
>> TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
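
For reference, the "8 GB" figure in the quoted message can be checked
with a few lines of arithmetic. This is only a sketch: the class and
method names (IdMapMemory, bytesFor) are made up, and it counts the raw
int payload only, ignoring array headers and any JVM overhead.

```java
// Illustration only: hypothetical names, payload bytes only.
final class IdMapMemory {

    /** Bytes needed for `arrays` int[] arrays holding `docCount` entries each. */
    static long bytesFor(long docCount, int arrays) {
        return docCount * arrays * Integer.BYTES;
    }

    public static void main(String[] args) {
        // Two arrays (one per mapping direction) for a 1 billion document index.
        long bytes = bytesFor(1_000_000_000L, 2);
        System.out.println(bytes + " bytes"); // 8000000000 bytes
        System.out.printf("%.2f GiB%n", bytes / (1024.0 * 1024 * 1024));
    }
}
```

Two arrays of 1 billion 4-byte ints come to 8,000,000,000 bytes, which
matches the figure quoted above.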