From: Ian Lea
Date: Thu, 31 Mar 2011 10:32:02 +0100
Subject: Re: a faster way to addDocument and get the ID just added?
To: java-user@lucene.apache.org

>> Subject: a faster way to addDocument and get the ID just added?

Might it be possible to come up with a version of
IndexWriter.addDocument() that returns the docid rather than void?
Answering that question is way out of my league, but it would
presumably be quick.

--
Ian.

On Thu, Mar 31, 2011 at 6:34 AM, Trejkaz wrote:
> On Wed, Mar 30, 2011 at 8:21 PM, Simon Willnauer
> wrote:
>> Before trunk (and I think
>> it's in 3.1 also) merge only merged contiguous segments so the actual
>> per-segment ID might change but the global document ID doesn't if you
>> only add documents. But this should not be considered a feature. In
>> upcoming versions this does not work anymore since merges can now be
>> non-contiguous.
>
> This myth was busted some time ago:
> https://issues.apache.org/jira/browse/LUCENE-2506?#comment-12935973
>
> Summary: selecting segments to merge is decided by MergePolicy, and a
> MergePolicy which does not upset ordering will remain in existence.
>
>> Anyway, I strongly discourage relying on Lucene document IDs; you
>> should not do this at all. Can't you use your own ID mechanism?
>
> This has pretty much already been covered in my reply to the previous
> person that suggested that solution, not to mention in the initial
> email which started the thread.
>
> Summary: the overheads are simply not acceptable.
>
> So far the only remotely helpful suggestion I have heard anywhere is
> to keep two gigantic int[] arrays in memory, mapping the IDs in each
> direction.  This would work if we had an infinite amount of memory to
> play with, but unfortunately we don't.  1 billion item indexes are
> expected to work, and we can't just tell everyone to buy 8 GB more RAM
> when we update to the next version of our app.  If we were a
> server-side app, *maybe* we could...
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
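For anyone following the thread, here is a rough Java sketch of the
"two gigantic int[] arrays" idea quoted above, mainly to make the memory
arithmetic concrete. IdMapping and all of its method names are
hypothetical and not part of Lucene; the sketch assumes both the Lucene
doc IDs and the application IDs are dense ints in [0, capacity). Two
int[] arrays cost 2 * 4 bytes per item, so 1 billion items come to
8,000,000,000 bytes (about 7.45 GiB), which is where the "8 GB more RAM"
figure comes from.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not a Lucene API): a bidirectional mapping between
 * Lucene doc IDs and application IDs, held in two parallel int[] arrays.
 * Assumes both ID spaces are dense ints in [0, capacity).
 */
final class IdMapping {
    private static final int UNMAPPED = -1;

    private final int[] docToApp; // index: Lucene doc ID, value: application ID
    private final int[] appToDoc; // index: application ID, value: Lucene doc ID

    IdMapping(int capacity) {
        docToApp = new int[capacity];
        appToDoc = new int[capacity];
        Arrays.fill(docToApp, UNMAPPED);
        Arrays.fill(appToDoc, UNMAPPED);
    }

    /** Records the pair in both directions. */
    void put(int docId, int appId) {
        docToApp[docId] = appId;
        appToDoc[appId] = docId;
    }

    /** Returns the application ID for a doc ID, or -1 if unmapped. */
    int appIdFor(int docId) {
        return docToApp[docId];
    }

    /** Returns the doc ID for an application ID, or -1 if unmapped. */
    int docIdFor(int appId) {
        return appToDoc[appId];
    }

    /** Heap needed for the two arrays alone, ignoring object headers. */
    static long estimatedBytes(long itemCount) {
        return 2L * itemCount * Integer.BYTES;
    }

    public static void main(String[] args) {
        IdMapping mapping = new IdMapping(4);
        mapping.put(0, 3);
        mapping.put(1, 2);
        System.out.println(mapping.appIdFor(0));             // 3
        System.out.println(mapping.docIdFor(2));             // 1
        System.out.println(estimatedBytes(1_000_000_000L));  // 8000000000
    }
}
```

Lookups in either direction are a single array read, which is the appeal;
the cost is that both arrays must be resident for the whole index, and
the mapping would have to be rebuilt whenever doc IDs shift.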