From: Ian Lea <ian.lea@gmail.com>
Date: Thu, 31 Mar 2011 10:32:02 +0100
Message-ID: <BANLkTimJ=KsvtnN4jMWEDUx3BAmzXGNisg@mail.gmail.com>
Subject: Re: a faster way to addDocument and get the ID just added?
To: java-user@lucene.apache.org

>> Subject: a faster way to addDocument and get the ID just added?

Might it be possible to come up with a version of
IndexWriter.addDocument() that returns the docid rather than void?
Whether that is feasible is way out of my league, but such a method
would presumably be quick.


--
Ian.


On Thu, Mar 31, 2011 at 6:34 AM, Trejkaz <trejkaz@trypticon.org> wrote:
> On Wed, Mar 30, 2011 at 8:21 PM, Simon Willnauer
> <simon.willnauer@googlemail.com> wrote:
>> Before trunk (and I think
>> it's in 3.1 also) merge only merged contiguous segments so the actual
>> per-segment ID might change but the global document ID doesn't if you
>> only add documents. But this should not be considered a feature. In
>> upcoming versions this does not work anymore since merges can now be
>> non-contiguous.
>
> This myth was busted some time ago:
> https://issues.apache.org/jira/browse/LUCENE-2506?#comment-12935973
>
> Summary: selecting segments to merge is decided by MergePolicy, and a
> MergePolicy which does not upset ordering will remain in existence.
>
>> Anyway, I strongly discourage relying on Lucene document IDs; you
>> should not do this at all. Can't you use your own ID mechanism?
>
> This has pretty much already been covered in my reply to the previous
> person that suggested that solution, not to mention in the initial
> email which started the thread.
>
> Summary: the overheads are simply not acceptable.
>
> So far the only remotely helpful suggestion I have heard anywhere is
> to keep two gigantic int[] arrays in memory, mapping the IDs in each
> direction. This would work if we had an infinite amount of memory to
> play with, but unfortunately we don't. 1 billion item indexes are
> expected to work, and we can't just tell everyone to buy 8 GB more RAM
> when we update to the next version of our app. If we were a
> server-side app, *maybe* we could...
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
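
For reference, the two-array mapping mentioned in the quoted message
could look roughly like the sketch below. This is only an illustration,
not Lucene API: the class name DocIdMap, its methods, and the assumption
that application IDs are dense ints in [0, capacity) are all
hypothetical. The approxBytes helper shows where a figure of roughly
8 GB for 1 billion documents would come from (two arrays of 4-byte
ints), ignoring JVM object overhead.

```java
import java.util.Arrays;

/** Hypothetical two-way map between Lucene doc IDs and application IDs. */
final class DocIdMap {
    private static final int UNMAPPED = -1;

    private final int[] docToApp; // index: Lucene doc ID, value: application ID
    private final int[] appToDoc; // index: application ID, value: Lucene doc ID

    /** Assumes both ID spaces are dense ints in [0, capacity). */
    DocIdMap(int capacity) {
        docToApp = new int[capacity];
        appToDoc = new int[capacity];
        Arrays.fill(docToApp, UNMAPPED);
        Arrays.fill(appToDoc, UNMAPPED);
    }

    /** Records the pair in both directions; does not clear any older mapping. */
    void put(int docId, int appId) {
        docToApp[docId] = appId;
        appToDoc[appId] = docId;
    }

    /** Returns the application ID for a doc ID, or -1 if none was recorded. */
    int appIdFor(int docId) {
        return docToApp[docId];
    }

    /** Returns the doc ID for an application ID, or -1 if none was recorded. */
    int docIdFor(int appId) {
        return appToDoc[appId];
    }

    /** Approximate heap needed for both arrays, ignoring object headers. */
    static long approxBytes(long entries) {
        return 2L * entries * Integer.BYTES;
    }
}
```

With 1,000,000,000 entries, approxBytes gives 8,000,000,000 bytes, which
is consistent with the "8 GB more RAM" concern raised above.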
