From: Mark Miller
Date: Wed, 07 Oct 2009 16:02:47 -0400
To: java-user@lucene.apache.org
Subject: Re: Efficiently reopening remotely-distributed indexes in 2.9?
Message-ID: <4ACCF3E7.7010500@gmail.com>
In-Reply-To: <843920a30910071251q2e7850b7yd15edff35fd88d87@mail.gmail.com>

Solr just copies them into the same directory - Lucene files are write
once, so it's not much different than what happens locally.

Nigel wrote:
> Right now we logically re-open an index by making an updated copy of the
> index in a new directory (using rsync etc.), opening the new copy, and
> closing the old one. We don't use IndexReader.reopen() because the updated
> index is in a different directory (as opposed to being updated in-place).
>
> (Reading about some of the 2.9 changes motivated me to look into actually
> using reopen().
> And Michael Busch and Mark Miller both pointed out that I
> was incorrect in saying that pre-2.9 reopen() wasn't more efficient than
> just opening a new index -- I've read through that code now so I have at
> least a basic understanding of what's happening there. Anyway, it seems
> like reopen() is a Good Thing, so I'd like to use it. (-:)
>
> So, my real question was whether there is a "recommended" way to update an
> index in-place with files copied from a separate indexing server.
>
> For example, do you simply rsync in the new cfs files, overwrite the
> segments.gen and segments_XX files, and call reopen()? Or create an updated
> copy in a new directory, then rename the new directory to the old name once
> you're sure you've copied everything successfully, then call reopen()? What
> does Solr do?
>
> Thanks,
> Chris
>
> On Mon, Oct 5, 2009 at 8:39 PM, Jason Rutherglen wrote:
>
>> I'm not sure I understand the question. You're trying to reopen
>> the segments that you've replicated and you're wondering what's
>> changed in Lucene?
>>
>> On Mon, Oct 5, 2009 at 5:30 PM, Nigel wrote:
>>
>>> Anyone have any ideas here? I imagine a lot of other people will have a
>>> similar question when trying to take advantage of the reopen improvements
>>> in 2.9.
>>>
>>> Thanks,
>>> Chris
>>>
>>> On Thu, Oct 1, 2009 at 5:15 PM, Nigel wrote:
>>>
>>>> I have a question about the reopen functionality in Lucene 2.9. As I
>>>> understand it, since FieldCaches are now per-segment, it can avoid
>>>> reloading everything when the index is reopened, and instead just load
>>>> the new segments.
>>>>
>>>> For background, like many people we have a distributed architecture
>>>> where indexes are created on one server and copied to multiple other
>>>> servers. The way that copying works now is something like the following:
>>>>
>>>> 1. Let's say the current index is in /indexes/a and is open
>>>> 2. An empty directory for the updated index is created, let's say
>>>>    /indexes/b
>>>> 3. Hard links for the files in /indexes/a are created in /indexes/b
>>>> 4. We rsync the current index on the server with /indexes/b, thus
>>>>    copying over new cfs files and deleting hard links to files no
>>>>    longer in use
>>>> 5. A new IndexReader is opened for /indexes/b and warmed up
>>>> 6. The application starts using the new reader instead of the old one
>>>> 7. The old IndexReader is closed and /indexes/a is deleted
>>>>
>>>> I'm simplifying a few steps, but I think this is familiar to many
>>>> people, and it's my impression that Solr implements something similar.
>>>>
>>>> The point is, the updated index lives in a new directory in this scheme,
>>>> and so we don't actually reopen the existing IndexReader; we open a new
>>>> one with a different FSDirectory.
>>>>
>>>> Before Lucene 2.9, I don't think this made any difference, as (I think)
>>>> the only advantage to calling reopen vs. just creating another
>>>> IndexReader was having reopen figure out whether the index had actually
>>>> changed. (And we have a different way to figure that out, so it was a
>>>> non-issue.)
>>>>
>>>> With Lucene 2.9, there's now a big difference, namely the per-segment
>>>> caching mentioned above. So the question is how to make use of reopen
>>>> with our distribution scheme. Is there an informal best practice for
>>>> handling this case? For example, should step #5 above rename /indexes/b
>>>> to /indexes/a so the index can be reopened in the same physical
>>>> location? Or should rsync operate on the existing directory in-place,
>>>> updating the segments* files last and relying on the fact that deleted
>>>> files will not really be deleted (on Linux, at least) as long as the app
>>>> is still holding them open?
>>>>
>>>> I guess the answer may depend on how exactly reopen knows which files
>>>> are the "same" (e.g. does it look at filenames, or file descriptors,
>>>> etc.).
>>>>
>>>> Thanks,
>>>> Chris
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org

--
- Mark

http://www.lucidimagination.com
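
As a rough illustration of the in-place approach discussed in this thread (copy the new files into the live index directory, with the segments* files last), here is a minimal Java sketch using only java.nio.file. The class InPlaceIndexSync and its methods are hypothetical and are not part of Lucene or Solr; this is not how Solr's replication is actually implemented. It assumes that a file already present in the live directory under the same name is identical to the source copy (index files being write once), and that segments.gen is the only file that gets overwritten.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene or Solr.
final class InPlaceIndexSync {

    // Commit-point files (segments_XX, segments.gen) should arrive last,
    // so a reader never sees a commit that refers to files not yet copied.
    static boolean isSegmentsFile(String name) {
        return name.startsWith("segments");
    }

    // Copies new files from source into live and returns the names of the
    // files that were copied, in the order they were copied.
    static List<String> sync(Path source, Path live) throws IOException {
        List<Path> dataFiles = new ArrayList<>();
        List<Path> commitFiles = new ArrayList<>();
        Path gen = null;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(source)) {
            for (Path file : stream) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                String name = file.getFileName().toString();
                if (name.equals("segments.gen")) {
                    gen = file;
                } else if (isSegmentsFile(name)) {
                    commitFiles.add(file);
                } else {
                    dataFiles.add(file);
                }
            }
        }

        List<String> copied = new ArrayList<>();
        // 1. Segment data files (e.g. *.cfs) that are not present yet.
        for (Path file : dataFiles) {
            copyIfMissing(file, live, copied);
        }
        // 2. New segments_XX files.
        for (Path file : commitFiles) {
            copyIfMissing(file, live, copied);
        }
        // 3. segments.gen last; assumed to be the only file overwritten.
        if (gen != null) {
            Files.copy(gen, live.resolve("segments.gen"),
                       StandardCopyOption.REPLACE_EXISTING);
            copied.add("segments.gen");
        }
        return copied;
    }

    // Assumption: an existing file with the same name is already identical
    // (write once), so it is skipped rather than compared or replaced.
    private static void copyIfMissing(Path file, Path live, List<String> copied)
            throws IOException {
        Path target = live.resolve(file.getFileName());
        if (Files.exists(target)) {
            return;
        }
        Files.copy(file, target);
        copied.add(file.getFileName().toString());
    }
}
```

After something like InPlaceIndexSync.sync(stagingDir, liveDir) returns, the application would call IndexReader.reopen() on its existing reader and switch to the returned reader if it is a different instance. The sketch does not remove files that are no longer referenced and does not verify sizes or checksums the way rsync would.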