From: "Jason Rutherglen"
To: java-dev@lucene.apache.org
Subject: Re: Realtime Search
Date: Wed, 24 Dec 2008 10:23:48 -0800
Message-ID: <85d3c3b60812241023u72b01463r38bb9b3d1ffd8ac@mail.gmail.com>
In-Reply-To: <49527708.4090109@apache.org>

> Also, what are the requirements?  Must a document be visible to search within 10ms of being added?

Within 0-5ms.  Otherwise it's not realtime, it's batch indexing.  The realtime system can still support small batches by encoding them into RAMDirectories when they are large enough.

> Or must it be visible to search from the time that the call to add it returns?

Most people probably expect the update latency offered by SQL databases.

> As a baseline, how fast is it to simply use RAMDirectory?

It depends on how fast searches over the realtime index need to be.  Search slows down when there are many small segments that must be continuously decoded (terms, postings, etc.).  The advantage of MemoryIndex and InstantiatedIndex is an actual increase in search speed compared with RAMDirectory (see the Performance Notes at http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/index/memory/MemoryIndex.html), and there is no need to continuously decode short-lived segments.
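
To make the small-segment point concrete, here is a toy model in plain Java. It is not Lucene code: the class SegmentFanout and its methods are made up for this sketch, and it only counts how many segments a term lookup has to visit. It does not model the cost of decoding terms and postings, which is the larger concern with real segments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of per-segment search cost. Hypothetical names, not Lucene code. */
class SegmentFanout {
    /** Builds one tiny "segment" holding a single document: term -> doc ids. */
    static Map<String, List<Integer>> singleDocSegment(int docId, String... terms) {
        Map<String, List<Integer>> segment = new HashMap<>();
        for (String term : terms) {
            segment.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
        }
        return segment;
    }

    /** Collects hits for a term and returns how many segments had to be probed. */
    static int search(List<Map<String, List<Integer>>> segments, String term, List<Integer> hits) {
        int probes = 0;
        for (Map<String, List<Integer>> segment : segments) {
            probes++;
            List<Integer> postings = segment.get(term);
            if (postings != null) {
                hits.addAll(postings);
            }
        }
        return probes;
    }

    /** Folds every segment into a single one, as a merge would. */
    static Map<String, List<Integer>> merge(List<Map<String, List<Integer>>> segments) {
        Map<String, List<Integer>> merged = new HashMap<>();
        for (Map<String, List<Integer>> segment : segments) {
            for (Map.Entry<String, List<Integer>> e : segment.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // One segment per added document, as with a flush after every add.
        List<Map<String, List<Integer>>> segments = new ArrayList<>();
        for (int doc = 0; doc < 100; doc++) {
            segments.add(singleDocSegment(doc, "realtime", "doc" + doc));
        }
        List<Integer> hits = new ArrayList<>();
        int probes = search(segments, "realtime", hits);
        System.out.println(probes + " probes, " + hits.size() + " hits"); // 100 probes, 100 hits

        // The same documents after merging into a single segment.
        List<Map<String, List<Integer>>> single = new ArrayList<>();
        single.add(merge(segments));
        hits.clear();
        probes = search(single, "realtime", hits);
        System.out.println(probes + " probes, " + hits.size() + " hits"); // 1 probes, 100 hits
    }
}
```

The hit count is the same either way; what changes is the per-query work, which grows with the number of live segments until a merge catches up.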

Anecdotal tests indicated that the merging overhead of RAMDirectory, compared with MemoryIndex (MI) or InstantiatedIndex (II), is significant enough that it is only useful for batches in the thousands, which does not seem to be what people expect from realtime search.
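
For reference, the baseline from the quoted message below (flush a small segment with no merging, reopen the reader, merge in the background) can be modelled roughly like this. Again a toy in plain Java rather than Lucene code: ToyRealtimeIndex, Snapshot and mergeAll() are names invented for the sketch, flush() stands in for writing a small segment to a RAMDirectory, and reopen() stands in for IndexReader.reopen().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "flush a small segment, then reopen". Hypothetical names, not Lucene code. */
class ToyRealtimeIndex {
    /** A point-in-time view over flushed segments, standing in for a reopened reader. */
    static final class Snapshot {
        private final List<Map<String, List<Integer>>> segments;

        Snapshot(List<Map<String, List<Integer>>> segments) {
            this.segments = segments;
        }

        List<Integer> search(String term) {
            List<Integer> hits = new ArrayList<>();
            for (Map<String, List<Integer>> segment : segments) {
                hits.addAll(segment.getOrDefault(term, Collections.emptyList()));
            }
            return hits;
        }

        int segmentCount() {
            return segments.size();
        }
    }

    private final List<Map<String, List<Integer>>> flushed = new ArrayList<>();
    private Map<String, List<Integer>> buffer = new HashMap<>();
    private int nextDocId = 0;

    /** Buffers a document; it is not searchable until flush() and reopen(). */
    synchronized int add(String... terms) {
        int id = nextDocId++;
        for (String term : terms) {
            buffer.computeIfAbsent(term, k -> new ArrayList<>()).add(id);
        }
        return id;
    }

    /** Turns the buffered documents into one new small segment, with no merging. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushed.add(buffer);
        buffer = new HashMap<>(); // the flushed map is never modified again
    }

    /** Returns a fresh view; snapshots handed out earlier are unaffected. */
    synchronized Snapshot reopen() {
        return new Snapshot(new ArrayList<>(flushed));
    }

    /** Collapses all flushed segments into one; meant to run off the search path. */
    synchronized void mergeAll() {
        Map<String, List<Integer>> merged = new HashMap<>();
        for (Map<String, List<Integer>> segment : flushed) {
            for (Map.Entry<String, List<Integer>> e : segment.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
            }
        }
        flushed.clear();
        if (!merged.isEmpty()) {
            flushed.add(merged);
        }
    }

    public static void main(String[] args) {
        ToyRealtimeIndex index = new ToyRealtimeIndex();
        Snapshot before = index.reopen();
        index.add("realtime", "search");
        index.flush();
        Snapshot after = index.reopen();
        System.out.println(before.search("realtime")); // prints []
        System.out.println(after.search("realtime"));  // prints [0]
    }
}
```

In this model the update latency is the time from add() to the next flush() plus reopen(), so the 10ms question comes down to how often that pair can run, and every flush adds one more segment that searches must visit until mergeAll() runs.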

On Wed, Dec 24, 2008 at 9:53 AM, Doug Cutting <cutting@apache.org> wrote:
> Jason Rutherglen wrote:
>> 2) Implement realtime search by incrementally creating and merging readers in memory.  The system would use MemoryIndex or InstantiatedIndex to quickly (more quickly than RAMDirectory) create indexes from added documents.
>
> As a baseline, how fast is it to simply use RAMDirectory?  If one, e.g., flushes changes every 10ms or so, and has a background thread that uses IndexReader.reopen() to keep a fresh version for reading?
>
> Also, what are the requirements?  Must a document be visible to search within 10ms of being added?  Or must it be visible to search from the time that the call to add it returns?  In the latter case one might still use an approach like the above.  Writing a small new segment to a RAMDirectory and then, with no merging, calling IndexReader.reopen(), should be quite fast.  All merging could be done in the background, as should post-merge reopens() that involve large segments.
>
> In short, I wonder if new reader and writer implementations are in fact required or whether, perhaps with a few optimizations, the existing implementations might meet this need.
>
> Doug


