Date: Wed, 24 Aug 2011 22:53:31 +0000 (UTC)
From: "Jason Rutherglen (JIRA)"
To: dev@lucene.apache.org
Message-ID: <1793101719.11414.1314226411636.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (LUCENE-2312) Search on IndexWriter's RAM Buffer

     [ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2312:
-------------------------------------
    Attachment: LUCENE-2312.patch

This is a revised version of the LUCENE-2312 patch. The following are various miscellaneous notes on the patch and on what it still needs before it can be committed.

Feel free to review the approach taken, e.g., we're getting around non-realtime structures by using array copies (the arrays can be pooled at some point).

* A copy of FreqProxPostingsArray.termFreqs is made per new reader. That array can be pooled. This is no different from the deleted docs BitVector, which is created anew per segment for any deletes that have occurred.
* The FreqProxPostingsArray freqUptosRT, proxUptosRT, lastDocIDsRT, and lastDocFreqsRT arrays are copied into for each new reader (as opposed to an entirely new array being instantiated for each new reader). This is a slight optimization in object allocation.
* For deleting, a DWPT is wrapped in an abstract class that exposes the necessary methods from segment info, so that deletes may be applied to the RT RAM reader. The deleting is still performed in BufferedDeletesStream. BitVectors are cloned as well. There is room for improvement, e.g., pooling the BV byte[]'s.
* Documents (FieldsWriter) and term vectors are flushed on each get reader call, so that reading will be able to load the data. We will need to test whether this performs well. We are not creating new files, so this way of doing things may well be efficient.
* We need to measure the cost of the native system array copy. It could very well be fast enough.
* Full posting functionality should be working, including payloads.
* Field caching may be implemented as a new field cache that is growable and enables locked replacement of the underlying array.
* String to string ordinal comparison caches need to be figured out.
  The RAM readers cannot maintain a sorted terms index the way statically sized segments do.
* When a field cache value is first being created, it needs to obtain the indexing lock on the DWPT. Otherwise documents will continue to be indexed and new values created, while the array will miss the new values. The downside is that while the array is initially being created, indexing will stop. This can probably be solved at some point by only locking during the creation of the field cache array, and then notifying the DWPT of the new array. New values would then accumulate into the array starting from the max doc of the reader the values creator is working from.
* The terms dictionary is a ConcurrentSkipListMap. We can periodically convert it into a sorted [by term] int[] that has an FST on top.

Have fun reviewing! :)

> Search on IndexWriter's RAM Buffer
> ----------------------------------
>
>                 Key: LUCENE-2312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2312
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>            Assignee: Michael Busch
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch, LUCENE-2312.patch
>
>
> In order to offer users near realtime search without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable.
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing.
> Michael Busch has good suggestions regarding how to handle deletes using max doc ids.
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here:
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915
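
The array-copy approach from the notes above (copy the live postings arrays for each new reader, reusing an existing destination array where possible) could be sketched roughly as follows. This is a minimal, single-threaded illustration and not code from the patch: GrowingIntColumn, snapshotInto, and the caller-supplied buffer are hypothetical names, and a real implementation would have to coordinate the copy with the indexing thread.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for one growing postings column (for example termFreqs).
 * It is NOT Lucene's FreqProxPostingsArray; it only illustrates the copy step.
 */
final class GrowingIntColumn {
    private int[] values = new int[4];
    private int size;

    /** Appends one value, doubling the backing array when it is full. */
    void add(int v) {
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2);
        }
        values[size++] = v;
    }

    /** Overwrites an existing slot, as indexing does when a term's frequency changes. */
    void set(int index, int v) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        values[index] = v;
    }

    /** Number of valid entries; a reader would record this alongside its copy. */
    int size() {
        return size;
    }

    /**
     * Copies the live values into dest when dest is large enough (the "copied into"
     * case, which avoids an allocation); otherwise allocates a fresh array.
     * Later writes to this column do not affect the returned array.
     */
    int[] snapshotInto(int[] dest) {
        int[] target = (dest != null && dest.length >= size) ? dest : new int[size];
        System.arraycopy(values, 0, target, 0, size);
        return target;
    }
}
```

Handing back the array of a closed reader as dest is one way the pooling mentioned in the notes might be done; whether the copy is cheap enough still has to be measured, as noted above.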