From: Shai Erera
Date: Wed, 2 Jul 2014 16:59:23 +0300
Subject: Re: Incremental Field Updates
To: "java-user@lucene.apache.org", Sandeep Khanzode

Using BinaryDocValues is not recommended for all scenarios. It is a
"catchall" alternative to the other DocValues types. I would not use it
unless it makes sense for your application, even if it means that you
need to re-index a document in order to update a single field.

DocValues are not good for "search" - by search I assume you mean taking
a query such as "apache AND lucene" and finding all documents which
contain both terms under the same field. They are good for sorting and
faceting though.

So I guess the answer to your question is "it depends" (it always is!) -
I would use DocValues for sorting and faceting, but not for regular
search queries. And I would use BinaryDocValues only when the other
DocValues types don't match.

Also, note that the current field-level update of DocValues is not
always better than re-indexing the document; you can read here for more
details:
http://shaierera.blogspot.com/2014/04/benchmarking-updatable-docvalues.html

Shai

On Tue, Jul 1, 2014 at 9:17 PM, Sandeep Khanzode
<sandeep_khanzode@yahoo.com.invalid> wrote:

> Hi Shai,
>
> So one follow-up question.
>
> Assume that my use case is to have approx.
> 50M documents indexed, with each document having about 10-15 indexed
> but not stored fields. These fields will never change, but there are
> another 5-6 fields that will change and will continue to change after
> the index is written. These 5-6 fields may also be multivalued. The
> size of this index turns out to be ~120GB.
>
> In this case, I would like to sort or facet or search on these 5-6
> fields. Which approach do you suggest? Should I use BinaryDocValues
> and update using IW, or use either a ParallelReader or a Join query?
>
> -----------------------
> Thanks n Regards,
> Sandeep Ramesh Khanzode
>
> On Tuesday, July 1, 2014 9:53 PM, Shai Erera wrote:
>
> Except that Lucene now offers efficient numeric and binary DocValues
> updates. See IndexWriter.updateNumeric/Binary...
>
> On Jul 1, 2014 5:51 PM, "Erick Erickson" wrote:
>
> > This JIRA is "complicated"; don't really expect it in 4.9, as it's
> > been hanging around for quite a while. Everyone would like this,
> > but it's not easy.
> >
> > Atomic updates will work, but you have to set stored="true" for all
> > source fields. Under the covers this actually reads the document
> > out of the stored fields, deletes the old one and adds it
> > over again.
> >
> > FWIW,
> > Erick
> >
> > On Tue, Jul 1, 2014 at 5:32 AM, Sandeep Khanzode wrote:
> > > Hi,
> > >
> > > I wanted to know of the best approach to follow if a few fields in
> > > my indexed documents are changing at run time (after index and
> > > before or during search), but a majority of them are created at
> > > index time.
> > >
> > > I could see the JIRA given below but it is scheduled for Lucene
> > > 4.9, I believe.
> > >
> > > There are a few other approaches, like maintaining a separate
> > > index for changing fields and using either a ParallelReader or a
> > > Join.
> > >
> > > Can everyone share their experience for this scenario on how it is
> > > handled in your systems?
> > > Thanks,
> > >
> > > [LUCENE-4258] Incremental Field Updates through Stacked Segments -
> > > ASF JIRA
> > > Shai and I would like to start working on the proposal to
> > > Incremental Field Updates outlined here
> > > (http://markmail.org/message/zhrdxxpfk6qvdaex).
> > >
> > > -----------------------
> > > Thanks n Regards,
> > > Sandeep Ramesh Khanzode
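
A note on the two update styles discussed in this thread: Erick describes atomic updates as reading the document out of the stored fields, deleting the old one and adding it again, while Shai points to updating a per-document value directly. The sketch below contrasts the two in a toy in-memory model written in plain Java. It is not Lucene code and makes no claim about how Lucene stores anything. The class FieldUpdateSketch and its methods add, reindexWith and updateColumn are hypothetical names invented for illustration. Real updates would go through the IndexWriter methods Shai mentions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; every name here is hypothetical and unrelated to Lucene's API. */
final class FieldUpdateSketch {
    // Row-oriented "stored fields": one map per document; null marks a deleted document.
    final List<Map<String, String>> stored = new ArrayList<>();
    // Column-oriented value for a single numeric field, indexed by document ID.
    long[] column = new long[0];

    /** Adds a document and returns its new document ID. */
    int add(Map<String, String> fields, long value) {
        stored.add(new HashMap<>(fields));
        column = Arrays.copyOf(column, stored.size());
        column[stored.size() - 1] = value;
        return stored.size() - 1;
    }

    /** Re-index style: read the stored fields, delete the old document, add it again. */
    int reindexWith(int docId, String field, String newValue) {
        Map<String, String> copy = new HashMap<>(stored.get(docId));
        copy.put(field, newValue);
        long value = column[docId];
        stored.set(docId, null);   // the old document is marked deleted
        return add(copy, value);   // the changed copy gets a new document ID
    }

    /** Column style: overwrite one value; the document itself is untouched. */
    void updateColumn(int docId, long newValue) {
        column[docId] = newValue;
    }

    public static void main(String[] args) {
        FieldUpdateSketch index = new FieldUpdateSketch();
        int doc = index.add(Collections.singletonMap("title", "lucene"), 1L);
        index.updateColumn(doc, 2L);
        int moved = index.reindexWith(doc, "title", "apache lucene");
        System.out.println("old id " + doc + ", new id " + moved
                + ", value " + index.column[moved]);
    }
}
```

In this toy model the column update touches a single array slot, whereas the re-index path copies every stored field and leaves a deleted entry behind. As Shai's benchmark post cautions, which path is cheaper in a real index depends on the workload.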