From: "Michael McCandless (JIRA)"
To: dev@lucene.apache.org
Date: Mon, 10 May 2010 05:02:51 -0400 (EDT)
Subject: [jira] Commented: (LUCENE-1585) Allow to control how payloads are merged
Message-ID: <9775518.62181273482171135.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/LUCENE-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865707#action_12865707 ]

Michael McCandless commented on LUCENE-1585:
--------------------------------------------

Make sure you fix the whitespace --
some indents are now tabs or 8 spaces, but should be 2.

bq. I believe the common use will be few PPs that handle few terms.

Or, maybe even more common will be per-Directory switching and ignoring the Term? EG if I changed my payload format (for all terms) at some point...

Though we don't have great support for versioning of payloads during searching... eg PayloadTermQuery doesn't make it simple to figure out which Dir you are now searching...

My only concern w/ this API is that it has a built-in unnecessary global perf/synchronization cost, by design: I'll have to use a sync'd map or a thread local to implement that method. Even if my app ignores the Term, I'll need to sync. This sync is global -- all merges running concurrently, per Term, will share a single global lock.

But it's only the Dir lookup that requires sync. So if, instead, the Dir lookup and the Term lookup were separate method calls, I'd only need sync on the Dir lookup (called very rarely -- once per segment at the start of the merge). The Term lookup, called far far more often, is guaranteed to be thread private so it'd need no sync.

I guess in practice the sync cost may not be such a big deal? So maybe we could commit w/ this approach (it is experimental), even with this limitation? It's just that I don't like adding APIs which make our concurrency worse... we are supposed to be moving in the other direction :)

> Allow to control how payloads are merged
> ----------------------------------------
>
>                 Key: LUCENE-1585
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1585
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-1585_3x.patch, LUCENE-1585_3x.patch, LUCENE-1585_3x.patch, LUCENE-1585_trunk.patch
>
>
> Lucene handles backwards-compatibility of its data structures by
> converting them from the old into the new formats during segment
> merging.
> Payloads are simply byte arrays in which users can store arbitrary
> data. Applications that use payloads might want to convert the format
> of their payloads in a similar fashion. Otherwise it's not easily
> possible to ever change the encoding of a payload without reindexing.
> So I propose to introduce a PayloadMerger class that the SegmentMerger
> invokes to merge the payloads from multiple segments. Users can then
> implement their own PayloadMerger to convert payloads from an old into
> a new format.
> In the future we need this kind of flexibility also for column-stride
> fields (LUCENE-1231) and flexible indexing codecs.
> In addition to that it would be nice if users could store version
> information in the segments file. E.g. they could store "in segment _2
> the term a:b uses payloads of format x.y".

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
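
For illustration only, the split suggested in the comment above (a Dir lookup that may need sync, followed by a Term lookup that is thread private) could look roughly like the sketch below. Every name here is hypothetical and is not taken from the LUCENE-1585 patches: plain Strings stand in for Directory and Term, a ConcurrentHashMap stands in for the "sync'd map", and the "upgrade" of an old payload (prepending a version byte) is a toy conversion invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these types exist in Lucene.

/** Per-term step: obtained once per merge and used by a single thread, so it takes no lock. */
interface TermPayloadRewriter {
  /** Returns the payload to write for this term (a String stands in for a Term). */
  byte[] rewrite(String term, byte[] payload);
}

/** Per-directory step: shared by all concurrent merges, so only this lookup is thread-safe. */
final class DirRewriterProvider {
  // Assumed bookkeeping: which payload format version each directory was written with.
  private final Map<String, Integer> formatVersionByDir = new ConcurrentHashMap<>();
  // Counts directory lookups, only to show how rarely this path runs.
  final AtomicInteger dirLookups = new AtomicInteger();

  void registerDir(String dirName, int formatVersion) {
    formatVersionByDir.put(dirName, formatVersion);
  }

  /** Intended to be called once per segment at the start of a merge. */
  TermPayloadRewriter forDir(String dirName) {
    dirLookups.incrementAndGet();
    // Unregistered directories are assumed to already use the current format (2).
    final int version = formatVersionByDir.getOrDefault(dirName, 2);
    // The rewriter captures only an immutable int, so per-term calls need no sync.
    return (term, payload) -> version == 1 ? upgradeV1(payload) : payload;
  }

  /** Toy conversion: the old format is assumed to lack a leading version byte. */
  private static byte[] upgradeV1(byte[] old) {
    byte[] out = new byte[old.length + 1];
    out[0] = 2;
    System.arraycopy(old, 0, out, 1, old.length);
    return out;
  }
}
```

In this shape a merge thread would call forDir(...) once and then call rewrite(...) for every term, so the shared map is touched once per segment rather than once per term. Whether that difference matters in practice is the open question raised in the comment.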