From: Chetan Mehrotra
Date: Tue, 2 Jan 2018 12:03:24 +0530
Subject: Comparing two indexes for equality - Finding non stored fieldNames per document
To: java-user@lucene.apache.org

Hi,

We use Lucene for indexing in Jackrabbit Oak [2]. Recently we implemented a new indexing approach [1] which traverses the data to be indexed in a different order than the traversal approach we have been using so far. The new approach is faster and produces an index with the same number of documents.

Some notes about the index
--------------------------

- The Lucene index has only one stored field, ':path', which holds the path of the node in the repository.
- The content being indexed is unstructured, so the set of fields present may differ from document to document.
- The Lucene version is 4.7.x.
- Both approaches index a given node in the same way. It is only the traversal order that differs.

Now we need to compare the index produced by the earlier approach with the one produced by the newer approach, to determine whether the generated index is the "same". As the data is traversed in a different order, the document ids differ between the two indexes, and hence the final index size also differs to some extent.

So I would like to implement logic that can logically compare two indexes. One way could be to check whether the document with a given path has the same field names associated with it in both indexes. However, as the fields are not stored, it is not possible to determine the field names per document.

Questions
---------

1. Is there any way to find the field names (not the values) associated with a given document?
2. Is there any other way to logically compare the index data of two indexes which are generated using different approaches but index the same content?

Chetan Mehrotra

[1] https://issues.apache.org/jira/browse/OAK-6353
[2] http://jackrabbit.apache.org/oak/docs/query/lucene.html
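
A minimal sketch of the path-keyed comparison described above, to make the idea concrete. It does not use the Lucene API: `ToyIndex`, `fields`, `fieldNamesByPath` and `differingPaths` are hypothetical names, and the in-memory maps are only an assumed stand-in for an index's postings and its stored ':path' field. The point it illustrates is that walking the postings (field -> term -> doc ids) and regrouping by path gives a field-name set per document that does not depend on doc id order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class IndexFieldCompare {

    /** Hypothetical in-memory stand-in for one index (not a Lucene class). */
    public static final class ToyIndex {
        // field name -> term -> ids of the documents containing that term
        final Map<String, Map<String, List<Integer>>> postings = new HashMap<>();
        // doc id -> value of the single stored field ':path'
        final Map<Integer, String> paths = new HashMap<>();

        public void addDoc(int docId, String path, Map<String, String> fields) {
            paths.put(docId, path);
            for (Map.Entry<String, String> f : fields.entrySet()) {
                postings.computeIfAbsent(f.getKey(), k -> new HashMap<>())
                        .computeIfAbsent(f.getValue(), k -> new ArrayList<>())
                        .add(docId);
            }
        }
    }

    /** Helper: builds a field map from alternating name/value arguments. */
    public static Map<String, String> fields(String... kv) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    /** "Uninverts" the postings: path -> names of the fields indexed for that document. */
    public static Map<String, Set<String>> fieldNamesByPath(ToyIndex index) {
        Map<String, Set<String>> result = new TreeMap<>();
        // Start every known path with an empty set so documents without fields still appear.
        for (String path : index.paths.values()) {
            result.put(path, new TreeSet<>());
        }
        for (Map.Entry<String, Map<String, List<Integer>>> field : index.postings.entrySet()) {
            for (List<Integer> docIds : field.getValue().values()) {
                for (int docId : docIds) {
                    result.get(index.paths.get(docId)).add(field.getKey());
                }
            }
        }
        return result;
    }

    /** Paths that are missing from one index or whose field-name sets differ. */
    public static SortedSet<String> differingPaths(ToyIndex a, ToyIndex b) {
        Map<String, Set<String>> fa = fieldNamesByPath(a);
        Map<String, Set<String>> fb = fieldNamesByPath(b);
        SortedSet<String> allPaths = new TreeSet<>(fa.keySet());
        allPaths.addAll(fb.keySet());
        SortedSet<String> diff = new TreeSet<>();
        for (String path : allPaths) {
            if (!Objects.equals(fa.get(path), fb.get(path))) {
                diff.add(path);
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        // Same content, indexed in a different order, so doc ids differ.
        ToyIndex oldIndex = new ToyIndex();
        oldIndex.addDoc(0, "/a", fields("title", "x"));
        oldIndex.addDoc(1, "/b", fields("title", "y", "tags", "z"));

        ToyIndex newIndex = new ToyIndex();
        newIndex.addDoc(0, "/b", fields("title", "y", "tags", "z"));
        newIndex.addDoc(1, "/a", fields("title", "x"));

        System.out.println("Differing paths: " + differingPaths(oldIndex, newIndex));
    }
}
```

The same walk could collect the terms per field instead of only the field names, which would give a stricter comparison at the cost of more memory. Whether this scales to a real index would need to be measured.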