lucene-java-user mailing list archives

From Dawid Weiss <dawid.we...@gmail.com>
Subject Re: Comparing two indexes for equality - Finding non stored fieldNames per document
Date Tue, 02 Jan 2018 07:33:58 GMT
How about the quickest solution: dump the content of both indexes to
document-per-line text files, sort them, and diff?

Even if your indexes are large, this will be super fast as long as you
have enough spare disk space.

Dawid
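
The dump, sort, diff idea above can be sketched in plain Java. This is only an illustration under assumptions: the class and method names are hypothetical, the per-document field/term data is assumed to have been extracted already (the message does not say how, and for non-stored fields it would have to come from the inverted index), and everything is held in memory, whereas for large indexes the files would more likely be sorted and compared on disk with external tools.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene or Oak.
class IndexDumpDiff {

    /**
     * Builds one canonical text line for a document: its path, followed by
     * field=term,term pairs. Fields and terms are sorted so the line does not
     * depend on the order in which the data was collected.
     * Assumes paths, field names and terms contain no tab or newline characters.
     */
    static String canonicalLine(String path, Map<String, List<String>> fieldTerms) {
        StringBuilder sb = new StringBuilder(path);
        Map<String, List<String>> sortedFields = new TreeMap<String, List<String>>(fieldTerms);
        for (Map.Entry<String, List<String>> e : sortedFields.entrySet()) {
            List<String> terms = new ArrayList<String>(e.getValue());
            Collections.sort(terms);
            sb.append('\t').append(e.getKey()).append('=').append(String.join(",", terms));
        }
        return sb.toString();
    }

    /**
     * Sorts both dumps and walks them in parallel. Returns the lines found in
     * only one dump, prefixed with "< " (only in dumpA) or "> " (only in dumpB).
     * An empty result means the two dumps contain the same lines.
     */
    static List<String> diff(List<String> dumpA, List<String> dumpB) {
        List<String> a = new ArrayList<String>(dumpA);
        List<String> b = new ArrayList<String>(dumpB);
        Collections.sort(a);
        Collections.sort(b);

        List<String> out = new ArrayList<String>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int c = a.get(i).compareTo(b.get(j));
            if (c == 0) {
                i++;
                j++;
            } else if (c < 0) {
                out.add("< " + a.get(i++));
            } else {
                out.add("> " + b.get(j++));
            }
        }
        while (i < a.size()) {
            out.add("< " + a.get(i++));
        }
        while (j < b.size()) {
            out.add("> " + b.get(j++));
        }
        return out;
    }
}
```

Because each line starts with the document path, sorting lines up the same node in both dumps regardless of traversal order, so document ids never enter the comparison.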

On Tue, Jan 2, 2018 at 7:33 AM, Chetan Mehrotra
<chetan.mehrotra@gmail.com> wrote:
> Hi,
>
> We use Lucene for indexing in Jackrabbit Oak [2]. Recently we
> implemented a new indexing approach [1] which traverses the data to be
> indexed in a different way compared to the traversal approach we have
> been using so far. The new approach is faster and produces an index
> with the same number of documents.
>
> Some notes about the index
> ------------------------------------
>
> - The Lucene index has only one stored field, ':path', for the path of the node in the repository.
> - The content being indexed is unstructured, so the presence of fields may differ
> - Lucene version 4.7.x
> - Both approaches would index a given node in the same way. It's just the
> traversal order which differs
>
> Now we need to compare the index produced by the earlier approach
> with the one produced by the newer approach to determine if the
> generated index is the "same". As the indexed data is traversed in a
> different order, the document ids differ between the two indexes and
> hence the final size differs to some extent.
>
> So I would like to implement logic which can logically compare the two
> indexes. One way could be to check whether a document with a given path
> has the same field names associated with it in both indexes. However, as
> the fields are not stored, it's not possible to determine the field
> names per document.
>
> Questions
> --------------
>
> 1. Is there any way to find the field names (not the values) associated
> with a given document?
> 2. Is there any other way to logically compare the index data of two
> indexes which are generated using different approaches but index the
> same content?
>
> Chetan Mehrotra
> [1] https://issues.apache.org/jira/browse/OAK-6353
> [2] http://jackrabbit.apache.org/oak/docs/query/lucene.html
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
