lucene-java-user mailing list archives

From Chetan Mehrotra <chetan.mehro...@gmail.com>
Subject Re: Comparing two indexes for equality - Finding non stored fieldNames per document
Date Fri, 05 Jan 2018 09:01:50 GMT
Based on the suggestions here, I implemented a script to un-invert the
index (details at OAK-7122 [1], [2]).

The un-inverting was done with the following logic:

    // Uses the Lucene 4.x API. `infos` (docId -> DocInfo) and getFieldId()
    // are defined elsewhere in the script (not shown).
    def collectFieldNames(DirectoryReader reader) {
        println "Proceeding to collect the field names per document"

        Bits liveDocs = MultiFields.getLiveDocs(reader)
        Fields fields = MultiFields.getFields(reader)
        // Iterating over Fields yields the name of each indexed field
        fields.each { String fieldName ->
            Terms terms = fields.terms(fieldName)
            TermsEnum termsEnum = terms.iterator(null)

            // Walk every term of the field and the live docs containing it
            while (termsEnum.next() != null) {
                DocsEnum docsEnum = termsEnum.docs(liveDocs, null, DocsEnum.FLAG_NONE)
                while (docsEnum.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
                    int docId = docsEnum.docID()
                    DocInfo di = infos.get(docId)
                    assert di : "No DocInfo for docId : $docId"
                    // This doc has at least one term in this field
                    di.fieldIds << getFieldId(fieldName)
                }
            }
        }
    }
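
For readers without a Lucene 4.x setup at hand, the same un-inversion idea can be sketched in plain Java on a toy in-memory structure. This is only an illustration, not the script from [2]: the class name UninvertSketch, the nested-map "index" (field name -> term -> doc IDs), and the sample data are all assumptions made up for the example, and a BitSet stands in for Lucene's live-docs Bits.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class UninvertSketch {

    // Toy inverted index (hypothetical layout): field name -> term -> doc IDs.
    // Returns docId -> names of the fields that have at least one term in that doc.
    static Map<Integer, Set<String>> uninvert(Map<String, Map<String, int[]>> postings,
                                              BitSet liveDocs) {
        Map<Integer, Set<String>> fieldsPerDoc = new TreeMap<>();
        for (Map.Entry<String, Map<String, int[]>> field : postings.entrySet()) {
            for (int[] docIds : field.getValue().values()) {
                for (int docId : docIds) {
                    if (!liveDocs.get(docId)) {
                        continue; // skip deleted documents
                    }
                    fieldsPerDoc.computeIfAbsent(docId, k -> new TreeSet<>())
                                .add(field.getKey());
                }
            }
        }
        return fieldsPerDoc;
    }

    // Made-up sample data: two fields, three terms, four documents.
    static Map<String, Map<String, int[]>> sampleIndex() {
        Map<String, Map<String, int[]>> postings = new TreeMap<>();
        Map<String, int[]> title = new TreeMap<>();
        title.put("lucene", new int[] {0, 2});
        title.put("oak", new int[] {1});
        postings.put("title", title);
        Map<String, int[]> body = new TreeMap<>();
        body.put("index", new int[] {0, 1, 3});
        postings.put("body", body);
        return postings;
    }

    public static void main(String[] args) {
        BitSet liveDocs = new BitSet();
        liveDocs.set(0, 3); // docs 0..2 are live, doc 3 is treated as deleted
        System.out.println(uninvert(sampleIndex(), liveDocs));
        // prints {0=[body, title], 1=[body, title], 2=[title]}
    }
}
```

Once each index has been reduced to such a per-document map, comparing two indexes comes down to comparing the maps. Since internal doc IDs may differ between two indexes, a real comparison would probably need to key the entries on a stable document identifier instead.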

Thanks for all the help!

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-7122
[2] https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/lucene
