lucene-java-user mailing list archives

From jake dsouza <jakedsouz...@gmail.com>
Subject Re: pruning package- pruneAllPositions
Date Mon, 07 May 2012 15:46:48 GMT
Hi Zeynep,

I was facing the same issue with CarmelUniformTermPruningPolicy in the
package org.apache.lucene.index.pruning.

I think the issue is in the while loop condition in the following piece of code:

    while ((docsPos < (docs.length - 1))
        && termPositions.doc() > docs[docsPos].doc) {
      docsPos++;
    }
    if (termPositions.doc() == docs[docsPos].doc) {
      // pass
      docsPos++; // move to next doc id
      return false;
    } else if (termPositions.doc() < docs[docsPos].doc) {
      return true; // skip this one - it's less important
    }
    // should not happen!
    throw new IOException("termPositions.doc > docs[docsPos].doc");

In the while loop, docsPos keeps getting incremented until the loop
condition fails, which can happen in two cases:
     1. docsPos < (docs.length - 1) becomes false, or
     2. termPositions.doc() > docs[docsPos].doc becomes false.

The error occurs when docsPos < docs.length - 1 is false but
termPositions.doc() > docs[docsPos].doc is still true, i.e. the last entry
in docs has been reached and the current document id is still larger. This
matches the output you posted (docs[docsPos].doc = 4414,
termPositions.doc() = 4995).

In that case neither branch of the if () { } else if () { } block is taken,
and the exception is thrown.

Fix: just above the statement that throws the exception, I added another
condition which returns true if docsPos == docs.length - 1.

I'm not sure my fix is correct, but it seems to be working. I will update
if I become certain.
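
A minimal, self-contained sketch of the guard described above, to make the control flow easier to follow. It is not the actual Lucene class: PruneCheckSketch and keptDocs are hypothetical names, a sorted int[] stands in for docs[i].doc, and the current document id is passed in directly instead of being read from termPositions.doc(). The bounds check at the top of the method is an assumption added only so the sketch never indexes past the end of the array.

```java
import java.io.IOException;

// Hypothetical stand-in for the pruning policy; not the real Lucene code.
class PruneCheckSketch {
    // Sorted ids of the documents to keep (stands in for docs[i].doc).
    private final int[] keptDocs;
    private int docsPos = 0;

    PruneCheckSketch(int[] keptDocs) {
        this.keptDocs = keptDocs;
    }

    /** Returns true if all positions of currentDoc should be pruned. */
    boolean pruneAllPositions(int currentDoc) throws IOException {
        // Assumption for this sketch: once every kept doc has been consumed,
        // prune everything that follows.
        if (docsPos >= keptDocs.length) {
            return true;
        }
        // Advance to the first kept doc that is >= currentDoc,
        // but never move past the last entry.
        while (docsPos < keptDocs.length - 1 && currentDoc > keptDocs[docsPos]) {
            docsPos++;
        }
        if (currentDoc == keptDocs[docsPos]) {
            docsPos++; // move to next doc id
            return false; // keep this document
        } else if (currentDoc < keptDocs[docsPos]) {
            return true; // skip this one - it's less important
        }
        // The added guard: we are on the last kept doc and currentDoc is
        // still larger, so it cannot be in the kept set.
        if (docsPos == keptDocs.length - 1) {
            return true;
        }
        // should not happen!
        throw new IOException("currentDoc > keptDocs[docsPos]");
    }
}
```

With keptDocs = {10, 4414}, a call for document 4995 after document 10 has been processed leaves the loop with docsPos on the last entry and would previously have fallen through to the exception; with the guard it is simply pruned.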

Regards
Jake


On Mon, May 7, 2012 at 10:52 AM, Zeynep P. <zpvie@yahoo.com> wrote:

> Thanks for the link. I reviewed it.
> Here are more details about the exception:
>
> I used contrib/benchmark/conf/wikipedia.alg to index a Wikipedia dump with
> MAddDocs: 200000. I wanted to index only a specific period of time, so I
> added an if statement in doLogic of the AddDocTask class.
> I tried to prune the index using the pruning package (CarmelTopKPruning) and
> got the exception.
>
> I added System.out.println(term); as the first line of
> initPositionsTerm and System.out.println("***" + term); as the last line of
> it. The Carmel top-k exception comes from pruneAllPositions (throw new
> IOException("termPositions.doc > docs[docsPos].doc"); ).
>
> For example, for token body:freely I had the output as follows:
>
> body:freely
> ***body:freely
> body:freely
> ***body:freely
> body:freely
> ***body:freely
> Carmel topk in exception (docs[docsPos].doc = 4414, termPositions.doc() =
> 4995)
> Carmel topk in exception (docs[docsPos].doc = 4414, termPositions.doc() =
> 4996)
> Carmel topk in exception (docs[docsPos].doc = 4414, termPositions.doc() =
> 4997) ..
> Carmel topk in exception
> Carmel topk in exception
> Carmel topk in exception
> Carmel topk in exception
> Carmel topk in exception
> Carmel topk in exception
> Carmel topk in exception
> Carmel topk in exception
> Carmel topk in exception
> body:freely
> ***body:freely
> Carmel topk in exception
> Carmel topk in exception
> body:freely
> ***body:freely
> body:freely
> ***body:freely
>
> I hope that my problem is more clear now.
>
> Thanks in advance,
> Best Regards
> ZP
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/pruning-package-pruneAllPositions-tp3954762p3968723.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
