lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steven Rowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3037) TestFSTs.testRealTerms produces a corrupt index
Date Tue, 19 Apr 2011 20:06:05 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021757#comment-13021757
] 

Steven Rowe commented on LUCENE-3037:
-------------------------------------

I implemented the theoretically O(log log n) complexity algorithm described [here|http://bonsaicode.wordpress.com/2010/05/07/programming-praxis-integer-logarithms/]
and compared timing for Robert's, Yonik's, and my implementation.  Yonik's is fastest :).

Timings (log1 is Robert's, log2 is Yonik's, and log3 is mine):
{noformat}
log1: 2384ms for 100000000 iterations.
log2: 1068ms for 100000000 iterations.
log3: 1697ms for 100000000 iterations.
{noformat}

Here's the test, which also compares log2 and log3 against log1 for correctness, or at least
consistency ({{-Xmx4g}} required to avoid OOMs with 100M iterations):
{code:java}
  public void testLogMethodPerformance() {
    Random r = new Random();
    int iterations = 100000000;
    int[] docFreqs = new int[iterations];
    int[] skipIntervals = new int[iterations];
    int[] log1Results = new int[iterations];
    int[] log2Results = new int[iterations];
    int[] log3Results = new int[iterations];
    for (int i = 0 ; i < iterations ; ++i) {
      docFreqs[i] = r.nextInt(1000000000);
      skipIntervals[i] = r.nextInt(1023) + 2;
    }
    long start = System.currentTimeMillis();
    for (int i = 0 ; i < iterations ; ++i) {
      log1Results[i] = MultiLevelSkipListReader.log(docFreqs[i], skipIntervals[i]);
    }
    long stop = System.currentTimeMillis();
    System.err.println("log1: " + (stop - start) + "ms for " + iterations + " iterations.");

    start = System.currentTimeMillis();
    for (int i = 0 ; i < iterations ; ++i) {
      log2Results[i] = MultiLevelSkipListReader.log2(docFreqs[i], skipIntervals[i]);
    }
    stop = System.currentTimeMillis();
    System.err.println("log2: " + (stop - start) + "ms for " + iterations + " iterations.");

    start = System.currentTimeMillis();
    for (int i = 0 ; i < iterations ; ++i) {
      log3Results[i] = MultiLevelSkipListReader.log3(docFreqs[i], skipIntervals[i]);
    }
    stop = System.currentTimeMillis();
    System.err.println("log3: " + (stop - start) + "ms for " + iterations + " iterations.");
    
    for (int i = 0 ; i < iterations ; ++i) {
      assertEquals(log1Results[i], log2Results[i]);
      assertEquals(log1Results[i], log3Results[i]);
    }
  }
{code}

Here's my implementation:

{code:java}
  public static int log3(int x, int b) {
    long b_lo = 1;
    long b_hi = b;
    long b_mid;
    int lo = 0;
    int hi = 1;
    int mid;
    // Bracket the solution by recursively squaring the base
    // until the result exceeds x
    while (b_hi < x) {
      b_lo = b_hi;
      b_hi *= b_hi;
      lo = hi;
      hi <<= 1;
    }
    // Find the solution by performing a binary search between
    // the bracketing values (lo,hi) found above
    while (hi - lo > 1) {
      mid = (lo + hi) >> 1;
      // b_mid = b_lo * b**(mid-lo)
      // Java has no integer pow() method - use a loop instead.
      // Yes, Math.pow(double,double) would work, but it's slower than this.
      b_mid = b_lo;
      for (int i = 0 ; i < mid - lo ; ++i) {
        b_mid *= b;
      }
      if (b_mid > x) {
        hi = mid;
        b_hi = b_mid;
      } else if (b_mid < x) {
        lo = mid;
        b_lo = b_mid;
      } else {
        return mid;
      }
    }
    return b_hi == x ? hi : lo;
  }
{code}



> TestFSTs.testRealTerms produces a corrupt index
> -----------------------------------------------
>
>                 Key: LUCENE-3037
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3037
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-3037.patch, LUCENE-3037_test.patch, index.7z.001, index.7z.002,
index.7z.003
>
>
> seems to be prox/skip related: the test passes, but the checkindex upon closing fails.
> ant test-core -Dtestcase=TestFSTs -Dtests.seed=-4012305283315171209:0 -Dtests.multiplier=3
-Dtests.nightly=true -Dtests.linedocsfile=c:/data/enwiki.random.lines.txt.gz
> Note: to get the enwiki.random.lines.txt.gz you have to fetch it from hudson (warning
1 gigabyte file).
> you also have to run the test a few times to trigger it.
> ill upload the index this thing makes to this issue.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message