lucene-java-user mailing list archives

From <oh...@cox.net>
Subject Re: Weird discrepancy with term counts vs. terms (off by 1)
Date Sun, 02 Aug 2009 14:28:01 GMT
Hi,

BTW, my indexer app is basically the same as the demo IndexFiles.java.  Here's part of the
main:

    try {
      IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
      System.out.println("Indexing to directory '" +INDEX_DIR+ "'...");
      indexDocs(writer, docDir);
      System.out.println("Optimizing...");
      writer.optimize();
      writer.close();

      Date end = new Date();
      System.out.println(end.getTime() - start.getTime() + " total milliseconds");

    } catch (IOException e) {
      System.out.println(" caught a " + e.getClass() +
       "\n with message: " + e.getMessage());
    }

When I run the indexer, it reports adding the document that ends up being "missing"
from the terms.
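
One possible explanation for an off-by-one like this, offered as a guess and not a diagnosis (the listing code is in the previous thread, not here): per the Lucene 2.x Javadoc, IndexReader.terms(Term) returns a TermEnum that is already positioned on its first term, whereas IndexReader.terms() must have next() called before term(). A loop that calls next() before reading from a pre-positioned enumeration silently skips the first entry. The sketch below models that with plain Java only; PositionedCursor, TermCursorDemo, and the docN.txt names are hypothetical stand-ins, not Lucene classes or real data.

```java
import java.util.ArrayList;
import java.util.List;

public class TermCursorDemo {

    /** Hypothetical stand-in for a pre-positioned term enumeration; NOT a Lucene class. */
    static final class PositionedCursor {
        private final List<String> terms;
        private int pos = 0; // starts ON the first element, not before it

        PositionedCursor(List<String> terms) {
            this.terms = terms;
        }

        /** Current term, or null once the cursor has moved past the end. */
        String term() {
            return pos < terms.size() ? terms.get(pos) : null;
        }

        /** Advances by one; returns true if a term is available afterwards. */
        boolean next() {
            pos++;
            return pos < terms.size();
        }
    }

    /** Advances before reading, so the first term is never collected. */
    static List<String> listWithWhile(List<String> terms) {
        PositionedCursor cursor = new PositionedCursor(terms);
        List<String> out = new ArrayList<>();
        while (cursor.next()) {
            out.add(cursor.term());
        }
        return out;
    }

    /** Reads the current term first, then advances. */
    static List<String> listWithDoWhile(List<String> terms) {
        PositionedCursor cursor = new PositionedCursor(terms);
        List<String> out = new ArrayList<>();
        if (cursor.term() == null) {
            return out; // nothing to enumerate
        }
        do {
            out.add(cursor.term());
        } while (cursor.next());
        return out;
    }

    public static void main(String[] args) {
        // 13 made-up path values, matching the document count in this thread.
        List<String> paths = new ArrayList<>();
        for (int i = 1; i <= 13; i++) {
            paths.add("doc" + i + ".txt");
        }
        System.out.println(listWithWhile(paths).size());   // prints 12
        System.out.println(listWithDoWhile(paths).size()); // prints 13
    }
}
```

If the real listing loop has the while (next()) shape on an enumeration obtained from terms(Term), switching to the read-then-advance shape would be the thing to try. A real loop over a single field would also need to stop once the enumeration moves on to a different field, which this model leaves out.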

Thanks,
Jim


---- ohaya@cox.net wrote: 
> Hi,
> 
> I've noticed a kind of strange problem with term counts and actual terms.
> 
> Some background:  I wrote an app that creates an index, including a "path" field.  
> 
> I am now working on an app (code was in the previous thread) that, as part of what it does, needs to get a list of all of the "path" field values for the documents that were added.
> 
> I first noticed the problem while working on this latter app. Although I added 13 documents to the index, listing the "path" terms returned only 12 of them.
> 
> I then reviewed the index using Luke. It showed only 12 "path" terms (under "Term Count" on the left), but when I clicked "Show Top Terms", Luke listed 13 terms.
> 
> At this point, I'm very puzzled about all of this :(...
> 
> Can anyone explain the difference in Luke and, more importantly, why I am only getting 12 (i.e., 1 less than the number of documents added) when I try to programmatically list the terms?
> 
> Thanks,
> Jim
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 

