lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Combs, Craig" <>
Subject RE: Why are tokens not being indexed?
Date Mon, 05 Dec 2005 17:39:02 GMT
I'm able to see the documents that were indexed but not the tokens
associated with the document in Luke.

I'm using the multifield query parser and I did do the query.toString and
the tokens returned by the query parser matched the tokens returned from the
analyzer.  Some how I need to see which tokens are associated with what
documents in the Lucene index database.

I'm not sure Luke can do this.  I don't need to know which documents were
indexed but I need to know what tokens are actually indexed in lucene.  What
is the best way to look into an index that Lucene has created and what
tokens are associated with that given index.


-----Original Message-----
From: Erik Hatcher [] 
Sent: Monday, December 05, 2005 12:23 PM
Subject: Re: Why are tokens not being indexed?

On Dec 5, 2005, at 8:20 AM, Combs, Craig wrote:
> This is very mysterious
> I have check my parser and I'm returned body:<token>.  My analyzer  
> during
> indexing returns <token> in the token stream.  But when I perform  
> my search
> no results are found.
> Is there a way I can see what tokens are actually written by the index
> writer of lucene?

My article and the (free) code from has an  
analyzer demo that will show what comes out of the analyzer, but  
sounds like you've already troubleshot that aspect.

Luke (google for "luke lucene") will let you see what got indexed - I  
recommend trying that out.

> My analyzer returns the tokens and my queryparser returns the  
> tokens so I'm
> not sure why "SOME" tokens are not being found in the index.  These  
> are
> tokens in the middle of a token stream so it's not like they are at  
> the end
> or beginning, and I have not found a pattern to them yet.

Are you searching with QueryParser?   If you look at the generated  
Query.toString() does it match what you indexed?   If not, try a  
simple TermQuery for what gets returned from your analyzer and see if  
that works.  If not, Lucene is broken :)


To unsubscribe, e-mail:
For additional commands, e-mail:

The contents of this e-mail are intended for the named addressee only. It
contains information that may be confidential. Unless you are the named
addressee or an authorized designee, you may not copy or use it, or disclose
it to anyone else. If you received it in error please notify us immediately
and then destroy it. 

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message