lucene-java-user mailing list archives

From Erick Erickson <>
Subject Re: lucene index details
Date Thu, 19 Feb 2009 14:30:15 GMT
You have to look at Analyzers a bit here because that's what
controls what is in the index. The simplest case is a WhitespaceAnalyzer
that breaks the input stream up into tokens on any whitespace.

So, in your example and using a WhitespaceAnalyzer, you'd get
the following tokens:
lucene, is, used, to, index, files
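
To make the tokenizing step concrete, here's a rough sketch in plain Java.
It is not Lucene's WhitespaceAnalyzer, just a stand-in that splits on
whitespace the same way; the class and method names (WhitespaceTokenSketch,
tokenize) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for whitespace tokenization; this is NOT Lucene's WhitespaceAnalyzer.
class WhitespaceTokenSketch {
    // Split on runs of whitespace, leaving case and punctuation untouched.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints the six tokens listed above.
        System.out.println(tokenize(" lucene is used to index files"));
    }
}
```

Note that nothing is lowercased or stemmed here; a plain whitespace split
keeps each token exactly as it appeared in the input.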

Let's put these into a field called "text" in a document. A document is
a little like a row in a database table. So you could have fields
"text", "filename".... In this example, "filename" is empty
(and, in fact, doesn't even need to be present in this particular doc).

Now, parsing your query "index files" against the "text" field (see the
query syntax) essentially asks "does the document have the word 'index' OR
the word 'files' in the 'text' field?" (OR is the default operator).
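
If it helps, here's a toy picture of that lookup in plain Java. It assumes
a simple term-to-document-ids map for the "text" field, which is only a
mental model and not Lucene's actual index format or API; ToyTextIndex,
add and searchAny are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index for one field; an illustration only, not Lucene's index or API.
class ToyTextIndex {
    // term -> ids of the documents whose "text" field contains that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Tokenize on whitespace and record which document each term came from.
    void add(int docId, String text) {
        for (String term : text.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
            }
        }
    }

    // OR semantics: a document matches if it contains at least one query term.
    Set<Integer> searchAny(String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : query.trim().split("\\s+")) {
            hits.addAll(postings.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyTextIndex index = new ToyTextIndex();
        index.add(0, "lucene is used to index files");
        index.add(1, "files on disk");
        index.add(2, "nothing relevant here");
        // Docs 0 and 1 match: doc 0 has both terms, doc 1 has only "files".
        System.out.println(index.searchAny("index files"));
    }
}
```

The point of the sketch is that the search consults the term map built at
indexing time and never goes back to the original text.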

But note that there's no magic involved here. Lucene, for instance,
doesn't know about indexing files. The examples in the book
have underlying code that opens the files, reads the data and
feeds that data through an Analyzer for indexing. That's code you
have to write yourself.
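
Roughly, the plumbing you'd write looks like the sketch below: open the
file, read its contents, and hand the text to whatever does the analysis.
This is plain Java with a whitespace split standing in for a real
Analyzer; FileFeedSketch, readText, tokensOf and writeSample are all
made-up names, and writeSample exists only so the example runs on its own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Application-side plumbing sketch: Lucene itself does not open or read files.
class FileFeedSketch {
    // Read the whole file as UTF-8 text.
    static String readText(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Whitespace split used as a stand-in for feeding the text to an Analyzer.
    static List<String> tokensOf(Path file) {
        List<String> tokens = new ArrayList<>();
        for (String piece : readText(file).trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Writes a throwaway test.txt-style file so the sketch is self-contained.
    static Path writeSample(String contents) {
        try {
            Path file = Files.createTempFile("test", ".txt");
            Files.write(file, contents.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path sample = writeSample("lucene is used to index files");
        System.out.println(tokensOf(sample));
    }
}
```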

Anyway, I'd examine the examples carefully. Also, get a copy of
Luke, a program that allows you to examine the index and see what
various query parsers do. It's invaluable.

As for the internal structure of the index, I just treat it as a black
box, but on the Wiki there are links to various explanations.


On Thu, Feb 19, 2009 at 6:17 AM, Seid Mohammed <> wrote:

> I am new to lucene, and reading the lucene in action book
> sometimes, i understand better when someone tells me an answer than from a book.
> my question is
> when indexing, what is lucene actually doing?
> if i have a file called test.txt with contents "lucene is used to
> index files" and i apply lucene indexing, what is the content of the
> index and what is the structure of the index?
> and if i apply lucene search, for example a query "index files", from
> where lucene searches, from the index or from the test.index file
> thanks a lot
> seid m
