lucene-java-user mailing list archives

From Paul Bell <arach...@gmail.com>
Subject Re: Beginner's questions
Date Fri, 29 Mar 2013 12:38:28 GMT
Hi Adrien,

Thank you for this warning. I think you're pointing out a fundamental
aspect of Lucene of which, given my noob-ness, I was unaware.

Last night, reading "Lucene in Action, 2nd edition," I came upon this
about addDocument(Document, Analyzer): "Adds the document using the
provided analyzer for tokenization. But be careful! In order for searches
to work correctly, you need the analyzer used at search time to "match" the
tokens produced by the analyzers at indexing time."

Is this warning from the author of a piece with what you're warning me
about?
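
To make sure I understand the mismatch, here is a toy model in plain Java. It
deliberately does not use the Lucene API: the class and method names are
invented for illustration, a Set stands in for the index, and toLowerCase
stands in for the lowercasing an analyzer would do at search time.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class AnalyzerMismatchDemo {
    // Hypothetical stand-in for an un-analyzed field: the value becomes one exact term.
    public static String indexVerbatim(String value) {
        return value;
    }

    // Hypothetical stand-in for a lowercasing analyzer applied to the query text.
    public static String lowercaseAnalyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // A term matches only if the exact same string was indexed.
    public static boolean matches(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>();
        terms.add(indexVerbatim("ABC"));
        terms.add(indexVerbatim("v1"));

        // The query side lowercases "ABC" to "abc", which was never indexed.
        System.out.println(matches(terms, lowercaseAnalyze("ABC"))); // false
        // Building the query term by hand keeps the case and finds the document.
        System.out.println(matches(terms, "ABC"));                   // true
        // An id that is already lowercase survives analysis, so it works by luck.
        System.out.println(matches(terms, lowercaseAnalyze("v1")));  // true
    }
}
```

If this model is right, my program only works because "v1" through "v5"
happen to be lowercase already.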

-Paul


On Wed, Mar 27, 2013 at 8:12 PM, Adrien Grand <jpountz@gmail.com> wrote:

> On Wed, Mar 27, 2013 at 9:04 PM, Paul Bell <arachweb@gmail.com> wrote:
> > Thanks Adrien.
> >
> > I've scraped together a simple program in the Lucene 4.2 idiom (see below).
> > Does this illustrate what you meant by your last sentence?
> >
> > The code adds/indexes 5 documents all of whose content is identical, but
> > whose 'id' field is unique ("v1" through "v5"). It then queries the 'id'
> > field for the pattern "v*".
>
> Even if your program works, there is something "dangerous" in it: you
> index your id field with a StringField, meaning that the field is not
> analyzed, and then query it using a query parser, which analyzes the
> data it is given. So if you gave any of your documents the id "ABC",
> you would never be able to find it, since StandardAnalyzer filters
> tokens with a LowerCaseFilter. You could simply create the query
> manually:
>
> Query query = new PrefixQuery(new Term("id", "v" + id));
>
> without help from a query parser.
>
> To ensure that your id field is unique across documents, you could replace
>
> writer.addDocument(createDocument("This is a test; for the next 60 seconds..."));
>
> with
>
> Document doc = createDocument("This is a test; for the next 60 seconds...");
> writer.updateDocument(new Term("id", doc.get("id")), doc);
>
> > While we're at it, what method should I be using to obtain merely the
> > original document itself after a query?
>
> You can println document.get("id").
>
> --
> Adrien
>
