lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Real world app advice
Date Sat, 16 Sep 2006 18:32:17 GMT
Of course, the answer is "it depends" <G>. This doesn't sound like a
very big index, so the first approach I'd try is making the index
complicated and keeping the queries as simple as possible. This assumes that
search response time is what you care about, and that indexing speed and
index size matter much less. And indexing speed won't be a problem
with an index of this size, IMO.

Lucene in Action has an example of synonym injection into the indexing
stream that keeps proximity queries (SpanQueries) working, which you really
want to look at if you haven't already <G>.
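
To make the idea concrete, here is a minimal sketch of index-time synonym
injection. It is a toy model, not the Lucene in Action code and not the Lucene
API: the class SynonymInjectionSketch, the PositionedTerm type and the inject
and adjacent helpers are all hypothetical names made up for illustration. The
point it shows is that each synonym is stored at the same position as the
original token (in Lucene terms, a position increment of 0), so a proximity
match on the synonym behaves like one on the original word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SynonymInjectionSketch {

    /** A term plus its position in the stream (toy stand-in, not a Lucene class). */
    public static final class PositionedTerm {
        public final String term;
        public final int position;

        public PositionedTerm(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    /** Emits every token, followed by its synonyms at the SAME position. */
    public static List<PositionedTerm> inject(List<String> tokens,
                                              Map<String, List<String>> synonyms) {
        List<PositionedTerm> out = new ArrayList<>();
        int position = 0;
        for (String token : tokens) {
            out.add(new PositionedTerm(token, position));
            List<String> alternatives = synonyms.get(token);
            if (alternatives != null) {
                for (String alternative : alternatives) {
                    // Same position as the original: this is what keeps
                    // phrase and proximity matching intact.
                    out.add(new PositionedTerm(alternative, position));
                }
            }
            position++; // only original tokens advance the position
        }
        return out;
    }

    /** True if 'second' occurs exactly one position after 'first'. */
    public static boolean adjacent(List<PositionedTerm> stream, String first, String second) {
        for (PositionedTerm a : stream) {
            for (PositionedTerm b : stream) {
                if (a.term.equals(first) && b.term.equals(second)
                        && b.position == a.position + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms =
                Collections.singletonMap("quick", Arrays.asList("fast", "speedy"));
        List<PositionedTerm> stream = inject(Arrays.asList("the", "quick", "fox"), synonyms);
        System.out.println(stream);                          // [the@0, quick@1, fast@1, speedy@1, fox@2]
        System.out.println(adjacent(stream, "fast", "fox")); // true
    }
}
```

With the synonyms sitting at the original's position, a simple query for
"fast fox" matches a document that only ever said "quick fox", and the query
itself needs no expansion.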

Take care that your indexing analyzer and your search analyzer are
consistent with each other, and get a copy of Luke (google luke and lucene)
so you can examine your index and see how queries behave. Again, if you
haven't already, I really, really recommend that you get a copy of Luke.
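
A small sketch of why the analyzer relationship matters, again as a toy model
rather than real Lucene code: AnalyzerMismatchSketch and its helpers are
hypothetical, and the "lemmatizer" here just strips a trailing "s". If the
index is built with lemmatized terms but the query is only lowercased, the
query term never lines up with what is stored.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

public class AnalyzerMismatchSketch {

    /** Toy "analyzer": lowercase only. */
    public static String lowercase(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    /** Toy "analyzer": lowercase, then strip a trailing "s" as a crude lemma. */
    public static String lowercaseAndStripPlural(String token) {
        String lower = lowercase(token);
        return (lower.length() > 1 && lower.endsWith("s"))
                ? lower.substring(0, lower.length() - 1)
                : lower;
    }

    /** Splits on whitespace and stores the analyzed form of each token. */
    public static Set<String> index(String text, Function<String, String> analyzer) {
        Set<String> terms = new HashSet<>();
        for (String token : text.trim().split("\\s+")) {
            terms.add(analyzer.apply(token));
        }
        return terms;
    }

    /** A single-term lookup: analyze the query word, then check the index. */
    public static boolean matches(Set<String> indexTerms, String queryWord,
                                  Function<String, String> queryAnalyzer) {
        return indexTerms.contains(queryAnalyzer.apply(queryWord));
    }

    public static void main(String[] args) {
        // Index side lemmatizes: stores "black" and "cat".
        Set<String> terms = index("Black Cats", AnalyzerMismatchSketch::lowercaseAndStripPlural);

        // Query side only lowercases: looks up "cats" and misses.
        System.out.println(matches(terms, "Cats", AnalyzerMismatchSketch::lowercase));               // false

        // Query side uses the same analysis: looks up "cat" and hits.
        System.out.println(matches(terms, "Cats", AnalyzerMismatchSketch::lowercaseAndStripPlural)); // true
    }
}
```

This is the kind of mismatch that is easy to spot in Luke, because you can see
the terms that actually ended up in the index.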

I think making the indexes more complex is actually a lot less work, but I
don't have any real facts to back that up, FWIW.

Best
Erick

On 9/15/06, Luis Rodrigo Aguado <lrodrigo@isoco.com> wrote:
>
>     Hi all,
>
>     I have used Lucene so far for solving toy examples and making
> tutorial examples, but now I am facing my first real-world high-quality
> application.
>
>     I need to manage around 50,000 docs, ranging from a few lines to a
> couple pages. I also need to handle lemmas and synonyms, and here is
> where my main doubts arise. I have considered two options: adding the
> synonyms and lemmas to the indexes and keeping the queries simple, or
> expanding the queries with these lemmas and synonyms and keeping the
> indexes simple. Is one of the two preferable over the other? What are
> the benefits of each of them?
>
>     Thanks in advance!
>
>
>
