lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alice" <aliceli...@gmail.com>
Subject RE: Help on search
Date Tue, 07 Nov 2006 17:22:28 GMT
Thank you for answering.

I thought about stemming at index and search time, but Im dealing with
names.
How can I make a rule for stemming names? 

I'm thinking about different names, I thought I could use Lucene to help
users find whomever they're looking for without having to spell their
friends names correctly or fully.

That’s why I gave the example about 'Frederich', it is very common that
users enter 'Fred'.

I'm going to read about the wildcards.

Anymore suggestions are very welcome.

Thanks

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: terça-feira, 7 de novembro de 2006 15:12
To: java-user@lucene.apache.org
Subject: Re: Help on search

You really have to get away from thinking about lucene like you would a
database, or you'll have endless problems <G>.

If you want just a bag of name parts, why use separate fields? For instance,
index both firstname and lastname in a name field in each document. Then all
you have to do to find all documents that have Brown in them is search on
the single field name. There's no need to construct complex or clauses,
which is what I suspect you're doing now.....

You can also index firstname, lastname, and a third field "bagofnames" that
contains both if you want.

Vladimir is right, unless you take some actions you always get exact matches
when you search, so I'm not surprised that searching for Fred does NOT
return Frederich-containing documents.

Whether wildcarding is your best strategy is debatable, it depends upon your
intent. You could consider stemming both at index and search time.
Generically, all that means is that you index "Fred" for "Frederich". and at
search time you search for "Fred". There are stemming analyzers in Lucene,
but whether they do what you want only you can decide. Be aware that
wildcard searches frequently run afoul of a TooManyClauses exception. There
is a bunch of information from people far more knowledgeable than me in this
mail archive if you search on "wildcard". It's a more complicated matter
than you might think at first.

Note that the TooManyClauses exception very likely won't show up in your
test data sets. It won't show up until you put in a bunch of data. See
TooManyClauses in the javadoc.....

I'd argue if you can create rules for wildcarding, you can create rules for
stemming By that I mean that if you have a programmatic way to turn
"Frederich" into "Fred", you can just index and search on "Fred" equally
easily and not have to deal with wildcard complexity.

Of course, since I don't know what problem you're actually trying to solve,
this may be irrelevant....

Best
Erick

On 11/7/06, Alice <alicelista@gmail.com> wrote:
>
> Hello!
>
> I am totally new to Lucene and I'm trying to use it with my web
> application.
>
>
>
> What I'm doing is reading a table from my database line by line and
> indexing
> the columns.
>
> I read the users data such as First Name, Last Name, Email and so on.
>
>
>
> I hava a field by column, such as: firstName="Frederich" lastName="Brown"
>
> My intetion is to make users find other users using name, lastName or
> email.
>
> What is the best way to do it?
>
> If the user enter "Fred Brown", as I use queryParser, this string is
> broken
> to "Fred" "Brown" and I search those tokens in every field.
>
>
>
> As "Brown" was indexed, the user's registry is found.
>
> But if the user enters "Fred", no user is found.
>
> Why is that? I thought it would return the user "Frederich Brown" either.
>
> Can someone help?
>
>
>
> Thanks!
>
>
>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message