lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: factor in stopwords when searching
Date Sat, 22 Mar 2008 21:12:35 GMT
Well, whether it's a good user experience is exactly the question. I've
spent far too much time satisfying customer (or product manager)
requests that add zero value to the product *in the user's eyes*.

And I quote:
"This is asked by some customer, who may not know what's "stop words" at
all."

which raises red flags to me. You might be far ahead spending the time up
front to work with the customer and see if it's really worthwhile.
Especially
when asking them to provide actual examples that show the "better user
experience". The question is *always* "is the benefit *in the real world* to
the user worth the extra time and money it'll cost to develop this?".
Especially
when you factor in ongoing maintenance, and the next developer having to
spend time  figuring out what the heck is going on, and ......

Double especially when you ask the customer "would you rather have this
feature or some other feature?". It's amazing how often customers don't
need a particular feature when the costs are made explicit. Either in
additional time or other features not getting worked on.

As a profession, we spend way too much time doing whatever the customer
asks for, rather than what the customer actually needs. We've done a pretty
poor job of reminding the customer what something costs and letting *them*
make the decision. Usually, we say something like "sure, we can do that".
The
problem is that the customer never gets a chance to say "but it's not worth
it",
because we often neglect to add "and it'll cost 6 developer-weeks which
means
we won't be able to do X, Y and Z in the time frame".

Anyway, enough ranting <G>. If the customers are willing to pay for it
that's
their business I guess..

Best
Erick

On Sat, Mar 22, 2008 at 2:38 PM, Chris Lu <chris.lu@gmail.com> wrote:

> This is asked by some customer, who may not know what's "stop words" at
> all.
>
> Jake's approach should be quite similar to what some search engine
> companies are doing. It'll cost some storage, but can achieve a good
> user experience.
>
> The benefit is kind of obvious in real world. When users enter some
> query, they do not really know stop words like "the" are not in the
> index at all.
> If they are looking for something, like a book titled "search the
> database", other books like "search database" is ranked top, which is
> not a good user experience.
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
>
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per request)
> got 2.6 Million Euro funding!
>
>
> On Sat, Mar 22, 2008 at 10:53 AM, Erick Erickson
> <erickerickson@gmail.com> wrote:
> > What's your reason for trying? The whole point of stop words is that
> >  they should be considered "no ops". That is, they add nothing to the
> >  semantics of whatever is being processed. I' don't understand the use
> >  case for why you want to go outside that assumption.
> >
> >  Another way of asking this is "what tangible benefit to your users
> >  are you trying to implement"?
> >
> >  Best
> >  Erick
> >
> >
> >
> >  On Fri, Mar 21, 2008 at 9:20 PM, Chris Lu <chris.lu@gmail.com> wrote:
> >
> >  > Let's say "the" is considered stopword. And for example two documents
> are
> >  > document A, content: "... search the database..."
> >  > document B, content: "... search database..."
> >  >
> >  > So when the user's input is "search the database", searching with
> >  > query content:"search database"~1 can return both.
> >  > But is there any way to translate that into a query that can rank the
> >  > document A higher than document B?
> >  >
> >  > Thanks!
> >  >
> >  > --
> >  > Chris Lu
> >  > -------------------------
> >  > Instant Scalable Full-Text Search On Any Database/Application
> >  > site: http://www.dbsight.net
> >  > demo: http://search.dbsight.com
> >  > Lucene Database Search in 3 minutes:
> >  >
> >  >
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> >  > DBSight customer, a shopping comparison site, (anonymous per request)
> >  > got 2.6 Million Euro funding!
> >  >
> >
> >
> > > ---------------------------------------------------------------------
> >  > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >  > For additional commands, e-mail: java-user-help@lucene.apache.org
> >  >
> >  >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message