lucene-java-user mailing list archives

From "Karthik N S" <kart...@controlnet.co.in>
Subject RE: using different analyzer for searching
Date Fri, 01 Apr 2005 08:45:37 GMT
Hi,

First, try the AnalysisDemo.java code from
http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html?page=last#thread
on java.net with the content you are working with, to experiment and verify
which analyzer to use.

This will probably give you some idea of how the analyzers behave.
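
As a rough illustration of what such an experiment shows, here is a minimal plain-Java sketch. It does not call Lucene at all; the class TokenizerSketch and its two methods are hypothetical stand-ins that only approximate how WhitespaceAnalyzer (split on whitespace) and SimpleAnalyzer (keep lowercased runs of letters) treat the sample text from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; these helpers are assumptions, not Lucene APIs.
class TokenizerSketch {

    // Approximates WhitespaceAnalyzer: split on whitespace, keep each chunk whole.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (!chunk.isEmpty()) {
                tokens.add(chunk);
            }
        }
        return tokens;
    }

    // Approximates SimpleAnalyzer: keep runs of letters, lowercased; drop everything else.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "my name is abc123";
        System.out.println(whitespaceTokens(text)); // [my, name, is, abc123]
        System.out.println(letterTokens(text));     // [my, name, is, abc]
        // An exact-term lookup for "abc" misses the whitespace-tokenized text.
        System.out.println(whitespaceTokens(text).contains("abc")); // false
    }
}
```

The point of the comparison is that a search term has to equal an indexed term, so the choice of analyzer decides whether "abc" can ever match.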



With regards,
Karthik



-----Original Message-----
From: pashupathinath [mailto:pashupathinathk@yahoo.com]
Sent: Friday, April 01, 2005 9:19 AM
To: java-user@lucene.apache.org
Subject: Re: using different analyzer for searching


Hi Erik,
   I'm creating a blogging application where users can create blogs,
upload pictures, post comments, and so on.
   I store all the information in a MySQL database. I index the database
contents with Lucene and search against that index.
   The user can search on BlogTitle, Blogdesc, and blogcategory. Whenever a
user enters a query against blogtitle, blogdesc, or blogcategory, the search
should return all the documents matching that search string.
   The real problem is that whenever the user enters only part of a string,
the search returns zero hits, because I was using WhitespaceAnalyzer, which
needs the complete string. I should look into WildcardQuery, which I think
will solve my problem to some extent.
   I should do more analysis, as you suggested, before deciding which
analyzer to use. What about writing a custom analyzer? How would I go about
implementing the logic so that a search returns all the documents that
contain even a part of the search string?
   Any insight into this would be very helpful, especially in terms of
performance.
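
One way to picture the custom-analysis idea is the plain-Java sketch below. It is only an assumption about what such logic could look like, not Lucene code: the class SubTokenSketch and its methods are hypothetical. Each whitespace-separated token is kept whole, and when it mixes digits and non-digits its runs are also emitted at the same position. In a real analyzer this would presumably be done in a token filter that emits the extra terms with a position increment of zero.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; these helpers are assumptions, not Lucene APIs.
class SubTokenSketch {

    // Splits one token into alternating runs of digits and non-digits.
    static List<String> parts(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (char c : token.toCharArray()) {
            boolean boundary = run.length() > 0
                    && Character.isDigit(c) != Character.isDigit(run.charAt(run.length() - 1));
            if (boundary) {
                parts.add(run.toString());
                run.setLength(0);
            }
            run.append(c);
        }
        if (run.length() > 0) {
            parts.add(run.toString());
        }
        return parts;
    }

    // Returns, for each token position, every term that would be indexed there.
    static List<List<String>> termsByPosition(String text) {
        List<List<String>> positions = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<String> terms = new ArrayList<>();
            terms.add(token);              // the original token, e.g. abc123
            List<String> pieces = parts(token);
            if (pieces.size() > 1) {
                terms.addAll(pieces);      // its sub-terms, e.g. abc and 123
            }
            positions.add(terms);
        }
        return positions;
    }

    public static void main(String[] args) {
        // [[my], [name], [is], [abc123, abc, 123]]
        System.out.println(termsByPosition("my name is abc123"));
    }
}
```

With terms expanded this way at index time, plain term queries for abc, 123, or abc123 can all match without wildcard queries, at the cost of a larger index.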

thanks,
pashupathinath.k

--- Erik Hatcher <erik@ehatchersolutions.com> wrote:
>
> On Mar 31, 2005, at 11:44 AM, pashupathinath wrote:
>
> >   is it possible to index using a predefined analyzer
> > and search using a custom analyzer ??
>
> Yes, it's perfectly fine to do so, with the caveat that you end up
> searching for the terms exactly as they were indexed.
>
> I end up doing this in most applications, actually, primarily because
> untokenized fields need to use the KeywordAnalyzer during searching.
>
> >   i'm searching using the built in whitespace
> > analyser. the problem is when i'm searching for a part
> > of a string the search results are zero.
> >   i'm using white space analyzer. for example if the
> > statement is "my name is abc123" the search for abc or
> > 123 doesn't return any hits.
> >   any insight into this ??
>
> The exact terms indexed using WhitespaceAnalyzer are like this (using
> the Lucene in Action AnalyzerDemo - "ant AnalyzerDemo"):
>
>      [input] String to analyze: [This string will be analyzed.]
> my name is abc123
>       [echo] Running lia.analysis.AnalyzerDemo...
>       [java] Analyzing "my name is abc123"
>       [java]   WhitespaceAnalyzer:
>       [java]     [my] [name] [is] [abc123]
>
>       [java]   SimpleAnalyzer:
>       [java]     [my] [name] [is] [abc]
>
>       [java]   StopAnalyzer:
>       [java]     [my] [name] [abc]
>
>       [java]   StandardAnalyzer:
>       [java]     [my] [name] [abc123]
>
> So you indexed "abc123" and searches must search for that term
> *exactly*.  You can search for "abc*" as a PrefixQuery or WildcardQuery
> and find "abc123".  "*123" will also find it, though QueryParser does
> not support leading wildcard characters (but the API does).  Wildcard
> queries are not ideally what you want, as they tend to be much slower on
> large indexes.
>
> You may need to do specialized analysis.  Perhaps you could share your
> real needs with the list and we could offer recommendations.  It is
> possible to index "abc123", "abc", and "123" all within the same
> position in the index if you do some clever analysis and that meshes
> with what you're after.
>
> 	Erik
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


