lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Twitter analyser
Date Tue, 05 Nov 2013 13:17:12 GMT
If the universe of terms you want to match this way is small,
consider something akin to synonyms: have your indexing process
emit two tokens, one with and one without the @ or #. That should
cover your situation.
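The two-token idea can be sketched in plain Java, with no Lucene dependency, so the matching behaviour is easy to check. This is a minimal sketch under stated assumptions, not Lucene code: the class and methods (`TweetTokenExpansion`, `analyze`, `index`, `search`) are hypothetical, the whitespace split only approximates WhitespaceAnalyzer, and a HashMap stands in for the index. In Lucene proper this would more likely be a custom `TokenFilter` after a whitespace tokenizer that emits the stripped term with a position increment of 0, the way synonym filters stack tokens. Only the indexing side is expanded; query terms are looked up exactly as typed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of index-time "synonym" expansion for @mentions and #hashtags.
class TweetTokenExpansion {

    /** A term plus its position in the token stream, as an analyzer would emit it. */
    record Token(String term, int position) {}

    /**
     * Splits on whitespace (roughly like WhitespaceAnalyzer: no lowercasing) and,
     * for a token starting with '@' or '#', also emits the bare term at the SAME
     * position, the way a synonym filter stacks tokens.
     */
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue; // "".split(...) yields a single empty string
            tokens.add(new Token(word, position));
            boolean prefixed = word.length() > 1
                    && (word.charAt(0) == '@' || word.charAt(0) == '#');
            if (prefixed) {
                tokens.add(new Token(word.substring(1), position)); // stacked token
            }
            position++;
        }
        return tokens;
    }

    /** Toy inverted index standing in for Lucene: term -> sorted ids of matching tweets. */
    static Map<String, Set<Integer>> index(List<String> tweets) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int id = 0; id < tweets.size(); id++) {
            for (Token t : analyze(tweets.get(id))) {
                postings.computeIfAbsent(t.term(), k -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    /** Query terms are NOT expanded, so "@foo" can only hit the indexed "@foo" token. */
    static Set<Integer> search(Map<String, Set<Integer>> postings, String term) {
        return postings.getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        List<String> tweets = List.of(
                "I like foo",    // 0: plain word
                "hello @foo",    // 1: mention
                "loving #foo");  // 2: hashtag
        Map<String, Set<Integer>> postings = index(tweets);
        System.out.println(search(postings, "foo"));  // [0, 1, 2]
        System.out.println(search(postings, "@foo")); // [1]
        System.out.println(search(postings, "#foo")); // [2]
    }
}
```

Because a bare whitespace split leaves punctuation attached, a tweet ending in "#foo," would index "#foo," rather than "#foo"; a real analyzer chain would need to handle that before the expansion step.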

FWIW,
Erick


On Tue, Nov 5, 2013 at 2:40 AM, Stéphane Nicoll
<stephane.nicoll@gmail.com> wrote:

> Hi,
>
> I am building an application that indexes tweets and offers some basic
> search facilities on them.
>
> I am trying to find a combination where the following would work:
>
> * foo matches the word foo, the mention (@foo), or the hashtag (#foo)
> * @foo matches only the mention
> * #foo matches only the hashtag
>
> It should match complete words, so I used the WhitespaceAnalyzer for
> indexing.
>
> Any recommendation for this use case?
>
> Thanks!
> S.
>
