lucene-java-user mailing list archives

From Stephane Nicoll <stephane.nic...@gmail.com>
Subject Re: Twitter analyser
Date Tue, 05 Nov 2013 13:33:47 GMT
Hi,

Thanks for the reply. It's an index of tweets, so really any word is a
candidate for this, which would mean a significant increase in index size.
My volumes are really small, so that shouldn't be a problem (but
performance/scalability is a concern).
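
To make the index-side option concrete, here is a minimal sketch of the
two-token idea Erick describes, in plain Java with no Lucene dependency.
The class and method names (TweetTokens, tokens) are made up for
illustration. In a real Lucene analysis chain this would presumably be a
custom TokenFilter that emits the bare word at the same position as the
original token; that part is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class TweetTokens {

    // Splits on whitespace (whole-word tokens) and, for every token that
    // starts with '@' or '#', also emits the bare word without the prefix.
    static List<String> tokens(String tweet) {
        List<String> out = new ArrayList<>();
        for (String t : tweet.trim().split("\\s+")) {
            if (t.isEmpty()) {
                continue; // empty input yields a single empty string
            }
            out.add(t);
            boolean prefixed = t.charAt(0) == '@' || t.charAt(0) == '#';
            if (prefixed && t.length() > 1) {
                out.add(t.substring(1)); // extra token: "@foo" also indexes "foo"
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [hello, @foo, foo, and, #bar, bar]
        System.out.println(tokens("hello @foo and #bar"));
    }
}
```

With tokens like these, a query on foo hits the plain word, the mention
and the hashtag, while @foo and #foo only hit their own form. The cost is
one extra term per mention or hashtag in the tweet.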

I have control over the query. Another solution would be to translate a
query on "foo" to "foo or #foo or @foo".
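
A rough sketch of that query-side rewrite, again in plain Java. The names
(QueryExpansion, expand) are hypothetical, and it assumes a single-word
query whose result is handed to a parser that treats uppercase OR as a
boolean operator, as Lucene's classic query parser does.

```java
// Hypothetical helper, not part of Lucene.
class QueryExpansion {

    // A bare word is expanded to match the word, the hashtag and the
    // mention; a term that already starts with '@' or '#' is left alone
    // so that it only matches its own form.
    static String expand(String term) {
        if (term.startsWith("@") || term.startsWith("#")) {
            return term;
        }
        return term + " OR #" + term + " OR @" + term;
    }

    public static void main(String[] args) {
        System.out.println(expand("foo"));  // prints foo OR #foo OR @foo
        System.out.println(expand("#foo")); // prints #foo
    }
}
```

This keeps the index as it is today (one token per whitespace-separated
word) and moves the cost to query time instead.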

WDYT?

Thanks!
S.




On Tue, Nov 5, 2013 at 2:17 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> If your universe of items you want to match this way is small,
> consider something akin to synonyms. Your indexing process
> emits two tokens, with and without the @ or # which should
> cover your situation.
>
> FWIW,
> Erick
>
>
> On Tue, Nov 5, 2013 at 2:40 AM, Stéphane Nicoll
> <stephane.nicoll@gmail.com> wrote:
>
> > Hi,
> >
> > I am building an application that indexes tweets and offers some basic
> > search facilities on them.
> >
> > I am trying to find a combination where the following would work:
> >
> > * foo matches the foo word, a mention (@foo) or the hashtag (#foo)
> > * @foo only matches the mention
> > * #foo matches only the hashtag
> >
> > It should match complete words, so I used the WhitespaceAnalyzer for
> > indexing.
> >
> > Any recommendation for this use case?
> >
> > Thanks !
> > S.
> >
>
