lucene-solr-user mailing list archives

From Mingfeng Yang <mfy...@wisewindow.com>
Subject Re: tokenizer of solr
Date Fri, 12 Apr 2013 00:50:35 GMT
Looks like it's due to the word delimiter filter.  Does anyone know whether
the "protected" file supports regular expressions?

Ming
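
To make the behavior under discussion concrete, here is a minimal Python sketch. It is a toy approximation, not Solr's actual implementation: the function names are hypothetical, and it assumes the "protected" list is matched as literal tokens rather than as regular expressions. It shows how splitting on delimiters and letter/digit boundaries turns "@jpc_108" into "jpc" and "108", while plain whitespace tokenization leaves it intact.

```python
import re

def word_delimiter_split(token, protected=frozenset()):
    """Toy stand-in for a word-delimiter filter (hypothetical, simplified)."""
    # Assumption: protected entries are compared literally, not as regexes.
    if token in protected:
        return [token]
    parts = []
    # Split on runs of non-alphanumeric characters such as "@" and "_" ...
    for chunk in re.split(r"[^A-Za-z0-9]+", token):
        # ... then split each chunk at letter/digit boundaries.
        parts.extend(re.findall(r"[A-Za-z]+|[0-9]+", chunk))
    return parts

def whitespace_tokenize(text):
    """Split on whitespace only, leaving punctuation inside tokens."""
    return text.split()

if __name__ == "__main__":
    print(word_delimiter_split("@jpc_108"))
    print(word_delimiter_split("@jpc_108", protected={"@jpc_108"}))
    print(whitespace_tokenize("RT @jpc_108 hello"))
```

In this sketch a token survives only if it is listed verbatim in the protected set, which is why a literal list does not scale to arbitrary user names.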


On Thu, Apr 11, 2013 at 4:58 PM, Jack Krupansky <jack@basetechnology.com> wrote:

> Try the whitespace tokenizer.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Mingfeng Yang
> Sent: Thursday, April 11, 2013 7:48 PM
> To: solr-user@lucene.apache.org
> Subject: tokenizer of solr
>
> Dear Solr users and developers,
>
> I am trying to index some documents, some of which are Twitter messages,
> and we have a problem when indexing retweets.
>
> Say a Twitter user named "jpc_108" posts a tweet, and then someone retweets
> his message, so @jpc_108 becomes part of the tweet text body.
>
> It seems that before indexing, Solr's tokenizer factory turns "@jpc_108"
> into "jpc" and "108", so when we search for jpc_108, it is no longer there.
>
>
> Is there any way we can keep "jpc_108" when it appears as "@jpc_108"?
>
> Thanks,
> Ming
>
