lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: custome index rule
Date Mon, 24 Oct 2011 10:00:46 GMT
You can achieve pretty much anything by customizing parsers and
tokenizers but for your simple case I'd just use String.split() and
add the phrases one by one.  Something like

Document d = ...
String[] phrases = sentence.split(",");
for (String phrase : phrases) {
  d.add(new Field("phrase", phrase, ...));
}

I think that would achieve what you want.
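
A slightly fuller sketch of the splitting step, written in plain Java so it
runs without Lucene on the classpath. PhraseSplitter and splitPhrases are
made-up names, and trimming whitespace and skipping empty pieces are
assumptions that go beyond the snippet above. The commented-out Field line
shows one possible set of arguments for Lucene 3.x (Field.Store.YES is an
assumption; Field.Index.NOT_ANALYZED is what the question asks for).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a comma-separated sentence into phrases.
final class PhraseSplitter {

    static List<String> splitPhrases(String sentence) {
        List<String> phrases = new ArrayList<String>();
        for (String part : sentence.split(",")) {
            String phrase = part.trim();      // assumption: surrounding spaces are not wanted
            if (!phrase.isEmpty()) {          // assumption: empty pieces are skipped
                phrases.add(phrase);
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        for (String phrase : splitPhrases("I am in China,I am in USA,I am in UK")) {
            // With Lucene 3.x available, each phrase could be added here, e.g.:
            // d.add(new Field("phrase", phrase, Field.Store.YES, Field.Index.NOT_ANALYZED));
            System.out.println(phrase);
        }
    }
}
```

Adding the same field name several times to one Document gives a multi-valued
field, so each short sentence is indexed as its own untokenized term.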


On special characters, see
http://lucene.apache.org/java/3_4_0/queryparsersyntax.html#Escaping
Special Characters and QueryParser.escape(String s).
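
To illustrate what escaping does to an input such as a^b, here is a
hand-rolled stand-in in plain Java. It is only a sketch: QueryEscapeSketch is
a made-up class, and the character list is an assumption based on the special
characters named in the query parser syntax page. In real code, call
QueryParser.escape(String s) rather than this.

```java
// Hypothetical stand-in for illustration only; prefer QueryParser.escape(String).
final class QueryEscapeSketch {

    // Assumed set of query-syntax characters: \ + - ! ( ) : ^ [ ] " { } ~ * ? | &
    private static final String SPECIALS = "\\+-!():^[]\"{}~*?|&";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');   // prefix each special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a^b"));
    }
}
```

The escaped string, not the raw user input, is what gets handed to the
parser, so that ^ is treated as a literal character instead of the boost
operator.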


--
Ian.

On Mon, Oct 24, 2011 at 10:12 AM, janwen <tom.grade1986@163.com> wrote:
> Hi,
>  I want to implement a custom index rule:
>  Assume a sentence like the following (note the commas):
>   I am in China,I am in USA,I am in UK
>
>  I hope Lucene can index the above sentence based on these rules:
> 1) split the sentence on commas (,), so we get (I am in China) (I am in USA) (I am in UK)
> 2) then Lucene just stores the short sentences from step 1, NOT_ANALYZED
>
> P.S. Which characters does Lucene not support, and how many are there?
> I input a^b and get an exception:
>  org.apache.lucene.queryParser.ParseException: Cannot parse 'a^b: Lexical error at line
>  1, column 4.  Encountered: "\u671d" (26397), after : ""
>
> thanks
>
> 2011-10-24
>
>
>
> janwen | China
> website : http://www.qianpin.com/