lucene-java-user mailing list archives

From "janwen"<tom.grade1...@163.com>
Subject Re: Re: custome index rule
Date Mon, 24 Oct 2011 09:57:10 GMT
Thanks, Ian. I will try your idea.

2011-10-24



janwen | China 
website : http://www.qianpin.com/




From:Ian Lea
Date:2011-10-24 18:01
Subject:Re: custome index rule
To:java-user
Cc:

You can achieve pretty much anything by customizing parsers and 
tokenizers but for your simple case I'd just use String.split() and 
add the phrases one by one.  Something like 

Document d = ... 
String[] phrases = sentence.split(","); 
for (String phrase : phrases) { 
  d.add(new Field("phrase", phrase, ...)); 
} 

I think that would achieve what you want. 
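
To make the splitting step concrete, here is a minimal, self-contained sketch that uses only the JDK. The class name PhraseSplitter and the method splitPhrases are hypothetical names chosen for illustration, and trimming whitespace and skipping empty pieces are assumptions that the original question did not ask for. The Lucene call appears only as a comment, using the Lucene 3.x Field constructor with Field.Index.NOT_ANALYZED, so the sketch compiles without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

final class PhraseSplitter {

    // Hypothetical helper: split a comma-separated sentence into the
    // phrases that should each be indexed as a single, un-analyzed term.
    static List<String> splitPhrases(String sentence) {
        List<String> phrases = new ArrayList<String>();
        for (String part : sentence.split(",")) {
            // Assumption: surrounding whitespace is not significant.
            String phrase = part.trim();
            // Assumption: empty pieces (e.g. from ",,") are skipped.
            if (!phrase.isEmpty()) {
                phrases.add(phrase);
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        for (String phrase : splitPhrases("I am in China,I am in USA,I am in UK")) {
            // With Lucene 3.x available, each phrase could be added as:
            // d.add(new Field("phrase", phrase, Field.Store.YES, Field.Index.NOT_ANALYZED));
            System.out.println(phrase);
        }
    }
}
```

Because the field is not analyzed, a search has to supply the whole phrase as one exact term (for example via a TermQuery) for it to match.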


On special characters, see the "Escaping Special Characters" section of 
http://lucene.apache.org/java/3_4_0/queryparsersyntax.html 
and QueryParser.escape(String s). 
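
For context, the characters are not unsupported as such: the query parser treats some of them as operators (^ is the boost operator, for instance), so they need a backslash in front when meant literally. The sketch below is a JDK-only illustration of that idea, not Lucene's own code. The class name QueryEscapeSketch and the method escapeQuerySpecials are hypothetical, and the character list is an assumption taken from the query parser syntax page. In real code, calling QueryParser.escape(String) is the safer choice.

```java
final class QueryEscapeSketch {

    // Assumed set of single characters that take part in query parser
    // syntax: \ + - ! ( ) : ^ [ ] " { } ~ * ? | &
    private static final String SPECIALS = "\\+-!():^[]\"{}~*?|&";

    // Hypothetical helper: prefix every special character with a backslash
    // so the parser reads it as literal text rather than as an operator.
    static String escapeQuerySpecials(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The ^ in the user's input gets a backslash in front of it.
        System.out.println(escapeQuerySpecials("a^b"));
    }
}
```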


-- 
Ian. 

On Mon, Oct 24, 2011 at 10:12 AM, janwen <tom.grade1986@163.com> wrote: 
> Hi, 
>  I want to implement a custom index rule. 
>  Assume a sentence like the following (note the commas): 
>   I am in China,I am in USA,I am in UK 
> 
>  I would like Lucene to index the above sentence based on these rules: 
> 1) Split the sentence on commas (,), so we get (I am in China)(I am in USA)(I am in UK). 
> 2) Then Lucene just stores the short sentences from step 1, NOT_ANALYZED. 
> 
> P.S. Which characters does Lucene not support, and how many are there? 
> I input a^b and get an exception: 
>  org.apache.lucene.queryParser.ParseException: Cannot parse 'a^b: Lexical error at line 1, column 4.  Encountered: "\u671d" (26397), after : "" 
> 
> thanks 
> 
> 2011-10-24 
> 
> 
> 
> janwen | China 
> website : http://www.qianpin.com/ 
