lucene-java-user mailing list archives

From Michael Sokolov <msoko...@safaribooksonline.com>
Subject Re: Can we configure analyzers to not exclude specific characters
Date Wed, 28 Jan 2015 20:01:53 GMT
It's a bit of a hack, but we do this:

    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([A-Za-z])\+\+" replacement="$1plusplus" />
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([A-Za-z])\#" replacement="$1sharp" />
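
To see what those two char filters do to the text before the tokenizer ever sees it, here is a rough sketch that applies the same two patterns with plain java.util.regex. It is not Lucene or Solr code; the class name CharFilterSketch and the method rewrite are made up for illustration, and it only mimics the character-level rewrite step.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene/Solr): mimics the two
// PatternReplaceCharFilterFactory entries using java.util.regex only.
class CharFilterSketch {
    // Same patterns as in the charFilter config above.
    private static final Pattern PLUS_PLUS = Pattern.compile("([A-Za-z])\\+\\+");
    private static final Pattern SHARP = Pattern.compile("([A-Za-z])\\#");

    // Rewrites "C++" to "Cplusplus" and "C#" to "Csharp" so that a tokenizer
    // which drops "+" and "#" still keeps the whole term together.
    static String rewrite(String text) {
        String out = PLUS_PLUS.matcher(text).replaceAll("$1plusplus");
        return SHARP.matcher(out).replaceAll("$1sharp");
    }

    public static void main(String[] args) {
        // prints: I write Cplusplus and Csharp code
        System.out.println(rewrite("I write C++ and C# code"));
    }
}
```

For searches to match, the same rewrite has to be applied to the query text as well as the indexed text, so the char filters would normally go in both the index-time and query-time analyzer chains.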


On 1/28/2015 2:00 AM, Shivashankar Maddanimath wrote:
> Hi,
>
> I am using Lucene standard and UAX29URLEmailTokenizer. These analysers are excluding
> some characters like "+" (I can't search for C++). Is there any way we can configure analyzers
> to include specific characters while tokenising?
>
> Regards,
> Shiv
>
> -----Original Message-----
> From: "Luis A Lastras" <lastrasl@us.ibm.com>
> Sent: 25-01-2015 08:05 AM
> To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> Subject: Absolute term position in scoring
>
> Is it possible to incorporate in Lucene's scoring function the position of a matching
> term (say, as measured from the top of the document)? The scenario is: if the documents
> tend to talk about the most important stuff at the beginning of the document, then we would
> like to give preference to documents that mention a term close to the top.
>
> Thanks,
>
> Luis
>
>
>
>
>
> Luis A Lastras, Ph.D.
> Research Staff Member & Manager, Concept Analytics, IBM Watson
> Member of the IBM Academy of Technology
> IBM Master Inventor
> email: lastrasl@us.ibm.com | Tel: 914-945-3613 | Cell: 914-382-1879
> address:  1101 Kitchawan Rd, Office 28-132, Yorktown Heights, NY, 10598



