Ard,
In indexing_configuration.xml, you name the property that the analyzer
applies to (e.g. FullName). How do I set it so that it's used on all
properties of a node? As I said previously, I'm using jcr:contains because I
need to search all parts of the node, so the analyzer needs to take effect
on all properties.
Regards,
Chris
On 27/08/10 2:22 AM, "H. Wilson" <wilsonh@randdss.com> wrote:
> Finally! I have been hacking away at this here and there for months,
> trying all different analyzers or not-using analyzers and modifying my
> queries all to no avail! Since I always like precise examples when I am
> searching forums, I will post my (nearly) exact solution both for others
> and so that Ard might verify that this was indeed what he meant.
>
> Ard, I was hoping you could embellish a little on why we would duplicate
> the property? (I didn't actually do it to get this working perfectly)
> You lost me a little there, was it for efficiency? Thanks for everything!
>
> H. Wilson
>
> repository.xml (modified both SearchIndex tags to include an
> indexingConfiguration):
>
> <SearchIndex
> class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>
> ....
> <param name="indexingConfiguration"
> value="${rep.home}/indexing_configuration.xml"/>
>
> </SearchIndex>
>
>
> indexing_configuration.xml:
>
> <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
> <analyzers>
> <analyzer
> class="org.mycompany.lucene.analysis.LowerCaseKeywordAnalyzer">
> <property>fullName</property>
> </analyzer>
> </analyzers>
> </configuration>
>
>
> LowerCaseKeywordAnalyzer.java:
>
> package org.mycompany.lucene.analysis;
> import java.io.Reader;
> import org.apache.lucene.analysis.KeywordAnalyzer;
> import org.apache.lucene.analysis.LowerCaseFilter;
> import org.apache.lucene.analysis.TokenStream;
>
> public class LowerCaseKeywordAnalyzer extends KeywordAnalyzer {
>
> public TokenStream tokenStream ( String field, final Reader
> reader ) {
> TokenStream keywordTokenStream = super.tokenStream (field,
> reader);
> return ( new LowerCaseFilter ( keywordTokenStream ) );
> }
> }
>
>
> Our search class has a method which then does the following:
>
> public OurParameter[] getOurParameters (String searchTerm, String
> srchField ) { //srchField in this case was fullName
>
> TransientRepository repository = new TransientRepository (
> OUR_REPO_CONFIG, OUR_REPO_LOCATION);
> Session session = repository.login ();
> List<Class> classes = new ArrayList<Class>();
> classes.add (OurParameter.class);
> Mapper mapper = new AnnotationMapperImpl (classes);
> ObjectContentManager ocm = new ObjectContentManagerImpl
> (session, mapper);
> queryManager = ocm.getQueryManager();
> FilterImpl filter = (FilterImpl)queryManager.createFilter
> (OurParameter.class);
> filter.addContains ( srchField,
> org.apache.jackrabbit.util.Text.escapeIllegalXpathSearchChars(searchTerm)
> .replaceAll ("'","''"));
> // (that last call replaces every single tick with two ticks; I
> // honestly can't remember why, though)
> Query query = queryManager.createQuery (filter);
> Collection<OurParameter> resultsCollection =
> (Collection<OurParameter>)ocm.getObjects(query);
>
> //convert to an array, do some other stuff, and return...
>
> }
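
A likely explanation for the doubled ticks, assuming addContains places the search term inside a single-quoted XPath string literal: in XPath, a literal ' inside a '...' string is written as ''. The sketch below shows only that quoting step. The class XPathQuoting and its two helpers are hypothetical names for illustration, and the code does not perform the separate fulltext escaping done by Text.escapeIllegalXpathSearchChars.

```java
public class XPathQuoting {

    // Wraps a value in single quotes and doubles any embedded single quote,
    // which is how a literal ' is written inside a '...' XPath string.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Hypothetical helper: builds a property-scoped jcr:contains statement.
    static String containsQuery(String property, String searchTerm) {
        return "//*[jcr:contains(@" + property + ", " + quote(searchTerm) + ")]";
    }

    public static void main(String[] args) {
        // prints //*[jcr:contains(@fullName, 'o''brien')]
        System.out.println(containsQuery("fullName", "o'brien"));
    }
}
```

Without the doubling, a term such as o'brien would close the string literal early and the statement would fail to parse.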
>
>
>
> On 08/26/2010 10:42 AM, Ard Schrijvers wrote:
>> On Thu, Aug 26, 2010 at 3:53 PM, H. Wilson<wilsonh@randdss.com> wrote:
>>> Ard,
>>>
>>> I have this same problem, however my scenario involves underscores rather
>>> than hyphens. Although since Chris seems to be seeing the same exact
>> It is because hyphens, just like underscores, are characters the Standard
>> Lucene Analyzer splits on. This, combined with the query expansion that
>> happens for wildcard searches in Lucene, causes your issues:
>>
>>> behavior as I was, I imagine we are both stuck on the same issue. After
>>> scouring the forums for the solution, and not seeing your mentioned
>>> solution, I actually posted my problem as detailed as possible here (
>>> http://markmail.org/message/yh72wqd5b2hbr3j6 ) and received no response.
>>> jcr:like was not an option for me, in this case, as our client wanted the
>>> option for case-insensitive searches. Is there any chance you could please
>>> narrow down where-about the post was which already covered this? Thanks for
>> I can't seem to find my post again. But, I'll give you a quite simple
>> solution:
>>
>> If you want to keep the normal indexing of the property for normal
>> searching, but also want to have the yyy* option, you need to
>> duplicate the value in a second property. If your property,
>> like
>>
>> .North.South.East.WestLand
>>
>> is only needed for the kind of wildcard searching you describe, you
>> only need it once. Now, suppose your property is called myProp.
>>
>> To your configuration.xml add:
>>
>> <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
>> <analyzers>
>> <analyzer
>> class="org.mycompany.lucene.analysis.LowerCaseKeywordAnalyzer">
>> <property>myProp</property>
>> </analyzer>
>> </analyzers>
>> </configuration>
>>
>> Your LowerCaseKeywordAnalyzer is very simple: it extends
>> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/KeywordAnalyzer.html
>> and in the method
>>
>> TokenStream tokenStream(String fieldName,Reader reader)
>>
>> after calling the super, you invoke Lucene's LowerCaseFilter.
>>
>> That is all (after you do a re-index of your repository). Since now a
>> -, or _ or ~ or whatever is not seen as a token to split on, but you
>> still use lowercase filter, you can do exactly what you want.
>>
>> Do the words need to be split on spaces, however? No problem, just add
>> a WhitespaceTokenizer from Lucene. It is actually pretty simple.
>>
>> Hope this helps,
>>
>> Regards Ard
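
The effect Ard describes can be illustrated without Lucene. The sketch below is a rough stand-in, not Lucene's actual tokenizer grammar: splitTerms imitates an analyzer that breaks a value on non-alphanumeric characters, keywordTerm imitates KeywordAnalyzer followed by LowerCaseFilter, and prefixMatches assumes the prefix of a wildcard query is compared against the indexed terms without being tokenized itself. All three names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class AnalyzerSketch {

    // Stand-in for a word-splitting analyzer: lowercase the value, then split
    // on anything that is not a letter or digit, so '-' and '_' separate terms.
    static List<String> splitTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    // Stand-in for KeywordAnalyzer + LowerCaseFilter: the whole value becomes
    // a single lowercased term.
    static List<String> keywordTerm(String value) {
        return Collections.singletonList(value.toLowerCase(Locale.ROOT));
    }

    // A prefix such as "sophie-a" matches only if some indexed term starts with it.
    static boolean prefixMatches(List<String> terms, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String term : terms) {
            if (term.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String name = "Sophie-Allen";
        System.out.println(splitTerms(name));                              // [sophie, allen]
        System.out.println(keywordTerm(name));                             // [sophie-allen]
        System.out.println(prefixMatches(splitTerms(name), "sophie-a"));   // false
        System.out.println(prefixMatches(keywordTerm(name), "sophie-a"));  // true
    }
}
```

Note the trade-off: with the single keyword term, a search for "allen" on its own no longer matches, which is why the value is duplicated into a second property when normal searching is also needed.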
>>
>>> your time.
>>>
>>> *H. Wilson*
>>>
>>>
>>> On 08/26/2010 04:59 AM, Ard Schrijvers wrote:
>>>> Hello,
>>>>
>>>> You can search the archives (mail from me) for the wildcard searching
>>>> issues described below. Someone else had similar issues, and I
>>>> explained the wildcard difficulties. Take a look at jcr:like for your
>>>> use cases.
>>>>
>>>> Regards Ard
>>>>
>>>> On Thu, Aug 26, 2010 at 10:19 AM, Dunstall, Christopher
>>>> <cdunstall@csu.edu.au> wrote:
>>>>> Hi all,
>>>>>
>>>>> I'm having some trouble with an XPath query, where I'm searching for
>>>>> users with hyphens in their name.
>>>>>
>>>>> I'm using:
>>>>> jcr:contains(*/*/*,'query')
>>>>>
>>>>> And it returns some odd results.
>>>>>
>>>>> I have two users, Sophie-Allen and Sophie-Anne. When I search for
>>>>> 'sophie', I get both users back. Ok, fine, but if I search for 'sophie-a'
>>>>> (with the hyphen escaped as 'sophie\-a' as per the JSR-170 Spec) I get
>>>>> zero results returned. Oddly, if I search for either 'sophie-allen' or
>>>>> 'sophie-anne' I get the respective user details back fine. Shouldn't I
>>>>> get both users back when escaping the hyphen? Have I missed something
>>>>> in the spec?
>>>>>
>>>>> One other odd thing is the addition of an asterisk (*). Searching for
>>>>> 'soph' and 'soph*' return the same result (both users), but if I search
>>>>> for 'sophie-allen*', I get zero results, unlike when searching for just
>>>>> 'sophie-allen'. Searching for 'sophie-a*' has the same result as without
>>>>> the asterisk, i.e. nothing.
>>>>>
>>>>> The JSR-170 spec doesn't say anything (that I can find) but is the
>>>>> asterisk a wildcard in the jcr:contains function or does it serve some
>>>>> other purpose?
>>>>>
>>>>> Your assistance is greatly appreciated,
>>>>>
>>>>> Regards,
>>>>>
>>>>> Chris Dunstall | Service Support - Applications
>>>>> Technology Integration/OLE Virtual Team
>>>>> Division of Information Technology | Charles Sturt University | Bathurst,
>>>>> NSW, Australia
>>>>>
>>>>> Ph: 02 63384818 | Fax: 02 63384181
>>>>>