lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Query beginning with special characters
Date Mon, 14 Jan 2013 11:21:53 GMT
In fact I see you are ignoring all spaces between words.  Maybe that's
deliberate.  Break it down into the smallest possible complete code
sample that shows the problem and post that.
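
For illustration only, here is a minimal sketch of the kind of
self-contained sample meant above. It is plain Java with no Lucene
dependency, and it is not the poster's tokenizer (that code was never
posted). The class name TrigramSketch and the stripWhitespace flag are
hypothetical. It shows how dropping or keeping spaces changes which
trigrams exist around the ":)".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TrigramSketch {

    // Hypothetical helper: lower-cases the text, optionally removes all
    // whitespace, then emits every 3-character sliding window.
    public static List<String> trigrams(String text, boolean stripWhitespace) {
        String s = text.toLowerCase(Locale.ROOT);
        if (stripWhitespace) {
            s = s.replaceAll("\\s+", "");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            out.add(s.substring(i, i + 3));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Sample text with special characters :) and such";
        // Spaces removed: windows can span word boundaries, e.g. ':)a'.
        System.out.println(trigrams(text, true));
        // Spaces kept: the windows around ':)' contain spaces instead.
        System.out.println(trigrams(text, false));
    }
}
```

With the spaces stripped, the terms ':)a' and ')an' are produced. With
the spaces kept they are not, because the windows around the smiley
include the surrounding spaces. Comparing output like this against
what Luke shows in the index should narrow down where the terms go
missing.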


--
Ian.


On Mon, Jan 14, 2013 at 11:02 AM, Ian Lea <ian.lea@gmail.com> wrote:
> It won't be IndexWriter or IndexWriterConfig.  What exactly does your
> analyzer do - what is the full chain of tokenization?  Are you saying
> that ':)a' and ')an' are not indexed?  Surely that is correct given
> your input, which has a space after the :).  There is a space before
> it as well, so 's:)' is also suspect.
>
> --
> Ian.
>
>
> On Mon, Jan 14, 2013 at 7:42 AM, Hankyu Kim <gksrb92@gmail.com> wrote:
>> I'm working with Lucene 4.0 and I didn't use lucene's QueryParser, so
>> setAllowLeadingWildcard() is irrelevant.
>> I also realised the issue wasn't with querying but with indexing, which
>> left out the terms with a leading special character.
>>
>> My goal was to do a fuzzy match by creating a trigram index. The idea is to
>> tokenize the documents into trigrams rather than words, during both indexing
>> and searching, so lucene can search for part of a word or phrase.
>>
>> Say the original text in the document said : "Sample text with special
>> characters :) and such"
>> It's tokenized into
>>  'sam', 'amp', 'mpl', 'ple', 'let', 'ete', 'tex', 'ext', 'xtw', 'twi',
>> 'wit', 'ith', 'ths', 'hsp', 'spe', 'pec', 'eci', 'cia', 'ial', 'alc',
>> 'lch', 'cha', 'har', 'ara', 'rac', 'act', 'cte', 'ter', 'ers', 'rs:',
>> 's:)', ':)a', ')an', 'and', 'nds', 'dsu', 'suc', 'uch'.
>> The above is output from my tokenizer, so there's nothing wrong with
>> creating the trigrams. However, when I check the index with lukeall, all the
>> other trigrams are indexed correctly except for the terms ':)a' and ')an'.
>> Since the missing terms are related to lucene's special characters, I
>> don't think it's got to do with my custom code.
>>
>> I only changed the analyzer in IndexFiles.java from the demo to index the file.
>> Honestly, I can't even locate the exact class in which the problem is
>> caused. I'm only guessing that IndexWriterConfig or IndexWriter is discarding
>> the terms with leading special characters.
>>
>> I hope the above information helps.
>>
>> 2013/1/11 Ian Lea <ian.lea@gmail.com>
>>
>>> QueryParser has a setAllowLeadingWildcard() method.  Could that be
>>> relevant?
>>>
>>> What version of lucene?  Can you post some simple examples of what
>>> does/doesn't work? Post the smallest possible, but complete, code that
>>> demonstrates the problem?
>>>
>>>
>>> With any question that mentions a custom version of something, that
>>> custom version has to be the prime suspect for any problems.
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Thu, Jan 10, 2013 at 12:08 PM, Hankyu Kim <gksrb92@gmail.com> wrote:
>>> > Hi.
>>> >
>>> > I've created a custom analyzer that treats special characters just like any
>>> > other. The index works fine all the time, even when the query includes
>>> > special characters, except when the special characters come at the beginning
>>> > of the query.
>>> >
>>> > I'm using SpanTermQuery and WildcardQuery, and they both seem to suffer the
>>> > same issue with queries beginning with special characters. Is it a
>>> > limitation of Lucene or am I missing something?
>>> >
>>> > Thanks
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
