lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Lea <ian....@gmail.com>
Subject Re: Searching for a strict prefix
Date Mon, 24 May 2010 13:37:51 GMT
I bet it's that underscore in m_sz.  Different analyzers do different
things with different punctuation characters.  I can never remember
which does exactly what - it'll be in the javadocs or Lucene In Action
or somewhere on the web.

You can check what exactly has been indexed by using Luke - always a good idea.

In this case you probably want an analyzer that just downcases the tag
names.  Easy enough to build your own if there isn't an existing one
that meets your needs.


--
Ian.


On Mon, May 24, 2010 at 1:14 PM, Shlomy Reinstein <sreinst1@gmail.com> wrote:
> Hi,
>
> Thanks. By "strict prefix", I meant a prefix of the name
> (case-insensitive). What you suggest ("tagname: updateC*") was the
> first thing I tried, but it happens to work only partially. In my
> case, I have a lot of names beginning with "m_sz<something>", e.g.
> "m_szComment", "m_szName". Trying a query like "tagname: m_szC*" will
> not find the "m_szComment" value. I just wrote the wrong example here.
>
> Shlomy
>
> On Mon, May 24, 2010 at 11:32 AM, Ian Lea <ian.lea@gmail.com> wrote:
>> StandardAnalyzer should work fine, mark the field as indexed, no need
>> to store it unless you want to retrieve it for display.
>>
>> Query via QueryParser using "tagname: updateC*" or programatically via
>> PrefixQuery.
>>
>>
>> Although I'm not sure exactly what you mean by "strict prefix".  If
>> you mean that the prefix should match exactly on case, punctuation
>> etc. maybe use KeywordAnalyzer instead.
>>
>>
>>
>> --
>> Ian.
>>
>>
>> On Sun, May 23, 2010 at 3:03 PM, Shlomy Reinstein <sreinst1@gmail.com> wrote:
>>> Hi,
>>>
>>> I have a Lucene index that contains source code tags (a tag can be any
>>> named source code element - function, class, variable). Each document
>>> contains a field with the tag name and some additional information.
>>> I'd like to be able to perform strict prefix queries. E.g. if I have a
>>> tag named "updateChildren", I'd like to be able to write a query like
>>> "updateC*" and get the list of tags that have this prefix
>>> (updateChildren being one of them).
>>>
>>> Is there a way to do this? Please suggest how I should store the field
>>> for this purpose - which analyzer to use, whether to keep it stored,
>>> etc.
>>>
>>> Thanks,
>>> Shlomy
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message