lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Amin Mohammed-Coleman <ami...@gmail.com>
Subject Re: TermRangeQuery
Date Sun, 28 Nov 2010 20:47:22 GMT
Hi

I'll explain my use case more and then explain the out come of my implementation:

I have lucene documents that look like this:


Field name           Field Value
dataId                     TYX-CC-124
category                CATEGORY A


What I would like to do is for a given collection of dataIds I'd like to get it's corresponding
category.  The collection of ids that will be passed into my method will vary.  Also the id
prefix (TYX-CC) will be different for different groups invoking my method.  For example I
may have ids that look like below:

TX-CC-124
AVC-FF-124

and so on.

So I tried sorting the list of ids before creating the range query based on the numeric part
of the id. This did not work as the number of ids returned from the query was greater than
the input ids.  

You mentioned padding the number part of the ids but will that work in the case of the following:

aa-01
bb-01

If pass in aa-01 as the lower range, the query will return the result for bb-01 as well (unless
I have mis understood the usage of the range query).   

In order to get things moving i decided to create a boolean query and essentially batch the
queries to avoid hitting too many clause exception.   So for each 1000 ids I create a boolean
query with the ids being passed in.   This may not be the best approach but I can't seem to
get my head around how the range query can be used considering the numeric part of the id
is essentially not unique.  


Thanks
Amin



On 28 Nov 2010, at 18:19, Erick Erickson wrote:

> Why won't Ian's suggestion work? You haven't really given us a clue what it
> is about
> your attempt that didn't work. The expected and actual output would be
> useful...
> 
> But Ian's notion is the well-known issue that lexical and numeric sorting
> aren't
> at all the same. You'd get reasonable results if you left-padded the number
> portion of the IDs with 0 out to 4 spaces, thus
> aa-90   -> aa-0090
> aa-123 -> aa-0123
> aa-1000 > aa-1000
> 
> and your range queries should work. You might have to transform them
> back when displayed. Or you could add them to your document twice.
> Once in a "hidden" field, the one you searched against in your range query
> and the other to display. This latter wouldn't bloat your index (much) since
> you would store one and index the other....
> 
> Best
> Erick
> 
> On Fri, Nov 26, 2010 at 5:01 PM, Amin Mohammed-Coleman <aminmc@gmail.com>wrote:
> 
>> Essentially I'd like to construct a query which is almost like SQL in
>> clause.  The lucene document contains the id and a string value. I'd like to
>> get the string value based on the id key.  The ids may range within 1000. Is
>> this possible to do?
>> 
>> Thanks
>> Amin
>> 
>> Sent from my iPhone
>> 
>> On 26 Nov 2010, at 20:18, Ian Lea <ian.lea@gmail.com> wrote:
>> 
>>> What sort of ranges are you trying to use?  Maybe you could store a
>>> separate field, just for these queries, with some normalized form of
>>> the ids, with all numbers padded out to the same length etc.
>>> 
>>> --
>>> Ian.
>>> 
>>> On Fri, Nov 26, 2010 at 4:34 PM, Amin Mohammed-Coleman <aminmc@gmail.com>
>> wrote:
>>>> Hi
>>>> 
>>>> Unfortunately my range query approach did not work.   It seems to be
>> related to the ids themselves.  The list has ids that look this:
>>>> 
>>>> 
>>>> ID-NYC-1234
>>>> ID-LND-1234
>>>> TX-NYC-1334
>>>> TX-NYC-BBC-123
>>>> 
>>>> The ids may range from 90 to 1000.  Is there another approach I could
>> take?  I tried building a string with all the ids and set them against a
>> field for example:
>>>> 
>>>> dataId: ID-NYC-123 dataId: ID-NYC-1234....
>>>> 
>>>> but that's not a great approach I know...
>>>> 
>>>> any help would be appreciated.
>>>> 
>>>> Thanks
>>>> Amin
>>>> 
>>>> 
>>>> 
>>>> On 26 Nov 2010, at 14:39, Ian Lea wrote:
>>>> 
>>>>> Absolutely, as long as your ids will sort as you expect.
>>>>> 
>>>>> I'm not clear what you mean by XDF-123 but if you've got
>>>>> 
>>>>> AAA-123
>>>>> AAA-124
>>>>> ...
>>>>> ABC-123
>>>>> ABC-234
>>>>> etc.
>>>>> 
>>>>> then you'll be fine.  If they don't sort so neatly you can use the
>>>>> TermRangeQuery constructor that takes a Collator but note the
>>>>> performance warning in the javadocs.
>>>>> 
>>>>> 
>>>>> --
>>>>> Ian.
>>>>> 
>>>>> 
>>>>> On Fri, Nov 26, 2010 at 2:18 PM, Amin Mohammed-Coleman <
>> aminmc@gmail.com> wrote:
>>>>>> Hi All
>>>>>> 
>>>>>> I was wondering whether I can use TermRangeQuery for my use case.
 I
>> have a collection of ids (represented as XDF-123) and I would like to do a
>> search for all the ids (might be in the range of 10000) and for each
>> matching id I want to get the corresponding data that is stored in the index
>> (for example the document contains id and string value).  I am using a
>> custom collector to collect that string value for each match.  Is it ok to
>> use a TermRangeQuery for the ids rather than creating a massive query
>> string?
>>>>>> 
>>>>>> 
>>>>>> Thanks
>>>>>> Amin
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>> 
>>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>> 
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>> 
>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> 
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message