lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: TermRangeQuery
Date Mon, 29 Nov 2010 03:51:22 GMT
OK, maybe I'm catching on at last. Have you considered just skipping the
whole search thing entirely? Something like using one of the classes
implementing TermDocs to get the document for a particular term (one of
the elements of your list of dataIds in this case). That would give you the
doc number which in turn would allow you to extract the category. You'd
have to take some care that the Term you constructed went through
all the transformations you do with regular querying (i.e. say you
lower-case the dataIds or anything else like that).

You may want to explore lazy loading to prevent the overhead of reading
any extraneous information when you get the doc from the TermDocs class,
it can result in quite a time savings.

Best
Erick


On Sun, Nov 28, 2010 at 3:47 PM, Amin Mohammed-Coleman <aminmc@gmail.com>wrote:

> Hi
>
> I'll explain my use case more and then explain the out come of my
> implementation:
>
> I have lucene documents that look like this:
>
>
> Field name           Field Value
> dataId                     TYX-CC-124
> category                CATEGORY A
>
>
> What I would like to do is for a given collection of dataIds I'd like to
> get it's corresponding category.  The collection of ids that will be passed
> into my method will vary.  Also the id prefix (TYX-CC) will be different for
> different groups invoking my method.  For example I may have ids that look
> like below:
>
> TX-CC-124
> AVC-FF-124
>
> and so on.
>
> So I tried sorting the list of ids before creating the range query based on
> the numeric part of the id. This did not work as the number of ids returned
> from the query was greater than the input ids.
>
> You mentioned padding the number part of the ids but will that work in the
> case of the following:
>
> aa-01
> bb-01
>
> If pass in aa-01 as the lower range, the query will return the result for
> bb-01 as well (unless I have mis understood the usage of the range query).
>
> In order to get things moving i decided to create a boolean query and
> essentially batch the queries to avoid hitting too many clause exception.
> So for each 1000 ids I create a boolean query with the ids being passed in.
>   This may not be the best approach but I can't seem to get my head around
> how the range query can be used considering the numeric part of the id is
> essentially not unique.
>
>
> Thanks
> Amin
>
>
>
> On 28 Nov 2010, at 18:19, Erick Erickson wrote:
>
> > Why won't Ian's suggestion work? You haven't really given us a clue what
> it
> > is about
> > your attempt that didn't work. The expected and actual output would be
> > useful...
> >
> > But Ian's notion is the well-known issue that lexical and numeric sorting
> > aren't
> > at all the same. You'd get reasonable results if you left-padded the
> number
> > portion of the IDs with 0 out to 4 spaces, thus
> > aa-90   -> aa-0090
> > aa-123 -> aa-0123
> > aa-1000 > aa-1000
> >
> > and your range queries should work. You might have to transform them
> > back when displayed. Or you could add them to your document twice.
> > Once in a "hidden" field, the one you searched against in your range
> query
> > and the other to display. This latter wouldn't bloat your index (much)
> since
> > you would store one and index the other....
> >
> > Best
> > Erick
> >
> > On Fri, Nov 26, 2010 at 5:01 PM, Amin Mohammed-Coleman <aminmc@gmail.com
> >wrote:
> >
> >> Essentially I'd like to construct a query which is almost like SQL in
> >> clause.  The lucene document contains the id and a string value. I'd
> like to
> >> get the string value based on the id key.  The ids may range within
> 1000. Is
> >> this possible to do?
> >>
> >> Thanks
> >> Amin
> >>
> >> Sent from my iPhone
> >>
> >> On 26 Nov 2010, at 20:18, Ian Lea <ian.lea@gmail.com> wrote:
> >>
> >>> What sort of ranges are you trying to use?  Maybe you could store a
> >>> separate field, just for these queries, with some normalized form of
> >>> the ids, with all numbers padded out to the same length etc.
> >>>
> >>> --
> >>> Ian.
> >>>
> >>> On Fri, Nov 26, 2010 at 4:34 PM, Amin Mohammed-Coleman <
> aminmc@gmail.com>
> >> wrote:
> >>>> Hi
> >>>>
> >>>> Unfortunately my range query approach did not work.   It seems to be
> >> related to the ids themselves.  The list has ids that look this:
> >>>>
> >>>>
> >>>> ID-NYC-1234
> >>>> ID-LND-1234
> >>>> TX-NYC-1334
> >>>> TX-NYC-BBC-123
> >>>>
> >>>> The ids may range from 90 to 1000.  Is there another approach I could
> >> take?  I tried building a string with all the ids and set them against a
> >> field for example:
> >>>>
> >>>> dataId: ID-NYC-123 dataId: ID-NYC-1234....
> >>>>
> >>>> but that's not a great approach I know...
> >>>>
> >>>> any help would be appreciated.
> >>>>
> >>>> Thanks
> >>>> Amin
> >>>>
> >>>>
> >>>>
> >>>> On 26 Nov 2010, at 14:39, Ian Lea wrote:
> >>>>
> >>>>> Absolutely, as long as your ids will sort as you expect.
> >>>>>
> >>>>> I'm not clear what you mean by XDF-123 but if you've got
> >>>>>
> >>>>> AAA-123
> >>>>> AAA-124
> >>>>> ...
> >>>>> ABC-123
> >>>>> ABC-234
> >>>>> etc.
> >>>>>
> >>>>> then you'll be fine.  If they don't sort so neatly you can use the
> >>>>> TermRangeQuery constructor that takes a Collator but note the
> >>>>> performance warning in the javadocs.
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Ian.
> >>>>>
> >>>>>
> >>>>> On Fri, Nov 26, 2010 at 2:18 PM, Amin Mohammed-Coleman <
> >> aminmc@gmail.com> wrote:
> >>>>>> Hi All
> >>>>>>
> >>>>>> I was wondering whether I can use TermRangeQuery for my use
case.  I
> >> have a collection of ids (represented as XDF-123) and I would like to do
> a
> >> search for all the ids (might be in the range of 10000) and for each
> >> matching id I want to get the corresponding data that is stored in the
> index
> >> (for example the document contains id and string value).  I am using a
> >> custom collector to collect that string value for each match.  Is it ok
> to
> >> use a TermRangeQuery for the ids rather than creating a massive query
> >> string?
> >>>>>>
> >>>>>>
> >>>>>> Thanks
> >>>>>> Amin
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>>
> >>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >>>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message