lucene-solr-user mailing list archives

From "Nagelberg, Kallin" <KNagelb...@globeandmail.com>
Subject RE: seemingly impossible query
Date Fri, 21 May 2010 20:44:12 GMT
I just realized something that may make the field collapsing strategy insufficient. My 'ids'
field is multi-valued. From what I've read, you cannot field collapse on a multi-valued field.
Any other ideas?

Thanks,
-Kallin Nagelberg

-----Original Message-----
From: Geert-Jan Brits [mailto:gbrits@gmail.com] 
Sent: Thursday, May 20, 2010 1:03 PM
To: solr-user@lucene.apache.org
Subject: Re: seemingly impossible query

Hi Kallin,

Again, please look at
FieldCollapsing <http://wiki.apache.org/solr/FieldCollapsing>;
that should do the trick.
Basically: first, constrain the query on the field 'listOfIds' so that it only
matches docs that contain any of the (up to) 100 random ids, as you already
know how to do.

Next, in the same query, specify collapsing on the field 'listOfIds'.
Basically:
q=listOfIds:1 OR listOfIds:10 OR listOfIds:24&
collapse.threshold=1&collapse.field=listOfIds&collapse.type=normal

this would return the top-matching doc for each id left in listOfIds. Since
you constrained this field by the ids specified you are left with 1 matching
doc for each id.
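
A minimal sketch of assembling that request in Python, in case it helps. The `collapse.*` parameter names are copied from the query above and only apply if the FieldCollapsing patch from the wiki page is installed; the helper name `build_collapse_query` is hypothetical, and the host and path the string would be appended to are left out because they depend on your setup.

```python
from urllib.parse import urlencode


def build_collapse_query(ids, field="listOfIds"):
    """Build the URL-encoded query string for one collapsed request.

    `field` defaults to the field name used in this thread; the
    collapse.* parameters are the ones quoted in the message above.
    """
    # OR together one clause per requested id, e.g. "listOfIds:1 OR listOfIds:10".
    q = " OR ".join(f"{field}:{i}" for i in ids)
    params = [
        ("q", q),
        ("collapse.field", field),
        ("collapse.threshold", 1),
        ("collapse.type", "normal"),
    ]
    # urlencode escapes ':' and turns spaces into '+'.
    return urlencode(params)


query_string = build_collapse_query([1, 10, 24])
```

The resulting string goes after the `?` of whatever select URL your Solr instance exposes.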

Again, it is not guaranteed that all docs returned are different. Since you
didn't specify this as a requirement, I think this will suffice.

Cheers,
Geert-Jan

2010/5/20 Nagelberg, Kallin <KNagelberg@globeandmail.com>

> Yeah, I need something like:
> (id:1 and maxhits:1) OR (id:2 and maxhits:1).. something crazy like that..
>
> I'm not sure how I can hit solr once. If I do try and do them all in one
> big OR query then I'm probably not going to get a hit for each ID. I would
> need to request probably 1000 documents to find all 100 and even then
> there's no guarantee and no way of knowing how deep to go.
>
> -Kallin Nagelberg
>
> -----Original Message-----
> From: darren@ontrenet.com [mailto:darren@ontrenet.com]
> Sent: Thursday, May 20, 2010 12:27 PM
> To: solr-user@lucene.apache.org
> Subject: RE: seemingly impossible query
>
> I see. Well, now you're asking Solr to ignore its prime directive of
> returning hits that match a query. Hehe.
>
> I'm not sure if Solr has a "unique" attribute.
>
> But this sounds, to me, like you will have to filter the results yourself.
> But at least you hit Solr only once before doing so.
>
> Good luck!
>
> > Thanks Darren,
> >
> > The problem with that is that it may not return one document per id, which
> > is what I need.  I.e., I could give 100 ids in that OR query and retrieve
> > 100 documents, all containing just 1 of the IDs.
> >
> > -Kallin Nagelberg
> >
> > -----Original Message-----
> > From: darren@ontrenet.com [mailto:darren@ontrenet.com]
> > Sent: Thursday, May 20, 2010 12:21 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: seemingly impossible query
> >
> > Ok. I think I understand. What's impossible about this?
> >
> > If you have a single field called <id> that is multivalued,
> > then you can retrieve the documents with something like:
> >
> > id:1 OR id:2 OR id:56 ... id:100
> >
> > then limit the results to 100 (rows=100).
> >
> > There's probably a more succinct way to do this, but I'll leave that to
> > the experts.
> >
> > If you also only want the documents within a certain time, then you also
> > create a <time> field and use a conjunction such as
> > (id:0 ...) AND time:[NOW-1HOUR TO NOW]
> > or something similar to this. Check the query syntax wiki for specifics.
> >
> > Darren
> >
> >
> >> Hey everyone,
> >>
> >> I've recently been given a requirement that is giving me some trouble. I
> >> need to retrieve up to 100 documents, but I can't see a way to do it
> >> without making 100 different queries.
> >>
> >> My schema has a multi-valued field like 'listOfIds'. Each document has
> >> between 0 and N of these ids associated with it.
> >>
> >> My input is up to 100 of these ids at random, and I need to retrieve the
> >> most recent document for each id (N ids as input, N docs returned). I'm
> >> currently planning on doing a single query for each id, requesting 1 row,
> >> and caching the result. This could work OK since some of these ids should
> >> repeat quite often. Of course I would prefer to find a way to do this in
> >> Solr, but I'm not sure it's capable.
> >>
> >> Any ideas?
> >>
> >> Thanks,
> >> -Kallin Nagelberg
> >>
> >
> >
>
>
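
The "filter the results yourself" suggestion earlier in the thread could look roughly like the sketch below. It assumes the documents from one big OR query have already been fetched as a list of dicts; the function name `latest_doc_per_id` and the `timestamp` field (an ISO-8601 string, so plain string comparison orders it) are hypothetical, since the thread never names the date field.

```python
def latest_doc_per_id(docs, wanted_ids, id_field="listOfIds", date_field="timestamp"):
    """Return {id: most recent doc containing that id} for the requested ids.

    `docs` is a list of dicts as returned by a single OR query.
    `date_field` is an assumed field name; it must hold comparable values.
    """
    wanted = set(wanted_ids)
    best = {}
    for doc in docs:
        # A document may carry several ids, so it can be the winner for more than one.
        for doc_id in doc.get(id_field, []):
            if doc_id not in wanted:
                continue
            current = best.get(doc_id)
            if current is None or doc[date_field] > current[date_field]:
                best[doc_id] = doc
    return best


# Made-up sample data, for illustration only.
docs = [
    {"id": "a", "listOfIds": [1, 10], "timestamp": "2010-05-20T12:00:00Z"},
    {"id": "b", "listOfIds": [1], "timestamp": "2010-05-21T09:00:00Z"},
    {"id": "c", "listOfIds": [24, 99], "timestamp": "2010-05-19T08:00:00Z"},
]
result = latest_doc_per_id(docs, [1, 10, 24, 56])
missing = {1, 10, 24, 56} - result.keys()
```

Ids that end up in `missing` had no document in the fetched page, which is the limitation already raised in the thread: there is no way of knowing how many rows to request to cover every id, so those ids would still need a follow-up query.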
