lucene-solr-user mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: multiple dateranges/timeslots per doc: modeling openinghours.
Date Mon, 03 Oct 2011 12:52:29 GMT
On Mon, Oct 3, 2011 at 3:09 PM, Geert-Jan Brits <gbrits@gmail.com> wrote:

> Interesting! Reading your previous blog posts, I gather that the
> to-be-posted 'implementation approaches' include a way of making the
> SpanQueries available within Solr?
>

It's going to be posted in two days. But please don't expect much from it;
it's just a proof of concept, not code for production or for contribution.
E.g. we chose a 'quick hack' way of converting boolean queries instead of
XmlQuery, SurroundParser, contrib's query parser, etc. I.e. we can share
only the core ideas, and some of these are possibly wrong.


> Also, with your approach, would (numeric) RangeQueries be possible as
> Hoss suggests?
>

Basically, range queries on numbers are just disjunctions over the indexed
terms (which sometimes doesn't perform well at all). If you encode your terms
in a sortable manner, e.g. A0715 for Monday 7:15 am, you'll be able to build
the span-merging 'disjunction': new SpanOrQuery(new SpanTermQuery(..), ...).
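
A minimal sketch of the encoding idea, in plain Python rather than Lucene. The helper names (`encode`, `expand_range`) and the day letters beyond `A` for Monday are assumptions made for illustration. It only shows how sortable terms let a range be expanded into the list of terms that would each become one SpanTermQuery inside a SpanOrQuery.

```python
from bisect import bisect_left, bisect_right

DAYS = "ABCDEFG"  # assumption: A = Monday ... G = Sunday


def encode(day_index, hour, minute):
    """Encode a weekday (0 = Monday) and a 24h time as a sortable term."""
    return f"{DAYS[day_index]}{hour:02d}{minute:02d}"


def expand_range(indexed_terms, low, high):
    """Return the indexed terms within [low, high], in sorted order.

    Each returned term stands for one SpanTermQuery clause of the
    enclosing SpanOrQuery (hypothetical mapping, not Lucene code).
    """
    terms = sorted(set(indexed_terms))
    return terms[bisect_left(terms, low):bisect_right(terms, high)]


# Terms as they might appear in the index for one field.
terms = [encode(0, 7, 15), encode(0, 12, 30), encode(1, 9, 0), encode(2, 15, 30)]

# All Monday terms: everything between A0000 and A2359.
clauses = expand_range(terms, encode(0, 0, 0), encode(0, 23, 59))
```

Because the terms are zero-padded, plain string order matches day and time order, which is what makes the range expansion possible.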

Regards

Mikhail


> Looking forward to that 'implementation post'
> Cheers,
> Geert-Jan
>
> On 1 October 2011 19:57, Mikhail Khludnev <mkhludnev@griddynamics.com>
> wrote:
>
> > I agree about SpanQueries. It's a viable measure against "false-positive
> > matches on multivalue fields".
> > We implemented this approach some time ago. Please find details at
> >
> > http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
> >
> > and
> >
> > http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html
> >
> > We are going to publish a third post about implementation approaches.
> >
> > --
> > Mikhail Khludnev
> >
> >
> > On Sat, Oct 1, 2011 at 6:25 AM, Chris Hostetter <hossman_lucene@fucit.org>
> > wrote:
> >
> > >
> > > > : Another, faulty, option would be to model opening/closing hours in 2
> > > > : multivalued date-fields, i.e: open, close. and insert open/close for each
> > > > : day, e.g:
> > > :
> > > : open: 2011-11-08:1800 - close: 2011-11-09:0300
> > > : open: 2011-11-09:1700 - close: 2011-11-10:0500
> > > : open: 2011-11-10:1700 - close: 2011-11-11:0300
> > > :
> > > : And queries would be of the form:
> > > :
> > > : 'open < now && close > now+3h'
> > > :
> > > > : But since there is no way to indicate that 'open' and 'close' are pairwise
> > > > : related I will get a lot of false positives, e.g the above document would be
> > > > : returned for:
> > >
> > > > This isn't possible out of the box, but the general idea of "position
> > > > linked" queries is possible using the same approach as the
> > > > FieldMaskingSpanQuery...
> > > >
> > > > https://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html
> > > > https://issues.apache.org/jira/browse/LUCENE-1494
> > > >
> > > > ...implementing something like this that would work with
> > > > (Numeric)RangeQueries however would require some additional work, but it
> > > > should certainly be doable -- i've suggested this before but no one has
> > > > taken me up on it...
> > > > http://markmail.org/search/?q=hoss+FieldMaskingSpanQuery
> > >
> > > > If we take it as a given that you can do multiple ranges "at the same
> > > > position", then you can imagine supporting all of your "regular" hours
> > > > using just two fields ("open" and "close") by encoding the day+time of
> > > > each range of open hours into them -- even if a store is open for multiple
> > > > sets of ranges per day (ie: closed for siesta)...
> > >
> > > >  open: mon_12_30, tue_12_30, wed_07_30, wed_15_30, ...
> > >  close: mon_20_00, tue_20_30, wed_12_30, wed_22_30, ...
> > >
> > > then asking for "stores open now and for the next 3 hours" on "wed" at
> > > "2:13PM" becomes a query for...
> > >
> > > sameposition(open:[* TO wed_14_13], close:[wed_17_13 TO *])
> > >
> > > > For the special case part of your problem when there are certain dates
> > > > that a store will be open atypical hours, i *think* that could be solved
> > > > using some special docs and the new "join" QParser in a filter query...
> > > >
> > > >        https://wiki.apache.org/solr/Join
> > > >
> > > > imagine you have your "regular" docs with all the normal data about a
> > > > store, and the open/close fields i describe above.  but in addition to
> > > > those, for any store that you know is "closed on dec 25" or "only open
> > > > 12:00-15:00 on Jan 01" you add an additional small doc encapsulating
> > > > the information about the store's closures on that special date - so that
> > > > each special case would be its own doc, even if one store had 5 days
> > > > where there was a special case...
> > >
> > >  specialdoc1:
> > >    store_id: 42
> > >    special_date: Dec-25
> > >    status: closed
> > >  specialdoc2:
> > >    store_id: 42
> > >    special_date: Jan-01
> > >    status: irregular
> > >    open: 09_30
> > >    close: 13_00
> > >
> > > > then when you are executing your query, you use an "fq" to constrain to
> > > > stores that are (normally) open right now (like i mentioned above) and you
> > > > use another fq to find all docs *except* those resulting from a join
> > > > against these special case docs based on the current date.
> > > >
> > > > so if your query is "open now and for the next 3 hours" and "now" ==
> > > > "sunday, 2011-12-25 @ 10:17AM" your query would be something like...
> > >
> > > q=...user input...
> > > time=sameposition(open:[* TO sun_10_17], close:[sun_13_17 TO *])
> > > > fq={!v=$time}
> > > > fq={!join from=store_id to=unique_key v=$vv}
> > > > vv=-(+special_date:Dec-25 +(status:closed OR _query_:"{!v=$time}"))
> > >
> > > > That join based approach for dealing with the special dates should work
> > > > regardless of whether someone implements a way to do pairwise
> > > > "sameposition()" rangequeries ... so if you can live w/o the multiple
> > > > open/close pairs per day, you can just use the "one field per day of the
> > > > week" type approach you mentioned combined with the "join" for special
> > > > case days of the year and everything you need should already work w/o any
> > > > code (on trunk).
> > >
> > > > (disclaimer: obviously i haven't tested that query, the exact syntax may
> > > > be off but the principle for modeling the "special docs" and using
> > > > them in a join should work)
> > >
> > >
> > > -Hoss
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail (Mike) Khludnev
> > Developer
> > Grid Dynamics
> > tel. 1-415-738-8644
> > Skype: mkhludnev
> > <http://www.griddynamics.com>
> >  <mkhludnev@griddynamics.com>
> >
>



-- 
Sincerely yours
Mikhail (Mike) Khludnev
Developer
Grid Dynamics
tel. 1-415-738-8644
Skype: mkhludnev
<http://www.griddynamics.com>
 <mkhludnev@griddynamics.com>
