lucene-solr-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: solr as nosql - pulling all docs vs deep paging limitations
Date Tue, 17 Dec 2013 23:51:22 GMT
Joel - can you please elaborate a bit on how this compares with Hoss'
approach?  Complementary?

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Dec 17, 2013 at 6:45 PM, Joel Bernstein <joelsolr@gmail.com> wrote:

> Work on SOLR-5244 is also heading in this direction. It focuses on
> efficient binary extraction of entire result sets.
>
>
> On Tue, Dec 17, 2013 at 2:33 PM, Otis Gospodnetic <
> otis.gospodnetic@gmail.com> wrote:
>
> > Hoss is working on it. Search for deep paging or cursor in JIRA.
> >
> > Otis
> > Solr & ElasticSearch Support
> > http://sematext.com/
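The cursor approach being tracked in JIRA replaces a numeric offset with the sort key of the last document already returned, so every page is a cheap "greater than" filter instead of a skip over N leading hits. A minimal Python sketch of the pattern, using a plain list as a stand-in for an index sorted on a unique field (the function and field names here are illustrative, not Solr APIs):

```python
# Cursor-style deep paging: instead of start=N, each request asks for
# documents whose sort key is greater than the last one returned, so
# nothing ever has to be collected just to be skipped.
# The in-memory "index" is a stand-in for a Solr collection sorted on
# a unique field; all names here are illustrative only.

docs = [{"id": i} for i in range(1003)]  # pretend collection, sorted by id

def fetch_page(after_id, rows):
    """Return up to `rows` docs with id > after_id, plus the new cursor."""
    page = [d for d in docs if d["id"] > after_id][:rows]
    next_cursor = page[-1]["id"] if page else after_id
    return page, next_cursor

def fetch_all(rows=100):
    cursor, out = -1, []
    while True:
        page, cursor = fetch_page(cursor, rows)
        if not page:  # cursor unchanged and page empty -> done
            break
        out.extend(page)
    return out

print(len(fetch_all()))  # 1003: every doc, exactly once
```

The loop terminates when a request comes back empty, and each document is returned exactly once no matter how many pages deep the scan goes.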
> > On Dec 17, 2013 12:30 PM, "Petersen, Robert" <
> > robert.petersen@mail.rakuten.com> wrote:
> >
> > > Hi solr users,
> > >
> > > We have a new use case where we need to make a pile of data available
> > > as XML to a client, and I was thinking we could easily put all this
> > > data into a solr collection so the client could just do a star search
> > > and page through all the results to obtain the data we need to give
> > > them.  Then I remembered we currently don't allow deep paging in our
> > > search indexes because performance declines the deeper you go.  Is
> > > this still the case?
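The decline is easy to see with a back-of-envelope cost model: to serve start=N&rows=M, a searcher has to collect and rank the top N+M documents and then discard the first N, so paging all the way through a collection does quadratic total work, while a cursor-style scan stays linear. A rough sketch (the cost model is deliberately simplified):

```python
# Simplified cost model for offset paging: serving start=N&rows=M means
# collecting the top N+M documents, then throwing away the first N.
# Summed over a full scan, the work grows quadratically with corpus size.

def offset_scan_cost(num_docs, rows):
    return sum(start + rows for start in range(0, num_docs, rows))

def cursor_scan_cost(num_docs, rows):
    # A cursor-style scan only ever collects `rows` docs per request.
    return sum(rows for _ in range(0, num_docs, rows))

print(offset_scan_cost(1_000_000, 100))  # 5,000,500,000 docs collected
print(cursor_scan_cost(1_000_000, 100))  # 1,000,000 docs collected
```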
> > >
> > > If so, is there another approach to make all the data in a collection
> > > easily available for retrieval?  The only thing I can think of is to
> > > query our DB for all the unique IDs of the documents in the collection
> > > and then pull the documents out in small groups with successive
> > > queries like 'UniqueIdField:(id1 OR id2 OR ... OR idn)' and
> > > 'UniqueIdField:(idn+1 OR idn+2 OR ... etc)'.  That doesn't seem like
> > > a very good approach, though, because the DB might have been updated
> > > with new data that hasn't been indexed yet, so not all the ids would
> > > be in the index (which may or may not matter, I suppose).
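That batching scheme is mechanical enough to sketch: chunk the ID list and emit one boolean query per chunk. 'UniqueIdField' and the chunk size are placeholders, and as noted the results are only as fresh as the ID snapshot:

```python
# Sketch of the ID-batching approach: split the full ID list into
# fixed-size chunks and build one boolean OR query per chunk.
# "UniqueIdField" and the chunk size are placeholders.

def batch_queries(ids, field="UniqueIdField", size=50):
    return [
        f"{field}:({' OR '.join(str(i) for i in ids[n:n + size])})"
        for n in range(0, len(ids), size)
    ]

queries = batch_queries(list(range(1, 121)), size=50)
print(len(queries))  # 3 queries: ids 1-50, 51-100, 101-120
```

Each query is then issued as a normal request, with rows set at least as large as the chunk size.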
> > >
> > > Then I was thinking we could have a field with an incrementing
> > > numeric value which could be used to perform range queries as a
> > > substitute for paging through everything, i.e. queries like
> > > 'IncrementalField:[1 TO 100]' and 'IncrementalField:[101 TO 200]'.
> > > But this would be difficult to maintain as we update the index,
> > > unless we reindex the entire collection every time we update any
> > > docs at all.
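The range-window variant looks much the same and needs only the current maximum counter value rather than a full ID list; the cost, as said, is keeping the counter assigned across updates. Another placeholder sketch ('IncrementalField' is not a real field):

```python
# Range-window scan over an incrementing numeric field. Only the upper
# bound is needed up front; sparse windows just return short (or empty)
# pages, which wastes a request but never skips a document.
# "IncrementalField" is a placeholder field name.

def range_queries(max_value, window=100):
    return [
        f"IncrementalField:[{lo} TO {min(lo + window - 1, max_value)}]"
        for lo in range(1, max_value + 1, window)
    ]

for q in range_queries(250):
    print(q)
# IncrementalField:[1 TO 100]
# IncrementalField:[101 TO 200]
# IncrementalField:[201 TO 250]
```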
> > >
> > > Is this perhaps not a good use case for solr?  Should I use
> > > something else, or is there another approach that would work here to
> > > allow a client to pull groups of docs in a collection through the
> > > REST API until the client has gotten them all?
> > >
> > > Thanks
> > > Robi
> > >
> > >
> >
>
>
>
> --
> Joel Bernstein
> Search Engineer at Heliosearch
>
