lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: solr as nosql - pulling all docs vs deep paging limitations
Date Wed, 18 Dec 2013 01:41:00 GMT
They are for different use cases. Hoss's approach, I believe, focuses on
deep paging of ranked search results. SOLR-5244 focuses on the batch export
of an entire unranked search result in binary format. It's basically a very
efficient bulk extract for Solr.


On Tue, Dec 17, 2013 at 6:51 PM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:

> Joel - can you please elaborate a bit on how this compares with Hoss'
> approach?  Complementary?
>
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Tue, Dec 17, 2013 at 6:45 PM, Joel Bernstein <joelsolr@gmail.com>
> wrote:
>
> > SOLR-5244 is also working in this direction. This focuses on efficient
> > binary extract of entire search results.
> >
> >
> > On Tue, Dec 17, 2013 at 2:33 PM, Otis Gospodnetic <
> > otis.gospodnetic@gmail.com> wrote:
> >
> > > Hoss is working on it. Search for deep paging or cursor in JIRA.
> > >
> > > Otis
> > > Solr & ElasticSearch Support
> > > http://sematext.com/
> > > On Dec 17, 2013 12:30 PM, "Petersen, Robert" <
> > > robert.petersen@mail.rakuten.com> wrote:
> > >
> > > > Hi solr users,
> > > >
> > > > We have a new use case where we need to make a pile of data
> > > > available as XML to a client, and I was thinking we could easily
> > > > put all this data into a solr collection and the client could just
> > > > do a star search and page through all the results to obtain the
> > > > data we need to give them.  Then I remembered we currently don't
> > > > allow deep paging in our search indexes, as performance declines
> > > > the deeper you go.  Is this still the case?
> > > >
> > > > If so, is there another approach to make all the data in a
> > > > collection easily available for retrieval?  The only thing I can
> > > > think of is to query our DB for all the unique IDs of all the
> > > > documents in the collection and then pull the documents out in
> > > > small groups with successive queries like
> > > > 'UniqueIdField:(id1 OR id2 OR ... OR idn)'
> > > > 'UniqueIdField:(idn+1 OR idn+2 OR ... etc)', which doesn't seem
> > > > like a very good approach because the DB might have been updated
> > > > with new data which hasn't been indexed yet, and so all the ids
> > > > might not be in there (which may or may not matter, I suppose).
> > > >
> > > > Then I was thinking we could have a field with an incrementing
> > > > numeric value which could be used to perform range queries as a
> > > > substitute for paging through everything, i.e. queries like
> > > > 'IncrementalField:[1 TO 100]' 'IncrementalField:[101 TO 200]', but
> > > > this would be difficult to maintain as we update the index unless
> > > > we reindex the entire collection every time we update any docs at
> > > > all.
> > > >
> > > > Is this perhaps not a good use case for solr?  Should I use
> > > > something else, or is there another approach that would work here
> > > > to allow a client to pull groups of docs in a collection through
> > > > the rest api until the client has gotten them all?
> > > >
> > > > Thanks
> > > > Robi
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Joel Bernstein
> > Search Engineer at Heliosearch
> >
>



-- 
Joel Bernstein
Search Engineer at Heliosearch
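
For reference, the two client-side workarounds Robi describes in the quoted
message (batched ID lookups, and windows over an incrementing field) both come
down to generating a series of query strings. Below is a minimal Python sketch
of that idea. It assumes the IDs are plain tokens that need no query-syntax
escaping. The function names are made up for illustration, the field names are
the placeholders from the original message, and nothing here talks to Solr.

```python
from typing import Iterator, Sequence


def id_batch_queries(ids: Sequence[str], batch_size: int,
                     field: str = "UniqueIdField") -> Iterator[str]:
    """Yield one 'field:(a OR b OR ...)' query string per batch of IDs.

    Assumption: IDs contain no characters that need escaping in the
    query syntax; a real client would have to escape them.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        yield f"{field}:({' OR '.join(batch)})"


def range_window_queries(first: int, last: int, window: int,
                         field: str = "IncrementalField") -> Iterator[str]:
    """Yield inclusive 'field:[lo TO hi]' queries covering first..last."""
    if window < 1:
        raise ValueError("window must be at least 1")
    lo = first
    while lo <= last:
        # Clamp the final window so it does not run past `last`.
        hi = min(lo + window - 1, last)
        yield f"{field}:[{lo} TO {hi}]"
        lo = hi + 1


if __name__ == "__main__":
    for query in id_batch_queries(["id1", "id2", "id3"], batch_size=2):
        print(query)
    for query in range_window_queries(1, 250, window=100):
        print(query)
```

Neither generator addresses the drawbacks raised in the thread: the ID list
taken from the DB can drift from what is actually indexed, and the
incrementing field still has to be maintained as documents are updated.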
