lucene-solr-user mailing list archives

From "Slomin, David" <david.slo...@here.com>
Subject retrieve ids of all indexed docs efficiently
Date Wed, 18 Jan 2017 20:44:22 GMT
Hi --

I'd like to retrieve the ids of all the docs in my Solr 5.3.1 index.  In my query, I've set
rows=1000, fl=id, and am using the cursorMark mechanism to split the overall traversal into
multiple requests.  Not because I care about the order, but because the documentation implies
that it's necessary to make cursorMark work reliably, I've also set sort=id asc.  While this
does give me the data I need on a smaller index, it sends heap memory utilization
through the roof; for our large indices, the Solr JVM throws an out-of-memory error,
even though we've already configured the heap as large as is practical given the
physical memory of the machine.

For what it's worth, we do use SolrCloud to split each of our indices into multiple shards.
However, for this query, I'm addressing a single shard directly (connecting to the correct
Solr server instance for one replica of that shard and setting distrib=false in my query)
rather than relying on Solr to route and assemble the results.
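
For concreteness, the traversal described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual client code: the function names (build_params, fetch_page, iter_ids) and the base_url argument are hypothetical, and q=*:* is an assumption, since the message does not say which query is used. The request parameters (rows, fl, sort, cursorMark, distrib) are the ones named in the message, and the loop stops when Solr returns a nextCursorMark equal to the cursorMark that was sent.

```python
import json
import urllib.parse
import urllib.request


def build_params(cursor_mark, rows=1000):
    """Query parameters for one page of the id-only traversal."""
    return {
        "q": "*:*",            # assumption: match every document
        "fl": "id",            # return only the id field
        "sort": "id asc",      # cursorMark requires a sort on the uniqueKey
        "rows": rows,
        "distrib": "false",    # query this core only, no shard fan-out
        "wt": "json",
        "cursorMark": cursor_mark,
    }


def fetch_page(base_url, params):
    """Send one /select request; base_url is a hypothetical core URL."""
    url = base_url + "/select?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_ids(fetch, rows=1000):
    """Yield every id, following nextCursorMark until it stops changing.

    `fetch` is any callable that takes a parameter dict and returns the
    decoded JSON response, e.g. lambda p: fetch_page(base_url, p).
    """
    cursor = "*"  # initial cursorMark value
    while True:
        data = fetch(build_params(cursor, rows))
        for doc in data["response"]["docs"]:
            yield doc["id"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:
            break  # same mark returned: traversal is complete
        cursor = next_cursor
```

Passing the fetch step in as a callable keeps the paging loop separate from the HTTP call, so the loop can be exercised without a running Solr instance.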
Thanks in advance,
Div Slomin.
