uima-user mailing list archives

From Jens Grivolla <j+...@grivolla.net>
Subject Re: CPE memory usage
Date Tue, 16 Aug 2016 11:33:58 GMT
Solr is known to be poor at deep paging; it is built to return the top
relevant results. Running a query that asks for the millionth document is
pretty much the worst you can do, as Solr has to rank all documents again,
up to the millionth, and then return just that one. Deep paging can also be
unreliable if your document collection changes between requests.

We did get it to work quite well, though. I believe we used only filters
and retrieved the results in natural order, so that Solr wouldn't have to
rank the documents. We also had a version where we first retrieved all
matching document ids in one go, and then queried for the documents by id,
one by one, in getNext().
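
A rough Java sketch of that second variant (fetch all matching ids once, then
load documents by id one at a time) could look like the following. It is not
the reader described above and does not call SolrJ: the class name
`IdFirstReader` and the two callbacks (`fetchAllIds`, `fetchById`) are
hypothetical stand-ins for the id-only query and the per-id lookup, and
`next()` plays the role that getNext() would play in a collection reader.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Supplier;

/** Sketch: take a snapshot of all matching ids, then load documents lazily by id. */
final class IdFirstReader implements Iterator<String> {
    // Snapshot of the matching ids, taken once at construction time.
    private final Iterator<String> ids;
    // Hypothetical per-id lookup; in practice this would be a query by id.
    private final Function<String, String> fetchById;

    IdFirstReader(Supplier<List<String>> fetchAllIds,
                  Function<String, String> fetchById) {
        // Copy the list so later changes to the collection cannot shift our position.
        this.ids = new ArrayList<>(fetchAllIds.get()).iterator();
        this.fetchById = fetchById;
    }

    @Override
    public boolean hasNext() {
        return ids.hasNext();
    }

    @Override
    public String next() {
        if (!ids.hasNext()) {
            throw new NoSuchElementException();
        }
        // Only one document is loaded per call, so memory use stays flat.
        return fetchById.apply(ids.next());
    }

    public static void main(String[] args) {
        // An in-memory map stands in for the index in this sketch.
        Map<String, String> index = new LinkedHashMap<>();
        index.put("doc1", "first text");
        index.put("doc2", "second text");

        IdFirstReader reader =
            new IdFirstReader(() -> new ArrayList<>(index.keySet()), index::get);
        while (reader.hasNext()) {
            System.out.println(reader.next());
        }
    }
}
```

Only the id list is held in memory for the whole run; each document is fetched
when it is needed, so no query ever has to page deep into a ranked result set.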

Deep paging has also seen some major improvements over time IIRC, so newer
Solr versions should perform much better than the ones from a few years ago.

Best,
Jens

On Tue, Aug 9, 2016 at 12:20 PM, <Armin.Wegner@bka.bund.de> wrote:

> Hi!
>
> In the end, it looks like Solr is causing the high memory consumption. The
> SolrClient isn't meant to be used the way I used it, but that isn't
> documented either. The Solr documentation is very bad. I only found a
> solution on the web by accident.
>
> Thanks,
> Armin
>
> -----Original Message-----
> From: Richard Eckart de Castilho [mailto:rec@apache.org]
> Sent: Monday, 8 August 2016 15:33
> To: user@uima.apache.org
> Subject: Re: CPE memory usage
>
> Do you have code for a minimal test case?
>
> Cheers,
>
> -- Richard
>
> > On 08.08.2016, at 15:31, <Armin.Wegner@bka.bund.de> <
> Armin.Wegner@bka.bund.de> wrote:
> >
> > Hi Richard!
> >
> > I've changed the document reader to a kind of no-op reader that always
> > sets the document text to an empty string: same behavior, but a much
> > slower increase in memory usage.
> >
> > Cheers,
> > Armin
>
>
