uima-user mailing list archives

From Jens Grivolla <j+...@grivolla.net>
Subject Re: CPE memory usage
Date Mon, 29 Aug 2016 16:31:44 GMT
Hi Armin, glad I could help. Getting all IDs first also avoids problems
with changing data, which could mess with the offsets. This way you have a
fixed snapshot of all the documents that existed at the beginning.
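To make the pattern concrete: below is a stdlib-only Java sketch of the "snapshot all IDs first, then fetch each document by id" approach. No real Solr is involved; a Map stands in for the index, and the fetch() stub marks where a SolrJ call such as HttpSolrClient.getById() would go (that wiring is not shown here).

```java
import java.util.*;

// Stdlib-only simulation of the pattern discussed above: take a fixed
// snapshot of all document ids up front, then fetch one document at a
// time. The Map stands in for the Solr index.
public class SnapshotFetch {
    static Map<String, String> index = new HashMap<>();

    // Stand-in for a per-id lookup (with SolrJ: HttpSolrClient.getById()).
    static String fetch(String id) {
        return index.get(id);
    }

    public static void main(String[] args) {
        index.put("doc1", "first");
        index.put("doc2", "second");

        // 1) Snapshot all existing ids once, at the beginning.
        List<String> snapshot = new ArrayList<>(index.keySet());
        Collections.sort(snapshot);

        // 2) Later changes to the collection do not affect the snapshot.
        index.put("doc3", "added later");

        // 3) Fetch each document by id, one by one (as in getNext()).
        for (String id : snapshot) {
            System.out.println(id + " -> " + fetch(id));
        }
    }
}
```

Because the iteration runs over the snapshot, "doc3" never appears in the output even though it was added to the index before fetching began.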

Best,
Jens

On Mon, Aug 29, 2016 at 8:12 AM, <Armin.Wegner@bka.bund.de> wrote:

> Hi Jens,
>
> I just want to confirm your information. As you said, the query gets
> slower the larger start is, even when using filters. The best solution is
> to get all ids first (which may take some time), and then to fetch each
> document by id successively. There is a request handler (get) and a Java
> API method (HttpSolrClient.getById()) to do so.
>
> Thanks to your help, I now have constantly fast queries.
>
> Cheers,
> Armin
>
> -----Original Message-----
> From: jens@grivolla.net [mailto:jens@grivolla.net] On behalf of Jens
> Grivolla
> Sent: Tuesday, 16 August 2016 13:34
> To: user@uima.apache.org
> Subject: Re: CPE memory usage
>
> Solr is known to be optimized for returning the top relevant results
> rather than for deep paging. Running a query that asks for the millionth
> document is pretty much the worst case: Solr has to rank all documents up
> to the millionth and return just that one. It can also be unreliable if
> your document collection changes.
>
> We did get it to work quite well, though. I believe we used only filters
> and retrieved the results in natural order, so that Solr wouldn't have to
> rank the documents. We also had a version where we first retrieved all
> matching document ids in one go, and then queried for the documents by id,
> one by one, in getNext().
>
> Deep paging has also seen some major improvements over time IIRC, so newer
> Solr versions should perform much better than the ones from a few years
> ago.
>
> Best,
> Jens
>
> On Tue, Aug 9, 2016 at 12:20 PM, <Armin.Wegner@bka.bund.de> wrote:
>
> > Hi!
> >
> > Finally, it looks like Solr causes the high memory consumption. The
> > SolrClient isn't meant to be used the way I used it, but that isn't
> > documented either. The Solr documentation is very poor; I only happened
> > to find a solution on the web by accident.
> >
> > Thanks,
> > Armin
> >
> > -----Original Message-----
> > From: Richard Eckart de Castilho [mailto:rec@apache.org]
> > Sent: Monday, 8 August 2016 15:33
> > To: user@uima.apache.org
> > Subject: Re: CPE memory usage
> >
> > Do you have code for a minimal test case?
> >
> > Cheers,
> >
> > -- Richard
> >
> > > On 08.08.2016, at 15:31, <Armin.Wegner@bka.bund.de> wrote:
> > >
> > > Hi Richard!
> > >
> > > I've changed the document reader to a kind of no-op reader that always
> > sets the document text to an empty string: same behavior, but a much
> > slower increase in memory usage.
> > >
> > > Cheers,
> > > Armin
> >
> >
>
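As a side note on the deep-paging improvements mentioned in the quoted message: newer Solr versions support cursorMark (introduced around Solr 4.7), which is essentially keyset pagination. Instead of an absolute start offset, each request resumes after the last sort key seen, so every page costs roughly the same regardless of depth. A stdlib-only Java sketch of that idea (no real Solr; the list stands in for an index sorted by a unique key, which cursorMark also requires as a sort tiebreaker):

```java
import java.util.*;

// Keyset pagination: resume after the last key seen instead of using an
// absolute offset, so deep pages are no more expensive than shallow ones.
public class CursorPaging {
    public static void main(String[] args) {
        // Stand-in for an index sorted by a unique key.
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 10; i++) ids.add(String.format("doc%02d", i));
        Collections.sort(ids);

        int rows = 3;          // page size
        String cursor = null;  // null plays the role of cursorMark=*

        while (true) {
            // Collect the next page: ids strictly after the cursor.
            List<String> page = new ArrayList<>();
            for (String id : ids) {
                if ((cursor == null || id.compareTo(cursor) > 0)
                        && page.size() < rows) {
                    page.add(id);
                }
            }
            if (page.isEmpty()) break;  // no new results: done
            System.out.println(page);
            cursor = page.get(page.size() - 1);  // next cursor = last key seen
        }
    }
}
```

Each iteration only filters by "key greater than cursor", which an index can serve efficiently; this is why cursorMark scales where start=1000000 does not.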
