lucene-solr-user mailing list archives

From Erick Erickson <>
Subject Re: Fail to huge collection extraction
Date Sun, 09 Sep 2012 19:10:17 GMT

I'll buy you a beer sometime, it's just sooo pleasant when someone
else has the same worldview I do....

Particularly look at the paragraph that has "the XY problem" in it.


On Sun, Sep 9, 2012 at 8:56 AM, Alexandre Rafalovitch
<> wrote:
> I am sorry, but your customer is extremely unlikely to want the whole
> result in his browser. It is just a red flag that they are converting
> their (business) requirements into your (IT) language and that's what
> they end up with.
> Go the other way: ask them to pretend you've done it already, and
> then to explain what happens once all those records are on their screen
> (and their operating system is no longer responsive :-) ). What is the
> business process that request is for? And how often do they want to do
> this (and what is the significance of that frequency)?
> Do they want a weekly audit copy to make sure nobody changed the
> records? Then, maybe they want a batch report emailed to them instead
> (or even just generated weekly on a shared drive). Do they want
> something they can access on their laptop while they are not connected
> to a network? Maybe they need a local replica of the (subset of the)
> app working from local index?
> Perhaps you have already asked that and this is just what they want.
> Then, I am afraid, you are just stuck fighting against the system
> designed for other use cases. Good luck.
> But if you haven't asked yet, do try! Do it often enough and you may
> get a pay rise out of it, because you will be meeting your clients on
> their territory instead of them having to come to yours.
> Regards,
>    Alex.
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
> On Sun, Sep 9, 2012 at 11:24 AM, neosky <> wrote:
>> Thanks Alex!
>> Yes, you hit my key points.
>> Actually, I have to implement both requirements.
>> The first one works very well, for the reason you state. I have a website
>> client that shows 20 records per page, and it is fast.
>> However, my customer also wants a servlet that downloads the whole query
>> result set (up to 1 million records possible).
>> For that, I tried to have Solr pull out 10,000 or 5,000 records per page
>> (divided into 100 or 200 queries) and then print those records to the
>> client browser.
>> I am not sure how the exception is generated.
>> Is my client program (the servlet) running out of memory, or is the
>> connection timing out for some reason?
>> The exception doesn't always happen. Sometimes it works fine even when I
>> query 10,000 records at a time, repeatedly; other times it crashes at
>> only 5,000 records without an obvious reason.
>> Your suggestion is great, but the implementation is a little complicated
>> for us.
>> Would Lucene be better than Solr for this requirement? Its paging doesn't
>> seem very intuitive, though.
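[The paged export neosky describes can be sketched roughly as below: compute the page offsets for Solr's start/rows parameters, fetch one page at a time, and write each page to the response before fetching the next, so memory stays bounded at one page. `PagedExport`, `fetchPage`, and the in-memory document list are hypothetical stand-ins for illustration, not real SolrJ calls; a real servlet would issue the query via SolrJ, as noted in the comments.]

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (with hypothetical names) of the paged-export loop: page through
// numFound results rows at a time, streaming each page out as it arrives.
public class PagedExport {

    // Start offsets for paging through numFound results, rows at a time,
    // e.g. 1,000,000 results at rows=10,000 gives 100 pages/queries.
    static List<Integer> pageOffsets(int numFound, int rows) {
        List<Integer> offsets = new ArrayList<>();
        for (int start = 0; start < numFound; start += rows) {
            offsets.add(start);
        }
        return offsets;
    }

    // Stand-in for one Solr page query. With SolrJ this would be roughly:
    //   client.query(new SolrQuery(q).setStart(start).setRows(rows))
    static List<String> fetchPage(List<String> all, int start, int rows) {
        int end = Math.min(start + rows, all.size());
        return all.subList(start, end);
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < 25; i++) all.add("doc" + i);

        int rows = 10;
        int total = 0;
        for (int start : pageOffsets(all.size(), rows)) {
            List<String> page = fetchPage(all, start, rows);
            // In a servlet, write this page to the response and flush it
            // here, instead of accumulating the whole result in memory.
            total += page.size();
        }
        System.out.println(total); // prints 25: each record seen once
    }
}
```

[One caveat worth noting: large `start` values make Solr itself do more work per query (it must collect and skip all preceding results), which is one reason deep paging over a million records can behave erratically even when small pages are fast.]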
