lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Fail to huge collection extraction
Date Sun, 09 Sep 2012 19:10:17 GMT
Alexandre:

I'll buy you a beer sometime; it's just sooo pleasant when someone
else has the same worldview I do....

http://searchhub.org/dev/2011/11/03/stop-being-so-agreeable/

neosky:
In particular, look at the paragraph that mentions "the XY problem".

Best
Erick

On Sun, Sep 9, 2012 at 8:56 AM, Alexandre Rafalovitch
<arafalov@gmail.com> wrote:
> I am sorry, but your customer is extremely unlikely to want the whole
> result set in their browser. It is a red flag that they are translating
> their (business) requirements into your (IT) language, and this request
> is what they end up with.
>
> Go the other way: ask them to pretend that you've done it already and
> then explain what happens once all those records are on their screen
> (and their operating system is no longer responsive :-) ). What is the
> business process that this request is for? How often do they want to
> do this (and what is the significance of that frequency)?
>
> Do they want a weekly audit copy to make sure nobody changed the
> records? Then maybe they want a batch report emailed to them instead
> (or even just generated weekly on a shared drive). Do they want
> something they can access on their laptop while they are not connected
> to a network? Maybe they need a local replica of (a subset of) the
> app working from a local index?
>
> Perhaps you have already asked that and this is just what they want.
> Then, I am afraid, you are stuck fighting against a system designed
> for other use cases. Good luck.
>
> But if you haven't asked yet, do try! Do it often enough and you may
> get a pay raise out of it, because you will be meeting your clients on
> their territory instead of making them come to yours.
>
> Regards,
>    Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Sun, Sep 9, 2012 at 11:24 AM, neosky <neosky11@yahoo.com> wrote:
>> Thanks Alex!
>> Yes, you hit my key points.
>> Actually, I have to implement both of the requirements.
>> The first one works very well, for the reason you state. Now I have a
>> website client that shows 20 records per page. It is fast.
>> However, my customer also wants to use a servlet to download the whole
>> query result set (up to 1 million records).
>> So I tried to have Solr pull out 10000 or 5000 records per page
>> (divided into 100 or 200 queries), then just print those records out
>> to the client browser.
>> I am not sure how the exception was generated. Is my client program
>> (the servlet) running out of memory, or is the connection timing out
>> for some reason?
>> This exception doesn't always happen. Sometimes it works well even
>> when I query 10000 records at a time, many times in a row, but
>> sometimes it crashes at only 5000 records for no obvious reason.
>> Your suggestion is great, but the implementation is a little
>> complicated for us.
>> Is Lucene better than Solr for this requirement? The paging in Lucene
>> does not seem very intuitive.
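
A minimal SolrJ sketch of the page-and-stream approach described above,
assuming the servlet writes one field per record per line; the Solr URL,
the "id" field, the page size, and the timeout values are illustrative
assumptions, not details taken from the thread:

    import java.io.PrintWriter;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class ExportSketch {
        private static final int PAGE_SIZE = 5000; // 200 queries for ~1M records

        // Streams matching records to the servlet's writer one page at a
        // time, so the client program never holds the full result set in
        // memory at once.
        public static void export(String queryString, PrintWriter out)
                throws Exception {
            HttpSolrServer solr =
                new HttpSolrServer("http://localhost:8983/solr/collection1");
            solr.setConnectionTimeout(10000); // ms; fail fast if Solr is down
            solr.setSoTimeout(120000);        // ms; large pages can be slow

            long start = 0;
            long numFound = Long.MAX_VALUE;
            while (start < numFound) {
                SolrQuery q = new SolrQuery(queryString);
                q.setStart((int) start);
                q.setRows(PAGE_SIZE);
                QueryResponse rsp = solr.query(q);
                numFound = rsp.getResults().getNumFound();
                if (rsp.getResults().isEmpty()) {
                    break; // defensive: avoid looping if Solr returns no rows
                }
                for (SolrDocument doc : rsp.getResults()) {
                    out.println(doc.getFieldValue("id")); // placeholder field
                }
                out.flush(); // push this page before fetching the next one
                start += rsp.getResults().size();
            }
        }
    }

One caveat that fits the intermittent failures described above: with
start/rows paging, Solr must collect and skip all preceding rows on every
request, so pages get progressively more expensive the deeper the start
offset goes, and later pages can time out even when early ones are fast.
Setting explicit connection and socket timeouts on the client at least
makes it possible to tell a timeout apart from an out-of-memory error in
the servlet.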
