Date: Sun, 9 Sep 2012 12:10:17 -0700
Subject: Re: Fail to huge collection extraction
From: Erick Erickson
To: solr-user@lucene.apache.org

Alexandre:

I'll buy you a beer sometime; it's just sooo pleasant when someone else
has the same worldview I do....

http://searchhub.org/dev/2011/11/03/stop-being-so-agreeable/

neosky:

Look particularly at the paragraph that has "the XY problem" in it.

Best
Erick

On Sun, Sep 9, 2012 at 8:56 AM, Alexandre Rafalovitch wrote:
> I am sorry, but your customer is extremely unlikely to want the whole
> result in his browser. It is just a red flag that they are converting
> their (business) requirements into your (IT) language and that's what
> they end up with.
>
> Go the other way: ask them to pretend that you've done it already and
> then explain what happens once all those records are on their screen
> (and their operating system is no longer responsive :-) ). What is the
> business process that request is for? And how often do they want to do
> this (and what is the significance of that frequency)?
>
> Do they want a weekly audit copy to make sure nobody changed the
> records? Then maybe they want a batch report emailed to them instead
> (or even just generated weekly on a shared drive). Do they want
> something they can access on their laptop while they are not connected
> to a network? Maybe they need a local replica of (a subset of) the
> app working from a local index?
>
> Perhaps you have already asked that and this is just what they want.
> Then, I am afraid, you are just stuck fighting against a system
> designed for other use cases. Good luck.
>
> But if you haven't asked yet, do try! Do it often enough and you may
> get a pay rise out of it, because you will be meeting your clients on
> their territory instead of them having to come to yours.
>
> Regards,
>    Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
> book)
>
>
> On Sun, Sep 9, 2012 at 11:24 AM, neosky wrote:
>> Thanks Alex!
>> Yes, you hit my key points.
>> Actually, I have to implement both of the requirements.
>> The first one works very well, for the reason you state. Now I have a
>> website client that shows 20 records per page. It is fast.
>> However, my customer also wants to use a Servlet to download the whole
>> query result set (up to 1 million records).
>> So I tried having Solr pull out 10000 or 5000 records per page (i.e.
>> splitting the download into 100 or 200 queries), and then just printing
>> these records out to the client browser.
>> I am not sure what is generating the exception.
>> Is my client program (the Servlet program) running out of memory? Or is
>> the connection timing out for some reason?
>> This exception doesn't always happen. Sometimes it works well even when
>> I query 10000 records each time, and it keeps working for many runs,
>> but sometimes it crashes with only 5000 records for no apparent reason.
>> Your suggestion is great! But the implementation is a little complicated
>> for us.
>> Is Lucene better than Solr for this requirement? But paging in Lucene
>> does not seem very intuitive.
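neosky's export approach (fetching the result set in fixed-size pages of 5000 or 10000 rows, i.e. 100 or 200 requests for a million records) can be sketched as a loop that writes each page to the client as soon as it arrives instead of collecting every row first. This is a minimal sketch under stated assumptions, not neosky's actual code: `PageSource` and `fetchPage` are hypothetical stand-ins for one Solr request using the `start` and `rows` parameters, and the `Writer` stands in for the servlet's response writer.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

public class PagedExport {

    /** Hypothetical stand-in for one Solr request with start/rows parameters. */
    interface PageSource {
        /** Returns at most {@code rows} records beginning at offset {@code start}. */
        List<String> fetchPage(long start, int rows) throws IOException;
    }

    /** Number of requests needed to cover {@code total} records at {@code rows} per request. */
    static long pageCount(long total, int rows) {
        return (total + rows - 1) / rows; // ceiling division
    }

    /**
     * Streams every record to {@code out} one page at a time, so export() itself
     * only holds a single page in memory. Returns the number of records written.
     */
    static long export(PageSource source, long total, int rows, Writer out) throws IOException {
        long written = 0;
        for (long start = 0; start < total; start += rows) {
            List<String> page = source.fetchPage(start, rows);
            if (page.isEmpty()) {
                break; // the index returned fewer documents than expected
            }
            for (String record : page) {
                out.write(record);
                out.write('\n');
                written++;
            }
            out.flush(); // push this page to the client before fetching the next one
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical in-memory "index" of 12 records, fetched 5 at a time.
        List<String> fakeIndex = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            fakeIndex.add("doc-" + i);
        }
        PageSource fake = (start, rows) -> {
            int from = (int) Math.min(start, fakeIndex.size());
            int to = (int) Math.min(start + rows, fakeIndex.size());
            return fakeIndex.subList(from, to);
        };
        StringWriter out = new StringWriter();
        long n = export(fake, fakeIndex.size(), 5, out);
        System.out.println(n + " records in " + pageCount(fakeIndex.size(), 5) + " requests");
    }
}
```

Running `main` prints `12 records in 3 requests`. Writing and flushing each page bounds the servlet's own memory to one page, but with plain `start`/`rows` paging Solr still has to collect the top `start + rows` documents for every request, so the later requests in a million-row export cost more than the early ones.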