Date: Tue, 18 Dec 2007 11:49:12 -0500
From: "v k"
To: java-user@lucene.apache.org
Subject: Re: Infrastructure Question

Sorry about that. For some reason, my post did not show up in the
mailing list and I still cannot see it (maybe a settings issue). I
don't mean to barrage the mailing list with the same question.

Thanks for the advice.

On Dec 18, 2007 11:43 AM, Grant Ingersoll wrote:
> Hi Venkat,
>
> There is no need to post your question multiple times or cross-post.
> People are distributed all around the world on this list and aren't
> always available or able to answer your question. Having to wait
> 11 hours for an answer on a free mailing list is not at all
> unreasonable.
>
> If you are just looking to get your hands dirty with Lucene, why not
> just start w/ a subset on a machine you already own and work to scale
> up? This way, you could start with what you have available and get a
> feel for your memory usage, etc. Then you will be in a better
> position to decide what your needs are.
>
> If there is one thing that is true about search it is the fact that
> everyone's situation is different.
>
> Cheers,
> Grant
>
>
> On Dec 18, 2007, at 11:21 AM, v k wrote:
>
> > Hello,
> >
> > I am using Lucene to build an index from roughly 10 million
> > documents. The documents are about 4 TB in total.
> >
> > After some trial runs indexing a subset of the documents, I am trying
> > to figure out a hosting service configuration to create a full index
> > from the entire 10 TB of data. As I am still unsure how this project
> > will turn out, I am not purchasing hardware/RAM but considering a web
> > host for the purpose of:
> > 1) Downloading the data and starting to index it.
> > 2) Hosting the web front end to access this index, which will be a
> >    Python framework (e.g. Django).
> >
> > I am seriously contemplating signing up with Joyent for this plan:
> >   AMD Opteron x64 multi-core servers with 4GiB RAM per core
> >   1/16 (Burstable up to 95%)
> >   1 TB bandwidth/month, 1 GB RAM, plus as much NAS storage as I can
> >   afford to pay for.
> >
> > My QUESTION is: Will this RAM and CPU be sufficient during
> > development of the search application and building the index, etc., or
> > is it so abysmal and under-equipped in terms of hardware that the
> > development version of my application will not work?
> > I understand that having more RAM is always good, but is 1GB as good
> > as nothing?
> >
> > This setup is NOT for production but for development, so I can get
> > my hands dirty with Lucene, which will require plenty of tweaks as the
> > project moves along.
> >
> > What initial configuration would you recommend for a development
> > version given the corpus size? I am not even sure how large my index
> > will be at this point.
> >
> > I hope to build my indexes this way, and once the search
> > infrastructure is working and the web front end complete, I plan to
> > worry about redundancy, availability and scalability for the many
> > users I hope to provide this free service for :-)
> >
> > Many of you in this forum have built successful products with Lucene.
> > To name a few I am aware of: Ken Krugle, James Ryley, Dennis Kubes
> >
> > Some of you must have started with small machines, test set-ups, etc.
> > where you built your initial search apps. I hope to receive some
> > advice about my plan and approach to start building an infrastructure
> > to support my Lucene app.
> >
> > Thank you.
> >
> > Venkat
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
>
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ