Date: Fri, 14 May 2004 15:19:45 +0100 (BST)
From: jt oob
To: Lucene-Users-List
Subject: (Distributed) Search system designs

Hi,

I currently have a working search system based on Lucene 1.2, set up as follows:

14 indexes; average size just over 1G, minimum 36M, maximum 3.3G, 15G in total.

Search times are currently between 20s and 4 minutes depending on the query. The system uses a MultiSearcher to search all the indexes, which are all stored on an internal RAID.

There are lots of things wrong with the indexes, including many words that should be in stop lists but aren't.

The search runs on a Linux system with 8G of RAM and 2G of swap.

- - - -

I am looking at writing a replacement system, and this time trying to do everything properly, including writing document parsers. Any pointers would be well received!

The questions:

1) The documentation on getting a basic Lucene search going is great. Is there any similar documentation, or a HOWTO, on designing and implementing distributed searches?

2) For distributed searches, what are the best options for building in redundancy? Is a large shared storage solution such as a SAN required, or will duplicating the indexes on several machines suffice?

3) I had been told that using RAMDirectory on a Linux system was pointless, because the kernel caches files in spare RAM anyway. Is this true?

Thanks!
jt
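The MultiSearcher setup described above turns 14 separate indexes into one ranked result list. A minimal sketch of that merge step in modern Java follows, as a simplified model only: it is not Lucene's code, the class `MergeHits` and its `Hit`/`merge` names are hypothetical, and it assumes each index has already returned its own best hits with scores that are directly comparable across indexes. Local document numbers are shifted by the sizes of the preceding indexes so they stay unique, and a min-heap keeps only the overall top n.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical model of merging hits from several indexes; not Lucene's implementation. */
class MergeHits {

    /** One hit: a document number and its relevance score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * perIndex.get(i) holds hits whose doc numbers are local to index i.
     * maxDocs[i] is the document count of index i, used to offset local
     * numbers into a single global numbering. Returns the global top n,
     * highest score first.
     */
    static List<Hit> merge(List<List<Hit>> perIndex, int[] maxDocs, int n) {
        List<Hit> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap on score: the weakest of the current top n is at the head.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        int start = 0; // global number of the first document in index i
        for (int i = 0; i < perIndex.size(); i++) {
            for (Hit local : perIndex.get(i)) {
                Hit global = new Hit(start + local.doc, local.score);
                if (heap.size() < n) {
                    heap.add(global);
                } else if (global.score > heap.peek().score) {
                    heap.poll(); // drop the current weakest hit
                    heap.add(global);
                }
            }
            start += maxDocs[i];
        }
        result.addAll(heap);
        result.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return result;
    }
}
```

For example, with index 0 holding 3 documents and index 1 holding 5, a hit on local document 4 of index 1 becomes global document 7. The assumption of comparable scores matters here: if term statistics differ a lot between indexes, raw per-index scores may not rank consistently when merged.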