From: Anshum
Date: Wed, 27 Jul 2011 16:05:25 +0530
Subject: Re: Is There a Way To Split The Lucene Index Segments To Samller Size Less Than 1 GB
To: java-user@lucene.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=0016e65096e660545a04a90a9c19

--0016e65096e660545a04a90a9c19
Content-Type: text/plain; charset=ISO-8859-1

Hi Ravi,

You could reindex the data with a sharding mechanism, i.e. open 3 index
writers, read documents from the source, and add each document to the
appropriate index writer to create a sharded index.

In case you want to avoid re-indexing for whatever reason, you can create
3 copies of the index and delete documents from each copy based on some
criterion, leaving only the documents that belong to that shard. For example,
say the index contains docs with ids 1,2,3,4,5,6,7,8,9. You could create
3 copies and fire a delete on the first one to retain only docs 1, 2 and 3.
Do the same for the other 2 shards, and you are left with 3 indexes.

On the other hand, why do you want to split a 9G index? Is there a reason,
such as a performance issue? It'd be good if you could share the reason, as
the underlying problem could be completely different.

--
Anshum Gupta
http://ai-cafe.blogspot.com

2011/7/27 Gudi, Ravi Sankar

> Hi Lucene Team,
>
> If you know of any way of splitting Lucene index segments into smaller
> segments of less than 1 GB each, can you please let me know?
> Here are the segment sizes for one index. The total size of the index is
> 9.7 GB, and three Lucene files, a) _12r7.prx b) _kft.prx c) _ls6.prx, are
> larger than 1 GB.
> I want to split them into separate pieces to reduce their size.
>
> [root@sc-s1-172-1.oxford.com ~]# ls -lh /index/TP_0000000000000000499/
> total 9.7G
> -rw-r--r-- 1 appuser appuser  80M Jul 27 13:53 _12r7.fdt
> -rw-r--r-- 1 appuser appuser 1.4M Jul 27 13:53 _12r7.fdx
> -rw-r--r-- 1 appuser appuser  397 Jul 27 13:53 _12r7.fnm
> -rw-r--r-- 1 appuser appuser 649M Jul 27 13:58 _12r7.frq
> -rw-r--r-- 1 appuser appuser 3.9M Jul 27 13:58 _12r7.nrm
> -rw-r--r-- 1 appuser appuser 2.2G Jul 27 13:58 _12r7.prx
> -rw-r--r-- 1 appuser appuser   33 Jul 27 13:58 _12r7.stats
> -rw-r--r-- 1 appuser appuser 334K Jul 27 13:58 _12r7.tii
> -rw-r--r-- 1 appuser appuser  28M Jul 27 13:58 _12r7.tis
> -rw-r--r-- 1 appuser appuser  24K Jul 27 14:44 _12ts.fdt
> -rw-r--r-- 1 appuser appuser  400 Jul 27 14:44 _12ts.fdx
> -rw-r--r-- 1 appuser appuser  361 Jul 27 14:44 _12ts.fnm
> -rw-r--r-- 1 appuser appuser  90K Jul 27 14:44 _12ts.frq
> -rw-r--r-- 1 appuser appuser 1.1K Jul 27 14:44 _12ts.nrm
> -rw-r--r-- 1 appuser appuser 218K Jul 27 14:44 _12ts.prx
> -rw-r--r-- 1 appuser appuser   25 Jul 27 14:44 _12ts.stats
> -rw-r--r-- 1 appuser appuser 8.7K Jul 27 14:44 _12ts.tii
> -rw-r--r-- 1 appuser appuser 656K Jul 27 14:44 _12ts.tis
> -rw-r--r-- 1 appuser appuser 309K Jul 27 14:44 _12tt.fdt
> -rw-r--r-- 1 appuser appuser 5.1K Jul 27 14:44 _12tt.fdx
> -rw-r--r-- 1 appuser appuser  361 Jul 27 14:44 _12tt.fnm
> -rw-r--r-- 1 appuser appuser 1.9M Jul 27 14:44 _12tt.frq
> -rw-r--r-- 1 appuser appuser  14K Jul 27 14:44 _12tt.nrm
> -rw-r--r-- 1 appuser appuser 3.7M Jul 27 14:44 _12tt.prx
> -rw-r--r-- 1 appuser appuser   29 Jul 27 14:44 _12tt.stats
> -rw-r--r-- 1 appuser appuser  38K Jul 27 14:44 _12tt.tii
> -rw-r--r-- 1 appuser appuser 2.6M Jul 27 14:44 _12tt.tis
> -rw-r--r-- 1 appuser appuser  62M Jul 15 19:51 _kft.fdt
> -rw-r--r-- 1 appuser appuser 1.3M Jul 15 19:51 _kft.fdx
> -rw-r--r-- 1 appuser appuser  397 Jul 15 19:51 _kft.fnm
> -rw-r--r-- 1 appuser appuser 626M Jul 15 20:40 _kft.frq
> -rw-r--r-- 1 appuser appuser 3.5M Jul 15 20:40 _kft.nrm
> -rw-r--r-- 1 appuser appuser 2.6G Jul 15 20:40 _kft.prx
> -rw-r--r-- 1 appuser appuser   31 Jul 15 20:40 _kft.stats
> -rw-r--r-- 1 appuser appuser  20K Jul 19 23:01 _kft_sv.del
> -rw-r--r-- 1 appuser appuser 295K Jul 15 20:40 _kft.tii
> -rw-r--r-- 1 appuser appuser  25M Jul 15 20:40 _kft.tis
> -rw-r--r-- 1 appuser appuser 6.6K Jul 19 18:32 _ls6_aj.del
> -rw-r--r-- 1 appuser appuser  17M Jul 17 18:21 _ls6.fdt
> -rw-r--r-- 1 appuser appuser 418K Jul 17 18:21 _ls6.fdx
> -rw-r--r-- 1 appuser appuser  397 Jul 17 18:21 _ls6.fnm
> -rw-r--r-- 1 appuser appuser 556M Jul 17 19:13 _ls6.frq
> -rw-r--r-- 1 appuser appuser 1.2M Jul 17 19:13 _ls6.nrm
> -rw-r--r-- 1 appuser appuser 2.9G Jul 17 19:13 _ls6.prx
> -rw-r--r-- 1 appuser appuser   31 Jul 17 19:13 _ls6.stats
> -rw-r--r-- 1 appuser appuser 155K Jul 17 19:13 _ls6.tii
> -rw-r--r-- 1 appuser appuser  14M Jul 17 19:13 _ls6.tis
> -rw-r--r-- 1 appuser appuser   20 Jul 27 14:44 segments.gen
> -rw-r--r-- 1 appuser appuser  158 Jul 27 14:44 segments_pg5
> [root@sc-s1-172-1.oxford.com ~]#
>
> [root@sc-s1-172-1.oxford.com ~]# ls -lh /index/TP_0000000000000000499/ | grep G
> total 9.7G
> -rw-r--r-- 1 appuser appuser 2.2G Jul 27 13:58 _12r7.prx
> -rw-r--r-- 1 appuser appuser 2.6G Jul 15 20:40 _kft.prx
> -rw-r--r-- 1 appuser appuser 2.9G Jul 17 19:13 _ls6.prx
> [root@sc-s1-172-1.oxford.com ~]#
>
> Regards
> Ravi
>

--0016e65096e660545a04a90a9c19--
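
Both strategies in the reply (routing each document to one of three index writers, or copying the index three times and deleting everything outside one shard) come down to the same decision: which shard owns a given document id. The Python sketch below illustrates only that decision for the ids 1 to 9 example. It does not call Lucene, and the names `shard_for`, `partition`, and `ids_to_delete`, along with the contiguous equal-width range scheme, are assumptions made for illustration rather than anything prescribed in the thread.

```python
from typing import Dict, Iterable, List


def shard_for(doc_id: int, min_id: int, max_id: int, num_shards: int) -> int:
    """Return the 0-based shard that owns doc_id.

    Assumes ids are integers in [min_id, max_id] and that shards cover
    contiguous, equal-width id ranges (the last range may be shorter).
    """
    if not min_id <= doc_id <= max_id:
        raise ValueError(f"doc_id {doc_id} outside [{min_id}, {max_id}]")
    span = max_id - min_id + 1
    width = -(-span // num_shards)  # ceiling division
    return (doc_id - min_id) // width


def partition(doc_ids: Iterable[int], num_shards: int) -> Dict[int, List[int]]:
    """Reindexing route: group ids by the shard (writer) that would receive them."""
    ids = sorted(doc_ids)
    shards: Dict[int, List[int]] = {s: [] for s in range(num_shards)}
    for doc_id in ids:
        shards[shard_for(doc_id, ids[0], ids[-1], num_shards)].append(doc_id)
    return shards


def ids_to_delete(doc_ids: Iterable[int], shard: int, num_shards: int) -> List[int]:
    """Copy-and-delete route: ids to remove from the copy that becomes `shard`."""
    ids = sorted(doc_ids)
    return [d for d in ids
            if shard_for(d, ids[0], ids[-1], num_shards) != shard]


if __name__ == "__main__":
    all_ids = range(1, 10)  # the ids 1..9 from the example in the reply
    for shard, members in partition(all_ids, 3).items():
        print(f"shard {shard}: keep {members}, "
              f"delete {ids_to_delete(all_ids, shard, 3)}")
```

With the copy-and-delete route, keep in mind that Lucene records deletions as marks (the `.del` files in the listing above), so each copy generally shrinks on disk only once its segments have been merged.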