lucene-java-user mailing list archives

From Anshum <ansh...@gmail.com>
Subject Re: Is There a Way To Split The Lucene Index Segments To Smaller Size Less Than 1 GB
Date Wed, 27 Jul 2011 10:35:25 GMT
Hi Ravi,
You could reindex the data with a sharding mechanism, i.e. open 3 index
writers, read documents from the source, and add each one to the appropriate
IndexWriter to create a sharded index.
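
A minimal sketch of the routing step, in plain Java with no Lucene
dependency. The modulo rule and the names ShardRouter, shardFor and route
are assumptions made for illustration; any stable rule would do. In a real
reindex, each bucket below would be its own IndexWriter and each id would be
a full document.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardRouter {
    /** Hypothetical routing rule: pick a shard from a numeric document id. */
    public static int shardFor(long docId, int numShards) {
        // floorMod keeps the result in [0, numShards) even for negative ids.
        return (int) Math.floorMod(docId, (long) numShards);
    }

    /** Groups ids per shard; each inner list stands in for one IndexWriter. */
    public static List<List<Long>> route(long[] docIds, int numShards) {
        List<List<Long>> shards = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            shards.add(new ArrayList<>());
        }
        for (long id : docIds) {
            shards.get(shardFor(id, numShards)).add(id);
        }
        return shards;
    }

    public static void main(String[] args) {
        long[] ids = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        List<List<Long>> shards = route(ids, 3);
        for (int i = 0; i < shards.size(); i++) {
            System.out.println("shard " + i + " -> " + shards.get(i));
        }
    }
}
```

The same rule has to be applied at search time (or all shards searched) so
that a document can be found in the shard it was written to.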
In case you want to avoid re-indexing for whatever reason, you could
create 3 copies of the index and delete documents from each copy based on
some criterion, leaving only the documents for that shard, e.g.
Index contains docs with ids 1,2,3,4,5,6,7,8,9
You could create 3 copies and fire a delete on the first one to retain only
docs 1, 2 and 3.
Do the same for the other 2 shards, leaving you with 3 indexes.
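
The copy-and-delete idea can be sketched as a deletion plan per copy. This is
plain Java, not Lucene code; the class ShardDeletePlan, the method
idsToDelete and the assumption of contiguous integer ids are all hypothetical
and chosen to match the 1..9 example. Against a real index the deletes would
be issued through IndexWriter's deleteDocuments on whatever field holds your
id, and the copies only shrink on disk once the affected segments are merged.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardDeletePlan {
    /**
     * Ids that copy number `shard` (0-based) must delete so that it retains
     * only its own contiguous slice of [minId, maxId].
     */
    public static List<Integer> idsToDelete(int shard, int numShards, int minId, int maxId) {
        int total = maxId - minId + 1;
        int perShard = (total + numShards - 1) / numShards; // ceiling division
        int keepFrom = minId + shard * perShard;
        int keepTo = Math.min(maxId, keepFrom + perShard - 1);
        List<Integer> delete = new ArrayList<>();
        for (int id = minId; id <= maxId; id++) {
            if (id < keepFrom || id > keepTo) {
                delete.add(id);
            }
        }
        return delete;
    }

    public static void main(String[] args) {
        // Three copies of an index holding ids 1..9, as in the example above.
        for (int shard = 0; shard < 3; shard++) {
            System.out.println("copy " + shard + " deletes " + idsToDelete(shard, 3, 1, 9));
        }
    }
}
```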

On the other hand, why do you want to split a 9G index? Is there a reason,
such as a performance issue? It'd be good if you could share it, as the
underlying problem could be completely different.

--
Anshum Gupta
http://ai-cafe.blogspot.com


2011/7/27 Gudi, Ravi Sankar <ravisankarg.ravisankarg@hp.com>

> Hi Lucene Team,
>
> If there is any way of splitting Lucene index segments into smaller
> segments of less than 1 GB each, could you please let me know?
> Below are the segment sizes for one index. Its total size is 9.7 GB, and
> three Lucene files, a) _12r7.prx b) _kft.prx c) _ls6.prx, are each larger
> than 1 GB.
> I want to split them into smaller pieces to reduce their size.
>
> [root@sc-s1-172-1.oxford.com ~]# ls -lh /index/TP_0000000000000000499/
> total 9.7G
> -rw-r--r-- 1 appuser appuser  80M Jul 27 13:53 _12r7.fdt
> -rw-r--r-- 1 appuser appuser 1.4M Jul 27 13:53 _12r7.fdx
> -rw-r--r-- 1 appuser appuser  397 Jul 27 13:53 _12r7.fnm
> -rw-r--r-- 1 appuser appuser 649M Jul 27 13:58 _12r7.frq
> -rw-r--r-- 1 appuser appuser 3.9M Jul 27 13:58 _12r7.nrm
> -rw-r--r-- 1 appuser appuser 2.2G Jul 27 13:58 _12r7.prx
> -rw-r--r-- 1 appuser appuser   33 Jul 27 13:58 _12r7.stats
> -rw-r--r-- 1 appuser appuser 334K Jul 27 13:58 _12r7.tii
> -rw-r--r-- 1 appuser appuser  28M Jul 27 13:58 _12r7.tis
> -rw-r--r-- 1 appuser appuser  24K Jul 27 14:44 _12ts.fdt
> -rw-r--r-- 1 appuser appuser  400 Jul 27 14:44 _12ts.fdx
> -rw-r--r-- 1 appuser appuser  361 Jul 27 14:44 _12ts.fnm
> -rw-r--r-- 1 appuser appuser  90K Jul 27 14:44 _12ts.frq
> -rw-r--r-- 1 appuser appuser 1.1K Jul 27 14:44 _12ts.nrm
> -rw-r--r-- 1 appuser appuser 218K Jul 27 14:44 _12ts.prx
> -rw-r--r-- 1 appuser appuser   25 Jul 27 14:44 _12ts.stats
> -rw-r--r-- 1 appuser appuser 8.7K Jul 27 14:44 _12ts.tii
> -rw-r--r-- 1 appuser appuser 656K Jul 27 14:44 _12ts.tis
> -rw-r--r-- 1 appuser appuser 309K Jul 27 14:44 _12tt.fdt
> -rw-r--r-- 1 appuser appuser 5.1K Jul 27 14:44 _12tt.fdx
> -rw-r--r-- 1 appuser appuser  361 Jul 27 14:44 _12tt.fnm
> -rw-r--r-- 1 appuser appuser 1.9M Jul 27 14:44 _12tt.frq
> -rw-r--r-- 1 appuser appuser  14K Jul 27 14:44 _12tt.nrm
> -rw-r--r-- 1 appuser appuser 3.7M Jul 27 14:44 _12tt.prx
> -rw-r--r-- 1 appuser appuser   29 Jul 27 14:44 _12tt.stats
> -rw-r--r-- 1 appuser appuser  38K Jul 27 14:44 _12tt.tii
> -rw-r--r-- 1 appuser appuser 2.6M Jul 27 14:44 _12tt.tis
> -rw-r--r-- 1 appuser appuser  62M Jul 15 19:51 _kft.fdt
> -rw-r--r-- 1 appuser appuser 1.3M Jul 15 19:51 _kft.fdx
> -rw-r--r-- 1 appuser appuser  397 Jul 15 19:51 _kft.fnm
> -rw-r--r-- 1 appuser appuser 626M Jul 15 20:40 _kft.frq
> -rw-r--r-- 1 appuser appuser 3.5M Jul 15 20:40 _kft.nrm
> -rw-r--r-- 1 appuser appuser 2.6G Jul 15 20:40 _kft.prx
> -rw-r--r-- 1 appuser appuser   31 Jul 15 20:40 _kft.stats
> -rw-r--r-- 1 appuser appuser  20K Jul 19 23:01 _kft_sv.del
> -rw-r--r-- 1 appuser appuser 295K Jul 15 20:40 _kft.tii
> -rw-r--r-- 1 appuser appuser  25M Jul 15 20:40 _kft.tis
> -rw-r--r-- 1 appuser appuser 6.6K Jul 19 18:32 _ls6_aj.del
> -rw-r--r-- 1 appuser appuser  17M Jul 17 18:21 _ls6.fdt
> -rw-r--r-- 1 appuser appuser 418K Jul 17 18:21 _ls6.fdx
> -rw-r--r-- 1 appuser appuser  397 Jul 17 18:21 _ls6.fnm
> -rw-r--r-- 1 appuser appuser 556M Jul 17 19:13 _ls6.frq
> -rw-r--r-- 1 appuser appuser 1.2M Jul 17 19:13 _ls6.nrm
> -rw-r--r-- 1 appuser appuser 2.9G Jul 17 19:13 _ls6.prx
> -rw-r--r-- 1 appuser appuser   31 Jul 17 19:13 _ls6.stats
> -rw-r--r-- 1 appuser appuser 155K Jul 17 19:13 _ls6.tii
> -rw-r--r-- 1 appuser appuser  14M Jul 17 19:13 _ls6.tis
> -rw-r--r-- 1 appuser appuser   20 Jul 27 14:44 segments.gen
> -rw-r--r-- 1 appuser appuser  158 Jul 27 14:44 segments_pg5
> [root@sc-s1-172-1.oxford.com ~]#
>
> [root@sc-s1-172-1.oxford.com ~]# ls -lh /index/TP_0000000000000000499/ | grep G
> total 9.7G
> -rw-r--r-- 1 appuser appuser 2.2G Jul 27 13:58 _12r7.prx
> -rw-r--r-- 1 appuser appuser 2.6G Jul 15 20:40 _kft.prx
> -rw-r--r-- 1 appuser appuser 2.9G Jul 17 19:13 _ls6.prx
> [root@sc-s1-172-1.oxford.com ~]#
>
> Regards
> Ravi
>
