lucene-java-user mailing list archives

From Sebastin <sebasmt...@gmail.com>
Subject Re: ways to minimize index size?
Date Fri, 22 Jun 2007 14:54:51 GMT

Steve,
          I used your idea and it works great for me; thanks again. But when I
use Index.NO_NORMS the index size increases, whereas with Index.TOKENIZED
it goes down.

          I used the code you gave:

BigInteger _bi = new java.math.BigInteger("9198408365809", 10);
System.out.println(_bi.toString(36));

Other radixes increased the size.

          The modifications I made to my code are below:

import java.io.File;
import java.math.BigInteger;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

String outgoingNumber = "9198408365809";
String incomingNumber = "9840861114";
String dateSc = "070601";
String imsiNumber = "444021365987";
String callType = "1";

String outgoingRoute = "DJZ01";
String incomingRoute = "BSC01";

// compress the numeric values by converting them from base 10 to base 36
BigInteger _on = new BigInteger(outgoingNumber, 10);
String compOutgoingNumber = _on.toString(36);

BigInteger _in = new BigInteger(incomingNumber, 10);
String compIncomingNumber = _in.toString(36);

BigInteger _ds = new BigInteger(dateSc, 10);
String compDateSc = _ds.toString(36);

BigInteger _im = new BigInteger(imsiNumber, 10);
String compImsiNumber = _im.toString(36);

// search field (indexed, not stored)
String contents = compOutgoingNumber + " " + compIncomingNumber + " "
        + compDateSc + " " + compImsiNumber + " " + callType;

// display field (stored, not indexed)
String records = compOutgoingNumber + " " + compIncomingNumber + " "
        + compDateSc + " " + outgoingRoute + " " + incomingRoute;

File indexDir = new File("/home/Mediation/Index");
IndexWriter indexWriter = new IndexWriter(indexDir, new StandardAnalyzer(), true);

Document doc = new Document();
doc.add(new Field("contents", contents, Field.Store.NO, Field.Index.TOKENIZED));
doc.add(new Field("records", records, Field.Store.YES, Field.Index.NO));
indexWriter.addDocument(doc);
indexWriter.close();
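
As you pointed out, the search side needs the same base-36 conversion before the
term is looked up. Something like this is what I mean (just a sketch, using the
Lucene 2.x Hits API and the field names above; the variable names are only for
illustration):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// convert the query number exactly as at index time
String compQueryNumber = new BigInteger("9198408365809", 10).toString(36);

// BigInteger.toString(36) is already lower case, so it matches the tokens
// StandardAnalyzer produced for the tokenized "contents" field
IndexSearcher searcher = new IndexSearcher("/home/Mediation/Index");
Query query = new TermQuery(new Term("contents", compQueryNumber));
Hits hits = searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
    System.out.println(hits.doc(i).get("records"));
}
searcher.close();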

Please help me to achieve that.


Sebastin wrote:
> 
> Hi Steve,
>      Thanks a lot for your reply. The index now compresses to about 50% of the
> original size. Is there any other possibility, using this code, of compressing
> it up to 80%?
> 
> Steve Liles wrote:
>> 
>> Compression aside you could index the "contents" as terms in separate 
>> fields instead of tokenized text, and disable storing of norms:
>> 
>> String outgoingNumber="9198408365809";
>> String incomingNumber="9840861114";
>> 
>> _doc.add(new Field("outgoingNumber", outgoingNumber, Store.NO, 
>> Index.NO_NORMS));
>> _doc.add(new Field("incomingNumber", incomingNumber, Store.NO, 
>> Index.NO_NORMS));
>> 
>> According to the docs, "Index.NO_NORMS" will save you one byte per indexed 
>> field per document in the index.
>> 
>> Or you could index all of the data as separate terms in the same 
>> "contents" field if you wanted (make the first param "contents" for all 
>> of the terms), which is more comparable to what you are currently doing.
>> (Another advantage is that the Analyzer will not be used for fields 
>> which are untokenized, and indexing should be faster.)
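>> Something along these lines, reusing the variables from above:
>> 
>> _doc.add(new Field("contents", outgoingNumber, Store.NO, Index.NO_NORMS));
>> _doc.add(new Field("contents", incomingNumber, Store.NO, Index.NO_NORMS));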
>> 
>> ...
>> 
>> One way to compress numerical data (possibly not the best - i'm no 
>> expert) is to change the base of the number that is indexed / stored in 
>> the index.
>> 
>> java.lang.Long and java.math.BigInteger have methods for converting from 
>> one radix to another. Taking your "outgoingNumber" as an example:
>> 
>> //compression
>> BigInteger  _bi = new java.math.BigInteger("9198408365809", 10);
>> System.out.println(_bi.toString(36));
>> 
>>  > 39douufap
>> 
>> //decompression
>> BigInteger _bi = new java.math.BigInteger("39douufap", 36);
>> System.out.println(_bi.toString(10));
>> 
>>  >9198408365809
>> 
>> Converting to a higher radix will give you better compression but you'll
>> have to do it yourself as the jdk classes only work up to base 36
>> <http://en.wikipedia.org/wiki/Base_36>.
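>> 
>> For example, a rough base-62 encode/decode (digits plus lower- and upper-case
>> letters) could look something like this (just a sketch, untested):
>> 
>> private static final String ALPHABET =
>>     "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
>> 
>> static String toBase62(java.math.BigInteger value) {
>>     java.math.BigInteger base = java.math.BigInteger.valueOf(62);
>>     StringBuffer sb = new StringBuffer();
>>     while (value.signum() > 0) {
>>         // divideAndRemainder returns {quotient, remainder}
>>         java.math.BigInteger[] qr = value.divideAndRemainder(base);
>>         sb.insert(0, ALPHABET.charAt(qr[1].intValue()));
>>         value = qr[0];
>>     }
>>     return sb.length() == 0 ? "0" : sb.toString();
>> }
>> 
>> static java.math.BigInteger fromBase62(String s) {
>>     java.math.BigInteger base = java.math.BigInteger.valueOf(62);
>>     java.math.BigInteger value = java.math.BigInteger.ZERO;
>>     for (int i = 0; i < s.length(); i++) {
>>         value = value.multiply(base)
>>                      .add(java.math.BigInteger.valueOf(ALPHABET.indexOf(s.charAt(i))));
>>     }
>>     return value;
>> }
>> 
>> Bear in mind that a mixed-case encoding like this only survives in untokenized
>> fields, because StandardAnalyzer lower-cases its tokens.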
>> 
>> It's worth compressing your unstored "contents" field as well as your 
>> stored "records" field, as the unique terms in the "contents" field will 
>> effectively be stored.
>> 
>> Also don't forget to convert the terms when you search too, otherwise 
>> you won't find anything ;)
>> 
>> Steve.
>> 
>> 
>> Sebastin wrote:
>>> When I use StandardAnalyzer the storage size increases. How can I minimize
>>> the index store?
>>>
>>> Sebastin wrote:
>>>   
>>>> String outgoingNumber="9198408365809";
>>>> String incomingNumber="9840861114";
>>>> String dateSc="070601";
>>>> String imsiNumber="444021365987";
>>>> String callType="1";
>>>>
>>>> // Search field
>>>> String contents = (outgoingNumber+" "+incomingNumber+" "+dateSc+" "+imsiNumber+" "+callType);
>>>>
>>>> // Display field
>>>> String records = (callingPartyNumber+" "+calledPartyNumber+" "+dateSc+" "+chargDur+" "+incomingRoute+" "+outgoingRoute+" "+timeSc);
>>>>
>>>> IndexWriter indexWriter = new IndexWriter(indexDir, new StandardAnalyzer(), true);
>>>>
>>>> Document document = new Document();
>>>> document.add(new Field("contents", contents, Field.Store.NO, Field.Index.TOKENIZED));
>>>> document.add(new Field("records", records, Field.Store.YES, Field.Index.NO));
>>>>
>>>> indexWriter.setUseCompoundFile(true);
>>>> indexWriter.addDocument(document);
>>>>
>>>> Please help me to achieve the minimum size.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Erick Erickson wrote:
>>>>     
>>>>> Show us the code you use to index. Are you storing the fields?
>>>>> Omitting norms? Throwing out stop words?
>>>>>
>>>>> Best
>>>>> Erick
>>>>>
>>>>> On 6/19/07, Sebastin <sebasmtech@gmail.com> wrote:
>>>>>       
>>>>>> Hi, can anyone give me an idea for reducing the index size? Right now I
>>>>>> am getting 42% compression in my index store and I want to get up to 70%.
>>>>>> I use StandardAnalyzer to write the documents. When I use SimpleAnalyzer
>>>>>> it reduces the size by about 58%, but then I couldn't search the
>>>>>> documents. Please help me to achieve this.
>>>>>>
>>>>>> Thanks in advance
>>>>>>
>>>>>> Jeff-188 wrote:
>>>>>>>
>>>>>>>> I found that reducing my index from 8G to 4G (through not stemming)
>>>>>>>> gave me about a 10% performance improvement.
>>>>>>>
>>>>>>> How did you do this? I don't see this as an option.
>>>>>>>
>>>>>>> Jeff
>>>>>>>
>>>>>       
>>>>     
>>>
>>>   
> 
> 

-- 
View this message in context: http://www.nabble.com/ways-to-minimize-index-size--tf3401213.html#a11253761
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

