lucene-dev mailing list archives

From Li Li <fancye...@gmail.com>
Subject what's wrong with my implementation?
Date Tue, 14 Feb 2012 11:43:04 GMT
Hi all,
    This may not be the most suitable place for this question, but I would
like advice from experts who know the details of Lucene indexing.
    We modified Lucene 2.9.1 to add a feature we call attribute fields,
similar to what many K-V systems call column-based storage.
    We need to update some fields of a document frequently, such as the
click count of a web page. Such a field is stored but not indexed. It may
affect ranking, but leave ranking aside for now: an attribute can simply be
thought of as an alternative to the stored fields in fdt/fdx.
    Attributes can also be updated later, but that is not related to my
question.
    I imitated the implementation of fdt/fdx.
    For each call of IndexWriter.updateDocument(), the core method is
processDocument(), which is called from updateDocument():
    final DocWriter perDoc = state.consumer.processDocument();
    The code related to fdt/fdx in DocFieldProcessorPerThread is
      if (field.isStored) {
        fieldsWriter.addField(field, fp.fieldInfo);
      }
    which does the actual work.
    I added my code after  final DocWriter perDoc =
state.consumer.processDocument();  to write the attributes for each
document. The difference is that stored fields are written to a file
(written but not flushed), whereas I hold all my attributes in memory and
flush them in DocumentsWriter.flush():
            for(int i=0;i<threadStates.length;i++)
                threads.add(threadStates[i].consumer);
            consumer.flush(threads, flushState);

            //added by LiLi
            //flush fixed length attributes
            this.flushFixedLengthAttributes(flushState.segmentName);
    consumer.flush() eventually calls FieldsWriter.flush(), which flushes
fdt/fdx:
            indexStream.flush();
            fieldsStream.flush();
    Only after fdt/fdx, tii, tis, frq, prx and tvxxx are flushed do I flush
my attributes.
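
A minimal sketch of the buffering scheme described above, for illustration
only. It is not the code from the modified Lucene: the class name
FixedLengthAttributeBuffer, the 8-byte value width, and the doc-count check
are all assumptions. The check shows one way to make a flush fail loudly
when the number of buffered values does not match the number of documents
in the segment, instead of silently producing a short or empty file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory buffer for one fixed-length attribute (not Lucene API). */
class FixedLengthAttributeBuffer {
    /** Assumed width of one value in bytes (a long). */
    static final int WIDTH = 8;

    // One value per buffered document, in docID order.
    private final List<Long> values = new ArrayList<Long>();

    /** Buffers the attribute value of the next document. */
    void add(long value) {
        values.add(value);
    }

    int bufferedDocs() {
        return values.size();
    }

    /**
     * Serializes all buffered values and clears the buffer.
     * Throws if the buffer does not hold exactly one value per document.
     */
    byte[] flush(int expectedDocCount) {
        if (values.size() != expectedDocCount) {
            throw new IllegalStateException("buffered " + values.size()
                    + " values but segment has " + expectedDocCount + " docs");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            for (long v : values) {
                out.writeLong(v);
            }
            out.flush();
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new RuntimeException(e);
        }
        values.clear();
        return bytes.toByteArray();
    }
}
```

With a check like this, a segment of one document whose buffer is empty at
flush time would raise an exception rather than leave a zero-byte file.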

          Another place that needs to be modified is SegmentMerger.merge():

              mergedDocs = mergeFields();

              //added by LiLi
              //merge anm file
              mergeAttributeInfos();
              //merge attr files
              mergeAttributes(infos);
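
A rough sketch of what merging one fixed-width attribute could look like,
under stated assumptions. The message does not show mergeAttributes(), so
the class FixedWidthAttributeMerge, its method, and the in-memory byte
arrays are hypothetical. The idea is to copy the records of live documents
only and to verify first that each source segment holds exactly
maxDoc * width bytes.

```java
import java.io.ByteArrayOutputStream;
import java.util.BitSet;

/** Hypothetical merge helper for fixed-width attribute data (not Lucene API). */
class FixedWidthAttributeMerge {
    /**
     * Appends the records of all non-deleted documents of one source segment
     * to the merged output.
     *
     * @param data    raw attribute bytes of the source segment
     * @param maxDoc  number of documents in the segment, deleted ones included
     * @param deleted bit i is set if document i is deleted
     * @param width   size of one record in bytes
     */
    static void appendLive(byte[] data, int maxDoc, BitSet deleted,
                           int width, ByteArrayOutputStream out) {
        // A short or empty file is detected here instead of during the copy.
        if (data.length != maxDoc * width) {
            throw new IllegalStateException("expected " + (maxDoc * width)
                    + " bytes but found " + data.length);
        }
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!deleted.get(doc)) {
                out.write(data, doc * width, width);
            }
        }
    }
}
```

Calling appendLive once per source segment, in segment order, would keep the
merged records aligned with the docIDs that mergeFields() assigns, assuming
both skip deleted documents in the same way.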

    This works fine in normal situations, but there is a bug. It has
occurred twice: once last year and once this week. The bad segment shows
the following symptom: my attribute files are created but their size is
zero. The segment has only 1 document, and that document is deleted
(numDocs()==0, maxDoc()==1).
    All other files are correct except prx (its size is also zero) and my
attr files:
-rw-r--r-- 1 lili lili 218 2012-02-13 16:34 _54g.0.anm
-rw-r--r-- 1 lili lili   9 2012-02-13 16:35 _54g_1.del
-rw-r--r-- 1 lili lili   0 2012-02-13 16:34 _54g.cTime.0.att
-rw-r--r-- 1 lili lili   0 2012-02-13 16:34 _54g.downloadCount.0.att
-rw-r--r-- 1 lili lili   5 2012-02-13 16:34 _54g.fdt
-rw-r--r-- 1 lili lili  12 2012-02-13 16:35 _54g.fdx
-rw-r--r-- 1 lili lili 252 2012-02-13 16:34 _54g.fnm
-rw-r--r-- 1 lili lili   2 2012-02-13 16:34 _54g.frq
-rw-r--r-- 1 lili lili  11 2012-02-13 16:35 _54g.nrm
-rw-r--r-- 1 lili lili   0 2012-02-13 16:34 _54g.offline.0.att
-rw-r--r-- 1 lili lili   0 2012-02-13 16:34 _54g.prx
-rw-r--r-- 1 lili lili   0 2012-02-13 16:34 _54g.quality.0.att
-rw-r--r-- 1 lili lili   0 2012-02-13 16:34 _54g.quoteCount.0.att
-rw-r--r-- 1 lili lili   0 2012-02-13 16:34 _54g.startTime.0.att
-rw-r--r-- 1 lili lili  35 2012-02-13 16:34 _54g.tii
-rw-r--r-- 1 lili lili  79 2012-02-13 16:35 _54g.tis
-rw-r--r-- 1 lili lili   0 2012-02-13 16:34 _54g.travelMonths.0.att
    I think there is some code path that I missed. Could anyone help?
    Thank you very much.
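
For reference, a small diagnostic sketch that lists the zero-byte files of
one segment in an index directory, which could help catch the symptom above
as soon as it appears. The class ZeroLengthSegmentFiles is hypothetical and
is not part of Lucene; it only uses java.io.File and assumes that segment
files are named with the segment name followed by '.' or '_', as in the
listing.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic helper (not Lucene API). */
class ZeroLengthSegmentFiles {
    /** True for names such as "_54g.fdt" or "_54g_1.del" when segment is "_54g". */
    static boolean belongsTo(String fileName, String segment) {
        if (!fileName.startsWith(segment) || fileName.length() == segment.length()) {
            return false;
        }
        // Reject other segments that share the prefix, e.g. "_54g1.fdt".
        char next = fileName.charAt(segment.length());
        return next == '.' || next == '_';
    }

    /** Returns the sorted names of all zero-byte files of the given segment. */
    static List<String> find(File indexDir, String segment) {
        List<String> empty = new ArrayList<String>();
        File[] files = indexDir.listFiles();
        if (files == null) {
            return empty; // not a directory, or an I/O error occurred
        }
        for (File f : files) {
            if (f.isFile() && f.length() == 0 && belongsTo(f.getName(), segment)) {
                empty.add(f.getName());
            }
        }
        Collections.sort(empty);
        return empty;
    }
}
```

Run against the directory listed above with segment "_54g", this would
report the eight zero-byte files: the seven .att files and _54g.prx.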
