lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: [jira] [Commented] (LUCENE-3403) Term vectors missing after addIndexes + optimize
Date Mon, 29 Aug 2011 11:06:41 GMT
Could you boil this down to a smallish test case, showing the term
vector files getting incorrectly deleted?

Then we can test this test case against the current 3.x trunk where
LUCENE-3403 is fixed, to see if that fixes it.

Luke removing the files means that the files were "dead", ie,
unreferenced by any segments_N files in the index, which is bad if
that index was produced by calling addIndexes into a new directory.
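
To make "dead" concrete, here is a minimal sketch in plain Java. It is a toy model only and uses no Lucene API: the class name DeadFileSketch, the method deadFiles, and the file names are all made up for illustration. It assumes the commit metadata wrongly omits the term vector files, and shows that those files are then exactly the ones left unreferenced.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: this is NOT Lucene code and uses no Lucene API.
final class DeadFileSketch {

    // Returns the files present in the directory that no commit references.
    static Set<String> deadFiles(Collection<String> filesInDirectory,
                                 Collection<String> filesReferencedByCommits) {
        Set<String> dead = new TreeSet<String>(filesInDirectory);
        dead.removeAll(filesReferencedByCommits);
        return dead;
    }

    public static void main(String[] args) {
        // Hypothetical file names for a merged segment "_3".
        Set<String> onDisk = new TreeSet<String>(Arrays.asList(
            "segments_2", "_3.fdt", "_3.fdx", "_3.tvd", "_3.tvf", "_3.tvx"));
        // Assumption: faulty metadata records the segment as having no
        // term vectors, so the commit does not list the tv* files.
        Set<String> referenced = new TreeSet<String>(Arrays.asList(
            "segments_2", "_3.fdt", "_3.fdx"));
        System.out.println(deadFiles(onDisk, referenced));
        // prints [_3.tvd, _3.tvf, _3.tvx]
    }
}
```

Anything in that result set is fair game for cleanup by whatever opens the index for writing, which would match the files vanishing after Luke is closed.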

Mike McCandless

http://blog.mikemccandless.com

2011/8/29  <Ari.Ko@csk.com>:
>
>
> Hi, I don't know whether my problem has the same cause.
>
> When I merged several indexes into one, the term vectors went missing.
>
> The input indexes are stored in several different directories, for example
> /index1/, /index2/, /index3/.
> The merged output index is written to another, new directory, for
> example /mergeindex/.
>
> addIndexes and optimize are used here.
>
> The result is that the index is merged correctly into /mergeindex/
> and searching works without any problem.
> The term vector files (tvd, tvf and tvx) also hold data, because
> their sizes are not 0 bytes.
>
> But after I opened this index with Luke, I found that the term vector
> files (tvd, tvf and tvx) had become 0 bytes in the Luke file list.
> And after I closed Luke, these three files disappeared.
>
> I checked a lot of documents and the Lucene index format description, and
> I think the reason is that [DocStoreOffset] is not set correctly in
> the segment file here.
> Because [DocStoreOffset] is not set, Luke treats these three files as
> not being there.
>
> But if I merge these indexes into one that already exists, the
> result is correct and the term vectors are also laid out correctly.
>
> For example, the input is /index1/, /index2/, /index3/ and the merge output
> directory is /index1/.
>
> I don't know whether my case is the same as yours, and I don't know whether
> it is a bug in Lucene.
>
> In fact, I posted a question about it on Aug. 25, titled <The question
> about DocStoreOffset>.
>
> I would appreciate it if you could give me some advice about it.
>
> Thanks in advance.
>
> Best regards.
>
> Yali Hu
>
>
>
>
>
> From:     "Michael McCandless (JIRA)" <jira@apache.org> Date: 2011/08/26
>       12:37 GMT
>
> Please reply to dev@lucene.apache.org
>
> To:  dev@lucene.apache.org
> cc:
> Subject:  [jira] [Commented] (LUCENE-3403) Term vectors missing after
>       addIndexes + optimize
>
>
>
>    [
>    https://issues.apache.org/jira/browse/LUCENE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091739#comment-13091739
>     ]
>
> Michael McCandless commented on LUCENE-3403:
> --------------------------------------------
>
> Phew nice catch Shai!
>
>
>> Term vectors missing after addIndexes + optimize
>> ------------------------------------------------
>>
>>                 Key: LUCENE-3403
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-3403
>>             Project: Lucene - Java
>>          Issue Type: Bug
>>          Components: core/index
>>    Affects Versions: 3.3
>>            Reporter: Shai Erera
>>            Assignee: Shai Erera
>>            Priority: Blocker
>>             Fix For: 3.4, 4.0
>>
>>         Attachments: LUCENE-3403.patch
>>
>>
>> I encountered a problem with addIndexes where term vectors disappeared
>> following optimize(). I wrote a simple test case which demonstrates the
>> problem. The bug appears with both addIndexes() versions, but does not
>> appear if addDocument is called twice, committing changes in between.
>> I think I tracked the problem down to IndexWriter.mergeMiddle() -- it
>> sets term vectors before merger.merge() was called. In the addDocs case,
>> merger.fieldInfos is already populated, while in the addIndexes case it is
>> empty, hence fieldInfos.hasVectors returns false.
>> Will post a patch shortly.
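
The ordering problem described in the report above can be modeled in a few lines of plain Java. This is a toy sketch only: ToyFieldInfos, ToyMerger and both helper methods are hypothetical stand-ins, not Lucene's FieldInfos, SegmentMerger or IndexWriter.mergeMiddle(). It assumes, as in the addIndexes case described above, that the field metadata is empty until merge() runs. The second helper shows one possible ordering that avoids the stale read; it is not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical classes, not Lucene's real ones.
final class MergeOrderSketch {

    static final class ToyFieldInfos {
        private final List<Boolean> storesVectors = new ArrayList<Boolean>();

        void add(boolean storeTermVector) {
            storesVectors.add(storeTermVector);
        }

        // True if any field recorded so far stores term vectors.
        boolean hasVectors() {
            return storesVectors.contains(Boolean.TRUE);
        }
    }

    static final class ToyMerger {
        final ToyFieldInfos fieldInfos = new ToyFieldInfos();
        private final boolean sourceHasVectors;

        ToyMerger(boolean sourceHasVectors) {
            this.sourceHasVectors = sourceHasVectors;
        }

        // Assumption: field metadata is only collected from the sources here.
        void merge() {
            fieldInfos.add(sourceHasVectors);
        }
    }

    // Reads the flag before merge() has populated fieldInfos.
    static boolean flagReadBeforeMerge(ToyMerger merger) {
        boolean hasVectors = merger.fieldInfos.hasVectors();
        merger.merge();
        return hasVectors;
    }

    // Reads the flag only after merge() has run.
    static boolean flagReadAfterMerge(ToyMerger merger) {
        merger.merge();
        return merger.fieldInfos.hasVectors();
    }

    public static void main(String[] args) {
        System.out.println(flagReadBeforeMerge(new ToyMerger(true))); // prints false
        System.out.println(flagReadAfterMerge(new ToyMerger(true)));  // prints true
    }
}
```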
>
> --
> This message is automatically generated by JIRA.
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
