lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2664) Add SimpleText codec
Date Mon, 27 Sep 2010 10:33:32 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915264#action_12915264 ]

Michael McCandless commented on LUCENE-2664:
--------------------------------------------

Committed, but I had to leave SimpleText out of the nightly rotation: some tests run incredibly
slowly because they rely heavily on the terms dict cache (which SimpleText doesn't have). I'd
like to fix that separately and then hopefully put SimpleText into the rotation, so I'll leave
this issue open for that.

> Add SimpleText codec
> --------------------
>
>                 Key: LUCENE-2664
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2664
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2664.patch
>
>
> Inspired by Sahin Buyrukbilen's question here:
>   http://www.lucidimagination.com/search/document/b68846e383824653/how_to_export_lucene_index_to_a_simple_text_file#b68846e383824653
> I made a simple read/write codec that stores all postings data into a
> single text file (_X.pst), looking like this:
> {noformat}
> field contents
>   term file
>     doc 0
>       pos 5
>   term is
>     doc 0
>       pos 1
>   term second
>     doc 0
>       pos 3
>   term test
>     doc 0
>       pos 4
>   term the
>     doc 0
>       pos 2
>   term this
>     doc 0
>       pos 0
> END
> {noformat}
> The codec is fully functional -- all Lucene & Solr tests pass with
> -Dtests.codec=SimpleText -- but its performance is obviously poor.
> However, it should be useful for debugging, transparency,
> understanding just what Lucene stores in its index, etc.  And it's a
> quick way to gain some understanding of how a codec works...
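
To make the layout quoted above concrete, here is a minimal sketch of a reader for it. This is not the SimpleText codec's own reader: it is written in Python for brevity, the names `SAMPLE` and `parse_postings` are hypothetical, and it assumes the file contains only the line types visible in the sample (`field`, `term`, `doc`, `pos`, and a closing `END`) with well-formed nesting. The real codec may well write other line types that this sketch would reject.

```python
# Sample copied from the issue description (the _X.pst layout shown above).
SAMPLE = """\
field contents
  term file
    doc 0
      pos 5
  term is
    doc 0
      pos 1
  term second
    doc 0
      pos 3
  term test
    doc 0
      pos 4
  term the
    doc 0
      pos 2
  term this
    doc 0
      pos 0
END
"""


def parse_postings(text):
    """Parse the sample layout into {field: {term: {doc_id: [positions]}}}.

    Assumption: only 'field', 'term', 'doc', 'pos' and 'END' lines occur,
    and each line is nested under the most recent line of the level above.
    """
    postings = {}
    field = term = doc = None
    for raw in text.splitlines():
        line = raw.strip()          # indentation is cosmetic in this sketch
        if not line:
            continue
        if line == "END":           # end-of-file marker in the sample
            break
        key, _, value = line.partition(" ")
        if key == "field":
            field, term, doc = value, None, None
            postings[field] = {}
        elif key == "term":
            term, doc = value, None
            postings[field][term] = {}
        elif key == "doc":
            doc = int(value)
            postings[field][term][doc] = []
        elif key == "pos":
            postings[field][term][doc].append(int(value))
        else:
            # Anything else is outside what the sample shows.
            raise ValueError("unrecognized line: %r" % raw)
    return postings


if __name__ == "__main__":
    for field, terms in parse_postings(SAMPLE).items():
        for term, docs in terms.items():
            print(field, term, docs)
```

Keying on the first word of each line rather than on indentation keeps the sketch short; a stricter reader could also verify the indent depth of each level.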

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



