lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-971) Create enwiki indexable data as line-per-article rather than file-per-article
Date Wed, 01 Aug 2007 16:33:52 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517007 ]

Michael McCandless commented on LUCENE-971:
-------------------------------------------


> I can look at what it would take to avoid the line file ... but
> ... what about the overhead of the XML parser? I don't tend to think
> of XML parsers as "light". Would bundling that into the test be a
> concern?

Right, I too would not consider XML parsing overhead "light".  So tests
that are sensitive to the XML parsing cost should first create a line
file.

But this is the case regardless of which approach we use (i.e., both
approaches allow you to use a line file -- the WriteLineDocTask writes
a line file from any DocMaker).  It's just that the new approach would
buy us more flexibility for those people who don't need (or want) to
use the line file as an intermediary.
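
To make the line-file idea concrete, here is a minimal sketch of the
round trip: documents from any source are written one per line, then
read back without re-parsing the original input. Everything in it is
hypothetical (the LineFileSketch and Article names, the tab-separated
title/date/body layout, and the choice to flatten embedded tabs and
newlines to spaces); it is not the benchmark code and does not use the
WriteLineDocTask or DocMaker classes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the Lucene benchmark classes. */
class LineFileSketch {

    /** Stand-in for a document produced by some source, e.g. an XML parser. */
    static final class Article {
        final String title;
        final String date;
        final String body;

        Article(String title, String date, String body) {
            this.title = title;
            this.date = date;
            this.body = body;
        }
    }

    /** Tabs and line breaks would break the one-article-per-line layout, so flatten them. */
    static String clean(String s) {
        return s.replace('\t', ' ').replace('\r', ' ').replace('\n', ' ');
    }

    /** Writes each article as a single line: title TAB date TAB body (assumed layout). */
    static void writeLineFile(Iterable<Article> source, Writer out) throws IOException {
        for (Article a : source) {
            out.write(clean(a.title) + '\t' + clean(a.date) + '\t' + clean(a.body));
            out.write('\n');
        }
    }

    /** Reads the articles back; only a cheap split per line is needed, no XML parsing. */
    static List<Article> readLineFile(Reader in) throws IOException {
        List<Article> docs = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split("\t", -1); // -1 keeps empty trailing fields
            if (fields.length != 3) {
                throw new IOException("Malformed line: " + line);
            }
            docs.add(new Article(fields[0], fields[1], fields[2]));
        }
        return docs;
    }
}
```

A test that is sensitive to parsing cost would run the write step once
up front and time only the read step; a test that does not care could
consume the original source directly and skip the intermediate file.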


> Create enwiki indexable data as line-per-article rather than file-per-article
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-971
>                 URL: https://issues.apache.org/jira/browse/LUCENE-971
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Steven Parkes
>         Attachments: LUCENE-971.patch.txt
>
>
> Create a line per article rather than a file. Consume with indexLineFile task.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

