lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-971) Create enwiki indexable data as line-per-article rather than file-per-article
Date Wed, 01 Aug 2007 19:00:52 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517047 ]

Doron Cohen commented on LUCENE-971:
------------------------------------

> But, this is the case regardless of which approach we use (ie, both
> approaches allow you to use a line file -- the WriteLineDocTask writes a
> line file from any DocMaker).  It's just that the new approach would
> buy us more flexibility for those people who don't need (or want) to
> use the line file as an intermediary.
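To make "writes a line file from any DocMaker" concrete, here is a minimal Java sketch of flattening each article onto a single line. The class and method names are hypothetical, and the title<TAB>date<TAB>body layout is an assumption about the line format, not a copy of WriteLineDocTask.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical sketch of a line-per-article writer.
// ASSUMPTION: one article per line as title<TAB>date<TAB>body;
// this is not the WriteLineDocTask source.
public class LineFileWriterSketch {
    static final char SEP = '\t';

    // Replace tabs and line breaks so a field can never split a line
    // or shift the field boundaries.
    private static String clean(String s) {
        return s.replace('\t', ' ').replace('\r', ' ').replace('\n', ' ');
    }

    // Flatten one article into a single line (without the trailing newline).
    static String toLine(String title, String date, String body) {
        return clean(title) + SEP + clean(date) + SEP + clean(body);
    }

    // Append one article, plus its terminating newline, to the line file.
    static void writeDoc(Writer out, String title, String date, String body)
            throws IOException {
        out.write(toLine(title, date, body));
        out.write('\n');
    }

    public static void main(String[] args) throws IOException {
        // A StringWriter stands in for the real output file.
        StringWriter buffer = new StringWriter();
        try (BufferedWriter out = new BufferedWriter(buffer)) {
            writeDoc(out, "Apache Lucene", "2007-08-01", "Lucene is a\nsearch library.");
            writeDoc(out, "Wikipedia", "2007-08-01", "A free\tencyclopedia.");
        }
        System.out.print(buffer);
    }
}
```

Because the writer only needs a title, a date and a body, any document source can feed it, which is the point made in the quoted comment.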

So there would now be two alternative ways to index wiki data:
(1) using the proposed WikiDocMaker directly to feed the AddDoc task;
(2) using a line file, produced by first running WriteLineDocTask
with WikiDocMaker as the doc maker.

I like this approach.

This means that WikiDocMaker would read the data straight from 
temp/enwiki-20070527-pages-articles.xml. So the extract-enwiki 
target in build.xml would no longer be needed, right?
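Reading straight from the dump means streaming the XML instead of extracting one file per article first. A minimal sketch using the standard StAX API (javax.xml.stream), assuming the usual MediaWiki export layout of <page>, <title>, <timestamp> and <text> elements; the class name and callback are hypothetical and this is not WikiDocMaker's code.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: stream articles out of a MediaWiki XML dump with StAX.
// ASSUMPTION: pages look like <page><title/>...<revision><timestamp/><text/></revision></page>.
// This is not the WikiDocMaker source.
public class WikiDumpStreamSketch {
    static final class Article {
        final String title;
        final String timestamp;
        final String text;

        Article(String title, String timestamp, String text) {
            this.title = title;
            this.timestamp = timestamp;
            this.text = text;
        }
    }

    // Hand each <page> to the consumer as soon as its end tag is read,
    // so only one article is held in memory at a time.
    static void readPages(Reader in, Consumer<Article> sink) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        String title = null, timestamp = null, text = null;
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    switch (r.getLocalName()) {
                        case "page":      title = timestamp = text = null; break;
                        case "title":     title = r.getElementText(); break;
                        case "timestamp": timestamp = r.getElementText(); break;
                        case "text":      text = r.getElementText(); break;
                        default:          break; // ignore all other elements
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "page".equals(r.getLocalName())) {
                    sink.accept(new Article(title, timestamp, text));
                }
            }
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up sample input; a real run would open the dump file instead.
        String xml = "<mediawiki><page><title>Apache Lucene</title><revision>"
                + "<timestamp>2007-05-27T00:00:00Z</timestamp>"
                + "<text>Lucene is a search library.</text></revision></page></mediawiki>";
        readPages(new StringReader(xml),
                a -> System.out.println(a.title + " | " + a.text));
    }
}
```

Passing each page to a callback keeps memory use flat for a large dump; the same callback could either write a line file or hand the article straight to an indexing step, matching the two alternatives listed above.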



> Create enwiki indexable data as line-per-article rather than file-per-article
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-971
>                 URL: https://issues.apache.org/jira/browse/LUCENE-971
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Steven Parkes
>         Attachments: LUCENE-971.patch.txt
>
>
> Create a line per article rather than a file. Consume with indexLineFile task.
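The consuming side of the same assumed title<TAB>date<TAB>body layout is a split per line. Another hypothetical sketch, not the indexLineFile task itself; in a real run each field array would become a Lucene Document, and the list here is only for brevity.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of reading a line-per-article file back.
// ASSUMPTION: each line is title<TAB>date<TAB>body; this is not the
// indexLineFile task's code.
public class LineFileReaderSketch {
    // Returns one {title, date, body} array per well-formed line.
    // The caller owns (and closes) the Reader.
    static List<String[]> readDocs(Reader in) throws IOException {
        List<String[]> docs = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            // Limit 3: any tab left inside the body stays part of the body.
            String[] fields = line.split("\t", 3);
            if (fields.length == 3) {
                docs.add(fields);
            }
            // Lines with fewer than three fields are skipped.
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        String lineFile = "Apache Lucene\t2007-08-01\tLucene is a search library.\n"
                + "not a valid line\n";
        for (String[] doc : readDocs(new StringReader(lineFile))) {
            System.out.println("title=" + doc[0] + " date=" + doc[1] + " body=" + doc[2]);
        }
    }
}
```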

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

