lucene-dev mailing list archives

From Marvin Humphrey
Subject Re: [jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff
Date Mon, 02 Apr 2007 21:59:09 GMT

On Apr 2, 2007, at 2:50 PM, Steven Parkes wrote:

> On the one hand, creating separate per-article files is "clean" in that
> when you then ingest, you only have disk i/o that's going to affect the
> ingest performance (as opposed to, say, uncompressing/parsing). On the
> other hand, that's a lot of disk i/o (compresses by about 5X) and a lot
> of directory lookups.

One reason I was expanding the elements into individual files was so  
that I could compare different libraries against Lucene, including  
those in other languages.  It was important to measure the engines  
themselves, not SGML parsers.
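
A minimal sketch of that kind of expansion step, for illustration only: it assumes the corpus is a single well-formed XML file in which each article sits in a `page` element with no XML namespace. The function name `expand_articles`, the element name, and the output file naming are all hypothetical and not taken from any benchmark code discussed in this thread.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def expand_articles(dump_path, out_dir, tag="page"):
    """Write the text of each <tag> element to its own file; return the count.

    Hypothetical helper: the element name and file naming are assumptions.
    """
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    # iterparse streams the dump, so the whole file never sits in memory.
    for _event, elem in ET.iterparse(dump_path, events=("end",)):
        if elem.tag != tag:
            continue
        # Concatenate all text inside the article element, markup removed.
        body = "".join(elem.itertext()).strip()
        path = os.path.join(out_dir, f"article{count:06d}.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(body)
        count += 1
        elem.clear()  # release the parsed subtree before moving on
    return count
```

Running this once ahead of time means every engine under test reads the same plain-text files, so parsing cost stays out of the timed indexing run. The price is the extra disk i/o and directory lookups noted in the quoted message.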

Marvin Humphrey
Rectangular Research
