lucene-java-user mailing list archives

From "Marcelo Ochoa" <marcelo.oc...@gmail.com>
Subject Re: Typical Indexing performance
Date Tue, 03 Jun 2008 19:57:02 GMT
Hi:
  Here are my latest tests of the Oracle-Lucene integration (Lucene 2.3.2
binary dist. / Oracle 11g):
http://marceloochoa.blogspot.com/2008/06/new-binary-release-of-lucene-oracle.html
  Tested against the Spanish Wikipedia dumps, using the Wikipedia Analyzer/Tokenizer.
  There are separate timings for the uploading process and for the indexing process.
  The uploading process means parsing the Wikipedia XML dumps and inserting
them into the Oracle XMLDB repository, which transforms them into an
object-relational structure:
http://marceloochoa.blogspot.com/2007/12/uploading-wikipedia-dumps-to-oracle.html
  The indexing process means creating a Lucene Domain Index with
something like this:

create index pages_lidx_all on pages p (value(p))
indextype is Lucene.LuceneIndex
parameters('PopulateIndex:false;SyncMode:Deferred;LogLevel:WARNING;Analyzer:org.apache.lucene.analysis.SpanishWikipediaAnalyzer;ExtraCols:extractValue(object_value,''/page/title'') "title",extractValue(object_value,''/page/revision/comment'') "comment",extract(object_value,''/page/revision/text/text()'') "text",extractValue(object_value,''/page/revision/timestamp'') "revisionDate";FormatCols:revisionDate(day);IncludeMasterColumn:false;LobStorageParameters:PCTVERSION 0 ENABLE STORAGE IN ROW CHUNK 32768 CACHE READS FILESYSTEM_LIKE_LOGGING');

  This indexes the title, comment, text and timestamp XML nodes as separate
Lucene fields, plus the Oracle ROWID.
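  The parameters string is dense, so as a reading aid here is a small Java
sketch that splits it into its Key:Value entries. It assumes entries are
separated by ';' and that each key ends at the first ':', which holds for
this particular string. The class name LdiParams is hypothetical; this is
not the Lucene Domain Index's own parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LdiParams {
    // Splits a "Key:Value;Key:Value" parameter string into ordered pairs.
    // Assumption: ';' separates entries and the first ':' ends each key.
    // Reading aid only, not the Lucene Domain Index's actual parser.
    static Map<String, String> parse(String params) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String entry : params.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            int colon = trimmed.indexOf(':');
            if (colon < 0) {
                out.put(trimmed, ""); // key without a value
            } else {
                out.put(trimmed.substring(0, colon), trimmed.substring(colon + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Parameters from the CREATE INDEX statement, with SQL's '' unescaped to '.
        String params = "PopulateIndex:false;SyncMode:Deferred;LogLevel:WARNING;"
            + "Analyzer:org.apache.lucene.analysis.SpanishWikipediaAnalyzer;"
            + "ExtraCols:extractValue(object_value,'/page/title') \"title\","
            + "extractValue(object_value,'/page/revision/comment') \"comment\","
            + "extract(object_value,'/page/revision/text/text()') \"text\","
            + "extractValue(object_value,'/page/revision/timestamp') \"revisionDate\";"
            + "FormatCols:revisionDate(day);IncludeMasterColumn:false;"
            + "LobStorageParameters:PCTVERSION 0 ENABLE STORAGE IN ROW CHUNK 32768"
            + " CACHE READS FILESYSTEM_LIKE_LOGGING";
        // Print one parameter per line.
        parse(params).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

  Printed one per line, it is easier to see that ExtraCols carries the four
XPath-to-field mappings and FormatCols applies to the revisionDate field.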
  Best regards, Marcelo.
On Tue, Jun 3, 2008 at 4:42 PM, Otis Gospodnetic
<otis_gospodnetic@yahoo.com> wrote:
> There is really no "typical".  I'm playing with Hadoop (HDFS) and Solr at the moment,
> for example, and I'm seeing an indexing rate of about 70 docs/second.  However, the bottleneck
> there is not indexing; it is reading data from HDFS (over the network).
>
>
> I've also seen 500+ docs/second.
>
> It depends on many factors:
> how fast reading your data source is, how complex your analysis is, the size of documents
> and number of fields, whether fields are stored or only indexed, the IndexWriter settings
> for segment merging and memory usage, and of course the hardware, etc.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> ----- Original Message ----
>> From: Simon Wistow <simon@thegestalt.org>
>> To: Lucene <java-user@lucene.apache.org>
>> Sent: Monday, June 2, 2008 7:40:52 PM
>> Subject: Typical Indexing performance
>>
>> I know this is one of those "How long is a piece of string?" questions
>> but I'm curious as to the order of magnitude of indexing performance.
>>
>> http://lucene.apache.org/java/docs/benchmarks.html
>>
>> seems to indicate about 100-120 docs/s is pretty good for average-sized
>> documents (say, an email or something), or is that ludicrously out of
>> date for 2.3.x?
>>
>> Simon
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
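  Otis's observation that reading from HDFS, not indexing, was the
bottleneck, and my separate upload and indexing times, point at the same
practice: time the data source and the indexer separately before comparing
docs/second figures. Below is a minimal Java sketch of that idea. PhaseTimer
is a hypothetical name; the Iterator stands in for whatever source you read
(HDFS, an XML dump, a database) and the Consumer stands in for the indexing
call (for example Lucene's IndexWriter.addDocument), so by itself it measures
nothing Lucene-specific.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class PhaseTimer {
    long readNanos;  // time spent pulling documents from the source
    long indexNanos; // time spent inside the indexing callback
    long docs;       // documents processed

    // Drains the source, timing source reads and indexing calls separately.
    <T> void run(Iterator<T> source, Consumer<T> indexer) {
        while (true) {
            long t0 = System.nanoTime();
            boolean more = source.hasNext();
            T doc = more ? source.next() : null;
            long t1 = System.nanoTime();
            readNanos += t1 - t0;
            if (!more) {
                break;
            }
            indexer.accept(doc);
            indexNanos += System.nanoTime() - t1;
            docs++;
        }
    }

    // Overall throughput, counting both reading and indexing time.
    double docsPerSecond() {
        long total = readNanos + indexNanos;
        return total == 0 ? 0.0 : docs / (total / 1e9);
    }

    // Fraction of the measured time spent reading the source.
    double readShare() {
        long total = readNanos + indexNanos;
        return total == 0 ? 0.0 : (double) readNanos / total;
    }

    public static void main(String[] args) {
        List<String> corpus = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            corpus.add("document " + i);
        }
        PhaseTimer timer = new PhaseTimer();
        StringBuilder sink = new StringBuilder();
        // Stand-in indexer; a real one would build a Document and add it to an IndexWriter.
        timer.run(corpus.iterator(), doc -> sink.append(doc));
        System.out.printf("%d docs, %.0f docs/s, %.0f%% of time reading%n",
                timer.docs, timer.docsPerSecond(), 100 * timer.readShare());
    }
}
```

  If readShare() dominates, tuning IndexWriter settings will not move the
docs/second figure much; the source is the thing to speed up.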



-- 
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home
______________
Do you know DBPrism? Look at DB Prism's web site
http://www.dbprism.com.ar/index.html
More info?
Chapter 17 of the book "Programming the Oracle Database using Java &
Web Services"
http://www.amazon.com/gp/product/1555583296/
Chapter 21 of the book "Professional XML Databases" - Wrox Press
http://www.amazon.com/gp/product/1861003587/
Chapter 8 of the book "Oracle & Open Source" - O'Reilly
http://www.oreilly.com/catalog/oracleopen/


