lucene-dev mailing list archives

From Vico Marziale <vicod...@gmail.com>
Subject Common Bottlenecks
Date Tue, 09 Jun 2009 23:17:58 GMT
Hello all. I am new to Lucene as well as this list. I am a PhD student at
the University of New Orleans. My current research is in leveraging
highly multicore processors to speed up computer forensics tools. For the
moment I am trying to figure out what the most common performance bottleneck
inside of Lucene itself is. I will then take a crack at porting some (small)
portion of Lucene to CUDA (http://www.nvidia.com/object/cuda_what_is.html)
and see what kind of speedups are achievable.

The portion of code to be ported must be trivially parallelizable. After
spending some time digging around the docs and source, StandardAnalyzer
appears to be a likely candidate. I've run the demo code through a profiler,
but the results were less than helpful, especially since the bottlenecks
will depend on how the Lucene API is used. In
general, what is the most computationally expensive part of the process?
Does the analyzer seem like a reasonable choice?

Thanks,
-- 
Lodovico Marziale
PhD Candidate
Department of Computer Science
University of New Orleans
