lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@apache.org>
Subject Re: indexing api wrt Analyzer
Date Fri, 14 Mar 2008 00:39:52 GMT
There is an addDocument method that takes an Analyzer and overrides  
the one used at construction of the IndexWriter.  See
http://lucene.apache.org/java/2_3_1/api/core/org/apache/lucene/index/IndexWriter.html#addDocument(org.apache.lucene.document.Document,%20org.apache.lucene.analysis.Analyzer)

.



On Mar 13, 2008, at 4:12 PM, John Wang wrote:

> Hi Grant:
>
>    For our corpus, we don't rely on idf in scoring calculation that  
> much,
> so I don't see that being a problem that much.
>
>    About performance, instantiating 1 indexWriter for a batch of say  
> 1000
> docs, e.g. iterate over 1000 docs and do addDocument; comparing with
> instantiating and closing 1000 indexWriters each doing 1  
> addDocument. Are
> you saying the expected performance is the same? I thought when you  
> call
> addDocument, it adds to memory and flush when segment needs to be  
> merged or
> writer closes.
>
>    Maybe I am missing something.
>
> Thanks
>
> -john
>
> On Thu, Mar 13, 2008 at 11:37 AM, Grant Ingersoll  
> <gsingers@apache.org>
> wrote:
>
>>
>> On Mar 13, 2008, at 11:03 AM, John Wang wrote:
>>
>>> Yes, but usually it's a good idea to add documents in batch and not
>>> having
>>> to reinstantiate the writer for every document and then closing it.
>>>
>>> It would be nice if one can specify to the writer which analyzer to
>>> use.
>>>
>>> PerfieldAnalyzer wouldn't work because different analyzers may apply
>>> on the
>>> same field depending on the doc, e.g.
>>>
>>
>> Also, I don't know that it is wise to put different langs in the same
>> field.  I can't prove it definitively, but it seems to me your corpus
>> statistics could be skewed by terms that are spelled the same but  
>> have
>> different meanings across languages.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>

--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ






---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message