lucene-java-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: easy way to figure out most common tokens?
Date Mon, 20 Aug 2012 02:04:27 GMT
You don't need to index the data. Just run the analyzer over the text and
maintain your own per-token counters. This will be disk-bound, so it will
run at roughly your disk's read speed.
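
A minimal sketch of the counting approach, assuming Java 8 or later. The
class name TokenCounter and both method names are hypothetical, and the
lowercase-and-split tokenizer is only a stand-in for a real Lucene analyzer;
in an actual setup the tokens would come from the analyzer's token stream
rather than from String.split.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: counts tokens in a stream of text without building an index.
class TokenCounter {

    // Reads the input line by line and keeps one counter per token.
    // ASSUMPTION: lowercasing and splitting on non-letter/non-digit characters
    // stands in for whatever analyzer the real documents would go through.
    static Map<String, Integer> countTokens(Reader input) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        BufferedReader reader = new BufferedReader(input);
        String line;
        while ((line = reader.readLine()) != null) {
            for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                if (token.isEmpty()) {
                    continue; // split() yields an empty first element when the line starts with a delimiter
                }
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns up to n entries, most frequent first; ties are broken alphabetically.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }
}
```

Calling countTokens once per file and merging the resulting maps, then
passing the merged map to topN, would give a candidate list of common tokens
to strip from the documents.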

On Sun, Aug 19, 2012 at 5:17 PM, Shaya Potter <spotter@gmail.com> wrote:
> On 08/19/2012 08:07 PM, Shaya Potter wrote:
>>
>> On 08/15/2012 02:34 PM, Ahmet Arslan wrote:
>>>>
>>>> Is there an easy way to figure out
>>>> the most common tokens and then remove those tokens from the
>>>> documents.
>>>
>>>
>>> Probably this :
>>>
>>> http://lucene.apache.org/core/3_6_1/api/all/org/apache/lucene/misc/HighFreqTerms.html
>>>
>>
>> unsure how to use this
>>
>> as far as I can tell, org.apache.lucene.misc.TermStats doesn't exist in
>> Lucene 3.6.1 (there seems to be some class like that in 4.x, but that
>> doesn't help me).
>
>
> I'm wrong, it's there, but Eclipse isn't seeing it (haven't tried javac by
> itself), even though it sees HighFreqTerms just fine.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>



-- 
Lance Norskog
goksron@gmail.com


