lucene-java-user mailing list archives

From Robby <java....@phi-integration.com>
Subject Grouping Based on Multiple Fields Similarity
Date Mon, 21 May 2012 14:31:53 GMT
Hi Everyone,

I'm quite new to Lucene and would like to ask whether the case below can be
solved with Lucene.

Let's say I have 200,000 rows from a relational table with multiple fields,
and I will have them indexed with Lucene. After indexing, I'd like to have
a grouping / clustering based on similarity between four of five fields.

The end result would be something like this:

- Grouping 1, count: 3
      - row id = 1
      - row id = 23
      - row id = 100
- Grouping 2
      - row id = 1
      - row id = 23
- ...
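
To make the intended result concrete, here is a minimal sketch in plain Java. It is not Lucene code and not a confirmed approach. It assumes "similar" means that two rows agree on at least four of their five fields (compared case-insensitively), and that groups are merged transitively. The class name FieldGrouping, its methods, and the sample rows are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: groups rows that agree on at
// least `minMatching` fields, merging groups transitively (single link).
class FieldGrouping {

    // Counts the fields on which two rows agree, ignoring case and
    // leading/trailing whitespace. Null fields never match.
    static int matchingFields(String[] a, String[] b) {
        int n = 0;
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            if (a[i] != null && b[i] != null
                    && a[i].trim().equalsIgnoreCase(b[i].trim())) {
                n++;
            }
        }
        return n;
    }

    // Union-find lookup with path halving.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Returns groups of 0-based row positions, ordered by smallest member.
    // Brute force over all pairs, so the cost grows quadratically with rows.
    static List<List<Integer>> group(List<String[]> rows, int minMatching) {
        int[] parent = new int[rows.size()];
        for (int i = 0; i < parent.length; i++) {
            parent[i] = i;
        }
        for (int i = 0; i < rows.size(); i++) {
            for (int j = i + 1; j < rows.size(); j++) {
                if (matchingFields(rows.get(i), rows.get(j)) >= minMatching) {
                    parent[find(parent, j)] = find(parent, i);
                }
            }
        }
        Map<Integer, List<Integer>> byRoot = new LinkedHashMap<>();
        for (int i = 0; i < rows.size(); i++) {
            byRoot.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        }
        return new ArrayList<>(byRoot.values());
    }

    public static void main(String[] args) {
        // Made-up sample rows with five fields each.
        List<String[]> rows = Arrays.asList(
            new String[] {"John", "Smith", "Jakarta", "12345", "IT"},
            new String[] {"john", "smith", "Jakarta", "12345", "HR"},
            new String[] {"Jane", "Doe", "Bandung", "54321", "IT"});
        List<List<Integer>> groups = group(rows, 4);
        for (int g = 0; g < groups.size(); g++) {
            System.out.println("Grouping " + (g + 1) + ", count : "
                + groups.get(g).size() + " " + groups.get(g));
        }
    }
}
```

Comparing every pair is only workable for small inputs; for 200,000 rows it would mean roughly 2 x 10^10 comparisons, so candidate pairs would presumably have to come from the index rather than from a full scan.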

I have done some research, and it seems the MoreLikeThis class can be used for this:

http://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html

I'm still learning how to use this class. If anyone can confirm the approach
and perhaps offer some guidance, it would be much appreciated.

Many thanks in advance.

Regards,

Robby
