lucene-java-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Optimal way to index
Date Mon, 11 Feb 2013 15:14:10 GMT
Hello list,

I have a scenario where I need an in-memory index for faster search. The
problem is as follows:

I have a list of a couple of thousand words. Each word has a
corresponding ID and a list of synonyms. The actual word is a column in my
HBase table. I receive files that contain values for this column, and I have
to extract the values from these files and put them into the appropriate
column. Sometimes, however, a file contains a synonym instead of the actual
word. This is where the index comes into the picture. I need an index that
contains all the words along with their IDs and all their synonyms, and it
should always be in memory so that inserts into HBase are quick.
Something like this:

 ID       WORD   SYNONYMS
 13991    A      a, A, Aa, aa, AA

Then the index should be something like this:
a     A    13991
A     A    13991
Aa    A    13991
aa    A    13991
AA    A    13991

So if I get 'a' in the file, I should be able to do a lookup, and the index
should give me 'A' along with '13991'. I need both the base name and the
ID. The names can even be strings of 4 to 5 words.
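
A minimal sketch of the lookup described above, assuming a plain
java.util.HashMap held in memory rather than Lucene. The class and method
names (SynonymIndex, Entry, add, lookup) are hypothetical and only
illustrate the shape of the index. Matching is exact and case-sensitive,
since the example lists 'a' and 'A' as separate synonyms.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory index: every synonym maps to its base word and ID. */
public class SynonymIndex {

    /** The pair a lookup should return: the base word and its ID. */
    public static final class Entry {
        public final String word;
        public final long id;

        Entry(String word, long id) {
            this.word = word;
            this.id = id;
        }
    }

    // One map entry per synonym, all sharing the same Entry object.
    private final Map<String, Entry> bySynonym = new HashMap<>();

    /** Registers a base word, its ID, and all of its synonyms. */
    public void add(long id, String word, List<String> synonyms) {
        Entry entry = new Entry(word, id);
        bySynonym.put(word, entry); // the base word resolves to itself
        for (String synonym : synonyms) {
            bySynonym.put(synonym.trim(), entry);
        }
    }

    /** Returns the base word and ID for a term, or null if the term is unknown. */
    public Entry lookup(String term) {
        return bySynonym.get(term.trim());
    }
}
```

With the row from the example loaded via add(13991L, "A", ...), a call to
lookup("a") would return the entry holding 'A' and 13991. Multi-word names
work the same way, because the whole string is the map key.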

I have all this information stored in an HBase table with two columns:
the first column contains the actual word and the second column
contains the entire list of synonyms. The rowkey is the ID.

Now, I can't tell whether it is feasible to use Lucene for this, or whether
I should go with something like Guava's Table or something else. I need some
guidance because, being new to Lucene, I am not able to think in the right
direction. If it is feasible to use Lucene for this, how can I do it
efficiently?

I am using HBase filters right now to do the fetch, which is slowing down
the process.

I am sorry if my questions sound childish or senseless, as I am not very
experienced with Lucene. Thank you so much for your valuable time.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
