lucene-java-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Re: Optimal way to index
Date Mon, 11 Feb 2013 16:39:02 GMT
Hey Ian. Thank you so much for the quick reply. I'll definitely give Lucene
a shot. I'll start off with it and get back to you in case of any problem.

Many thanks.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Mon, Feb 11, 2013 at 10:03 PM, Ian Lea <ian.lea@gmail.com> wrote:

> You can certainly use lucene for this, and it will be blindingly fast
> even if you use a disk based index.
>
> Just index documents as you've laid them out, with the field you want to
> search on added as indexed and the others stored.
>
> I've never used Guava Table so can't comment on that, but with only a
> few thousand words it would certainly be feasible to use something
> like that.  Better?  I don't know.
>
> Personally I'd probably go with lucene as I'd be positive it would a)
> work and b) be fast even if the thousands ended up being tens of
> thousands, or more.
>
>
>
>
> --
> Ian.
>
> On Mon, Feb 11, 2013 at 3:14 PM, Mohammad Tariq <dontariq@gmail.com>
> wrote:
> > Hello list,
> >
> >          I have a scenario wherein I need an in-memory index as I need
> > faster search. The problem goes like this :
> >
> > I have a list which contains a couple of thousand words. Each word has a
> > corresponding ID and a list of synonyms. The actual word is a column in my
> > HBase table. I get files which contain values for this column, and I have to
> > extract values from these files and put them into the appropriate column.
> > But sometimes files may contain a synonym instead of the actual word.
> > This is where the index comes into the picture. I should have an
> > index that contains all the words along with their IDs and all the synonyms,
> > and it should always be in memory so that inserts into HBase are quick.
> > Something like this:
> >
> >  ID       WORD   SYNONYMS
> >  13991    A      a, A, Aa, aa, AA
> >
> > Then the index should be something like this:
> >
> >  SYNONYM  WORD   ID
> >  a        A      13991
> >  A        A      13991
> >  Aa       A      13991
> >  aa       A      13991
> >  AA       A      13991
> >
> > So if I get 'a' in the file, I should be able to do a lookup and the index
> > should give me 'A' along with '13991'. I need both the base name and the
> > ID. The names could even be strings of 4 to 5 words.
> >
> > I have all this information stored in an HBase table with two columns,
> > where the first column contains the actual word and the second column
> > contains the entire list of synonyms. The rowkey is the ID.
> >
> > Now, I am not sure whether it is feasible to use Lucene for this, or
> > whether I should go with something like 'Guava Table' or something else.
> > I need some guidance: being new to Lucene, I am not able to think in the
> > right direction. If it is feasible to use Lucene to achieve this, how do I
> > do it efficiently?
> >
> > I am using HBase filters right now to do the fetch, which is slowing down
> > the process.
> >
> > I am sorry if my questions sound too childish or senseless, as I am not
> > very good at Lucene. Thank you so much for your valuable time.
> >
> > Warm Regards,
> > Tariq
> > https://mtariq.jux.com/
> > cloudfront.blogspot.com
>
>
>
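
For reference, the synonym-to-word lookup described in the original question can be sketched with a plain in-memory map. This is not the Lucene approach recommended above and not anything posted in the thread; it is a minimal illustration assuming exact, case-sensitive matching and a data set small enough to fit in memory. The class name SynonymIndex and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: maps every surface form (word or synonym) to its
// canonical word and ID. Matching is exact and case-sensitive.
class SynonymIndex {

    /** The canonical word and its ID, as returned by a lookup. */
    static final class Entry {
        public final long id;
        public final String word;

        Entry(long id, String word) {
            this.id = id;
            this.word = word;
        }
    }

    private final Map<String, Entry> bySurfaceForm = new HashMap<>();

    /** Registers a word, its ID, and all of its synonyms. */
    public void add(long id, String word, List<String> synonyms) {
        Entry entry = new Entry(id, word);
        bySurfaceForm.put(word, entry);
        for (String synonym : synonyms) {
            bySurfaceForm.put(synonym, entry);
        }
    }

    /** Returns the entry for a word or synonym, or null if it is unknown. */
    public Entry lookup(String value) {
        return bySurfaceForm.get(value);
    }

    public static void main(String[] args) {
        SynonymIndex index = new SynonymIndex();
        // Sample row taken from the question: ID 13991, word "A".
        index.add(13991L, "A", Arrays.asList("a", "A", "Aa", "aa", "AA"));

        Entry hit = index.lookup("aa");
        System.out.println(hit.word + " " + hit.id); // prints "A 13991"
    }
}
```

Each lookup is a single hash probe, so this covers the "few thousand words" case. It does no tokenizing or fuzzy matching, so Lucene remains the safer choice if the multi-word names need anything beyond exact matches or if the list grows substantially.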
