lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Glen Newton" <glen.new...@gmail.com>
Subject Re: Lucene Indexing structure
Date Fri, 02 May 2008 16:31:06 GMT
Vaijanath,

I think I would do things in a different fashion:
Lucene default distance metric is based on tf/idf and the cosine
model, i.e. the frequencies of items. I believe the values that you
are adding as Fields are the values in n-space for each of these
image-based attributes. I don't believe Lucene's default ranking will
not work for this.

You need to alter Lucene so that it understands that the Fields you
are adding represent the n-space values and not tokens, and alter
Lucene so that it uses this n-space to determine distance.

I am not a Lucene internals expert, but I think you need to write a
custom Similarity[1] class for use in your IndexSearcher[2] and
IndexWriter[3] and I think you might need a custom analyser that
understands that you are putting in actual numbers, not tokens, that
you use when building the index as well as querying it.

There are probably things I am missing and there may be a better way
to do this....

-Glen

[1]http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Similarity.html
[2]http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Searcher.html#setSimilarity(org.apache.lucene.search.Similarity)
[3]http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/index/IndexWriter.html#getSimilarity()

2008/4/26 Vaijanath N. Rao <vaijanathrao@aol.com>:
> Hi Lucene-user and Lucene-dev,
>
>  I want to use lucene as an backend for the Image search (Content based
> Image retrieval).
>
>  Indexing Mechanism:
>  a) Get the Image properties such as Texture Tamura (TT), Texture Edge
> Histogram (TE), Color Coherence Vector (CCV) and Color Histogram (CH) and
> Color Correlogram  (CC) .
>  b) Convert each of these vector into String and index into lucene as
> fields, thush each Image (document in terms of lucene) consist of 6 fields
> Image name, TT field, TE field, CCV field, CH field and CC field.
>
>  Searching Mechanism:
>  a) For the search Image convert the Image into the above 5 properties.
>  b) for every field and for every value within the field construct the
> query, For example let's say the user wants to search only Color histogram
> based similarity and the query Image has 3 1 4 5 as the CH value the query
> will look like.
>    query = "CH:3 CH:1CH:4 CH:5"
>  c) for the results returned convert all the field values back into float
> and do the distance computation and re-rank the document with lower the
> distance on the top and larger distance at the bottom.
>  for example:
>    For above query assume that output has two documents
>    with one having CH as "1 3 5 4" and other one having CH as " 3 1 5 4", so
> the distance computation will rank the second document higher than the
> first.
>
>  Obviously there is something wrong with the above approach (as to get the
> correct document we need to get all the documents and than do the required
> distance calculation), but that' due to lack of my knowledge of Luce and
> lucene's Index storage.
>
>  What I want to know how to improve upon the exsisting architecture other
> than making number of fields in the lucene equalling to total number of
> feature*size of each feature.
>
>  Any other pointer will be welcomed. Is there is any Range tree
> implementation within lucene which I can use for this operation.
>
>  --Thanks and Regards
>  Vaijanath N. Rao
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>  For additional commands, e-mail: java-user-help@lucene.apache.org
>
>



-- 

-

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message