lucene-java-user mailing list archives

From 秋水 <sdrkyj_luc...@163.com>
Subject [java-user]How did you guys store category info
Date Thu, 20 Sep 2012 01:55:50 GMT
Hello.
My project may need tree-style category information. How should I store it so that all
leaf documents under a given category node can be retrieved?

My current thinking is to store the category path vertically, one field per level: "level 1",
"level 2", ..., with a "level last" field appended. I have no idea yet how easy that is to use.
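A minimal sketch of the per-level field idea, in plain Java with no Lucene calls. The helper name `toLevelFields` is hypothetical, and it assumes that "level last" simply repeats the deepest category name; the message does not say exactly what that field holds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LevelFields {
    // Hypothetical helper: splits a category path such as "/usr/local/bin"
    // into one field per level ("level 1", "level 2", ...) plus "level last".
    // Assumption: "level last" repeats the deepest category name.
    public static Map<String, String> toLevelFields(String path) {
        Map<String, String> fields = new LinkedHashMap<>();
        int level = 0;
        String last = null;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip the empty piece before the leading "/"
            }
            level++;
            fields.put("level " + level, segment);
            last = segment;
        }
        if (last != null) {
            fields.put("level last", last);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Each map entry would become one untokenized field on the document.
        System.out.println(toLevelFields("/usr/local/bin"));
    }
}
```

With fields like these, "everything under /usr/local" would become an exact match on "level 1" = usr combined with an exact match on "level 2" = local, so no prefix or wildcard matching is needed.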

Before that, I wanted to store the layered category info in a single field, like "/usr/bin/...",
but that does not seem to work well when a category name is a term or phrase that contains spaces.

As far as I can tell, not-analyzed fields can only be matched by the exact, complete term,
and building a field value as a custom sequence of terms seems impossible.
It also does not seem possible to query by matching the leading terms of a field.
I found there is a "regexQuery" contrib, but I have not tried it yet; there is no detailed
demo and no how-to.
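To make the "match from the beginning of the field" idea concrete, here is a small sketch in plain Java. It is not Lucene code: it only mimics a prefix lookup over whole, untokenized path terms kept in sorted order, which is roughly the behaviour one would hope to get from a prefix-style query on a not-analyzed field. The class name, the method `withPrefix`, and the sample paths are all hypothetical.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixScan {
    // Returns every stored term that starts with the given prefix.
    // Every such term sorts between prefix and prefix + '\uffff'
    // (assuming no stored term itself contains the character U+FFFF).
    public static SortedSet<String> withPrefix(TreeSet<String> terms, String prefix) {
        return terms.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args) {
        // Each string stands for one whole, untokenized category path.
        TreeSet<String> terms = new TreeSet<>(List.of(
                "/usr", "/usr/bin", "/usr/bin/my tool",
                "/usr/local", "/usrlocal", "/var/log"));
        // The trailing "/" keeps "/usrlocal" out of the "/usr" subtree.
        System.out.println(withPrefix(terms, "/usr/"));
    }
}
```

Because each path is kept as a single term, a space inside a category name ("my tool") is just another character and needs no special handling in this sketch.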

I have not studied term vector storage or a self-defined byte stream yet. Both seem too
complicated, and not a wise choice for a Lucene project.

Maybe I could mark the root category with a "root" word and force all subcategories to use
other words, or just store this kind of info in a DB and reference it by id. Or, as a dirty
workaround, transform the names with an "encodeURIComponent"-like function or a reversible
encryption, and then use a proximity query such as "1 2 3"~0.
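The "encodeURIComponent"-like workaround could look roughly like the sketch below. It uses the JDK's `URLEncoder` and `URLDecoder` (the `Charset` overloads need Java 10 or later) as a stand-in for such a function; note that `URLEncoder` writes a space as "+" rather than "%20". The class and method names are hypothetical, and nothing here touches Lucene.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PathEncoding {
    // Encodes each category name separately, then joins the names with "/".
    // Spaces and "/" inside a name are escaped, so the joined path contains
    // no whitespace and "/" only ever appears as a level separator.
    public static String encodePath(List<String> names) {
        StringBuilder sb = new StringBuilder();
        for (String name : names) {
            sb.append('/').append(URLEncoder.encode(name, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Reverses encodePath: splits on "/" and decodes each piece.
    public static List<String> decodePath(String path) {
        List<String> names = new ArrayList<>();
        for (String part : path.split("/")) {
            if (!part.isEmpty()) {
                names.add(URLDecoder.decode(part, StandardCharsets.UTF_8));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Home & Garden", "Tools/Hardware");
        String encoded = encodePath(names);
        System.out.println(encoded);
        System.out.println(decodePath(encoded));
    }
}
```

The encoding is reversible, so the original names can be recovered for display, while the stored path stays a single whitespace-free string.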

I read an article on IBM's site about geographic search in Lucene. The author referred to a
geohash function, which produces hash values with a common prefix for positions inside the
same district. A nice approach. But wildcard queries do not seem to work very well in some
situations, and there is the extra analysis and search cost to consider, since Lucene is
specialized for "full text search", not for "string begins with" search on field values.
Ah, there is PrefixQuery... But how do I use a different analyzer for strings that are not
normal English words? For example, "a1b2c3" is always turned into the terms "a 1 b 2 3"; or
else I index it as not-analyzed, and then it apparently cannot be used for wildcard matching
or for PrefixQuery either.
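For reference, the prefix property of geohashes can be sketched as follows. The article's own code is not quoted in this message, so this follows the commonly described geohash scheme (alternating longitude and latitude bits, five bits per base-32 character); the class name and the sample coordinates are assumptions made for illustration.

```java
public class GeohashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encodes a latitude/longitude pair into a geohash of the given length.
    // Each bit halves the current longitude or latitude interval, so a
    // longer hash only refines the cell named by its shorter prefix.
    public static String encode(double lat, double lon, int length) {
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        StringBuilder hash = new StringBuilder();
        boolean lonTurn = true; // bits alternate, starting with longitude
        int bits = 0;
        int value = 0;
        while (hash.length() < length) {
            if (lonTurn) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonLo = mid; }
                else { value = value << 1; lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latLo = mid; }
                else { value = value << 1; latHi = mid; }
            }
            lonTurn = !lonTurn;
            if (++bits == 5) { // five bits make one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // The shorter hash is always a prefix of the longer one.
        System.out.println(encode(57.64911, 10.40744, 2));
        System.out.println(encode(57.64911, 10.40744, 8));
    }
}
```

The same property is what makes a category path attractive: if the hash (or path) is indexed as one whole term, "inside this district" reduces to "starts with this prefix".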

In my project, the requirements are not clear yet. What a joke, ha...

Thanks for sparing the time for my nonsense.

