lucene-java-user mailing list archives

From Paul Bell <arach...@gmail.com>
Subject Re: Indexing a long list
Date Sun, 31 Mar 2013 13:38:59 GMT
I think I see what you mean.

Suppose I have a vertex, one of whose in-edges has an identifier equal to
"v123". In indexing the properties of that vertex, I might index/store
something like this:

    TextField txtField = new TextField("inEdges",
        "id:v123 name:hassnapshot type:image timestamp:7654321", Field.Store.NO);
    doc.add(txtField);

I think this would allow me to query the tokenized value of the TextField
like this:

    Query query = new TermQuery(new Term("inEdges", "type:image"));

But how would I handle the fact that the vertex in question could have
thousands of edges? That is, under either of its edge properties ('inEdges'
or 'outEdges') there could be many entries of the form shown above
("id:vXXX name:XXX ..."), etc. Is it simply a matter of appending subsequent
edge property strings? Or would it make more sense to index on
'inEdges.v123'? The problem with this, I think, is that I can no longer ask
about multiple edges with a single query, right?
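
To make the trade-off concrete, here is a minimal plain-Java sketch of what
"appending" many edge strings under one field name amounts to. It is a toy
model, not Lucene's API: the class EdgeFieldSketch and its methods addValue,
matching and matchingAll are hypothetical names, the second and third edge
strings are made-up sample data, and tokens are assumed to be split on
whitespace only (a real analyzer may split or normalize them differently).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of one multivalued, whitespace-tokenized field. Not Lucene's API. */
final class EdgeFieldSketch {
    // token -> ids of the vertices (documents) whose field contains that token
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Adds one edge's property string as one more value of the same field. */
    void addValue(int vertexId, String edgeProperties) {
        for (String token : edgeProperties.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(vertexId);
            }
        }
    }

    /** Vertices whose field contains the exact token, across all of its values. */
    Set<Integer> matching(String token) {
        return new TreeSet<>(postings.getOrDefault(token, new TreeSet<>()));
    }

    /** Vertices whose field contains every given token (an AND of lookups). */
    Set<Integer> matchingAll(String... tokens) {
        Set<Integer> result = null;
        for (String token : tokens) {
            Set<Integer> hits = matching(token);
            if (result == null) {
                result = hits;
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        EdgeFieldSketch inEdges = new EdgeFieldSketch();
        // Vertex 1 has two in-edges, vertex 2 has one (sample data).
        inEdges.addValue(1, "id:v123 name:hassnapshot type:image timestamp:7654321");
        inEdges.addValue(1, "id:v456 name:hasmirror type:volume timestamp:7654399");
        inEdges.addValue(2, "id:v789 name:hasmirror type:image timestamp:7654400");

        // One lookup covers every edge of every vertex.
        System.out.println(inEdges.matching("type:image"));
        // Tokens from different edges of the same vertex still match together.
        System.out.println(inEdges.matchingAll("id:v456", "type:image"));
    }
}
```

In this model a single lookup such as matching("type:image") sees all edges
at once, which is what indexing on 'inEdges.v123' would give up. The cost is
that the tokens are no longer grouped per edge: matchingAll("id:v456",
"type:image") returns vertex 1 even though no single edge of that vertex has
both properties.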

Thanks, Jack.

-Paul



On Sun, Mar 31, 2013 at 9:00 AM, Jack Krupansky <jack@basetechnology.com> wrote:

> Multivalued fields are the other approach to keyword value pairs.
>
> And if you can denormalize your data, storing structure as separate
> documents can make sense and support more powerful queries. Although the
> join capabilities are rather limited.
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Paul Bell
> Sent: Sunday, March 31, 2013 8:52 AM
> To: java-user@lucene.apache.org
> Subject: Re: Indexing a long list
>
>
> Hi Jack,
>
> Thanks for the reply. I am very new to Lucene.
>
> Your timing is a bit uncanny. I was just coming to the conclusion that
> there's nothing special about this case for Lucene, i.e., a tokenized field
> should work, when I looked up and saw your e-mail.
>
> In re the larger context: yeah, the properties in question here belong to
> some kind of node, e.g., maybe a vertex in a graph DB. Possible properties
> include 'name', 'type', 'inEdges', 'outEdges', etc. Most properties are
> simple k=v pairs. But a few, notably the 'edge' properties, could be long
> lists.
>
> My intent was to create a Lucene Document for each node. The Fields in this
> Document would represent all of the node's properties. A generic (not in
> Lucene syntax) query should be able to ask after any property, e.g.,
>
>    ('name' equals "vol1" AND 'outEdges.name' startsWith "hasMirror")
>
> Note that 'outEdges.name' represents multiple elements, where 'name'
> represents only one. That is, the generic query syntax is trying to match
> any out-edge whose name property starts with "hasMirror". I haven't quite
> crystallized the generic query syntax and don't know how best to map it to
> both a Lucene query and to an appropriate Lucene index structure. Please
> let me know if you've any suggestions!
>
> Thanks again.
>
> -Paul
>
>
>
> On Sun, Mar 31, 2013 at 8:33 AM, Jack Krupansky <jack@basetechnology.com> wrote:
>
>> The first question is how do you want to access the data? What do you want
>> your queries to look like?
>>
>> What is the larger context? Are these properties of larger documents? Is
>> there more than one per document? Etc.
>>
>> Why not just store the property as a tokenized field? Then you can query
>> whether v(i) or v(j) are or are not present as keywords.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Paul Bell
>> Sent: Sunday, March 31, 2013 8:21 AM
>> To: java-user@lucene.apache.org
>> Subject: Indexing a long list
>>
>>
>> Hi All,
>>
>> Suppose I need to index a property whose value is a long list of terms.
>> For example,
>>
>>    someProperty = ["v1", "v2", .... , "v1000000"]
>>
>> Please note that I could drop the leading "v" and index these as numbers
>> instead of strings.
>>
>> But the question is what's the best practice in Lucene when dealing with a
>> case like this? I need to be able to retrieve the list. This makes me think
>> that I need to store it. And I suppose that the list could be stored in
>> the index itself or in the "content" to which the index points.
>>
>> So there are really two parts to this question:
>>
>> 1. Lucene "best practices" for long lists
>> 2. Where to store such a list
>>
>> Thanks for your help.
>>
>> -Paul
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org