lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Document links
Date Tue, 21 Sep 2010 17:56:00 GMT
On Tuesday 21 September 2010 18:30:08, mark harwood wrote:
> >>Wouldn't that be sufficient?
> 
> Not for some apps. I tried playing the "Kevin Bacon" game using a Lucene index
> of IMDB data with actorID and movieID keys.
> The difference between that and Neo4j on the same data and query is night and
> day. The graph databases are really onto something when resolving a relationship
> doesn't first require an index to look up endpoints.

When the key values are given by the user, this would boil down to adding
the primary and foreign keys to Lucene, but that does not appear to be the idea.

It should be possible to add and delete such relationships at will after
indexWriter.addDocument(); is that the idea?

Adding such relationships by docId would need an additional index structure,
separate from the segments (probably some B-tree), that would have
segmentId-segmentDocId pairs as (part of) the keys and also as (part of) the values.
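
As a rough illustration of the structure described above, here is a minimal
in-memory sketch in Java. It is not Lucene code: the class name LinkStore, the
key-packing scheme, and the use of sorted maps in place of an on-disk B-tree
are all assumptions made for the example. The method names follow the
addLink/getInboundLinks/getOutboundLinks proposal quoted further down.

```java
import java.util.Collections;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical link store kept apart from the segments; not part of Lucene. */
class LinkStore {
    // Sorted maps stand in for the B-tree. Keys and values are both
    // packed (segmentId, segmentDocId) pairs.
    private final TreeMap<Long, TreeSet<Long>> outbound = new TreeMap<>();
    private final TreeMap<Long, TreeSet<Long>> inbound = new TreeMap<>();

    /** Packs a pair into one sortable long; both parts are assumed non-negative. */
    static long key(int segmentId, int segmentDocId) {
        return ((long) segmentId << 32) | (segmentDocId & 0xFFFFFFFFL);
    }

    static int segmentId(long key) {
        return (int) (key >>> 32);
    }

    static int segmentDocId(long key) {
        return (int) key;
    }

    /** Records the link in both directions so either end can be looked up. */
    void addLink(long from, long to) {
        outbound.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
        inbound.computeIfAbsent(to, k -> new TreeSet<>()).add(from);
    }

    void deleteLink(long from, long to) {
        remove(outbound, from, to);
        remove(inbound, to, from);
    }

    private static void remove(TreeMap<Long, TreeSet<Long>> index, long k, long v) {
        TreeSet<Long> set = index.get(k);
        if (set != null && set.remove(v) && set.isEmpty()) {
            index.remove(k); // drop empty entries so the map does not grow unbounded
        }
    }

    NavigableSet<Long> getOutboundLinks(long from) {
        return view(outbound.get(from));
    }

    NavigableSet<Long> getInboundLinks(long to) {
        return view(inbound.get(to));
    }

    private static NavigableSet<Long> view(TreeSet<Long> set) {
        return set == null
                ? Collections.<Long>emptyNavigableSet()
                : Collections.unmodifiableNavigableSet(set);
    }
}
```

With this layout, a segment merge would have to rewrite every key and value
that refers to a merged segment, which is the adjustment cost mentioned in the
original proposal.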

Would each link also have an attribute (think payload)?
Would such relationships be named (something like foreign key field names)?
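
To make those two questions concrete, a link carrying a name and an opaque
payload could look roughly like the following. Everything here (NamedLinks,
Link, the idea of keying by link name) is hypothetical and only illustrates the
shape such an API might take; plain doc ids are used to keep it short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical named, payload-carrying links; not part of Lucene. */
class NamedLinks {
    /** One outbound link: a target doc id plus an opaque attribute. */
    static final class Link {
        final int toDocId;
        final byte[] payload;

        Link(int toDocId, byte[] payload) {
            this.toDocId = toDocId;
            this.payload = payload.clone(); // defensive copy of the caller's bytes
        }
    }

    // fromDocId -> link name -> links with that name
    private final Map<Integer, Map<String, List<Link>>> bySource = new TreeMap<>();

    void addLink(int fromDocId, String name, int toDocId, byte[] payload) {
        bySource.computeIfAbsent(fromDocId, k -> new TreeMap<>())
                .computeIfAbsent(name, k -> new ArrayList<>())
                .add(new Link(toDocId, payload));
    }

    /** Returns the links of the given name leaving fromDocId, or an empty list. */
    List<Link> getOutboundLinks(int fromDocId, String name) {
        Map<String, List<Link>> byName = bySource.get(fromDocId);
        if (byName == null) {
            return Collections.emptyList();
        }
        List<Link> links = byName.get(name);
        return links == null
                ? Collections.<Link>emptyList()
                : Collections.unmodifiableList(links);
    }
}
```

In the IMDB case, an actor document might have "actedIn" links to movie
documents, with the payload holding something like the role.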


Regards,
Paul Elschot


> 
> 
> ----- Original Message ----
> From: Paul Elschot <paul.elschot@xs4all.nl>
> To: dev@lucene.apache.org
> Sent: Tue, 21 September, 2010 17:25:31
> Subject: Re: Document links
> 
> When the (primary) key values are provided by the user,
> one could use additional small documents to only store/index
> these relations whenever they change.
> 
> Wouldn't that be sufficient?
> 
> Regards,
> Paul Elschot
> 
> 
> 
> On Tuesday 21 September 2010 00:35:02, mark harwood wrote:
> > I've been looking at Graph Databases recently (neo4j, OrientDb,
> > InfiniteGraph) as a faster alternative to relational stores. I notice they
> > either embed Lucene for indexing node properties or (in the case of
> > OrientDB) are talking about doing this.
> >
> > I think their fundamental performance advantage over relational stores is
> > that they don't have to de-reference foreign keys in a b-tree index to get
> > from a source node to a target node. Instead they use internally-generated
> > IDs to act like pointers with more-or-less direct references between
> > nodes/vertexes. As a result they can follow links very quickly. This got me
> > thinking could Lucene adopt the idea of creating links between documents
> > that were equally fast using Lucene doc ids?
> > 
> > Maybe the user API would look something like this...
> > 
> >     indexWriter.addLink(fromDocId, toDocId);
> >     DocIdSet reader.getInboundLinks(docId);
> >     DocIdSet reader.getOutboundLinks(docId);
> > 
> > 
> > Internally a new index file structure would be needed to record link info.
> > Any recorded links that connect documents from different segments would need
> > careful adjustment of referenced link IDs when segments merge and Lucene doc
> > IDs are shuffled.
> >
> > As well as handling typical graphs (social networks, web data) this could
> > potentially be used to support tagging operations where apps could create
> > "tag" documents and then link them to existing documents that are being
> > tagged without having to update the target doc. There are probably a ton of
> > applications for this stuff.
> > 
> > I see the Graph DBs busy recreating transactional support, indexes, segment 
> > merging etc and it seems to me that Lucene has a pretty good head start with 
> > this stuff.
> > Anyone else think this might be an area worth exploring?
> > 
> > Cheers
> > Mark
> > 
> > 
> >      
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: dev-help@lucene.apache.org


