lucene-dev mailing list archives

From mark harwood <>
Subject Document links
Date Mon, 20 Sep 2010 22:35:02 GMT
I've been looking at graph databases recently (Neo4j, OrientDB, InfiniteGraph) 
as a faster alternative to relational stores. I notice they either embed Lucene 
for indexing node properties or (in the case of OrientDB) are talking about 
doing this. 

I think their fundamental performance advantage over relational stores is that 
they don't have to de-reference foreign keys in a b-tree index to get from a 
source node to a target node. Instead they use internally generated IDs that act 
like pointers, giving more-or-less direct references between nodes/vertices. As a 
result they can follow links very quickly. This got me thinking: could Lucene 
adopt the idea of creating links between documents that are equally fast, using 
Lucene doc ids?

Maybe the user API would look something like this...

    indexWriter.addLink(fromDocId, toDocId);
    DocIdSet inbound = reader.getInboundLinks(docId);
    DocIdSet outbound = reader.getOutboundLinks(docId);
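
To make the intended semantics concrete, here is a minimal in-memory sketch. It is only an illustration, not a proposal for the on-disk format: the class name DocLinks is hypothetical, nothing like it exists in Lucene, and plain List<Integer> stands in for DocIdSet so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the proposed link API; not part of Lucene.
final class DocLinks {
    // Adjacency lists keyed by doc id, kept in both directions so that
    // inbound and outbound lookups are each a single map access.
    private final Map<Integer, List<Integer>> outbound = new HashMap<>();
    private final Map<Integer, List<Integer>> inbound = new HashMap<>();

    // Record a directed link from one doc id to another.
    void addLink(int fromDocId, int toDocId) {
        outbound.computeIfAbsent(fromDocId, k -> new ArrayList<>()).add(toDocId);
        inbound.computeIfAbsent(toDocId, k -> new ArrayList<>()).add(fromDocId);
    }

    // Doc ids that the given doc links to (empty if none).
    List<Integer> getOutboundLinks(int docId) {
        return outbound.getOrDefault(docId, Collections.emptyList());
    }

    // Doc ids that link to the given doc (empty if none).
    List<Integer> getInboundLinks(int docId) {
        return inbound.getOrDefault(docId, Collections.emptyList());
    }
}
```

Storing each link twice trades memory for fast traversal in either direction; a real implementation would presumably use a compact per-segment file rather than hash maps.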

Internally a new index file structure would be needed to record link info. Any 
recorded links that connect documents from different segments would need careful 
adjustment of referenced link IDs when segments merge and Lucene doc IDs are 
renumbered.
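
The adjustment at merge time could be sketched roughly as below. This assumes the merge can supply an old-to-new doc id table with -1 marking deleted documents, and it assumes links touching a deleted document are simply dropped. Both are assumptions made for illustration, and LinkRemapper is a hypothetical name, not an existing Lucene class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of rewriting link endpoints when segments merge.
final class LinkRemapper {
    // Each link is stored as {fromDocId, toDocId}.
    // docMap[oldId] is the doc id after the merge, or -1 if the doc was deleted.
    static List<int[]> remap(List<int[]> links, int[] docMap) {
        List<int[]> result = new ArrayList<>();
        for (int[] link : links) {
            int from = docMap[link[0]];
            int to = docMap[link[1]];
            // Assumption: a link is dropped if either endpoint no longer exists.
            if (from >= 0 && to >= 0) {
                result.add(new int[] {from, to});
            }
        }
        return result;
    }
}
```

For example, with docMap = {0, -1, 1, 2}, the link {0, 2} becomes {0, 1}, and the link {1, 3} is dropped because doc 1 was deleted.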

As well as handling typical graphs (social networks, web data) this could 
potentially be used to support tagging operations where apps could create "tag" 
documents and then link them to existing documents that are being tagged without 
having to update the target doc. There are probably a ton of applications for 
this stuff.

I see the graph DBs busy recreating transactional support, indexes, segment 
merging, etc., and it seems to me that Lucene has a pretty good head start with 
this stuff.

Anyone else think this might be an area worth exploring?


