giraph-user mailing list archives

From Ahmet Emre Aladağ
Subject Re: HBase EdgeInputFormat
Date Fri, 19 Jul 2013 13:17:15 GMT
Thank you,

TextVertexInputFormat has a getEdges() method, but EdgeInputFormat has no 
equivalent (since a record is not a vertex), and it does not support 
returning multiple edges per record. Normally a row would hold only one 
edge, but in my case (Nutch2) there are multiple edges per row.

key: URL
value: ol:URL2, ol:URL3, ol:URL4, ...

That is, each row holds multiple outlinks.

Is there a way to overcome this?
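
One possible workaround, as a rough sketch only: buffer the outlinks of a row 
and hand them out one edge at a time, pulling the next row only when the 
buffer is empty. The code below is plain Java with no Giraph or HBase 
dependency. OutlinkRow and BufferedRowEdgeReader are hypothetical names, and 
the nextEdge()/getCurrentSourceId() shape is only loosely modeled on Giraph's 
EdgeReader, so it would need checking against the actual Giraph version.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for one HBase row: a URL key plus its "ol:" outlinks. */
final class OutlinkRow {
    final String key;
    final List<String> outlinks;

    OutlinkRow(String key, List<String> outlinks) {
        this.key = key;
        this.outlinks = outlinks;
    }
}

/** Emits one edge per call even though each underlying row may hold many. */
final class BufferedRowEdgeReader {
    private final Iterator<OutlinkRow> rows;
    private final Deque<String> pendingTargets = new ArrayDeque<>();
    private String currentSource;
    private String currentTarget;

    BufferedRowEdgeReader(Iterable<OutlinkRow> rows) {
        this.rows = rows.iterator();
    }

    /** Advances to the next edge; rows without outlinks are skipped. */
    boolean nextEdge() {
        while (pendingTargets.isEmpty()) {
            if (!rows.hasNext()) {
                return false; // no rows left, so no edges left
            }
            OutlinkRow row = rows.next();
            currentSource = row.key;
            pendingTargets.addAll(row.outlinks);
        }
        currentTarget = pendingTargets.poll();
        return true;
    }

    String getCurrentSourceId() {
        return currentSource;
    }

    String getCurrentTargetId() {
        return currentTarget;
    }

    public static void main(String[] args) {
        List<OutlinkRow> rows = Arrays.asList(
            new OutlinkRow("URL", Arrays.asList("URL2", "URL3", "URL4")),
            new OutlinkRow("URL5", Collections.<String>emptyList()));
        BufferedRowEdgeReader reader = new BufferedRowEdgeReader(rows);
        while (reader.nextEdge()) {
            System.out.println(
                reader.getCurrentSourceId() + " -> " + reader.getCurrentTargetId());
        }
    }
}
```

In a real reader the row iterator would be replaced by whatever the HBase 
record reader returns; that part is not shown here and is untested.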

On 07/19/2013 01:03 AM, Avery Ching wrote:
> I don't think it will be hard to implement. Just start with the 
> HBaseVertexInputFormat and have it extend EdgeInputFormat. You can 
> look at TableEdgeInputFormat for an example. It sounds like a good 
> contribution to Giraph.
> On 7/18/13 1:57 PM, Puneet Jain wrote:
>> I also need this feature. Will be really helpful.
>> On Thu, Jul 18, 2013 at 10:49 AM, Ahmet Emre Aladağ wrote:
>>     Hi,
>>     Question: Will there be an HBaseEdgeInputFormat class, or is there
>>     a restriction in HBase that prevents implementing it?
>>     HBaseVertexInputFormat is fine for vertex-centric reading, i.e.
>>     each row in HBase corresponds to one Vertex. But it does not
>>     allow me to create duplicate vertices with the same ID.
>>     Now I have the case "many rows in HBase can correspond to one
>>     Vertex, each representing sets of edges."
>>     Example:
>>     a1 - x y z
>>     a2 - t p
>>     a3 - k
>>     will be
>>     vertex "a" with edges to x y z t p k
>>     It gives me the intuition that if there existed
>>     HBaseEdgeInputFormat, I could solve this case. But it doesn't
>>     exist yet.
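
For reference, the a1/a2/a3 example quoted above can be sketched in plain 
Java as a merge of rows by source vertex ID. This is only an illustration of 
the intended result, not Giraph code: RowGrouping is a hypothetical name, and 
the rule that maps a row key to a vertex ID (strip trailing digits) is an 
assumption made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowGrouping {
    /** Assumed key scheme: rows "a1", "a2", "a3" all belong to vertex "a". */
    static String vertexIdOf(String rowKey) {
        return rowKey.replaceAll("\\d+$", "");
    }

    /** Merges the edge lists of all rows that map to the same vertex ID. */
    static Map<String, List<String>> group(Map<String, List<String>> rows) {
        Map<String, List<String>> adjacency = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            adjacency
                .computeIfAbsent(vertexIdOf(row.getKey()), id -> new ArrayList<>())
                .addAll(row.getValue());
        }
        return adjacency;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("a1", Arrays.asList("x", "y", "z"));
        rows.put("a2", Arrays.asList("t", "p"));
        rows.put("a3", Arrays.asList("k"));
        System.out.println(group(rows)); // prints {a=[x, y, z, t, p, k]}
    }
}
```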
