hadoop-common-user mailing list archives

From "Cam Bazz" <camb...@gmail.com>
Subject edge count question
Date Thu, 26 Jun 2008 22:02:56 GMT
Hello,

I have a Lucene index storing documents, each of which holds a src word and a
dst word. Word pairs may repeat (it is a multigraph).

I want to use Hadoop to count how many times each word pair occurs. I
have looked at the AggregateWordCount example, and I understand that if I
make a text file such as

src1>dst2
src2>dst2
src1>dst2

..

and run something similar to the AggregateWordCount example on it, I will get
the desired result: one count per distinct pair.
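
For illustration, the counting step amounts to the following. This is a minimal plain-Python sketch, not Hadoop code: the names map_pairs and reduce_counts are hypothetical, and the one-pair-per-line "src>dst" format is the one assumed in the text file above.

```python
from collections import Counter

def map_pairs(lines):
    """Map step (hypothetical name): emit (pair, 1) for each 'src>dst' line."""
    for line in lines:
        pair = line.strip()
        # Skip blank lines and anything that is not in src>dst form.
        if pair and ">" in pair:
            yield pair, 1

def reduce_counts(pairs):
    """Reduce step (hypothetical name): sum the 1s emitted for each pair."""
    counts = Counter()
    for pair, n in pairs:
        counts[pair] += n
    return counts

# The three sample lines from the text file above.
edges = ["src1>dst2", "src2>dst2", "src1>dst2"]
counts = reduce_counts(map_pairs(edges))
```

With the three sample lines, src1>dst2 is counted twice and src2>dst2 once. Treating the whole "src>dst" string as the key is what lets a word-count style job count edges rather than individual words.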

Now the questions. How can I hook up my Lucene index to Hadoop? Is there a
better way than dumping the index to a text file with >'s, copying it to DFS,
and getting the results back?

How can I make incremental runs? (Once the index has been processed and I have
the results, how can I add more data so that the job does not start from the
beginning?)
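
One way to frame the incremental case, stated as an assumption rather than an established Hadoop recipe: count only the newly added pairs, then add those counts to the totals saved from the previous run. The sketch below is plain Python with a hypothetical merge_counts helper and made-up sample data.

```python
from collections import Counter

def merge_counts(previous, delta):
    """Add counts from a new batch to totals from an earlier run.

    Hypothetical helper: neither input is modified.
    """
    merged = Counter(previous)
    merged.update(delta)  # Counter.update adds counts, it does not replace them
    return merged

# Totals saved from the first run (sample data for illustration only).
previous = Counter({"src1>dst2": 2, "src2>dst2": 1})

# Pairs added to the index since that run (also sample data).
new_edges = ["src1>dst2", "src3>dst1"]
delta = Counter(new_edges)

total = merge_counts(previous, delta)
```

This only works if the new pairs can be identified separately from the ones already counted, for example by dumping them to their own file.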

Best regards,

-C.B.
