hadoop-common-user mailing list archives

From Xaida <hota.a...@gmail.com>
Subject How to implement this with hadoop, guidelines PLEASE, hadoop beginner
Date Sat, 12 Jun 2010 06:44:17 GMT

Hi all!

The dataset for my project turned out to be huge, and my teacher told me I have
to use the Hadoop framework. I am struggling to understand how to do this, but
honestly, I can't get past the starting point :( I am not that good a programmer,
and I cannot find any classmates who know Hadoop to help me out... I apologize
in advance for writing a lot, but I really don't know what else to

So I have this implemented with some Java concurrency features, but it is
too slow for a dataset of this size. The algorithm works as follows:

- The algorithm takes one folder and finds, in all of its subfolders, the .txt
files with a specific name.
- It queries a Lucene index and populates a list of the most frequent terms.
- It parses the .txt files line by line and checks whether each line's third
word matches a term in the list.
- When a line's third word matches a term in the list, the entire line is
stored in a buffer, and afterwards the buffers are written to output .txt
files.
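
The per-line step in the list above can be sketched in plain Java as follows. This is only a minimal sketch under assumptions, not the attached code: the class name ThirdWordFilter and its methods are hypothetical, words are assumed to be separated by whitespace, and a "match" is assumed to mean an exact, case-sensitive comparison against the term list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: keeps only the lines whose third word is in the term list.
public class ThirdWordFilter {
    private final Set<String> frequentTerms;

    public ThirdWordFilter(Set<String> frequentTerms) {
        this.frequentTerms = frequentTerms;
    }

    // Returns the third whitespace-separated word, or null if the line has fewer than three words.
    public static String thirdWord(String line) {
        String[] words = line.trim().split("\\s+");
        return words.length >= 3 ? words[2] : null;
    }

    // True when the line's third word is one of the frequent terms (assumed: exact match).
    public boolean matches(String line) {
        String word = thirdWord(line);
        return word != null && frequentTerms.contains(word);
    }

    // Returns the matching lines unchanged and in their original order.
    public List<String> filter(List<String> lines) {
        List<String> kept = new ArrayList<String>();
        for (String line : lines) {
            if (matches(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        Set<String> terms = new HashSet<String>(Arrays.asList("hadoop", "lucene"));
        List<String> lines = Arrays.asList(
                "1 x hadoop rest of line",
                "2 y other rest of line",
                "short line",
                "3 z lucene");
        for (String line : new ThirdWordFilter(terms).filter(lines)) {
            System.out.println(line);
        }
    }
}
```

Because each line is checked independently of all the others, this step looks like a natural fit for a map-only Hadoop job (number of reduce tasks set to zero): the map() method would call something like matches() on each input line and emit the line unchanged when it returns true. How the term list reaches each map task is left open here.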

So the final result is a set of .txt files with the same structure as the
original ones, except that they are smaller, since they contain only the
matching lines.

I am attaching two files:
1) TextFileAnalyzer, a Java Callable that takes a .txt file and the list
and does the parsing and comparison.
2) MainAnalyzer.java, which goes through the main folder, collects the .txt
files, and hands them to TextFileAnalyzer callables, together with the list it
gets from Lucene.
I am sorry for asking for so much help, but I really have nobody to ask. I
tried to grasp how to do this, but with this brain and time, it's out of my
Also, I read that it is not possible to query a Lucene index on

I would very much appreciate any help; it is badly needed.
Thank you in advance!

View this message in context: http://lucene.472066.n3.nabble.com/How-to-implement-this-with-hadoop-guidelines-PLEASE-hadoop-beginner-tp890309p890309.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
