hadoop-common-user mailing list archives

From "Black, Michael (IS)" <Michael.Bla...@ngc.com>
Subject Re: Custom input split
Date Sun, 26 Dec 2010 12:59:21 GMT
You mean the file is "not trusted"? I was using Outlook, and my company automatically puts
a digital certificate on all emails. I'm using webmail right now, which doesn't. That certificate
is installed by default on all company computers, so it looks trusted to us without having
to explicitly trust the certificate.
I don't think my split problem has anything to do with the Lucene index; that was just informational.
Here's my getSplits(). It calls other functions which aren't important to the problem at hand:
        public List<InputSplit> getSplits(JobContext context) throws IOException,
                        InterruptedException {
                Configuration conf = context.getConfiguration();
                List<InputSplit> splits = new ArrayList<InputSplit>();
                Indexer indexer = new Indexer(conf.get(Config.Index), true);
                Iterator<Document> iDocument = indexer.iterator();
                int ndocs = 20; // limit the # of docs for testing -- got over 100,000 of these
                int i = 0;      // number of documents added so far
                while (iDocument.hasNext() && i < ndocs) {
                        Document document = iDocument.next();
                        String docid = document.getId();
                        System.out.println("Adding ID  " + docid);
                        splits.add(new PInputSplit(docid)); // one split per document
                        i++;
                }
                return splits;
        }

I assume there's a way to make a specific # of splits and add each document to the separate
splits...but I'll be darned if I can find the docs or an example to show this.
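
For illustration only, here is a minimal sketch of one way to get a fixed number of splits: collect the document IDs into N groups first, then create one split per group instead of one per document. The class name SplitGrouper and its group() method are hypothetical helpers, not part of Hadoop, and round-robin assignment is just one possible policy.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper (not a Hadoop class): distributes document IDs
// round-robin into at most numSplits groups.
public class SplitGrouper {

    public static List<List<String>> group(Iterator<String> ids, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        // Start with numSplits empty groups.
        List<List<String>> groups = new ArrayList<List<String>>();
        for (int g = 0; g < numSplits; g++) {
            groups.add(new ArrayList<String>());
        }
        // Assign the i-th ID to group (i mod numSplits).
        int i = 0;
        while (ids.hasNext()) {
            groups.get(i % numSplits).add(ids.next());
            i++;
        }
        // Drop empty groups so that no split ends up with zero documents.
        List<List<String>> nonEmpty = new ArrayList<List<String>>();
        for (List<String> group : groups) {
            if (!group.isEmpty()) {
                nonEmpty.add(group);
            }
        }
        return nonEmpty;
    }
}
```

Inside getSplits(), each returned group would then be wrapped in a single split. That assumes PInputSplit is changed to hold a list of document IDs rather than one ID, and that it serializes the whole list (for example in its Writable write() and readFields() methods) so the record reader can iterate over the documents in its split.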
As I said, I'm using hadoop-0.20.2, which I know makes a difference since so many things get deprecated
in each release. Old references don't seem to work.
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems


From: ?? [mailto:toppiprc@gmail.com]
Sent: Sat 12/25/2010 10:32 AM
To: common-user@hadoop.apache.org
Subject: EXTERNAL:Re: Custom input split

What is the file you have attached? It is not safe.

I don't know the format of a Lucene index. Would you please give an example?
