nutch-dev mailing list archives

From "salima abdulsalam (JIRA)" <j...@apache.org>
Subject [jira] Created: (NUTCH-749) Fetching the url from crawldb
Date Fri, 21 Aug 2009 13:38:14 GMT
Fetching the url from crawldb
-----------------------------

                 Key: NUTCH-749
                 URL: https://issues.apache.org/jira/browse/NUTCH-749
             Project: Nutch
          Issue Type: Bug
         Environment: Nutch with solr integration
            Reporter: salima abdulsalam


Hi,
I am new to using Nutch with Solr. I followed the guide at http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
for the integration. I am getting an error while fetching URLs from the crawldb.

I used the command below:

  bin/nutch fetch $SEGMENT -noParsing

with SEGMENT set as follows:

  export SEGMENT=crawl/segments/`ls -tr crawl/segments|tail -1`

After running the command, I get the following error:


Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting
Fetcher: segment: crawl/segments/20090821062021
Exception in thread "main" java.io.IOException: Illegal file pattern: Expecting set closure character or end of range, or } for glob 20090821062021 at 30
        at org.apache.hadoop.fs.FileSystem$GlobFilter.error(FileSystem.java:1086)
        at org.apache.hadoop.fs.FileSystem$GlobFilter.setRegex(FileSystem.java:1071)
        at org.apache.hadoop.fs.FileSystem$GlobFilter.<init>(FileSystem.java:989)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:955)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:964)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:964)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:964)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:964)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:964)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:964)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:964)
        at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:904)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:868)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:159)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:39)
        at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:101)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:797)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:969)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1003)
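
The glob named in the error is 14 characters long, yet the reported position is 30. That would be consistent with extra invisible characters in the value of SEGMENT, for example if `ls` is aliased to add color codes. This is an assumption, not something the report confirms. A minimal sketch of picking the newest segment without parsing `ls` output follows. It assumes segment directory names are timestamps, as in the log above, so sorting by name matches creation order. The function name `latest_segment` is hypothetical and not part of Nutch.

```shell
# Hypothetical helper (not part of Nutch): print the newest segment directory.
# Assumes segment names are timestamps (yyyyMMddHHmmss), so the last entry in
# the shell's sorted glob expansion is the most recent one.
latest_segment() {
  segments_dir=$1
  latest=
  for d in "$segments_dir"/*/; do
    # Skip the unexpanded pattern when the directory has no subdirectories.
    [ -d "$d" ] || continue
    # Strip the trailing slash that the */ pattern leaves on each match.
    latest=${d%/}
  done
  # Print the result; return non-zero if no segment directory was found.
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Possible usage, mirroring the commands in the report:
#   SEGMENT=$(latest_segment crawl/segments) && export SEGMENT
#   bin/nutch fetch "$SEGMENT" -noParsing
```

Because the name comes from the shell's own pathname expansion, aliases or options applied to `ls` cannot alter it. Printing the value with `printf '%s' "$SEGMENT" | od -c` is one way to check for hidden characters.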

Can anyone help with this?

Thanks,
Salima


 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

