nutch-dev mailing list archives

From "Vincent Couturier (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.
Date Mon, 14 Dec 2009 17:15:18 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790244#action_12790244 ]

Vincent Couturier commented on NUTCH-427:
-----------------------------------------

The last attached zip does not contain Ilquiz Latypov's changes, so the zip has to be patched
with protocol-smb-diff.txt. I will try to upload a patched version, but it would be easier if
Ilquiz could upload his updated version.

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows
> Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb-diff.txt, protocol-smb.zip, protocol-smb.zip, protocol-smb.zip
>
>
> Title:    protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
> Author:   Armel T. Nene 
> Update:   Vadim Bauer
> Email:    armel.nene NOSPAM-AT-NOSPAM idna-solutions.com, V a d i m B a u e r <AT> g m x . d e
> A.  Introduction
>     The protocol-smb plugin allows you to crawl Microsoft Windows shares. It implements
>     the CIFS/SMB protocol, which is commonly used on Microsoft operating systems. The plugin
>     replicates the behaviour of protocol-file over the CIFS/SMB protocol. It uses the JCIFS
>     library and supports all of the properties of that library.
>     You can find more information on the following site: http://jcifs.samba.org/
>     The smb protocol syntax for crawling is as follows: smb://xxxxx (e.g. smb://server/share).
>     
> B.  Installation
>     1) Binaries only:   The protocol-smb files can be found in the ../plugins directory.
>                         Copy the "protocol-smb" folder to the NUTCHHOME/build/plugins directory.
>                         Put the "smb.properties" file in the NUTCHHOME/conf directory.
>                         Configure the properties in the "smb.properties" file.
>                         Enable the plugin by updating the "nutch-site.xml" file found in the
>                         NUTCHHOME/conf directory, e.g.
>                         <property>
>                             <name>plugin.includes</name>
>                             <value>protocol-smb| other plugins...</value>
>                             <description>
>                             </description>
>                         </property>
>     2) Source code:     The protocol-smb sources can be found in the ../src directory.
>                         Always refer to the Nutch wiki for detailed instructions on building Nutch. In short:
>                         Copy the 'protocol-smb' folder to NUTCHHOME/src/plugin.
>                         Update the build.xml in NUTCHHOME/src/plugin to include the plugin.
>                         Update the NUTCHHOME/default.properties file to include the plugin.
>                         Run ant to build.
>                         Copy the 'smb.properties' file to NUTCHHOME/conf, and configure the properties.
>                         Enable the plugin by updating the nutch-site.xml file.
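The installation steps leave the credentials in smb.properties unchecked until the first fetch. As a rough sanity check, the sketch below reports which credential keys are missing or empty in a loaded Properties object. It assumes the standard JCIFS client property names (jcifs.smb.client.username, jcifs.smb.client.password, jcifs.smb.client.domain); the plugin may expect different keys. The class name SmbPropertiesCheck and the default path conf/smb.properties are made up for illustration.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class SmbPropertiesCheck {
    // ASSUMPTION: standard JCIFS client property names. The protocol-smb
    // plugin may read different keys from smb.properties.
    static final String[] REQUIRED = {
        "jcifs.smb.client.username",
        "jcifs.smb.client.password",
        "jcifs.smb.client.domain"
    };

    /** Returns the required keys that are absent or blank, in REQUIRED order. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default location; pass the real path as the first argument.
        String path = args.length > 0 ? args[0] : "conf/smb.properties";
        Properties props = new Properties();
        try (Reader reader = new FileReader(path)) {
            props.load(reader);
        }
        System.out.println("Missing or empty: " + missingKeys(props));
    }
}
```

Running it against the file in NUTCHHOME/conf before a crawl gives a quick indication of whether the credentials were filled in at all; it cannot tell whether they are valid for the share.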
> C.  Known Issues
>     1) MalformedURLException: unknown protocol: smb
>        The SMB URL protocol handler is not being installed successfully.
>        In short, the JCIFS jar must be loaded by the System class loader.
>        Workaround: a) As a short-term solution, install the JCIFS jar
>                       found in the protocol-smb folder in
>                       JDKHOME/jre/lib/ext and/or JREHOME/lib/ext.
>                    b) If the exception is still thrown after completing step a),
>                       set the system property by passing the following argument
>                       to the JVM:
>                       -Djava.protocol.handler.pkgs=jcifs
>                    c) You can also set the property in your code. For example, if
>                       you start crawling with org.apache.nutch.crawl.Crawl,
>                       add the following two lines. This has the same effect as b).
>                       public static void main(String args[]) throws Exception {
>                           // Register the jcifs handler package before any smb:// URL is created.
>                           System.setProperty("java.protocol.handler.pkgs", "jcifs");
>                           // Names the permission required under a security manager;
>                           // constructing it alone grants nothing.
>                           new java.util.PropertyPermission("java.protocol.handler.pkgs", "read, write");
>                           //and so on
>        You can also visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html
>     2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx
>        This problem usually occurs if the following properties are not set correctly in
>        the "smb.properties" file:
>        - username
>        - password
>        - domain
>        Refer to the following resource for the list of
>        available properties and how to set them:
>        http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
>        You can also visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html
>        N.B. All properties should be set in the "smb.properties" file. You can set
>             all supported JCIFS properties in the "smb.properties" file.
>      
>     3) Only tested on Windows XP and Windows Server 2003. Please report the results of
>        any tests on other operating systems.
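Known issue 1 comes down to whether the JVM can find a URL stream handler for the smb scheme. As a minimal sketch (the class name SmbHandlerCheck is made up for illustration and is not part of the plugin), the following program appends jcifs to java.protocol.handler.pkgs and then tests whether an smb:// URL can be constructed. It needs no JCIFS classes to compile; without the JCIFS jar visible to the System class loader it is expected to report that the handler is unavailable.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class SmbHandlerCheck {
    /** Returns true if the JVM can resolve a URL stream handler for the scheme. */
    static boolean hasHandler(String scheme) {
        try {
            // The URL constructor throws MalformedURLException ("unknown protocol")
            // when no handler for the scheme can be found.
            new URL(scheme + "://server/share/");
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Same effect as -Djava.protocol.handler.pkgs=jcifs, but keeps any
        // handler packages that are already configured ('|' is the separator).
        String key = "java.protocol.handler.pkgs";
        String existing = System.getProperty(key);
        String value = (existing == null || existing.isEmpty())
                ? "jcifs"
                : existing + "|jcifs";
        System.setProperty(key, value);

        System.out.println("smb handler available: " + hasHandler("smb"));
    }
}
```

Running this with the same classpath and JVM arguments used for the crawl shows whether workarounds a) to c) took effect before a full crawl is started.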

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

