Changeset 1080

Timestamp: 12/18/07 08:40:11 (8 months ago)
Author: xkovah
Message:

Added the limit_spidering option to honeyclient.xml. If set to 1, it prevents the honeyclient from picking up any new links from the sites it visits, so that it visits only the URLs in the initial starting list. This overrides max_relative_links_to_visit.
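
As a minimal sketch of the configuration this message describes, the fragment below restricts a run to the initial URL list. The element names and values are taken from the honeyclient.xml diff in this changeset; the description attributes are omitted for brevity and the enclosing parent elements are not shown, so the exact placement should be treated as an assumption.

```xml
<!-- Sketch only: description attributes omitted, parent elements not shown. -->
<!-- 1 = visit only the URLs in the initial list; 0 (default) = spider as usual. -->
<limit_spidering default="0">
    1
</limit_spidering>
<!-- Still read from the file, but overridden while limit_spidering is 1. -->
<max_relative_links_to_visit default="-1">
    5
</max_relative_links_to_visit>
```

With limit_spidering set to 1, the max_relative_links_to_visit value (5 here) has no effect, because Browser.pm skips the _scoreLinks() call and so no links found on visited pages are queued. Setting limit_spidering back to 0 restores normal spidering.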

Files:

  • honeyclient/trunk/etc/honeyclient.xml

    r1068 r1080  
    @@ -83,5 +83,5 @@
                 <ActiveContent>
                     <enable description="Enables active content parsing. 1 enables, 0 disables." default="1">
    -                    1
    +                    0
                     </enable>
                     <Flash>
    @@ -97,6 +97,9 @@
                         1
                     </ignore_links_timed_out>
    -                <max_relative_links_to_visit description="An integer, representing the maximum number of relative links that the browser should visit, before moving onto another website.  If negative, then the browser will exhaust all possible relative links found, before moving on.  This functionality is best effort; it's possible for the browser to visit new links on previously visited websites." default="-1">
    +                <limit_spidering description="Sometimes you only want to check the URLs in your initial list, and not add any of the relative or absolute links found on the sites you visit. (For instance, when trying to determine whether a specific list of URLs contains malicious sites.) In this case you should set limit_spidering to 1. This option will override max_relative_links_to_visit (essentially setting it to 0)." default="0">
                         1
    +                </limit_spidering>
    +                <max_relative_links_to_visit description="An integer, representing the maximum number of relative links that the browser should visit, before moving onto another website.  If negative, then the browser will exhaust all possible relative links found, before moving on.  This functionality is best effort; it's possible for the browser to visit new links on previously visited websites. Note that this value can be overridden if limit_spidering is set to 1 above." default="-1">
    +                    5
                     </max_relative_links_to_visit>
                     <positive_words description="If a link contains any number of these words, then its probability of being visited (its score) will increase.">
    @@ -147,5 +150,5 @@
             </Driver>
             <perform_integrity_checks description="An integer, representing whether the Agent should perform any integrity checks. 1 enables, 0 disables." default="1">
    -            1
    +            0
             </perform_integrity_checks>
             <!-- HoneyClient::Agent::Integrity Options -->
  • honeyclient/trunk/lib/HoneyClient/Agent/Driver/Browser.pm

    r1079 r1080  
    @@ -923,4 +923,12 @@
             max_relative_links_to_visit => getVar(name => "max_relative_links_to_visit"),
     
    +        #Sometimes you only want to check the URLs in your initial list, and
    +        #not add any of the relative or absolute links found on the sites you
    +        #visit. (For instance, when trying to determine whether a specific list
    +        #of URLs contains malicious sites.) In this case you should set
    +        #limit_spidering to 1. This option will override max_relative_links_to_visit
    +        #(essentially setting it to 0).
    +        limit_spidering => getVar(name => "limit_spidering"),
    +
             # An array of positive words, where a link's probability of being
             # visited (its score) will increase, if the link contains any of these
    @@ -1092,6 +1100,16 @@
             # Assume that all other content types are HTML-based.
             } else {
    -            # Call the link scoring function
    -            %scored_links = $self->_scoreLinks($base, $content);
    +            #If limit_spidering is set, we don't want to add any new links
    +            #Hence, by not calling _scoreLinks() the next logic will just drop all
    +            #found links, because it won't call _processLinks in the next conditional
    +            #(The only reason for putting this check here rather than there is to avoid
    +            #the cost of useless link parsing for scoring)
    +            #NOTE: This technically could go at the level that we don't even use LWP::UserAgent
    +            #but it has just been put here to be conservative as we may want to use the
    +            #data from LWP for the future hybrid approach.
    +            if(!$self->{limit_spidering}){
    +                # Call the link scoring function
    +                %scored_links = $self->_scoreLinks($base, $content);
    +            }
             }
         }