root/honeyclient/branches/bug/42/lib/HoneyClient/Agent/Driver/Browser.pm

Revision 96, 46.4 kB (checked in by kindlund, 2 years ago)

Completed registry parser documentation and unit tests; corrected minor mispellings; updated POD documentation to reflect public website.

  • Property svn:keywords set to Id "$file"
1 #######################################################################
2 # Created on:  November 06, 2006
3 # Package:     HoneyClient::Agent::Driver::Browser
4 # File:        Browser.pm
5 # Description: A generic driver for automating the link visitation
6 #              behavior of a web browser running inside a
7 #              HoneyClient VM.
8 #
9 # CVS: $Id$
10 #
11 # @author knwang, kindlund, stephenson
12 #
13 # Copyright (C) 2006 The MITRE Corporation.  All rights reserved.
14 #
15 # This program is free software; you can redistribute it and/or
16 # modify it under the terms of the GNU General Public License
17 # as published by the Free Software Foundation, using version 2
18 # of the License.
19 #
20 # This program is distributed in the hope that it will be useful,
21 # but WITHOUT ANY WARRANTY; without even the implied warranty of
22 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
23 # GNU General Public License for more details.
24 #
25 # You should have received a copy of the GNU General Public License
26 # along with this program; if not, write to the Free Software
27 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
28 # 02110-1301, USA.
29 #
30 #
31 #######################################################################
32
33 =pod
34
35 =head1 NAME
36
37 HoneyClient::Agent::Driver::Browser - Perl extension to drive a
38 web browser, running inside a HoneyClient VM.
39
40 =head1 VERSION
41
42 This documentation refers to HoneyClient::Agent::Driver::Browser version 0.9.
43
44 =head1 SYNOPSIS
45
46   use HoneyClient::Agent::Driver::Browser;
47
48   # Library used exclusively for debugging complex objects.
49   use Data::Dumper;
50
51   # Create a new Browser object, initialized with a collection
52   # of URLs to visit.
53   my $browser = HoneyClient::Agent::Driver::Browser->new(
54       links_to_visit => {
55           'http://www.google.com'  => 1,
56           'http://www.cnn.com'     => 1,
57       },
58   );
59
60   # If you want to see what type of "state information" is physically
61   # inside $browser, try this command at any time.
62   print Dumper($browser);
63
64   # Continue to "drive" the driver, until it is finished.
65   while (!$browser->isFinished()) {
66
67       # Before we drive the application to a new set of resources,
68       # find out where we will be going within the application, first.
69       print "About to contact the following resources:\n";
70       print Dumper($browser->next());
71
72       # Now, drive browser for one iteration.
73       $browser->drive();
74
75       # Get the driver's progress.
76       print "Status:\n";
77       print Dumper($browser->status());
78      
79   }
80
81   # At this stage, the driver has exhausted its collection of links
82   # to visit.  Let's say we want to add the URL "http://www.mitre.org"
83   # to the driver's list.
84   $browser->{links_to_visit}->{'http://www.mitre.org'} = 1;
85
86   # Now, drive the browser for one iteration.
87   $browser->drive();
88
89 =head1 DESCRIPTION
90
91 This library allows the Agent module to drive an instance of any browser,
92 running inside the HoneyClient VM.  The purpose of this module is to
93 programmatically navigate the browser to different websites, in order to
94 become purposefully infected with new malware.
95
96 This module is object-oriented in design, retaining all state information
97 within itself for easy access.  A specific browser class must inherit from
98 Browser.
99
100 Fundamentally, the browser driver is initialized with a set of absolute URLs
101 for the browser to drive to.  Upon visiting each URL, the driver collects
102 any B<new> links found and will attempt to drive the browser to each
103 valid URL upon subsequent iterations of work.
104
105 For each top-level URL given, the driver will attempt to process all
106 corresponding links that are hosted on the same server, in order to
107 simulate a complete 'spider' of each server.  B<However>, because
108 URLs are added and removed from hashtables, the order of which URLs
109 are processed B<cannot be guaranteed nor maintained across subsequent
110 iterations of work>.
111
112 This means that the browser driver will try to visit all links shared by a
113 common server in random order before moving on to drive to other,
114 external links in a random fashion.  B<However>, this cannot be
115 guaranteed, as additional links from the same server may be found
116 later, after processing the contents of an external link.
117
118 As the browser driver navigates the browser to each link, it
119 maintains a set of hashtables that record when valid links were
120 visited (see L<links_visited>); when invalid links were found
121 (see L<links_ignored>); and when the browser attempted to visit
122 a link but the operation timed out (see L<links_timed_out>).
123 By maintaining this internal history, the driver will B<never>
124 navigate the browser to the same link twice.
125
126 Lastly, it is highly recommended that for each driver B<$object>,
127 one should call $object->isFinished() prior to making a subsequent
128 call to $object->drive(), in order to verify that the driver has
129 not exhausted its set of links to visit.  Otherwise, if
130 $object->drive() is called with an empty set of links to visit,
131 the corresponding operation will B<croak>.
132
133 =cut
134
135 package HoneyClient::Agent::Driver::Browser;
136
137 # XXX: Disabled version check, Honeywall does not have Perl v5.8 installed.
138 #use 5.008006;
139 use strict;
140 use warnings;
141 use Config;
142 use Carp ();
143
144 # Traps signals, allowing END: blocks to perform cleanup.
145 use sigtrap qw(die untrapped normal-signals error-signals);
146
147 #######################################################################
148 # Module Initialization                                               #
149 #######################################################################
150
151 BEGIN {
152     # Defines which functions can be called externally.
153     require Exporter;
154     our (@ISA, @EXPORT, @EXPORT_OK, %EXPORT_TAGS, $VERSION);
155
156     # Set our package version.
157     $VERSION = 0.9;
158
159     # Define inherited modules.
160     use HoneyClient::Agent::Driver;
161
162     @ISA = qw(Exporter HoneyClient::Agent::Driver);
163
164     # Symbols to autoexport (:DEFAULT tag)
165     # Note: Since this module is object-oriented, we do *NOT* export
166     # any functions other than "new" to call statically.  Each function
167     # for this module *must* be called as a method from a unique
168     # object instance.
169     @EXPORT = qw();
170
171     # Items to export into callers namespace by default. Note: do not export
172     # names by default without a very good reason. Use EXPORT_OK instead.
173     # Do not simply export all your public functions/methods/constants.
174
175     # This allows declaration use HoneyClient::Agent::Driver::Browser ':all';
176     # If you do not need this, moving things directly into @EXPORT or @EXPORT_OK
177     # will save memory.
178
179     # Note: Since this module is object-oriented, we do *NOT* export
180     # any functions other than "new" to call statically.  Each function
181     # for this module *must* be called as a method from a unique
182     # object instance.
183     %EXPORT_TAGS = (
184         'all' => [ qw() ],
185     );
186
187     # Symbols to export on request
188     @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );
189
190     # XXX: Fix this!
191     # Check to make sure our OS is Windows-based.
192     #if ($Config{osname} !~ /^MSWin32$/) {
193     #    Carp::croak "Error: " . __PACKAGE__ . " will only run on Win32 platforms!\n";
194     #}   
195
196     $SIG{PIPE} = 'IGNORE'; # Do not exit on broken pipes.
197 }
198 our (@EXPORT_OK, $VERSION);
199
200 #######################################################################
201
202 # Include the Global Configuration Processing Library
203 use HoneyClient::Util::Config qw(getVar);
204
205 # Use ISO 8601 DateTime Libraries
206 use DateTime::HiRes;
207
208 # Use fractional second sleeping.
209 # TODO: Need unit testing.
210 use Time::HiRes qw(sleep);
211
212 # Use Storable Library
213 # TODO: Need unit testing.
214 use Storable qw(dclone);
215
216 # Use threads Library
217 # TODO: Need unit testing.
218 use threads;
219 # TODO: Need unit testing.
220 use threads::shared;
221
222 # TODO: Need unit testing.
223 use HoneyClient::Util::SOAP qw(getClientHandle);
224    
225 # TODO: Need unit testing.
226 use LWP::UserAgent;
227
228 # TODO: Need unit testing.
229 use HTTP::Request::Common;
230
231 # TODO: Need unit testing.
232 use HTML::LinkExtor;
233
234 # TODO: Need unit testing.
235 use URI::URL;
236
237 =pod
238
239 =head1 DEFAULT PARAMETER LIST
240
241 When a Browser B<$object> is instantiated using the B<new()> function,
242 the following parameters are supplied default values.  Each value
243 can be overridden by specifying the new (key => value) pair into the
244 B<new()> function, as arguments.
245
246 Furthermore, as each parameter is initialized, each can be individually
247 retrieved and set at any time, using the following syntax:
248
249   my $value = $object->{key}; # Gets key's value.
250   $object->{key} = $value;    # Sets key's value.
251
252 =head2 links_to_visit
253
254 =over 4
255
256 This parameter is a hashtable of fully qualified URLs for the browser
257 to visit.  Specifically, each 'key' corresponds to an absolute URL
258 and the 'value' is always 1.
259
260 =back
261
262 =head2 links_visited
263
264 =over 4
265
266 This parameter is a hashtable of fully qualified URLs that the
267 browser has already visited.  Specifically, each 'key' corresponds
268 to an absolute URL and the 'value' is a string representing the
269 date and time of when the link was visited.
270
271 B<Note>: See internal documentation of _getTimestamp() for the
272 corresponding date/time format of each value.
273
274 =back
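
The timestamp layout described above can be illustrated with a minimal, stand-alone sketch. It uses only core modules rather than DateTime::HiRes, and C<format_timestamp> is a hypothetical helper that is not part of this module. It assumes UTC and a fraction zero-padded to nine digits, so that 5 microseconds renders as C<.000005000>.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::HiRes qw(gettimeofday);

# Hypothetical helper (not part of this module): format a timestamp as
# "YYYY-MM-DD HH:MM:SS.nnnnnnnnn" in UTC, zero-padding the fraction.
sub format_timestamp {
    my ($seconds, $microseconds) = @_;
    my $date_time = strftime('%Y-%m-%d %H:%M:%S', gmtime($seconds));

    # Convert microseconds to nanoseconds and pad to nine digits.
    return sprintf('%s.%09d', $date_time, $microseconds * 1000);
}

# gettimeofday() returns (seconds, microseconds) in list context.
print format_timestamp(gettimeofday()), "\n";
```

Zero-padding matters because an unpadded fraction is ambiguous: 5 milliseconds would otherwise print as C<.5000000>, which reads as half a second.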
275
276 =head2 links_ignored
277
278 =over 4
279
280 This parameter is a hashtable of fully qualified URLs that the browser
281 has found during its link traversal process, but the browser could not
282 access the link.
283
284 Links could be added to this list if access requires any type of
285 authentication, or if the link points to a resource that is neither
286 HTTP nor HTTPS (e.g., "javascript:doNetDetect()").
287
288 Specifically, each 'key' corresponds to an absolute URL and the
289 'value' is a string representing the date and time of when the link
290 was ignored.
291
292 B<Note>: See internal documentation of _getTimestamp() for the
293 corresponding date/time format of each value.
294
295 =back
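
The rules that decide whether a link is ignored, skipped, or queued can be sketched roughly as follows. This is a simplified illustration, not the module's own code: C<classify_link> and its string labels are hypothetical, and the two hash references merely stand in for B<links_visited> and B<links_ignored>.

```perl
use strict;
use warnings;

# Hypothetical stand-alone check; $visited and $ignored stand in for the
# driver's 'links_visited' and 'links_ignored' hashtables.
sub classify_link {
    my ($link, $visited, $ignored) = @_;

    # Drop any fragment, since it never changes which resource is fetched.
    $link =~ s/\#.*//;

    # Anything that is not HTTP or HTTPS is ignored outright.
    return 'ignored' unless $link =~ m{^https?://}i;

    # A link already present in either history is not visited again.
    return 'seen' if exists $visited->{$link} or exists $ignored->{$link};

    return 'new';
}

my %visited = ('http://www.mitre.org/index.html' => '2006-11-06 12:00:00.000000000');
my %ignored;

for my $link ('http://www.mitre.org/index.html#top',
              'javascript:doNetDetect()',
              'https://www.mitre.org/about') {
    print "$link => ", classify_link($link, \%visited, \%ignored), "\n";
}
```

Stripping the fragment before the history lookup is what lets C<index.html#top> match an earlier visit to C<index.html>.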
296
297 =head2 relative_links_to_visit
298
299 =over 4
300
301 This parameter is a hashtable of fully qualified URLs, such that each
302 URL shares a common B<hostname>.  This is an internal hashtable used
303 by the Browser driver that should be initially empty.  As the Browser
304 driver extracts and removes new URLs off the B<links_to_visit> hashtable,
305 driving the browser to each URL, any B<relative> links found are
306 added into this hashtable; any B<external> links found are added
307 back into the B<links_to_visit> hashtable.
308
309 When driving to the next link, this hashtable is exhausted prior
310 to the main B<links_to_visit> hashtable.  This allows a
311 browser to navigate to all links hosted on the same server, prior
312 to contacting a different server.
313
314 Specifically, each 'key' corresponds to an absolute URL and the
315 'value' is always 1.
316
317 =back
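
As a rough illustration of how links found on a page might be split between this hashtable and B<links_to_visit>, the sketch below compares the hostname[:port] of each link with that of the referring URL. The helper names C<host_of> and C<partition_links> are hypothetical, and unlike the module this sketch lowercases hostnames before comparing them.

```perl
use strict;
use warnings;

# Hypothetical helper: return "hostname[:port]" for an absolute URL,
# or an empty string if the URL has no "scheme://" prefix.
sub host_of {
    my ($url) = @_;
    return $url =~ m{^[a-z][a-z0-9+.\-]*://([^/?#]+)}i ? lc $1 : '';
}

# Split links found on a page into same-host and external sets.
sub partition_links {
    my ($referrer_url, @links) = @_;
    my (%same_host, %external);
    my $referrer_host = host_of($referrer_url);

    for my $link (@links) {
        if (host_of($link) eq $referrer_host) {
            $same_host{$link} = 1;
        } else {
            $external{$link} = 1;
        }
    }
    return (\%same_host, \%external);
}

my ($same, $other) = partition_links(
    'http://www.mitre.org/index.html',
    'http://www.mitre.org/about.html',
    'http://www.cnn.com/',
);
print "same host: @{[ sort keys %$same ]}\n";
print "external:  @{[ sort keys %$other ]}\n";
```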
318
319 =head2 next_link_to_visit
320
321 =over 4
322
323 This parameter is a scalar that contains the next URL to visit.
324 It is updated dynamically, any time $object->_getNextLink() is called.
325
326 When the browser is ready to drive to the next link, B<next_link_to_visit>
327 is checked first.  If that value is B<undef>, then the B<relative_links_to_visit>
328 hashtable is checked next.  If that hashtable is empty, then finally the
329 B<links_to_visit> hashtable is checked last.
330
331 =back
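
The lookup order described above can be sketched with plain hashes. This is an illustration under stated assumptions, not the driver's implementation: C<%state>, C<take_any>, and C<next_link> are hypothetical names, and link validation is left out entirely.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the driver's state; only the three fields
# that matter for link selection are modelled.
my %state = (
    next_link_to_visit      => undef,
    relative_links_to_visit => { 'http://www.mitre.org/about.html' => 1 },
    links_to_visit          => { 'http://www.cnn.com/' => 1 },
);

# Remove and return an arbitrary key from a hash reference, or undef
# if the hash is empty.  No ordering is guaranteed.
sub take_any {
    my ($hash) = @_;
    my ($key) = keys %$hash;
    delete $hash->{$key} if defined $key;
    return $key;
}

# Check the scalar first, then the same-host links, then the rest.
sub next_link {
    my ($state) = @_;
    my $link = $state->{next_link_to_visit};
    $state->{next_link_to_visit} = undef;
    $link = take_any($state->{relative_links_to_visit}) unless defined $link;
    $link = take_any($state->{links_to_visit})          unless defined $link;
    return $link;
}

while (defined(my $link = next_link(\%state))) {
    print "$link\n";
}
```

Because each hashtable here holds a single key, the same-host link prints before the external one; with several keys per hashtable, the order within each tier is arbitrary.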
332
333 =head2 links_timed_out
334
335 =over 4
336
337 This parameter is a hashtable of fully qualified URLs that the browser
338 has found during its link traversal process, but the browser
339 could not access the corresponding resource due to the operation
340 timing out.
341
342 Specifically, each 'key' corresponds to an absolute URL and the
343 'value' is a string representing the date and time of when access to
344 the resource was attempted.
345
346 B<Note>: See internal documentation of _getTimestamp() for the
347 corresponding date/time format of each value.
348
349 =back
350
351 =head2 ignore_links_timed_out
352
353 =over 4
354
355 If this parameter is set to 1, then the browser will also
356 never attempt to revisit any links that caused the browser to
357 time out.
358
359 =back
360
361 =head2 process_name
362
363 =over 4
364
365 A string containing the process name of the browser application,
366 as it appears in the Task Manager.
367
368 =back
369
370 =head2 max_relative_links_to_visit
371
372 =over 4
373
374 An integer, representing the maximum number of relative links that
375 the browser should visit, before moving onto another website.  If
376 negative, then the browser will exhaust all possible relative links
377 found, before moving on.  This functionality is best effort; it's
378 possible for the browser to visit new links on previously visited
379 websites.
380
381 =back
382
383 =cut
384
385 my %PARAMS = (
386
387     # This is a hashtable of fully qualified URLs
388     # to visit by the browser.  Specifically, the 'key' is
389     # the absolute URL and the 'value' is always 1.
390     links_to_visit          => { },
391
392     # This is a hashtable of fully qualified URLs that the
393     # browser has already visited.  Specifically, the
394     # 'key' is the absolute URL and the 'value' is a string
395     # representing the date and time of when the link was visited.
396     #
397     # Note: See _getTimestamp() for the corresponding date/time
398     # format.
399     links_visited           => { },
400
401     # This is a hashtable of URLs that the browser has found
402     # during its traversal process, but the browser could not
403     # access the link.
404     #
405     # Links could be added to this list if access requires any type of
406     # authentication, or if the link points to a resource that is neither
407     # HTTP nor HTTPS (e.g., "javascript:doNetDetect()").
408     #
409     # The 'key' is the absolute URL and the 'value' is a string
410     # representing the date and time of when the link was ignored.
411     #
412     # Note: See _getTimestamp() for the corresponding date/time
413     # format.
414     links_ignored           => { },
415
416     # This is a hashtable of fully qualified URLs
417     # that all share a common *hostname*.  This hashtable should be
418     # initially empty.  As the driver extracts and removes new URLs
419     # off the 'links_to_visit' hashtable, driving the browser to each URL,
420     # any *relative* links found are added into this hashtable; any
421     # *external* links found are added back into the 'links_to_visit'
422     # hashtable.
423     #
424     # When navigating to the next link, this hashtable is exhausted prior
425     # to the main 'links_to_visit' hashtable.  This allows a
426     # browser to navigate to all links hosted on the same server, prior
427     # to contacting a different server.
428     #   
429     # Specifically, the 'key' is the absolute URL and the 'value'
430     # is always 1.
431     relative_links_to_visit => { },
432
433     # This is a scalar that contains the next URL to visit.
434     # It is updated dynamically, any time _getNextLink() is called.
435     # When the browser is ready to drive to the next link,
436     # 'next_link_to_visit' is checked.  If that value is undef, then
437     # the 'relative_links_to_visit' hashtable is checked next.
438     # If that hashtable is empty, then finally the 'links_to_visit'
439     # hashtable is checked.
440     next_link_to_visit      => undef,
441
442     # This is a hashtable of URLs that the browser has found
443     # during its traversal process, but the browser could not
444     # access the resource due to the operation timing out.
445     #
446     # The 'key' is the absolute URL and the 'value' is a string
447     # representing the date and time of when access was attempted.
448     #
449     # Note: See _getTimestamp() for the corresponding date/time
450     # format.
451     links_timed_out         => { },
452
453     # If this parameter is a defined scalar, then the browser
454     # will also never attempt to revisit any links that caused
455     # the browser to time out.
456     ignore_links_timed_out  => getVar(name => "ignore_links_timed_out"),
457
458     # A string containing the process name of the browser application,
459     # as it appears in the Task Manager.
460     process_name            => getVar(name => "process_name"),
461
462     # An integer, representing how many relative links the browser
463     # should continue to drive to, before moving onto another
464     # website.  If negative, then the browser will exhaust all possible
465     # relative links, before moving on.  (This internal variable should
466     # never be modified externally.)
467     _remaining_number_of_relative_links_to_visit => getVar(name => "max_relative_links_to_visit"),
468
469     # An integer, representing the maximum number of relative links that
470     # the browser should visit, before moving onto another website.  If
471     # negative, then the browser will exhaust all possible relative links
472     # found, before moving on.  This functionality is best effort; it's
473     # possible for the browser to visit new links on previously visited
474     # websites.
475     max_relative_links_to_visit => getVar(name => "max_relative_links_to_visit"),
476    
477 );
478
479 #######################################################################
480 # Private Methods Implemented                                         #
481 #######################################################################
482
483 # Helper function designed to retrieve the next link for the browser
484 # to navigate to.
485 #
486 # Note: Calling this function will implicitly remove the next link from
487 #       any and all applicable hashtables/scalars.
488 #
489 # When getting the next link, 'next_link_to_visit' is checked first.
490 # If that value is undef, then the 'relative_links_to_visit' hashtable
491 # is checked next.  If that hashtable is empty, then finally the
492 # 'links_to_visit' hashtable is checked.
493 #
494 # Inputs: HoneyClient::Agent::Driver::Browser object
495 # Outputs: link, or undef if all applicable scalars/hashtables are empty
496 sub _getNextLink {
497
498     # Get the object state.
499     my $self = shift;
500    
501     # Set the link to find as undef, initially.
502     # We use undef to signify that our URL *_links_to_visit hashtables
503     # are empty.  If we were to use the empty string instead, as our
504     # signal, then this code would misinterpret an empty link
505     # <a href=""></a> as a signal that our URL hashtables were empty.
506     my $link = undef;
507
508     while (!defined($link) or ($link eq "")) {
509         # Try getting the next link from the next link
510         # scalar.
511         $link = $self->next_link_to_visit;
512         $self->{next_link_to_visit} = undef;
513
514         # If the next link scalar is empty, try
515         # getting a link from the relative hashtable.
516         unless (defined($link)) {
517             $link = _pop($self->relative_links_to_visit);
518         }
519
520         # If the relative hashtable is empty, try getting one
521         # from the external hashtable.
522         unless (defined($link)) {
523             $link = _pop($self->links_to_visit);
524         }
525
526         # If all hashtables/scalars were empty, immediately return an
527         # undef value.
528         unless (defined($link)) {
529             return $link;
530         }
531
532         # Now, make sure the link is valid before we return it.
533         # _validateLink() yields an empty string or undef for links
534         # that are invalid or already seen; in that case the loop
535         # simply moves on to the next one in our hashtables.
536         $link = $self->_validateLink($link);
537     }
538
539     # Return the next link found.
540     return $link;
541 }
542
543 # Helper function designed to get a current timestamp from
544 # the system OS.
545 #
546 # Note: This timestamp is in ISO 8601 format.
547 #
548 # Inputs: none
549 # Outputs: timestamp
550 sub _getTimestamp {
551     my $dt = DateTime::HiRes->now();
552     # Zero-pad the nanoseconds so the fraction reads correctly.
553     return $dt->ymd('-') . " " . $dt->hms(':') . "." .
554            sprintf("%09d", $dt->nanosecond());
555 }
556
557 # Helper function designed to "pop" a key off a given hashtable.
558 # When given a hashtable reference, this function will extract a valid key
559 # from the hashtable and delete the (key, value) pair from the
560 # hashtable.
561 #
562 # Note: There is no guaranteed order about how this function picks
563 # keys from the hashtable.
564 #
565 # Inputs: hashref
566 # Outputs: valid key, or undef if the hash is empty
567 sub _pop {
568
569     # Get supplied hash reference.
570     my $hash = shift;
571
572     # Get a new key.
573     my @keys = keys(%{$hash});
574     my $key = pop(@keys);
575    
576     # Delete the key from the hashtable.
577     if (defined($key)) {
578         delete $hash->{$key};
579     }
580
581     # Return the key found.
582     return $key;
583 }
584
585 # This is the abstract function which actually fetches the web content using
586 # a specific browser implementation.  Must be implemented by each browser class.
587
588 sub getContent {
589
590 }
591
592 # Helper function which parses the HTTP::Response from LWP::UserAgent
593 # and returns an array of the links contained in the response.
594 #
595 # Inputs: HTTP::Response object, hostname[:port] of the requested URL
596 # Outputs: Array containing all href links within the response
597
598 sub _getAllLinks {
599    
600     my $response = shift;
601     my $hostname = shift;
602     my @links = ();
603     my $thislink;
604    
605     my $html = $response->content;
606    
607     while( $html =~ m/<A HREF=\"(.*?)\"/gi ) {
608         $thislink = $1;
609
610         # For root-relative links (starting with "/"), prepend the hostname.
611         # TODO:  Probably shouldn't assume the HTTP protocol...
612         if ($thislink =~ /^\//) {
613             $thislink = "http://" . $hostname . $thislink;
614         }
615        
616         push @links, $thislink;
617     }
618
619     # Return the list of links found.
620     return @links;
621 }
622
623 # Helper function, designed to extract the hostname
624 # (and, if it exists, the port number) from a given
625 # URL.
626 #
627 # For example, if "http://hostname.com:80/path/index.html"
628 # is given, then "hostname.com:80" would be returned.
629 #
630 # Inputs: URL
631 # Outputs: hostname[:port]
632 sub _extractHostname {
633
634     # Sanity check.
635     my $arg = shift();
636
637     if (!defined($arg)) {
638         return "";
639     }
640
641     # Get the URL supplied.
642     my $url = $arg . "/"; # Tack on an ending delimiter.
643
644     # Note: The '?' chars make a critical difference
645     # in how this regex operates.
646     $url =~ s/^.*?\/\/(.*?)\/.*$/$1/;
647
648     # Return the extracted hostname.
649     return $url;
650 }
651
652 # Helper function, designed to process all links found at a
653 # given URL, once the browser has been driven to that URL
654 # and has collected all corresponding links.
655 #
656 # When supplied with the array of URL strings,
657 # this function will categorize the corresponding URLs
658 # as follows:
659 #
660 # (Note: The terms "valid" and "invalid" are defined in
661 #  the _validateLink() documentation.)
662 #
663 # "New" links are those we've never driven the browser to.
664 # "Old" links are those we've driven the browser to before.
665 #
666 # - If a link is new and "invalid", then it gets added to
667 #   the 'links_ignored' hashtable.
668 #   
669 # - If a link is old and "invalid", then it gets
670 #   ignored.
671 #
672 # - If a link is old and "valid", then it gets ignored.
673 #
674 # - If a link is new and "valid", then we check to see if
675 #   the referring URL's hostname[:port] and the link's
676 #   hostname[:port] match.  If they match, then the link
677 #   is added to the 'relative_links_to_visit' hash.
678 #   Otherwise, the link is added to the 'links_to_visit'
679 #   hash.
680 #
681 # Inputs: HoneyClient::Agent::Driver::Browser object,
682 #         hostname[:port] of referring URL,
683 #         array of URL strings
684 # Outputs: HoneyClient::Agent::Driver::Browser object
685 sub _processLinks {
686
687     # Get the object state.
688     my $self = shift;
689
690     # Get the referrer and the corresponding array of links.
691     my ($referrer, @links) = @_;
692    
693     foreach my $url (@links) {
694
695         # Skip over any undefined links.
696         unless (defined($url)) {
697             next;
698         }
699
700         # Validate each link.
701         $url = $self->_validateLink($url);
702
703         if (!defined($url) or ($url eq "")) {
704             # If we get here, then the link is either invalid or
705             # already visited.  In either case, skip to the next
706             # link.
707             next;
708         }
709
710         # Link is new and valid; go ahead and add to the appropriate
711         # hashtable.
712       
713         # Extract the core hostname of the URL to visit.
714         # If $url is undef, then this function will return an empty string.
715         my $hostname = _extractHostname($url);
716      
717         # If the referrer's hostname and the URL's hostname match...
718         if ($hostname eq $referrer) {
719             # Then add the URL to the 'relative_links_to_visit' hashtable,
720             # since we're visiting links that share the same hostname.
721             $self->relative_links_to_visit->{$url} = 1;
722         } else {
723             # Else, add the URL to the 'links_to_visit' hashtable,
724             # since we're visiting links that do NOT share the same hostname.
725             $self->links_to_visit->{$url} = 1;
726         }
727     }
728    
729     # Return the modified object state.
730     return $self;
731 }
732
733 # Helper function designed to validate supplied links.
734 #
735 # When a link is provided as an argument:
736 #
737 #  - The link is checked to make sure it has a valid
738 #    HTTP or HTTPS prefix in the URL; any other link
739 #    types are considered invalid.
740 #
741 #  - The 'links_visited' history is checked; if the link
742 #    already exists within the history, then it is considered
743 #    invalid.
744 #
745 # If the link is valid, then it is returned.  A link that is not
746 # HTTP or HTTPS yields an empty string and is added to the
747 # 'links_ignored' history, if it is not already in the hashtable.
748 # A link that was already visited or ignored yields undef.
749 #
750 # Inputs: HoneyClient::Agent::Driver::Browser object, url to validate
751 # Outputs: url if valid, empty string or undef otherwise
752 sub _validateLink {
753    
754     # Get the object state.
755     my $self = shift;
756
757     # Get the supplied link.
758     my ($link) = @_;
759
760     # Strip off all anchors/fragments/bookmarks from within URLs by default.
761     # Note: RFC 3986 Section 3 guarantees that all fragments
762     # appear at the end of any URL.  Keep in mind, that this stripping
763     # assumes we won't have any weird corner cases, like:
764     # http://www.mitre.org/path/index.html#bookmark?arg=value
765     # ... where we would want to strip the bookmark, but keep the
766     # arg=value piece (which may not be a valid URL syntax, anyway).
767     $link =~ s/\#.*//;
768
769     # First, check to see if the link is either an
770     # "http://" or "https://" URL.
771     unless ($link =~ /^http[s]?:\/\/.*/i) {
772         # The link is invalid, so we check to see if it's already
773         # in our 'links_ignored' history.
774
775         # Check if the 'links_ignored' history is not empty and
776         # already has our invalid link recorded.
777         unless (scalar(%{$self->links_ignored}) and
778                 exists($self->links_ignored->{$link})) {
779
780             # The invalid link is brand new; add it to our list.
781             $self->links_ignored->{$link} = _getTimestamp();
782         }
783
784         # The link is invalid, return an empty string.
785         return "";
786     }
787
788     # Next, we check to see if we've already visited or ignored this
789     # link, i.e., whether the 'links_visited' or 'links_ignored'
790     # history is non-empty and already has this link recorded.
791     if ((scalar(%{$self->links_visited}) and
792          exists($self->links_visited->{$link})) or
793         (scalar(%{$self->links_ignored}) and
794          exists($self->links_ignored->{$link}))) {
795        
796         # Link is valid but already visited or ignored, so return undef.
797         return;
798     }
799
800     # If we haven't returned by now, then the link is considered
801     # valid and we need to visit it.
802     return $link;
803 }
804
805 # Helper function designed to kill all instances of the driven
806 # application.
807 #
808 # Inputs: None
809 # Outputs: None
810 sub _killProcess {
811
812     # Get the object state.
813     my $self = shift;
814
815     # TODO: Make this more robust.
816
817     # This function will croak, if it ever tries to return an undefined
818     # object.
819     my $stub = getClientHandle(address   => 'localhost',
820                                namespace => 'HoneyClient::Agent');
821            
822     my $som = $stub->killProcess($self->process_name);
823
824     if (!$som->result) {
825         Carp::carp "Failed to kill process: '" . $self->process_name . "'!\n";
826     }
827 }
828
829 #######################################################################
830 # Public Methods Implemented                                          #
831 #######################################################################
832
833 =pod
834
835 =head1 METHODS IMPLEMENTED
836
837 The following functions have been implemented by the Browser driver.  Many
838 of these methods are implementations of the parent Driver interface.
839
840 As such, the following code descriptions pertain to this particular
841 Driver implementation.  For further information about the generic
842 Driver interface, see the L<HoneyClient::Agent::Driver> documentation.
843
844 =head2 HoneyClient::Agent::Driver::Browser->new($param => $value, ...)
845
846 =over 4
847
848 Creates a new Browser driver object, which contains a hashtable
849 containing any of the supplied "param => value" arguments.
850
851 I<Inputs>:
852  B<$param> is an optional parameter variable.
853  B<$value> is $param's corresponding value.
854  
855 Note: If any $param(s) are supplied, then an equal number of
856 corresponding $value(s) B<must> also be specified.
857
858 I<Output>: The instantiated Browser driver B<$object>, fully initialized.
859
860 =back
861
862 =begin testing
863
864 # XXX: Add this.
865 1;
866
867 =end testing
868
869 =cut
870
871 sub new {
872
873     # - This function takes in an optional hashtable,
874     #   that contains various key => 'value' configuration
875     #   parameters.
876     #
877     # - For each parameter given, it overwrites any corresponding
878     #   parameters specified within the default hashtable, %PARAMS,
879     #   with custom entries that were given as parameters.
880     #
881     # - Finally, it returns a blessed instance of the