Ticket #84 (closed bug: fixed)

Opened 1 year ago

Last modified 8 months ago

Problems with max_relative_links_to_visit

Reported by: apuigventos@isecauditors.com Assigned to: xkovah
Priority: high Milestone: 1.0
Component: HoneyClient::Agent::Driver::Browser Version: 0.99
Severity: minor Keywords: max_relative_links_to_visit, honeyclient.xml, browser
Cc:

Description

Hi again guys :)

I tried setting max_relative_links_to_visit to 0 or 1 and scanning only one URL, for example www.7a69ezine.org, but the final report (which lists six false-positive URLs) shows that the MITRE scan visited more pages outside the specified domain 7a69ezine.org.

Attachments

Change History

08/28/07 11:42:05 changed by kindlund

  • status changed from new to assigned.
  • severity changed from major to minor.
  • component changed from HoneyClient::Manager to HoneyClient::Agent::Driver::Browser.
  • summary changed from Problems with concurrency levels of the links to Problems with max_relative_links_to_visit.
  • milestone set to 0.9.
  • keywords set to max_relative_links_to_visit, honeyclient.xml, browser.

Hi Àngel,

Okay, I think I understand the problem you're running into.

You're saying that when you edit your etc/honeyclient.xml file and change the max_relative_links_to_visit to 0 or 1, you still have the honeyclient code browsing to the same site multiple times.

I think I have a solution for you. Currently, I assume you're starting up the honeyclients on the host system, using the following command:

perl /usr/bin/StartManager.pl --url_list urls.txt

Try this command, instead:

perl /usr/bin/StartManager.pl --url_list urls.txt --max_relative_links 1

Please let us know if this works for you.

Regards,

— Darien

08/28/07 12:21:06 changed by Angel <apuigventos@isecauditors.com>

Yep,

For example, I need to scan fp.com (my test site), but if its code contains an external link (www.google.com) and my max_relative_links_to_visit is 1, the scan crawls the whole google.com site, and google.com is very, very big :) With 0, I think it does not run correctly, because the results are the same.

It is important that it scans only the domain/site of the URLs in urls.txt, not the site's external references :-o

08/28/07 12:52:32 changed by kindlund

Okay, so does this command work for you?

perl /usr/bin/StartManager.pl --url_list urls.txt --max_relative_links 0

Please let us know.

Thanks,

— Darien

08/29/07 05:06:16 changed by Angel <apuigventos@isecauditors.com>

Yes, I tried using:

perl /usr/bin/StartManager.pl --url_list urls.txt --max_relative_links 0

When it takes the first clone, it scans fp.com plus the external link www.google.com. When the first clone has finished, the second clone scans the whole www.google.com site.

08/29/07 13:55:23 changed by kindlund

  • version changed from none to 0.99.

Okay, I understand. I'll try and replicate this issue locally, then work on a corresponding fix.

Regards,

— Darien

08/29/07 14:33:10 changed by kindlund

  • priority changed from normal to high.
  • milestone changed from 0.9 to 1.0.

08/30/07 12:28:18 changed by kindlund

Okay, can you provide me with a copy of your "urls.txt" file?

Feel free to either paste it inline, attach it to the ticket, or send it as a direct email to honeyclient@mitre.org.

Thanks,

— Darien

08/30/07 13:37:13 changed by kindlund

Hi Àngel,

Thanks for emailing me the information.

Okay, I think I've identified a fixable issue in our code; however, I think I need to further explain what max_relative_links does and does not do.

When the Honeyclient VM visits a single webserver, 'www.foo.com' (for example), it will attempt to organize all found links into 3 categories:

1. relative_links_to_visit

(For example, those URLs that share the same webserver name, like: http://www.foo.com/2.html)

2. links_to_visit

(For example, those URLs that have DIFFERENT webserver names, like: http://www.bar.com/index.html)

3. links_ignored

(For example, those URLs that the browser can't go to, like: javascript:popup() )
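
The three categories above can be sketched roughly as follows. This is only an illustration in Python, not the actual Browser.pm logic: the function name classify_link is hypothetical, and treating "same webserver" as an exact hostname match is an assumption.

```python
from urllib.parse import urljoin, urlparse


def classify_link(page_url, href):
    """Return the category for one link found on page_url (hypothetical helper)."""
    # Resolve relative references such as "2.html" against the current page.
    absolute = urljoin(page_url, href)
    parsed = urlparse(absolute)

    # Anything the browser cannot navigate to over HTTP(S) is ignored,
    # e.g. "javascript:popup()" or "mailto:" links.
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return "links_ignored"

    # Assumption: "same webserver" means an identical hostname.
    if parsed.hostname == urlparse(page_url).hostname:
        return "relative_links_to_visit"

    return "links_to_visit"
```

For instance, calling classify_link("http://www.foo.com/", "http://www.bar.com/index.html") puts the link into links_to_visit, because the hostnames differ.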

So, based upon this understanding, if you were to specify the following:

perl /usr/bin/StartManager.pl --url_list urls.txt --max_relative_links 1

Where urls.txt contained only "http://www.google.com", then you would see the Honeyclient VM visit:

http://www.google.com/

… and then perform an integrity check.

We say the honeyclient VM goes through a single "iteration" when it visits a set of (relative) URLs between integrity checks. In that iteration, once an integrity check is performed, all other relative_links_to_visit are discarded; however, the external links_to_visit are retained for subsequent iterations.

The 'max_relative_links' and 'max_relative_links_to_visit' values are the same — they specify how many relative URLs the honeyclient VM should visit BETWEEN integrity checks.
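
To make the notion of an iteration concrete, here is a small simulation in Python. It is a sketch under assumptions, not the real implementation: run_iteration and the links_on_page dictionary (a stand-in for actually browsing a page) are made up for illustration, and counting the starting URL toward the limit is inferred from the www.google.com example above.

```python
from urllib.parse import urlparse


def run_iteration(start_url, links_on_page, max_relative_links):
    """Simulate one iteration (hypothetical model of the described behavior).

    links_on_page maps a URL to the absolute URLs found on that page.
    Returns (visited, external): the URLs visited before the integrity
    check, and the external links retained for later iterations.
    """
    server = urlparse(start_url).hostname
    relative_queue = [start_url]
    visited = []
    external = []

    # Visit same-server URLs until the limit is reached or none are left.
    while relative_queue and len(visited) < max_relative_links:
        url = relative_queue.pop(0)
        visited.append(url)
        for link in links_on_page.get(url, []):
            if urlparse(link).hostname == server:
                if link not in visited and link not in relative_queue:
                    relative_queue.append(link)
            elif link not in external:
                external.append(link)

    # The integrity check would happen here. Whatever is still in
    # relative_queue is discarded; external links are kept.
    return visited, external
```

With a limit of 1 and a start page that links to one same-server page and one external page, only the start page is visited, the same-server link is dropped, and the external link is carried over to a later iteration.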

As such, when you execute:

perl /usr/bin/StartManager.pl --url_list urls.txt --max_relative_links 1

The Honeyclient VM may perform the following steps:

In this example, the honeyclient VM operations are considered VALID. Technically, the Honeyclient VM visited one relative URL between each iteration.

What you're looking for is functionality where you tell the Honeyclient VM to visit one URL and ONLY one URL per website domain — ever. Unfortunately, we don't have this functionality in our code yet. I'll see if it's possible to easily add this type of functionality, using some separate variable.

Let me know if this makes sense.

Regards,

— Darien

08/30/07 14:20:55 changed by kindlund

Okay, I've fixed an ancillary bug (r806), regarding some of the issues you've mentioned.

For now, don't use:

perl /usr/bin/StartManager.pl --url_list urls.txt --max_relative_links 0

Instead, use:

perl /usr/bin/StartManager.pl --url_list urls.txt --max_relative_links 1

Based upon what you've wanted, I think I know of a couple of different ways to fix the code to get it to do what you want. However, I have some questions which may help decide which solution to implement:

1. Do you want the Honeyclient VM to only visit the URLs listed in urls.txt and no other links?

- OR -

2. Do you want the Honeyclient VM to visit all the URLs listed in urls.txt and ANY other links the Honeyclient VM finds, as long as each link the Honeyclient VM visits has a unique SERVER name? Any subsequent links the Honeyclient VM finds that refer to previously visited servers (e.g., www.google.com) would be ignored.

Regards,

— Darien

09/03/07 04:05:39 changed by Angel <apuigventos@isecauditors.com>

Hi, I think both options are valid. If it is possible to select one or the other in the config file, perfect :)

09/03/07 12:59:08 changed by kindlund

Hi Àngel,

Okay, it will take time to implement some of these features. I'll post to this ticket once new versions have some of these features implemented in them.

Regards,

— Darien

11/29/07 14:08:08 changed by xkovah

  • owner changed from kindlund to xkovah.
  • status changed from assigned to new.

I just ran into this issue as well. I will be trying to fix it since it directly relates to the issue of malicious URL validation. I will also add something like max_absolute_links_to_visit, because it doesn't do any good to set max_relative to 0 if you visit a site like google or youtube which use absolute URLs. Thus, in my case, to only browse to the URLs in my initial URL list, I would set max_relative and max_absolute to 0.

12/18/07 14:21:35 changed by xkovah

  • status changed from new to closed.
  • resolution set to fixed.

apuigventos@isecauditors.com: I have added the "limit_spidering" option to the honeyclient.xml file. If you set it to 1 it will give you the behavior you were looking for (i.e. only going to the 1 site you put in your initial list). If you are using our trunk code you can just do an "svn up" to get the new xml file and relevant changed Browser.pm file. If you are not using our trunk code, you can try getting just the etc/honeyclient.xml and lib/HoneyClient/Agent/Driver/Browser.pm files from trunk and those alone may be sufficient to allow the desired behavior. I have also fixed the behavior for max_relative_links_to_visit. If it is set to 0 it will not add any links from the same site, but will still add links to external sites (hence the need for the limit_spidering option). If you set it to 1 or greater, then it will open up to that many additional links from the same site, after visiting the first site.
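
The combined effect of the two settings described above can be sketched like this in Python. This is an illustration only: links_to_queue is a hypothetical function, not code from Browser.pm, and it assumes that limit_spidering suppresses external links only and leaves same-site links under the control of max_relative_links_to_visit.

```python
def links_to_queue(same_site_links, external_links,
                   max_relative_links_to_visit, limit_spidering):
    """Decide which links found on the first page get queued (hypothetical).

    Returns (relative, external) lists of links to visit later.
    """
    # 0 adds no same-site links; N >= 1 adds up to N additional ones.
    relative = list(same_site_links[:max_relative_links_to_visit])

    # Assumption: limit_spidering=1 drops every link to an external site.
    external = [] if limit_spidering else list(external_links)

    return relative, external
```

Under these assumptions, setting max_relative_links_to_visit to 0 and limit_spidering to 1 queues nothing beyond the URLs in the initial list.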

Note for historical purposes, in case this gets reopened: limit_spidering was added in commit 1080, and the fix for max_relative_links_to_visit behavior was in 1082.

