Ticket #59 (closed bug: fixed)

Opened 1 year ago

Last modified 1 year ago

Agent crash

Reported by: mike@shadowserver.org Assigned to: kindlund
Priority: normal Milestone: 0.9
Component: HoneyClient::Agent Version: none
Severity: none Keywords: agent, vm, process
Cc:

Description (Last modified by kindlund)

I got the honeyclient up and running and set up a test with a URL list containing the following URLs:

www.google.com
www.cnn.com
www.slashdot.org

It processed google with no problems, but has now crashed twice at the same point on cnn. In the ssh window, where I ran the honeyclient command, I see this:

Calling getStatus()...
HoneyClient::Manager->_handleFault(): Error occurred during processing.
500 Can't connect to 10.0.0.137:9000 (connect: Connection refused) at /usr/share/perl5/SOAP/Lite.pm line 3387
Result:
Resetting firewall...
Cannot encode 'sources' element as 'hash'. Will be encoded as 'map' instead
Cannot encode 'value' element as 'hash'. Will be encoded as 'map' instead
Cannot encode 'sources' element as 'hash'. Will be encoded as 'map' instead
Cannot encode 'value' element as 'hash'. Will be encoded as 'map' instead

And then in the VM, I see this (I've included as much log as I could in the attached file, but I think this is the relevant section) and it just hangs there:

Using Resource: http://rss.cnn.com/rss/cnn_topstories.rss
Performing Integrity Checks...
Watchdog fault detected, recovering Agent daemon.
Can't create daemon: Address already in use at /usr/lib/perl5/site_perl/5.8/SOAP/Transport/HTTP.pm line 463
        SOAP::Transport::HTTP::Daemon::new('SOAP::Transport::HTTP::Daemon', 'LocalAddr', 0.0.0.0, 'LocalPort', 9000, 'Reuse', 1) called at /usr/lib/perl5/site_perl/5.8/HoneyClient/Util/SOAP.pm line 361
        HoneyClient::Util::SOAP::getServerHandle('address', 0.0.0.0, 'port', 9000) called at /usr/lib/perl5/site_perl/5.8/HoneyClient/Agent.pm line 423
        HoneyClient::Agent::init('HoneyClient::Agent') called at /usr/bin/StartAgent.pl line 46
        main::_watchdogFaultHandler('SOAP::Lite=HASH(0x10f8fe34)', '\x{a}syntax error at line 1, column 0, byte 0 at /usr/lib/perl5/v...') called at /usr/lib/perl5/site_perl/5.8/SOAP/Lite.pm line 3412
        SOAP::Lite::call('SOAP::Lite=HASH(0x10f8fe34)', 'getState') called at /usr/lib/perl5/site_perl/5.8/SOAP/Lite.pm line 3377
        SOAP::Lite::__ANON__('SOAP::Lite=HASH(0x10f8fe34)') called at /usr/bin/StartAgent.pl line 59

Attachments

Change History

04/18/07 21:30:11 changed by mike@shadowserver.org

The attachment was marked as spam, so the tracker wouldn't let me attach it. I can email it directly if it's needed.

04/18/07 21:48:10 changed by kindlund

  • keywords set to agent, vm, process.
  • status changed from new to assigned.
  • description changed.
  • milestone set to 0.9.

Hi Mike,

From the status information you've provided, it looks like the Agent daemon code running inside the Agent VM has hung, and the watchdog integrity check can't restart the Agent daemon because an already running perl process still holds port 9000. Any new Agent daemon that starts inside the VM will fail to run for as long as port 9000 is in use.

To fix the immediate issue, inside the Agent VM, load up the task manager and kill any other perl.exe processes. That should allow subsequent Agent daemons to load.
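As a quick way to confirm that something is still listening on the Agent port before starting a new daemon, a check along these lines could be run inside the VM. This is only a sketch in Python, not part of the honeyclient code; the function name `port_in_use` is made up for illustration, and port 9000 is taken from the log output above.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is accepting TCP connections on host:port.

    Hypothetical helper for illustration; not part of HoneyClient.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


# Example: port_in_use(9000) should be False once the stale perl.exe is gone.
```

If this reports the port as in use while no Agent daemon is expected to be running, a leftover perl.exe process is the likely culprit.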

I'd like more information about how the problem came about, and I suspect that's what you were trying to post to the ticket. In case that doesn't work, please email me the attachment (kindlund at mitre.org).

We've run into this issue before sporadically, and we're still trying to resolve it, but we've found it difficult to reproduce. If you're able to identify the exact URLs where the Agent daemon initially hangs, that would be very helpful.

Thanks, Darien

04/18/07 21:59:50 changed by kindlund

One other thing to note. Our watchdog process basically sets a countdown timer, and expects the integrity check to not exceed 1800 seconds (30 mins). Thus, if the integrity checker is unable to complete in that time, the watchdog considers the Agent daemon to be hung and tries to kill the perl.exe process, creating a new Agent daemon and moving on.
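To make the countdown behaviour concrete, here is a minimal sketch of the general watchdog pattern in Python. It is not the actual HoneyClient watchdog (which is Perl and restarts the daemon process); `run_with_watchdog`, `check`, and `on_hang` are hypothetical names, and `on_hang` merely stands in for "kill perl.exe and start a new Agent daemon".

```python
import threading


def run_with_watchdog(check, timeout_s, on_hang):
    """Run check() in a worker thread and wait at most timeout_s seconds.

    Returns True if check() finished in time. Otherwise calls on_hang()
    and returns False. All names here are illustrative assumptions.
    """
    done = threading.Event()

    def worker():
        check()
        done.set()

    # Daemon thread: a hung check does not block interpreter exit.
    threading.Thread(target=worker, daemon=True).start()

    if done.wait(timeout_s):
        return True
    on_hang()
    return False
```

With `timeout_s=1800`, a check that takes 31 minutes on a loaded host would be treated as hung even though it would eventually have finished, which is why raising the timeout can help.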

Thus, if you start running the honeyclient system on a VMware Server with no load, it may work fine. But if you then introduce external load on the system outside of the honeyclient code (e.g., large amounts of CPU usage or disk activity, or anything else that could slow down the Agent VM and all of its internal processes), it's possible that the integrity checking could take longer than 30 mins, since the Host system is busy with other activity.

Basically, try increasing the timeout value in your honeyclient.xml file (within the Agent VM filesystem), as listed here:

<HoneyClient>
    <timeout>1800</timeout>
</HoneyClient>
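To double-check which value is actually in the file after editing it, the setting can be read back with a few lines of script. The sketch below is Python and assumes the `<timeout>` element sits directly under `<HoneyClient>`, exactly as in the fragment above; the real honeyclient.xml may nest it differently, and `read_timeout` is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def read_timeout(xml_text: str, default: int = 1800) -> int:
    """Return the <timeout> value in seconds, or default if it is absent.

    Assumes <timeout> is a direct child of the root element, as in the
    fragment shown in this ticket.
    """
    root = ET.fromstring(xml_text)
    node = root.find("timeout")
    if node is None or node.text is None:
        return default
    return int(node.text.strip())


# Example: doubling the limit from 30 to 60 minutes.
sample = "<HoneyClient><timeout>3600</timeout></HoneyClient>"
print(read_timeout(sample))
```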

If the problem persists, then we can try to enable further debugging.

Regards, Darien

06/20/07 13:35:05 changed by kindlund

  • status changed from assigned to closed.
  • resolution set to fixed.

Closing ticket, since I have not received any new reports.

