Ticket #146 (closed bug: fixed)

Opened 5 months ago

Last modified 4 months ago

Manager Can't Process URLs of Length > 1024 Chars From Drone

Reported by: kindlund
Assigned to: kindlund
Priority: normal
Milestone: 1.1
Component: HoneyClient::Manager::Database
Version: 1.02
Severity: none
Keywords: url, length, 1024, limitation, yaml
Cc:

Description

When the Manager fetches URLs from the Drone web service and any of those URLs is longer than 1024 characters, the Manager outputs an error message similar to the following:

Error: YAML::XS::Load Error: The problem:

    could not find expected ':'

was found at document: 1, line: 7, column: 1095
while scanning a simple key at line: 7, column: 1

The problem is with the Ruby-to-Perl communication, specifically the YAML encoding. According to the YAML v1.1 specification for a "simple key":

http://www.yaml.org/spec/1.1/#simple%20key/

A simple key has no identifying mark. It is recognized as being a key either due to being inside a flow mapping, or by being followed by an explicit value. Hence, to avoid unbound lookahead in YAML processors, simple keys are restricted to a single line and must not span more than 1024 stream characters (hence the need for the flow-key context). Note the 1024 character limit is in terms of Unicode characters rather than stream octets, and that it includes the separation following the key itself.

There is a solution in YAML; that is, to use "explicit keys" instead:

http://www.yaml.org/spec/1.1/#explicit%20key/

Apparently, "explicit keys" are not subject to the 1024-character limit. The problem, however, is getting both Perl (YAML::XS) and Ruby to output explicit keys.
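To make the simple-key versus explicit-key distinction concrete, here is a minimal sketch. It assumes a current Ruby whose standard `yaml` library is backed by Psych/libyaml (not the Syck-based library this ticket concerns), and the URL is a made-up placeholder. It parses the same mapping twice: once with the long URL written as a simple key, once written as an explicit key using the `? key` / `: value` form.

```ruby
require 'yaml'

# Hypothetical URL, padded so that it exceeds the 1024-character simple-key limit.
long_url = 'http://example.com/' + ('a' * 1100)

# The same mapping written two ways.
simple_doc   = "count: 1\n#{long_url}: visited\n"     # long URL as a simple key
explicit_doc = "count: 1\n? #{long_url}\n: visited\n" # long URL as an explicit key

# Assumption: the stdlib parser is Psych, which raises Psych::SyntaxError
# when it cannot find the ':' within the simple-key lookahead window.
simple_rejected =
  begin
    YAML.load(simple_doc)
    false
  rescue Psych::SyntaxError => e
    puts "simple key rejected: #{e.message}"
    true
  end

# The explicit-key form carries its own '?' marker, so no lookahead limit applies.
explicit = YAML.load(explicit_doc)
puts "explicit key accepted: #{explicit.key?(long_url)}"
```

The exact error text depends on the parser; the point is only that the second form parses where the first does not.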

Change History

03/07/08 13:06:05 changed by kindlund

Okay, so YAML::XS does the right thing: when a hash reference contains keys/values longer than 1024 chars, it automatically emits them as "explicit keys". The trick now is to see whether Ruby does the same.

03/07/08 14:26:26 changed by kindlund

Okay, after doing some research, it looks like the built-in YAML library for Ruby has some significant limitations. First off, it's supposed to respect the Object.to_yaml( options ) arguments, as described here:

http://yaml4r.sourceforge.net/doc/page/the_options_hash.htm

There's an option there called :ExplicitTypes that you could (theoretically) use to force a hash key/value whose length is > 1024 to use explicit keys. But the options hash gets completely dropped on the floor by the YAML library. Evidence of this behavior is described here:

http://209.85.207.104/search?q=cache:RGDji7LihKgJ:www.arkanis-development.de/weblog/2007/6/20/options-for-rubys-%40to_yaml%40-method+ruby+yaml+ExplicitTypes&hl=en&ct=clnk&cd=1&gl=us

Moreover, after digging around the primary codebase for Yaml4r, I've discovered that this is a known issue:

http://code.whytheluckystiff.net/syck/ticket/19

What's even more worrying is that this issue has been unresolved for 2 years, which is rather stupid, considering that the author simply needed to pass the options hash into the C library (syck) but forgot about it, as listed here:

http://osdir.com/ml/text.yaml.general/2005-09/msg00004.html

In other words, this particular Ruby support (syck) for YAML is "okay", but sucks when you're trying to do anything complex with the data. Furthermore, it doesn't look like it's actively maintained. As a result, I'm switching the codebase to a different YAML parser, specifically one that is written entirely in Ruby (called RbYAML):

http://rubyforge.org/projects/rbyaml

After some testing, it appears that this parser correctly emits explicit keys when the key/value size is > 1024 characters. It should be a drop-in replacement, but I'll need to update the Drone User Guide so that people install the RbYAML gem.
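The kind of check described above can be sketched as follows. This is not the test that was actually run for this ticket: it uses the standard `yaml` library of a current Ruby (assumed to be Psych-backed) as a stand-in emitter, and the helper names and URL are hypothetical. The emitter is passed in as a lambda so that a different library's dump call could be substituted.

```ruby
require 'yaml'

# Hypothetical helper: true when the emitter wrote the key in explicit form,
# i.e. some output line starts with the '? ' indicator.
def emits_explicit_key?(dumper, key)
  text = dumper.call({ key => 'visited' })
  text.lines.any? { |line| line.start_with?('? ') }
end

# Hypothetical helper: true when the emitted text parses back to the same hash.
def round_trips?(dumper, key)
  YAML.load(dumper.call({ key => 'visited' })) == { key => 'visited' }
end

# Stand-in emitter (assumption: stdlib YAML, not Syck or RbYAML).
stdlib_dump = ->(obj) { YAML.dump(obj) }

# Made-up URL longer than 1024 characters.
long_url = 'http://example.com/' + ('a' * 1100)

puts "explicit key emitted: #{emits_explicit_key?(stdlib_dump, long_url)}"
puts "round trip intact:    #{round_trips?(stdlib_dump, long_url)}"
```

Checking the round trip as well as the output form matters here, since the emitted text still has to be readable by the parser on the other side.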

More soon.

03/07/08 14:34:38 changed by kindlund

  • status changed from new to closed.
  • resolution set to fixed.

Fixed in r1342. Updated Drone User Guide accordingly.

03/07/08 15:06:51 changed by kindlund

  • status changed from closed to reopened.
  • resolution deleted.

Argh. Possible regression introduced. Noticing the following errors on the Perl side:

2008-03-07 15:05:19  INFO [HoneyClient::Manager::runSession] (lib/HoneyClient/Manager.pm:826) - Saving URL History to Database.
2008-03-07 15:05:19 ERROR [HoneyClient::Manager::Database::_AUTOLOAD] (lib/HoneyClient/Manager/Database.pm:256) - Error: undefined method `anchor' for RbYAML::MappingEndEvent():RbYAML::MappingEndEvent
Error: Error: undefined method `anchor' for RbYAML::MappingEndEvent():RbYAML::MappingEndEvent at lib/HoneyClient/Manager.pm line 925

Investigating further…

03/07/08 15:31:00 changed by kindlund

  • status changed from reopened to closed.
  • resolution set to fixed.

Okay, so apparently RbYAML.load() is buggy as hell when it comes to handling complex data types. So, annoyingly, we'll have to use YAML.load() for parsing and RbYAML.dump() for emitting.

This is reflected in r1344.
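As a rough illustration of splitting parsing and emitting between two libraries, here is a sketch with a hypothetical wrapper class (it is not the code from r1344). Both sides default to the standard `yaml` library so the example runs on its own; under the fix described above, the dumper lambda would call RbYAML.dump instead.

```ruby
require 'yaml'

# Hypothetical wrapper: one callable parses, a different one emits.
class SplitYamlCodec
  def initialize(loader:, dumper:)
    @loader = loader
    @dumper = dumper
  end

  # Parse YAML text with the configured loader.
  def load(text)
    @loader.call(text)
  end

  # Serialize an object with the configured dumper.
  def dump(obj)
    @dumper.call(obj)
  end
end

codec = SplitYamlCodec.new(
  loader: ->(text) { YAML.load(text) },
  dumper: ->(obj)  { YAML.dump(obj) } # stand-in; the ticket's fix uses RbYAML.dump here
)

# Made-up payload for the round trip.
payload  = { 'url' => 'http://example.com/', 'status' => 'visited' }
restored = codec.load(codec.dump(payload))
puts "round trip intact: #{restored == payload}"
```

Keeping the two directions behind one object means either side can be swapped later without touching callers.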

Eventually, we may want to look at something other than YAML for these datatypes.

