Matt,
Now that we have the honeyclient/drone system operational, it looks like we're going to need to keep track of where our URLs originate, in order to help (eventually) with our notification system.
Here's a first-shot idea; please let me know what you think before you implement it…
Create a new table, called "url_sources". This table has 2 columns: "id" and "name". Sample data could be:
1 "command-line"
2 "email"
3 "proxy logs"
4 "drone webservice submissions"
The "queue_urls" and "history_urls" tables each then get a new "url_source_id" column, which refers to the "id" column in "url_sources".
Right now, URLs are inserted using the insert_queue_urls webservice call, like:
$urls = {
    'http://www.foo.com' => 1,
    'http://www.bar.com' => 2,
};
...insert_queue_urls($urls);
We would change this call to look like this:
$urls = {
    'command-line' => {
        'http://www.foo.com' => 1,
    },
    'proxy logs' => {
        'http://www.bar.com' => 2,
    },
};
...insert_queue_urls($urls);
… where each top-level hash key names the source the URLs beneath it came from. If a new data source is defined, we simply add a row for it to the "url_sources" table, very similar to how we update the "hosts" table.
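To make the idea concrete, here is a rough sketch of how the receiving side might turn the nested hash into one row per URL, each tagged with its "url_source_id". This is only an illustration, not the actual insert_queue_urls code: the function name rows_from_sourced_urls and the %url_source_id_for hash are made up for this example, and the hash is just an in-memory stand-in for a lookup against the "url_sources" table. Whatever the per-URL value means today, it is carried through unchanged.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the "url_sources" table (name => id).
# The real code would presumably look these up in the database instead.
my %url_source_id_for = (
    'command-line'                 => 1,
    'email'                        => 2,
    'proxy logs'                   => 3,
    'drone webservice submissions' => 4,
);

# Flatten { source_name => { url => value } } into one row per URL,
# each carrying its url_source_id. Dies if a source name is not known.
sub rows_from_sourced_urls {
    my ($urls) = @_;
    my @rows;
    for my $source (sort keys %$urls) {
        my $source_id = $url_source_id_for{$source};
        die "Unknown URL source: '$source'\n" unless defined $source_id;
        for my $url (sort keys %{ $urls->{$source} }) {
            push @rows, {
                url           => $url,
                value         => $urls->{$source}{$url},  # same per-URL value as today
                url_source_id => $source_id,
            };
        }
    }
    return \@rows;
}

# Example using the same data as the proposed call above.
my $urls = {
    'command-line' => { 'http://www.foo.com' => 1 },
    'proxy logs'   => { 'http://www.bar.com' => 2 },
};
my $rows = rows_from_sourced_urls($urls);
printf "%s (value %s) -> url_source_id %d\n",
    @{$_}{qw(url value url_source_id)} for @$rows;
```

Failing loudly on an unknown source name is a design choice in this sketch; the alternative would be to add the missing "url_sources" row automatically, which may or may not be what we want.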
Let me know if this makes sense.
— Darien