Brendon Jones's blog
Finished up the graphs for the simple dashboard to show when events took
place over the last day, and which sites/attributes were involved in the
largest number of events. More detailed information about event groups in
the last day is also shown, much as it was in the original dashboard
(expandable, with links to the graphs).
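The "largest number of events" summary is essentially a count per site over a 24 hour window. Here is a minimal Python sketch of that idea. The `(timestamp, site)` event shape and the `top_sites` name are hypothetical and are not taken from the dashboard code.

```python
from collections import Counter
from typing import Iterable, List, Tuple

DAY = 24 * 60 * 60  # seconds in one day

def top_sites(events: Iterable[Tuple[int, str]], now: int,
              n: int = 5) -> List[Tuple[str, int]]:
    """Return the n sites involved in the most events during the last day.

    Each event is a hypothetical (unix_timestamp, site_name) pair; events
    outside the [now - DAY, now] window are ignored.
    """
    counts = Counter(site for ts, site in events if now - DAY <= ts <= now)
    return counts.most_common(n)
```

The same counting works for attributes instead of sites by swapping the key pulled out of each event.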
Got the initial debianisation done for NNTSC. Packages can now be built
that install all the files into the correct places, which should make
distributing it a lot easier - just a matter of correctly editing the
configuration file before starting (we can't guess this automatically).
Made a fix to the graphing library we are using to get selections working
properly. We had identified the problem a while ago, but with the bug
already reported and no fix coming from upstream, we decided to investigate
it ourselves. Applying the fix suggested by the bug reporter appears to
solve our problems, with no obvious side effects.
Tidied up some AMP documentation describing the behaviour of schedules and
nametables, bringing it up to date with the current behaviour. Added
documented configuration files with some basic examples to the
Tested and merged the new web10g throughput test written by Richard into
the new amplet code. Had to change a few minor things to build cleanly
with the structural changes I've made recently, but otherwise it
integrated fine. Also made a few minor changes to some of the
language/descriptions of how schedules work to tidy up some confusing
portions included from the original test (with regards to packet size,
byte counts, etc).
Started porting the event dashboard from the original PHP to fit into the
nice Python framework the rest of the web interface uses, as well as
properly using the API to fetch data rather than accessing the database.
Finished up merging events and properly marking them on the time series
graphs. Need to keep an eye on them now and see how they work in practice
with actual event frequencies and use cases.
Sat down with Shane and had a think about where we are with the
NNTSC/eventing/alerting/etc code and what needs to be done to have it in a
usable state for Lightwire to test. Started planning the path towards that
milestone and creating tickets to do so.
Made lots of small fixes to the new amplet code to remove dead code,
install files in more sensible (and consistent) locations, improve user
control of logging and improve the quality of output from the ICMP/DNS
tests. After talking with Brad about machine provisioning systems, Puppet,
remote management, etc., I decided it was a good idea to move schedule
configuration into a directory rather than using a single file, similar to
how Apache, cron and others can work.
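Splitting the schedule across a directory means a provisioning tool like Puppet can drop in or remove one file per test without editing a shared file. Below is a minimal sketch of loading such a directory. It assumes one schedule entry per line and `#` comments, both of which are assumptions rather than the real amplet schedule format, and the function names are hypothetical.

```python
import os
from typing import List

def schedule_files(directory: str) -> List[str]:
    """List schedule files in a config directory, in a stable order.

    Hidden files and editor backups (ending in '~') are skipped, similar in
    spirit to how cron treats the files it finds in /etc/cron.d.
    """
    names = sorted(
        name for name in os.listdir(directory)
        if not name.startswith(".") and not name.endswith("~")
        and os.path.isfile(os.path.join(directory, name))
    )
    return [os.path.join(directory, name) for name in names]

def load_schedule(directory: str) -> List[str]:
    """Collect the non-blank, non-comment lines of every schedule file."""
    entries = []
    for path in schedule_files(directory):
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#"):
                    entries.append(line)
    return entries
```

Sorting the filenames keeps the load order predictable, which matters if later files are ever allowed to override earlier ones.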
Actual events are now marked on the graph with proper descriptions, all
fetched via the event API Shane added the other week. Having briefly used
the new event system, I've seen a few improvements to be made when there
are a large number of events around the same time, so I've started to
merge them to make a less cluttered display (while aiming to not lose any
useful information).
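The merging amounts to clustering events that are close together in time. Here is a rough sketch that assumes events reduce to Unix timestamps and uses a hypothetical `gap` threshold in seconds. Keeping a count per group is one way to avoid losing information when several events collapse into a single marker.

```python
from typing import List, Tuple

def merge_events(timestamps: List[int], gap: int) -> List[Tuple[int, int, int]]:
    """Group event timestamps that fall within `gap` seconds of each other.

    Returns (first_ts, last_ts, count) for each group, so a cluster of events
    can be drawn as one marker while still recording how many it represents.
    """
    groups: List[Tuple[int, int, int]] = []
    for ts in sorted(timestamps):
        if groups and ts - groups[-1][1] <= gap:
            # Close enough to the previous group's latest event: extend it.
            first, _, count = groups[-1]
            groups[-1] = (first, ts, count + 1)
        else:
            # Too far from the previous group: start a new one.
            groups.append((ts, ts, 1))
    return groups
```

For example, `merge_events([100, 130, 500, 160, 520], 60)` gives `[(100, 160, 3), (500, 520, 2)]`.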
Spent some time getting amplets reporting data properly after renumbering
erg. Some of our firewall rules are perhaps a bit too restrictive and DNS
under-used, which made this harder than it should have been. Hopefully we
learnt some useful lessons here that we can apply as we refresh machines.
Spent the week working on graphs for the AMP web interface. Added
colouring based on loss to the smokeping graphs and made sure that the
y-axis is always long enough to accommodate the smokey peaks rather than
following the median line.
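Both graph changes are simple to express in isolation. The sketch below assumes per-bin sent/received probe counts and a list of raw latency measurements per bin. The thresholds and colour names are made up for illustration and are not the ones used in the AMP web interface.

```python
from typing import Sequence

# Hypothetical loss thresholds (fraction of probes lost) and colours,
# in the spirit of smokeping's loss shading but not its actual values.
LOSS_COLOURS = [
    (0.0, "green"),
    (0.05, "blue"),
    (0.20, "orange"),
    (1.0, "red"),
]

def loss_colour(sent: int, received: int) -> str:
    """Pick a colour for a bin based on the fraction of probes lost."""
    loss = (sent - received) / sent if sent > 0 else 1.0
    for threshold, colour in LOSS_COLOURS:
        if loss <= threshold:
            return colour
    return LOSS_COLOURS[-1][1]

def y_axis_max(bins: Sequence[Sequence[float]], headroom: float = 1.1) -> float:
    """Size the y-axis to the largest individual measurement (the smoke),
    not just the median, so peaks are never clipped."""
    peak = max((max(b) for b in bins if b), default=0.0)
    return peak * headroom
```

Sizing from the maximum of every measurement, rather than the median line, is what keeps the smokey peaks on the graph.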
Started to work on an event graph type that can be added to any of the
existing time series graphs. It currently draws hardcoded events as
vertical bars on the graph with mouse highlighting and tooltips describing
the event. Spoke to Shane about the event API and started to implement the
code to fetch events dynamically based on the graph.
Spent a little bit of time getting the wrampsim software working in the
student labs again. Something in the layout code appears to have changed,
which was causing elements to be rendered in the wrong place.
Finished up the Python decoder for the HTTP test to properly extract
results from the reported data.
Got most of the core functionality in place to create smokeping-style
time series graphs to display data we've collected from smokeping sources.
Once I have feature parity I hope to start marking events on the graphs,
as well as extending them to be used with other data sources (for example,
this would be really cool to use with AMP latency tests).
Got enough of the Debian packaging sorted for new AMP to have it build and
install the client software, along with some basic default configuration
files, man pages, init scripts, etc. Updated some of the build scripts to
use configurable file locations rather than hard-coded ones, so that files
end up in the right (but different) places both when building manually and
when building the package.
Also had to build libwandevent packages for Debian to install new AMP. Got
both installed on prophet using the new packages, and it is now running
again, collecting data.
Added syslog support to the AMP logging and built an rsyslogd config file
to redirect messages to the appropriate file.
Added proper reporting to the HTTP test so that programs reading from the
RabbitMQ queue can do useful things with the results. Wrote most of a
Python decoder for the test so that Shane can store the data.
Had a chat to Richard Sanger about adding web10g data to the AMP
throughput test and got him set up with the current code.
Finished porting the main part of the HTTP test to the new AMP. It runs
fine as a standalone program, but I still need to add the new style of
reporting to be able to send test data back to the collector.
Started packaging the new AMPlet code for Debian to make installation and
distribution easier for testing. It's a little bit more involved than I
thought it would be, as the build process has become a lot more automagic
since I last created a package from scratch.
Worked with Shane to get smokeping data exported to the web API to graph.
In doing so, found and fixed a few issues with the binned timestamps that
meant the data always looked old, even when it was up to date.
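One common way binned timestamps make fresh data look stale is keying each bin by its aligned start time, so the newest bin always appears up to a full bin width in the past. This sketch assumes fixed-width bins aligned to multiples of the bin size and contrasts that approach with keying each bin by the newest sample it contains. It only illustrates the general pitfall; the actual issue and fix in the web API may have differed, and `bin_measurements` is a hypothetical name.

```python
from typing import Dict, List, Tuple

def bin_measurements(samples: List[Tuple[int, float]], binsize: int,
                     label: str = "latest") -> Dict[int, List[float]]:
    """Group (timestamp, value) samples into fixed-width bins.

    With label="start" each bin is keyed by its aligned start time, so the
    newest bin can look up to `binsize` seconds old. With label="latest"
    the key is the newest timestamp actually seen in that bin.
    """
    bins: Dict[int, List[Tuple[int, float]]] = {}
    for ts, value in samples:
        start = ts - ts % binsize  # align to a multiple of binsize
        bins.setdefault(start, []).append((ts, value))
    result: Dict[int, List[float]] = {}
    for start in sorted(bins):
        members = bins[start]
        key = start if label == "start" else max(ts for ts, _ in members)
        result[key] = [v for _, v in members]
    return result
```

With 300 second bins, a sample taken at 1190 is reported at 900 under start labelling but at 1190 under latest labelling.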
Started working on adding a new graph type to properly show smokeping data
(and any other data type where we have plenty of measurements, such as the
AMP latency tests). While doing this I found that the graphs would never
quite show the most recent data if I zoomed in with the select box, so I
started to investigate that. I don't want to jump to conclusions
and attribute the error to the library, but it appears that envision.js is
possibly off by one when calculating the selected area. This, combined with
the fairly aggressive binning of the summary graph, means we lose a fair
chunk of data off the end (one bin). Will need to look more into this and
see if it isn't actually a problem with the data or its presentation.
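To see why an off-by-one in the selection costs exactly one bin, here is a small hypothetical model of mapping a selected time range onto summary bins. It says nothing definite about envision.js internals. It only shows that using the index of the last selected bin as an exclusive slice bound drops that bin.

```python
from typing import List

def selected_bins(bins: List[float], origin: int, binsize: int,
                  sel_start: int, sel_end: int,
                  inclusive_end: bool = True) -> List[float]:
    """Return the bin values covered by the selection [sel_start, sel_end].

    Bin i covers [origin + i*binsize, origin + (i+1)*binsize). The bin that
    holds sel_end must be included, so the slice stops at last + 1. Using
    `last` itself as the exclusive bound (inclusive_end=False) loses it.
    """
    first = max(0, (sel_start - origin) // binsize)
    last = min(len(bins) - 1, (sel_end - origin) // binsize)
    stop = last + 1 if inclusive_end else last
    return bins[first:stop]
```

With wide summary bins, that single dropped bin covers a large slice of the most recent data, which matches the symptom of the graphs never quite reaching the latest measurements.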
Made a few small updates to AMP for the NLNOG RING and got some good
feedback from them and others who are using AMP.
Finished up porting the traceroute test to new AMP, as well as writing the
supporting modules that allow for reading and decoding result data coming
off the broker. Wrote an importer for NNTSC to deal with the data and
properly save it into the database. Started work on porting the HTTP test.
Built a minimal scamper RPM to use with the AMP RPMs on our test
deployment on a perfSONAR node.