Broadband connection improvements — avoiding DNS-interception and “buffer bloat”

This whole saga started when I read an Ars Technica article called “Small ISPs use ‘malicious’ DNS servers to watch web searches, earn cash.”  Here’s the lede that got my attention:

Nearly 2 percent of all US Internet users suffer from “malicious” domain name system (DNS) servers that don’t properly turn website names like google.com into the IP addresses computers need to communicate on the ‘Net. And, to make matters worse, the problem isn’t caused by hackers or malware, but by the local ISPs people pay for access to the Internet.

As I read more about this issue, I came across the ICSI Netalyzr, a nifty network-diagnostic tool that tests a bunch of dimensions of a broadband connection and will detect this DNS-interception if it’s happening.  The good news is that none of my broadband connections have this problem.  BUT, the Netalyzr did discover another problem called “buffer bloat” on my connection at the farm, which explains some of the erratic network behavior here.  The rest of this post is the saga of a delightful geek project to get this fixed, plus documentation to remind me what I did and some goodies for anybody who’d like to follow along.
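For anybody who wants a quick do-it-yourself check before running the full Netalyzr, here is a minimal Python sketch of one narrow test: ask the resolver for a name that cannot exist and see whether an address comes back anyway.  This is my own illustration, not how Netalyzr works internally, and it only catches resolvers that rewrite “no such domain” answers; the search-redirection described in the article could slip right past it.  The function name and the injectable `resolve` parameter are made up for this example.

```python
import socket
import uuid


def resolver_rewrites_nxdomain(resolve=socket.gethostbyname):
    """Return True if a name that cannot exist still resolves to an address."""
    # ".invalid" is a reserved top-level domain that must never resolve;
    # the random label keeps any cached answer out of the picture.
    name = f"{uuid.uuid4().hex}.invalid"
    try:
        resolve(name)
    except OSError:
        # socket.gaierror is a subclass of OSError: the lookup failed,
        # which is the correct behavior for a nonexistent name.
        return False
    return True
```

Calling `resolver_rewrites_nxdomain()` with no arguments uses whatever resolver the computer is configured with; a result of `True` is a hint that something between you and the real DNS is making up answers.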

Buffer-bloat mitigation — Background

First up: what is “buffer bloat”?  I came across a post by Jim Gettys called “Mitigations and Solutions of Bufferbloat in Home Routers and Operating Systems” which is mostly focused on a strategy to fix the problem (and is the basis for the stuff I’ve done here).  Fersure read this post, but if you’re a geek who’s interested in understanding what the problem is, also read the “surrounding” posts on his blog.  I’m left pretty completely in the dust by the technical discussion, but I follow it enough to share Jim’s concern that this could become a really interesting puzzle.

The short version of what I’m doing with this project is to protect the Internet from my over-eager home computers by putting my own traffic meter (just like the one at a freeway on-ramp) on my Internet connection.  I will tell you true: taking 10-20% off the top speed of my Internet connection makes it “feel” a WHOLE LOT faster.  Formerly unusable video streaming (Vimeo streams were the worst, but YouTube was pretty crummy too) is now just fine.  My VoIP phone service from Vonage is now rock solid even when we’re doing lots of other uploading/downloading, etc.  I like it a lot and based on this experience I’m going to do the same thing at my other connections as well.
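To make the on-ramp meter idea a little more concrete, here is a toy Python sketch of a token bucket, one common way to build a traffic meter.  I’m not claiming this is what the router software does under the hood; the class name, the rate and the burst size are all invented for illustration.  The point is just that packets only get through as fast as tokens trickle into the bucket, so the queue builds up in a place I control instead of in a big dumb buffer somewhere upstream.

```python
class TokenBucket:
    """Allow traffic at `rate_bps` on average, with bursts up to `burst_bits`."""

    def __init__(self, rate_bps, burst_bits):
        self.rate_bps = rate_bps
        self.burst_bits = burst_bits
        self.tokens = burst_bits  # start with a full bucket
        self.last = 0.0           # time of the last update, in seconds

    def allow(self, packet_bits, now):
        """Return True if a packet of `packet_bits` may be sent at time `now`."""
        # Refill for the time that has passed, never beyond the bucket size.
        elapsed = now - self.last
        self.tokens = min(self.burst_bits, self.tokens + elapsed * self.rate_bps)
        self.last = now
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False  # over the limit: the caller should hold or drop the packet
```

With a hypothetical `TokenBucket(500_000, 12_000)`, one 1500-byte packet (12,000 bits) goes through right away, a second one at the same instant is held back, and the next is allowed once enough time has passed for the bucket to refill.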

Ingredients — a new router and Gargoyle

I have been interested in the idea of putting open-source software on a consumer router for a long time, but hadn’t had a good reason until I read Jim’s piece.  Unfortunately, the Apple AirPort Extreme sitting in the basement isn’t on the list of routers that can be treated that way (and, interestingly, also doesn’t provide any bandwidth-shaping capability).  So it was off to the Gargoyle site to do some shopping for a new router, one that would be a good target for an upgrade to Gargoyle.   I wound up getting a TP-Link TL-WR1043ND because it’s cool looking with its 3 antennae and has lots of CPU-horsepower and memory so performance was likely to be spiffy.

Installation tips

It’s always a little nerve-wracking to venture into a whole new realm of activity for me, so I took it pretty slow and easy on the actual set-up process.  I set the new router up with a completely “standard” configuration and ran it that way for a day or two before getting into the exciting Gargoyle stuff.  One thing that interested me was that the TP-Link router software had bandwidth-shaping capability already and I wanted to see if I could mitigate the buffer-bloat just using that.  That didn’t work — see “Tests” below — but it provided some good entertainment for a day or two, running the tests.  Here’s what I did after that:

  • Upgrade the router software.  I went out to TP-Link’s web site and pulled down the latest version of the WR1043ND firmware and updated the firmware in the router to the current release.  This had the added bonus of providing me with a “factory” copy of firmware if I needed to fall back from the Gargoyle software.
  • Install Gargoyle on the router.  I followed these instructions for loading Gargoyle on the WR1043ND that are published on the Gargoyle site.  There are two things to note.  The first is that those drop-down menus aren’t really drop-down menus, they’re just pictures of them.  To actually get the software, follow your nose through the download section until you get to the place that’s described by those graphics.  But here’s the other note — the graphics are a little old and point to the 1.3.14 version of Gargoyle — I jumped ahead to the 1.3.16 version and it’s been fine (for the big 24 hours that I’ve been running it).  The rest of the installation went without a hitch — I used the “firmware upgrade” function on the standard software, pointed at the Gargoyle file I’d just downloaded, had a couple sips of a beverage and the router rebooted itself into Gargoyle.
  • Test the “fallback to factory software” scenario.  Before messing around with Gargoyle, I tested rolling the router back to a standard configuration.  I used the slightly-modified “factory” software from the Gargoyle page, ran it through Gargoyle’s “Update Firmware” process and scared the heck out of myself when the upgrade didn’t complete.  I thought I’d turned the router into a brick — but it turns out that the web-interface just isn’t smart enough to know that the router has rebooted itself.  I logged back into the router and found factory screens rather than Gargoyle screens.  Whew.  Then I upgraded the software to the software I’d downloaded from TP-Link and got myself back to a completely-factory router again.  Once I’d gotten through all that I repeated the process of loading Gargoyle on the router and that’s where it sits today.
  • By the way, Gargoyle’s default password is “password”, not the typical “admin” — just a note to save time the next time I upgrade the firmware.
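Since the web interface can’t tell when the router has finished rebooting, a few lines of Python can do the waiting instead of my nerves.  This is a minimal sketch that assumes the router’s web interface answers on TCP port 80; the function name, the timings and the 192.168.1.1 address mentioned below are my assumptions, so substitute whatever your router actually uses.

```python
import socket
import time


def wait_for_router(host, port=80, timeout_s=300.0, interval_s=5.0):
    """Poll until a TCP connection to host:port succeeds; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means the web interface is listening again.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            pass  # still rebooting (refused, unreachable, or timed out)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Something like `wait_for_router("192.168.1.1")` right after starting the firmware update will sit quietly until the router is back, at which point it’s safe to log in again and see which screens turn up.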

Tests and observations

One nice thing about Netalyzr is that it leaves permanent copies of the results out on the net so’s you can refer anybody to them.  Here’s the series of tests I ran at the farm.  Unfortunately, I forgot to capture the permalinks of the very first couple of tests (with the Apple router and the WR1043ND in default configurations).  Dang.  So I’ll skip forward to a series of tests running the Gargoyle software on the new router.

1st test — New router, Gargoyle software, default configuration, QoS turned off.  Note the red-bordered part of the results, which shows 5400ms of buffering on the uplink and 509ms on the downlink.  This is bad, and it’s what got me started on this project in the first place.
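To get a feel for what a number like that means, the arithmetic is simple: the delay is just the amount of data sitting in the queue divided by the speed of the link.  Here is a small Python sketch of that relationship.  The 500 kbit/s link speed in the example is an assumption for illustration only, not a measured value, and the function names are my own.

```python
def queue_delay_ms(buffered_bytes, link_kbps):
    """Milliseconds needed to drain `buffered_bytes` over a `link_kbps` link."""
    # bits divided by (kilobits per second) comes out in milliseconds
    return buffered_bytes * 8 / link_kbps


def implied_buffer_bytes(delay_ms, link_kbps):
    """Amount of queued data that would cause `delay_ms` on a `link_kbps` link."""
    return delay_ms * link_kbps / 8
```

Under that assumed speed, `implied_buffer_bytes(5400, 500)` works out to 337,500 bytes, so a few hundred kilobytes of queued data is enough to make every new packet wait more than five seconds for its turn.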

2nd test — New router, Gargoyle software, default configuration, QoS turned on.   Buffer-bloat is dramatically lower: uplink is 220ms and downlink is 440ms.  BUT, there’s a cost.  The default settings in Gargoyle limit the speed of the connection to 300k upstream and 3000k downstream, which is almost cutting the bandwidth in half.  On the other hand, it proves that buffering can be mitigated.

3rd test — New router, Gargoyle software, bandwidth QoS settings increased to 500k upstream x 5000k downstream, QoS turned on.  Uplink buffering remains around 220ms (same as before, which is good), downstream buffering is starting to creep up at 680ms.  This is where I’ve left it for now; more experimentation is to follow, but this gives you a sense of the thing.  Upstream buffering is less than half what it was, downstream buffering is reduced almost ten-fold.
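For anybody following along at home, the “take 10-20% off the top” rule from earlier is easy to turn into starting values for the QoS limits.  This is a minimal Python sketch of that rule of thumb; the function name, the 15% default and the measured speeds in the example are hypothetical, so plug in whatever a speed test reports for your own line and then adjust by experiment.

```python
def qos_caps(measured_up_kbps, measured_down_kbps, headroom=0.15):
    """Return (up, down) QoS limits in kbit/s, set `headroom` below measured speed."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be at least 0 and less than 1")
    factor = 1 - headroom
    return round(measured_up_kbps * factor), round(measured_down_kbps * factor)
```

For a line that hypothetically measures 600k up and 6000k down, `qos_caps(600, 6000)` suggests 510k and 5100k, and `qos_caps(600, 6000, headroom=0.2)` suggests 480k and 4800k.  Treat those as a first guess and re-run the Netalyzr after each change.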

IPHouse test — You want to see a perfect score on the Netalyzr test?  I ran the test from my little server over at IPHouse.  Perfection — no flags at all.  What else would you expect from IPHouse?  It proves that you CAN configure a network correctly and eliminate buffer-bloat.

So there you have it.  The “real world” results are still coming in, but so far the connection here at the farm “feels” more solid.  I downloaded a few videos and they don’t stutter the way they used to.  The Vonage line is now getting top priority in QoS and should be less subject to disruption when we’re doing a lot of uploading (although that will have to wait for a teleconference for confirmation).  All good, an easy project and a neat new router/software combo in the basement.

Image: jscreationzs / FreeDigitalPhotos.net