Latest and greatest code… Or not so much…

Ever call into your vendor’s support with an issue?  I’m sure we all have.  What are the first things they ask?  “What code is running on your hardware?”  And, “Can you try updating to the latest code?”  Sound familiar?  I’d imagine so.  Vendors almost universally push their latest code revs because, in a lot of cases, issues can and will be resolved by a code upgrade.  But more times than I care to count, I end up doing the “one step forward, two steps back” song and dance.

Here’s my latest issue with keeping up with the bleeding edge of code for a particular vendor (who will remain nameless, though you can probably tell from the other posts here which vendor I mainly work with).

We had to upgrade our NMS boxes to patch a large security vulnerability in the Apache version running on the servers.  For more details on that, see here.  After the upgrade, the hardware already managed by this platform, along with any new hardware added to it, could be updated to the “latest and greatest” firmware.  And the NMS now prefers this new code by default.

With little time to test the latest code, an unsuspecting individual could easily and accidentally upgrade code across a new site and not think much about it.

Here is where things can spiral downhill fast.  We were installing a new WLAN at a site recently and this exact thing happened.  We had spent a good bit of time onsite during the preliminary stages of the project learning about the facilities.  We took this knowledge back to our office and, with Ekahau, worked out what we thought would be a good-quality channel plan for the site.

Once deployed, we went back onsite to do our final validation and make any adjustments to the WLAN.  To our dismay, nothing looked as we expected.  RF was propagating much farther than we expected based on the data gathered during our initial site visit and what we had input into Ekahau to visualize it.  For the life of us, we couldn’t understand why or how our predictive modeling was so far off.

After a bit of wading through the mud, I ended up testing against some of the same hardware back at the office.  And to my surprise, I was not noticing the same coverage cell sizes.  So, in order to do a true apples to apples comparison, I set the firmware on my test environment to what was deployed at the site.

What I saw next I’m sure made steam come out both ears.  With zero changes to the configuration of the AP and only booting between two different versions of code on said AP, my NIC/software would see a 12-18 dB difference in the RSSI.
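For a sense of scale, decibels are logarithmic, so a 12-18 dB jump is far bigger than it sounds.  The quick Python sketch below converts that difference into a linear power ratio and, assuming idealized free-space propagation (my assumption for illustration; real buildings behave differently), into how much farther the same signal level would reach.  The function names are just illustrative.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a dB difference into a linear power ratio."""
    return 10 ** (db / 10)


def db_to_free_space_range_factor(db: float) -> float:
    """How many times farther the same RSSI is reached with `db` extra dB.

    Assumes free-space path loss (20 dB per decade of distance).
    Indoor environments usually attenuate faster, so treat this as a
    rough upper estimate rather than a prediction.
    """
    return 10 ** (db / 20)


for delta_db in (12, 18):
    print(
        f"{delta_db} dB -> {db_to_power_ratio(delta_db):.1f}x power, "
        f"up to {db_to_free_space_range_factor(delta_db):.1f}x range in free space"
    )
```

Put another way, a 12-18 dB swing at the receiver is what you would expect from roughly 16 to 63 times more signal power, which is not a rounding error in anybody's channel plan.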

Old code with AP set at 4 dBm

Screen Shot 2017-05-21 at 12.19.02 PM.png

No config change, just a reboot to other code on AP.

Screen Shot 2017-05-21 at 12.25.05 PM.png

And back to the original code…

Screen Shot 2017-05-21 at 12.31.22 PM.png

In the above screenshots, the physical location of every component involved never changes, nor does the configuration of the hardware.  All I was doing was rebooting from firmware “A” to firmware “B”.  How can code alone cause such a large increase in RSSI?

As it turns out, this same vendor had a bug in their code a while back (18+ months ago) that would cause an AP with a statically assigned power setting below 9 dBm to lose that setting after a reboot.  I suspect this issue has made a comeback in this latest code and will again cause us much frustration.
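A regression like this is cheap to catch if you record the configured transmit power before a reboot and compare it against what the AP is running afterwards.  Here is a minimal sketch of that comparison.  The AP names and power values are made up, and how you actually collect them (CLI, SNMP, an NMS export) depends on your vendor, so that part is deliberately left out.

```python
from __future__ import annotations


def find_power_drift(
    before: dict[str, int], after: dict[str, int]
) -> dict[str, tuple[int, int | None]]:
    """Return APs whose transmit power (dBm) changed across a reboot.

    Maps AP name -> (configured dBm, dBm seen after reboot or None if missing).
    """
    return {
        ap: (dbm, after.get(ap))
        for ap, dbm in before.items()
        if after.get(ap) != dbm
    }


# Hypothetical example data, not real measurements.
configured = {"ap-lobby": 4, "ap-dock": 8, "ap-office": 11}
after_reboot = {"ap-lobby": 18, "ap-dock": 18, "ap-office": 11}

for ap, (want, got) in find_power_drift(configured, after_reboot).items():
    print(f"{ap}: configured {want} dBm, running {got} dBm after reboot")
```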

Test code before deploying it to production, even if your vendor urges you to use it.  Test that it works for all facets of your designs so you don’t end up like us, needing to revisit a site once we’re able to get new code on things, just to redo our validation and final adjustments to their WLAN.
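One way to make that testing concrete is a simple bench check: take RSSI samples from a fixed spot on the current firmware, reboot into the candidate firmware with the same config, sample again, and reject the upgrade if the median moves more than some tolerance.  The sketch below assumes a 3 dB tolerance and uses invented sample values; both are placeholders to tune for your own environment.

```python
from statistics import median


def rssi_shift_db(baseline: list[float], candidate: list[float]) -> float:
    """Median RSSI change (dB) going from baseline to candidate firmware."""
    return median(candidate) - median(baseline)


def firmware_passes(
    baseline: list[float], candidate: list[float], tolerance_db: float = 3.0
) -> bool:
    """True if the median RSSI moved by no more than tolerance_db."""
    return abs(rssi_shift_db(baseline, candidate)) <= tolerance_db


# Hypothetical samples (dBm) from the same client, location, and AP config.
old_fw = [-62, -61, -63, -62, -60]
new_fw = [-47, -46, -48, -45, -47]

shift = rssi_shift_db(old_fw, new_fw)
print(f"Median RSSI shift: {shift:+.1f} dB, pass={firmware_passes(old_fw, new_fw)}")
```

Using the median rather than a single reading keeps one noisy sample from passing or failing the firmware on its own.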