The advancement and growing popularity of 802.11 mesh networks are a continuing hot topic among many different wireless groups. The topic is also fueled by the recently ratified 802.11s, the standard for mesh networking. Mesh networks often test fine in the lab and in test environments, but after a few months of real-world use they fall short of our expectations, mainly because of high latency and dropped packets. I would like to take some time to explain the basics of mesh networks and give you a bird’s-eye view of why they work and why they don’t. The two biggest reasons wireless mesh networks fail are that people don’t understand how they work and that the scope of the original project changes.
Before we get into the nuts and bolts, we need an understanding of some basics. In networking terms, 802.11 radios are half duplex. Half duplex simply means that a radio can either talk or listen, but it can’t do both at the same time. The best way to describe half duplex is two-way communication, one direction at a time. This is no different from a conversation you might have with a friend: when your friend is talking you listen, and when your friend is done talking you respond and your friend listens.
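To make the talk-or-listen rule concrete, here is a toy sketch in Python. The class and function names are hypothetical and only model the idea; a real 802.11 radio handles this in firmware. The radio refuses to transmit while it is receiving (and vice versa), so a conversation only works when the two sides take turns.

```python
class HalfDuplexRadio:
    """Toy model of a half-duplex radio: it can talk or listen, never both at once."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = "idle"  # "idle", "transmitting" or "receiving"

    def start_transmit(self) -> None:
        # A half-duplex radio must stop listening before it can talk...
        if self.state == "receiving":
            raise RuntimeError(f"{self.name} cannot talk while it is listening")
        self.state = "transmitting"

    def start_receive(self) -> None:
        # ...and must stop talking before it can listen.
        if self.state == "transmitting":
            raise RuntimeError(f"{self.name} cannot listen while it is talking")
        self.state = "receiving"

    def stop(self) -> None:
        self.state = "idle"


def take_turn(speaker: HalfDuplexRadio, listener: HalfDuplexRadio, message: str, log: list) -> None:
    """One turn of the conversation: the speaker talks while the listener listens."""
    speaker.start_transmit()
    listener.start_receive()
    log.append(f"{speaker.name} -> {listener.name}: {message}")
    speaker.stop()
    listener.stop()


# Two-way communication, one direction at a time.
radio_a = HalfDuplexRadio("radio-A")
radio_b = HalfDuplexRadio("radio-B")
conversation = []
take_turn(radio_a, radio_b, "hello", conversation)
take_turn(radio_b, radio_a, "hello back", conversation)
print(conversation)  # ['radio-A -> radio-B: hello', 'radio-B -> radio-A: hello back']
```

Each call to take_turn leaves both radios idle again, which is what lets the roles swap on the next turn.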
Another helpful term is distribution system. A distribution system is nothing more than the medium used to interconnect your radios and integrate them with your local area network. A distribution system can consist of several radios linked wirelessly into an extended service set, an Ethernet connection, or a combination of both.
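One practical way to think about a distribution system is as a graph of links, some Ethernet and some wireless, hanging off the LAN. The sketch below is a hypothetical planning aid, not a vendor tool: the topology, AP names and the "LAN" node are made up. It walks outward from the LAN and reports which APs are actually integrated with it and which are cut off.

```python
from collections import deque

# Hypothetical distribution system: each link joins two nodes and is carried
# over Ethernet or over a wireless link. "LAN" is the wired local area network.
LINKS = [
    ("LAN", "AP1", "ethernet"),
    ("AP1", "AP2", "wireless"),
    ("AP2", "AP3", "wireless"),
    ("LAN", "AP4", "ethernet"),
    ("AP5", "AP6", "wireless"),  # linked to each other, but not to the LAN
]


def split_by_lan_reach(links, root="LAN"):
    """Return (APs integrated with the LAN, APs cut off from it).

    Ethernet and wireless links are treated alike, because either kind
    can carry distribution-system traffic.
    """
    neighbours = {}
    for a, b, _medium in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Breadth-first search outward from the LAN.
    reached = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)

    aps = set(neighbours) - {root}
    return sorted(aps & reached), sorted(aps - reached)


connected, isolated = split_by_lan_reach(LINKS)
print(connected)  # ['AP1', 'AP2', 'AP3', 'AP4']
print(isolated)   # ['AP5', 'AP6']
```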
It is important to understand that when a device roams from one AP to the next, it is not the AP that makes the decision. Every roaming decision is made by the mobile device itself, based on its own proprietary code. Therefore, no matter how well you design a mesh network, the decision to roam is solely up to the mobile device. You can design the best mesh network in the world, but it will always be judged by the devices associated to it. 802.11s may set the stage for standardizing mesh infrastructure and the forwarding of queued packets from one AP to the next, but in a mesh environment there is no standard for CPE roaming, and every manufacturer uses its own proprietary formula to decide when, and whether, to re-associate. There is even one popular brand of wireless VoIP phone that I see sitting on shelves almost all the time. The reason is that the manufacturer never programmed the phone to roam: every move from one AP to the next is handled as nomadic roaming, with the phone dropping its connection and joining the new AP from scratch. The company website even boasts that the phone is “Designed to take advantage of a single Access Point,” as if that were an enhanced feature. To fully test your mesh, you’ll want roaming data from multiple wireless clients using the same APs, so that you can narrow any problem down to a specific wireless client or to a particular AP.
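Here is a rough sketch of how that roaming data might be sorted, assuming you have recorded each test roam as (client, AP roamed to, succeeded?). The client names, AP names and the 0.5 threshold are all hypothetical. A client that fails against every AP points at the client; an AP that fails for every client points at the AP.

```python
from collections import defaultdict

# Hypothetical roaming test results: (client model, AP roamed to, roam succeeded?)
RESULTS = [
    ("laptop", "AP1", True), ("laptop", "AP2", True), ("laptop", "AP3", True), ("laptop", "AP4", False),
    ("tablet", "AP1", True), ("tablet", "AP2", True), ("tablet", "AP3", True), ("tablet", "AP4", False),
    ("phone", "AP1", False), ("phone", "AP2", False), ("phone", "AP3", False), ("phone", "AP4", False),
]


def success_rate(results, field):
    """Fraction of successful roams, grouped by client (field=0) or by AP (field=1)."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for record in results:
        key = record[field]
        attempts[key] += 1
        successes[key] += int(record[2])
    return {key: successes[key] / attempts[key] for key in attempts}


def suspects(results, threshold=0.5):
    """Clients and APs whose success rate falls below a (hypothetical) threshold."""
    bad_clients = sorted(c for c, rate in success_rate(results, 0).items() if rate < threshold)
    bad_aps = sorted(a for a, rate in success_rate(results, 1).items() if rate < threshold)
    return bad_clients, bad_aps


print(suspects(RESULTS))  # (['phone'], ['AP4'])
```

A single threshold is crude: a client that never roams drags down every AP’s rate too, which is why data from several different clients matters before blaming an AP.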
A mobile device that is designed to roam will send a re-association request to the new Access Point (AP). The re-association request frame includes the Service Set Identifier (SSID, the network name) and the MAC address of the AP the device is currently associated to (that AP’s Basic Service Set Identifier, or BSSID). The new AP then sends an acknowledgement (ACK) frame to the station. Next, the new AP tries to contact the old AP over the distribution system, notifying it of the move and requesting any queued packets. If this communication is successful, the old AP forwards any buffered data to the new AP. At this point the new AP sends a re-association response frame to the client station. The client station responds with an ACK to the new AP, and the move is complete. All of this is done in thousandths of a second.
If the exchange of these six frames is done properly, there are no dropped packets, life is good, and the end user is oblivious to the fact that they have switched from one AP to another. The biggest problems come into play when the rules aren’t followed or cannot be followed.
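The six-step exchange lends itself to a simple check. This minimal sketch assumes you have already pulled the steps of one roam, in order, out of a capture or AP log into a list of labels; the labels are hypothetical names, not 802.11 field values. It reports the first step where the observed roam breaks the rules.

```python
# The six steps described above, in order (illustrative labels only).
EXPECTED_STEPS = [
    "re-association request",   # client -> new AP
    "ack to client",            # new AP -> client
    "notify old AP",            # new AP -> old AP, over the distribution system
    "forward buffered data",    # old AP -> new AP
    "re-association response",  # new AP -> client
    "ack to new AP",            # client -> new AP
]


def check_handoff(observed):
    """Return None if `observed` matches the expected steps, else describe the first deviation."""
    for number, expected in enumerate(EXPECTED_STEPS, start=1):
        if number > len(observed):
            return f"step {number} missing: {expected}"
        if observed[number - 1] != expected:
            return f"step {number}: expected {expected!r}, saw {observed[number - 1]!r}"
    if len(observed) > len(EXPECTED_STEPS):
        return f"unexpected extra step: {observed[len(EXPECTED_STEPS)]!r}"
    return None


clean_roam = list(EXPECTED_STEPS)
# The new AP never reaches the old AP, so no queued packets are forwarded.
broken_roam = EXPECTED_STEPS[:2] + EXPECTED_STEPS[4:]

print(check_handoff(clean_roam))   # None
print(check_handoff(broken_roam))  # step 3: expected 'notify old AP', saw 're-association response'
```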
As I said earlier, every manufacturer uses its own proprietary formula to decide when, and whether, to re-associate. If you’re having problems, you may need a packet sniffer to find out whether the client station is following the rules. Some AP manufacturers, such as MikroTik, have these tools built in, and they can be a great asset when troubleshooting issues on a mesh network.
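When you read a capture yourself, the frames that matter for a roam can be picked out from the Frame Control field at the start of every 802.11 frame: in its first byte, bits 2-3 give the frame type and bits 4-7 the subtype. The sketch below assumes raw 802.11 frames with any capture header (such as radiotap) already stripped, and the capture itself is made up. In practice a sniffer’s own decoder does this for you. Note that the AP-to-AP messages travel over the distribution system and may not appear in an over-the-air capture of the client at all.

```python
# 802.11 Frame Control, first octet: bits 0-1 protocol version,
# bits 2-3 frame type, bits 4-7 frame subtype.
MANAGEMENT, CONTROL = 0, 1
FRAME_NAMES = {
    (MANAGEMENT, 2): "Reassociation Request",
    (MANAGEMENT, 3): "Reassociation Response",
    (CONTROL, 13): "ACK",
}


def classify(frame: bytes) -> str:
    """Name a raw 802.11 frame (capture header already removed) from its Frame Control field."""
    fc = frame[0]
    frame_type = (fc >> 2) & 0b11
    subtype = (fc >> 4) & 0b1111
    return FRAME_NAMES.get((frame_type, subtype), f"other (type {frame_type}, subtype {subtype})")


# Hypothetical capture: only the first two bytes (Frame Control) of each frame are shown.
capture = [bytes([0x20, 0x00]), bytes([0xD4, 0x00]), bytes([0x30, 0x00]), bytes([0xD4, 0x00])]
print([classify(frame) for frame in capture])
# ['Reassociation Request', 'ACK', 'Reassociation Response', 'ACK']
print(classify(bytes([0x80, 0x00])))  # other (type 0, subtype 8), i.e. a beacon
```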
Another option is to move from autonomous access points (standalone APs that do not require a vendor-specific central controller) to lightweight access points. A lightweight access point requires a central controller and has neither the ability nor the intelligence to stand alone. One of the best implementations of a central-controller-based system that I have seen is by a company called Extricom. The Extricom radios act more like antenna extensions than APs. The advantage is that the client station sees only a single MAC address, which takes the re-association rules out of play. This system even worked flawlessly with the wireless VoIP phone I mentioned earlier. This solution may have a higher upfront cost than other solutions, but it has succeeded in situations where many other radio systems have failed. I beat my head against a wall on a wireless VoIP system for a nursing home, but the phone was simply not designed to roam. I’m not in any way trying to make this into an Extricom advertisement, but this system has worked for me in some unbelievable situations. If your goal is mobile data, I would look at an autonomous access point solution. If you think VoIP will be in the future of your wireless mesh, look at a central-controller-based system.
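Why a single MAC address takes re-association out of play can be shown with a very simplified model. This is a hypothetical sketch, not Extricom’s actual implementation: it assumes the client is served by each radio it walks past and must re-associate whenever the BSSID it sees changes. All radio names and addresses are made up.

```python
def bssid_changes(path, bssid_of):
    """Count how often the BSSID seen by a client changes as it moves from radio to radio.

    Simplifying assumption: the client is served by each radio on `path` in turn
    and must re-associate whenever the BSSID it sees changes.
    """
    return sum(1 for here, there in zip(path, path[1:]) if bssid_of[here] != bssid_of[there])


radios = ["radio1", "radio2", "radio3", "radio4"]
walk = ["radio1", "radio2", "radio3", "radio4", "radio3"]  # a hypothetical walk down a hallway and back

# Autonomous APs: each radio advertises its own BSSID (made-up addresses).
autonomous = {name: f"02:00:00:00:00:0{i}" for i, name in enumerate(radios, start=1)}
# Single-MAC controller design: every radio presents the same (made-up) BSSID.
single_mac = {name: "02:00:00:00:00:aa" for name in radios}

print(bssid_changes(walk, autonomous))  # 4
print(bssid_changes(walk, single_mac))  # 0
```

With four re-associations on one short walk, a phone that was never programmed to roam gets four chances to drop the call; with a single BSSID it never has to make the decision at all.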
Senior Wireless Engineer