The ‘Roaming’ World We Live In

Most end users don't need to think about Wi-Fi roaming. They use their devices, and they ‘just work’ even as they move around the home, workplace, campus, or industrial environment. In today’s on-the-move society, a seamless internet connection is no longer a luxury; it is a necessity. Wi-Fi roaming is the unsung hero that keeps the connection seamless as a device moves between access points. In this article, I will give a brief overview of what roaming is and how it affects the user’s experience with a device, go over how we test and measure a device's ability to roam successfully, and show how a device that roams well contributes to the overall success of an end product.

What is Wi-Fi Roaming?

Roaming in a Wi-Fi network refers to a client device (station/STA) smoothly moving its connection from one Access Point (AP) to another AP that offers less RF path loss or interference. For this transition to work, the APs must be in the same Extended Service Set (ESS) and share the same Service Set Identifier (SSID). From the end user’s perspective, roaming happens seamlessly, but there is actually a brief interval with no connection. This occurs as the station disconnects (disassociates) from the first AP and connects (associates) with the second AP. According to the IEEE 802.11 standard, a station can be associated with only one AP at a time. While the station decides when or whether to roam, there are mechanisms through which APs can assist it.

What triggers roaming?

The Wi-Fi Alliance® and IEEE have left the decision of when and where to roam up to vendors. Because the roaming decision is not standardized, implementations differ, but these are some of the metrics that stations use to determine when to roam.

  • Low signal strength: A Wi-Fi signal strength lower than -67 decibel-milliwatts (dBm) starts to be considered low. Anything below -90 dBm is considered disconnected.
  • High Packet Error Rate: This is the percentage of packets that are received with errors over the link. Anything over 5% is considered high.
  • Signal-to-Noise Ratio (SNR): The SNR is the ratio of the signal power to the noise power. The lower the SNR, the worse the signal quality. Anything below 25 dB is considered poor connection quality.
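The thresholds listed above can be expressed as a simple check. This is only a minimal sketch: the names `LinkMetrics` and `roam_triggers` are hypothetical, and a real station would weight and tune these values for its own use case rather than use them as hard limits.

```python
from dataclasses import dataclass

# Thresholds taken from the list above; real devices tune these per product.
LOW_RSSI_DBM = -67.0
HIGH_PER_PERCENT = 5.0
POOR_SNR_DB = 25.0


@dataclass
class LinkMetrics:
    """Hypothetical snapshot of the current link quality."""
    rssi_dbm: float     # received signal strength in dBm
    per_percent: float  # packet error rate, 0 to 100
    snr_db: float       # signal-to-noise ratio in dB


def roam_triggers(m: LinkMetrics) -> list[str]:
    """Return the names of the metrics that have crossed their threshold."""
    triggers = []
    if m.rssi_dbm < LOW_RSSI_DBM:
        triggers.append("low_rssi")
    if m.per_percent > HIGH_PER_PERCENT:
        triggers.append("high_per")
    if m.snr_db < POOR_SNR_DB:
        triggers.append("poor_snr")
    return triggers
```

An empty list means none of the example thresholds has been crossed; one or more entries would be a reason for the station to start considering a roam.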

These parameters can form the basis of a station’s roaming decision. However, determining the right weights for these parameters is crucial, and that is where testing becomes essential. Wi-Fi is more challenging to test than its wired predecessors. Testing a Wi-Fi device in the open air cannot give consistent results because of numerous unknown or uncontrollable factors (such as other devices using the Wi-Fi air space) that can affect measurements. To achieve consistent and verifiable results, isolation chambers are used. Programmable attenuators are placed between the chambers to simulate the distance from the APs to the station. Using these, we can simulate the station moving away from one AP and closer to another. A verifiable test setup assures the engineer that their roaming formula is working, and a repeatable one lets them test changes without fear of random deviation throwing off the results.
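To get a rough feel for how attenuation in dB relates to simulated distance, the free-space path loss formula can be used. This is an illustrative assumption, not a description of how the test setup maps distance to attenuation; real indoor environments lose more signal than free space, and the function name is hypothetical.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def free_space_path_loss_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB: 20 * log10(4 * pi * d * f / c)."""
    if distance_m <= 0 or freq_hz <= 0:
        raise ValueError("distance and frequency must be positive")
    return 20.0 * math.log10(
        4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT_M_S
    )
```

Under this model, every doubling of distance adds about 6 dB of loss, so stepping an attenuator up by 6 dB roughly corresponds to the station moving twice as far from the AP in open space.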

Connection and roaming events

What I call the “roaming event” is when a STA is moving from one AP to another.

When looking for the roaming event, it is important to keep in mind the different Wi-Fi connection situations. The Initial Connection is when the STA has no previous connection to the network. Although this may take about as long as a roam, it is not the event we want to capture when testing roaming.

Another possibility is the Connection Termination and Re-Establishment situation. The STA or AP detects a degraded connection, and the connection is terminated and re-established from scratch. This could happen for a number of reasons and should be noted when testing roaming, but again, this is not the event we are looking for. When this event is seen, it may indicate that the STA could not roam within its thresholds (be it timing or Received Signal Strength Indicator (RSSI)). It could also mean that the STA is sticky (trying to hold on to its original AP) and is not making the decision to roam in time.

The event we are looking for is the Wi-Fi Roaming Event. The STA is associated with an AP and reassociates within the same ESS to another AP. The goal of roaming is to find a stronger connection to another AP while making the move appear seamless without interrupting any active connections on the station. It is also important for this transition to happen before the connection becomes too degraded. If the connection becomes too degraded or packet error rates become too high, we may run into the termination and re-establishment scenario. This could indicate that the STA has not been optimized for roaming yet.
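The three situations above can be told apart in a capture by the management frames involved. The sketch below is a simplification under stated assumptions: frame types are reduced to hypothetical string labels, and the STA is assumed to send a Reassociation Request when it roams, which is the frame 802.11 defines for that purpose, although individual clients may behave differently.

```python
def classify_connection_event(mgmt_frames: list[str], was_associated: bool) -> str:
    """Classify a captured connection event from simplified frame labels.

    mgmt_frames holds hypothetical labels such as "assoc_req" or
    "reassoc_req" seen from the STA; was_associated says whether the STA
    had an association to the ESS just before these frames.
    """
    if was_associated and "reassoc_req" in mgmt_frames:
        return "roaming event"
    if "assoc_req" in mgmt_frames:
        if was_associated:
            # A fresh association after a working link suggests the
            # connection was torn down and rebuilt from scratch.
            return "termination and re-establishment"
        return "initial connection"
    return "no connection event"
```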

What happens once a STA has decided it should roam?

In general, a STA will start scanning and probing for a new AP by sending out probe requests on other channels. As the STA gets responses from APs, it builds a neighbor list prioritized by signal strength (in dBm), strongest first. When the scanning is complete, the STA will select the best candidate AP, if there is one, and reassociate to that AP. We are looking for the event to take less than 150 ms for applications like voice and video, because anything longer could introduce jitter or noticeable delays. For applications that require less data or are less time sensitive, longer times may be acceptable. Most of the time, clients can do this transition in under 75 ms when using an open or Pre-Shared Key (PSK) secured network. But how are we measuring this time? We are measuring the time from the last data packet on the first AP to the first data packet on the second AP.
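The scan-and-select step can be sketched as follows. The function names and the `min_gain_db` margin are assumptions added for illustration; vendors choose their own selection rules, and the margin simply stands in for "if there is one" by requiring the candidate to be meaningfully stronger than the current AP.

```python
from typing import Optional


def build_neighbor_list(probe_responses: dict[str, float]) -> list[tuple[str, float]]:
    """Sort (BSSID, RSSI in dBm) pairs with the strongest signal first."""
    return sorted(probe_responses.items(), key=lambda item: item[1], reverse=True)


def select_roam_target(
    current_bssid: str,
    current_rssi_dbm: float,
    probe_responses: dict[str, float],
    min_gain_db: float = 5.0,  # hypothetical margin, not from any standard
) -> Optional[str]:
    """Return the BSSID to roam to, or None if no candidate is good enough."""
    neighbors = [
        (bssid, rssi)
        for bssid, rssi in build_neighbor_list(probe_responses)
        if bssid != current_bssid
    ]
    if not neighbors:
        return None
    best_bssid, best_rssi = neighbors[0]
    if best_rssi - current_rssi_dbm >= min_gain_db:
        return best_bssid
    return None
```

Requiring a margin is one common way to keep a station from hopping between two APs of nearly equal strength, but the right value depends on the device and its environment.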

[Image: Probe Request through the last EAPoL-Key]

Measuring the success of the roaming event will differ for each device, just like the many ways roaming parameters can be weighted. The use case for the device will dictate the time constraints. For some devices, we might want to look at when probe requests start to happen. We can measure the time from a Probe Request through the last EAPoL-Key (the 4-way handshake; an example is shown in this image). Others may want to know the time from the Auth Request through the last EAPoL-Key. The most relevant measurement for the end user is from the last data frame to the first data frame. This data-to-data time provides information relevant to any application that will be using the Wi-Fi connection.
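The different measurement windows described above can be computed from a list of captured frames. This is a minimal sketch with a hypothetical `Frame` record and frame labels; it assumes the capture has already been trimmed to a single roam, so that earlier background scans do not stretch the probe-based window.

```python
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    t: float     # capture timestamp in seconds
    kind: str    # hypothetical label: "data", "probe_req", "auth", "eapol_key", ...
    bssid: str   # AP the frame belongs to ("" for broadcast probes)


def data_to_data_ms(frames: list[Frame], old_bssid: str, new_bssid: str) -> Optional[float]:
    """Time from the last data frame on the old AP to the first on the new AP."""
    first_new = next(
        (f.t for f in frames if f.kind == "data" and f.bssid == new_bssid), None
    )
    if first_new is None:
        return None
    old_times = [
        f.t for f in frames
        if f.kind == "data" and f.bssid == old_bssid and f.t < first_new
    ]
    if not old_times:
        return None
    return (first_new - max(old_times)) * 1000.0


def span_ms(frames: list[Frame], start_kind: str, end_kind: str) -> Optional[float]:
    """Time from the first start_kind frame to the last end_kind frame."""
    starts = [f.t for f in frames if f.kind == start_kind]
    ends = [f.t for f in frames if f.kind == end_kind]
    if not starts or not ends:
        return None
    return (max(ends) - min(starts)) * 1000.0
```

For example, `span_ms(frames, "probe_req", "eapol_key")` gives the Probe Request through last EAPoL-Key window, and `span_ms(frames, "auth", "eapol_key")` gives the Auth Request through last EAPoL-Key window.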

There are other factors to consider when looking into how long roaming takes. IEEE added amendments to the 802.11 standard, 802.11v, 802.11k, and 802.11r, to improve roaming between APs. If the STA supports 802.11k, it can request a neighbor report from the AP, which it can use to make a better roaming decision. If the STA supports 802.11r, roaming will be faster and more seamless, as the initial handshake is completed with the new AP before the STA roams instead of during the roam. If the STA supports 802.11v, the APs can encourage it to roam by sending roaming recommendations. If the STA in question is going to be in a very congested environment, the engineer may want to consider implementing support for these amendments in their STA.

A walk through our basic roaming test setup.

It is vital that a STA undergoes testing for roaming in today’s dynamic environment. The responsibility for roaming rests on the STA, and the roaming decision is not standardized in 802.11. APs also play a role in roaming, but this article focuses on the testing of STAs. To conduct testing, Octoscope isolation chambers are used to control the RF testing environment. Two APs with the same SSID are placed in separate chambers, each connected to the STA’s chamber through a programmable attenuator, allowing simulation of different distances between the STA and each AP.

The testing involves placing the STA in one chamber and setting up Wi-Fi adapters (sniffers) to capture Wi-Fi activity frames. Initially, the first AP (AP-1) is set to 0 dB attenuation, while AP-2 is out of range.

The testing process includes capturing the initial connection, measuring throughput using a tool called "iPerf" to pass TCP traffic through the AP to the STA, and configuring UDP traffic based on a percentage of the max throughput to measure packet error rate and dropped packets.
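The arithmetic behind the traffic configuration is simple and can be sketched as below. The function names are hypothetical, and the load percentage is whatever the test plan calls for; the resulting rate would then be given to the traffic tool as the UDP target bandwidth.

```python
def udp_offered_rate_mbps(max_tcp_mbps: float, load_percent: float) -> float:
    """UDP rate to offer, as a percentage of the measured max TCP throughput."""
    if not 0 < load_percent <= 100:
        raise ValueError("load_percent must be in (0, 100]")
    return max_tcp_mbps * load_percent / 100.0


def loss_percent(sent: int, received: int) -> float:
    """Percentage of UDP datagrams that were sent but never received."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent
```

For instance, with a measured maximum of 200 Mbit/s and a 50% load, the UDP stream would be configured for 100 Mbit/s.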

The attenuation is then adjusted between the STA and the APs, while capturing data such as RSSI and packet error rates. Probe requests from the STA are monitored and the attenuation is gradually increased until the roaming event is captured.
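One way to picture the attenuation sweep is as a list of paired settings for the two attenuators. This sketch does not control any real hardware; the function name, the 60 dB ceiling, and the step size are assumptions chosen for illustration.

```python
def attenuation_ramp(max_db: float = 60.0, step_db: float = 1.0) -> list[tuple[float, float]]:
    """Return (AP-1 dB, AP-2 dB) settings that 'walk' the STA from AP-1 to AP-2.

    AP-1 starts at 0 dB (closest) and ramps up to max_db, while AP-2
    starts at max_db (out of range) and ramps down to 0 dB.
    """
    if step_db <= 0 or max_db <= 0:
        raise ValueError("max_db and step_db must be positive")
    steps = int(round(max_db / step_db))
    return [(i * step_db, max_db - i * step_db) for i in range(steps + 1)]
```

A test script would apply each pair in turn, hold it long enough to record RSSI and packet error rate, and note the step at which the roaming event appears in the capture.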

Captured Wi-Fi data is analyzed to provide timing data, and the test can be repeated multiple times to provide a clear picture of the device's performance, specifically ensuring that the roaming event happens efficiently and consistently.
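Repeated runs can be condensed into a few summary numbers. The sketch below is one possible summary, not the report format used in our testing; the function name is hypothetical, and the default budget is the 150 ms target mentioned earlier for voice and video.

```python
import statistics


def summarize_roam_times(times_ms: list[float], budget_ms: float = 150.0) -> dict[str, float]:
    """Summarize roam times (in ms) collected over repeated test runs."""
    if not times_ms:
        raise ValueError("at least one roam time is required")
    within = sum(1 for t in times_ms if t < budget_ms)
    return {
        "runs": len(times_ms),
        "mean_ms": statistics.mean(times_ms),
        "median_ms": statistics.median(times_ms),
        "max_ms": max(times_ms),
        # Sample standard deviation needs at least two data points.
        "stdev_ms": statistics.stdev(times_ms) if len(times_ms) > 1 else 0.0,
        "within_budget_percent": 100.0 * within / len(times_ms),
    }
```

Looking at the maximum and the spread alongside the mean helps show whether the roam is consistent, not just fast on average.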

Going beyond the measurement of the roaming event.

Real life doesn't happen in isolation chambers, but this test does provide the desired benefit of repeatable results. For this reason, we put a STA through several simulated scenarios to help the manufacturer measure the success of the roaming algorithm they have implemented. There is no single measure of success with these tests. What one manufacturer wants to see will differ from what others want, depending on the use case. The goal of this testing is that, when it is completed, a device can be released with confidence that it will roam in the intended way, giving the desired end user experience when the Wi-Fi device is used on the move, in different scenarios or RF environments.

Here we have a depiction of the basic roaming test described above, intended to test the straightforward STA moving between APs.

Next, we have the non-continuous coverage test, where there is a gap between the APs that neither AP covers. This is when we would be looking at the Connection Termination and Re-Establishment scenario.

Following that, we have the Midpoint Buffer Test, which evaluates how well the STA handles sitting at its thresholds, moving back and forth between APs. This test has the STA right in that overlapping region moving around.

Lastly, the Outer Range Test determines how well a device handles coming into range of another AP but then moving back away.

Life after testing

Good testing and test results for your device's application are significant, and for some might be enough. But is testing ever really over? Many engineers know that having one good run might only be a start. Development is usually ongoing and as changes are made, it is always a good idea to do regression testing. This includes making sure changes to drivers, firmware, or even the physical layout of the device have not negatively affected performance.

With a Wi-Fi device, anything from bandwidth to connection issues will be blamed on the device. A bad experience caused by lagging or connection loss will make the user question the dependability of such devices. This leaves a bad taste in the user’s mouth and affects the user’s opinion of the brand. The reverse is true for a device that works without problems: users will return to the brand they know works well. This is simple and basic but true. Wi-Fi engineers need to understand how the application will work and what the device’s Wi-Fi needs are. It is always important to consider how well the device handles roaming and how much it depends on staying connected. A user's experience with a Wi-Fi device directly correlates with how well the device stays connected, and roaming plays a vital role in that connection. The most accurate way to measure the connection is through testing.