Did you recently complete a long-awaited project to upgrade your network and virtualize your PCs, data centers, and infrastructure?
I’m guessing you might be facing some challenges with monitoring how it went and how users are enjoying (or NOT enjoying) their virtual desktop experience.
While your IT Director does glow a bit more radiantly walking down the hallway and whistle a bit more frequently in the elevator now that bulky physical desktops are gone, you still need to troubleshoot problems and optimize performance.
Plus, the new CIO wants a report that validates that your infrastructure changes were, and will continue to be, a sound investment, and the executive team wants to know in advance about any performance bottlenecks.
They ultimately want to snapshot, quantify, and track changes in the user experience for all users, on all devices, 24/7!
Monitoring the Virtual Desktop User Experience
In the past, physical machines offered IT shops the opportunity to customize the user experience (UX). Christine in marketing had more RAM than Bill in accounting, and Ramesh in services had access to more network storage than either of them.
But with virtual machines, many shops do not monitor the user experience and instead adopt a policy where all 20,000 employees get exactly the same virtual desktop: same processor, same RAM, same configuration, and same access to resources.
As you might have imagined, Ramesh would be cursing your IT staff through a support chat app, and Bill would be overwhelmed.
Christine just walked out.
In other words, without monitoring the user experience, this failed policy would:
- Upgrade low-demand users who did not require access to advanced resources.
- Downgrade high-demand power users who previously enjoyed a superior level of service.
Therefore, remember this important rule of virtualization: because you can dynamically allocate and throttle resources, monitoring the user experience is even more important than it was in the physical environment.
Four Reasons Why You Should Monitor the User Experience
- Constant adjustments require usage data for maximum optimization. Monitoring helps you discover areas for improvement.
- Without monitoring, users experience issues and wrongly assign blame to virtualization.
- Monitoring opens up opportunities for automation, enhanced data collection, and dynamic real-time reallocation of resources.
- Monitoring is now much easier to do than it was in physical environments.
How to Measure the Virtual User Experience
A rule of thumb in this business is that the virtual user experience must be at least as good as the physical experience. We can’t declare victory until a clear majority of users share that assessment.
Naturally, you may be wondering, how do we measure that objectively?
Let me address three primary methods below.
1. Delays and Crashes
First, establish a rubric or benchmark based on a standard set of factors. Track the following three parameters and chart trends over time:
- App Load Delay
- Login Delay
- App Not Responding (ANR) and Similar Crashes
Delays and crashes are strong indicators of user frustration level. In any given range of time (3 weeks, 3 days, or 3 hours), these numbers are going to point to the issue. Remember, lower numbers are better when it comes to measuring load times and crashes. Four crashes are better than 40 and a three-second load time is better than 30 seconds.
Like some of the indicators economists use to describe trends in the business market, these are lagging indicators, comparable to existing home sales, jobless claims, and new jobs for the past month. Lagging indicators reliably report on events that have already occurred.
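To chart those trends, you need a single number that says whether a metric is getting better or worse. Here is a minimal sketch that fits a straight line through weekly figures and reports the slope. The weekly values and the function name are my own illustration, not output from any monitoring product.

```python
from statistics import mean


def trend_slope(values):
    """Least-squares slope of a metric over equally spaced periods.

    A positive slope means the number is rising, which is bad news
    for delays and crashes, where lower is better.
    """
    if len(values) < 2:
        raise ValueError("need at least two periods to compute a trend")
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical weekly figures, not real measurements.
weekly_login_delay_seconds = [12.0, 13.0, 15.0, 18.0]
weekly_app_crashes = [40, 31, 22, 4]

if __name__ == "__main__":
    print("login delay trend (s/week):", trend_slope(weekly_login_delay_seconds))
    print("crash trend (crashes/week):", trend_slope(weekly_app_crashes))
```

In this made-up data, login delay is creeping up while crashes are falling, so the login process is where I would look first.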
2. Technical Metrics
Second, track the following four major technical metrics:
- Processor (CPU)
- Memory (RAM)
- Disk Storage
- Network Traffic
Trending data from those four metrics add up and empirically point to general environment issues that contribute to user frustration.
To continue our economic metaphor, these are leading indicators such as bond yields or new housing starts. They are based on conditions that offer insight into what might occur, provided we can quickly assess the data and make accurate predictions. For example, don’t cut over to a new enterprise app that uses a lot of RAM if two-thirds of desktops are reporting out-of-memory issues.
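That cutover decision can be turned into a simple gate. The sketch below assumes each desktop report is a dictionary with an `out_of_memory_events` count; that field name and the 10 percent threshold are hypothetical choices for illustration, not values from any particular tool.

```python
def safe_to_cut_over(desktop_reports, max_oom_fraction=0.10):
    """Return (ok, fraction) for a planned RAM-heavy application rollout.

    fraction is the share of desktops that reported at least one
    out-of-memory event in the reporting period.
    """
    if not desktop_reports:
        raise ValueError("no desktop reports supplied")
    affected = sum(1 for report in desktop_reports
                   if report["out_of_memory_events"] > 0)
    fraction = affected / len(desktop_reports)
    return fraction <= max_oom_fraction, fraction


if __name__ == "__main__":
    # Hypothetical data: two of three desktops are short on memory.
    reports = [
        {"desktop": "VD-001", "out_of_memory_events": 3},
        {"desktop": "VD-002", "out_of_memory_events": 1},
        {"desktop": "VD-003", "out_of_memory_events": 0},
    ]
    ok, fraction = safe_to_cut_over(reports)
    print("safe to cut over:", ok, "affected share:", round(fraction, 2))
```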
3. User Experience Feedback Surveys
Third, conduct user experience feedback surveys. Because each response is subjective and swayed by the user’s current mood, you’ll need participation and feedback from many users before the results reliably reflect the overall user population.
You might include the following survey questions:
- How would you rate the speed of your virtual desktop?
- Would you consider any of the applications you use to be slow?
- If YES, please list which apps are slow and the time of day when they are slow.
- List any applications that you have used in the past three months that crashed.
- How often did each application crash?
Consult with your data scientist or marketing team to carefully construct the questions in your survey. For best results, you want to invest up front in getting the first survey as accurate as possible, and consistently track future results.
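How many users is "many"? A rough way to reason about it is the standard margin-of-error formula for a surveyed proportion, with a finite population correction. The sketch below is generic survey arithmetic under the usual simple random sample assumption; the population and sample sizes in the usage are illustrative, and your data scientist may prefer a different method.

```python
import math


def margin_of_error(sample_size, population, proportion=0.5, z=1.96):
    """Margin of error for a surveyed proportion, with finite population correction.

    z=1.96 corresponds to roughly 95 percent confidence. proportion=0.5
    is the most conservative (widest) assumption.
    """
    if population < 2 or not 0 < sample_size <= population:
        raise ValueError("need population >= 2 and 0 < sample_size <= population")
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    correction = math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error * correction


def required_sample_size(population, margin=0.05, proportion=0.5, z=1.96):
    """Smallest sample that achieves the requested margin of error."""
    n0 = z ** 2 * proportion * (1 - proportion) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))


if __name__ == "__main__":
    # Illustrative numbers: 1,000 respondents out of 15,000 employees.
    print("margin of error:", round(margin_of_error(1000, 15000), 4))
    print("sample needed for +/-5 points:", required_sample_size(15000))
```

With these illustrative numbers, 1,000 responses out of 15,000 employees gives a margin of error of roughly three percentage points, assuming the respondents are a random sample rather than only the most frustrated users.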
Skip the attempt to build a custom solution in-house. A few commercial tools are available to help you collect user experience data. Most solutions provide views with metrics that track architecture specs, infrastructure changes, desktops, laptops, workstations, kiosks, terminals, other devices, users, and apps.
Market tools include:
- Liquidware Stratusphere UX: The reliable established market leader in this segment.
- Lakeside SysTrack: A good tool for automated reports and dashboards.
- ControlUp: Their real-time product includes a responsive dashboard that helps you resolve issues quickly.
- Nexthink: Another real-time product with historical usage and IT service performance records, visualizations, actionable dashboards, reporting, and feedback surveys integrated.
These solutions also include built-in root cause analysis and problem identification.
They all tend to be strong at monitoring crashes, delays, and metrics; however, they typically lack an end-user survey feedback function. Nexthink is an exception. It delivers on all three points I made in the previous section, including surveys, but has some other disadvantages such as configuration requirements and cost.
When it comes to evaluating the costs and features of these competitors, I invite you to compare and decide for yourself. I will suggest that you can likely conduct the surveys yourself using SurveyMonkey, SurveyGizmo, GetFeedback, or another popular online survey tool.
Data Collection Tips:
- Collect metrics and feedback data for as large a user pool as possible with a consistent number of users. For example, if you cannot survey all 15,000 employees, poll 1,000 every quarter. If you can do it every 60 days or monthly, that’s even better. You also want to have data before a change to serve as a baseline, and after a change to make comparisons. For example, immediately before and immediately after a shift from physical to virtual desktops.
- Run the delay, crash, and technical metrics tools as often as possible. You want them capturing data almost constantly. Compare the data every month, examine reports, and look for trends.
- It’s also important to note that all the tools I mention are strictly for monitoring. They don’t perform any corrective actions. You could script your own, but most organizations today are cautious about building yet another in-house custom solution when the cloud promises so much including everything from automation to updates.
- Corrective automation tools are available for servers, but not for virtual desktops. Some real-time server resource allocation features exist in Turbonomic and VMware vRealize Operations Manager/Automation.
Evaluate the Trends
After collecting the data, examine any trends. If you see an increase in crashes, delays, helpdesk tickets, and other common issues, the overall user experience at your organization is in trouble. As in a crime drama or forensics TV show, go into analysis mode to determine why.
Use the feedback surveys to substantiate the trends. It works both ways; you can also use the metrics to support a trend in user feedback results.
For users that report poor performance, your survey should also ask them to specify when it occurs. If you can, try to pinpoint a two-hour window. Then, focus on that time and try to determine a root cause. You also have the names and machine IDs to go on.
Other forensic analysis tips:
- Analyze just two or three users: They will reveal findings representative of a larger audience. Troubleshooting forensics for dozens of users will yield too many data points and too much variability.
- Focus on snippets of user experience feedback: For example, three users reported crashes while using the same streaming app at the same time.
- Look for patterns: For example, every 30 days you notice a block of days with high disk utilization metrics. Run another report for just that week and look for trends and sustained peaks. Within those peaks focus on just three hours, then one hour.
- Filter out false positives: For example, after you upgrade to a new application, everyone’s RAM suddenly looks insufficient in the metrics; however, a patch due next week fixes a known memory leak.
- Memory is critical: The most common issues center around insufficient resources. Users often need more RAM. It’s typically more important than processor speed or flash storage.
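The "look for patterns" tip, where you narrow a week down to three hours and then one, can be sketched as a sliding-window search over hourly samples. The utilization figures below are made up, and the function name is my own.

```python
def busiest_window(samples, window):
    """Return (start_index, average) of the contiguous window with the highest average."""
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    running = sum(samples[:window])
    best_start, best_sum = 0, running
    for start in range(1, len(samples) - window + 1):
        # Slide the window one step: add the new sample, drop the oldest.
        running += samples[start + window - 1] - samples[start - 1]
        if running > best_sum:
            best_start, best_sum = start, running
    return best_start, best_sum / window


if __name__ == "__main__":
    # Hypothetical hourly disk utilization percentages for part of one day.
    hourly_disk_utilization = [10, 20, 90, 95, 92, 30, 15]
    start, average = busiest_window(hourly_disk_utilization, window=3)
    print("busiest three-hour block starts at hour", start)
    print("average utilization in that block:", round(average, 1))
    start, average = busiest_window(hourly_disk_utilization, window=1)
    print("single busiest hour:", start)
```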
After running monthly reports and tracking the trends, narrow your analysis window and draw your conclusions. It’s typical to prioritize the corrective actions that you want to make.
For example, after identifying a storage bottleneck or memory issue that impacts 500 users, you might choose to allocate more memory to the top 50 and monitor that change for a few days.
Perception also plays a role. Studies show that users do not notice an improvement unless it represents at least a 20 percent increase over the previous state. In other words, don’t spread resource allocation adjustments so thin that each user is given a two percent incremental bump every six months. They won’t even notice the change. Better to boldly introduce a 20 percent increase today. Your users will definitely notice the improvement.
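The arithmetic behind that advice is worth seeing once. The sketch below compares spreading a fixed memory budget across everyone with concentrating it so each recipient clears the 20 percent mark. All figures are hypothetical, and sizes are whole megabytes to avoid floating point rounding.

```python
def users_with_noticeable_bump(budget_mb, current_mb_per_user, min_increase_percent=20):
    """How many users can receive at least min_increase_percent more memory."""
    if budget_mb < 0 or current_mb_per_user <= 0 or min_increase_percent <= 0:
        raise ValueError("inputs must be positive")
    # Integer ceiling division: each recipient gets at least the full bump.
    per_user_mb = -(-current_mb_per_user * min_increase_percent // 100)
    return budget_mb // per_user_mb


def thin_spread_percent(budget_mb, current_mb_per_user, user_count):
    """Percent increase each user sees if the budget is split evenly."""
    return budget_mb / (current_mb_per_user * user_count) * 100


if __name__ == "__main__":
    # Hypothetical: 80,000 MB of spare memory, 500 users at 8,000 MB each.
    print("spread evenly, percent per user:", thin_spread_percent(80_000, 8_000, 500))
    print("users who could get a 20 percent bump:", users_with_noticeable_bump(80_000, 8_000))
```

In this example the same budget either gives all 500 users an increase of about two percent that nobody notices, or gives the 50 users who need it most an increase they will actually feel.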
Monitor changes and look for new patterns for at least two full weeks after a significant change. Compare data before, during, and after the change. Look at variances expressed in units and as percentages. Make sure your audience, staff, and customers are aware of the changes. User engagement is helpful.
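Here is a minimal sketch of that before-and-after comparison, reporting the variance both in units and as a percentage. The load times are invented for illustration.

```python
from statistics import mean


def compare_periods(before, after):
    """Return (change_in_units, change_in_percent) between two sets of samples."""
    baseline, current = mean(before), mean(after)
    if baseline == 0:
        raise ValueError("baseline average is zero, percent change is undefined")
    change = current - baseline
    return change, change / baseline * 100


if __name__ == "__main__":
    # Hypothetical app load times in seconds, two weeks before and after a change.
    before_change = [30.0, 31.0, 29.0, 30.0]
    after_change = [24.0, 25.0, 23.0, 24.0]
    units, percent = compare_periods(before_change, after_change)
    print("change in seconds:", round(units, 2))
    print("change in percent:", round(percent, 1))
```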
Finally, quantify the cost of slow performance in terms of its financial and political impacts:
When 500 users experience slow applications every day for a week, the lost productivity is significant. On a recent CDI engagement, we found an anti-virus process that crawled along very slowly during peak work hours. There was no need to impact users like this when the process could run after midnight.
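To put a number on that lost productivity, a back-of-the-envelope calculation is usually enough for a first report. In the sketch below, the minutes lost per day and the loaded hourly cost are assumptions I picked for illustration; substitute your own figures.

```python
def lost_productivity_cost(users, minutes_lost_per_day, work_days, hourly_cost):
    """Estimated cost of time lost to slow applications."""
    hours_lost = users * minutes_lost_per_day * work_days / 60
    return hours_lost * hourly_cost


if __name__ == "__main__":
    # Hypothetical: 500 users each lose 15 minutes a day for a 5-day week,
    # at an assumed loaded cost of 40 dollars per hour.
    cost = lost_productivity_cost(users=500, minutes_lost_per_day=15,
                                  work_days=5, hourly_cost=40)
    print("estimated weekly cost in dollars:", cost)
```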
Another financial example involves a hospital billing department. The accounts receivable team would face a severe challenge if slow network speeds prevented new billings from going out on time.
A critical medical procedure might require MRI images in the next 20 minutes while the patient remains under anesthesia. Now is not the time for performance delays.
Slow physical or virtualized environments also carry legal risks. A firm might be sued for losses involving delays for thousands of users.
Slow performance and a poor user experience do not reflect well on the brand. Company executives and account managers want to look their best when showcasing new product demos. In these situations, some of your IT staff may receive phone calls from frustrated callers demanding a fix or your resignation.
Performance is no joke, especially when you factor in contractual service agreements and the competitive dynamics of the cloud economy. A sub-standard user experience impacts your bottom line and perception in the news and social media.
In the long run, prevention pays for itself, so fund your performance fixes and attack the next set of bugs early and often. Equipping your staff with faster performance is essential for business.
The Final Word
People expect robust, fast, responsive computing devices. They want to leverage powerful networks, platforms, and applications to increase their productivity. When a weak link in the system arises, it can snowball, and user productivity can dramatically decline or drop off altogether.
In the physical realm, you can still go buy a better laptop.
But in the virtual realm, monitoring the user experience is essential to identify pain points and make the right adjustments.
#1 Do Not Use Persistent Virtual Desktops
Always use non-persistent virtual desktops. They are more secure because they are refreshed from their original image. Persistent virtual desktops behave like physical desktop PCs and are more susceptible to malware, virus infections, and corruption. Non-persistent desktops may be more difficult to implement and manage, with more requirements, but they are the safer bet in the long run.
Some users may be inconvenienced when personal files they saved, such as Microsoft Word documents, no longer appear after a desktop refresh. However, as an administrator, you can address this problem by configuring the environment to save personal files and other auxiliary settings and restore them from the user’s network profile after they log in again.
Even though more time is required to manage a non-persistent, refresh-ready virtual desktop environment, the investment is well worth the effort. As a case in point, a public school made a smart decision to virtualize about half of their nearly 1,000 desktops. When a virus attack was detected, they simply advised their users to log off. That action alone removed the virus from all user-accessible VDI desktops in only about five minutes. Half the network was spared, with only physical desktops and a few servers needing attention. Any non-virtualized PCs or persistent desktops required considerable time for remediation. Therefore, it is advisable to virtualize the vast majority of your computing resources. For example, imagine the security you would enjoy if fully 90% of your desktops were virtual and only 10% of resources (typically servers) remained as physical hardware devices.
#2 Maintain Agentless Anti-Virus
Most PCs are running a standard anti-virus package. Don’t scale back on dedicated anti-virus. But if you want to optimize performance, you’ll need an agentless anti-virus solution. In tests, typical anti-virus software decreased storage IOPS performance by as much as 30 percent.
Consider an agentless option for the hypervisor, where a light agent is built into VMware Tools on every virtual machine. Because the agent is so small, the solution is considered agentless. VMware’s NSX or vShield also provides a framework for agentless anti-virus, and you can put a product like Trend Micro Deep Security or McAfee MOVE on your infrastructure servers. The result is fully agentless anti-virus scanning on virtual desktops.
When a user logs on, they get a fresh virtual machine with no virus. While using the desktop, real-time scans prevent a virus. And when the user logs off, the desktop is refreshed from a clean image. Again, no viruses.
Some customers (schools, municipalities, or small businesses looking to save money) might skip agentless anti-virus, or even skip licensing a standard anti-virus package on virtualized machines entirely. This is a poor decision. In these environments, a virus will be introduced, continue to exist, and spread. Even a refresh on a virtual desktop won’t eradicate the virus on these compromised systems, and it will keep recurring. Having all users log off reduces the infection risk dramatically, but the potential threat continues to exist. You must maintain real-time anti-virus protection. Agentless options are preferred to eliminate the 30% performance hit.
#3 Disable Multiple Virtual Desktop Logins
Do not allow the same user to log on to multiple virtual desktops at the same time. As an administrator, you need to disable that setting.
The following example illustrates a potential problem scenario that you want to avoid:
A user logs on to their PC. Later that day, that user logs into a virtual machine (VM1). Without logging off of either machine, they go home and decide to use a remote connection to the same machine (VM1) or even a different one (VM2). The security concern is that the session on VM1 is still open and vulnerable while that user is not present. Anyone walking by the PC can assume control of that virtual session.
As a precaution, institute the following network security policy:
Whenever the same user logs into another virtual desktop, automatically log them off the previous machine or virtual desktop.
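In practice you enforce this through your connection broker or VDI platform settings, not your own code. Still, the logic of the policy is simple enough to sketch. The class and method names below are hypothetical and only illustrate the rule of one active session per user.

```python
class SessionRegistry:
    """Tracks one active desktop per user and reports which session to end."""

    def __init__(self):
        self._active = {}  # username -> desktop identifier

    def log_on(self, user, desktop):
        """Record a new logon. Return the desktop to log off, or None."""
        previous = self._active.get(user)
        self._active[user] = desktop
        if previous is not None and previous != desktop:
            return previous  # the caller should end this stale session
        return None

    def log_off(self, user):
        self._active.pop(user, None)

    def active_desktop(self, user):
        return self._active.get(user)


if __name__ == "__main__":
    registry = SessionRegistry()
    registry.log_on("ramesh", "VM1")
    stale = registry.log_on("ramesh", "VM2")  # logging on from home
    print("session to terminate:", stale)
    print("active desktop:", registry.active_desktop("ramesh"))
```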
#4 Use Two-Factor Authentication
Strength and options depend on the vendor technology, but generally speaking, we’re talking about a strong password plus a second form of physical or biometric authentication. Authentication providers include Okta, Imprivata, RSA, Duo, Yubico and others.
You want to enable and maintain an effective two-factor authentication arrangement to prevent unwanted cyber-attacks, data breaches, security intrusions, viruses, malware, and hacks from home or remote PCs.
#5 Use Single-Sign-On (SSO) Tools
Network policies typically enforce strong passwords and force users to change their main desktop password used to establish SSO to network applications every 90 days. Strong SSO password policies typically enforce rules for a minimum length, number of special characters, letters, and numbers, as well as preventing common strings or recycled passwords as a precaution.
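Your directory service enforces these rules for you, but it helps to see what such a policy checks. The sketch below is illustrative only: the minimum length, the short list of common strings, and the plain-text password history are my assumptions. Real systems compare against hashed history and much larger dictionaries.

```python
import string

# A tiny illustrative list. Real policies use far larger dictionaries.
COMMON_STRINGS = ("password", "123456", "qwerty", "letmein")


def password_violations(candidate, previous_passwords=(), min_length=12):
    """Return a list of policy rules the candidate password breaks."""
    violations = []
    if len(candidate) < min_length:
        violations.append("too short")
    if not any(ch.isalpha() for ch in candidate):
        violations.append("no letter")
    if not any(ch.isdigit() for ch in candidate):
        violations.append("no number")
    if not any(ch in string.punctuation for ch in candidate):
        violations.append("no special character")
    lowered = candidate.lower()
    if any(common in lowered for common in COMMON_STRINGS):
        violations.append("contains a common string")
    # Plain-text comparison is for illustration only.
    if candidate in previous_passwords:
        violations.append("recycled password")
    return violations


if __name__ == "__main__":
    print(password_violations("Password1!"))
    print(password_violations("Tr4vel-Maps-2x!"))
```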
With SSO, instead of multiple passwords, users only have to remember one. They are automatically logged into their network applications based on their desktop ID in the corporate LDAP, active directory, or user store. Even remote cloud-hosted applications such as Salesforce.com and Office365 can authenticate users with SSO. That one SSO password is more convenient for both backend administrators and for users. The administrators don’t have to maintain separate user stores with their own password policies. And the users can typically remember their password without writing it down or copying it from an unprotected Excel file.
Security is also improved because there is a 1:1 ratio of unique identifiable usernames with real human employees as opposed to an environment without SSO where a single person might have 10, 20, or more different usernames that obscure the very notion of an authentic identity. However, in the event of a breach, the distributed separate passwords would then be more secure. Hacking or phishing for SSO credentials can allow the hacker to infiltrate more data.
Biometrics, once confined to science fiction, Hollywood, and television, are common today, including fingerprint, thumbprint, and retina pattern scanning. Thus, you can combine biometric two-factor authentication with SSO for an easy-to-use yet secure solution. For the next 10-15 years, dual authentication consisting of a thumbprint or retina scan paired with a traditional memorized password seems likely to remain the de facto two-factor authentication gold standard for government security. Financial institutions are likely to continue one tier below that with a silver standard that consists of a password and a dynamically generated temporary code.
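If you are curious how a dynamically generated temporary code works, the most widely published scheme is the time-based one-time password (TOTP) from RFC 6238, built on HOTP from RFC 4226. I am assuming here that this is the kind of code meant above; individual vendors may use other schemes. The sketch uses only the Python standard library, and the secret shown is the RFC test secret, not something to use in production.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password as described in RFC 4226."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password as described in RFC 6238 (SHA-1 variant)."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


if __name__ == "__main__":
    test_secret = b"12345678901234567890"  # test secret published in the RFCs
    print("code for the current 30-second window:", totp(test_secret))
```

Both the server and the user’s token derive the same code from a shared secret and the current time, so a stolen password alone is not enough to log in.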
#6 Restrict Access by Device Type
You can and should restrict access by device type. This involves establishing policies on your Windows or Mac servers that control which kinds of devices may connect. These restrictions help you respond to the bring-your-own-device (BYOD) mania that took over corporate wireless networks over the past 10-15 years. More secure variations on this theme include restricting access to pre-configured Windows-based thin clients (good), Linux-based thin clients (better), and even more secure zero-clients (best).
Thin-clients are typically Windows or Linux workstations. As such, they could contain viruses. For example, a virus could install keystroke-capture spyware on the thin-client and compromise the virtual desktop credentials. Linux and Mac clients are considered more secure than Windows devices because the large Windows market share has attracted a larger number of Windows viruses.
A more secure alternative is to procure zero-client hardware right from the start. These are dedicated hardware devices with no OS and only a standard BIOS architecture. Zero-clients are available from 10zig, Dell, HP, Samsung, and other popular vendors. A zero-client has no other function but to provide a secure connection to the virtual desktop. For that reason, since they have no OS or other local apps, they are very secure. Windows-based thin clients are not as secure and still remain susceptible to viruses.
For example, one innovative hospital recently rolled out patient care carts with diagnostic equipment, and each cart had its own Apple iPad to establish a virtual patient chart. The administrators established a policy to allow exclusive access to a patient care app on a virtual desktop infrastructure locked down beyond the reach of other devices.
The following access restriction strategies are common:
- You can prohibit connections from certain unwanted devices. For example, you can allow or deny access to users with a PC, Mac, a specific OS, a specific set of login credentials to a virtual desktop, an iPad, an iPhone, a tablet, an Android device, a Windows phone, a Chromebook, or a specific mobile OS. (Hint: Based on recent history, Apple iOS devices are more secure than Android devices.)
- You can use management tools to establish policies that secure your own preset thin-clients or zero-clients. For example, Apple utilities and third-party management tools can turn the iPad into a zero-client.
- For maximum security, you can reduce the number of access points to your network by enabling client security certificates. Essentially, you enable a tool for handling certificates and then use management software to push a policy to all approved thin or zero-clients to verify a certificate before allowing login.
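Those three strategies combine naturally into one allow-or-deny decision. The sketch below is a toy policy evaluator. The device kinds, the denied OS list, and the certificate flag are hypothetical examples, and real enforcement lives in your VDI broker and device management tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    kind: str                    # for example "zero-client", "thin-client", "ipad"
    os: str                      # for example "none", "linux", "ios", "android"
    has_valid_certificate: bool


# Hypothetical policy values chosen for illustration.
ALLOWED_KINDS = {"zero-client", "thin-client", "ipad"}
DENIED_OS = {"android"}


def access_allowed(device, require_certificate=True):
    """Apply the deny list, then the allow list, then the certificate check."""
    if device.os.lower() in DENIED_OS:
        return False
    if device.kind.lower() not in ALLOWED_KINDS:
        return False
    if require_certificate and not device.has_valid_certificate:
        return False
    return True


if __name__ == "__main__":
    cart_ipad = Device(kind="ipad", os="ios", has_valid_certificate=True)
    home_pc = Device(kind="windows-pc", os="windows", has_valid_certificate=False)
    print("care cart iPad allowed:", access_allowed(cart_ipad))
    print("unmanaged home PC allowed:", access_allowed(home_pc))
```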
#7 Configure VDI Servers, Desktops, and Devices on Separate VLANs
Do not use the same VLAN for all network components. For optimum performance and security, you want your virtual desktops, access devices, and infrastructure servers on their own separate VLANs. When on the same VLAN, a weak access point such as a PC with an older OS might become infected with a virus that would easily spread to other virtual desktop clients on the same VLAN. Even servers are not immune when on the same shared VLAN.
Separate VLANs with discrete gateways also add variation to IP addresses, which makes device hacking more difficult. Another benefit is that more DHCP IP addresses are available because you are splitting access across VLANs. On a single Class C (/24) VLAN, you would be limited to 254 usable addresses, including the one assigned to the gateway.
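You can check the address math with Python’s standard `ipaddress` module. The VLAN names and subnets below are hypothetical. The sketch counts usable host addresses per subnet and confirms that the planned ranges do not overlap.

```python
import ipaddress
from itertools import combinations


def usable_hosts(cidr):
    """Usable host addresses in an IPv4 subnet (excludes network and broadcast).

    Intended for prefixes of /30 or shorter, which covers typical VLAN sizing.
    """
    return ipaddress.ip_network(cidr).num_addresses - 2


def plan_has_overlap(plan):
    """True if any two subnets in the plan share addresses."""
    networks = [ipaddress.ip_network(cidr) for cidr in plan.values()]
    return any(a.overlaps(b) for a, b in combinations(networks, 2))


# Hypothetical VLAN plan: one subnet per role.
VLAN_PLAN = {
    "infrastructure-servers": "10.10.10.0/24",
    "virtual-desktops": "10.10.20.0/22",
    "access-devices": "10.10.30.0/24",
}

if __name__ == "__main__":
    for name, cidr in VLAN_PLAN.items():
        print(name, cidr, "usable hosts:", usable_hosts(cidr))
    print("overlapping subnets:", plan_has_overlap(VLAN_PLAN))
```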
#8 Use Network Micro-Segmentation
Gaining in popularity, especially among big government, banking, finance, and pharmaceutical organizations, a micro-segmentation security strategy integrates directly into the VDI without a hardware firewall. Your network policies are synchronized with a virtual network, virtual machine, OS, or other virtual security target to create a security bubble. Access control capabilities in virtual switches replace existing firewall functions for segregation and controlled access across data center tenants.
Micro-segmentation is ideal for today’s software-defined networks with virtual desktops and pools of users on multiple smaller devices. For example, let’s say you want to protect a pool of desktops for the accounting business unit. That department stores very sensitive information and you must maintain a secure environment. With micro-segmentation, you allocate virtual desktops in that specific zone so they can only communicate with Internet and VDI servers, and are blocked from seeing any other desktops. Restricting IP traffic to any sibling desktops is extremely effective at neutralizing the spread of malware or viruses.
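The accounting example boils down to a short rule set. The sketch below models it with the standard `ipaddress` module. The subnets are hypothetical, and in a real deployment these rules are defined in your virtual networking platform, not in a script.

```python
import ipaddress

# Hypothetical address ranges for illustration.
ACCOUNTING_POOL = ipaddress.ip_network("10.20.0.0/24")
VDI_SERVERS = ipaddress.ip_network("10.10.10.0/24")
INTERNAL_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def traffic_allowed(source, destination):
    """Decide whether a desktop in the accounting pool may reach a destination."""
    src = ipaddress.ip_address(source)
    dst = ipaddress.ip_address(destination)
    if src not in ACCOUNTING_POOL:
        raise ValueError("this rule set only covers the accounting pool")
    if dst in ACCOUNTING_POOL:
        return False  # sibling desktops cannot see each other
    if dst in VDI_SERVERS:
        return True   # VDI servers are reachable
    if dst in INTERNAL_NETWORK:
        return False  # all other internal systems, including other desktops
    return True       # anything else is treated as Internet traffic


if __name__ == "__main__":
    print("desktop to sibling desktop:", traffic_allowed("10.20.0.5", "10.20.0.6"))
    print("desktop to VDI server:", traffic_allowed("10.20.0.5", "10.10.10.4"))
    print("desktop to Internet host:", traffic_allowed("10.20.0.5", "93.184.216.34"))
```

Because sibling traffic is denied first, malware on one accounting desktop has no direct path to the desktop next to it.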
Don’t be alarmed by these statistics. More importantly, don’t become a statistic yourself. I’m sharing a few factoids here to help protect you, as one of the nearly 4.6 billion mobile device users out there (Gartner).
- Cybercrimes including hacking and theft cost American businesses over $55 million per year (Ponemon Institute)
- Every month, one in four mobile devices succumbs to some type of cyber threat (Skycure)
- Last year in the United States alone, over five million smartphones were stolen or lost (Consumer Reports)
Who is responsible for such mayhem? Hackers, of course, and online thieves all over the world.
But who is responsible for protecting your device? You are.
As IT and Networking professionals, we can manage mobile device security around the clock, seven days a week, 365 days a year, but it is you, the mobile device owner or user, who ultimately determines the relative health of your smartphone or tablet and the level of security you want to experience.
To protect your mobile device, follow these recommended best practices:
- Lock your device with a passcode: One of the most common ways your identity can be stolen is when your phone is stolen. Lock your device with a passcode, but do not use common combinations like 1234 or 1111. On Android phones you can establish a swipe security pattern. Always set the device to auto-lock when not in use.
- Choose the Right Mobile OS for Your Risk Tolerance: Open source integrations, price, and app selection might guide you toward Android or Windows phones; however, Apple devices running iOS are generally more secure. A recent NBC Cybersecurity News article revealed that Google’s Android operating system has become a primary target for hackers because “app marketplaces for Android tend to be less regulated.” Hackers can more easily deploy malicious apps that can be downloaded by anyone. As an example, the article reported that over 180 different types of ransomware were designed to attack Android devices in 2015. If you’re an Android owner, fear not. Consumers who choose Android can still remain safe by being aware of the vulnerabilities and actively applying the other tips in this article.
- Monitor Links and Websites Carefully: Take a moment to monitor the links you tap and the websites you open. Links in emails, tweets, and ads are often how cybercriminals compromise your device. If it looks suspicious, it’s best to delete it, especially if you are not familiar with the source of the link. When in doubt, throw it out.
If you have Android and your friend has an iOS device, and you both have a link you are not sure about opening, open the link on iOS first. This practice allows you to check out the link while lowering your exposure to risks including malware.
- Regularly Update Your Mobile OS: Take advantage of fixes in the latest OS patches and versions of apps. These updates include fixes for known vulnerabilities. (To avoid data plan charges, download these updates when connected to a trusted wireless network.) Every few days, and especially whenever you hear news about a new virus, take the time to check for OS updates or app patches.
In 2016, an iOS 9.x flaw resulted in a vulnerability for iPhone users where simply receiving a certain image could leave the device susceptible to infection. Apple pushed out a patch. A year ago a similar flaw was detected on Android devices; however, the risk to users was significantly greater, impacting 95 percent of nearly one billion Android devices. An expected 90-day patch was late. Meanwhile, the flaw allowed hacking to the maximum extent possible including gaining complete control of the phone, wiping the device, and even accessing apps or secretly turning on the camera. Don’t ignore those prompts to update!
At this point you may be asking, “Do I need a separate anti-virus app, especially if I use an Android device?” To answer that question, balance your need for security against how much risk you plan on taking with your device. Do you often use public wireless networks and make poor choices with the links you open? For now, you may not need an anti-virus app; however, some early industry trends are showing more anti-virus apps on the horizon.
- Do Not Jailbreak Your Smartphone: Reverse engineering and unauthorized modification of your phone (jailbreaking) leave your phone vulnerable to malware. Even jailbreaking an iOS device leaves it open to infections. If your cousin already customized your device for you, it’s not too late. Restore the OS through the update process or check with an authorized reseller.
For the rest of the tips please read my work blog: