What’s next for Virtual Desktop Infrastructure?

Greetings CIOs, IT Managers, VM-ers, Cisco-ites, Microsoftians, and all other End-Users out there… Yury here. Yury Magalif. Inviting you now to take another virtual trip with me to the cloud, or at least to your data center. As Practice Manager at CDI, your company is depending on my team of seven (plus or minus a consultant or two) to manage the implementation of virtualized computing including hardware, software, equipment, service optimization, monitoring, provisioning, etc. And you thought we were sitting behind the helpdesk and concerned only with front-end connectivity. Haha (still laughing) that’s a good one!

VDI: OUR JOURNEY BEGINS HERE
Allow me to paint a simple picture and add a splash of math to illustrate why your CIO expects so much from me and my team. Your company posted double-digit revenue growth for three years running and somehow, now, in Q2 of year four, finds itself in a long fourth down and 20 situation. (What? You don’t understand American football analogies? Okay, in the international language of auto-racing, we are 20 laps behind and just lost a wheel.) One thousand employees need new laptops, docking stations, flat panel displays, and related hardware. Complicating the matter are annual software licensing fees for a group of 200 but with only five concurrent users worldwide. At $1,500 per user times 1,000 users, plus the $100 fee, your CIO has to decide how to explain to the board a plan to spend another 1.5 million dollars on IT just after Q1 closed down 40 percent, with Q2 looking even worse.

To read the rest of this blog, where I try something different, please go to my work blog page:

http://www.cdillc.com/whats-next-virtual-desktop-infrastructure/

Assessing your Infrastructure for VDI with real data – Part 2 of 2 – Analysis

For VSI, we established that using analysis tools is a necessity, and VMware provided a wonderful tool in Capacity Planner. However, it soon became evident that analysis tools are even more important for VDI, because the hardware and software investment is generally higher. You need more storage, and faster storage. You need many servers and a fast network. So the margin of error is smaller.

Consequently, using Liquidware FIT or Lakeside SysTrack is essential. There are now a few more tools on the market, like ControlUp or Login PI. However, these newer entrants have not been battle-tested yet.

So how do you analyze your physical desktops for VDI?

First, buy a license for the Liquidware FIT tool (per user, inexpensive), or buy an engagement from your friendly Value Added Reseller or Integrator who is a Liquidware partner. If you buy a service from a partner, a license for up to 250 desktops is usually included with the service.

Here, I will talk about using a partner’s services, because that is what I do. However, if you are doing this yourself, just apply the same steps.

You will need to provide your partner’s engineer with space for 2 small Liquidware virtual appliances. The only gotcha is that you want them on the fastest storage you have (preferably SSD), because on slower storage it takes much longer to process any analysis or reports.

The engineer will come and install the 2 appliances into your vSphere environment. Then, the engineer will give you an EXE or MSI with the agent. Usually, you can distribute the agent with the same mechanism you already use to install software on your desktops. For example, distribution tools like Microsoft SCCM, Symantec Altiris, LANDesk, and even Microsoft Group Policy will all work. If you don’t have a mechanism for software distribution, your engineer can use a script to install the agents on all PCs, as sketched below.
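
If you go the script route, a PowerShell push is one workable approach. Below is a minimal sketch, not Liquidware’s official procedure: the share path, the FITagent.msi name, and the computers.txt list are placeholder assumptions, and the real installer name and silent-install switches come from Liquidware’s documentation. It assumes WinRM is enabled on the targets and that you have admin rights.

# Hypothetical push install of the assessment agent to a list of PCs.
# FITagent.msi and the share path are placeholders; check the vendor docs
# for the real installer name and switches.
$pcs = Get-Content .\computers.txt
foreach ($pc in $pcs) {
    # Copy the MSI to the target first, to avoid the remoting double-hop problem
    Copy-Item '\\fileserver\share\FITagent.msi' "\\$pc\C$\Windows\Temp\FITagent.msi"
    Invoke-Command -ComputerName $pc -ScriptBlock {
        # /qn = silent install; -Wait returns only when msiexec finishes
        Start-Process msiexec.exe -Wait -ArgumentList '/i C:\Windows\Temp\FITagent.msi /qn'
    }
}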

Make sure to choose a subset of your PCs, including at least some from each group of similar users (Accounting, Sales, IT, etc.). Your sample size could be about 10-25% of the total user count. Obviously, the higher the percentage, the more accuracy you get. But the goal here is not 100% accuracy; that is impossible to achieve. Assessment and performance analysis is as much an art as a science. Thus, you need just enough users to get a ballpark estimate of what hardware you need to buy. Also, run the assessment for 1 month preferably, or at a bare minimum 2 weeks, and start the collection clock only after you have deployed the Liquidware agent to your last user.

Your partner’s engineer will need remote access, if possible, to check on the progress of the installation. First, the engineer will check whether the agents are reporting successfully back to the Liquidware appliances. During the month, the engineer will make sure the agents keep reporting and that data can be extracted from the appliance.

In the middle of the assessment, the engineer will do a so-called “normalization” of the data, to make sure the results are compatible with the rules of thumb used in the analysis. If necessary, the engineer will readjust thresholds and recalculate the data from the beginning.

At the end of 30 days, the engineer will generate a machine-made report on the overall performance metrics and present it to you.

Some partners, for an extra service fee, will go further: the engineer will analyze the report to determine the quantity and performance parameters of the hardware you need, create a written report, and present all the data to you.

In either case, you will know which desktops have the best scores for virtualization, and which ones you should not virtualize. If you opt for the more advanced reporting services from your partner, you will also understand how to map the results to hardware, and gain further insights.

One way of mitigating a bad VDI sizing is to also use a load simulation tool like LoginVSI. However, LoginVSI is only useful for clients who can afford to buy lab equipment similar to what they will buy for production. Using LoginVSI, you can test robotic (fake) users doing the tasks that real users will do in VDI. That gives you a good ballpark hardware number. However, the LoginVSI number contains no real user experience data. For that, you need a tool like Liquidware FIT, and the associated work, to determine the proper VDI strategy.

Understanding what your current user experience is, and how that experience can be accommodated on virtual desktops, is essential to VDI. You should do this assessment before buying your hardware. Doing an assessment ensures that your users get the same experience on the virtual desktop as they had on the physical desktop, or better (the holy grail of VDI).

Assessing your Infrastructure for VDI with real data – Part 1 of 2 – History

It is now a common rule of thumb that when you are building a Virtual Server Infrastructure (VSI), you must assess your physical environment with analysis tools. The analysis tools show you how to fit your physical workloads onto virtual machines and hosts.

The gold standard in analysis tools is VMware’s Capacity Planner. Capacity Planner was originally made by a company called AOG. AOG analyzed not just physical-to-virtual migrations, but did overall performance analysis of different aspects of a system. AOG made one of the first agentless data collection tools. Agentless was better because you did not have to touch each system in a drastic way; there was less chance of a driver going bad or of a performance impact on the target system.

Thus, AOG partnered with HP and other manufacturers, doing free assessments for their customers while getting paid by the manufacturer on the back end. AOG tried to sell itself to HP, but HP, stupidly, did not buy. Then VMware came out of nowhere and snapped up AOG. VMware at the time needed an analysis tool to help customers migrate to virtual infrastructure faster.

When VMware bought AOG, it dropped AOG’s other analysis business and made the tool free for partners to analyze migrations to virtual infrastructure. That was a shame, because AOG’s tool, renamed Capacity Planner, was really good. Capacity Planner relies solely on Windows Management Instrumentation (WMI) functionality that is already built into Windows and collects information all the time. Normally, Windows discards information like performance counters unless something chooses to collect it. Capacity Planner simply made that choice, collecting WMI performance and configuration data from each physical machine.
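
To illustrate the kind of agentless collection this is, here is a minimal PowerShell sketch that pulls CPU and memory counters from a remote machine over WMI/CIM, without installing anything on it. This is just the idea, not Capacity Planner’s actual code; the hostname is a placeholder, and the query requires rights and open WMI/CIM ports on the target.

# Agentless performance pull over WMI/CIM; 'PC-042' is a placeholder hostname.
$cpu = Get-CimInstance -ComputerName 'PC-042' -ClassName Win32_PerfFormattedData_PerfOS_Processor -Filter "Name='_Total'"
$os  = Get-CimInstance -ComputerName 'PC-042' -ClassName Win32_OperatingSystem
[pscustomobject]@{
    CpuPercent = $cpu.PercentProcessorTime
    FreeMemMB  = [int]($os.FreePhysicalMemory / 1KB)  # WMI reports this value in KB
}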

When VMware entered the Virtual Desktop Infrastructure (VDI) business with Horizon View, it lacked major pieces of the VDI ecosystem. One piece was profile management; others were planning and analysis, and monitoring. Immediately, numerous companies sprang to life to help VMware fill the need. Liquidware Labs (whose founder had worked for VMware) was the first to come up with a robust planning and analysis tool in Stratusphere FIT, then with a monitoring tool in Stratusphere UX. Lakeside’s SysTrack also came on the scene. VMware used both internally, although the preference was for Liquidware.

Finally, VMware realized that the lack of a VMware-made analysis tool for VDI was hindering it. What VMware failed to realize was that such a tool had already existed in-house for years: Capacity Planner. The Capacity Planner team was neglected, so updates to the tool were rare. However, since Capacity Planner could already analyze physical machines for performance, it was easy to modify the code to collect information for virtualizing physical desktops in addition to servers.

Capacity Planner’s code was eventually updated with desktop analysis. All VMware partners were jumping with joy: we now had a great tool, and we did not have to learn any new software. I remember eagerly collecting my first data set and beginning the analysis. The tool told me I needed something like twenty physical servers to hold 400 virtual desktops. Twenty desktops per server? That sounded wasteful. I was a beginner VDI specialist then, so I trusted the tool but still had doubts. Then I did a few more passes at the analysis and kept getting wildly different numbers. Trusting my gut instinct, I decided to redo one analysis with Liquidware FIT.

Liquidware FIT, of course, uses agents; I used it anyway, but I always thought it would be nice not to need agents, which is why VMware’s addition of desktop analysis to the agentless Capacity Planner was so welcome. Back to my analysis: after running Liquidware FIT, I came up with completely different numbers. I don’t remember exactly what they were; perhaps 60 desktops per physical server, or something else. What I do remember is that Liquidware’s analysis made sense where Capacity Planner’s did not. My suspicions about Capacity Planner were confirmed by VMware’s own VDI staff who, when asked if they used Capacity Planner to size VDI, said, “For VDI, avoid Cap Planner like the plague, and keep using Liquidware FIT.”

I have kept using Liquidware FIT ever since, and never looked back. While FIT does have agents, I now understand that getting metrics like application load times and user login delay is not possible without agents, because Windows does not expose such metrics in WMI. A rich agent can pick up many more user experience items, and thus do much better modeling.

Collateral for my presentation at the Workshop of the Association of Environmental Authorities of NJ (AEANJ)

I was glad for a chance to present at the Workshop of the Association of Environmental Authorities of NJ (AEANJ). There were great questions from the audience.

Thank you to the attendees; to Leon McBride for the invitation; to Peggy Gallos and Karen Burris; and to my colleague Lucy Valle for videotaping.

My presentation is called “Data Portability, Data Security, and Data Availability in Cloud Services.”

Here are the collateral files for the session:

Slides:

AEANJ Workshop 2016-slides-YuryMagalif

Video:

AEANJ Workshop 2016 Video – Yury Magalif

The lure of Hyper-Converged for VDI

So you decided to implement Virtual Desktop Infrastructure (VDI). Virtual desktops and app delivery sound sexy, but once you’ve started delving into the nitty-gritty, you quickly realize that VDI has many variables. In fact, so many that you start to feel overwhelmed.

At this point you have a couple of options. You can keep doing this yourself, but that will take valuable time. You can hire a VDI engineer onto your team, but it also takes time and money to find a great engineer.

Another option is to hire a Value Added Reseller that has done VDI a hundred times. Great idea – I will love you forever, and will do great VDI for you. But I can be expensive.

One particular sticking point in VDI is the sizing of the hardware for the environment. If you undershoot the amount of compute, storage, memory or networking, you risk having unhappy users with underpowered virtual desktops. If you overshoot, you may be chastised for overspending.

Too often I have seen the user profile not properly examined or sized. The result is a virtual desktop that is low on memory or CPU. The user immediately blames the new technology, never considering that they may have caused the problem themselves, and that the real culprit may lie somewhere else entirely. The user just had his shiny physical machine taken away and replaced with something intangible, so of course every problem, whether related to VDI or not, gets blamed on VDI, and possibly on the VDI sizing. The bad buzz spreads through the company. Such buzz kills a VDI project faster than the performance problems themselves.

So, what is one way to avoid thinking about sizing? Hyper-Converged.

Hyper-Converged means a node in a cluster has a little bit of everything: compute, storage, memory, network. Each node is generally the same, though there can be different node types; for example, SimpliVity has some nodes with everything and some nodes doing only compute.

Since most nodes are the same, once you figure out how many average virtual desktops of a specific profile fit on a node, you can just keep adding nodes for scalability.

In fact, Nutanix capitalized on that brilliantly when it announced its famous guarantee: the customer says how many users they want to put on Nutanix, and the vendor provides enough Hyper-Converged nodes to deliver a great user experience. The guarantee was hard to enforce on both the customer end and the Nutanix end. But it sure had lots of marketing power. Time and time again I heard it from customers and other VARs. The guarantee was a placebo for making VDI easier.

Consequently, you should not rely on a guarantee alone for VDI sizing. Sizing should be verified with load simulation tools like LoginVSI or View Planner. Then, the profile of your actual users should be evaluated by collecting user experience data with a tool like Liquidware FIT or Lakeside SysTrack.

Once the data is collected and analyzed, you can decide what number of Hyper-Converged nodes to buy. Hyper-Converged makes the sizing easy because you always deal with uniform nodes.

Once you are in production, you should monitor user experience constantly with a tool like Liquidware Stratusphere UX. UX allows you to always have a solid idea of what your user profiles look like. As a result, you can confidently say, “On my Hyper-Converged node I can host up to 50 users.” Thus, if you grow to 100 users, you need 2 Hyper-Converged nodes.
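
The arithmetic behind that statement is deliberately simple. Here is a minimal sketch, where the 50-users-per-node figure stands in for whatever your own monitoring measured:

# Node count = expected users / measured users-per-node, rounded up.
$usersPerNode  = 50     # measured with a UX monitoring tool, not guessed
$expectedUsers = 100
$nodes = [math]::Ceiling($expectedUsers / $usersPerNode)
"$expectedUsers users need $nodes Hyper-Converged node(s)"
# Many designs add one more node for failover capacity (N+1); that is a
# design choice layered on top of the raw arithmetic.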

Saying the above is the holy grail of scalability. And therein lies the lure of Hyper-Converged – as a basic VDI building block. That is why Hyper-Converged companies started with strong VDI stories, and only later began marketing for Virtual Server Infrastructure.

And any technology that makes VDI easier, even by one iota, makes VDI more popular. Hail Hyper-Converged for VDI!

Is FCoE winning the war vs. Native FC?

Note: this article was first published in 2013 on my work blog Cloud-Giraffe. I am republishing it here because the article is no longer available from my work blog. The ideas here are still relevant.

I recently had a customer buy 2 Cisco Nexus 5548UP switches to be used only for their Fibre Channel (FC) storage functionality, which is 50% of what the switch can do. For the Ethernet features, the customer bought separate Nexus switches. I did not advise on the design of this solution. However, it is a telling sign that Cisco’s effort to play down native FC functionality in favor of native Fibre Channel over Ethernet (FCoE) is faltering. Promoting FCoE is proving to be harder than it looked, due to two main problems: confusion about the management of FCoE, and Cisco’s main storage switch competitor, Brocade, wielding formidable muscle to keep native FC alive.

In 2003, Cisco entered the FC native switch market with a splash. Cisco did this by perfecting one of its first spin-ins. A spin-in is when a company tells a few hundred of its engineers, “You will become a separate company and receive shares. Then, you will build us the best new technology on the market. If the tech is successful, Cisco will buy the company back, and all who received shares will profit handsomely.” Thus, Andiamo Systems was born and built the Cisco MDS FC storage switch.
At the time of their MDS FC switch release, Cisco’s primary competitor Brocade had 1/3 of the software features of the new MDS switch. I remember talking to my colleagues, old school FC guys, about the Cisco MDS. “Cisco does not know the FC storage market — they are network specialists. They will never release a switch as easy and robust as a Brocade or McData,” forecasted my bearded comrades. I believed them — they had years of experience.
Soon, Cisco was sending me to MDS courses, and the new CCIE certification in Storage was born. After I took the courses and realized the MDS’s feature dominance, I was hooked. Any geek worth his salt likes an abundance of gadgets — Cisco had Virtual Storage Area Networks (VSANs), FC pings, Inter-VSAN routing and Internet Small Computer System Interface (iSCSI) built in. This was when iSCSI storage was a novel, branch-only solution. But with Cisco, you could do iSCSI with ANY storage.
Cisco’s only problem was cost. Cisco attacked the large enterprise market first. As a result, an entry-level MDS 9216 switch with all the wizardry cost $52 thousand list, a price prohibitive for smaller markets. However, Cisco’s access to large enterprise decision makers allowed the company to quickly dispatch the McData switch company, which was gobbled up by Brocade. Brocade was not passive: it quickly ramped up development of software features. Further, Brocade’s main counter bet was a raw hardware speed strategy. Brocade decided to improve native FC chips faster than Cisco. As a result, Brocade was faster to market with 4 Gbit and then 8 Gbit FC.
At the time of Brocade’s 4 and 8 Gbit increases, most critics said that no client required such speed, that it was purely a marketing gimmick. The same marketing-gimmick charge had been leveled against Cisco for the advanced software gadgets in its switches. The FC customer had a choice: raw FC speed vs. advanced software. While I enjoyed the advanced software, I admit that few clients used switch-based iSCSI and the extra Cisco features. Both Brocade’s FC speed dominance and Cisco’s software advantage were pure marketing. But with time, it became evident that faster chips prevail in the marketing argument: software was faster to develop than custom Application-Specific Integrated Circuits (ASICs). Consequently, Brocade matched Cisco’s features and was winning on FC speed. The pundits who shot down speed were silenced, as throngs of customers wanted faster Input/Output (I/O) for their growing VMware virtual server farms.
Still, Cisco had another card up its sleeve. Andiamo’s success was spectacular. The team behind the MDS was itching to disrupt another market, and Cisco was happy to oblige. Nuova Systems put together the same cast of characters in another spin-in. The spin-in secretly burrowed inside Cisco, building something unique. Speculation abounded, but when Cisco finally announced Nuova’s work, it was revolutionary.
Cisco/Nuova was the first to release the Nexus, a unified switch which supported native FC and Ethernet/Internet Protocol (IP) in the same box. Further, Cisco was the first to release the 10 Gbit FCoE protocol for their unified switches. Also, Cisco entered the server market with the Unified Computing System (UCS), a blade server platform with a network oriented architecture. Brocade was again caught off-guard, and went to its tried and true strategy — win with native FC, while countering Cisco’s feature superiority with “me toos.” Thus, Brocade was the first on the market with 16 Gbit FC. Meanwhile, Cisco touted the benefits of unified switching at 10 Gbit.
The promise of unified switching made complete sense. Why have 2 separate networks — storage and IP, which have similar concepts? Yet, in the 1990s, the storage market diverged onto a separate network path. Storage required a switching infrastructure, and the industry delivered the Storage Area Network (SAN). Developed in part by the SCSI guys mixed in with IP pros, FC was a robust protocol, perfect for storage. However, when FC expanded into multiple sites and routing, its management became similar to IP network management. Still, storage was managed by the storage team and IP by the network team. Separate, sometimes warring Network vs. Storage silos increased hardware and staff costs at large enterprises.
Cisco argued it could reintegrate the silos into one job role — an omniscient super Data Center guru. To that end, Cisco Certified Internetwork Expert (CCIE) certification in Storage was recently discontinued in favor of CCIE in Data Center. The Data Center CCIE requires knowledge of the server in Cisco UCS, the storage of Cisco MDS and Nexus FC, the Cisco Nexus Layer 2/3 network, and the Nexus 1000V virtual switching functionality. In Cisco’s view, the Data Center CCIE is the James Bond of the Data Center world, able to handle any problem thrown at him. I have no doubt that such people will exist, and they will assemble multi-disciplinary departments.
However, the current realities in the field seem to tell a different story. Over time, FC and storage developed into a whole separate area with its own deep expertise. The storage professional was required to know the management of multiple storage arrays from different manufacturers — EMC, NetApp, HP, Hitachi, Dell, and others. The number of storage protocols increased from just FC and occasional Fibre Channel over IP (FCIP) for routing across distances to iSCSI, Network File System (NFS), Common Internet File System (CIFS) and FCoE. Brocade absorbed McData, but Qlogic began to make FC switches.
On the network side, the amount of technology keeps increasing. The Routing and Switching CCIE of today has to know 3-5 times more than CCIE #1 did. Moreover, we have virtual networking and Software Defined Networking (SDN) disrupting the field. The future network guy has to know FabricPath, Overlay Transport Virtualization (OTV), Cisco’s new Cloud Services Router 1000V (CSR), VMware’s vCloud Director, Nicira, and Brocade-acquired Vyatta.
The FCoE protocol was meant to unify storage and network management through the Nexus switch. Unfortunately, what I see is that the network guys are weak in the Nexus FC functionality. Yes, they may have gone to a class to learn the MDS or the Nexus’s FC side, and they know the transport. However, they lack knowledge and control of the day-to-day management of the endpoints: server, virtualization, and storage arrays. As a result, whoever controls the Nexus FCoE cannot control and troubleshoot the native NetApp FCoE card, and also cannot control LUN presentation in a UCS blade. Yet complete control of the FC stack is essential in FC troubleshooting. On the other hand, storage guys rarely want to delve into the network features of the Nexus switch. Consequently, because the Nexus is first and foremost a network switch, the storage guys never get daily administrative control.
When a shop is moving from one manufacturer to another, toward a converged network, it does not want to keep multiple manufacturers. Another customer asked me to design a Nexus-only solution because they wanted to move away from Brocades. They had money for the Nexuses, but not for enough Nexuses to accommodate all the Brocade FC ports currently in existence. When I mentioned introducing Cisco MDSs, they said, “Why are you introducing native FC into the mix when the Nexus FCoE mantra is supposed to solve our storage needs?” The death of native FC, and especially of the Cisco MDS, has been predicted before. When Cisco retired the CCIE in Storage, that was another sign that the MDS native FC switch was on its way out. Yet Brocade released 16 Gbit native FC 2 years ago.
Today, the Data Center has many management headaches due to convergence.
  • Who will manage the server and its network devices (Converged Network Adapter cards, end-host I/O modules like Hewlett-Packard Virtual Connect or Cisco UCS Fabric Interconnects) — VMware guys, old school server guys, storage or network gals?
  • Who will manage the network going from the server out to the first access device (end-host I/O modules, Cisco UCS Fabric Interconnects)?
  • Who will manage storage and network transports (Cisco Nexus, Brocade switches)?
  • Who will manage the network in a virtual network world (Cisco Nexus 1000V, CSR, Vyatta, Nicira, hardware Nexus, OpenStack)?
These management questions have not been settled. In fact, what I see is that the expertise of IT staff, like water, flows back and forth between departments. No one is quite sure where her responsibility ends and a colleague’s begins.
In this world of management confusion, the tried and true resonates. As a result, Brocade has been successful in convincing the Data Center to wait on the death of native FC. And, just like before with 4 and 8 Gbit FC, Cisco had to answer. Therefore, the just-announced Cisco 16 Gbit FC MDS 9710 Multilayer Director and the upcoming MDS 9250i Multiservice Fabric Switch continue the time-honored tradition of the Brocade vs. Cisco FC war. In response, Brocade immediately pointed on LinkedIn to a press release touting added software features in its current code base. In addition, I am sure Brocade R&D is already well on the way to releasing 32 Gbit FC ASICs for the next round of battle.
Meanwhile, many IT departments, like my original customer, will continue to follow a dual path: native FC kept separate from the network, avoiding FCoE. That holds even when the avoidance happens on switches from the same manufacturer, Cisco, and on a switch that fully supports FCoE, the Nexus. The native Fibre Channel protocol is alive and well.
References:
Cisco Helps Customers Address Rising Cloud, Big Data Requirements By Raising the Bar for Storage Networking.
Brocade Advances Fibre Channel Innovation With New Fabric Vision Technology.

VMware vSphere misidentifies local or SAN-attached SSD drives as non-SSD

Symptom:

You are trying to configure the Host Cache Configuration feature in VMware vSphere. The Host Cache feature swaps memory to a local SSD drive if vSphere encounters memory constraints. It is similar to the famous Windows ReadyBoost.

Host Cache requires an SSD drive, and ESXi must detect the drive type as SSD. If the drive type is NOT SSD, Host Cache Configuration will not be allowed.

However, even though you put some local SSD drives in the ESXi host, and also have an SSD drive presented from your storage array, ESXi refuses to recognize the drives as SSD type, and thus refuses to let you use Host Cache.

Solution:

Apply some CLI commands to force ESXi into understanding that your drive is really SSD. Then reconfigure your Host Cache.

Instructions:

Look up the name of the disk and its naa.xxxxxx number in the VMware GUI. In our example, we found that the disks not properly showing as SSD are:

  • Dell Serial Attached SCSI Disk (naa.600508e0000000002edc6d0e4e3bae0e)  — local SSD
  • DGC Fibre Channel Disk (naa.60060160a89128005a6304b3d121e111) — SAN-attached SSD

Check in the GUI that both show up as non-SSD type.

SSH to the ESXi host. For each ESXi host, you must look up the unique disk names and perform the commands below separately, once per host.

Type the following commands, and find the NAA numbers of your disks.

In the transcripts below, the lines you type begin with “~ #”; everything else is command output.

———————————————————————————————-

~ # esxcli storage nmp device list

naa.600508e0000000002edc6d0e4e3bae0e

Device Display Name: Dell Serial Attached SCSI Disk (naa.600508e0000000002edc6d0e4e3bae0e)

Storage Array Type: VMW_SATP_LOCAL

Storage Array Type Device Config: SATP VMW_SATP_LOCAL does not support device configuration.

Path Selection Policy: VMW_PSP_FIXED

Path Selection Policy Device Config: {preferred=vmhba0:C1:T0:L0;current=vmhba0:C1:T0:L0}

Path Selection Policy Device Custom Config:

Working Paths: vmhba0:C1:T0:L0

naa.60060160a89128005a6304b3d121e111

Device Display Name: DGC Fibre Channel Disk (naa.60060160a89128005a6304b3d121e111)

Storage Array Type: VMW_SATP_ALUA_CX

Storage Array Type Device Config: {navireg=on, ipfilter=on}{implicit_support=on;explicit_support=on; explicit_allow=on;alua_followover=on;{TPG_id=1,TPG_state=ANO}{TPG_id=2,TPG_state=AO}}

Path Selection Policy: VMW_PSP_RR

Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0;lastPathIndex=1: NumIOsPending=0,numBytesPending=0}

Path Selection Policy Device Custom Config:

Working Paths: vmhba2:C0:T1:L0

naa.60060160a891280066fa0275d221e111

Device Display Name: DGC Fibre Channel Disk (naa.60060160a891280066fa0275d221e111)

Storage Array Type: VMW_SATP_ALUA_CX

Storage Array Type Device Config: {navireg=on, ipfilter=on}{implicit_support=on;explicit_support=on; explicit_allow=on;alua_followover=on;{TPG_id=1,TPG_state=ANO}{TPG_id=2,TPG_state=AO}}

Path Selection Policy: VMW_PSP_RR

Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0;lastPathIndex=1: NumIOsPending=0,numBytesPending=0}

Path Selection Policy Device Custom Config:

Working Paths: vmhba2:C0:T1:L3

———————————————————————————————-

Note that the Storage Array Type is VMW_SATP_LOCAL for the local SSD drive and VMW_SATP_ALUA_CX for the SAN-attached SSD drive.

Now we will check whether the CLI reports each of the disks as SSD or non-SSD. Make sure to specify your own NAA number when typing the command.

———————————————————————————————-

~ # esxcli storage core device list --device=naa.600508e0000000002edc6d0e4e3bae0e

naa.600508e0000000002edc6d0e4e3bae0e

Display Name: Dell Serial Attached SCSI Disk (naa.600508e0000000002edc6d0e4e3bae0e)

Has Settable Display Name: true

Size: 94848

Device Type: Direct-Access

Multipath Plugin: NMP

Devfs Path: /vmfs/devices/disks/naa.600508e0000000002edc6d0e4e3bae0e

Vendor: Dell

Model: Virtual Disk

Revision: 1028

SCSI Level: 6

Is Pseudo: false

Status: degraded

Is RDM Capable: true

Is Local: false

Is Removable: false

Is SSD: false

Is Offline: false

Is Perennially Reserved: false

Thin Provisioning Status: unknown

Attached Filters:

VAAI Status: unknown

Other UIDs: vml.0200000000600508e0000000002edc6d0e4e3bae0e566972747561

~ # esxcli storage core device list --device=naa.60060160a89128005a6304b3d121e111

naa.60060160a89128005a6304b3d121e111

Display Name: DGC Fibre Channel Disk (naa.60060160a89128005a6304b3d121e111)

Has Settable Display Name: true

Size: 435200

Device Type: Direct-Access

Multipath Plugin: NMP

Devfs Path: /vmfs/devices/disks/naa.60060160a89128005a6304b3d121e111

Vendor: DGC

Model: VRAID

Revision: 0430

SCSI Level: 4

Is Pseudo: false

Status: on

Is RDM Capable: true

Is Local: false

Is Removable: false

Is SSD: false

Is Offline: false

Is Perennially Reserved: false

Thin Provisioning Status: yes

Attached Filters: VAAI_FILTER

VAAI Status: supported

Other UIDs: vml.020000000060060160a89128005a6304b3d121e111565241494420

———————————————————————————————-

Now we will add a rule to enable SSD on those 2 disks. Make sure to specify your own NAA number when typing the commands.

———————————————————————————————-

~ # esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device naa.600508e0000000002edc6d0e4e3bae0e --option=enable_ssd

~ # esxcli storage nmp satp rule add --satp VMW_SATP_ALUA_CX --device naa.60060160a89128005a6304b3d121e111 --option=enable_ssd

———————————————————————————————-

Next, we will check to see that the commands took effect for the 2 disks.

———————————————————————————————-

~ # esxcli storage nmp satp rule list | grep enable_ssd

VMW_SATP_ALUA_CX     naa.60060160a89128005a6304b3d121e111                                                enable_ssd                  user

VMW_SATP_LOCAL       naa.600508e0000000002edc6d0e4e3bae0e                                                enable_ssd                  user

———————————————————————————————-

Then, we will run storage reclaim commands on those 2 disks. Make sure to specify your own NAA number when typing the commands.

———————————————————————————————-

~ # esxcli storage core claiming reclaim -d naa.60060160a89128005a6304b3d121e111

~ # esxcli storage core claiming reclaim -d naa.600508e0000000002edc6d0e4e3bae0e

Unable to unclaim path vmhba0:C1:T0:L0 on device naa.600508e0000000002edc6d0e4e3bae0e. Some paths may be left in an unclaimed state. You will need to claim them manually using the appropriate commands or wait for periodic path claiming to reclaim them automatically.

———————————————————————————————-

If you get the error message above, that’s OK. It takes time for the reclaim command to work.

You can check progress in the CLI by running the command below and looking at the “Is SSD:” line; while it still reads “false”, the change has not taken effect yet.

———————————————————————————————-

~ # esxcli storage core device list --device=naa.600508e0000000002edc6d0e4e3bae0e

naa.600508e0000000002edc6d0e4e3bae0e

Display Name: Dell Serial Attached SCSI Disk (naa.600508e0000000002edc6d0e4e3bae0e)

Has Settable Display Name: true

Size: 94848

Device Type: Direct-Access

Multipath Plugin: NMP

Devfs Path: /vmfs/devices/disks/naa.600508e0000000002edc6d0e4e3bae0e

Vendor: Dell

Model: Virtual Disk

Revision: 1028

SCSI Level: 6

Is Pseudo: false

Status: degraded

Is RDM Capable: true

Is Local: false

Is Removable: false

Is SSD: false

Is Offline: false

Is Perennially Reserved: false

Thin Provisioning Status: unknown

Attached Filters:

VAAI Status: unknown

Other UIDs: vml.0200000000600508e0000000002edc6d0e4e3bae0e566972747561

———————————————————————————————-

Check in the vSphere Client GUI. Rescan storage.

If it still does NOT say SSD, reboot the ESXi host. 

Then look in the GUI and rerun the command below.

———————————————————————————————-

~ # esxcli storage core device list --device=naa.60060160a89128005a6304b3d121e111

naa.60060160a89128005a6304b3d121e111

Display Name: DGC Fibre Channel Disk (naa.60060160a89128005a6304b3d121e111)

Has Settable Display Name: true

Size: 435200

Device Type: Direct-Access

Multipath Plugin: NMP

Devfs Path: /vmfs/devices/disks/naa.60060160a89128005a6304b3d121e111

Vendor: DGC

Model: VRAID

Revision: 0430

SCSI Level: 4

Is Pseudo: false

Status: on

Is RDM Capable: true

Is Local: false

Is Removable: false

Is SSD: true

Is Offline: false

Is Perennially Reserved: false

Thin Provisioning Status: yes

Attached Filters: VAAI_FILTER

VAAI Status: supported

Other UIDs: vml.020000000060060160a89128005a6304b3d121e111565241494420

———————————————————————————————-

If it still does NOT say SSD, you need to wait. Eventually, the change takes effect, and the drive displays as SSD in both the CLI and the GUI.
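
As an aside, you can run the same spot check from VMware PowerCLI instead of an SSH session. This is a minimal sketch, assuming PowerCLI is installed and a Connect-VIServer session exists; the host name is a placeholder, and the Ssd flag on the disk object mirrors the “Is SSD” field above:

# Placeholder host name; requires an existing Connect-VIServer session.
Get-VMHost 'esx01.example.com' | Get-ScsiLun -LunType disk |
    Select-Object CanonicalName, @{Name='IsSsd'; Expression={$_.ExtensionData.Ssd}}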

More Information:

See the article below:

Swap to host cache aka swap to SSD?

Enable Microsoft Exchange 2010 & 2013 DAC mode

Description:

Datacenter Activation Coordination (DAC) mode is a property setting for a database availability group (DAG).

DAC mode is disabled by default and should be enabled for all DAGs with 2 or more members that use continuous replication.

That means the majority of Exchange DAG deployments need the DAC mode.

DAC mode is most useful in a multi-datacenter configuration to prevent split-brain syndrome, a condition that occurs when all networks fail and DAG members can’t receive heartbeat signals from each other.

However, I suggest you always enable the DAC.

If you enable the DAC and then need to recover, the recovery takes fewer commands on the command line. Only the following cmdlets will be necessary (a sketch of their use follows the list):

Stop-DatabaseAvailabilityGroup

Restore-DatabaseAvailabilityGroup

Start-DatabaseAvailabilityGroup
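
For context, here is a hedged sketch of how those cmdlets fit together in a two-site switchover. The DAG and site names are placeholders; follow Microsoft’s datacenter switchover documentation for your actual topology.

# Hypothetical switchover: primary site "SiteA" is down; recover in "SiteB".
Stop-DatabaseAvailabilityGroup -Identity DAG2 -ActiveDirectorySite SiteA -ConfigurationOnly
Restore-DatabaseAvailabilityGroup -Identity DAG2 -ActiveDirectorySite SiteB
# Later, once SiteA is healthy again, reincorporate its members:
Start-DatabaseAvailabilityGroup -Identity DAG2 -ActiveDirectorySite SiteA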

Also, if you did NOT enable the DAC, you cannot do so after a failure. DAC mode must be enabled ahead of time.

Instructions:

Here is how to enable the DAC mode:

1. Open the Exchange Management Shell

2. Type the following, where DAG2 is your DAG name:

Set-DatabaseAvailabilityGroup -Identity DAG2 -DatacenterActivationMode DagOnly
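
To confirm the change took effect, read the property back; it should show DagOnly:

Get-DatabaseAvailabilityGroup -Identity DAG2 | Format-List Name,DatacenterActivationMode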

More Information:

For more information, read this article:

http://technet.microsoft.com/en-us/library/dd979790.aspx

Exchange 2010-2013 Database Availability Group (DAG) cluster timeout settings for VMware

Symptom:

An Exchange 2010-2013 Database Availability Group (DAG) active database moves between DAG nodes for no apparent reason when the DAG nodes are VMware virtual machines. This may be due to the DAG node being vMotioned by the vSphere DRS cluster.

Solution:

The settings below allow you to vMotion without active DAG databases flipping between nodes for no reason.

Although the tip below is mainly useful for multi-site DAG clusters, I have seen this flipping happen even within the same site. So, the recommendation is to run these commands on ANY DAG cluster that is running on VMware.

Instructions:

Substitute your DAG name for the example names used below, yourDAGname or rpsdag01.

On any Mailbox Role DAG cluster node, open Windows PowerShell with modules loaded.

Type the following command to check current settings:

cluster /cluster:yourDAGname /prop

Note the following Values:

SameSubnetDelay

SameSubnetThreshold

CrossSubnetDelay

CrossSubnetThreshold

Type the following commands to change the timeout settings. The delay values are the intervals between heartbeats in milliseconds; the threshold values are the number of missed heartbeats tolerated before failover.

cluster /cluster:yourDAGname /prop SameSubnetDelay=2000

cluster /cluster:yourDAGname /prop SameSubnetThreshold=10

cluster /cluster:yourDAGname /prop CrossSubnetDelay=4000

cluster /cluster:yourDAGname /prop CrossSubnetThreshold=10

Type the following command to check that the settings took effect:

cluster /cluster:yourDAGname /prop

You ONLY need to run this on ONE DAG node. The settings will be replicated to ALL the other DAG nodes.
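
On Windows Server 2012 and later, cluster.exe is deprecated, but the same four properties can be set from the FailoverClusters PowerShell module. A minimal equivalent sketch, run once on any DAG node:

# Same settings via PowerShell; values match the cluster.exe commands above.
Import-Module FailoverClusters
$c = Get-Cluster yourDAGname
$c.SameSubnetDelay      = 2000   # heartbeat interval in milliseconds
$c.SameSubnetThreshold  = 10     # missed heartbeats tolerated before failover
$c.CrossSubnetDelay     = 4000
$c.CrossSubnetThreshold = 10
Get-Cluster yourDAGname | Format-List *SubnetDelay, *SubnetThreshold   # verify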

More Information:

See the article below:

http://technet.microsoft.com/en-us/library/dd197562(v=ws.10).aspx

How to disable cluster shared volumes (CSV) on Windows Server 2008 R2.

Description:

Cluster Shared Volumes (CSV) are shared drives holding an NTFS volume that can be written to by all cluster nodes in a Windows Server Failover Cluster.

They are especially important for Microsoft Hyper-V virtualization, because all host servers can see the same volume and can store multiple Virtual Machines on that volume. Using CSVs allows Windows to perform Live Migration with fewer timeouts.

Unfortunately, many third party applications do not support CSVs. In addition, CSVs can be much harder to troubleshoot.

Instructions:

If you need to disable CSVs, use the following command in PowerShell:

Get-Cluster | %{$_.EnableSharedVolumes = "Disabled"}
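
To confirm the change, read the property back:

(Get-Cluster).EnableSharedVolumes   # should now return "Disabled"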

References:

http://windowsitpro.com/windows/q-how-can-i-disable-cluster-shared-volumes-windows-server-2008-r2