I am working on a problem that has me vexed, and maybe you can help. My data comes from SAN switches, which carry traffic between servers in a data center. In a switch environment, individual ports are configured into zones to isolate traffic. An example of a zone might be two ports, one being a server and the other a storage array. Best practice dictates that each zone has a single host in it (called a single initiator). It is possible for a single zone to have multiple hosts with the same name (in the case of multiple host bus adapters in the server). This is not ideal, but it is not invalid. What I want to find is zones with more than one host in them, where the hosts are not the same. Below is an example of the scenario I am looking for. This actual scenario comes from the attached .tsv file.
FC Switch   Vendor         Zone Name                         Zoneset        FC Host Name      Host Type
LSH9513-32  Cisco Systems  v300_wdca1cvd_02_p0_vmax2721_2e0  LSH_A_iSeries  wdca1cvd          SAN Host
LSH9513-32  Cisco Systems  v300_wdca1cvd_02_p0_vmax2721_2e0  LSH_A_iSeries  VMAX-2721         SAN Array
LSH9513-32  Cisco Systems  v300_wdca1cvd_02_p0_vmax2721_2e0  LSH_A_iSeries  c0507603f0fd0122  SAN Host
As you can see, on the "LSH9513-32" switch, in the "v300_wdca1cvd_02_p0_vmax2721_2e0" zone, there are two records with a Host Type of "SAN Host", and they have different FC Host Names. This is an example of a single zone with multiple initiators. My problem is that my data sets are large, and every field in my data is a string. How do I find the condition where, within a single zone name, there is more than one "SAN Host" record and the "FC Host Name" values differ?
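To make the condition concrete, here is a minimal Python sketch of the check as I understand it. It is only an illustration under some assumptions: the function name is made up, the column names are taken from the sample header above, each row is a dict (for example from csv.DictReader with a tab delimiter), host names are compared as exact strings, and a zone is identified by switch plus zone name.

```python
from collections import defaultdict


def zones_with_different_hosts(rows):
    """Return {(switch, zone): sorted distinct host names} for every zone
    whose "SAN Host" records carry more than one distinct FC Host Name.

    Hypothetical helper: rows are dicts keyed by the sample's column names.
    """
    hosts_by_zone = defaultdict(set)
    for row in rows:
        # Only initiators matter here; arrays and other types are skipped.
        if row["Host Type"] == "SAN Host":
            key = (row["FC Switch"], row["Zone Name"])
            hosts_by_zone[key].add(row["FC Host Name"])
    # A set drops repeated names, so more than one entry means the
    # hosts in that zone really are different.
    return {
        zone: sorted(names)
        for zone, names in hosts_by_zone.items()
        if len(names) > 1
    }


# The three sample records from the table above.
ZONE = "v300_wdca1cvd_02_p0_vmax2721_2e0"
sample_rows = [
    {"FC Switch": "LSH9513-32", "Zone Name": ZONE,
     "FC Host Name": "wdca1cvd", "Host Type": "SAN Host"},
    {"FC Switch": "LSH9513-32", "Zone Name": ZONE,
     "FC Host Name": "VMAX-2721", "Host Type": "SAN Array"},
    {"FC Switch": "LSH9513-32", "Zone Name": ZONE,
     "FC Host Name": "c0507603f0fd0122", "Host Type": "SAN Host"},
]

result = zones_with_different_hosts(sample_rows)
print(result)
```

Because the rows are consumed one at a time and only a set of names per zone is kept, this kind of single pass should cope with large files, although I have not measured it on my real data.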
Ultimately I want to build a visualization that shows the user the total number of zones, the number of zones with a single initiator, the number of zones with multiple initiators with the same host name, and finally the number of zones with multiple initiators that are different (and then display a list of the zones with multiples).
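The four numbers could be sketched in Python roughly as follows. This is an assumption-laden illustration, not a finished solution: the function names and category labels are invented, the input is assumed to be a mapping from zone name to the list of FC Host Names of its "SAN Host" records (duplicates kept), and every zone name except the first is a made-up placeholder. I also added a "no_initiator" bucket for zones that contain no host at all, which is my own guess at how such zones should be handled.

```python
def classify_zone(host_names):
    """Classify one zone from the FC Host Names of its SAN Host records."""
    if not host_names:
        return "no_initiator"            # assumption: zone with no hosts
    if len(host_names) == 1:
        return "single_initiator"
    if len(set(host_names)) == 1:
        return "multiple_same_host"      # e.g. several HBAs in one server
    return "multiple_different_hosts"


def summarize(zone_hosts):
    """Return (counts per category plus total, sorted list of flagged zones)."""
    summary = {
        "total_zones": len(zone_hosts),
        "no_initiator": 0,
        "single_initiator": 0,
        "multiple_same_host": 0,
        "multiple_different_hosts": 0,
    }
    flagged = []
    for zone, names in zone_hosts.items():
        category = classify_zone(names)
        summary[category] += 1
        if category == "multiple_different_hosts":
            flagged.append(zone)
    return summary, sorted(flagged)


zone_hosts = {
    "v300_wdca1cvd_02_p0_vmax2721_2e0": ["wdca1cvd", "c0507603f0fd0122"],
    "zone_a": ["host1"],            # hypothetical zone and host names
    "zone_b": ["host2", "host2"],   # hypothetical: same host, two HBAs
    "zone_c": [],                   # hypothetical: arrays only
}

summary, flagged = summarize(zone_hosts)
print(summary)
print(flagged)
```

If the same zone name can appear on more than one switch, the dictionary key would need to be the switch and zone name together rather than the zone name alone.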
If anyone can help me I would be very appreciative.