
1. Re: How? Compare cell value to cell value in all other rows, count matches
Mary Solbrig May 11, 2015 12:09 AM (in response to daniel.ayoub.0)Best way I can think of doing this exact thing would be to use a selfjoin. Instructions below assume that this is an Excel file:
Step 1: Create the connection
1. Connect to data using the legacy Excel connection. See steps 13 of the following blog for instructions:
https://www.interworks.com/blogs/kfontenot/2014/07/24/customsqltableau82
2. Drag Sheet 1 onto the Data Connect pane twice.
3. Click on the joincircles and define a join as ID <> ID (Sheet $1)
4. Alias the fields as ID, Latitude, ID 2, and Latitude 2.
5. Go to sheet
Step 2: Create the Calculations
1. Create the following calculated fields:
Close:
IF ABS([Latitude]  [Latitude 2]) < .5
THEN 1 ELSE 0 END
Count Close:
WINDOW_SUM([Close])
First Filter:
FIRST() = 0
Step 3: Create the view
1. Drag ID to the Rows shelf
2. Drag ID 2 and Count Close to the Columns shelf
3. Rightclick on Count Close and select Compute Using > Table Across
4. Drag First Filter to the Filters shelf
5. Rightclick on First Filter and select Compute Using > Table Across
6. Select True on the popup dialogue and select OK
7. Rightclick on ID 2 and deselect Show Headers
Another option is to enable an R connection and use the following calculation:
SCRIPT_REAL("
x1 < .arg1
out < rep(0,length(x1))
for(i in 1:length(x1)){
out[i] < sum(abs(x1x1[i]) < .5)
}
out
",
AVG([Latitude])
)
The question I really have though is what question are you trying to answer? If you are just trying to find out are close to any individual point, then a parameter could be used to choose which point you were referring to. This would avoid a lot of hassle, but you would only be able to see the results for one point at a time. Another option might be to just round the points to the nearest whole number, and then count how many points lie in each wholenumber bin. This is not quite the same information though.

2. Re: How? Compare cell value to cell value in all other rows, count matches
Jonathan Drummey May 11, 2015 2:27 AM (in response to daniel.ayoub.0)1 of 1 people found this helpfulI'd started on a response to this one, and then came back to it finding that @Mary had written a post that covered the two solutions I hadn't generated options for. Here's my description and a third option:
COUNTIFS() is a function that demonstrates the flexibility of Excel's cellbased model. In a given cell we can write a COUNTIFS() calculation and have it reference any other range of cells, so for each Row ID it's easy to have it reference all the other Row IDs.
Tableau has a different model where each record is aggregated at the level of detail of the view i.e. the dimension(s) in the view to generate marks. In this case, the level of detail would be the Row ID. Tableau does have ways to aggregate across the marks, techniques include table calculations, level of detail expressions, computed Sets, and data blends. However, this particular case is an edge case where Tableau's calculations don't quite have the native ability to do what we want. The way I think of this is it's "compare the value of each to each other" problem, in that for each Row ID we want to compare the value of the Latitude for this Row ID to the value of Latitude for every other Row ID.
In Tableau terms, that means every Row ID ultimately needs to have all the values of Latitude for other Row IDs available to the computation for the individual Row ID for computation. Tableau doesn't have something like a THIS() function or an iterator function that would let us do that in a simple way. There are three approaches I can think of:
1) Generate a cross product for all Row IDs in the data connection, then do the calculation in Tableau. This could be done using the selfjoin that Mary described, Custom SQL, or a view in the data source. A variation on this would just do all the calculation & aggregation in SQL.
2) Use Tableau's R integration. I haven't worked out all the details of the R script, however I'm certain it's possible since R's vector operations are built for this sort of thing. Mary provided a solution for this as well.
3) Pad out the data in Tableau using Tableau's densification behavior. http://drawingwithnumbers.artisart.org/comparingeachagainsteachotherthenosqlcrossproduct/ shows an example of this, and I used the same technique to build out this view:
Here's how it works:
 With Row ID on Rows and Row ID (copy) on columns and a table calculation with the right Compute using, Tableau pads out the combinations of Row & Row ID using what is called 'domain completion' to generate N^2 cells.
 One table calculation can the pad out the values of Latitude across each Row by having a Compute Using on the Row ID (copy). Here's the formula: PREVIOUS_VALUE(WINDOW_MAX(MAX([Latitude]))). The key part is the WINDOW_MAX(), the PREVIOUS_VALUE is an optimization that definitely helped improve performance through v8, I haven't heard about any changes for v9 so I'm guessing it's still useful.
 Another table calculation pads out the values of Latitude down each column of Row ID (copy). It has a Compute Using on Row ID and uses the same formula as the prior calc.
 A third nested table calculation with a compute using on Row ID (copy) does the comparison of the Latitude (across) to the Latitude (down) and returns 1 if they are within 0.5, that is summed up across all the Row ID (copy) values and finally only a single nonNull result is returned for Row ID (copy). Here's the formula:
IF FIRST()==0 THEN
WINDOW_SUM(IF ABS([Latitude (down for others)]  [Latitude (across this row)]) <= 0.5 THEN
1
END)
END
Here's the workout worksheet where you can see this:
To build the final view, I duplicated the workout worksheet, dragged off the unneeded calculation pills, Ctrl/Cmd+Dragged a copy of the Count pill to the Filters Shelf and set it to filter for nonNull values, turned off the header for the Row ID (copy) pill, added a Count Header field with a single value 'Count' and used that to generate header text.
A v9 workbook is attached, though this technique would work from at least v7 onwards (if you need it in an earlier version, let me know and I can duplicate it pretty quickly).
Finally, a note on volumes & performance:
 With enough memory the domain completion solution I proposed can work, though it will get slower and slower to refresh, I'm guessing it would likely start running out of memory somewhere between 1K and 10K Row IDs because that would translate into 1M to 100M marks. Allan Walker has demonstrated a view with 600M marks, and that took something like 45 minutes to render on a 64GB machine.
 The SQL cross product can definitely handle more Row IDs, though if Excelbased that may run into some limits due to MS JET limitations. Other DBs could handle a larger data set.
 The R solution should scale the best of all the options, since instead of N^2 records (cross product) or N^2 addresses (domain completion) it's using N*2 vectors.
Jonathan

3. Re: How? Compare cell value to cell value in all other rows, count matches
Jonathan Drummey May 11, 2015 2:31 AM (in response to Mary Solbrig)Nice work, Mary!

4. Re: How? Compare cell value to cell value in all other rows, count matches
daniel.ayoub.0 May 15, 2015 10:39 AM (in response to daniel.ayoub.0)Jonathan, Mary,
I really appreciate the answers; they both helped me answer many of the questions I had while looking for a solution.
I tried the SQL selfjoin, however I have ~100,000 rows and the database quickly grew too large. Doing this in R looks like it could be the right path  going to give this a try tonight.
Thanks
Daniel

5. Re: How? Compare cell value to cell value in all other rows, count matches
Jonathan Drummey May 17, 2015 12:59 PM (in response to daniel.ayoub.0)Let us know how it worked out!

6. Re: How? Compare cell value to cell value in all other rows, count matches
daniel.ayoub.0 May 18, 2015 1:57 PM (in response to daniel.ayoub.0)Thanks Jonathan! I ended up using the R script Mary wrote to do the calculation  this worked when the tableau sheet was viewed at the level of detail of an individual row in the database (i.e. unique row id  R calculation). Your explanation was extremely thorough however, and I think both of these would be incredibly helpful to anyone looking to do these types of calculations.
New question: When I aggregate the data up 1 level, tableau no longer sends the rowlevel data to R. Instead it sends the aggregated data to tableau, which yields the wrong answer in this case (and maybe this is a topic for a different thread),
To summarize:
This works with an Rscript calculated variable when viewing at the rowlevel level of detail:
Row ID  Widget  Latitude  Calculation (Rscript)
1 A 32 3
2 A 32.1 3
3 A 32.12 3
4 B 35.1 2
5 B 35.3 2
When I aggregate at the Widget level, I would like to take an average of the calculation column calculated at the rowlevel, however in this case it just redoes the same calculation with preaggregated data. That is to say, Tableau is passing R the AVG(Latitude) data preaggregated at the widget level. I would like to pass it the rowlevel Latitude, do the calculation, and then aggregate the calculation values to produce the following:
Widget  avg(Latitude)  avg(Calculation)
A 32.07 3
B 32.1 2

7. Re: How? Compare cell value to cell value in all other rows, count matches
Mary Solbrig May 18, 2015 2:42 PM (in response to daniel.ayoub.0)To clarify: Given the following data:
Row ID Widget Latitude R Script 1 A 32.0 3 2 A 32.1 4 3 A 32.5 5 4 A 32.6 4 5 A 33.0 3 6 B 35.1 2 7 B 35.0 2 8 B 36 1 Would you like to first compute the near values and then average, or average and then compute the near values?
As in the following would be computing the number of near values and then averaging:
A = (3+4+5+4+3)/5 = 3.8,
B = (2+2+1)/3 = 1.667?
The following would be averaging, and then computing the number of values within .5 of the average:
avg(A) = 32.5, Count near Average(A) = 5
avg(B) = 35.366, Count near Average(B) = 2
Oddly enough the latter is easier to do (since it doesn't require comparing every value to every other value):
IF ABS(Latitude  {FIXED Widget: AVG(Latitude)}) <=.5 THEN 1 ELSE 0 END
The former requires aggregating the results of an R call, or returning values at a different level of aggregation than you sent it.
The most straight forward, and probably best solution for this situation:
Use a trick similar to the one in the article below: Keep Row ID in the view, either in the rows or details shelf, and then use a FIRST() filter to hide the duplicate marks:
Aggregating By a Dimension Not in the View  Tableau Software