8 Replies Latest reply on Mar 24, 2011 11:28 AM by . tfb676R

    How to do a scatter plot with size encoding

    . tfb676R


      I want to create a scatter plot that encodes the number of points that fall on the same coordinates in the width of the point. Is this possible? If yes, how can I do this?

      Thanks for your help,


        • 1. Re: How to do a scatter plot with size encoding
          Joe Mako

          > Is this possible?



          > how can I do this?

          It depends on your situation. If you can provide sample data or a packaged workbook that represents your situation, details that fit your situation can be provided.


          Generally, If you place the field "Number of Records" on the size shelf, and it does not size as you want (likely because you have a dimension on the Level of Detail shelf - in order to get all the dots that you want), then a table calculation is a route to consider. In order to properly craft the table calculation, and set the partitioning/addressing, there must be an awareness of the dimensions in use on the worksheet.

          • 2. Re: How to do a scatter plot with size encoding
            . tfb676R

            Thanks for your quick answer ...

            Please find attached the workbook + some sample data.

            There is one graphic with my current scatterplot in the workbook. There are supposed be multiple points at a single position and I want to encode the number of points in size. Although there are no dims in the LoD shelf dropping the "Number of Records" in there does not seem to work. 

            • 3. Re: How to do a scatter plot with size encoding
              Joe Mako

              Seems like what you are looking for is the following, from your worksheet as it is:


              1. from the main menu, select Analysis->Aggregate Measures

              2. this wll change the green pill on your Rows shelf from "Refinement stages" to "SUM(Refinement stages)"

              3. bring up the context menu for the pill "SUM(Refinement stages)" (either with a right-click on the pill, or a left-click on the little triangle), and select "Dimension"

              4. Change your Marks type from "Automatic" to "Shape"

              5. Place the Measure "Number of Records" on the Size shelf.


              You will now have your marks sized by the number of records at that point.


              I have performed these steps in the attached workbook.

              • 4. Re: How to do a scatter plot with size encoding
                . tfb676R

                Hi Joe,

                works great ... Thank you so much!!!

                Pretty tough to get find out oneself, but fortunately you guys are always in place!!!


                One last question, not related to the scatter plot, but in general: Is there a way to access values from a certain row single by a calculated field?


                Background are user tests, where there is usually a reference user (one row of reference values). What is often needed is to compare these values against the values of the remaining users.


                Pseudo code would be like this:


                IF (ReferenceUser.[Answer#questionP_4] == [Answer#questionP_4]) THEN 'correct'

                ELSE 'wrong' END


                Is this possible?

                Thanks again for your quick help,


                • 5. Re: How to do a scatter plot with size encoding
                  Joe Mako

                  Yes, that is possible, and again, there are a few ways to accomplish it. Here the main questions would be what are your constraints.


                  Does the sample data that you provided earlier contain a reference user? If not can you provide some that does, and explain more of the business logic, like how is a reference user selected, is it constant for the data source, or is it dependent on some factor? and any other relevant information would be helpful as well.


                  There are three main routes that I can see to accomplish the task, depending on your exact situation:

                  - Table calculation, something like:

                  instead of your "ReferenceUser.[Answer#questionP_4]" (this table calc would work if the partition was sorted so the reference use was the first row, would take an additional calculation to then hide the reference user row, if so desired)

                  - Data Blend or Join, have your reference use data in another table, and perform a Data Blend or a Join you replicate the values for your reference user to each row in your data source

                  - Use custom SQL, like a join, but other capabilities are available.

                  • 6. Re: How to do a scatter plot with size encoding
                    . tfb676R

                    Thank you so much! We are getting closer ... have no example data here now, but one additional info might solve the problem -> my reference user has a certain and unique name (stated in a specific dimension).


                    LOOKUP seems to be for what I was searching for to read a value from a specified row. Now, I need the function to get this row :) ...


                    Many thanks for your help!!!!

                    • 7. Re: How to do a scatter plot with size encoding
                      Richard Leeke

                      One other hint that might help.


                      LOOKUP() lets you get a value from a particular row, identified by it's position in the partition (i.e. you can get the nth row).  You can't get a row directly by value.  So if your reference value doesn't happen to be such that sorting by that dimension results in a predictable position (like first, as Joe suggests), you may need to do a bit more work.  There are a couple of approaches I can think of.


                      1) You can create a calculated field which you can sort to get the value in a known place in the partition.  For example you could return 0 for your reference row and 1 for all other rows.


                      2) You can create a calculated field which returns you the offset you require for the lookup function.


                      Option 1) is simpler - but I think there may be times when you need to sort in a different order.


                      Option 2) is a bit trickier, and likely to be slower, I think, but can be done quite generally.


                      I've put a quick example together with Coffee Chain data showing how to do option 2).  That lets you choose which Product you want to regard as the reference - and makes the name and SUM([Sales] for the reference product in every row of the partition.


                      Have a look at the definitions of the 5 calculated fields with names starting "Reference".  One of the key things to look at is the partitioning - which needs to be set correctly on all calculations referenced by any of the calculated fields.


                      I think the formula for [Reference Offset] is probably worth writing up for the http://community.tableau.com/community/calculation-library - though I might wait and see if Joe has a simpler way, first (or anyone else for that matter, but Joe is the prime suspect).


                      Note that I've split the calculation up into a couple of steps for clarity - but it's probably worth combining them.


                      I've done two versions of [Reference Index] (which finds the index in the partition of the reference row).  One is simpler, but will become too slow to be usable on a large partition due to a bug (or what I regard as a bug, anyway - it seems to be taking an incredibly long time to find it's way onto the priority list, though).  The other version is optimised to work around the bug - but that makes it much harder to understand.

                      • 8. Re: How to do a scatter plot with size encoding
                        . tfb676R

                        Thanks Richard!

                        Haven't tried your approaches yet, but here is some of my sample data.


                        It refers to the user test example described above and contains rows where [workerid] == 'root'.

                        These rows represent the values that are supposed to be used as reference values for the test (see pseudo-code above).


                        Thanks for your help, guys! Appreciate it!!!