1 2 3 4 Previous Next 46 Replies Latest reply on Apr 26, 2017 9:47 AM by Jamieson Christian

    Row Level Security - Cartesian Rows

    Vasu choudhury

      Hello, We have 2 tables, Table A(Aggregate Table with 300k rows) & Table B(User Security Table with 1000 rows). We need to implement 3 levels of row level security. i.e. Region, State & Location.

      When I join Table B with A, I am getting Cartesian product and the data is blowing up from 300k to 15m rows and there could be chances of growing the data in future.(Using 10.1 version)

      1. Is there any way where I can pass the user ID to table B first and then based on the results, I join with Table A to get the require row level security rows?
      2. Is it better to go with Live or Extract in such scenarios?
        • 1. Re: Row Level Security - Cartesian Rows
          Jeff Strauss

          For #1, you may be able to set a datasource filter to limit to the current logged in user.

           

           

           

          I'm not sure for #2 if live or extract is better.

          1 of 1 people found this helpful
          • 2. Re: Row Level Security - Cartesian Rows
            Jamieson Christian

            Vasu,

             

            If you create a calculated field to govern your join, and apply that as a Data Source filter on the data source (or even as a Dimension filter on a view), in the backend, Tableau will incorporate that criteria directly into the SQL WHERE clause, thereby avoiding a full Cartesian join.

             

            Let's say that "Aggregate Table" is table A, and "User Security" is table B. You have JOIN'd them together with no JOIN criteria. Now, let's create a calculated field like this:

             

            [Join Condition]

            [Region (A)] = [Region (B)] OR
            [State (A)] = [State (B)] OR
            [Location (A)] = [Location (B)]
            

             

            When you apply this as a Data Source filter, or as a Dimension filter on your view, Tableau generates backend SQL along the lines of this:

             

            SELECT A.*, B.*
            FROM
                AggregateTable A,
                UserSecurity B
            WHERE
                A.Region = B.Region
                OR A.State = B.State
                OR A.Location = B.Location;
            

             

            For most DBMS's, this is executed in manner identical to:

             

            SELECT A.*, B.*
            FROM
                AggregateTable A,
            INNER JOIN
                UserSecurity B
            ON
                A.Region = B.Region
                OR A.State = B.State
                OR A.Location = B.Location;
            

             

            In either case, the underlying SQL is not generating a full Cartesian join, and Tableau should not have to deal with 15M rows coming over the wire.

             

            You can test this by using the Performance Recorder in Tableau. Before opening the workbook, select to Help > Settings & Performance > Start Performance Recording. Then open the workbook, making sure that the view that uses the data is the first thing to come up. Then, select Help > Settings & Performance > Stop Performance Recording.


            At this point, Tableau will launch a second workbook that contains the details of the performance recording. You can select any of the "Query" entries to see the details of the query at the bottom. (You may need to click on the query and copy-and-paste it to an external text editor to see the whole thing.) You can review the query to feel confident that the underlying SQL is not generating a full Cartesian join.

             

            Caveat: Cross-datasource joins

             

            All of the above goes out the window if you're using Tableau 10's cross-datasource joins. Cross-datasource joins are handled by Tableau, rather than by the DBMS, so Tableau does not have the opportunity to build the JOIN criteria directly into the SQL query in order to minimize the data coming over the wire. Arguably, even when doing a "federated join" (bringing in both data source and joining them together), Tableau will employ the [Join Condition] calculation early enough to avoid a full Cartesian join anywhere in the pipeline. Still, Tableau has to pull in all 300K records from the Aggregate Table just to compare them to the second data source that contains the User Security table, and that in itself can impose a performance hit.

             

            Bottom line: If you have to use a cross-datasource join to bring in your security model, seriously consider getting your security model mirrored onto the same database with your core data so that you don't need a cross-datasource join.

             

            Caveat: Custom SQL

             

            If your data source is based on Custom SQL, Tableau cannot perform optimizations to the query construction as effectively. It essentially wraps the entire Custom SQL as a sub-query, and incorporates its own filters and calculations in the outer query. This could result in a SQL query that causes the DBMS to temporarily deal with a full Cartesian join before filters are applied. Even then, Tableau will not receive 15M rows over the wire, but the SQL execution on the DBMS could be much more costly.

             

            Normally, Tableau joins are not used when Custom SQL is involved, so I'm guessing that this wont' be an issue for you. I just wanted to point out that Custom SQL can limit Tableau's ability to optimize query construction.

             

            To Extract Or Not To Extract…

             

            Regarding your question of whether to use extracts, that will depend on how long it's taking Tableau to execute your query. We've seen that a calculated field governing a join in Tableau should produce reasonably optimal performance in the back-end, but there are many other factors that can affect how quickly your query runs. In addition, the nature of your Tableau workbook will dictate the level of performance you require — an interactive dashboard requires much faster response times than a non-interactive report. Once you have established how long your query is taking to execute, you can create a Tableau extract version and see if that makes much difference in the performance. Sometimes it does, but it depends on your data.

             

            In addition, look for additional ways to pre-aggregate your data. You called Table A an "Aggregate Table", but 300K rows sounds like an awful lot to me — if your workbook is having to aggregate even further, consider building that additional aggregation into your table (or a live or materialized view in your DBMS) to further reduce the number of rows that must be joined against your security model.

             

             

            I hope this information helps.

            • 3. Re: Row Level Security - Cartesian Rows
              Vasu choudhury

              Thanks Jeff Strauss.

               

              I would like to double check that, first it is going to hit the Table B(inner join with Table A) and fetch some subset data and then filter it from Table A and this will happen for each user dynamically if I go with Live connection.

               

              I think If i go with Extract, first we need to pull all 15m rows in my extract and then play with it. Therefore in such scenarios it is good to go with Live. Pls correct me if my understanding is wrong.

              • 4. Re: Row Level Security - Cartesian Rows
                Jonathan Drummey

                Hi Jamieson,

                 

                A question about your caveat on cross database joins. I haven't spent quality time with the SQL on cross database joins yet, I've been thinking at worst case it would be like a DB2 Tableau data blend where Tableau would first query for the join keys with record-level filters applied so it would only be all combinations of Region, State, and Location from the transactional source and a single row from the permissions table. Have you looked at the SQL to see whether something like this is true?

                 

                Jonathan

                • 5. Re: Row Level Security - Cartesian Rows
                  Jamieson Christian

                  Jonathan,

                   

                  I did an exercise with our order data — over 400,000 rows of 2017 YTD data accessed via the Oracle connector, cross-datasource joined to an Excel table consisting of 6 rows of security info (4 of them corresponding to my user ID).

                   

                  The Excel table:

                   

                  Below is the data source employing the cross-datasource join. In addition to Region, Territory and State, I brought in Line Number and Order Number, just to see what would happen. I joined on [Org Id] = [Org], which would yield a full Cartesian result because the Org is the same for all rows in both tables.

                   

                  And my [Join Condition]:

                   

                   

                  Finally, I put SUM(Booked Amount) on the Labels shelf to trigger a very simple aggregation query. Let's see what happened…

                   

                   

                  The Performance Recorder Results

                   

                  There were 3 queries that were executed in the process of generating the view.

                   

                   

                  Here are the queries:

                   

                  Query 1 (EXCEL) : fetch contents of security table into temp table, filtered on current user

                   

                  This was based on the data source filter I set up, [User] = USERNAME()

                   

                  (restrict
                    (select
                      (table [TableauTemp].[Sheet1$])
                      (= [User] "Jamieson.Christian")
                      )
                    ([Org Id] [Region] [State] [Territory] [User])
                    )
                  

                   

                   

                  Query 2 (Oracle) : fetch ~ 440,000 rows into temp table

                   

                  The only filters that were applied at this stage were the data source filters that related solely to fields in the order table (Org ID and Booked Date). On the plus side, the only columns that were fetched were the ones required for the cross-datasource join and the view itself. (Order Number and Line ID were not fetched.)

                   

                  SELECT "ORDER_LINES"."BOOKED_AMOUNT" AS "BOOKED_AMOUNT",
                    "ORDER_LINES"."SHIP_REGION" AS "SHIP_REGION",
                    "ORDER_LINES"."SHIP_TERRITORY" AS "SHIP_TERRITORY",
                    "ORDER_LINES"."SHIP_STATE" AS "SHIP_STATE",
                    "ORDER_LINES"."ORG_ID" AS "__ORG_ID_LEN_RUS_S2__0"
                  FROM "SCHEMA"."ORDER_LINES" "ORDER_LINES"
                  WHERE (("ORDER_LINES"."ORG_ID" = '12345') AND ("ORDER_LINES"."BOOKED_DATE" >= TO_DATE('2017-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')))
                  

                   

                   

                  Query 3 (VizQL) : federated join of the two temp tables, on the calculated join condition field

                   

                  Here we can see the calculated field with the robust join conditions being used to bring the two temp tables together. Line 33 equates to [Join Condition] = TRUE. Lines 36–41 perform the final aggregation SUM(Booked Amount).

                   

                  (restrict
                    (aggregate
                      (select
                        (project
                          (join
                            (table [TEMP].[#Tableau_31_4E686BBB-43CA-48FA-A2C4-F6D1A69F92B9_2_FQ_Temp_1])
                            (restrict
                              (table [TEMP].[#Tableau_31_4E686BBB-43CA-48FA-A2C4-F6D1A69F92B9_1_FQ_Temp_2])
                              ([Org Id] [Region] [State] [Territory] [User])
                              )
                            (
                              ([__BUS_UNIT_LEN_RUS_S2__0] [Org Id])
                              )
                            (
                              ([Region] [Region])
                              ([State] [State])
                              ([Territory] [Territory])
                              ([User] [User])
                              )
                            )
                          (
                            ([Calculation_1715308559693545472]
                              (or
                                (= [Region] [POT_REGION])
                                (or
                                  (= [Territory] [POT_TERRITORY])
                                  (= [State] [SHIP_ADDR_STATE])
                                  )
                                )
                              )
                            )
                          )
                        [Calculation_1715308559693545472])
                      ()
                      (
                        ([sum:BOOKED_AMOUNT:ok]
                          (sum [BOOKED_AMOUNT])
                          )
                        )
                      )
                    ([sum:BOOKED_AMOUNT:ok])
                    )
                  

                   

                   

                  Conclusion

                   

                  Tableau had to fetch all of the rows from the main table before applying the join condition. So, while a Cartesian join did not happen (we did not have to bring 440,000 × 6 = 2.6 million records in over the wire), Tableau was still forced to fetch an inordinately large number of records because the DBMS could not perform the row-level joins or the aggregation.

                   

                  Let me know if this helps depict what's going on under the hood! I'd be happy to post my performance workbook, if it helps — I just have to scrub it first to remove any potentially sensitive information related to our database environment.

                  1 of 1 people found this helpful
                  • 6. Re: Row Level Security - Cartesian Rows
                    Jeff Strauss

                    Jamieson Christian   wow, thanks for the details and thought put into this!!! 

                     

                    OK, so this thread is getting a bit interesting.  And now I'm going to chime in on challenges that my org has encountered in reference to row-level security.  Let me know if I should split this out to its own thread, I'm looking for your thoughts / ideas.

                     

                    Scenario:

                    1. A dashboard connects to datasource_A (fact performance metrics).  It has account_id as a field.

                     

                    2. Account-id is entitled to be viewed by certain user_ids (i.e. row level security).  This relationship is on datasource_B.

                     

                     

                    Our challenge:

                    - We are using a tde extract for datasource_A because it's so much more performant than a relational database (for each of the queries that the worksheets within the dashboard).  The refresh extract duration is in the range of 30 minutes to 2 hours so therefore these extracts are refreshed once a day and is done overnight after our DW is up-to-date.

                     

                    - datasource_A does not have userid, it only has account-id.

                     

                    - Currently we are doing a datablend during rendering between datasource_A and datasource_B and filtering via a calc in datasource_B down to the account-id's that the user_id has access.  It works, but doing a data blend on each worksheet is not the long term solution as it's non-performant.

                     

                     

                    Challenge Options:

                    - We are looking at adding userid into datasource_A via a cross-datasource join (new in Tableau 10), this will allow us to eliminate the data blend altogether as then we can have a filter directly on datasource A.  The problems with this approach is that it will create one row for each id which results kind of a cartesian product as it will inflate the sum(metrics).  We can flatten the list of userid's into 1 field and then attach onto each row of datasource_A, but if there are many userid's then I think there's a limit on field length, which I forget what this limit is exactly.  The other problem is that if the account_id + user_id relationship is updated intra-day, then it's not reflective til the next day via the refresh.

                     

                    - Ditch the concept of using an extract and start using a database like columnar datastore where the data can be joined during rendering.  It's similar to the data blending approach, but allows a pass of userid to the database and only 1 query is issued per worksheet.

                     

                    - Wait for Tableau to build in a sub-query to the datasource filter or a better method for handling row-level security than ismemberof(), built-in static user filters, etc.

                     

                    - Open to other ideas

                    • 7. Re: Row Level Security - Cartesian Rows
                      Jamieson Christian

                      Jeff,

                       

                      Tableau Server and TDE data sources continue to present a challenge because they don't support cross-datasource joins. I don't know why, and it's very limiting to us because we leverage Tableau Server data sources so heavily.

                       

                      Data blending makes for very complicated view construction (since they're not truly row-level). I had to do that for the security model in a Tableau 9 dashboard, and I hope I never have to repeat the experience.

                       

                      If your security model is as simple as, "User X has access to one or more possible values of Column Y", then you can leverage Tableau 10's Cross-datasource Filtering to achieve row-level security without actually doing a join. The solution looks something like this:

                       

                      1. Load up your main data source
                      2. Load up your security data source. Add the data source filter for [Username] = USERNAME().
                      3. Set up a Data Relationship between your main data source and your security data source, based on Column Y (the "join" column).
                      4. Create a temporary sheet based on the security data source. Drag Column Y onto the filters shelf. Configure it to apply to "All Using Related Data Source". Set it to filter on "All".
                      5. Now create a view based on your main data source. The security filter will be added as a secondary filter.
                      6. Even though the security filter is set to allow "All", it will implicitly filter the main data source based on the domain of Column Y in the security data source. That domain is impacted by the data source filter [Username] = USERNAME(), so only the values of Column Y that the user is allowed to see will be allowed in the main data source.

                       

                      Sneaky, no?

                       

                      This technique relies on a single column being used for the filtering. It can be extended to work with multiple columns by creating a calculated "hash" of the columns in both data sources, but that assumes that the columns represent an "AND" relationship. (E.g. "User X has access to rows where Column Y = _____ AND Column Z = _____") It does not work if the columns represent an "OR" relationship. (E.g. "User X has access to rows where Column Y = _____ OR Column Z = _____").

                       

                      Hopefully the above helps in your situation!

                      • 8. Re: Row Level Security - Cartesian Rows
                        Jeff Strauss

                        it's an interesting concept, and I tried it and it doesn't seem to quite work yet.  My id (jstrauss) should only have access to Central and East.

                        • 9. Re: Row Level Security - Cartesian Rows
                          Jamieson Christian

                          Jeff,

                           

                          In your example, the shared secondary filter needs to be [Region], not [Account Manager].

                           

                          EDIT: Just realized my instructions were goofed, so that's on me! I have revised my instructions in my previous post.

                           

                          See attached modification. Voila!

                          • 10. Re: Row Level Security - Cartesian Rows
                            Jamieson Christian

                            This technique relies on a single column being used for the filtering. It can be extended to work with multiple columns by creating a calculated "hash" of the columns in both data sources, but that assumes that the columns represent an "AND" relationship. (E.g. "User X has access to rows where Column Y = _____ AND Column Z = _____") It does not work if the columns represent an "OR" relationship. (E.g. "User X has access to rows where Column Y = _____ OR Column Z = _____").

                             

                            Actually, the more I thought about this, the more I realized the above is not quite true.

                             

                            If you simply create cross-datasource filters for each individual column that should be filtered, it will kind of behave as an OR relationship. I say "kind of" because it's still an AND (all columns must match something from the security table), but it doesn't care whether the match occurs in the same row of the security table. In other words, "User X has access to rows where Column Y is any of (domain of values) and Column Z is any of (domain of values)" where "domain of values" is the full list of values that show up on any row in the security model associated to the given username.

                             

                            So, with the following security model:

                             

                            I can create a separate cross-datasource filter for both [Channel] and [State], and I will be able to see Phone/NY, Phone/TX, Field/NY, and Field/TX.

                             

                            This works for some types of OR-relationship security models, but not others. Specifically, if the security model depends on being able to leave one field completely unspecified, this approach will likely fail. (In the above example, if I left Channel blank in the first row, I would not see "all channels" for New York. I would only see "Field" for New York.)

                             

                            It's all a bit mind-bending. Tread cautiously.

                            • 11. Re: Row Level Security - Cartesian Rows
                              Jeff Strauss

                              good stuff!  I tested your solution on Tableau Server and for some unknown reason it works better than my original solution (attached).    Your's only connects twice and then issues three queries (primary, secondary, blend), where mine connects three times and issues five queries.  I will do more testing later today with your solution as to what is the effect to # of queries when having many worksheets on the dashboard.

                              • 12. Re: Row Level Security - Cartesian Rows
                                Jamieson Christian

                                Jeff,

                                 

                                Awesome, let us know what you find out! The more data we have on what goes on under the hood, the more effectively we can design.

                                 

                                Also, it would be interesting to see if any Tableau developers have input on the way cross-datasource joins are executed. One thing I noticed when conducting my exercise is that Tableau could have aggregated the main table (at a level-of-detail that incorporates all dimensions on the view plus the federated join fields) before building the temp table for the federated join. For my simple example, that would have drastically cut down on the amount of data sent over the wire (from 440,000 rows down to 295). Real-world examples would involve more dimensions — each of which mitigates the gains from aggregating ahead of time — but the worst-case scenario would still be no worse than what's happening now.

                                 

                                Patrick A. Van Der Hyde — know anyone over there that would be intrigued by this conversation?

                                • 13. Re: Row Level Security - Cartesian Rows
                                  Jeff Strauss

                                  My dashboard test (with 9 worksheets) shows some interesting results.

                                  --------------------------------------------------------------------------------------------------------

                                   

                                  Base test with no blend or cross filter executes 9 queries (as expected because each worksheet has unique measures / filters and this is how Tableau works).

                                   

                                  Cross filter test executes 11 queries which is a small price to pay for row level security

                                   

                                  My original solution executes 21 queries, a bit more hefty and as I ramped up to 9 worksheets, it added multiple queries

                                   

                                  When I changed the original to the cross filter solution on one of our prod workbooks, it cut the # of queries about in half (from 33 to 15).

                                   

                                  ------------------------------------------------------------------------------------------------------------------

                                  Conclusion (it's good to have options!!!!)

                                  - Either we will do the cross-datasource join and create a unified extract that has both the performance metrics and client entitlements

                                  OR

                                  - We will deploy the cross-filter solution

                                   

                                   

                                  If you want to see the performance recordings, then send me an email: jstrauss@conversantmedia.com

                                  • 14. Re: Row Level Security - Cartesian Rows
                                    Jeff Strauss

                                    Jamieson Christian

                                     

                                    I have one more thought to share with you in reference to:  Tableau Server and TDE data sources continue to present a challenge because they don't support cross-datasource joins. I don't know why, and it's very limiting to us because we leverage Tableau Server data sources so heavily.

                                     

                                     

                                    Starting with 10.1, Tableau appears to support cross-datasource joins across embedded local TDE files.  I didn't see it in new features, however it was pointed out to me via a conversation with one of the Tableau product sme's.  Yes, it's still a limitation that you can't do it via published datasources, but at least they're making progresss.  Maybe it's in the works...

                                    1 of 1 people found this helpful
                                    1 2 3 4 Previous Next