3 Replies Latest reply on Dec 23, 2013 8:21 AM by Russell Christopher

    Primary & Secondary data source - but what about third or fourth level?

    Keith Bradbury

      I've been trying my best to move to Tableau from the environment I know (Excel/PowerPivot) and I'm having a problem with how to structure the data. If I explain what data I have, and how I'm structuring it, perhaps someone would be good enough to explain how it can be achieved in Tableau (because I'm sure it's possible).


      I am trying to connect to a MySQL database, which I can do just fine. In there I have approx 25 tables. I can perform a data extract on the DB and create data sources within tableau. Some tables I am joining before they arrive in Tableau, and some other I am not joining. I then have a list of data sources in Tableau. In addition, I have some excel files which contain further information and I have also succeeded in getting them into Tableau. From here is where the problem begins.


      I then start creating links between the various data sources. From what I've understood there is one primary data source and all secondary data sources need to be linked to that one primary data source. But I am in a situation where not all the data sources have a link to the primary data source. So my question is how do I link them? In PowerPivot you can have any "depth" in your data sources, so in the example below I have Product Types feeding from Verticals which is linked to Acc.


      The analogous situation in PowerPivot looks like this:


      schema example.jpg


      It's sort of a snowflake design schema. There are data sources which are further out on the snowflake and I can’t figure out how to use them in Tableau.


      If anyone can advise how I can either achieve this set up, or restructure my data to achieve this same setup in another way then I would appreciate some help! I'm pretty sure there is a way to do this in Tableau otherwise how are people linking diverse and often complicated data sources?



        • 1. Re: Primary & Secondary data source - but what about third or fourth level?
          Russell Christopher

          Hey Keith -


          A related field is necessary to "link" your (one or more) secondary data sources to your primary data source via blending. If you don't have those related fields, "do not pass go".


          Question: If this schema is already expressed as a snowflake in MySQL, it seems like you're making your life more difficult by splitting things up across multiple data data sources -- especially if you're going to extract this stuff anyway.


          Why not just just join everything up, extract it all in one data source and use that? I suspect there's a reason you're not doing this, and it would be helpful to understand why.

          • 2. Re: Primary & Secondary data source - but what about third or fourth level?
            Keith Bradbury

            Hi Russell,


            Thanks for your response. I understand that a link requires a related field and I have all the related fields laid out in the scheme shown above. PowerPivot allows you to link ANY two tables via a related field. Each of the arrows is a link between two related fields. That is what I am trying to achieve in Tableau.


            The set up within the database is not a snowflake scheme. It is far more convoluted! Within Tableau I am doing an extract from MySQL and some of them are joined so as to avoid having to use too many links. But whilst that works for a few tables, it absolutely does not work when the tables are large and the relationship between them is complex. If I did a join across all the tables I need I would have something that ran well into hundreds of millions of rows (perhaps even a billion!). That doesn't seen efficient to me and there is a reason why most databases don't work like that. Further, I couldn't actually do a join across the data I need in order to make one massive table because of the nature of the relationships between the tables in the database.


            Perhaps I didn't explain why I used the example above from PowerPivot. I did that because it's extremely easy to link many many many data sources in PowerPivot and also to try and show what I'd like to achieve in Tableau. The snowflake scheme I showed in my example is not in a database, it is one I created in PowerPivot using multiple data sources which are all linked in some way or another.


            All I want to be able to do is link A to B (via a related field) and then link B to C (via a related field) and then link C to D (via a related field). In this situation it may not be possible to link A to D because there is no related field.


            Sorry if it sounds like I'm repeating myself! I'm very much aware that I'm struggling to clearly articulate the problem!



            • 3. Re: Primary & Secondary data source - but what about third or fourth level?
              Russell Christopher

              Hey Keith -


              I’m with you. Tableau doesn’t allow the sort of arbitrary joining you’re able to accomplish in PowerPivot except via a Custom SQL Statement, which you don’t want to do based on what you’ve already said.


              That leaves you with blending, which requires relationships to drive said links…and you don’t always have those, either.  Perhaps you can create multiple data sources that ALWAYS express some sort of related field so you can “link” them back together again via blends? In other words, purposefully add the same table to multiple data sources as a way to artificially get your “relationship” in?


              Otherwise, the only other think I can think of would be to just point Tableau at your PowerPivot model and be done with it