5 Replies Latest reply on Nov 11, 2013 6:47 PM by George Gorczynski

    Filtering Data Source, custom SQL?

    George Gorczynski

      I would like to filter my data source to only import the latest values. Below is an original data table, and a desired table I would like to end up with - all Names but Levels only for the latest date. I tried to achieve it in pre-processing but I have a feeling this can be achieved in Tableau with custom SQL. Any help will be much appreciated. Thank you.

       

      ORIGINAL.

      NameLevelDate
      Name 1 C101-2013
      Name 1 C102-2013
      Name 1 C203-2013
      Name2C101-2013
      Name2C102-2013
      Name2C103-2013
      Name2C104-2013
      Name2C105-2013
      Name2C106-2013
      Name3C305-2013
      Name3C306-2013
      Name3C407-2013
      Name3C408-2013
      Name3C409-2013
      Name3C410-2013

       

      DESIRED

      NameLevelDate
      Name 1 C203-2013
      Name2C106-2013
      Name3C410-2013
        • 1. Re: Filtering Data Source, custom SQL?
          Russell Christopher

          Hey George -

           

          Custom SQL may work for you, but it really depends on the "rules" that you need to put in place to get the values you want based on the columns you're working with.

           

          With the sample data above, the following SQL works nicely:

           

          SELECT

            [Sheet1$].[Name] AS [Name], 

            MAX([Sheet1$].[Level]) AS [Level],

            MAX([Sheet1$].[Date]) AS [Date]

          FROM [Sheet1$]

          GROUP BY [Name]

          HAVING MAX([Date])

           

          If you're OK with bring in all the data, there are probably a couple ways you could get rid of the stuff you don't want to see with table calcs and other trickery.

          1 of 1 people found this helpful
          • 2. Re: Filtering Data Source, custom SQL?
            George Gorczynski

            Hi Russell - thanks very much for your response! I would really want to explore different ways to achieve my goal.

             

            To give you more information about my data:

            I have monthly reports for 2013 with time utilization, group name, level, etc. for all employees in the company. I combine monthly reports externally (Pentaho Kettle) into a single table. The problem is that some employees changed levels during the year, also some group names got changed. To make things consistent with group names and avoid duplicate entries on graphs (same employee as level 1 and 2 for instance), I want to create a separate "summary" table with employee name, latest level and group name. I can't use the latest month to date spreadsheet as this would exclude employees who quit or were let go during the year.

             

            There may be a way to deal with this without creating a summary table, with calculated fields (e.g. current level=level during the latest month the data for this employee is available) but I didn't come up with a way to do it that way.

             

            Below is a more comprehensive dummy data table.

             

            Many thanks for your help

             

            NameLevelDivisionGroupDate
            Name 1C1Min.A01-2013
            Name 1C1Min.A02-2013
            Name 1C2MiningA03-2013
            Name2C1Geotech.B01-2013
            Name2C1Geotech.B02-2013
            Name2C1Geotech.B03-2013
            Name2C1Geotech.B04-2013
            Name2C1GeotechnicalB05-2013
            Name2C1GeotechnicalB06-2013
            Name3C3GeotechnicalC05-2013
            Name3C3GeotechnicalC06-2013
            Name3C4GeotechnicalC07-2013
            Name3C4GeotechnicalC08-2013
            Name3C4GeotechnicalC09-2013
            Name3C4GeotechnicalC10-2013
            • 3. Re: Filtering Data Source, custom SQL?
              kettan

              This can be done with a subquery:

               

              SELECT *
              FROM [Employees$] e
              WHERE e.[Date] = (
                SELECT MAX(e2.[Date])
                FROM [Employees$] e2
                WHERE e2.[Name] = e.[Name]
                )
              
              
              1 of 1 people found this helpful
              • 4. Re: Re: Filtering Data Source, custom SQL?
                Russell Christopher

                Hey George -

                 

                I'm going into "teacher" (translated: "preacher") mode here, so please don't take my response the wrong way

                 

                That being said, how do you think the SQL would change? Are you familiar with writing SQL?

                 

                I ask because ultimately you're responsbile for the performance of the solution and if it returns the right answers. The response I gave you has warts - it relies on MAX(Level) to return the level field since I am aggregating with GROUP BY. Will MAX([Level]) always return the right value? No clue.


                Kettan's solution is interesting too, but using subqueries in Custom SQL is bad for performance (Custom SQL for Data Connection). 


                Tableau was built to allow you to do really advanced things without even needing to know much, if any SQL. Knowing it helps, but isn't necessary. For example, I'm attaching a sample which uses data blending to solve this problem.


                It:


                • Uses a very simple SQL Statement to pull out the "latest record" for each employee and uses that as a data source
                • Uses another data source which brings back ALL rows
                • Blends the two data sources together, so that we only get rows back where the Name and Date from each match - essentially limiting rows to the "latest for each employee"

                 

                No fancy SQL necessary! So, if I were you I'd see if you can crack this nut without reliance on guys writing SQL for you - Tableau is all about enabling you to do this sort of stuff without needing the "Database Priesthood" that our CEO Christian talks about.

                • 5. Re: Re: Filtering Data Source, custom SQL?
                  George Gorczynski

                  Russell - thank you for your help and advice, it is much appreciated.

                   

                  I used what you did in your example to make a cross-tab with the latest information for each employee (Level, Group Name, etc.) I exported this cross-tab to Excel and blended it in a new file with the original data. This lets me show my productivity numbers through time and filter it by current Group or Level. There may be a better way to achieve what I needed but this works just fine for me.

                   

                  kettan - thanks for your suggestion as well.

                   

                  George