7 Replies Latest reply on Feb 24, 2017 12:10 PM by Joseph Brannock

    Summing over time with variable interval length data

    Joseph Brannock

      Hi everyone,

       

      I'm relatively new to Tableau so it's possible I'm missing something obvious, but I've tried searching through all the help materials and discussions and can't find anything that seems to fit my problem. It's a bit difficult to explain, but here goes:

       

      I have a dataset that details energy transactions over time -- basically, when particular power plants went online, for how long, and at what power level. So each row is a single transaction, but it could cover a variable length of time - anywhere from an hour to a year. I'd like to put together a chart that shows the total energy produced over time (preferably grouped by category), but haven't been able to figure out a way in Tableau to convert the transaction data to a usable time series.

       

      To illustrate what I'm trying to do, I've created a simplified sample data set and done the analysis in Excel (attached). The real dataset comprises millions of transactions, though, so Excel isn't up for the task.

       

      Any ideas on where I could start?

       

      Thanks,

      JD

        • 1. Re: Summing over time with variable interval length data
          Peter Galimutti

          Hi Joseph,

          Thanks for providing sample. Based on Data spreadsheet you have the following

           

              

          Power PlantTypeOutput level (MW)Start timeEnd time
          Plant 1Base50:0023:59
          Plant 2Base60:0023:59
          Plant 3Mid44:0021:59
          Plant 4Mid26:0019:59
          Plant 5Peaker25:007:59
          Plant 5Peaker316:0019:59
          Plant 6Peaker217:00

          17:59

          Now you calculated totals in another sheet, where does the totals, mid, peak,  base come from?

           

           

          HourBaseMidPeakerTotal
          0:00110011
          1:00110011
          2:00110011
          3:00110011
          4:00114015
          5:00114217
          • 2. Re: Summing over time with variable interval length data
            Joseph Brannock

            The mid, peak, base, and totals are the amount of power online at a given cross-section of time. For instance, at 1:00 Plant 1 is producing 5 MW and Plant 2 is producing 6 MW, so the total production is 11. The logic in each sum of the Excel calculation goes something like: (for illustration, we'll use the Base column)

             

            Take Sum of Output Level column, including any row where:

            (Type = "Base") AND (Start Time <= [time]) AND (End Time > [time])

             

            In plain(er) english: add up each transaction of the given type that is active at the given time.

             

            Hopefully this is clearer...

            • 3. Re: Summing over time with variable interval length data
              Joseph Brannock

              I would simply replicate this approach in Tableau, but the problem I'm having is that the summation table has a many-to-many relationship to the original data (i.e. each row in the transaction table might correspond to many hours in the time series table, and vice versa). If there is a way to do that type of join, I don't know it. Ideally I wouldn't even need to join the tables at all, and let them be totally independent except for the summation formulas, but Tableau doesn't seem to allow that. It won't show any data at all with 2 unjoined tables.

               

              I suspect my main issue is that I'm used to thinking in spreadsheets rather than relational databases. I'm hoping there's a simple solution that I'm just not grasping yet...

              • 4. Re: Summing over time with variable interval length data
                Peter Galimutti

                hi Joseph,

                Right, Tableau will not allow to combine two (or many) data sources without defining relationships between two sources. good news is you have two options, one is joining two tables like the way we do using SQL and second is blending. I don't want to go into details but however I would like to see a sample of your data to understand what you mean by many-many relationship. can you paste two tables one being transaction and another being summation, a few 5-10 records will be enough.

                 

                Thanks,

                Peter

                • 5. Re: Summing over time with variable interval length data
                  Joseph Brannock

                  Thanks for the responses, Peter. Could you elaborate a little on the two approaches you mentioned?

                   

                  The spreadsheet I uploaded before has exactly the sample tables you requested -- the first sheet is the transaction table and the second sheet is the summation table. To elaborate on what I mean by "many-to-many relationship," the first row in the transaction table is:

                   

                  Power PlantTypeOutput level (MW)Start timeEnd time
                  Plant 1Base50:0023:59

                   

                  This one transaction covers a 24-hour time span, so it corresponds to 23 different rows of the summation table (which is a time series with intervals of 1 hour). Now, looking at the first row of the summation table:

                   

                  HourBaseMidPeakerTotal
                  0:00110011

                   

                  At hour 0:00, there are two "active" transactions in the transaction table - the ones involving Plant 1 and Plant 2. They are active because the time 0:00 falls between the "Start Time" and "End Time" for those transactions. Other rows correspond to other hours of the day, some of which involve even more transactions.

                  • 6. Re: Summing over time with variable interval length data
                    Joseph Brannock

                    Here's another way of thinking about it.. I could show the output of each plant in a two-dimensional matrix, which would look like this:

                     

                     

                    Hour:
                    Plant:0:001:002:003:004:005:006:007:008:009:0010:0011:0012:0013:0014:0015:0016:0017:0018:0019:0020:0021:0022:0023:00
                    Plant 1555555555555555555555555
                    Plant 2666666666666666666666666
                    Plant 3000044444444444444444400
                    Plant 4000000222222222222220000
                    Plant 5000002220000000033330000
                    Plant 6000000000000000002000000
                    Total output111111111517191917171717171717172022202015151111

                     

                    By flattening the output of each plant out into a time series, it makes it easy to do the summation for each time period. But this isn't really a practical approach with thousands of plants and years of data -- the matrix becomes far too large. I need to find a way to do the summations within a single time series table (I think).

                    • 7. Re: Summing over time with variable interval length data
                      Joseph Brannock

                      If anyone else happens to come across this issue, I've gotten better at searching this topic and found a few more resources discussing more or less the same desired outcome:

                       

                      FAQ:  Open & Close Dates

                      Headcount, when given the Arrival & Departure Dates

                      https://community.tableau.com/ideas/4920

                      Show Records That Fall Within a Period of Time | Tableau Software

                      http://redheadedstepdata.io/lookup-vs-transactional/?content=Reference+Post

                       

                      I have yet to dig through all these in detail, but this point it looks like the "simple fix" I was hoping for doesn't exist. All the solutions I've found so far involve either custom SQL code, data reshaping, or both. I will post an update here if/when I find a satisfactory solution.