7 Replies Latest reply on Feb 24, 2017 12:10 PM by Joseph Brannock

Summing over time with variable interval length data

Hi everyone,

I'm relatively new to Tableau so it's possible I'm missing something obvious, but I've tried searching through all the help materials and discussions and can't find anything that seems to fit my problem. It's a bit difficult to explain, but here goes:

I have a dataset that details energy transactions over time -- basically, when particular power plants went online, for how long, and at what power level. So each row is a single transaction, but it could cover a variable length of time - anywhere from an hour to a year. I'd like to put together a chart that shows the total energy produced over time (preferably grouped by category), but haven't been able to figure out a way in Tableau to convert the transaction data to a usable time series.

To illustrate what I'm trying to do, I've created a simplified sample data set and done the analysis in Excel (attached). The real dataset comprises millions of transactions, though, so Excel isn't up for the task.

Any ideas on where I could start?

Thanks,

JD

• 1. Re: Summing over time with variable interval length data

Hi Joseph,

Thanks for providing sample. Based on Data spreadsheet you have the following

 Power Plant Type Output level (MW) Start time End time Plant 1 Base 5 0:00 23:59 Plant 2 Base 6 0:00 23:59 Plant 3 Mid 4 4:00 21:59 Plant 4 Mid 2 6:00 19:59 Plant 5 Peaker 2 5:00 7:59 Plant 5 Peaker 3 16:00 19:59 Plant 6 Peaker 2 17:00 17:59

Now you calculated totals in another sheet, where does the totals, mid, peak,  base come from?

 Hour Base Mid Peaker Total 0:00 11 0 0 11 1:00 11 0 0 11 2:00 11 0 0 11 3:00 11 0 0 11 4:00 11 4 0 15 5:00 11 4 2 17
• 2. Re: Summing over time with variable interval length data

The mid, peak, base, and totals are the amount of power online at a given cross-section of time. For instance, at 1:00 Plant 1 is producing 5 MW and Plant 2 is producing 6 MW, so the total production is 11. The logic in each sum of the Excel calculation goes something like: (for illustration, we'll use the Base column)

Take Sum of Output Level column, including any row where:

(Type = "Base") AND (Start Time <= [time]) AND (End Time > [time])

In plain(er) english: add up each transaction of the given type that is active at the given time.

Hopefully this is clearer...

• 3. Re: Summing over time with variable interval length data

I would simply replicate this approach in Tableau, but the problem I'm having is that the summation table has a many-to-many relationship to the original data (i.e. each row in the transaction table might correspond to many hours in the time series table, and vice versa). If there is a way to do that type of join, I don't know it. Ideally I wouldn't even need to join the tables at all, and let them be totally independent except for the summation formulas, but Tableau doesn't seem to allow that. It won't show any data at all with 2 unjoined tables.

I suspect my main issue is that I'm used to thinking in spreadsheets rather than relational databases. I'm hoping there's a simple solution that I'm just not grasping yet...

• 4. Re: Summing over time with variable interval length data

hi Joseph,

Right, Tableau will not allow to combine two (or many) data sources without defining relationships between two sources. good news is you have two options, one is joining two tables like the way we do using SQL and second is blending. I don't want to go into details but however I would like to see a sample of your data to understand what you mean by many-many relationship. can you paste two tables one being transaction and another being summation, a few 5-10 records will be enough.

Thanks,

Peter

• 5. Re: Summing over time with variable interval length data

Thanks for the responses, Peter. Could you elaborate a little on the two approaches you mentioned?

The spreadsheet I uploaded before has exactly the sample tables you requested -- the first sheet is the transaction table and the second sheet is the summation table. To elaborate on what I mean by "many-to-many relationship," the first row in the transaction table is:

 Power Plant Type Output level (MW) Start time End time Plant 1 Base 5 0:00 23:59

This one transaction covers a 24-hour time span, so it corresponds to 23 different rows of the summation table (which is a time series with intervals of 1 hour). Now, looking at the first row of the summation table:

 Hour Base Mid Peaker Total 0:00 11 0 0 11

At hour 0:00, there are two "active" transactions in the transaction table - the ones involving Plant 1 and Plant 2. They are active because the time 0:00 falls between the "Start Time" and "End Time" for those transactions. Other rows correspond to other hours of the day, some of which involve even more transactions.

• 6. Re: Summing over time with variable interval length data

Here's another way of thinking about it.. I could show the output of each plant in a two-dimensional matrix, which would look like this:

 Hour: Plant: 0:00 1:00 2:00 3:00 4:00 5:00 6:00 7:00 8:00 9:00 10:00 11:00 12:00 13:00 14:00 15:00 16:00 17:00 18:00 19:00 20:00 21:00 22:00 23:00 Plant 1 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 Plant 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 Plant 3 0 0 0 0 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 0 0 Plant 4 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 0 0 0 Plant 5 0 0 0 0 0 2 2 2 0 0 0 0 0 0 0 0 3 3 3 3 0 0 0 0 Plant 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 Total output 11 11 11 11 15 17 19 19 17 17 17 17 17 17 17 17 20 22 20 20 15 15 11 11

By flattening the output of each plant out into a time series, it makes it easy to do the summation for each time period. But this isn't really a practical approach with thousands of plants and years of data -- the matrix becomes far too large. I need to find a way to do the summations within a single time series table (I think).

• 7. Re: Summing over time with variable interval length data

If anyone else happens to come across this issue, I've gotten better at searching this topic and found a few more resources discussing more or less the same desired outcome:

FAQ:  Open & Close Dates

https://community.tableau.com/ideas/4920

I have yet to dig through all these in detail, but this point it looks like the "simple fix" I was hoping for doesn't exist. All the solutions I've found so far involve either custom SQL code, data reshaping, or both. I will post an update here if/when I find a satisfactory solution.