Can you post some sample data and a mock-up of the output?
Absolutely. The attachment, "MB Sample Data", contains several weeks' worth of sample data for a given OpenStack zone.
The Cloud Team is currently using a Python script to generate the TTL (time-to-live) forecasts for a given zone.
They are using this program to do four things:
```python
# Smooth out the data set
ewma = pandas.stats.moments.ewma
fwd = ewma(Ys, span=42)             # take EWMA in fwd direction
bwd = ewma(Ys[::-1], span=42)       # take EWMA in bwd direction
c = numpy.vstack((fwd, bwd[::-1]))  # lump fwd and bwd together
c = numpy.mean(c, axis=0)           # average
```
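As an aside for anyone running this today: `pandas.stats.moments.ewma` was removed in later pandas releases, and the same forward/backward smoothing is now written with `Series.ewm`. Here's a minimal, self-contained sketch of the equivalent (the `Ys` and `span=42` names follow the snippet above; the sample series is made up):

```python
import numpy as np
import pandas as pd

def smooth_fwd_bwd(Ys, span=42):
    """Average a forward and a backward EWMA so the smoothed curve
    doesn't lag the data the way a one-directional EWMA does."""
    s = pd.Series(Ys, dtype=float)
    fwd = s.ewm(span=span).mean().to_numpy()              # EWMA in fwd direction
    bwd = s[::-1].ewm(span=span).mean().to_numpy()[::-1]  # EWMA in bwd direction
    return np.vstack((fwd, bwd)).mean(axis=0)             # average the two passes

# Example: a noisy upward ramp smooths toward the underlying trend
Ys = np.arange(100, dtype=float) + np.sin(np.arange(100))
smooth = smooth_fwd_bwd(Ys, span=42)
```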
```python
k = 1    # linear
# k = 2  # parabolic
# k = 3  # cubic
extrapolator = UnivariateSpline(Xs[:, 0], smooth, k=k)
```
```python
# Predict
newX = []
last_timestamp = Xs[len(Xs) - 1]
limit = 180  # Always predict 6 months min(len(Xs), num_days)
while len(newX) < limit:
    series = last_timestamp + 86400
    newX.append([last_timestamp, 0])
    last_timestamp = series
newX = numpy.asarray(newX, dtype=float)
_newX = []
for x in newX[:, 0:1]:
    _newX.append(x)
newY = extrapolator(_newX)
```
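For context, what makes the prediction step work is that `scipy.interpolate.UnivariateSpline` will happily evaluate outside the fitted x-range (its default `ext` behavior is to extrapolate). A toy sketch of the linear `k=1` case, with made-up data standing in for the smoothed series:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Fit a degree-1 (linear) spline to a simple trend...
Xs = np.arange(0, 30, dtype=float)   # e.g. day numbers
smooth = 2.0 * Xs + 5.0              # stand-in for the smoothed series
extrapolator = UnivariateSpline(Xs, smooth, k=1)

# ...then evaluate it beyond the last observed x to forecast forward
newX = np.arange(30, 60, dtype=float)
newY = extrapolator(newX)            # continues the fitted line past the data
```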
4) Calculating TTL
```python
sz_max = Max[-1]
units = Ys[-1:]
days = 0
for y in newY:
    if days >= 180 or units >= sz_max:
        break
    units = y
    days = days + 1
```
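The TTL logic itself is just a walk along the forecast until capacity is hit. A self-contained sketch with fabricated numbers (`capacity` and the linear forecast here are assumptions for illustration, not the real zone data):

```python
import numpy as np

capacity = 500.0                     # stand-in for sz_max (zone capacity)
newY = 300.0 + 2.0 * np.arange(180)  # fabricated 180-day forecast of units used

days = 0
units = newY[0]
for y in newY:
    if days >= 180 or units >= capacity:
        break                        # capacity reached: TTL found
    units = y
    days += 1
# days now holds the forecasted time-to-live in days
```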
This outputs several charts that show the forecasted TTL up to 180 days for a given zone. I'd like to be able to do this in Tableau.
Attachment: MB Sample Data.xlsx (218.9 KB)
To get exact results, my first recommendation is to use the given Python code to pre-process your data. Given that there are four steps, and that the data is forecast out beyond the range of the original data, this would be extremely complicated to implement in Tableau. The challenge is that, fundamentally, there's a data set A that goes through a set of transformations to produce a data set B (the forecast), and I'm guessing that the final displayed view(s) are a combination of A and B. I went through an example of this sort of problem a while back in Not a funnel cake, nor funnel charts, it’s funnel plots! | Drawing with Numbers.
Here's a step-by-step way of looking at it:
0. We start out with N data points.
1. The data is smoothed with a forward/backward pass in step 1 above. This can be approximated with a WINDOW_AVG() function in Tableau.
2. In step 2 above a linear spline is fit to the data. I'm not sure about the results of this Python function; I'm guessing that the extrapolation step is adding records between the smoothed values. In Tableau the only way to do that is with data densification, and we can't densify on table calculation results (like the results of the WINDOW_AVG() calculation). Technically we could densify the data prior to step 1, but then the WINDOW_AVG() calculation has to deal with the densified data, which adds complexity to both that step and this one. Doing a linear spline in Tableau would require additional table calculations built on top of the WINDOW_AVG().
3. The prediction and TTL calculations are certainly adding rows beyond what exists in the data -- creating data source B. This is not something Tableau does at this time. There are workarounds: we can "re-use" the original records in the data (which is what I did in the funnel plot workbook above), or, as a second technique, add one or more dummy records or do a self-union that (prior to step 1) further densifies the data in Tableau so that we have enough rows to build the forecast. This data prep is sometimes not possible, and as I mentioned, *all* the calculations have to deal with the densified data in order to return accurate results. Also, since this is all computed in Tableau at refresh time, depending on your data volumes it can be memory- and/or processor-intensive. Finally, I'm not 100% sure of the calculation logic used in the arrays (I'm not that good at reading Python); there are cases of nested iterative computations that are not feasible in Tableau (though they can sometimes be pushed into R using Tableau's R integration).
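To make the dummy-record/self-union idea concrete, here is a rough pandas sketch of the pre-Tableau data prep described in point 3: append 180 placeholder future rows so the forecast has somewhere to land, then note how any downstream smoothing has to cope with those rows. The column names (`date`, `units`) and values are made up for illustration:

```python
import pandas as pd

# Original data set "A": one row per day of observed usage (made-up numbers)
a = pd.DataFrame({
    "date": pd.date_range("2015-01-01", periods=30, freq="D"),
    "units": range(100, 130),
})

# Dummy rows for data set "B": 180 future days with no measured value.
# In Tableau these placeholder rows are what give the forecast
# calculations somewhere to write their results.
future = pd.DataFrame({
    "date": pd.date_range(a["date"].max() + pd.Timedelta(days=1),
                          periods=180, freq="D"),
    "units": [float("nan")] * 180,
})

padded = pd.concat([a, future], ignore_index=True)

# Any downstream smoothing now has to deal with the dummy rows: e.g. a
# centered moving average (a rough analogue of Tableau's WINDOW_AVG)
# skips the NaN placeholders via min_periods.
padded["smoothed"] = padded["units"].rolling(7, center=True,
                                             min_periods=1).mean()
```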
For any of us to help further, we'd most likely need a next level of clarification of exactly what is happening at each step of the Python sample code you gave, so we can translate it into Tableau-speak. In addition, having intermediate and final outputs of the Python code on a sample data set would help validate the numbers at each step of the way.