I'm just thinking out loud, but using Tableau's calcs, you could:
1) Remove the commas, leaving a bunch of numbers (I assume its a string data type)-- REPLACE([FieldNameHere], ",", "") does this
2) Count the number of values in the string, and use that to analyze the count of minutes-- LENGTH([FieldNameHere]) returns this
I am referencing these calcs from memory, so if there's a mistake, I apologize
However, do the different numbers have different meaning? You said each value represents one minute, so that makes me think they are all equal to one minute. Or does an "8" signify 8 minutes?
How big is your dataset?
Thanks for the response. I may not have been clear enough in my original post. In my dataset each number represents the value within a given minute. So, let's say they are dollar values. With my dataset shown above on 1/1/13 I am saying that in minute 1 we earned $1, in minute 2 we earned $4, in minute 3 we earned $5, etc. So there are always 1,440 comma separated values in my activity stream field, one for each of the 1440 minutes in a day. Does that makes sense?
The dataset is pretty big, in the hundreds of thousands of records, but we could take a subset of the data if need be.
Yes, it makes sense. I do not know how to incorporate that logic into these comma separated values, though. I think re-shaping the data may be for the best, but I really don't know. I believe if each of those values had its own row, with the date and custom ID repeating, your analysis would be much simpler--but again, i'm not an expert.
Date Customer ID Activity Stream Value 1/1/2013 123 1 1/1/2013 123 4 1/1/2013 123 5 1/1/2013 123 6
And so on, etc. Then, a calculation could be written to assign values to each number in the activity stream.
But, Perhaps someone else will have some more input?
Best of luck.
You may want to try the Tableau Excel add-in which can be used for re-shaping data. Look-up Reshaping Data in Excel in the Knowledge Base or try the link below:
Tableau won't "just" create more rows, at the minimum you'd have to do something outside of Tableau to pad out the data (even if to just duplicate your data so you can then use domain padding inside Tableau).*** However, in that minimum case then you're totally into table calculation land and that can make all further analyses more difficult. Doing the reshaping before it hits Tableau will make life in Tableau much, much easier.
*** Technically not completely true, however the hoops you have to jump through are way harder than the SQL.
I am convinced the cross join technique in the attached workbook works for what you subscribe.
But it needs some tweaking first, which I don't have time for now (past bedtime here), and neither in the next few days.
Therefore I share it 'as is' so anyone can use it as inspiration in an attempt to solve your use case.
When is that Cross-Join TabWiki coming?! Just a thought my friend.
I have attached the solution as twbx, and twb + xlsx.
The twb + xlsx makes it easy and fast to test it for other values.
I knew exactly what to do while brushing my teeth before going to bed last night and remembered the solution as the first thing waking up this morning. I guess the saying is true that the last thing you think about before going to bed is the first you will think about in the morning. Therefore I took the time to do this, but please remember that I haven't stress-tested it, such as empty commas and alike.
If I get a correct answer for this, I only need 4 more to receive the honorable Help Desk badge
Ps. This is the used cross join technique:
SELECT s.Day, s.[Customer ID], s.[Activity Stream], p.Pos , clng(mid$(s.[Activity Stream], p.Pos, instr(p.Pos, s.[Activity Stream] + ',', ',') - p.Pos)) as [Value] FROM [activitystream$] s, [commaposition$] p WHERE p.Pos <= len(s.[Activity Stream]) AND mid$(',' + s.[Activity Stream], p.Pos, 1) = ','
I want very much to write this cross-join tabwiki! Not only for sharing, but also as a collection for myself.
The joy of sharing is of course the last push to actually do it
The sad fact is that I don't have enough quality-free-time available, but hope I can start working on it in December.
Thanks everyone for chiming in with ideas! Johan, I'm heading out for the weekend so I haven't had a chance to look at your solution in-depth, but at first glance it looks perfect. I'll check it out in more detail on Monday. Thank you thank you!
You are welcome. I hope it will work.
I didn't know which data type to choose and therefore chose CLng since your example only had integers.
I am not sure what the other options are (it is a pain to find a Jet SQL Reference), but probably this:
FUNCTION RETURN TYPE RANGE FOR EXPRESSION ARGUMENT CBool Boolean Any valid string or numeric expression. CByte Byte 0 to 255. CCur Currency -922,337,203,685,477.5808 to 922,337,203,685,477.5807. CDate Date Any valid date expression. CDbl Double -1.79769313486231E308 to
-4.94065645841247E-324 for negative values; 4.94065645841247E-324 to 1.79769313486232E308 for positive values.
CDec Decimal +/-79,228,162,514,264,337,593,543,950,335 for zero-scaled numbers, that is, numbers with no decimal places. For numbers with 28 decimal places, the range is
+/-7.9228162514264337593543950335. The smallest possible non-zero number is 0.0000000000000000000000000001.
CInt Integer -32,768 to 32,767; fractions are rounded. CLng Long -2,147,483,648 to 2,147,483,647; fractions are rounded. CSng Single -3.402823E38 to -1.401298E-45 for negative values; 1.401298E-45 to 3.402823E38 for positive values. CStr String Returns for CStr depend on the expression argument. CVar Variant Same range as Double for numerics. Same range as String for non-numerics.
I tested that one with 450 rows and 1440 minutes. It worked fine! Only problem was that there were 2880 "minutes" instead of 1440, but it might also be some problem with my data. Number of rows also increased to 600k as expected. I would say that it's getting quite heavy and what are the benefits for analysis?