First, a disclaimer: I'm not an expert--and I don't understand exactly what you're trying to do here...
Have you tried writing a calculation? Something like
if [College] (from first source) = [College] (from the other source)
then (write calculation here)...
I'm not sure this will work. You may want to use a join instead of a data blend, too--but again, I don't really understand the problem very well.
I did not look at your workbook closely because I'm not sure I understand exactly what you're trying to do. What have you attempted so far?
I've tried this to get to the wages the instructor ears from the department of the course. So, if this were a course record from the college of agriculture, I would want this variable to equal the wages the instructor got from the college of agriculture. I see this as a first step to making this work.
if (attr([College])=ATTR([Course Data Sample].[College])) then
However, this doesn't work. When a professor only has one source of income and only teaches courses in that college, it works. But for anything else, it displays null.
If it helps clear up what I'm trying to get to, I'm essentially taking a first stab at answering the question 'assuming all wages paid to instructors from academic areas are paid for teaching, how much does it cost the school to teach any given class?'
Do you know why you are getting NULL values? Knowing that will help you resolve the issue--I am going to guess that the data is not in the ideal format for this problem. I'm not an expert on reshaping data, so am afraid I won't be much help here. Hopefully someone else will chime in.
I'm certain I could do all the work in SAS and end up with a single flatfile that gives me what I want, but I'm trying to get better at using Tableau to do these things. I'd like to minimize the amount of steps between extracting the data and updating my dashboards. There are also a lot of potential Tableau users in my office who are not SAS programmers, so the less I have to lean on doing my own data manipulation outside of Tableau, the easier it will be to train them.
I understand that--but you can't do anything in Tableau until the data is formatted in a way Tableau can handle.
I'm no expert, but most everyone here will tell you: you can do nearly ANYTHING in Tableau... as long as the data you connect to is formatted properly first (that's a big caveat, I know). I'm fairly certain you can't do what you are trying to do without reshaping the data.
As far as office users, it will be far easier for them to interact with Tableau if the data is set up in a way that works for the desired end result. I think you will find that Tableau is limited only by the structure of your data.
Unfortunately, I don't know how to solve your problem specifically. I have a database guy who presents me with data in the shape I need for situations like this. I'm pretty inexperienced, so hopefully someone else will chime in and correct me where I've misspoken.
Again, I wonder if a JOIN would be more appropriate for this situation? This might be easier to accomplish with all of the data in one source. I could be wrong, though--I apologize for not being able to help you more.
See the attached. I changed the relationship between the two sources so there is an additional relationship on College (so the wage data is broken down by college & instructor). Then I created a calculated field in the primary with the following formula:
SUM([Wage Data Sample].[Wages])*SUM([Credits])/TOTAL(SUM([Credits]))
This takes the sum of wages from the secondary (which is at the instructor&college level of detail), multiplies that by the # of credits for a particular course, then divides that by the total credits for all courses in the college. This uses the TOTAL table calculation, the Compute Using is on Course ID so it generates the total credits per instructor/college/academic period.
Thanks Jonathan! Works like a charm. I was really hoping this would be possible in Tableau. These data make more sense as separate tables than fused together in SAS, I'm glad I can keep them that way.
ATTR() is a calc that effectively does IF MIN(field) == MAX(field) then (field) ELSE * END, where * is a special Tableau flavor of Null. In your initial setup, College was not part of the relationship between the data sources, so it was aggregated between the primary and secondary - in other words, returning all colleges for a given instructor. Since there was more than one college for some instructors, ATTR() was returning * inside your calculation and failing. The key to the solution I posted was to add College to the relationship.
Matthew - I know you're trying to master Tableau, here's a some of my thought process to try to help you out. When I'm thinking about the magnitude of changes necessary to make something work in Tableau, transforming and reshaping data is the biggest change possible (and sometimes not possible at all). I have a sort of hierarchy in my head where I start with that at the bottom of the list of possible solutions, with nested table calculations using data densification above that, then nested table calcs and data blending, then calculated fields, etc.
So I'll do my best to work with the data as-is and go through what's not working, understand why it's not working, and try to come up with a solution. A key part of this is to get to know the data, what are dimensions in Tableau, what their domains (range of values are), and how the dimensions are related. This is because dimensions determine the level of detail in the view (the GROUP BY clause in SQL terms) and what aggregations will be computed over. My point in all this is that thinking about why the ATTR() calc wasn't working in this was the right place to look at, and jumping to suggesting reshaping the data as a solution without understanding what was going with ATTR() was premature.
Does that make sense?
You're welcome! Data blending is a magical thing, and sometimes the incantations to get it to work are a little, umm, esoteric.
This is exactly what I'm trying to do. I've already jumped to reshaping the data for other projects, and I felt after the fact that it may have been more work than was necessary, in addition to creating a massive flat file that would be confusing to anyone who wasn't me.
Maybe I can move this to a separate discussion, but on that note is there any info out there on the performance differences between reshaped data and using Tableau's data blending? The file I refer to from before was merging 20 data tables into one huge flat file, and I'm wondering if I bothered to do it 'properly' by making 20 different connected data sources in tableau, if that might give the load time a huge hit (and therefore not be worth the trouble).
Yes, it makes sense. The first thing I tried was establishing a relationship in the data blend on the College field, but I must've had the wrong calculation, because I still didn't get the proper results.
I tried to present a disclaimer several times that I did not know for sure what best solution was, but since nobody else had offered help, I was trying to offer my experience wherever I could.
Thanks for chiming in Jonathan!
I can't give you an exact answer, I can describe some of the factors that I'm aware of:
- You didn't describe your schema of the "20 data tables into one huge flat file". We use star schemas here, and for various reasons (we're on MS Access and have overloaded file servers) I'll typically build a query in Access that pulls from 5-25 tables, and then have Tableau built a data extract from that query.
- Data blending has great power to mash up data, and as you saw here it can get a little complicated when dealing with varying levels of detail. Since I'm comfortable with SQL, sometimes I'll start out with a data blend as I'm prototyping and then move the joining/merging into a query. So, for example, I pretty much never use data blending for dimension tables, I put those in the query. Also, I'm building data sources for other users, and it's easier for them to see all their fields in one table rather than having to remember to blend in other data. (And the same is true for me). I did a writeup of one place where I'm using data blending in production here: http://drawingwithnumbers.artisart.org/tableau-data-blending-sparse-data-multiple-levels-of-granularity-and-improvements-in-version-8/
- You didn't say what "huge" is. With a tuned data source, Tableau can handle billions of records. (My understanding is that part of why Tableau data extracts exist is to make it easy for users to get going with giant text/Excel/etc. files.)
- Tableau will issue separate queries to blended data sources, so depending on the given data source, complexity of the view, etc. using a blend can be faster than using a single data source.
- There's a performance document I read somewhere that having too many data sources can slow down performance, but it didn't say how many is too many or under exactly what circumstances.
- That said, Tableau has a lot of intelligence under the hood to only query for the fields that are needed for the view, so even though you might have a lot of data sources in the *workbook* if a given *worksheet* is only querying a couple of fields, Tableau will only pull those fields.
- There's a temptation that it seems like many new users have to build the "mother of all data sources" and include everything you might ever need, then build a single workbook that does everything. This can cause performance issues with Tableau if you're adding a whole lot of calculated fields and a ton of worksheets. So I have some experimental/exploratory analytic workbooks that are relatively fat and slow, and then for production dashboards there are more tuned data sources and workbooks.