14 Replies Latest reply on Aug 8, 2014 3:50 PM by Matt Lutton

    Inconsistent use of column name aliases when replacing a data source

    Richard Leeke

      I have a workbook which I want to be able to hook up to a few different variants of the underlying database structure. Essentially I want to be able to run either against the live database schema, or against a restructured (denormalised) variant, to provide better performance. But I don't want to have to maintain multiple copies of my workbooks.

       

      So I'm trying to define a repeatable process for swapping out the "live" data source definitions and swapping in the "de-normalised" definitions, so that I can maintain a single copy of each workbook and then point copies of the workbook at different underlying data structures.

       

      At first glance this seems to be a simple matter of ensuring that the alternative variants of each data source present identical field names, by adjusting the field aliases in the data source definition to match the names used in the original data source. But it appears that Tableau only uses the field alias (referred to as the "column caption" in the workbook XML) for resolving some of the references to fields during replacement. Specifically, fields referenced in filter conditions and possibly other references within calculated fields seem to rely on the internal name automatically derived by Tableau when first constructing the data source (referred to as the "map key" under the covers). This causes lots of mis-matches, with the unrecognised fields removed from all sheets.

       

      The reason that the default names are different in the de-normalised variants of the data sources is that many of the tables I am joining have columns with the same names in multiple tables. On building the data source, whenever Tableau comes across a field name which already exists in the data source it automatically generates an internal name constructed by prefixing the field name with the table name. When building the de-normalised variants, the order of adding the tables is different, so different fields get the qualified or unqualified variants of the names.

       

      Fixing up the differences using the field name aliases in the data source seems to work fine for the names of fields that are actually placed on shelves, but not for underlying references within calculated fields or filter conditions. That results in lots of fields being removed from lots of sheets throughout the workbook when replacing the data sources.

       

      I could achieve the functional outcome I'm after by replacing the multiple table data source definitions with either custom SQL connections or references to database views which hide the details. But the performance impact of that would be too great since it would prevent Tableau from doing join culling etc.

       

      I don't expect that there is an easy answer to this - but I'd be interested to hear if anyone else has hit the same issue and even more interested if anyone has come up with a simple, repeatable and reliable process for doing what I'm trying to do.

        • 1. Re: Inconsistent use of column name aliases when replacing a data source
          Joe Mako

          Richard, This is not enough detail for me to replicate your situation or issue. I understand that the experience can be frustrating when the workbook is not configured to enable Replace Data Source to work, but in my experience, with some additional awareness it can be enabled to work without issue.

           

          In general, there are a number of techniques to enable a successful Replace Data Source operation. Let me know if you would like to have a screen sharing session to walk through each, since it is difficult to convey this in just words.

           

          In short, two main things:

          1. Ensure the data types of fields that exist nativity in the data source (non-calculated fields) are of the same Data Type. Will have issues if it is a number in one but a date in another.

          2. Have a dummy worksheet with a pill for every calculated field on a shelf, eg, place all on the Detail shelf. This ensures Tableau carries the calculated fields over (I think Tableau is trying to clean house, but cleans a little too much, and falsely thinks a calculated field is not in use).

           

          If you have both of these met and you still have issues. It would be helpful to have a workbook and data sources that exhibit the issue you are having, as there are always edge cases, but I am sure there is a way to enable a better experience.

           

          I would be very interested in seeing a situation where Replace Data Source does not work.

          • 2. Re: Inconsistent use of column name aliases when replacing a data source
            Richard Leeke

            Thanks for the lightening fast response, Joe.

             

            All my data types match exactly. The only difference is that we have denormalised some of the dimension tables (essentially flattening a snowflake out into a star). Field names are also identical except in the case where the de-normalisation has resulted in two fields with the same original name ending up in the same table. In that case we have prefixed the denormalised one with it's original table name - and it is in aligning the names for fields like that where we are coming unstuck.

             

            I've used that dummy worksheet trick in the past, but I don't think that is going to help here. The problem is not that we are losing calculated fields from the data source, it is that we are losing pills off sheets if they reference native data source fields for which the underlying Tableau "map key" name is different. For example, this happens if there is a filter with a condition which references a field whose underlying Tableau map key name has changed - even though the field alias matches what was in the original data source.

             

            I have spent the morning trying to put a simple reproduction together. I was intending to have that done before either posting on the forum or asking support for help - but even that is proving quite a challenge, so I thought I'd at least get a thread started. I'll persevere with the simple example, though.

             

            And yes - I'd love to take you up on the offer of a screen-share to walk you through it. Would be good just to catch up, too, we haven't spoken for ages. Let's wait until I have the simple repro together.

            • 3. Re: Re: Inconsistent use of column name aliases when replacing a data source
              Richard Leeke

              I've attached a sample workbook which demonstrates the problem. Wow that was a mission to get a reproduction case together - so it has probably ended up more elaborate than I needed.

               

              The workbook simulates the case of a 3 table, multi-table connection being replaced by a de-normalised version where two of the tables have been combined. The data source is a spreadsheet with tabs A, B and C in the original connection, being replaced by A_Denorm and C (where A_Denorm folds the fields from B into A).

               

              The workbook has two connections, the original 3 table one and the replacement 2 table one - and a worksheet connected to the three table one. I have adjusted the automatically assigned field names in the 2 table connection so that they match exactly the equivalent fields in the original (using the field alias in the data connection editor). That basically meant aligning the prefixes on the two fields with a name of [F1], as shown here:

              Field Aliases.PNG.png

               

              Right click on the original data source and select "Replace Data Source":

               

              Replace Data Source.PNG.png

               

              Click OK and you get this error:

               

              Filter Invalid.PNG.png

               

              The filter definition that it is complaining about looks like this:

              Filter Definition.PNG.png

               

              The problem is caused by the formula. The reference to [A$_F1] in the formula isn't being translated via the alias and so Tableau can't resolve the reference.

               

              I've yet to experiment with other types of field references (calculated fields, actions, etc, etc), but even this one case will cause filters to be lost across 40 or 50 worksheets in 8 data sources each time I try to deploy a new version to the alternate database schemas - so I really do need to find a viable approach here. I strongly suspect it won't be peculiar to filters.

               

              I'll drop you a line suggesting some times for a hangout, Joe.

              • 4. Re: Inconsistent use of column name aliases when replacing a data source
                Matt Lutton

                I'm definitely interested in hearing your findings.

                 

                Probably completely unrelated, but on a similar line of thought--when replacing an extract with a Published Data Source created from that same extract, all quick filters will be lost and need rebuilding--except those on the sheet you are actively on when replacing the data source. That is the behavior I've experienced in 8.1, anyway.   As a work around, if you click on each individual sheet in your workbook BEFORE creating the Published Data Source from the extract, all filters will be retained when you replace the Extract with the Published Data Source on Server.

                 

                I've avoided Published Data Sources because of this, and other oddities I've encountered and not understood completely, but I really wish I could use them to avoid publishing multiple workbooks with the same data source.

                 

                I also wish they'd bring back the "Data Source Name Conflict" message that existed in version 8.0 when working with Bookmarks (or copied/pasted sheets in 8.1), and my idea for that is here:

                http://community.tableau.com/ideas/2903

                1 of 1 people found this helpful
                • 5. Re: Re: Inconsistent use of column name aliases when replacing a data source
                  Richard Leeke

                  I've done a bit more testing and so far it only seems to be condition expressions in filters that break. The good news is that this gives me a workaround for my issue. All I need to do for the case I'm hitting is create a calculated field with the expression from the filter condition and filter on the calculated field instead.

                   

                  I've added a calculated field which replicates the filter logic and added that to the filter shelf as well and also added a highlight action to the test workbook (new version attached). The filter with a conditional expression is removed from the Filter shelf completely. The filter on the new calculated field and the highlight action both just work.

                   

                  Edit: I also agree with your idea about the duplicate datasource dialog, Matthew - I just up-voted it.

                  • 6. Re: Re: Re: Inconsistent use of column name aliases when replacing a data source
                    Richard Leeke

                    Logged a bug on this.

                     

                    Also managed to create a much simpler example once I had figured out what the issue is. In the attached workbook there are two data sources, A and B, each with just two fields, [F1] and [F2]. There is a worksheet using data source A, showing both fields and also having a filter with an expression which references [F2].

                     

                    Field [F2] in B has a different name in the underlying table, so I have used a field alias to align the names.

                     

                    Replacing data source A with B reproduces the issue.

                    • 7. Re: Inconsistent use of column name aliases when replacing a data source
                      Richard Leeke

                      Having gone through and fixed up the filter expressions which were causing problems I then ran into a slight variant of the issue with losing quick filters which you described, Matthew. I'm very glad you mentioned it - I would probably have thrown in the towel in disgust and gone back to using Excel if you hadn't - I've been doing battle with this for over a week now.

                       

                      In my case I'm not using published extracts, just replacing the data source with another one with the same columns which I created by duplicating the original and then swapping in different tables and adjusting field names.

                       

                      What I found is that all quick filters are lost on all sheets using the replaced data source which I haven't touched in the current editing session. Just going and opening all sheets before doing the replacements allows the quick filters to survive. If I don't open the sheet before doing the replacement, on opening the sheet all of the quick filters show, but with a message saying "unavailable" which flashes up for a second and then they disappear.

                       

                      Unfortunately I have 7 data sources driving about 50 sheets and I'm going to have to do this replacement process frequently in order to publish the variants of the workbooks. I've toyed with trying to do the fix-up directly in the XML of the workbook - but without access to the XML schema that just looks far too error-prone.

                      • 8. Re: Inconsistent use of column name aliases when replacing a data source
                        Richard Leeke

                        And yet another issue.

                         

                        After fixing up the field name aliases while creating the new data source, several of the calculated fields show as in error (red exclamation mark). But opening them up to edit them they show as valid. Click OK and they still have the red exclamation mark.

                         

                        I eventually found that I need to "touch" the expression in some way (add a space on the end and then delete it, for example). That seems to force Tableau to regard the expression as changed and therefore update it when I click OK.

                        • 9. Re: Inconsistent use of column name aliases when replacing a data source
                          Matt Lutton

                          Glad I chimed in here as well. These types of issues certainly make it a challenge to maintain multiple workbooks at an enterprise level. The issues you described with quick filters sound exactly like what I have run into, so I'm glad to know it's not specific to published data sources. I, too, have run into the red exclamation error on calculated fields, which I had forgotten to mention previously.

                           

                          Thanks for your extensive testing and documentation, and for all your many contributions over the years! Your work (and Joe's, Jonathan's, Alex's, etc.) has made learning Tableau much easier than it would have been without it.

                           

                          Here's to hoping for improvements; cheers.

                          • 10. Re: Inconsistent use of column name aliases when replacing a data source
                            Richard Leeke

                            A few more hours beating my head against the wall on the next data source and I've got to the bottom of what is happening with another related issue (though yet to work out the best solution).

                             

                            It turns out that if the original data source included any hidden fields at the time it was copied, you can end up with a left-over reference to that within the new copy of the data source which is completely invisible through the U/I - but which prevents the use of that name as an alias - saying that the name is in use. I had to dig through the XML to work that out.

                             

                            I'm hoping that in this case I don't need to align the names and I can just hide the equivalent (but differently named) field in the new data source.

                             

                            The hardest thing about getting this right is that lots of the impacts happen silently. So replacing a data source that is used by 20 sheets it is really difficult to know if it actually worked. There is no warning for things like actions or quick filters which get dropped.

                             

                            I'm about to have a webex with Joe going through what I'm trying to achieve. I'm hoping that Joe can bring his legendary clarity of thought to this whole problem and come up with a much better way of achieving what I'm trying to do.

                            • 11. Re: Inconsistent use of column name aliases when replacing a data source
                              Richard Leeke

                              Closing this out. I think I've got the process just about sorted - but it's very manual, clunky and error prone.

                               

                              It turns out that most of the problems stem from under the covers differences between the two data sources. So even though I was creating my new data source by cloning the old one and replacing tables, the name differences were enough to cause various of the linkages to be broken. For example, things like aliases for field values and colour and shape assignments for dimensions can easily be dropped. I also spotted with a file comparison tool that I had lost one default format setting.

                               

                              It turns out that with some careful editing of the XML of the data source tds file it is possible to reinstate the colour palettes, aliases etc, so that they will come across properly when the data source is replaced. The trick is in getting the new field name assignments right - those associations don't honour the field name alias (or probably get broken before the field aliases have been fixed up).

                               

                              If anyone is interested in the detail of the XML fixup, sing out and I'll try to find the time and energy to describe what I'm doing.

                              • 12. Re: Re: Inconsistent use of column name aliases when replacing a data source
                                Michelle Eich

                                Instead of replacing the data source, try 'Edit Connection'.  I found that when I replaced data sources the color legends and such disappeared, but when I did 'Edit Connection' they were still there. Might be less issues?

                                 

                                I vaguely remember support telling me that one way to avoid such things disappearing was to first make sure you've clicked through all of the views, then replace the data source. I opted for the 'Edit Connection" route as I didn't want to have to click through 33 views...  I found the email from support --"The workaround is to view all sheets in the
                                workbook before swapping the data source. "

                                 

                                Hope that helps,

                                Michelle

                                • 13. Re: Inconsistent use of column name aliases when replacing a data source
                                  Matt Lutton

                                  Just in case anyone else is following this thread:  I just had a major success with switching out a single data source in a multi-table connection in a TDE file.  Here is my scenario:

                                   

                                  -We had a view that was performing very slowly.  Populated the same data into a table that was populated by a SPROC.  My DBA did this - and it is blazingly fast. (Thanks Tom W for helping with that idea; my DBA was able to do this, and the datasource went from taking 5-6 hours to refresh, to around 2 minutes.  So yeah, it was a great suggestion!)

                                   

                                  -The new table has the same column names/structure and data as the original view

                                   

                                  -In version 8.1, in the "Edit Connection" dialog, I chose the existing view (which was the first data source in the multi-table connection list) and in the following dialog where the name of the view was entered at the top, I simply entered the new table's name.  Screenshots of that process below:

                                   

                                  Edit Table Window Showing Original View:

                                  EditTable_Window.png

                                  Typing in the New Table Name:

                                  TypeInNewTableName.png

                                  Confirming New Table is Being Used in Extract Schema (you may want to confirm in other ways as well, but it does appear to have worked flawlessly in my example):

                                  ResultingDataSet.png

                                   

                                  -Viola, the new table took the place of the existing view, and my join settings were kept, as well as aliases, colors, etc.

                                   

                                  Note:  Both data sources came from the same SSMS database schema, and were identical in terms of the resulting data set.  I also don't know if there would be any way to accomplish this when you want to replace multiple tables with a single table/view.  In the past, I have always rebuilt the entire extract, matched up the column names, made sure to select each sheet in the workbook in order to retain my filter settings, then had to replace the existing extract with a newly created one.  We're talking about a LOT of time spent on re-creating extracts.  I know others have had some success with editing the XML in the workbook as well, but I was happy to have avoided that this time.

                                   

                                  I aim to test this in 8.2 soon, and may also record a video demonstrating what I am doing.

                                   

                                  Patrick A Van Der Hyde: I have a lot going on right now, but if this is something that others may find useful, let's try and find a way to share it with more folks

                                  • 14. Re: Inconsistent use of column name aliases when replacing a data source
                                    Matt Lutton

                                    In Version 8.2, it also appears to work.  In 8.2, you would enter the new data source's name here in the Visual Data  Window area.

                                     

                                    8.2_ReplaceFirstTableinJoinSchema.png

                                    In the screenshot above, I replaced tbStudentGradeBookDetailByWeekByStudent with the vw_StudentGradeBookDetailByWeekByStudent version by simply typing in the view name in place of the table.  Same impact, nothing is lost in my workbook.