2 Replies Latest reply on Sep 4, 2018 3:23 PM by Aaron Dobbins

    Custom split in Tableau?

    Bobby Rohrkemper

      In a text file, I have a column with the following format. What I write here is a the data for a single column and a single row:

       

      A1;A2;A3|B1;B2;B3|C1;C2;C3...

       

      Another entry might be:

      A1;A2;A3|B1;B2;B3

       

      Thus, there are a maximum number of | symbols but this is not fixed. The number of ; symbols between each | however is fixed. In other words, this is a matrix with a fixed number of columns but a flexible number of rows all packed into a single field.

       

      How can I use a calculated field or multiple custom splits to create a list with the contents {A2 B2 C2}?

       

      What would be the best format to store this list such that I can later count the entries? i.e. BOOLEAN: Does C2 exist in the list? INTEGER; Number of B2 entries in the list?

       

      Perhaps as a string with whichever separator I like?

       

      Or if there's no great solution in Tableau, should I use R and then update my text file before again importing it in Tableau?

        • 1. Re: Custom split in Tableau?
          Aaron Dobbins

          Have you looked at Tableau Prep?  It could be a good solution to manipulate the text file step-by-step and get it into the format you like.  How often will the text file be updated?  That may determine feasibility since you will need to get the updated text file and run it through Prep each time it is updated.

           

          OPTION 1

          Otherwise, you could create a calculated field for each row (|):

          R1

          SPLIT("A1;A2;A3|B1;B2;B3|C1;C2;C3","|",1)

          ...

          R[N]

          SPLIT([row n data],"|",1)

           

          Then create a calculated field for each column (;) in each row (|):

          R1C1

          SPLIT([R1],";",1)

           

          R1C2

          SPLIT([R1],";",2)

           

          R1C3

          SPLIT([R1],";",3)

           

          R2C1

          SPLIT([R2],";",1)

          ...

          RNCN

          SPLIT([RN],";",[N])

           

          That would split each row and column into its own field, which could be tedious if the row or column limit is massive.

           

          OPTION 2

          You could create one calculated field for each column, putting all rows together, again tedious depending on the row limit.

          Inner split on | will get your row of data, outer split on ; will get your column data for that row.

           

          C1 (put column 1 in all rows together, changing row highlighted in bold)

          SPLIT(SPLIT([data],"|",1),";",1) + SPLIT(SPLIT([data],"|",2),";",1) + SPLIT(SPLIT([data],"|",3),";",1) + ... + SPLIT(SPLIT([data],"|",[row limit n]),";",1)

           

          C2 (put column 2 in all rows together, same formula as above but changing to column 2 highlighted in bold below)

          SPLIT(SPLIT([data],"|",1),";",2) + SPLIT(SPLIT([data],"|",2),";",2) + SPLIT(SPLIT([data],"|",3),";",2) + ... + SPLIT(SPLIT([data],"|",[row limit n]),";",2)

           

          C3 (put column 3 in all rows together, same formula as above but changing to column 3 highlighted in bold below)

          SPLIT(SPLIT([data],"|",1),";",3) + SPLIT(SPLIT([data],"|",2),";",3) + SPLIT(SPLIT([data],"|",3),";",3) + ... + SPLIT(SPLIT([data],"|",[row limit n]),";",3)

           

           

          Then you can check each column for values using FIND.  Here you would check each column and if the value you are looking for is found, FIND will return > 0.

          COL1 VAL1 (returns T/F if the value you are checking is in the column)

          FIND([C1],"value 1 you are looking for") > 0

           

          COL1 VAL2 (returns T/F if the value you are checking is in the column)

          FIND([C1],"value 2 you are looking for") > 0

           

          If you are looking for "if this value or this value or this value is in column then TRUE", it can look like this:

          COL1 VAL FOUND

          FIND([C1],"value 1 you are looking for") > 0

          OR

          FIND([C1],"value 2 you are looking for") > 0

          OR

          FIND([C1],"value 3 you are looking for") > 0

           

          In other languages you would do a SPLIT and have an array, but in Tableau Calculated Fields you must split and tell it which segment you want to return.  I am assuming here that if you split and look for row 5 and it is not there, it will simply return nothing which is fine in this case.

          • 2. Re: Custom split in Tableau?
            Aaron Dobbins

            With Tableau Prep you can have two steps to split into rows, pivot the split columns into actual rows in the data source, then split the row data into multiple columns to make your life easier.  I created one row of text with A1;A2;A3|B1;B2;B3|C1;C2;C3 and built the attached Prep flow in a couple of minutes.

             

            1. Connect to data source

            2. Add step

            3. Click on arrow in upper right corner of field and choose Split Values > Custom Split

            4. Enter | for value and choose All from Split Off drop down

            5. Pivot the data on all new fields to convert the columns into rows

            6. Add step

            7. Click on arrow in upper right corner of field and choose Split Values > Custom Split

            8. Enter ; for value and choose All from Split Off drop down

            9. Add output step and output as Hyper, TDE or CSV

             

            This should give you the data split into the fixed columns with the correct number of rows. 

             

            NOTE

            On the pivot step, only pivot on the columns that were generated from the split.  Any other fields in the data set will replicate for each row created. 

            Then when you split again, that single field is split into separate columns. 

            Then you can perform any logic or aggregation in Tableau Desktop you want.