7 Replies Latest reply on Mar 26, 2020 5:40 AM by Sera Peter

    Extract certain contents of a string based on a specific substring

    Sera Peter

      HI,

       

      Hope all are safe and staying healthy.

       

      I'm stuck with a task where I need to extract certain information from a string based on a specific content in the string. To explain in detail:-

       

      I have names of employees in 1 column of the excel and then their employer ID and small description of designation in another column

       

      Twist here is that there are employees with same name so their information is merged in single line with their emp ID and description separated with a "|" symbol.

       

      ex:

       

      Name

      Sera

      Emp ID : EMP1=tr_uo-56|23EMP2=bn-bh-0j|56-56MP1=tfg-ghhu|78-91

       

      In the above scenario, company had 3 people with name "Sera" so info starting with "EMP" are employees still in company and info starting with "MP" are no longer in company.

       

      Requirement is to extract the info of people still serving in the company as follows

       

      Name      Emp ID description     Emp ID

      Sera        tr_uo-56                      23

      Sera        bn-bh-0j                      56-56

       

       

      Can someone please suggest a solution to achieve a result as shown above

       

      Attaching herewith a sample file to work on Tableau

       

      Thanking you in advance.

       

       

        • 1. Re: Extract certain contents of a string based on a specific substring
          Xinting Yu

          you may achieve this by several steps:

           

          1.Use the custom split with separator "=", Split off First 3 columns, you may get the columns like "tr_uo-56|23EMP2"

           

          2. Continue to custom split the "tr_uo-56|23EMP2" , as below. You may get two columns like "tr_uo-56" and "23EMP2"

           

          the "tr_uo-56"  is the required Emp ID description column

           

          3. Create a calculated column from "23EMP2" by replacing "EMP", you may get a new column like "23/2" or "56-56MP1"

          4. Continue to create a calculated column from "56-56MP1" by replacing "MP", you may get a new column like "56-56/1"

          5. Continue to custom split the "56-56/1" , as below. You may get two columns like "23" or "56-56"

           

          6. Final data resources

           

          I hope these steps may help you!

          Have a nice day

          1 of 1 people found this helpful
          • 2. Re: Extract certain contents of a string based on a specific substring
            Sera Peter

            Hi Xinting Yu,

             

            Hope you are doing good and I appreciate your help and support with this query.

             

            However I had already tried custom split. But the issue is that Tableau allows only a max of 10 splits. My string can have multiple EMP information for a single Employee itself.

             

            For instance,

            EMP1=tr_uo-56|23EMP2=fgfh_gg-98jg|56-36EMP3=bn-bh-0j|56-56MP1=tfg-ghhu|78-90EMP4=tr_uo-56|23EMP5=fgfh_gg-98jg|56-36EMP6=bn-bh-0j|56-56MP2=tfg-ghhu|78-90EMP8=tr_uo-56|23EMP9=fgfh_gg-98jg|56-36EMP10=bn-bh-0j|56-56MP3=tfg-ghhu|78-90EMP11=tr_uo-56|23EMP12=fgfh_gg-98jg|56-36EMP13=bn-bh-0j|56-56MP4=tfg-ghhu|78-90EMP14=tr_uo-56|23EMP15=fgfh_gg-98jg|56-36EMP16=bn-bh-0j|56-56MP5=tfg-ghhu|78-90

             

            The above example has around 20 sets of EMP info. But Tableau takes only first 10 info based on "=" while using custom split.

             

            Please let me know if you have a solution for this.

             

            Thanking you in advance

            1 of 1 people found this helpful
            • 3. Re: Extract certain contents of a string based on a specific substring
              Soumitra Godbole

              Hi Sera,

               

                  Thank you for such a challenging and thought provoking problem that was real fun to solve. I have

              given it a 5 star rating (* * * * *). Here is my alternate solution which involved 2 steps

               

              1) Data cleanup and Reshaping

              2) Analysis in Tableau

               

               

              Here is what I got after filtering out only the rows with EMP

              Dashboard.png

              Here are the 5 calculated fields that did the trick:

               

               

              Formula 1.png

               

              Formula 2.png

               

              Formula 3.png

               

              Formula 4.png

               

              Formula 5.png

               

              Finally this gets you the following visual

               

              Visual.png

               

              Hope this answer was helpful and not too difficult to follow. For your reference I have attached the twbx

              file along with the excel spreadhseet. Please let me know if you have any questions. Once again thanks

              for such a wonderful problem. Best wishes and take care !


              Sincerely,
              Soumitra

              1 of 1 people found this helpful
              • 4. Re: Extract certain contents of a string based on a specific substring
                Sera Peter

                Hi Soumitra,

                 

                Thank you so much for your response. Much appreciated.

                 

                I'm curious to know how you have performed the data clean and reshaping. Please guide me on this.

                 

                Also there is a small mistake in the output you have achieved. so every single entry of employee info has Emp designation detail and an ID. so in short your "Desc" column should be completely filled.

                 

                Let me know how you have performed data cleanup and please guide me on how to get complete details in "Desc" column

                 

                Thanking you in advance

                1 of 1 people found this helpful
                • 5. Re: Extract certain contents of a string based on a specific substring
                  Soumitra Godbole

                  Picture1.png

                   

                  Dataset 1.png

                   

                  1) Excel allows you to do a much better Data Split (no limit of 10splits)

                  Dataset 2.png
                  2) After the split, make sure you have proper headers with ordering
                      (Excel does not add headers so I did add them from EMP ID 1 to EMP ID 21)
                  Dataset 3.png
                  3) Now we are ready to analyze this in tableau and will need to do a Data Pivot (21 columns)
                  Data Pivot.png
                  4) Here is what we finally get with the addition of 2 fields - Pivot Field Name & Pivot Field Value
                      We do get several Nulls for the Pivot Field values for Names other than Sera and so I had to filter
                      them out. Here is what we have which is the cleaned data that I referred to in my earlier post.
                  After Pivot Fields.png
                    Finally the last part is to get  the field Row using the following calculation
                  Formula for Row.png
                  The "Row" is an important field and glad I included it as I did not wish to "cause a row" else it would be

                  Picture 2.png

                  For your reference, I have attached the excel file (after splitting your fields) along with the twbx file.
                  I sincerely hope that this was helpful and not too difficult to follow and above all entertaining. Do let
                  me know if you have any questions. Best Wishes and take care !

                  Sincerely,
                  Soumitra
                  1 of 1 people found this helpful
                  • 6. Re: Extract certain contents of a string based on a specific substring
                    Sera Peter

                    Hi Soumitra,

                     

                    Thank you so much for your response. This was really helpful.

                    Especially with elaborated explanation and with appropriate images to help with the understanding.

                     

                    Thank you.

                    1 of 1 people found this helpful
                    • 7. Re: Extract certain contents of a string based on a specific substring
                      Soumitra Godbole

                      Hi Sera,

                       

                           You're always welcome and really the credit for this problem goes to you alone.

                      This one problem all by itself gave me the joy of several problems as it did have many

                      parts that I started to work on without having a clue where I was headed ? Finally after

                      the data pivot I began to see the light through the long dark tunnel.

                       

                        Regarding the Desc field, here is what I understood from what you explained. Just to

                      illustrate, I have taken a screenshot and highlighted some of the values in a red box.

                       

                      Untitled.png

                      This was just to give you an idea of what was actually happening. Hope this is helpful in

                      getting a clearer picture. Best Wishes !

                       

                      Sincerely,
                      Soumitra