5 Replies Latest reply on May 7, 2018 7:56 AM by Joshua Milligan

    Connecting to a Folder (and get all files in it)

    Miguel Escobar

      Hey!

       

      Here's my scenario:

      I receive reports on a weekly basis and I place each of these reports (txt files) inside of a folder. I'd like to have Tableau Prep connect to the folder and apply a set of transformations to each of the files inside of that folder prior to doing the union.

       

      Is this currently possible with Tableau Prep?

       

      I've been trying to make it happen, but the only solution that I'm able to build requires me to go back to the Flow that I originally created and adding every new weekly report as a new connection to the flow and also doing all the transformations to each of those files.

       

      Thanks!

        • 1. Re: Connecting to a Folder (and get all files in it)
          Joshua Milligan

          Miguel,

           

          In order to get all files in the directory automatically without having to add each one, Prep is going to want to do the union first.  What transformation would be better prior to the union?

           

          Best Regards,

          Joshua

          1 of 1 people found this helpful
          • 2. Re: Connecting to a Folder (and get all files in it)
            Miguel Escobar

            Thanks for the prompt response, Joshua Milligan

             

            It would be amazing if you could define what to do before joining each file as the approach of applying a set of transformations can differ greatly from a simple file to all the files combined into a single table and, from my personal experience using self-service ETL tools, it's always a better idea to transform all files prior to doing the union.

             

            This thread started as a question from one of the comments in one of the videos that I published recently comparing TP to other tools.

             

            Imagine that I have multiple txt or csv files that have the following format:

            Picture1.jpg

             

            Before appending every single file, I'd like to remove the first 3 rows (header and subheader), the last 3 rows (footer) and, since I know that every page of my report contains 20 records (data), I can get rid of the "Page Counter and information" based on the row position of them. I could do the same to get rid of the "FieldNames" rows that I don't need as I only need the first one.

            I completely lose that "row position" approach when I perform the union first which usually overly complicates things or it can only work for a few scenarios.

             

            In the video you can check at how Power BI (Power Query) does it. That's their 2nd iteration at it as their first one suffered from the same issues that Tableau Prep currently suffers.

            • 3. Re: Connecting to a Folder (and get all files in it)
              Joshua Milligan

              Miguel,

               

              I totally agree with you about Tableau Prep needing the ability to expose row position as information is many times encoded by row position. Please consider voting up this idea https://community.tableau.com/ideas/8732 and this one https://community.tableau.com/ideas/8714.

               

              Ideally, the tool should make it so it doesn't matter whether you do the union before or after the transformation.  And, in fact, in the case you outlined above, it should be easier to do it once after the union (both conceptually and technically) because you know the row numbers you want to keep per file (3 through 23).  If Tableau Prep exposed those row numbers and retained them, per file, in the union, then you could just do a simple filter like:

               

              RowNumber >= 3 AND RowNumber <= 23

               

              on the union results and end up with the rows you needed.

               

              Here's hoping that functionality comes soon!

              Joshua

              • 4. Re: Connecting to a Folder (and get all files in it)
                Miguel Escobar

                Hey!

                I voted on those ideas as soon as I saw them before making the reply, but adding the index to the files prior to doing the union would technically count as a transformation step (basically adding a new column). If they could expose that as a new column after the UNION or, perhaps exposing that as metadata for each file, that would be phenomenal. The way other tools do it is that they do have that row order as metadata, but don't expose that "Row" number to the end-user in a direct way.

                 

                It's the raw union of the files that makes everything just way more complicated than what it should be. One last question, there's no way of creating an index column in Data Prep right now, correct?

                • 5. Re: Connecting to a Folder (and get all files in it)
                  Joshua Milligan

                  Miguel,

                   

                  I am not aware of any way for a file-based data source.  For databases that support window functions, you could write a Custom SQL that would return a row number or index along with the data.

                   

                  I could see adding the index prior to or after the union - as long as you could partition the data by filename you could get row number per file.  Row-number as metadata would be great too!  I'd love to see it exposed as it would allow a lot of flexibility.

                   

                  Thank you for the discussion!

                  Joshua