
    Those hundreds of millions of rows!

    Jonathan Hodge

      Howdy all!

      Got a pretty big data set to work with. I want to pull one year of sales, and even one month alone can be 40 million rows.

      All in all, I expect the final result to be about 500 million records. To extract this much data, I was wondering if I could leverage incremental refreshes.

      If I have SQL that looks like:

      SELECT *
      FROM TABLE
      WHERE CALYRMTH BETWEEN add_months(date_trunc('month', current_date), -12)
                         AND add_months(date_trunc('month', current_date), -12)

      My hope is that I could run this, get one month, then change the last -12 to -11, run an incremental refresh, and have it append the next month. I could slowly repeat this until I have all the data I want.
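      For example, the first two pulls would look like this (this assumes CALYRMTH holds month-start dates, so BETWEEN the same bound twice returns exactly one month):

      -- Pull 1: just the month from 12 months back
      SELECT *
      FROM TABLE
      WHERE CALYRMTH BETWEEN add_months(date_trunc('month', current_date), -12)
                         AND add_months(date_trunc('month', current_date), -12)

      -- Pull 2: widen the upper bound by one month and run an incremental refresh,
      -- hoping it appends only the newly covered month
      SELECT *
      FROM TABLE
      WHERE CALYRMTH BETWEEN add_months(date_trunc('month', current_date), -12)
                         AND add_months(date_trunc('month', current_date), -11)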

      It seems, though, that it starts over from scratch. I even tested this by changing the SQL but pasting in the exact statement I had before; when I ran an incremental refresh, it started pulling data that it should have already had stored.

      Am I doing something wrong? I am not clicking "Full Refresh," but it seems to be doing a full refresh anyway.

      Thanks for any support or solutions to this issue!

        • 1. Re: Those hundreds of millions of rows!
          Dan Cory

          Unfortunately that's not how incremental refresh works. It assumes the query is constant, but the data in the database changes. So when you change the query, it starts over.
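          Roughly speaking (this is a sketch, not the exact SQL Tableau generates), an incremental refresh keeps your query text as-is and just filters on the incremental key column you picked, something like:

          -- <max stored CALYRMTH> stands in for the highest value already in the extract
          SELECT * FROM (
            -- your Custom SQL, exactly as you saved it
            SELECT *
            FROM TABLE
            WHERE CALYRMTH BETWEEN add_months(date_trunc('month', current_date), -12)
                               AND add_months(date_trunc('month', current_date), -12)
          ) t
          WHERE t.CALYRMTH > <max stored CALYRMTH>

          Once the query text changes, the stored high-water mark no longer belongs to the same query, so the extract is rebuilt from scratch.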

          Is there a reason you aren't just extracting it all in one go? 500 million records is a lot, but not impossible.

          If you really want to do what you are trying, see Add Data to Extracts - Tableau

          Dan

          • 2. Re: Those hundreds of millions of rows!
            Jonathan Hodge

            Thanks for that clarification, Dan!

            The reason is purely the limitation on how long queries can run. The admins have processes in place that kill any query that runs longer than X seconds.

            I was hoping to avoid going through all the red tape around those limits by finding a workaround.

            I looked at the link you sent. I had not thought of appending to my extract in those ways. If you were in my shoes, how would you recommend attempting that approach?

            I think I am slightly confused about how to slowly append the data while keeping my Custom SQL in the format I want for the final iteration, and about whether using Custom SQL is feasible with that approach at all.

            • 3. Re: Those hundreds of millions of rows!
              Dan Cory

              I'd try it in one pass. If that doesn't work because the admin kills the query, then see the article I sent. You can download each month separately and append them on the client.
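              For example (a sketch; adjust the offsets to your calendar), you could run twelve one-month queries like these and append each result to the extract using the options described in that article:

              -- Oldest month first
              SELECT * FROM TABLE
              WHERE CALYRMTH = add_months(date_trunc('month', current_date), -12)

              -- Then the next month
              SELECT * FROM TABLE
              WHERE CALYRMTH = add_months(date_trunc('month', current_date), -11)

              -- ...and so on, down to -1 for the most recent full month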

              Dan