3 Replies Latest reply on Nov 21, 2016 8:57 AM by Tom W

    Extract is Huge!

    Allison Suarez

      I have more of a theory of operations question.  I have a workbook that connects to an Azure database and one excel document.  I want to create an extract, but when I kick it off, it will run for hours and grow to hundreds of millions of rows.  What I don't understand is, the tables are not that big.  And on top of that, I filtered the extract for proof of concept and it still appears as it is pulling everything.  The table row counts are below.

                              All               With Filter

      Gaps:         157,572         1,411

      RptPerf       6986              196
      IIS               144507          5,088

      Excel Doc   150                150

      So is the extract pulling every possible combination?  I even tried limiting to 10,000 rows in the extract options but it still will run all day and get to 200 million rows before I cancel it.  I attached an image of the data source layout if that helps.

      Any thoughts would be appreciated.


        • 1. Re: Extract is Huge!
          Tom W

          Hi Allison,

          It sounds like one of the join conditions might be wrong. Can you click on the venn diagram for each join and provide a screenshot of that?

          • 2. Re: Extract is Huge!
            Allison Suarez

            I have attached the three images.


            • 3. Re: Extract is Huge!
              Tom W

              I was hoping it would be something super obvious, but unfortunately it isn't.


              Here's what I would suggest;

              Break your connection down into pieces. Start with just GapsArchives and IISPerfArchives. Refresh the extract and take a look at the row count. If all looks normal, proceed to introduce reportPerfArchives and refresh again.


              Without being able to see your actual data it's hard to say what the problem might be. But here's some general guidance;

              If you have 157k records in the GapsArchives table and they all contain the same 'App Server RSS' value of say 'WOMBAT' and you're joining to the PerfArchives table on the AppServerRSS field and they all contain the value 'WOMBAT' as well, it means you're going to end up with a total of 157k GapsArchives * 6986 IISPermArchives records because each GapsArchives record will be joined to each IISPermArchives record.


              A good starting place might actually be to filter on the 'GapsArchives' 'App Server RSS' field to select only one value. Then once it's extracted, view the data and you'll get a better understanding of how it's joining.