    Creating Data Connections


      Hi All,


      I have a question regarding data connections (v8.0).  I am creating a data connection that I intend to have Tableau Server refresh each night.  While building the data connection, I ask Tableau to query an Oracle database view and pull back the data set.  At the start of the build, I have 3 options:

      1)  Connect Live

      2)  Import all Data

      3)  Import some data


      Since I intend for the data connection to be fully refreshed each night, I select option 3.  The database view has 20+ million records, and I want to minimize both the time the build takes and the storage it uses.  Selecting option 3 takes me to another screen, "Specify how much data to extract".  There are a few options in the "Number of Rows" section.  Since I want to speed this up and intend to do a full refresh as soon as I publish to the server, I select the radio button for Sample and set the number of rows to 100.  My thinking was that Tableau would create the XML elements/branches based on those 100 records.  I could then upload the XML schema to the server, run a full refresh, and have the scheduler do the heavy lifting of populating the 20 million records.


      I have found that this is not the case.  When I upload this to the server and run the scheduler, it only refreshes the 100 records.  Why is this, and how do I create a data connection for a large data set without pulling in all 20 million records while building it?  In some cases, I have seen the full import take upwards of 3 hours, which is especially painful when I am doing this over Wi-Fi (working from home).


      So my question is twofold:

      1)  Once I create the data connection and it creates the XML structure, why does Tableau return only the sample records, and not the 20+ million records, when it queries the database?  Is the data connection agnostic to the record count, so that it only identifies the XML layout/semantic layer?


      2)  Is there a simple way to quickly create data connections on large data sets, purely for the purpose of uploading them to the server, establishing the semantic layer, and having the scheduler fill in the remaining details/records?


      Thanks in advance,