6 Replies Latest reply on Apr 24, 2017 2:50 PM by Eric Liong

    Cluster Hard Drive Requirements

    Eric Liong

      Hi everyone,

       

      We're looking into going from a single machine to a cluster:

      Gateway

      VizQL Server

      Background Processor

       

      Tableau says the Primary node/gateway needs enough power to perform backup/restore processes but can otherwise be bare bones if you're running no processes off it. (Distributed Requirements )

      Primary Node Hard drive space would need room for:

      Current hard drive usage + Server restore data + Local server restore file + Buffer for growth, etc.

       

      For the worker nodes do we also need enough room to store a local copy and restore backups? Do the workers need local copies of the backup if the primary has a copy?

      More hardware is nice to have but it's nice to know what you need.

      Currently looking at an 820 GB minimum just to have enough room to restore properly, never mind Tableau's 10x backup size recommendation.

      Should I expect the primary and the workers to each require as much hard drive space as our current single-machine Tableau instance?

       

       

      Side note: has anyone tried running a gateway on a 4-core machine instead of an 8-core, 16 GB RAM machine?

        • 1. Re: Cluster Hard Drive Requirements
          Russell Christopher

          Hey Eric -

           

          There's no definitive answer to this question since each cluster (and the data it hosts) is different. That said, a good rule of thumb is to "Be thrifty, but not cheap". I've seen people regret going "low end" on storage, even though storage is generally the cheapest part of the system.

           

          The absolute best way to answer this question for yourself is to test a restore. The "basic" guidance (I've never seen the 10x thing - can you give me a pointer?) is often baked to be safe. It is not science. I think you're looking for a formula to use to calculate necessary disk space, and there really isn't one. For example, the tsbak file generated from your \data folder during a restore can vary wildly depending on how many extracts you have, the compression rate, etc. That leaves you needing to run an experiment to generate guidance that really is meaningful.

           

          Set up Windows perfmon on each of your machines and monitor the Logical Disk | Free Megabytes counter. You'll want a fairly low sampling interval (maybe 5-10 seconds) so you don't "miss" events where a lot of disk space gets temporarily consumed and then freed again. Store all this info to disk vs. trying to watch it in real time.

           

          Do your restore, and perfmon will give you the answer to your questions. At the volumes you're dealing with, relying on any "conventional wisdom" about how much disk you should have (or how little you can get away with) is sort of risky. You have to really know.
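If perfmon isn't handy, the same idea (sample free space every few seconds and write it to a file while the restore runs) can be sketched in Python. This is only an illustration, not a Tableau tool: the function name `log_free_space`, the CSV layout, and the default interval are assumptions made for the example.

```python
import csv
import shutil
import time
from datetime import datetime


def log_free_space(path, out_csv, interval_s=5.0, samples=3):
    """Append (timestamp, free MB) rows for the volume holding `path`.

    Returns the lowest free-space value (in MB) seen during the run.
    """
    min_free_mb = None
    with open(out_csv, "a", newline="") as fh:
        writer = csv.writer(fh)
        for i in range(samples):
            free_mb = shutil.disk_usage(path).free // (1024 * 1024)
            writer.writerow([datetime.now().isoformat(timespec="seconds"), free_mb])
            fh.flush()  # keep the file current in case the run is interrupted
            if min_free_mb is None or free_mb < min_free_mb:
                min_free_mb = free_mb
            if i < samples - 1:
                time.sleep(interval_s)
    return min_free_mb


# Hypothetical usage: sample the restore drive every 5 seconds for an hour.
# lowest = log_free_space("G:/", "free_space.csv", interval_s=5, samples=720)
```

Run it on each node for the length of the restore. The gap between the starting free space and the lowest value logged is roughly the temporary space that node needed.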

           

          A fair number of folks will run a 4-core Primary. No big whoop. Note however that gateway != primary 100% of the time. If you're running a cluster, you're probably (or at least should be) running gateways on every node to avoid a single point of failure.

           

          HTH!

          • 2. Re: Cluster Hard Drive Requirements
            Eric Liong

            Hi Russell Christopher,

            Thanks for the reply!

             

            Here's the part about 10x space:

            "In addition to the amount of space needed for the backup file, you need temporary disk space roughly 10 times the size of the backup file (so if your backup is 4 GB, you should have about 40 GB of temporary disk space available)."

            -"Primary Computer Distributed Requirements "
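The arithmetic in that quote, as a minimal sketch. The function name and the `temp_factor` parameter are made up for illustration; the factor of 10 is just the documentation's "roughly 10 times", not a measured value.

```python
def restore_space_from_docs(backup_gb, temp_factor=10):
    """Backup file plus temporary space of roughly temp_factor times its size."""
    temp_gb = backup_gb * temp_factor
    return {
        "backup_gb": backup_gb,
        "temp_gb": temp_gb,
        "total_gb": backup_gb + temp_gb,
    }


# The documentation's own example: a 4 GB backup needs about 40 GB of temp space.
print(restore_space_from_docs(4))
```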

             

            I understand the exact amount of space is something we can't realistically calculate, but I'd like to pin down the big parts:

            1. When doing local restores (i.e. the .tsbak file on C:\ and not over the network), does each node need to have the restore file, or do they receive information from the primary node?

             

            With regard to running gateways on every node: if we have one 8-core node running VizQL and another running just backgrounders, would you recommend running a gateway on the backgrounder node as well?

             

             

            PS: our hard drive requirements are approximately:

            Tableau Server: 240 GB

            Local Restore: 135 GB

            Space needed while restoring: 240 GB

            Buffer: 150 GB

            Separate OS Drive + Pagefile: 130 GB
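Adding those line items up, as a quick sketch. The dictionary keys just restate the list above, and splitting out a "data drive" subtotal is an assumption about how the drives would be laid out.

```python
# Figures from the list above, in GB.
requirements_gb = {
    "Tableau Server": 240,
    "Local restore file": 135,
    "Space needed while restoring": 240,
    "Buffer": 150,
    "Separate OS drive + pagefile": 130,
}

total_gb = sum(requirements_gb.values())
# Assumption: everything except the OS/pagefile line lives on one data drive.
data_drive_gb = total_gb - requirements_gb["Separate OS drive + pagefile"]

print(f"Total: {total_gb} GB, data drive: {data_drive_gb} GB")
```

That comes to 895 GB in total, or 765 GB excluding the separate OS drive.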

             

             

             

            • 3. Re: Cluster Hard Drive Requirements
              Russell Christopher

              Huh! Who would have thunk - thanks for the pointer.

               

              I've never actually watched the restore, but I seem to recall that during a backup each node zips up its own file and then sends it to the primary, which zips "its stuff" up along with the worker zips into the main archive.

               

              Soooo, I'd guess the opposite happens on a restore. But the way this stuff works changes from time to time, which is why you'd probably want to eyeball it. Using Sysinternals' Process Explorer and/or just "Disk Activity" in the Windows Resource Monitor could quickly point you to which files are where, and you could see how big they are.

               

              With your data volumes I assumed you had more than 2 nodes. If only one worker is rendering (VizQLs), having gateways on multiple machines doesn't buy you much... If the VizQL box goes down, you're still dead, so having a gateway on the BG machine which COULD direct traffic to the dead VizQL box isn't relevant. Leave it alone =)

              • 4. Re: Cluster Hard Drive Requirements
                Jeff Strauss

                Here's either a useful tidbit or just another bit of trivia:

                 

                 

                If you try to restore to a disk with too little free space, you will see the following in tabadmin.log (as I did on my dev cluster). The backup I was trying to restore is 58 GB.

                 

                2017-04-14 09:57:20.765 -0500_ERROR_10.xxxx:xxx-TBLDEVxxx_:_pid=9196_0x10f7918f__user=__request=__ Unable to restore G:/TableauRestore/tableau_backup-2017-04-14.tsbak

                2017-04-14 09:57:20.765 -0500_NOTICE_10.xxxx:xxx-TBLDEVxxx_:_pid=9196_0x10f7918f__user=__request=__ Estimated space needed during restore: 318,728,239,459 bytes.

                2017-04-14 09:57:20.765 -0500_NOTICE_10.xxxx:xxx-TBLDEVxxx_:_pid=9196_0x10f7918f__user=__request=__ Current free space: 303,257,239,552 bytes.

                ...

                2017-04-14 09:57:20.765 -0500_ERROR_10.xxxx:xxx-TBLDEVxxx_:_pid=9196_0x10f7918f__user=__request=__ Error: Insufficient disk space for restore of backup file.
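To put numbers on those two NOTICE lines, here is a small sketch that pulls the byte counts out of the log text and computes the shortfall. The regular expressions are written against the message wording shown above only, and the function name is made up; treating "58 gig" as 58e9 bytes is also an assumption.

```python
import re

NEEDED_RE = re.compile(r"Estimated space needed during restore: ([\d,]+) bytes")
FREE_RE = re.compile(r"Current free space: ([\d,]+) bytes")


def restore_shortfall(log_text):
    """Return (needed, free, shortfall) in bytes, or None if either line is missing."""
    needed = NEEDED_RE.search(log_text)
    free = FREE_RE.search(log_text)
    if not (needed and free):
        return None
    needed_bytes = int(needed.group(1).replace(",", ""))
    free_bytes = int(free.group(1).replace(",", ""))
    return needed_bytes, free_bytes, needed_bytes - free_bytes


# Message text copied from the log excerpt above (prefixes omitted).
sample = (
    "Estimated space needed during restore: 318,728,239,459 bytes.\n"
    "Current free space: 303,257,239,552 bytes.\n"
)
needed_bytes, free_bytes, shortfall = restore_shortfall(sample)
print(f"Short by {shortfall:,} bytes")
print(f"Needed / backup size: {needed_bytes / 58e9:.1f}x")
```

On these figures the disk was about 15.5 GB short, and the estimated space needed was roughly 5.5 times the size of the backup file.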

                • 5. Re: Cluster Hard Drive Requirements
                  Eric Liong

                  Thanks Russell, your comments have been helpful!

                   

                  While there's no definite answer, it looks like each node will need to hold at least a portion of the backup during the backup process.

                   

                  I'll keep in mind that in a two-node setup (dedicated viz + dedicated BG) it's pointless to have a gateway on the backgrounder node.

                  • 6. Re: Cluster Hard Drive Requirements
                    Eric Liong

                    Thanks Jeff!

                    My backup is much larger (130 GB) but seems to need only 220 GB to restore...

                    Perhaps there's some sort of difference in the datasets, but who knows!