7 Replies Latest reply on Feb 3, 2015 2:54 PM by Rick Kunkel

    Data Engine  / failover

    Sourabh Dasgupta

      Hello All

      In a distributed architecture, we can specify Data Engine Processes on multiple nodes.

       

      Example:

      Primary Node – can run 2 instances of Data Engine processes

      Worker Node 1 – can also run 2 instances of Data Engine processes

       

      Both can be active, and this handles failover automatically.

       

      However, the user manual linked below says that we need redundancy for the Data Engine as well. Can you please elaborate?

        http://onlinehelp.tableausoftware.com/current/server/en-us/help.htm#distrib_ha_intro.htm%3FTocPath%3DAdministrator%2520Guide|Distributed%2520Environments|High%2520Availability|_____1

        • 1. Re: Data Engine  / failover
          Patrick A Van Der Hyde

          Hello Sourabh Dasgupta,

           

          This question was left unanswered last month.  I believe the manual is referring to the use of a standby Data Engine, as described at the bottom of this page.

           

          I have moved this post to the Server Administration forums where the community audience is more likely to answer and participate with this question.

           

          Patrick

          • 2. Re: Data Engine  / failover
            Rick Kunkel

            Hi, Sourabh.


            Redundancy is accomplished by having more than one machine running the data engine host process.

             

            As long as you have only one machine running the data engine host process (or processes, in the case of two on the same machine), you have a single point of failure for that function of Tableau Server.  Adding the data engine host process to another worker gives you some redundancy if the primary data engine host process (or processes) is lost.

             

            I have some additional info on the difference between the roles of the primary and secondary data engine hosts as well, which I'm pasting below in case it helps explain something here:

             

             

            Data engines are read/write on the primary data engine host

             

            If you run in a non-HA configuration, then you will have only one machine running the data engine.  That machine can run either one or two data engine processes, which work off a common set of extract files, and both of those processes will be read/write.

             

            Data engines are read-only on the secondary data engine host

             

             

            If you run in an HA configuration, then you will have two machines running the data engine (the primary data engine machine, and the secondary data engine machine).  Each of those machines can run either one or two data engine processes.  Data engine processes on the same machine work off a common set of extract files.  Both primary data engine processes will be read/write, and both secondary data engines will be read-only.  Read-only on the secondary machine is a consequence of the fact that extract files are copied by rsync from the primary data engine machine to the secondary data engine machine (but not the reverse), thus for all extracts to appear on both machines, they must be written onto the primary.
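The read/write versus read-only split described above can be illustrated with a small toy model. This is only a sketch: the `DataEngineHost` class and `sync_one_way` function are hypothetical names invented for illustration and are not part of Tableau Server. It assumes nothing more than what the post states, namely that extracts are written on the primary and copied one way to the secondary.

```python
# Toy model of the primary/secondary data engine hosts described above.
# All names here are hypothetical; this is not Tableau code.

class DataEngineHost:
    def __init__(self, name, writable):
        self.name = name
        self.writable = writable   # True only on the primary host
        self.extracts = {}         # extract file name -> contents

    def write(self, extract, data):
        if not self.writable:
            raise PermissionError(f"{self.name} is read-only")
        self.extracts[extract] = data

    def read(self, extract):
        return self.extracts[extract]


def sync_one_way(primary, secondary):
    """Copy extracts from primary to secondary, never the reverse."""
    secondary.extracts.update(primary.extracts)


primary = DataEngineHost("primary", writable=True)
secondary = DataEngineHost("secondary", writable=False)

primary.write("sales.tde", b"rows")      # writes must land on the primary
sync_one_way(primary, secondary)         # stands in for the rsync copy
print(secondary.read("sales.tde"))       # reads work on either host

try:
    secondary.write("other.tde", b"rows")
except PermissionError as err:
    print(err)
```

Because the copy only ever runs in one direction, a file written on the secondary would never reach the primary, which is why the model rejects such writes outright.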

            • 3. Re: Data Engine  / failover
              Jeff Strauss

              Rick, thanks for the clear explanation.  I have two follow-up questions.

               

              1. Are we guaranteed that the readonly will always be in sync?  What if the extract is done writing to the primary and then a tabadmin stop occurs?

               

              2. When trying to add a second data engine process, I encountered a port conflict.  A support case is open in reference to this and support was able to replicate.  Do you have any further advice?

               

              G:\Tableau\Tableau Server\8.3\bin>tabadmin start

              ===== Starting service...

                *** Blocked or conflicting ports found on worker ORD-TBL104. See worker tabadmin.log for details.

                *** Blocked or conflicting ports found on worker ORD-TBL105. See worker tabadmin.log for details.

              ===== Service failed to start properly. Run "tabadmin status -v" and see "tabadmin.log" for more details

               

               

              From ORD-TBL104:

              2014-12-10 16:44:09.122 -0600_DEBUG_:_:_pid=___user=__request=__ Seeking a range of 1 port(s) for worker2.gateway.port, default 80.

              2014-12-10 16:44:09.122 -0600_DEBUG_:_:_pid=___user=__request=__ Seeking a range of 2 port(s) for dataengine.port, default 27042.

              2014-12-10 16:44:09.122 -0600_DEBUG_:_:_pid=___user=__request=__ Seeking a range of 1 port(s) for pgsql.port, default 8060.

              2014-12-10 16:44:09.122 -0600_DEBUG_:_:_pid=___user=__request=__ Seeking a range of 4 port(s) for worker2.vizqlserver.port, default 9100.

              2014-12-10 16:44:09.122 -0600_DEBUG_:_:_pid=___user=__request=__ Seeking a range of 2 port(s) for worker2.dataserver.port, default 9700.

              2014-12-10 16:44:09.122 -0600_DEBUG_:_:_pid=___user=__request=__ Seeking a range of 2 port(s) for worker2.wgserver.port, default 8000.

              2014-12-10 16:44:09.122 -0600_DEBUG_:_:_pid=___user=__request=__ Seeking a range of 1 port(s) for pgsql.initport, default 8062.

              2014-12-10 16:44:09.122 -0600_DEBUG_:_:_pid=___user=__request=__ Seeking a range of 1 port(s) for dataengine.initport, default 27043.

              2014-12-10 16:44:09.122 -0600_WARN_:_:_pid=___user=__request=__ dataengine.initport is in conflict with one of the ports.

              2014-12-10 16:44:09.122 -0600_DEBUG_:_:_pid=___user=__request=__ Seeking a range of 1 port(s) for rsync.port, default 9090.

              2014-12-10 16:44:09.122 -0600_DEBUG_:_:_pid=___user=__request=__ Seeking a range of 1 port(s) for svcmonitor.jmx.port, default 9095.

              2014-12-10 16:44:09.122 -0600_DEBUG_:_:_pid=___user=__request=__ Seeking a range of 4 port(s) for worker2.vizqlserver.jmx.port, default 9400.

              2014-12-10 16:44:09.185 -0600_DEBUG_:_:_pid=___user=__request=__ Seeking a range of 2 port(s) for worker2.dataserver.jmx.port, default 10000.

              2014-12-10 16:44:09.185 -0600_DEBUG_:_:_pid=___user=__request=__ Seeking a range of 2 port(s) for worker2.wgserver.jmx.port, default 8300.

              2014-12-10 16:44:09.185 -0600_WARN_:_:_pid=___user=__request=__ Tableau Server may be degraded due to conflicted ports.
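One way to read the log above is to treat each "Seeking a range of N port(s) ... default P" entry as a request for the N consecutive ports starting at P, and then look for overlaps. The sketch below does that with the entries copied from the log (timestamps trimmed). Whether tabadmin actually allocates ports this way is an assumption, and the function names are made up for illustration.

```python
import re
from itertools import combinations

# "Seeking a range" entries copied from the ORD-TBL104 log, timestamps trimmed.
LOG = """\
Seeking a range of 1 port(s) for worker2.gateway.port, default 80.
Seeking a range of 2 port(s) for dataengine.port, default 27042.
Seeking a range of 1 port(s) for pgsql.port, default 8060.
Seeking a range of 4 port(s) for worker2.vizqlserver.port, default 9100.
Seeking a range of 2 port(s) for worker2.dataserver.port, default 9700.
Seeking a range of 2 port(s) for worker2.wgserver.port, default 8000.
Seeking a range of 1 port(s) for pgsql.initport, default 8062.
Seeking a range of 1 port(s) for dataengine.initport, default 27043.
Seeking a range of 1 port(s) for rsync.port, default 9090.
Seeking a range of 1 port(s) for svcmonitor.jmx.port, default 9095.
Seeking a range of 4 port(s) for worker2.vizqlserver.jmx.port, default 9400.
Seeking a range of 2 port(s) for worker2.dataserver.jmx.port, default 10000.
Seeking a range of 2 port(s) for worker2.wgserver.jmx.port, default 8300.
"""

PATTERN = re.compile(
    r"Seeking a range of (\d+) port\(s\) for ([\w.]+), default (\d+)\."
)


def port_ranges(log_text):
    """Map each setting to the ports it would occupy.

    ASSUMPTION: a range of N ports is contiguous and starts at the default.
    """
    ranges = {}
    for count, name, default in PATTERN.findall(log_text):
        start = int(default)
        ranges[name] = set(range(start, start + int(count)))
    return ranges


def overlaps(ranges):
    """Return (setting_a, setting_b, shared_ports) for each overlapping pair."""
    found = []
    for (a, ports_a), (b, ports_b) in combinations(ranges.items(), 2):
        shared = ports_a & ports_b
        if shared:
            found.append((a, b, sorted(shared)))
    return found


for a, b, shared in overlaps(port_ranges(LOG)):
    print(f"{a} and {b} both want port(s) {shared}")
```

Under that assumption, the only overlap is between `dataengine.port` (two ports from 27042) and `dataengine.initport` (27043), which would be consistent with the WARN line naming `dataengine.initport`.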

              • 4. Re: Data Engine  / failover
                Rick Kunkel

                Hi, Jeffrey,

                 

                 

                1. Are we guaranteed that the readonly will always be in sync?  What if the extract is done writing to the primary and then a tabadmin stop occurs?

                 

                As currently designed, once the TDE exists on the primary (read/write), rsync copies it to the secondary (readonly).  If something happens that prevents rsync from completing the copy, the secondary will be out of sync (i.e. not have the TDE).

                 

                As far as I understand it, a tabadmin stop (run immediately after a TDE file is created on the primary) could be one of the things that prevents the sync from completing.  Of course, tabadmin start will fire rsync up again, so it will be copied then.

                 

                A data engine failover event that occurs immediately after a TDE file is created on the primary can also prevent the sync from completing.  Furthermore, since the new primary data engine host was the secondary data engine host just prior to the failover event, and since the TDE is not on the new primary, the sync doesn't take place at all.  This will generally result in TDE 4 errors in a view or backgrounder extract refresh.  The more verbose error (if I recall correctly) suggests that the file or path cannot be found.  The error makes sense:  Since the sync did not complete, the TDE is not at the location that the workbook says it should be.
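A simple way to spot the out-of-sync situation described above is to compare the set of TDE files on the two hosts. The following is a minimal sketch, not a Tableau tool: the function name is hypothetical, and the demo uses throwaway temporary directories because the real extract locations are not given in this thread.

```python
import tempfile
from pathlib import Path


def missing_on_secondary(primary_dir, secondary_dir):
    """List .tde files found under primary_dir but absent under secondary_dir."""
    primary = {p.relative_to(primary_dir) for p in Path(primary_dir).rglob("*.tde")}
    secondary = {p.relative_to(secondary_dir) for p in Path(secondary_dir).rglob("*.tde")}
    return sorted(str(p) for p in primary - secondary)


# Demo with temporary directories standing in for the two extract folders.
with tempfile.TemporaryDirectory() as primary_dir, \
        tempfile.TemporaryDirectory() as secondary_dir:
    (Path(primary_dir) / "old.tde").write_bytes(b"x")
    (Path(secondary_dir) / "old.tde").write_bytes(b"x")   # already synced
    (Path(primary_dir) / "new.tde").write_bytes(b"y")     # created, never synced
    demo_missing = missing_on_secondary(primary_dir, secondary_dir)
    print(demo_missing)
```

Any file the check reports is one that a view or extract refresh would fail to find if the secondary were promoted before the copy completed.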

                 

                 

                 

                2. When trying to add a second data engine process, I encountered a port conflict.  A support case is open in reference to this and support was able to replicate.  Do you have any further advice?

                 

                If support is able to reproduce the issue, that's great.   An internal repro makes it much easier for engineering and dev to research.

                 

                As far as advice goes, if the issue is a critical one for you, and you've got flexibility in configuration, I suggest trying to find a configuration that doesn't cause the issue.  If the issue is a problem in the software, my (admittedly limited) understanding is that it only occurs in a specific configuration.

                 

                You might also try setting the ports manually and disabling port remapping on startup.  See http://onlinehelp.tableausoftware.com/current/server/en-us/help.htm#ports.htm under "Dynamic port remapping" for a description of how dynamic remapping works.

                 

                 

                 

                 

                Good luck!

                • 5. Re: Data Engine  / failover
                  Sourabh Dasgupta

                  Hello Rick, Thanks for the information!

                   

                  I have a few more questions about the data engine (DE).

                  Single server: suppose I have 2 DE processes running. Is it true that at any point in time only one DE does all the work while the other sits idle? Is that also true in HA, i.e. if I have 4 DEs in total, does only one DE do all the work?

                   

                  In HA, is it not possible to have more than 2 nodes hosting DEs?

                  • 7. Re: Data Engine  / failover
                    Rick Kunkel

                    Sourabh Dasgupta, as of this writing:

                    • Any data engine host process can be read from, whether you've got 1, 2, 3, or 4 of them.
                    • Only primary data engine host processes (the dark green ones on the maintenance page) can be written to.