8 Replies Latest reply on Sep 11, 2018 3:36 AM by Ujjwal Shrestha

    Tableau Linux Topology Coordination Service Issue

    Jason Cameron

      Background:

      I'm trying to deploy a three-headed monster (three nodes) on CentOS 7 in AWS (m4.xlarge instances, 4 vCPU / 16 GB RAM each). All three nodes can talk to each other and show up fine in `tsm status -v`. Because I'm not sure which TCP/UDP ports the nodes use to talk to each other, all ports are currently allowed between them via Security Groups.

      Problem:

      The Coordination Service ensemble won't deploy:

      ```
      [root@tableau-master user]# tsm stop
      Stopping service...
      Service stopped successfully.
      [root@tableau-master user]# tsm topology cleanup-coordination-service
      Removing non-production Coordination Service ensemble.
      50% - Validating that there are no pending changes.
      100% - Removing non-production Coordination Service ensemble id '1'.
      Finished removing non-production Coordination Service ensemble.
      [root@tableau-master user]# tsm topology deploy-coordination-service -n node1,node12,node13
      Deploying Coordination Service ensemble on nodes [node1, node12, node13].
      An error occurred while deploying Coordination Service ensemble on nodes [node1, node12, node13]. Use 'topology cleanup-coordination-service' to remove non-production ensemble.
      See '/root/.tableau/tsm/tsm.log' for more information.
      Server needs to be stopped to perform this operation.
      ```

      Logs:

      ```
      2018-07-06 16:16:50 main : DEBUG com.tableausoftware.tabadmin.Tabadmin - ====>> Starting tsm at 2018-07-06T16:16:49.810 <<====
      2018-07-06 16:16:50 main : DEBUG com.tableausoftware.tabadmin.Tabadmin - System.out encoding: UTF8
      2018-07-06 16:16:50 main : DEBUG com.tableausoftware.tabadmin.Tabadmin - Default locale: en_US
      2018-07-06 16:16:50 main : DEBUG com.tableausoftware.tabadmin.Tabadmin - Display language: English
      2018-07-06 16:16:51 main : DEBUG com.tableausoftware.tabadmin.Tabadmin - Command line: tsm topology deploy-coordination-service -n node1,node12,node13
      2018-07-06 16:16:51 main : DEBUG com.tableausoftware.certificates.LinuxCertManager - Loading certificates: /etc/opt/tableau/tableau_server/tableauservicesmanagerca.jks
      2018-07-06 16:16:56 main : TRACE com.tableausoftware.tabadmin.cli.SessionHandlingRestOperations$RequestFactory - Setting cookie
      2018-07-06 16:16:56 main : INFO  com.tableausoftware.tabadmin.cli.Console - Deploying Coordination Service ensemble on nodes [node1, node12, node13].
      2018-07-06 16:16:56 main : DEBUG com.tableausoftware.tabadmin.cli.ServerApi - Client request: POST https://tableau-master:8850/api/0.5/reconfigzk?nodes=node1,node12,node13
      2018-07-06 16:16:59 main : ERROR com.tableausoftware.tabadmin.TSMErrorHandler - An error occurred: 409512, Server needs to be stopped to perform this operation.
      2018-07-06 16:16:59 main : ERROR com.tableausoftware.tabadmin.async.AsyncJobDelegate - Exception occurred while starting the asynchronous job. Unable to determine if the job was started.
      Server needs to be stopped to perform this operation.: Server needs to be stopped to perform this operation.
          at com.tableausoftware.tabadmin.TSMErrorHandler.handleTsmResponse(TSMErrorHandler.java:95)
          at com.tableausoftware.tabadmin.TSMErrorHandler.handleError(TSMErrorHandler.java:49)
          at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:667)
          at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:620)
          at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:595)
          at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:516)
          at com.tableausoftware.tabadmin.cli.SessionHandlingRestOperations.exchange(SessionHandlingRestOperations.java:206)
          at com.tableausoftware.tabadmin.cli.ServerApi.sendRequestInner(ServerApi.java:315)
          at com.tableausoftware.tabadmin.cli.ServerApi.sendRequest(ServerApi.java:331)
          at com.tableausoftware.tabadmin.cli.ServerApi.reconfigureZookeeper(ServerApi.java:1036)
          at com.tableausoftware.tabadmin.async.ReconfigureZookeeperAsyncJobStrategy.startAsyncJob(ReconfigureZookeeperAsyncJobStrategy.java:33)
          at com.tableausoftware.tabadmin.async.AsyncJobDelegate$FailureStrategy.startAsyncJob(AsyncJobDelegate.java:204)
          at com.tableausoftware.tabadmin.async.AsyncJobDelegate.execute(AsyncJobDelegate.java:71)
          at com.tableausoftware.tabadmin.commands.topology.ReconfigureZookeeper.doExecute(ReconfigureZookeeper.java:48)
          at com.tableausoftware.tabadmin.commands.TabadminCommand.executeLeaf(TabadminCommand.java:174)
          at com.tableausoftware.tabadmin.commands.TabadminCommand.execute(TabadminCommand.java:120)
          at com.tableausoftware.tabadmin.commands.TabadminCommand.execute(TabadminCommand.java:115)
          at com.tableausoftware.tabadmin.commands.TabadminCommand.execute(TabadminCommand.java:85)
          at com.tableausoftware.tabadmin.Tabadmin.run(Tabadmin.java:165)
          at com.tableausoftware.tabadmin.Tabadmin.main(Tabadmin.java:65)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
          at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
          at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
          at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:58)
      2018-07-06 16:16:59 main : ERROR com.tableausoftware.tabadmin.cli.Console - An error occurred while deploying Coordination Service ensemble on nodes [node1, node12, node13]. Use 'topology cleanup-coordination-service' to remove non-production ensemble.
      2018-07-06 16:16:59 main : ERROR com.tableausoftware.tabadmin.cli.Console - Server needs to be stopped to perform this operation.
      com.tableausoftware.commandline.commons.ReportableException: Server needs to be stopped to perform this operation.
          at com.tableausoftware.tabadmin.async.AsyncJobDelegate.execute(AsyncJobDelegate.java:79)
          at com.tableausoftware.tabadmin.commands.topology.ReconfigureZookeeper.doExecute(ReconfigureZookeeper.java:48)
          at com.tableausoftware.tabadmin.commands.TabadminCommand.executeLeaf(TabadminCommand.java:174)
          at com.tableausoftware.tabadmin.commands.TabadminCommand.execute(TabadminCommand.java:120)
          at com.tableausoftware.tabadmin.commands.TabadminCommand.execute(TabadminCommand.java:115)
          at com.tableausoftware.tabadmin.commands.TabadminCommand.execute(TabadminCommand.java:85)
          at com.tableausoftware.tabadmin.Tabadmin.run(Tabadmin.java:165)
          at com.tableausoftware.tabadmin.Tabadmin.main(Tabadmin.java:65)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
          at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
          at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
          at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:58)
      Caused by: Server needs to be stopped to perform this operation.: Server needs to be stopped to perform this operation.
          at com.tableausoftware.tabadmin.TSMErrorHandler.handleTsmResponse(TSMErrorHandler.java:95)
          at com.tableausoftware.tabadmin.TSMErrorHandler.handleError(TSMErrorHandler.java:49)
          at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:667)
          at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:620)
          at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:595)
          at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:516)
          at com.tableausoftware.tabadmin.cli.SessionHandlingRestOperations.exchange(SessionHandlingRestOperations.java:206)
          at com.tableausoftware.tabadmin.cli.ServerApi.sendRequestInner(ServerApi.java:315)
          at com.tableausoftware.tabadmin.cli.ServerApi.sendRequest(ServerApi.java:331)
          at com.tableausoftware.tabadmin.cli.ServerApi.reconfigureZookeeper(ServerApi.java:1036)
          at com.tableausoftware.tabadmin.async.ReconfigureZookeeperAsyncJobStrategy.startAsyncJob(ReconfigureZookeeperAsyncJobStrategy.java:33)
          at com.tableausoftware.tabadmin.async.AsyncJobDelegate$FailureStrategy.startAsyncJob(AsyncJobDelegate.java:204)
          at com.tableausoftware.tabadmin.async.AsyncJobDelegate.execute(AsyncJobDelegate.java:71)
          ... 15 more
      ```

      What is going on here? I'm not sure why ZooKeeper is still keeping information about the Coordination Service ensemble. How can I get this bootstrapped? Following the "Deploy a Coordination Service Ensemble" documentation, the deployment frequently times out and never finishes, and now it throws the error above.

      PS:

      - I have verified that the Coordination Service is not running on any node with ensemble ID '1'.
      - The only other issue I have noticed is that when I stop Tableau with `tsm stop`, not all services appear to stop, even though there are no errors in my logs and the command exits successfully. Very confusing.
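      Because `tsm stop` can report success while some services keep running, one way to guard the deploy step is to check every per-node `Status:` line of `tsm status -v` before deploying. This is a minimal sketch that assumes the output layout posted later in this thread (an indented `Status:` line under each node); `all_nodes_stopped` is a hypothetical helper, not a TSM command.

      ```shell
      #!/bin/sh
      # all_nodes_stopped (hypothetical helper, not part of TSM):
      # reads `tsm status -v` output on stdin and succeeds only if at least one
      # "Status:" line was seen and every such line reads STOPPED.
      all_nodes_stopped() {
        awk '
          /^[ \t]*Status:/ {
            seen = 1
            if ($2 != "STOPPED") bad = 1   # e.g. DEGRADED or RUNNING
          }
          END {
            if (seen && !bad) exit 0
            exit 1
          }
        '
      }

      # Usage sketch: only attempt the deploy when every node reports STOPPED.
      # tsm status -v | all_nodes_stopped && \
      #   tsm topology deploy-coordination-service -n node1,node12,node13
      ```

      With output like the status posted in reply 2, where node1 reports DEGRADED, the check fails and the deploy command is not run.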

       

      Any guidance would be great - thanks!

        • 1. Re: Tableau Linux Topology Coordination Service Issue
          Yuriy Fal

          Hi Jason,

           

          What is the current output of `tsm status -v`?

          • 2. Re: Tableau Linux Topology Coordination Service Issue
            Jason Cameron

            ```
            node1: qa-tableau-02
                    Status: DEGRADED
                    'Tableau Server Gateway 0' is stopped.
                    'Tableau Server Application Server 0' is stopped.
                    'Tableau Server VizQL Server 0' is stopped.
                    'Tableau Server Cache Server 0' is stopped.
                    'Tableau Server Coordination Service 0' is running.
                    'Tableau Server Coordination Service 1' status is unavailable.
                    'Tableau Server Cluster Controller 0' is stopped.
                    'Tableau Server Search And Browse 0' is stopped.
                    'Tableau Server Backgrounder 0' is stopped.
                    'Tableau Server Data Server 0' is stopped.
                    'Tableau Server Data Engine 0' is running.
                    'Tableau Server File Store 0' is stopped.
                    'Tableau Server Repository 0' is running.
                    'Tableau Server Administration Agent 0' is running.
                    'Tableau Server Administration Controller 0' is running.
                    'Tableau Server Service Manager 0' is running.
                    'Tableau Server License Manager 0' is running.
                    'Tableau Server Client File Service 0' is running.
                    'Tableau Server Database Maintenance 0' is stopped.
                    'Tableau Server Backup/Restore 0' is stopped.
                    'Tableau Server Site Import/Export 0' is stopped.
                    'Tableau Server SAML Service 0' is stopped.
            node12: qa-tableau-03
                    Status: STOPPED
                    'Tableau Server Coordination Service 1' is running.
                    'Tableau Server Cluster Controller 0' is stopped.
                    'Tableau Server Administration Agent 0' is running.
                    'Tableau Server Service Manager 0' is running.
                    'Tableau Server Database Maintenance 0' is stopped.
                    'Tableau Server Backup/Restore 0' is stopped.
                    'Tableau Server Site Import/Export 0' is stopped.
            node13: qa-tableau-04
                    Status: STOPPED
                    'Tableau Server Coordination Service 1' is running.
                    'Tableau Server Cluster Controller 0' is stopped.
                    'Tableau Server Administration Agent 0' is running.
                    'Tableau Server Service Manager 0' is running.
                    'Tableau Server Database Maintenance 0' is stopped.
                    'Tableau Server Backup/Restore 0' is stopped.
                    'Tableau Server Site Import/Export 0' is stopped.
            ```
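            To see at a glance which Coordination Service instances each node reports, the `tsm status -v` output above can be filtered per node. This is a sketch that assumes exactly that layout (a `nodeN: host` line followed by indented service lines); `coord_service_report` is a hypothetical helper name, not a TSM command.

            ```shell
            #!/bin/sh
            # coord_service_report (hypothetical helper): reads `tsm status -v` output
            # on stdin and prints each Coordination Service line prefixed by its node.
            coord_service_report() {
              awk '
                /^[ \t]*node[0-9]+:/ {
                  node = $1
                  sub(/:$/, "", node)          # "node12:" -> "node12"
                  next
                }
                /Coordination Service/ {
                  line = $0
                  sub(/^[ \t]+/, "", line)     # drop leading indentation
                  print node ": " line
                }
              '
            }

            # Usage: tsm status -v | coord_service_report
            ```

            Run against the output above, it prints four lines: instance 0 running and instance 1 "status is unavailable" on node1, and instance 1 running on node12 and node13.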

            • 3. Re: Tableau Linux Topology Coordination Service Issue
              Jason Cameron

              This output was captured right after deploying the Coordination Service, which 'errors' out after a certain timeout period. I tried:

              ```
              tsm configuration get -k clustercontroller.zk_session_timeout_ms
              900000
              ```

              but the ZooKeeper session timeout doesn't appear to be the culprit: 900000 ms is 900 seconds, while the error said it timed out after 1210 seconds. I'm trying to figure out which service the timeout came from.

              • 4. Re: Tableau Linux Topology Coordination Service Issue
                Yuriy Fal

                What if you try to stop the cluster once again?

                • 5. Re: Tableau Linux Topology Coordination Service Issue
                  Jason Cameron

                  I've tried restarting multiple times, and I've tried completely redoing the coordination cluster.

                  The issue is that 50% of the way through the setup, something times out with a message that says '1210 seconds'. I have set `--request-timeout`, so I don't believe that is the cause. However, the logs show no additional errors that would point to where this timeout is occurring.

                  • 6. Re: Tableau Linux Topology Coordination Service Issue
                    Jason Cameron

                    Found an interesting bit in the ZooKeeper logs:

                    ```
                    2018-07-10 16:13:57.199 -0500 10324 main : INFO  com.tableausoftware.service.discovery.ServiceDiscoveryClient - updated hosts: [qa-tableau-02, qa-tableau-03, qa-tableau-04]
                    2018-07-10 16:13:57.201 -0500 10324 main : INFO  com.tableausoftware.config.ServiceRegistrationInfoFile - no registration file found at
                    2018-07-10 16:13:57.201 -0500 10324 main : WARN  com.tableausoftware.tabadmin.configuration.builder.AppConfigurationBuilder - Unable to flatten service registration info, because there is no registration file.
                    ```

                    Is there a registration file I missed? Or was some configuration not set up correctly, so that the registration file can't be found?

                    • 7. Re: Tableau Linux Topology Coordination Service Issue
                      Kayla Grieme

                      Hey Jason,

                       

                      I hit the same error as in your most recent post when upgrading to 2018.2 on Windows.

                      The solution that worked for me was running `tsm register -f <path to json>`.

                      The template can be found in the "Activate and Register Tableau Server" documentation.

                      After that, I was able to move forward with initializing Tableau Server from the TSM console.
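                      For reference, the registration step can be scripted roughly as below. This is a sketch that assumes, per our reading of the "Activate and Register Tableau Server" documentation, that `tsm register --template` prints a JSON template to stdout and `tsm register --file` submits the filled-in file; the helper names and the `/tmp/registration.json` path are hypothetical. Because the path is passed on the command line, the file should only need to be readable by the user running `tsm`, not placed in a special directory.

                      ```shell
                      #!/bin/sh
                      # Hypothetical helpers around `tsm register`; the path is only an example.

                      # write_registration_template PATH: save the registration template to PATH
                      write_registration_template() {
                        tsm register --template > "$1"
                      }

                      # register_from_file PATH: submit a filled-in registration file
                      register_from_file() {
                        if [ ! -r "$1" ]; then
                          echo "registration file not readable: $1" >&2
                          return 1
                        fi
                        tsm register --file "$1"
                      }

                      # Usage sketch:
                      #   write_registration_template /tmp/registration.json
                      #   (edit /tmp/registration.json with your details)
                      #   register_from_file /tmp/registration.json
                      ```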

                      • 8. Re: Tableau Linux Topology Coordination Service Issue
                        Ujjwal Shrestha

                        Hi Kayla,

                        I am having a similar issue when trying to deploy the Coordination Service ensemble. Where do you have to put the registration file? Is there a certain location it needs to go into?

                        Any help would be much appreciated.

                        Thanks