I've been getting the same health alerts since upgrading to 9.0.
These are the same alerts you used to get in V8, except that V9 tries to be nicer and puts multiple alerts in one email instead of sending a separate email for each alert.
And V9 has a few extra processes, so more alerts too.
I'm getting the same issue too, typically a few times per day. On 8.x I only got these alerts maybe once every other month, if that.
Can anyone comment on what is happening, how it affects user activity, and if there's a fix (or if one is being investigated by Tableau)?
I think this is an indication of the data engine process crashing and then being restarted automatically by the cluster controller. I get it too: luckily infrequently on prod, and a bit more frequently on dev (even though the dev server is quite idle).
Anyways, I am hopeful that 9.0.5 is coming along any day, see this post...
Tableau support could not provide me a satisfactory answer, but rebooting the server resolved it for me.
So what's happening is that the clustercontroller periodically tries to connect to Data Engine. If the connection is unsuccessful (Data Engine is busy, there is network latency in a distributed system, etc.), it counts that as a 'failed' sample. In a given minute, only so many failed samples are allowed. If the count goes beyond that threshold, the clustercontroller assumes that Data Engine is down and sends out an email notification.
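To make the threshold idea concrete, here is a minimal Python sketch of that check. It is not Tableau's code: the function name is_reported_down and the max_failed value are made up for illustration, since the real threshold isn't stated in this thread.

```python
def is_reported_down(samples, max_failed):
    """Return True if one window of samples would trigger a 'down' alert.

    samples: iterable of booleans, True when the connection attempt succeeded.
    max_failed: hypothetical number of failed samples tolerated per window.
    """
    failed = sum(1 for ok in samples if not ok)
    # Only going *beyond* the threshold counts as "down".
    return failed > max_failed


# A busy Data Engine that misses 3 of 6 checks in one minute.
busy_minute = [True, False, False, True, False, True]
print(is_reported_down(busy_minute, max_failed=2))
print(is_reported_down(busy_minute, max_failed=3))
```

The point is that a process that is merely slow to answer can trip the alert without ever having gone down.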
If you look in the clustercontroller.log file you'll be able to track down the error. If Data Engine really did go down, that should also be noticeable in the Data Engine log files.
In Mansoor's case the notifications are coming so close together that it is very unlikely that Data Engine actually went down. It could be tricky to catch, but the 'Status' screen should show whether Data Engine is down.
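If the log is large, a small script can narrow it down. The sketch below is only a starting point: the thread doesn't show the actual log format, so the two search patterns are guesses, and the log path is left for you to fill in for your own install.

```python
import re

# Assumed patterns: the real clustercontroller.log wording is not shown
# in this thread, so adjust these to match what your log contains.
MENTIONS_DATA_ENGINE = re.compile(r"data\s*engine", re.IGNORECASE)
LOOKS_LIKE_PROBLEM = re.compile(r"error|fail|timed out|timeout", re.IGNORECASE)


def find_suspect_lines(lines):
    """Keep lines that mention Data Engine and also look like a problem."""
    return [
        line.rstrip("\n")
        for line in lines
        if MENTIONS_DATA_ENGINE.search(line) and LOOKS_LIKE_PROBLEM.search(line)
    ]


def scan_log(path):
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_suspect_lines(handle)
```

Call scan_log with the full path to your clustercontroller.log and compare the timestamps of the matching lines against the times on the alert emails.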
Please note the above only applies to Data Engine. If you start getting similar error messages regarding filestore or the repository you should open a support case.
Thanks for the explanation. Is a fix in the works, or is this something we just have to live with?
A fix is in the works, though I couldn't really say which release might have it. Ultimately I think they are looking at making this an adjustable setting, so folks can set a timeframe that works better for their environment instead of the current one-size-fits-all setting.
Is there any further "informal" word on which release this is in?
We are on Version 9. And I keep getting these alerts. Not sure what to do.
WORKAROUND TO REDUCE NUMBER OF EMAIL ALERTS (note: this does not fix the issue, it just reduces the notification emails about data engine being up/down)
1. Open an administrative command prompt window on the primary machine.
2. Navigate to the Tableau Server 'bin' directory.
3. Stop Tableau Server:
tabadmin stop
4. Run the following commands:
tabadmin set monitoring.interval 10
tabadmin set monitoring.up_percentage 30
5. Configure for the changes:
tabadmin configure
6. Start Tableau Server:
tabadmin start
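For anyone wondering why the workaround cuts down the emails, here is a rough Python sketch of one possible reading of monitoring.up_percentage. This is an assumption, not documented behaviour: it treats the value as the minimum percentage of successful samples within a monitoring window for the process to still count as up. The function name counts_as_up and the stricter value of 60 are made up for comparison.

```python
def counts_as_up(samples, up_percentage):
    """Return True if enough samples in the window succeeded.

    samples: booleans, True when the health check succeeded.
    up_percentage: assumed meaning is the minimum percentage of successful
    samples needed for the process to be treated as up.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample")
    successful = sum(1 for ok in samples if ok)
    return 100.0 * successful / len(samples) >= up_percentage


# A window where only 4 of 10 checks succeeded (40%).
window = [True] * 4 + [False] * 6
print(counts_as_up(window, up_percentage=30))  # relaxed value from the workaround
print(counts_as_up(window, up_percentage=60))  # made-up stricter value
```

Under that reading, a lower percentage tolerates more missed checks before an alert goes out, which matches the note that the workaround only reduces emails and does not fix the underlying issue.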
Is this behavior expected even for a single node installation?