This has already been solved internally.
The Windows scheduler does in fact start the task once a minute and refuses to run it in parallel. Our tasks (the alerts scheduled to run every 15 minutes) often take longer than 15 minutes to complete, and thus everything gets delayed and shifted.
Our tasks run this long because of a workaround we built to send emails only once a day, after the daily load has finished. For this we use the 15-minute schedule and check on each run, in the Email Action field, whether the daily load's END_DTS fell within the last quarter hour. So now we just need to find a different solution to replace that workaround.
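For readers who want to see the workaround's logic spelled out, here is a minimal sketch of the check we put in the Email Action field (translated into Python for illustration; the function name and timestamps are my own, not part of VizAlerts):

```python
from datetime import datetime, timedelta

def email_action(end_dts: datetime, now: datetime) -> int:
    """Return 1 if the daily load's END_DTS fell within the last quarter hour."""
    return 1 if now - timedelta(minutes=15) <= end_dts <= now else 0

# A load that finished 5 minutes ago triggers the alert on this run...
now = datetime(2016, 5, 10, 8, 0)
print(email_action(now - timedelta(minutes=5), now))   # 1
# ...one that finished an hour ago does not.
print(email_action(now - timedelta(hours=1), now))     # 0
```

Because the alert has to re-evaluate this check every 15 minutes all day long just to fire once, it keeps the schedule permanently busy.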
Hi Lilli! Thanks for doing a thorough investigation before sending your question in. Yes, you are correct that the scheduled task (or cron job, for Linux people) that runs VizAlerts should be set up to run every minute. And you're also correct that if it does not, then you could miss test_alert comments intended to trigger an alert, because it only looks back in time for five minutes.
Can you confirm that the scheduled task is set to run every minute? There should be log entries showing some activity every minute, even if no views are being processed.
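For reference, on Linux a once-a-minute schedule would look like the crontab entry below (the install path and log location here are placeholders, not a prescribed setup):

```shell
# Run VizAlerts once a minute; /opt/vizalerts is a hypothetical install path.
* * * * * cd /opt/vizalerts && python vizalerts.py >> /var/log/vizalerts_cron.log 2>&1
```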
If you've set the task up properly, the only other cause for it skipping a test_alert comment would be if it were under such heavy load that the batch did not complete within five full minutes. So, say VizAlerts runs at 10:00 but has ten alerts to process, each of which takes 1 minute. If set to run on a single thread, this would cause it to complete at 10:10, and a test_alert comment entered at 10:04 would be missed. There are lots of solutions to this:
1. Increase the number of alerts being processed at the same time by upping the number of threads in the config/vizalerts.yaml file. By default it is set to 2, but if you need more, you can up it. We run 5 threads on our instance, and this is able to keep up with all our alerts.
2. Improve the efficiency of your alerts. You can also reduce the timeout settings to "encourage" your alert authors to improve this, by editing the timeout calc in the VizAlertsConfig workbook.
3. Scale down your alerts or their frequency. Do they all need to run, and do they all need to run every fifteen minutes? Would hourly alerts also work, staggered across 15-minute offsets (:00, :15, :30, :45)? This would give VizAlerts room to breathe between batches.
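To make the arithmetic above concrete, here is a rough back-of-the-envelope model of how the thread count affects batch duration. This is my own simplification, not VizAlerts internals: it assumes every alert takes the same amount of time and that alerts run in parallel "waves" of size `threads`.

```python
import math

def batch_minutes(num_alerts: int, minutes_per_alert: int, threads: int) -> int:
    """Estimate batch duration: sequential waves of `threads` parallel alerts."""
    waves = math.ceil(num_alerts / threads)
    return waves * minutes_per_alert

# The scenario above: ten 1-minute alerts starting at 10:00.
print(batch_minutes(10, 1, 1))  # 10 -> a single thread finishes at 10:10
print(batch_minutes(10, 1, 2))  # 5  -> the default two threads finish at 10:05
print(batch_minutes(10, 1, 5))  # 2  -> five threads finish at 10:02
```

Real alerts vary in duration, so this only gives an upper-bound intuition, but it shows why upping the thread count is usually the quickest fix.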
Many thanks for your detailed response!
Your second guess is correct: our cron job is scheduled to run every minute, but we had too many alerts scheduled to run every 15 minutes, some of them too inefficient. With VizAlerts set to run on two threads and a timeout of 900 seconds, a batch sometimes took as long as 40 minutes to complete.
The reason we had several alerts scheduled to run every 15 minutes was a workaround we built to trigger the alert once a day, directly after the daily DWH load finished. The "Email Action" field was set to 1 if the END_DTS of the daily load fell within the previous 15 minutes and to 0 otherwise. We only now realized that this workaround could create heavy load and throttle the whole process.
Many thanks also for the recommended solutions!
We took them into account and our next steps will be:
1. Test whether we can increase the number of threads to 5 without degrading the performance of other processes.
2. First improve the efficiency of our alerts together with the alert authors, and then reduce the timeout calc in the VizAlertsConfig workbook.
3. Try to restructure the scheduling and frequency of our alerts; staggering hourly alerts across 15-minute offsets is a great idea.
4. Always drag the "Email Action" field to the filters and filter for 1. From the VizAlerts logs it looks as if, when no table is visible in the VizAlerts sheet, VizAlerts jumps straight to "execute_alert - Nothing to do! No rows in trigger data from file" when processing the alert, whereas when a table is visible and the "Email Action" field is 0, it first loads all entries in the table, including attachments.
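In case it helps other readers, the behaviour we inferred from the logs can be sketched like this (hypothetical pseudo-logic illustrating our reading of the logs, not the actual VizAlerts source):

```python
def load_row_with_attachments(row: dict) -> None:
    """Placeholder for the expensive per-row work (downloads, attachments)."""
    pass

def process_alert(trigger_rows: list) -> str:
    if not trigger_rows:
        # Mirrors the log line "execute_alert - Nothing to do! ..." -
        # an empty trigger table is skipped immediately.
        return "skipped: no rows in trigger data"
    # Visible rows appear to be loaded in full before any action is decided,
    # even if every row has Email Action = 0.
    for row in trigger_rows:
        load_row_with_attachments(row)
    return f"processed {len(trigger_rows)} rows"

print(process_alert([]))                          # skipped: no rows in trigger data
print(process_alert([{"Email Action": 0}] * 3))   # processed 3 rows
```

Filtering to Email Action = 1 therefore keeps the trigger table empty on the runs that should do nothing, which is much cheaper than loading rows that will never send.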
With this, I'm sure we will get back to stable VizAlerts processing. Many thanks again for your help!