3 Replies Latest reply on Aug 26, 2019 6:28 PM by Lilli Bombei

    Might problems with missing VizAlert processing be caused by wrong configuration?

    Lilli Bombei

      Hi all,

       

      we have lately been facing several issues where either test_alerts or alerts on 15-minute schedules are not sent and do not appear in the log file at all. They seem to just disappear. I went through other threads and found that VizAlerts is designed to run every minute and that there is a five-minute window on the "test_alert" comment detection.

       

      I had a deeper look at the log files and found the following pattern: every 15 minutes there are entries for queueing the subscriptions, "Queueing subscription id xx for processing". (We have a bunch of alerts on the "VizAlert every 15 mins" schedule, so I thought that made sense.) VizAlerts seems to start queueing subscriptions only in this pattern. Through testing I also found that test_alerts are only processed when the "test_alert" comment happens to be typed around the moment the queueing takes place. Otherwise the test_alert subscription just vanishes from the list and no entry appears in the log file.

       

      I also noticed that some of the 15-minute queueings were skipped during the day. Today, for example, they were skipped at 04:00, 05:00, 05:15, 05:30, 08:00, 08:15, 08:30, 09:00, 10:00, 11:45, 12:15, 14:45, 15:45. So probably whenever there was a heavy general load.
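A list of skipped runs like this can be produced from the log with a few lines of Python. This is only a rough sketch: the function name `find_skipped_slots` is made up, the sample entries are invented, and it assumes the timestamps of the "Queueing subscription id" entries have already been parsed into `datetime` objects, since the log's timestamp format is not shown here.

```python
from datetime import datetime, timedelta


def find_skipped_slots(queue_times, start, end, step=timedelta(minutes=15)):
    """Return expected schedule slots in [start, end) with no queueing entry.

    queue_times: datetimes of "Queueing subscription id ..." log entries,
                 assumed to be parsed from the log already (hypothetical input).
    """
    step_seconds = int(step.total_seconds())
    seen = set()
    for t in queue_times:
        # Round each entry down to the slot it belongs to.
        offset = int((t - start).total_seconds()) // step_seconds
        seen.add(start + offset * step)

    skipped = []
    slot = start
    while slot < end:
        if slot not in seen:
            skipped.append(slot)
        slot += step
    return skipped


# Invented sample data: queueing entries between 04:00 and 05:00.
day = datetime(2019, 8, 26)
entries = [day.replace(hour=4, minute=15, second=20),
           day.replace(hour=4, minute=30, second=5),
           day.replace(hour=4, minute=46, second=40)]
skipped = find_skipped_slots(entries, day.replace(hour=4), day.replace(hour=5))
print([s.strftime("%H:%M") for s in skipped])
```

With the sample entries, only the 04:00 slot has no queueing entry, so it is the one reported.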

       

      So my maybe rhetorical question is: might all our problems be caused by our VizAlerts being configured to run every 15 minutes instead of every minute? (I'm not sure where to find that information.)

       

      Many thanks for your help!

        • 1. Re: Might problems with missing VizAlert processing be caused by wrong configuration?
          Lilli Bombei

          Hi all,

           

          this has already been solved internally.

           

          The Windows scheduler does actually start once a minute and refuses to run in parallel. Our tasks (the alerts that are scheduled to run every 15 mins) often take longer than 15 minutes to run, and thus everything gets messed up and shifted.
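To illustrate the effect, here is a small simulation of a once-a-minute trigger that is rejected while the previous run is still active. It is a sketch under simplifying assumptions, not how the scheduler or VizAlerts is actually implemented: the function name `simulate_starts` and the 40-minute batch are made up, and runs without a listed duration are assumed to finish within the minute.

```python
def simulate_starts(total_minutes, batch_minutes_at):
    """Simulate a once-a-minute trigger that never runs in parallel.

    batch_minutes_at: maps a start minute to how long that run takes;
                      runs not listed are assumed to take under a minute.
    Returns the minutes at which a run actually started.
    """
    starts = []
    busy_until = 0
    for minute in range(total_minutes):
        if minute < busy_until:
            continue  # previous run still active, this trigger is rejected
        starts.append(minute)
        busy_until = minute + batch_minutes_at.get(minute, 0)
    return starts


# Made-up numbers: the batch started at minute 0 takes 40 minutes.
starts = simulate_starts(60, {0: 40})
quarter_hours = [m for m in (0, 15, 30, 45) if m in starts]
print(quarter_hours)
```

In this scenario no run starts at minutes 15 and 30, because the batch from minute 0 is still busy until minute 40.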

           

          Our tasks run this long because we used a workaround to send emails only once a day, after the daily load has finished. For this we use the 15-minute schedule and check each time in the Email Action field whether the daily load END_DTS fell within the last quarter hour. So now we just need to find a different solution to replace that workaround.

          • 2. Re: Might problems with missing VizAlert processing be caused by wrong configuration?
            Matt Coles

            Hi Lilli! Thanks for doing a thorough investigation before sending your question in. Yes, you are correct that the scheduled task (or cron job, for Linux people) that runs VizAlerts should be set up to run every minute. And you're also correct that if it does not, then you could miss test_alert comments intended to trigger an alert, because it only looks back in time for five minutes.

             

            Can you confirm that the scheduled task is set to run every minute? There should be log entries showing some activity every minute, even if no views are being processed.

             

            If you've set the task up properly, the only other cause for it skipping a test_alert comment would be if it was under very heavy load such that the batch did not complete within five full minutes. So, say VizAlerts runs at 10:00, but has ten alerts to process, each of which takes 1 minute. If set to run only a single thread, this would cause it to complete at 10:10. If you had entered a test_alert comment at 10:04, it would be missed. There are lots of solutions to this:

             

            1. Increase the number of alerts being processed at the same time by upping the number of threads in the config/vizalerts.yaml file. By default it is set to 2, but if you need more, you can up it. We run 5 threads on our instance, and this is able to keep up with all our alerts.

            2. Improve the efficiency of your alerts. You can also reduce the timeout settings to "encourage" your alert authors to improve this, by editing the timeout calc in the VizAlertsConfig workbook.

            3. Scale down your alerts or their frequency. Do they all need to run? How about every fifteen minutes? Would hourly alerts also work, staggered over 15-minute periods (:00, :15, :30, :45)? This would give VizAlerts room to breathe between batches.
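The arithmetic behind the 10:00 example and the effect of adding threads can be sketched in a few lines of Python. This is a simplified model, not VizAlerts code: the function names are made up, alerts are assumed to be handed to whichever thread frees up first, and the next run is assumed to start at the first minute after the batch completes and to look back exactly five minutes.

```python
import heapq
import math


def batch_finish_minute(durations, threads):
    """Minutes until a batch finishes, handing each alert to the next free thread."""
    free_at = [0.0] * threads  # when each worker thread becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available thread
        end = start + d
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


def comment_is_missed(comment_minute, batch_minutes, lookback=5):
    """True if a test_alert comment falls outside the next run's lookback window."""
    next_run = math.ceil(batch_minutes)  # next once-a-minute trigger after the batch
    return comment_minute < next_run - lookback


alerts = [1] * 10  # ten alerts, one minute each, as in the example above
for threads in (1, 2, 5):
    finish = batch_finish_minute(alerts, threads)
    print(threads, finish, comment_is_missed(4, finish))
```

Under these assumptions a single thread finishes after 10 minutes and the comment entered at minute 4 is missed, while two threads finish after 5 minutes and five threads after 2, so the comment is still inside the window.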

            • 3. Re: Might problems with missing VizAlert processing be caused by wrong configuration?
              Lilli Bombei

              Hi Matt,

               

              many thanks for your detailed response!

               

              Your second guess is correct: our cron job is scheduled to run every minute, but we had too many alerts scheduled to run every 15 minutes, some of them too inefficient. With VizAlerts set to run on two threads and a timeout of 900 secs, the completion of a batch sometimes took as long as 40 minutes.

               

              We had several alerts scheduled to run every 15 minutes because of a workaround we built to trigger the alert once a day, directly after the daily DWH load was done. The "Email Action" field was set to 1 if the END_DTS of the daily load happened in the previous 15 minutes and to 0 otherwise. We only now realized that this workaround might lead to a heavy load and throttle the whole process.
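The check itself lives in a Tableau calculated field; the Python below only sketches its logic, with a made-up function name and made-up timestamps. It shows why each such alert is still evaluated on every 15-minute run (up to 96 times a day) to send at most one email.

```python
from datetime import datetime, timedelta


def email_action(load_end, now, window=timedelta(minutes=15)):
    """Return 1 if the daily load ended within the last `window`, else 0."""
    return 1 if timedelta(0) <= now - load_end < window else 0


now = datetime(2019, 8, 26, 6, 15)
print(email_action(datetime(2019, 8, 26, 6, 7), now))   # load ended 8 minutes ago
print(email_action(datetime(2019, 8, 26, 5, 50), now))  # load ended 25 minutes ago
```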

               

              Many thanks also for the recommended solutions!

               

              We took them into account and our next steps will be:

               

              1. Test whether we can raise the number of threads to 5 without degrading the performance of other processes.

              2. First improve the efficiency of our alerts together with the alert authors and then reduce the timeout calc in the VizAlertsConfig workbook.

              3. Try to restructure the scheduling and frequency of alerts; staggering hourly alerts over 15-minute periods is a great idea.

              4. Always drag the "Email Action" field to the filters and filter for 1. From the VizAlerts logs it looked as if, when no table is visible in the VizAlerts sheet, VizAlerts jumps directly to "execute_alert - Nothing to do! No rows in trigger data from file" when processing the alert, whereas when a table is visible and the "Email Action" field is 0, it first loads all entries in the table, including attachments.

               

               

              With these changes I'm sure we will get back to stable VizAlerts processing. Many thanks again for your help!