Quantifying the difference would be hard, but all thing being equal your logic is sound.
I would expect a tiny improvement in performance every time you did an integer IF vs. a string IF. Over 300 million rows, that would add up to a measurable difference.
Externalizing the alias from the workbook is also a good strategy. Calculated fields, and particularly string-based calculated fields, are expensive. Taking that workload out of the workbook will definitely improve performance.
Whether those 2 combine to give you massive gains is impossible to predict. Measurable, yes, but massive?
The next biggest impacts you can have are materializing calculations in the data source vs. performing the calcs (ANY calcs) in the workbook, and if possible, pre-aggregating the data in Prep prior to bringing it into the workbook. Those have to be done somewhere, and if you need to prioritize user-facing load times/performance then doing them before the data gets to the workbook will help. TOTAL time end to end for the process likely won't vary much, but you'll shift a lot of the load to a part of the process the user will not have to see.
Thanks for the response Michael!
Agreed! Summary data will definitely help but due to the nature of this dashboard, we need the most granular data and then run count distinct functions on a ID to get the correct numbers based on different dimensional values. Since there are many to many relationships in the data set and different date level view, we need to take the lowest grain of data and let Tableau execute the aggregation at run time. This returns the correct number in an extremely dynamic report but my trade-off is that it's very compute intense.
For example, if an single ID spams 12 months, i would want to count that ID once when i'm at a monthly view, but also only 1 time at the yearly view. Aggregating on the dimension beforehand will cause it to be counted 1 for each monthly view but 12 times for the yearly view due to the grouping.
Right now, i'm chasing marginal returns and see how i can optimize anywhere. Hence, I want to see how does getting rid of string values in the hyper extract completely and only using integer/Boolean/date/alias will improve performance.
Since hyper extracts are columnar stored and compressed based on the cardinality of data, i"m not sure if I will get improvement on storage but maybe I can get some improvements on extract refresh time due to everything being Boolean and strings. Guess we shall see how this experiment goes