I think I have solved my own problem.
After some debugging, I noticed that the dataset was being passed to the Python code in a different order for each chart: the distribution passes the data sorted by value in ascending order, while the box plot passes it in its original order.
Because of this, the training/test split differs between the box plot and the distribution even with the same random seed, which gives different outcomes.
The fix is to sort the data by a field that does not affect the model, such as a serial number or timestamp, before every interaction with Python. That way the outcome is always the same.
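For anyone hitting the same thing, here is a rough sketch of the idea in plain Python. It is not my actual TabPy script: the function name `split_train_test`, the `serial` and `value` fields, and the toy rows are all made up for illustration, and it assumes the sort key is unique per row.

```python
import random

def split_train_test(rows, key, test_fraction=0.25, seed=42):
    """Sort rows by a stable key, then do a seeded shuffle and split.

    Hypothetical helper: sorting first means the seeded shuffle always
    starts from the same order, whatever order the rows arrived in.
    """
    ordered = sorted(rows, key=lambda r: r[key])
    rng = random.Random(seed)
    rng.shuffle(ordered)
    n_test = int(len(ordered) * test_fraction)
    return ordered[n_test:], ordered[:n_test]

# Toy data: 'serial' is a unique, model-irrelevant field (an assumption).
rows = [{"serial": i, "value": (i * 37) % 11} for i in range(20)]

# Simulate one chart sending the rows sorted by value instead.
by_value = sorted(rows, key=lambda r: r["value"])

train_a, test_a = split_train_test(rows, "serial")
train_b, test_b = split_train_test(by_value, "serial")

# Both arrival orders yield the same split once sorted by 'serial'.
same_split = (train_a == train_b) and (test_a == test_b)
print(same_split)
```

Without the `sorted` call inside the helper, the same seed would shuffle two differently ordered lists and the two charts would train on different rows.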
Peace, signing out.
Thanks for sharing. I am also exploring TabPy but have been having difficulty getting Tableau to plot model results calculated in Python. For example, I ran a logistic regression in Python and was able to get TabPy to return the classification label in a Tableau crosstab. However, I can't get Tableau to plot the results summary as follows, because TabPy needs the 'Id' as the input to pass to Python. Hence, 'Class' and 'Id' have to be in the same row/column. Could you give me some guidelines on how to proceed? For example, how do you get Tableau to plot a feature importance bar chart when the feature names are not in the data source? Thank you!
Currently, TabPy needs each mark on the visualization to pass data at the correct row level to Python. So if each row is differentiated by an Id, you'd need that Id field on the viz to send the data correctly. These videos may provide some further explanation:
This is something we would definitely like to expand on in the future, and if you've got more questions or feedback, please don't hesitate to contact me at firstname.lastname@example.org