14 Replies Latest reply on Nov 25, 2016 11:42 AM by Luis Miguel Sanchez

    Aggregating data from the US stock market in Tableau

    Luis Miguel Sanchez

      I am a very basic user of tableau and I have not found an answer to my question.

       

      I have a txt file that has historical daily data for 98% of all the stocks in the US, with their daily capitalization. Each stocks has its TICKER, daily MARKET VALUE for every trading day of the year, and its SECTOR.

       

      I did a simple time series that displays `SUM([Mktval])` (sum of all individual market values) across all stocks, on a daily basis, and where I can see that the total aggregated value of the US capital markets is about 24 Trillion USD, as of Juy 2016, as shown in the image below.

      izgPw.png

      When I change the view column from DAY to YEAR, I don't see the right values, but something a lot larger. So I realized that I need to do `SUM([Mktval])/252` to get the right value for a year (there are about 252 trading days in a year).

       

      If I change the view to MONTH, as in the chart below, the numbers are again wrong because 252 is not the right value to use in the division.

       

      YQRcZ.png

       

      Is there any way that Tableau can adjust the values automatically (the number to divide by) to reflect the **AVG MktVal** across any time selected (monthly, quarterly, yearly, etc.) ?

       

      Thanks

        • 1. Re: Aggregating data from the US stock market in Tableau
          Simon Runc

          hi Miguel,

           

          So what (I think) we need to do here is average the stock of each company, over whatever time period and then add them together...and one way is to use an LoD calculation (LoDs allow us to tell Tableau the level at which to aggregate).

           

          So in your example here...

          {FIXED [Tikcer], [Sector]: AVG([MktVal])}

           

          So this creates (for each Ticker/Sector) the Average [MktVal] (so at Day level it's the same as using SUM, as the AVG of 1 value is the same as the SUM of that Value...or MIN/MAX/MEDIAN for that matter!), but at higher levels (say Month) it takes the Average for each company over the month and then SUMs them up.

           

          You can then bring this field in a SUM (as LoDs are returned at Row Level we are then able to re-aggregate), and that should work at any level.

           

          Hope this does the trick, and makes sense, but let me know if not (and if it doesn't do as you want, I may need a sample workbook so I can take a look at the "Grain" of the data, as we may need to add some other dimension to our LoD, or we might need to weight the prices on Volume, for example)

          • 2. Re: Aggregating data from the US stock market in Tableau
            Luis Miguel Sanchez

            Thank you Simon.

             

            I tried your suggestion, creating a calculated field called "Test", which I plotted and does not show the right results on any of the time frames. Again, my reference is that on a daily basis, the total market cap of all the stocks should add to about ~24 trillion USD, which shows correctly in the upper chart, but not in the lower chart which has your calc. So there is another factor that needs to come into play.

             

            Screen Shot 2016-11-24 at 6.13.26 PM.png

             

            The source I am using does not have a calculation of market value as price X shares_outstanding. I created the CSV file that feeds this Tableau workbook using pandas in python, and the market value for each stock is already calculated in the CSV. (A few stocks are unselected since the market val for those are wrong). Also, I tried to add the Thanks in advance for any feedback. Also, tried to attach the Tableau packaged workbook, but it is 500MB in size.

            Cheers

            • 3. Re: Aggregating data from the US stock market in Tableau
              Simon Runc

              hi Luis Miguel,

               

              Sorry my bad...I forgot we also need to add the date part to the LoD!...else each company is just taking it's average over all time and adding them.

               

              So I've mocked up (well downloaded) the Stock Price for 3 companies (so a mini version of yours). I'm not sure what control you want over the time period. I've chosen a parameter.

               

              I've first set up a parameter

               

               

              and created the following calculated field to return the requested time period

              [Selected Period]

              DATETRUNC([Select Period],[Date])

               

              where [Select Period] is the parameter.

               

              I then add this to the LoD...

              [Close - LoD]

              {FIXED [Company],[Selected Period]: AVG([Close])}

               

              and now (I think!) it does what we want. If it's still not, can you just export the data for 2 or 3 of companies, so I can see the "grain" of the data.

              • 4. Re: Aggregating data from the US stock market in Tableau
                Luis Miguel Sanchez

                Thank you Simon

                 

                Unfortunately, I cant open your example, so I'll try to recreate the formula in the version of Tableau I have (9.3.1 (9300.16.0511.2203) 64-bit).

                 

                I think this data set is a little bit complicated, since some stocks get de listed and drop out of the set over the time. Conversely, new ones are added to the dataset as the become IPOs, so I am not sure how the averages will be affected.

                 

                Also, I had to drop some dates from the calculations (for example, around Sep 12, 2001), which distorts the cacls since for example not all stocks were trading the day after 9/11. So if you simply add the market values on that particular day, you'll see an "artificial' drop in value for the combined market, followed by an artificial increase in value the following days.

                 

                In any case, I uploaded the twbx file in Dropbox, in case you want to take a look. It is here: Dropbox - MarketCaps.twbx

                 

                Thanks again for your hep.

                • 5. Re: Aggregating data from the US stock market in Tableau
                  Simon Runc

                  I've attached a 9.3 version of the workbook here...

                   

                  so the average will just take the average, over that of time (btw NULLS don't get counted in Average...so 01/01/2016 = $200, 02/01/2016 = $400, 03/01/2016 = NULL would give an average of $300 not $200). So the question is how you want to take the Market Value when you change from Day? Is it the Average of each company over the Month (added together)?, or the figure on the last day of every month...etc.?

                   

                  I've downloaded the beast!...and adding the date period in does do what I'd expect (but what you want may be a bit different!!). If you can let me know how you'd expect these figures to be calculated at a Week/Month...etc level there's probably a formula (or formulas) we can use. One other thing was I didn't realise quite how big the data was! Fixed LoDs (although amazingly useful) aren't that efficient, so we may have to go with an INCLUDE/EXCLUDE or Table Calc to get it working as fast as possible.

                   

                  Edit SR:- One other thing that occurred is to always keep day level, and overlay a moving average (in the below the blue line is a rolling average over the previous 30 days)...just a thought!!

                   

                  • 6. Re: Aggregating data from the US stock market in Tableau
                    Luis Miguel Sanchez

                    Thank you Simon:

                     

                    I was able to open your example, thank you very much. Nevertheless, I am not sure that's the right calculation. In any case, to answer your question, the TOTAL US Market Value when changed from day view to lets say, period X view should be the average Mkt Value of each company over the period X, all added up.

                     

                    The test is that the average US Mkt Value when aggregated over monthly, weekly, quarterly, or yearly view as of July 2016, should be roughly equal to the value we have when observed daily for July 2016, which is a ~24 trillion USD.

                     

                    Yes, the dataset is huge, but what you have is a small sample of the total set I have. (I had to create an extract to deal with this particular "Market Cap" feature, to use in a model I am building). Also, your idea of having a simple MA on different time frames could be a good solution.

                     

                    In any case,

                     

                    Thanks again

                    • 7. Re: Aggregating data from the US stock market in Tableau
                      Simon Runc

                      Wow so this is only the sample!! (mind you I has 1.2 Billion rows the other day, although had to move out of TDEs and use EXASol Live!!)...

                       

                      So yes it's often easier to isolate a small amount of the data (be that a few companies, or a single month) and get that right...then hopefully, maths being what maths is, should work for all the data!

                       

                      So I have isolated July 2016, and below shows this by day, and then the values we get back with a couple of calculations for the month level.

                       

                      I've also removed Sector from the LoDs (as from a quick look at the data, having it in doesn't seem to increase the grain).

                       

                      Date1 - Month is simply DATETRUNC('month', [Date1])

                       

                      Then

                      [Month LoD MktVal v1]

                      {FIXED [TICKER (all stocks since 1962.csv)], [Date1 - Month]: AVG([Mktval])}

                       

                      So this one is the SUM of the Average for Each TICKER by Month. notice that the value changes on the 20th. This is (I'm guessing) that we have a new TICKER and as the AVG is worked out at this level there is one (or more) extra companies adding to the sum

                       

                      [Month LoD MktVal v2]

                      {FIXED [Date1 - Month]: SUM({FIXED [TICKER (all stocks since 1962.csv)], [Date1 - Month]: AVG([Mktval])})}

                       

                      and this one is a slight variation, taking the Average for each TICKER over the month, then SUMing those values up at the Month Level. As we force this one back to be returned at Month level, this one doesn't change.

                       

                      Below I've attached an export of that table (so you can copy and paste into Excel). If you can look at the day level MktVal and let me know what value you'd expect to see for July, 2016...it gives us something to aim for (and we can de-bug if it's not quite doing what it should!...and hopefully work out why)

                       

                          

                      Date1Date1 - MonthMktvalMonth LoD MktVal v1Month LoD MktVal v2
                      01/07/2016July, 201623,062,474,576,24523,591,107,776,89323,794,227,037,280
                      05/07/2016July, 201622,879,937,112,20123,591,107,776,89323,794,227,037,280
                      06/07/2016July, 201623,008,245,002,60723,591,107,776,89323,794,227,037,280
                      07/07/2016July, 201623,000,198,693,74023,591,107,776,89323,794,227,037,280
                      08/07/2016July, 201623,359,509,305,00523,591,107,776,89323,794,227,037,280
                      11/07/2016July, 201623,462,989,494,91523,591,107,776,89323,794,227,037,280
                      12/07/2016July, 201623,641,025,815,90023,591,107,776,89323,794,227,037,280
                      13/07/2016July, 201623,625,750,838,07823,591,107,776,89323,794,227,037,280
                      14/07/2016July, 201623,733,416,634,09423,591,107,776,89323,794,227,037,280
                      15/07/2016July, 201623,709,146,244,14623,591,107,776,89323,794,227,037,280
                      18/07/2016July, 201623,776,846,065,35523,591,107,776,89323,794,227,037,280
                      19/07/2016July, 201623,728,464,796,28023,591,107,776,89323,794,227,037,280
                      20/07/2016July, 201623,853,999,279,18423,591,507,216,74323,794,227,037,280
                      21/07/2016July, 201623,766,860,202,78723,591,507,216,74323,794,227,037,280
                      22/07/2016July, 201623,890,504,058,48423,591,507,216,74323,794,227,037,280
                      25/07/2016July, 201623,827,006,437,14523,591,507,216,74323,794,227,037,280
                      26/07/2016July, 201623,857,567,642,40723,591,507,216,74323,794,227,037,280
                      27/07/2016July, 201623,822,380,194,62123,591,507,216,74323,794,227,037,280
                      28/07/2016July, 201623,875,505,673,37223,591,507,216,74323,794,227,037,280
                      29/07/2016July, 201623,943,522,908,43723,591,507,216,74323,794,227,037,280
                      1 of 1 people found this helpful
                      • 8. Re: Aggregating data from the US stock market in Tableau
                        Yuriy Fal

                        Hi guys,

                         

                        Huge dataset, great to explore, thank you!

                         

                        Pardon for a dumb question, but may be this could help?

                        SUM([Mktval]) / COUNTD([Date1])

                         

                        Yours,

                        Yuri

                        2 of 2 people found this helpful
                        • 9. Re: Aggregating data from the US stock market in Tableau
                          Simon Runc

                          Pardon for a dumb question, but may be this could help?

                          ...As always Yuri, such clear thinking!!...I think the phase (on my attempts)..."can't see the wood for the trees" (if you know the expression) springs to mind!! - sometimes you can get so caught up with the data/calculation grain and getting the LoD levels...that the obvious just sails on past!!

                           

                          And this, of course, will be so much faster.

                           

                          Yes, it is a cool data set...

                           

                          Thanks again for the intervention!!

                          2 of 2 people found this helpful
                          • 10. Re: Aggregating data from the US stock market in Tableau
                            Luis Miguel Sanchez

                            Thank you Simon!

                            Your solution returns the right results, however, Yuriy's solution is less computational intensive and does return the right results. Needless to say, I improved my knowledge of Tableau a great deal by following your excellent explanations.

                            Cheers

                            • 11. Re: Aggregating data from the US stock market in Tableau
                              Luis Miguel Sanchez

                              Yuriy: Thank for for such an elegant, precise solution to my problem!

                              Glad you liked the dataset, as I mentioned, it is just an extract of a larger dataset that contains many features observed and generated by an AI that I am programming in Python/TensorFlow to analyze the stock market.

                              I am starting to create several visualizations for this particular project, and my knowledge of Tableau is way behind my knowledge of financial engineering and programming, but I hope to improve and learn more. Thanks again. Cheers.

                              • 12. Re: Aggregating data from the US stock market in Tableau
                                Yuriy Fal

                                A grain of salt here:

                                The data is sparse.

                                 

                                There are days (plenty of) for which

                                some stocks have no trades, hence no data.

                                 

                                This is no problem for a rough estimation

                                of the whole Market Cap at 2016,

                                but could be an issue when segmenting.

                                 

                                Cheers,

                                Yuri

                                1 of 1 people found this helpful
                                • 13. Re: Aggregating data from the US stock market in Tableau
                                  Luis Miguel Sanchez

                                  You are right. For a rough estimate, that's not a problem. The final visualization will have several filters (including day filters.), to discard aggregated values for days that have some sort of abnormality, like the example I mentioned of the days following 9/11, where many large cap stocks stopped trading. If those days are included, the viz would show large decreases in values for one or few days, followed by a large increase in value when stock trading resumed for most stocks.

                                  Thank you for pointing that out.