5 Replies Latest reply on Nov 24, 2018 10:30 PM by Ombir Rathee

    What is the best type of JOIN I Can Use for my data sources?

    Mohamed Hersi

      Dear All,

       

      I have 3 types of data sources:

       

      File 1: Excel File containing Monthly data from more than 100 Health Facilities

                 Since each Health Facility has to report every month, each region, district, health facility is repeated 12 times in a year (the file contains 1 year's data e.g. 2015.

                Note: each of my health facilities has a UNIQUE Code, but each of these are also repeated 12 times along with their respective facilities.

       

      File 2: CSV File containing geographical data (latitudes and longitudes for each Health Facility), but there is no repetition

       

      File 3: Shape File containing .shp data (contains Somalia admin data)

       

       

      For me, I use the following types of JOINs as shown in the screen shot below. For both JOINs, I selected Regions in all sources, is that correct? Will regions be enough?

       

      That BIG question is: I have 1,248 rows in File 1 data source (Excel File), and I expected to have the same number of rows in Tableau data source sheet after doing the JOINs, but instead I got 1,000 rows only. There are 248 rows missing! Any explanation for this?

       

       

      What is the best type of JOIN to use.PNG

       

       

       

      Any advice on best type of JOIN and reason for choice and other types are wrong/not good?

       

       

      Regards

      Mohamed

        • 1. Re: What is the best type of JOIN I Can Use for my data sources?
          Ombir Rathee

          It depends what is the primary key in all three sources. If there are 1248 rows in first data source and you want the same row count in final result set then you should apply the Right Join on basis Unique health Id if it is available in all three sources.

           

          EPI Data----Left Join on Unique health code------Geography-----Left join join the Primary Key------Somalia Data.

           

          It might works but still not sure how your data is structured.

          • 2. Re: What is the best type of JOIN I Can Use for my data sources?
            Mohamed Hersi

            Dear Ombrid,

             

            Below are the column names for File 1 (Excel file) which my main data source and File 2 (.csv file) which has the latitudes and longitudes.

             

             

            File 1- Excel file:

            File 1 Columns - Excel File.PNG

            The above file (Excel file) has additional columns (numeric field), so I omitted them from the screen shot since they are not needed for this question.

             

             

            File 2- CSV file

            File 2 Colums- CSV file.PNG

            Although Health Facility Code is missing from File 2 (.csv file), it is easy to add, what difference will that bring?

             

             

            Question 1 on Shape file- is there a way I can add Health Facility Codes to the .shp file which I just downloaded from the internet (do I have that authority)?

            Question 2 on Shape file- How can I find the way the .shp file is structures e.g. columns names, number of columns, and data in its columns?

             

             

            As for the Primary Key- each heath facility has a unique code, but since they are repeated over 12 time (i.e. 12 months), they are no longer primary keys. But these codes can be used as Primary Keys used only in File 2 (.csv file) since there is no monthly repetition in this file.

             

             

            Regards

            Mohamed

            • 3. Re: What is the best type of JOIN I Can Use for my data sources?
              Ombir Rathee

              Mohamed Hersi wrote:

               

              Dear Ombir,

               

              Below are the column names for File 1 (Excel file) which my main data source and File 2 (.csv file) which has the latitudes and longitudes.

               

               

              File 1- Excel file:

              File 1 Columns - Excel File.PNG

              The above file (Excel file) has additional columns (numeric field), so I omitted them from the screen shot since they are not needed for this question.

               

               

              File 2- CSV file

              File 2 Colums- CSV file.PNG

              Although Health Facility Code is missing from File 2 (.csv file), it is easy to add, what difference will that bring?

               

              ~

              You can left join File1 with File 2 on Health Facility. If each health facility has unique code then there is no need to add it. It will return the expected number of rows.

              ~

               

              Question 1 on Shape file- is there a way I can add Health Facility Codes to the .shp file which I just downloaded from the internet (do I have that authority)?

               

              ~

              You can modify the content with Arcgis but I don't have any experience with that. Why you're using shape file. What is the need in this scenario ?

              ~

               

              Question 2 on Shape file- How can I find the way the .shp file is structures e.g. columns names, number of columns, and data in its columns?

               

              ~

              Open the file directly in tableau using Spatial file connection. Then you can see the contents in data source page.

              ~

               

              As for the Primary Key- each heath facility has a unique code, but since they are repeated over 12 time (i.e. 12 months), they are no longer primary keys. But these codes can be used as Primary Keys used only in File 2 (.csv file) since there is no monthly repetition in this file.

               

               

              Regards

              Mohamed

              • 4. Re: What is the best type of JOIN I Can Use for my data sources?
                Mohamed Hersi

                I Ombir,

                 

                 

                You can left join File1 with File 2 on Health Facility. If each health facility has unique code then there is no need to add it. It will return the expected number of rows.

                If my health facilities are repeating 12 in the year with their facility codes, can you call such codes as 'unique'? (this is true in the case if File1)?

                 

                 

                You can modify the content with Arcgis but I don't have any experience with that. Why you're using shape file. What is the need in this scenario ?

                I needed the .shp file as Tableau could not recognize my districts. Tableau has kept most of my districts as Ambigious, hence I was advised to use .shp file for Somalia.

                 

                 

                Regards

                Mohamed

                • 5. Re: What is the best type of JOIN I Can Use for my data sources?
                  Ombir Rathee

                  Mohamed Hersi wrote:

                   

                  I Ombir,

                   

                   

                  You can left join File1 with File 2 on Health Facility. If each health facility has unique code then there is no need to add it. It will return the expected number of rows.

                  If my health facilities are repeating 12 in the year with their facility codes, can you call such codes as 'unique'? (this is true in the case if File1)?

                   

                  It doesn't matter if it is repeating 12 times. As long as you have non repeating code/names in Geography file, Left join will return the exact number of rows as in Left table i.e EPI Data.

                   

                   

                  You can modify the content with Arcgis but I don't have any experience with that. Why you're using shape file. What is the need in this scenario ?

                  I needed the .shp file as Tableau could not recognize my districts. Tableau has kept most of my districts as Ambigious, hence I was advised to use .shp file for Somalia.

                   

                  May be you need to select the country to remove the ambigious error. Can't exactly say without looking at all three data files. You can also explore the Custom Geocoding option instead of using shape file as you've Lat/Lon available for health facility.

                   

                   

                  Regards

                  Mohamed