1 Reply Latest reply on Apr 28, 2015 6:28 AM by Patrice Bellan

    Python API: Sample code performance

    Patrice Bellan



      A while ago I started using the Python API to create TDE (Tableau Data Extract) files from CSV files, and I wanted to share some findings with beginners.


      My experience with Python was limited, so I relied heavily on the sample code that can be found here:

      tableau-dataextract-api/csv2tde.py at master · lc0/tableau-dataextract-api · GitHub


      Since my datetime fields all used the same format, I reused a simplified version of setDateTime:

      import datetime

      def setDateTime(row, colNo, value):
          # Parse e.g. '2015-04-28T06:28:05Z'; 0 is passed as the fractional-second argument
          d = datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
          row.setDateTime(colNo, d.year, d.month, d.day, d.hour, d.minute, d.second, 0)


      As my Python knowledge grew, I started profiling the code, realized that strptime was too slow, and switched to slicing the date strings:

      def setDateTime(row, colNo, value):
          # Fixed character positions in 'YYYY-MM-DDTHH:MM:SSZ'
          row.setDateTime(colNo, int(value[0:4]), int(value[5:7]), int(value[8:10]),
                          int(value[11:13]), int(value[14:16]), int(value[17:19]), 0)
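Slicing only works because every value has exactly the layout YYYY-MM-DDTHH:MM:SSZ. A minimal sketch, assuming that fixed format, that checks the slice positions against strptime on a sample value; parse_strptime and parse_sliced are hypothetical helper names that return the same seven values the post passes to row.setDateTime:

```python
import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"

def parse_strptime(value):
    # Reference parser: the approach used in the sample code
    d = datetime.datetime.strptime(value, FMT)
    return (d.year, d.month, d.day, d.hour, d.minute, d.second, 0)

def parse_sliced(value):
    # Fixed positions in 'YYYY-MM-DDTHH:MM:SSZ':
    #   [0:4] year, [5:7] month, [8:10] day,
    #   [11:13] hour, [14:16] minute, [17:19] second
    return (int(value[0:4]), int(value[5:7]), int(value[8:10]),
            int(value[11:13]), int(value[14:16]), int(value[17:19]), 0)

sample = "2015-04-28T06:28:05Z"
assert parse_sliced(sample) == parse_strptime(sample)
print(parse_sliced(sample))  # (2015, 4, 28, 6, 28, 5, 0)
```

Unlike strptime, the sliced version never checks the separators or the trailing Z, so it will happily accept values that strptime would reject.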


      After some more profiling fun, I decided to test something a bit less obvious.

      I created a dict that maps two-digit strings to their int values, and used lookups instead of calling int():

      # Maps every two-digit string '00'..'99' to its int value
      d = {'%02d' % i: i for i in range(100)}

      def setDateTime(row, colNo, value):
          # The year is rebuilt from two two-digit lookups, e.g. '20' and '15' -> 2015
          row.setDateTime(colNo, 100*d[value[0:2]] + d[value[2:4]], d[value[5:7]],
                          d[value[8:10]], d[value[11:13]], d[value[14:16]], d[value[17:19]], 0)
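Because the year is assembled from two lookups, it is worth checking that the lookup version returns exactly what the int() version returns. A minimal sketch, assuming the fixed YYYY-MM-DDTHH:MM:SSZ format; LOOKUP, parse_int and parse_lookup are hypothetical names, and the table is built with a comprehension rather than typed out by hand:

```python
import datetime

# Every two-digit string '00'..'99' mapped to its integer value
LOOKUP = {'%02d' % i: i for i in range(100)}

def parse_int(value):
    # Slice + int() version
    return (int(value[0:4]), int(value[5:7]), int(value[8:10]),
            int(value[11:13]), int(value[14:16]), int(value[17:19]), 0)

def parse_lookup(value):
    # Slice + dict version: the 4-digit year is split into two lookups
    d = LOOKUP
    return (100 * d[value[0:2]] + d[value[2:4]], d[value[5:7]], d[value[8:10]],
            d[value[11:13]], d[value[14:16]], d[value[17:19]], 0)

# Compare both parsers on 10,000 timestamps spread over roughly 2.5 years;
# an odd step in seconds makes every field vary
ts = datetime.datetime(2014, 12, 31, 23, 59, 59)
step = datetime.timedelta(seconds=7919)
for _ in range(10000):
    s = ts.strftime("%Y-%m-%dT%H:%M:%SZ")
    assert parse_lookup(s) == parse_int(s), s
    ts += step
print("all match")
```

One behavioral difference: a malformed field now raises KeyError instead of the ValueError that int() would raise.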


      Results when looping 10 000 000 times on each function:

      slice + int()    25.619    26.225
      slice + dict      7.941     8.545
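To reproduce this kind of comparison without the Tableau SDK installed, the row object can be replaced by a stub. A rough sketch, assuming StubRow (hypothetical, not part of the Tableau API) stands in for the extract row and only records the arguments it receives. Absolute timings will differ by machine and Python version, and the loop count here is far smaller than the 10 000 000 iterations above:

```python
import datetime
import timeit

class StubRow(object):
    # Hypothetical stand-in for an extract row: it just stores the values
    def setDateTime(self, colNo, year, month, day, hour, minute, sec, frac):
        self.last = (colNo, year, month, day, hour, minute, sec, frac)

D = {'%02d' % i: i for i in range(100)}

def set_strptime(row, colNo, value):
    d = datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    row.setDateTime(colNo, d.year, d.month, d.day, d.hour, d.minute, d.second, 0)

def set_slice_int(row, colNo, value):
    row.setDateTime(colNo, int(value[0:4]), int(value[5:7]), int(value[8:10]),
                    int(value[11:13]), int(value[14:16]), int(value[17:19]), 0)

def set_slice_dict(row, colNo, value):
    row.setDateTime(colNo, 100 * D[value[0:2]] + D[value[2:4]], D[value[5:7]],
                    D[value[8:10]], D[value[11:13]], D[value[14:16]],
                    D[value[17:19]], 0)

row = StubRow()
value = "2015-04-28T06:28:05Z"
N = 100000
results = {}
for name, fn in [("strptime", set_strptime),
                 ("slice + int()", set_slice_int),
                 ("slice + dict", set_slice_dict)]:
    # timeit accepts a zero-argument callable as the statement to time
    seconds = timeit.timeit(lambda: fn(row, 0, value), number=N)
    results[name] = seconds
    print("%-14s %.3f s" % (name, seconds))
```

The stub keeps the measurement focused on string parsing; with the real API, the time spent inside setDateTime itself would presumably be added to every variant.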


      The dict version is more than 18 times faster than the sample code (strptime), and roughly three times faster than slice + int(). Not bad!


      If you're relatively new to Python and want to use the sample code, think twice before running it on big files.

      And if you know Python better than I do and have managed to get even better performance, please share; I'm very interested.