13 Replies Latest reply on Jan 14, 2020 4:52 PM by Dan Cory

    Creating an extract from MarkLogic not utilizing Hyper

    Aaron Newberg

      Hi,

       

      I'm with MarkLogic, and a customer of ours is experiencing difficulty creating an extract in Tableau. They have a database with 40M rows, and when they attempt the extract:

      Tableau fills 12GB of memory on the client computer, begins to fill up swap space, and then stops communicating with the ODBC app server on the MarkLogic cluster.

      We dug into this issue a bit, were able to reproduce, and parsed the logs with Tableau Log Viewer.

       

      Tableau does not use Hyper when we try to create an extract from MarkLogic; it seems to fall back to the pre-Hyper extract method from Tableau 10.4. The .hyper file is created, but the logs show that the whole result set is being held in memory all at once, filling up swap space and draining all the client machine's memory.

      We’ve tested and seen that Hyper is used for IBM DB2 even using the generic ODBC connector, as the logs show 16MB chunks of data (hyper-send-chunk entries) being received into Tableau every 100ms or so. It seems that Hyper would solve our customer’s problem.

       

      Attached are our logs. Are there any specific database or ODBC driver features that we need to implement in order to utilize Hyper? Thanks in advance for any insight.

        • 1. Re: MarkLogic not utilizing Hyper when creating an extract
          Jeff D

          Hi Aaron, can you point to the log entries that show the result set is being held in memory?

          • 2. Re: MarkLogic not utilizing Hyper when creating an extract
            Aaron Newberg

            Hi Jeff,

            I see a single begin-protocol.query entry in the Tableau logs. From the MarkLogic logs, I see the data has finished being sent to the client while the client's memory is still gradually filling up.

            We want to utilize Hyper and see the hyper-send-chunk entries that we are seeing from our DB2 tests.

            We have tried changing customizations in our .tdc file, but that did not enable the hyper feature.

            Is there a certain protocol message that Tableau looks for that turns on Hyper? Do we need to implement a certain database feature, e.g. a blocking cursor?

             

            Thanks,

            Aaron

            • 3. Re: MarkLogic not utilizing Hyper when creating an extract
              Dan Cory

              This is a connectivity layer problem, not a Hyper problem. And it's probably something to do with your ODBC driver or its configuration. I would create an ODBC trace and see what calls Tableau is making.
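              In case it helps, on Windows an ODBC trace can be turned on from the Tracing tab of the ODBC Data Source Administrator; on Linux with unixODBC it's a couple of lines in the [ODBC] section of odbcinst.ini (the trace file path below is just an example):

```ini
[ODBC]
Trace     = Yes
TraceFile = /tmp/odbc.trace
```

              Remember to turn tracing off afterwards; it slows every ODBC call down considerably.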

               

              Reading through your logs a few times, I think what's happening is that Tableau is trying to read metadata for the query. Tableau has several strategies to read metadata, but some of the code paths can wind up executing the entire query.

               

              The ODBC trace will tell you what Tableau is doing and we can probably figure it out from there.

               

              Dan

              • 4. Re: MarkLogic not utilizing Hyper when creating an extract
                Dan Cory

                Also, I'm not quite sure why you aren't using the built-in MarkLogic connector, but I know there have been problems with newer versions of MarkLogic, so you may be stuck there.

                 

                But reviewing our MarkLogic connector code, we set several capability settings. You probably need a TDC file to control your ODBC driver as well. See Customize and Tune ODBC Connections - Tableau if you are just learning about TDC files.

                 

                Particular settings that are likely relevant:

                CAP_ODBC_METADATA_SUPPRESS_SELECT_STAR yes

                CAP_QUERY_WHERE_FALSE_METADATA yes

                 

                I suspect the first one is causing the problem, although I would think we'd see a message in the log about it.
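                A minimal TDC carrying those two settings might look roughly like this. The vendor and driver names below are placeholders, not the real values; they have to match exactly what your ODBC driver reports, or the TDC will be read but not matched:

```xml
<?xml version='1.0' encoding='utf-8' ?>
<connection-customization class='genericodbc' enabled='true' version='10.0'>
  <vendor name='genericodbc' />
  <driver name='MarkLogic' />
  <customizations>
    <customization name='CAP_ODBC_METADATA_SUPPRESS_SELECT_STAR' value='yes' />
    <customization name='CAP_QUERY_WHERE_FALSE_METADATA' value='yes' />
  </customizations>
</connection-customization>
```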

                 

                You'll also want to look at the Tableau Connector SDK

                 

                Dan

                • 5. Re: MarkLogic not utilizing Hyper when creating an extract
                  Aaron Newberg

                  Hi Dan,

                  Thanks for the insight.

                  Attached is the ODBC trace. Hopefully you can spot something that can give us a clue; I can't.

                   I tried different combinations of customizations in the .tds file, but the changes do not enable Hyper for me. I then copied the customization list from the .tds with which DB2 extracted using Hyper, and that also did not work.

                   

                  The MarkLogic connector currently in Tableau is not compatible with recent versions of MarkLogic, as we have rewritten the SQL engine in MarkLogic 9.

                  If we want to rewrite our connector, would it be advisable to use the instructions you linked? Tableau Connector SDK

                   

                  Aaron

                  • 6. Re: MarkLogic not utilizing Hyper when creating an extract
                    Dan Cory

                     You will want to use the connector SDK. It basically wraps your ODBC (or JDBC) driver. What we learn from this discussion will carry over there.

                     

                    I looked at the SQL.LOG. Unfortunately it's cut off in the middle of a SQLExecute, so I'm not quite sure what's happening. Did the process actually crash?

                     

                    It vaguely looks like reading metadata with SQLPrepareW is working correctly. One useful test would be to press the "Update Now" button in the data grid so that the top 1000 rows get shown before you create the extract. You did this in the logs you attached earlier, but not in this ODBC trace (as far as I can tell). That way I can see a successful query as well as an unsuccessful query from the ODBC trace.

                     

                    It would be great for you to attach the TDC file (not TDS, that's a different Tableau file) that you are testing with as well. The logs you attached earlier did not include a reference to a TDC file. We should probably also get you the TDC settings used in the MarkLogic 7.0/8.0 connector.

                     

                    Dan

                    • 7. Re: Creating an extract from MarkLogic not utilizing Hyper
                      Dan Cory

                       I built a TDC file for you based on the settings in our connector. That might help you get your TDC file fixed up.

                       

                      Looking over some previous MarkLogic notes, the particular settings to look at are:

                      CAP_QUERY_WHERE_FALSE_METADATA - our connector sets this to no, but I think now it needs to be yes.

                       CAP_QUERY_TOP_0_METADATA - our connector sets this to yes, but I think now it needs to be no.
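                       For context, these two capabilities control which zero-row query Tableau issues to read column metadata without fetching data. Roughly, and using a table name from your tests as an example:

```sql
-- CAP_QUERY_WHERE_FALSE_METADATA: metadata via an always-false predicate
SELECT * FROM lineitem WHERE 1 = 0;

-- CAP_QUERY_TOP_0_METADATA: metadata via a zero-row TOP query
SELECT TOP 0 * FROM lineitem;
```

                       If the database mishandles one of these forms, Tableau can end up running the full query just to learn the column types.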

                      • 8. Re: MarkLogic not utilizing Hyper when creating an extract
                        Aaron Newberg

                        Hi Dan,

                         

                         Thanks for the sample TDC. I tried it but am still hitting a memory bottleneck, and Tableau isn't receiving data in chunks. I really think our (MarkLogic's) ODBC driver is missing the capability to send data this way; neither our driver nor our server appears to have code to support it. What kind of protocol/messages does Hyper expect from the ODBC driver?

                         

                         

                        I attached the SQL.LOG and Tableau logs that show usage of the TDC. I cancelled the query early, as memory was full.

                         

                        Thanks,

                        Aaron

                        • 9. Re: MarkLogic not utilizing Hyper when creating an extract
                          Dan Cory

                          Your logs show that while the TDC was 'found' (read off disk) it wasn't 'matched' (used for the connection). I looked over the settings again and it looks like I was missing a space in the driver name. Try the attached. But the problem I was targeting with the TDC settings may not be the right one.

                           

                           The SQL.LOG you attached includes the reading of the 1000 records. It shows that Tableau sets SQL_ATTR_ROW_ARRAY_SIZE to 30 and then calls SQLFetch 34 times to read 1000 records. That all looks good: Tableau is correctly reading data in chunks.
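                           As a sanity check on those numbers (trivial arithmetic, not Tableau code):

```python
import math

# Block-fetch arithmetic from the trace: Tableau binds 30 rows per
# SQLFetch call, so reading 1000 records takes ceil(1000 / 30) calls.
rows, block_size = 1000, 30
fetch_calls = math.ceil(rows / block_size)
print(fetch_calls)  # 34
```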

                           

                          My best guess is that the problem is completely in your driver. When Tableau goes to read the first chunk, the driver is simply reading everything. With 1000 records, this doesn't take much memory, so we don't notice. But when we try to create the extract, that's too much.

                           

                          I recently encountered something somewhat similar in another vendor's driver. In their case, setting CAP_ODBC_FORCE_SINGLE_ROW_BINDING to yes helped. This forced the records to be read one at a time. This is overall slower than reading rows in bulk, but worked around the bug in the driver. This may work for you as well.

                           

                          It also appears your driver has an internal tracing setting. I'd try that next.

                           

                          Dan

                          • 10. Re: Creating an extract from MarkLogic not utilizing Hyper
                            Dan Cory

                            It also struck me as a bit odd that Tableau was trying to read only 30 rows in a block. That's pretty small.

                             

                            It turns out Tableau thinks your lineitem table is really wide. This could be how the table was defined, limitations of MarkLogic, or issues in the driver. It's hard to say without having the lineitem definition you used to compare to.

                             

                             Fields like L_DISCOUNT seem to be in numeric format and so are being read as text at 47 characters each. There are 4 fields like this.

                            Fields like L_LINESTATUS, which are one character fields in the TPCH standard, are being defined as 255 character wide text fields. There are 4 fields like this.

                            Fields like L_ORDERKEY, which are numeric fields in the TPCH standard, are being defined as 255 character wide text fields. There are 3 fields like this.

                            These numbers are all in the Tableau logs.

                             

                            This all makes for a total octet width of 4314, which is a very wide row, so Tableau only reads a few at a time.

                             

                            As I mentioned, this could be because of how you defined the table in MarkLogic, or it could be a driver problem. And I'm not sure if your customer is having the same problem.

                             

                            Dan

                            • 11. Re: Creating an extract from MarkLogic not utilizing Hyper
                              Dan Cory

                              I tried out our MarkLogic server (running 8.0 I think) with the latest MarkLogic ODBC driver. Turning on the ODBC driver's logging, I saw the following statements:

                              qresult.c[QR_prepare_for_tupledata]817: REALLOC: old_count = 0, size = 0

                              qresult.c[QR_prepare_for_tupledata]817: REALLOC: old_count = 100, size = 46400

                              qresult.c[QR_prepare_for_tupledata]817: REALLOC: old_count = 200, size = 92800

                              qresult.c[QR_prepare_for_tupledata]817: REALLOC: old_count = 400, size = 185600

                              qresult.c[QR_prepare_for_tupledata]817: REALLOC: old_count = 800, size = 371200

                              qresult.c[QR_prepare_for_tupledata]817: REALLOC: old_count = 1600, size = 742400

                              qresult.c[QR_prepare_for_tupledata]817: REALLOC: old_count = 3200, size = 1484800

                               So it's not using a cursor at all; instead it allocates space for all the records. For small data sets this isn't noticeable, but you run out of memory on large data sets.
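                               The growth in those log lines is easy to model: the tuple cache starts at 100 rows and doubles, at 464 bytes per row (46400 / 100 from the log). A quick sketch of what that implies for a 40M-row extract; this is a reconstruction from the log, not the driver's actual code, and the per-row cost will differ for your customer's table:

```python
def cache_bytes(rows, start=100, bytes_per_row=464):
    """Model the REALLOC doubling seen in qresult.c: capacity doubles
    from `start` until it covers every row, so the whole result set
    is held in memory at once instead of one fetch block at a time."""
    capacity = start
    while capacity < rows:
        capacity *= 2
    return capacity * bytes_per_row

print(cache_bytes(3200))              # 1484800, matching the last log line
print(cache_bytes(40_000_000) / 1e9)  # roughly 24 GB for the full result set
```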

                               

                              I'm not sure _why_ it's not using a cursor. But you'll have to have someone who works on the ODBC driver look at that.

                               

                              If we knew why, there may be a way to work around it with a TDC.

                               

                              Dan

                              • 12. Re: Creating an extract from MarkLogic not utilizing Hyper
                                Aaron Newberg

                                Hi Dan,

                                 

                                 Thanks very much for your ongoing investigation. I'm not convinced our ODBC driver supports cursors. We forked from Postgres's ODBC driver, and if I'm not mistaken, it uses DECLARE/FETCH SQL syntax for its cursors. If so, cursor support lives at the server level rather than in the ODBC driver itself. MarkLogic currently does not support this syntax.
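                                 For reference, the server-side cursor pattern I believe the Postgres driver emits looks like this (Postgres syntax; the cursor name and fetch size are arbitrary):

```sql
BEGIN;
DECLARE extract_cur CURSOR FOR SELECT * FROM lineitem;
FETCH FORWARD 100 FROM extract_cur;   -- repeated until no rows remain
CLOSE extract_cur;
COMMIT;
```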

                                 Do you have someone at Postgres who can confirm this is the case? We might have to either support the DECLARE/FETCH cursor syntax on our server or fix it at the ODBC level.

                                 

                                Aaron

                                • 13. Re: Creating an extract from MarkLogic not utilizing Hyper
                                  Dan Cory

                                  Since Postgres is an open source project, we don't have any special connections with them. You can ask for help in the Postgres ODBC mailing list: https://www.postgresql.org/list/pgsql-odbc/

                                   

                                  Since you've already forked the code, presumably you could change the driver to still expect paginated results and have the server return them. I'm not sure how much work that would be.

                                   

                                   I don't think there's much else I can do to help at this point. If there's a way to get the driver to download only some records at a time, I'm happy to help figure out how to get Tableau on that path.

                                   

                                  Dan