Skip navigation

Prep Rows to Columns: dynamically add and/or remove pivoted values based on the current data

score 4
You have not voted. Active

The Rows to Columns feature (aka unpivot/crosstab) introduced in 2019.1 has a limitation in that if there are new values added to the field used by the Pivot Column they are not automatically included or excluded from the output. (I most recently tested this in v2019.4.2). I'd like this to be changed so that way the Rows to Columns uses *all* the values in the current data set as of when the flow is run.


Here's an example of the current behavior: I've got a simple data source and I want to pivot two values, this works as expected. The screenshot below shows the Clean & Pivot steps:




Then at some later time the data source updates and now there's a third value to be pivoted to columns. The Rows to Columns does not even display that value nor does it pivot it as desired - the value effectively disappears from the Pivot tool and all downstream data. The screenshot below shows the Clean step with the additional value and the Pivot step that doesn't pick up the value:




I've attached two packaged flows representing the start/end conditions that led to these screenshots.


In the situation where a value is removed from the data then Prep's pivot of Rows to Columns *keeps* the prior pivot configuration even thought that value doesn't exist anymore in the data. In this screenshot I've removed value b from the input data and Prep is still showing it in the Pivoted Fields pane and as a new column (and imputing a null value):


Screen Shot 2019-12-18 at 12.39.52 PM.png


The behavior that I'm expecting is that at each time the flow is run Prep will pivot all the columns presently in the data (rather than based on when the Pivot step was configured). So if values are added then new columns are added to the output, if values are removed then those are removed from the output.


The only workaround at this time (as of 2019.4.2) is to 1) delete the Pivot step, 2) add a new Pivot step, and 3) configure that Pivot step with the data. This has to be done every time the number of values to pivot changes.


Here are a couple of use cases for this feature:


a) We want to perform a set of the same calculations across a number of fields in the original data, for example we have a set of HIV metrics and need to do data validation on multiple fields. Since Prep doesn't have a built-in way to do the same calculation across multiple fields a common pattern is to pivot the columns to rows, do the operation on all the rows, then once that is complete pivot rows back to columns. If a new metric is added then it's not picked up by the Rows to Columns. Note that the Script tool isn't a workaround for this scenario because it also (as of v2019.4.2) doesn't automatically update the output columns.


b) Prep is being used to prepare a data set (either CSV or Hyper) that is passed to a downstream source and that requires data in a "wide" format. For example having an export with a column for each month, where each month that month's column is added to the export. Another example for us is one where each column is a health care facility and the number of facilities changes over time as facilities are opened or closed.


The lack of this functionality prevents us from being able to recommend Prep as a tool for a number of users since use case a) is a really common one for us.


Vote history