2 Replies Latest reply on Sep 20, 2018 1:42 AM by Philip King

    When the first row is data and not a header

    Philip King

      Hi,

       

      I have just started using Prep, been through the "Getting Started" videos and am pretty confident it will do most of what I need it to.

       

      My main requirement is to clean UK Heathcare (NHS) published data and create csv files for loading into an online Big Data solution. The standard way for organisation to publish data is via a multi-sheet Excel file - all perfectly do-able in Prep.  Most of the workbooks are published monthly but the month the data applies to is at the top of the worksheet, along with other information about the data below.  Here's a typical layout:

       

      RTTdata.PNG

       

      Actual data is here: https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/09/Incomplete-Provider-Jul18-XLS-8148K-36078.xls

       

      I discovered that Data Interpreter does an excellent job of detecting this as 2 distinct blocks of data:

       

      • B5:C11 which contains contextual information about the data in the main data area
      • B14: BL3674 the main data area I want to clean then output

       

      What I need to be able to do is grab the "Period" of July 2018 and add it as a column in the main data area like this:

      • 2018-07-01,Y54,ROA,MANCHESTER UNIVERSITY..... etc

       

      Is there any way that a constant could be defined from the B5:C11 for use in the main data table, given that it is pretty structured, perhaps using pivot?  The problem with pivot when I tried it was that it saw "Period:" and "July 2018" as the header row.

       

      I'd be grateful of any advice!

       

      Cheers,

       

      Phil