6 Replies Latest reply on Aug 3, 2013 7:07 AM by Jonathan Drummey

# Filtering rows based on multiple columns value

Hi all,

I would like to an highlight table for month over month transaction counts percent difference, for many different types of transactions. The problem I have is that for certain type of transactions, the count is very low, an can even be 0 for a given month, so I will have a 100% or -100% percent different and my color scale gets unbalanced (20% inscrease could be very high, but it won't stick out if I have an 100% increase for a transaction that went from 0 to 1.)

So for transactions count, the table would look like this:

Jan     Feb      Mar      Apr     May

Transaction A     1000     1020     960     1100     990

Transaction B     0               10     2          0          4

Transaction C    3100     3320     2960     3100    3300

What I have in mind is to do: Filter out any transaction that doesn't have a transactions count of at least 'x' for EVERY month.

In the example above, with x=100, it would filter out Transaction B.

Now, my real dataset has hundreds of those transactions and my report gets updated daily. New transactions can also be added. In short, I don't want to be maintaining it manually.

My current solution is to create as many filters as I want months to show. If I were to do the table for the last 12 completed months, I would write:

Filter transaction types on:

sum(if DATETRUNC('month',PERIOD_DT) = DATEADD('month',-1,datetrunc('week',today())) then 1 else 0 end) >= x

AND

sum(if DATETRUNC('month',PERIOD_DT) = DATEADD('month',-2,datetrunc('week',today())) then 1 else 0 end) >= x

AND

...

AND

sum(if DATETRUNC('month',PERIOD_DT) = DATEADD('month',-12,datetrunc('week',today())) then 1 else 0 end) >= x

This solution has 2 problems:

First, I'm assuming that my data was updated today, which might not always be the case (when using 'today()', what I really want to do is 'the maximum date of my dataset'. I don't know how to do that.) Or maybe I would like to do this exercice for 2012 data. How can I write:

if DATETRUNC('month',PERIOD_DT) = DATEADD('month',-1,datetrunc('week',LATEST DATE FOUND IN 'PERIOD_DT')) then 1 else 0 end

Second, if I were to do it daily, or even weekly, the filtering condition would be very long to type.

Anyone has a solution?

Thanks!!

• ###### 1. Re: Filtering rows based on multiple columns value

Charles,

Two thoughts come to mind:

1. When you say " Filter out any transaction that doesn't have a transactions count of at least 'x' for EVERY month"  it sounds like a good case for the WINDOW_MIN function.  That would allow you to determine the minimum for a transaction across all dates in the window.  You could then filter based on a threshold.  The basic calculation would be

WINDOW_MIN(SUM(1)) > 100

Place that on filters and only include where it is true. I'm assuming SUM(1) gives you the transaction count -- replace that part of the calculation as needed.

2. With the highlight table, you could edit the color legend and manually set the start or end points for the color gradient.  So, for example, you might set the end at 20% if you didn't want it to get lost with the occasional outlying 100% values.

Here's an example where I wanted anything close to 1% to stand out, even though there was one line that had values around 20% to 30%: Regards,

Joshua

• ###### 2. Re: Filtering rows based on multiple columns value

Thanks Joshua! Changing the start/end points of the color gradient is a very good and simple solution. However, if possible, I would still like to filter on the transaction types, as it makes the table more concise. As for table calculation, I don’t think you can use them as filters. When filtering by formula, they are not available (it give the error message ‘Unknown function  called’).

• ###### 3. Re: Filtering rows based on multiple columns value

Charles,

You should be able to use a table calc as a filter -- so there must be something else going on.  If you are able to post a packaged workbook (dummy up the data if needed), I'd be happy to take a look.

Regards,

Joshua

• ###### 4. Re: Filtering rows based on multiple columns value

I'm guessing you're trying to use a top or conditional filter, because that's where by Formula is an option. Top and conditional are applied on dimensions and computed in the data source, whereas table calculations are computed later, inside Tableau. What Josh was suggesting was to directly use the table calculation field as an additional filter on your worksheet.

Jonathan

• ###### 5. Re: Filtering rows based on multiple columns value

Oh, sorry! It’s all good now. Thanks to you both.

One last thing, as it might save me from creating another thread:

How would I do the filter on the months I want to keep? I would like to only keep the months that are completed. So if the last date in my data set is July 20th, I want to exclude July. Let’s not suppose that the last date of my data set is the same as today(). Any solution?

Thanks again.

• ###### 6. Re: Filtering rows based on multiple columns value

1. Add a subquery to your data source that returns the latest date in your data source and adds that as another dimension. Then you can create a Boolean calc like DATETRUNC('month',[data date]) != DATETRUNC('month',[latest data date]) and put that on the Filters Shelf, filtering for true.

2. Create a calculated field that identifies the latest day in the data set, checks to see whether the latest day is the end of the month like built around DAY(DATETRUNC('day',[data date])+1) == 1, then passes that value back to other rows in the last month. The exact calculation depends on what levels of dates are in the view, it might be able to be an aggregate calc, or maybe a table calculation.

I'd recommend option #1 if you can set that up, it's almost certainly going to be faster than #2 and definitely easier to work with inside Tableau. If you want to go for option #2 and need help, please post a packaged workbook with some sample data.

Jonathan