Several systems such as Cloudera Impala are making big advances in fast queries against massive amounts of data. However, the standard Hive SQL interface will expose your queries to the batch nature of Map/Reduce, so you may not get terrific response times if that's your only option. For data that large, even Tableau Data Engine extracts may not give you enough of a performance boost at the level of data granularity you require. Also consider that you may need to pay for a commercial Hadoop distribution if you need better performance than what the standard Apache distribution provides.
+1 on Robert’s comments about Impala. You can download their “Quick Start” virtual machine for testing. I was pleasantly surprised by how quickly I got the solution up and running for personal testing, even though I rate my Linux and Hadoop stack knowledge as “somewhere south of very, very low”. This might be a good fit for you.
I think Tableau Product Design answers your question.
Let's see if I understand the question correctly:
1. You have big data, and it may not be possible for you to bring that data in as an extract.
2. You plan to move to Hadoop to store the data generated by the system. There are connectors to Hadoop, but you are concerned about the performance of queries against it.
3. You would like to keep distinct values available for drill-down if the user requires them.
4. You want ad hoc reporting capability on big data.
The way we try to solve this problem:
1. We build a knowledge matrix from the big data and use it to show counts and aggregate-level data. The knowledge matrix is created by understanding the dimensions and measures in your data; hopefully they are well defined and not dynamic.
2. We configure drill-down from there to fetch the row-level data into a different workbook.
3. We realized there is no magic. We just have to structure the data correctly in a polyglot data-management system composed of a high-speed in-memory database and a slower, larger database. The distribution of data follows the business needs: the daily transaction log is always available for drill-down in the extract, historic queries are answered from the knowledge matrix for reporting, and row-level data is fetched from the Hadoop system on an as-needed basis.
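The steps above can be sketched in a few lines of Python. This is a toy illustration, not our production setup: the `transactions` table, its columns, and the sample rows are all hypothetical, an in-memory SQLite database stands in for the slower Hadoop/Hive tier, and a plain dict stands in for the high-speed in-memory store holding the knowledge matrix.

```python
import sqlite3
from collections import defaultdict

# Hypothetical row-level store standing in for the slow Hadoop/Hive tier.
slow_store = sqlite3.connect(":memory:")
slow_store.execute("CREATE TABLE transactions (day TEXT, region TEXT, amount REAL)")
slow_store.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("2013-01-01", "east", 10.0),
        ("2013-01-01", "west", 20.0),
        ("2013-01-02", "east", 5.0),
    ],
)

# Step 1: build the "knowledge matrix" -- counts and sums keyed by the
# dimension combination -- and keep it in fast memory for aggregate reporting.
knowledge_matrix = defaultdict(lambda: {"count": 0, "total": 0.0})
for day, region, amount in slow_store.execute(
    "SELECT day, region, amount FROM transactions"
):
    cell = knowledge_matrix[(day, region)]
    cell["count"] += 1
    cell["total"] += amount

def report(day, region):
    """Serve aggregate queries from the in-memory knowledge matrix."""
    return knowledge_matrix.get((day, region))

def drill_down(day, region):
    """Step 2: fetch row-level detail from the slow store only on demand."""
    return slow_store.execute(
        "SELECT day, region, amount FROM transactions WHERE day = ? AND region = ?",
        (day, region),
    ).fetchall()

print(report("2013-01-01", "east"))      # → {'count': 1, 'total': 10.0}
print(drill_down("2013-01-01", "east"))  # → [('2013-01-01', 'east', 10.0)]
```

The point of the split is that the cheap, frequent aggregate questions never touch the slow tier; only an explicit drill-down pays the cost of a row-level query.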
Hope this helps.