14 replies. Latest reply on Dec 25, 2017 7:16 PM by Shinichiro Murakami

# Data quality and Data available

Hello everyone,

I want to do two things with the data:

The first objective is:

Calculate the percentage of erroneous measurements out of the total, where the measured variable is the flow per id_sensor. I have data for several years.

An error is any flow value below zero.
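As a rough illustration of the first objective, the Python sketch below counts negative flow readings per sensor and divides by that sensor's total number of readings. It assumes the data can be read as `(id_sensor, flow)` pairs; the function name and the sample values are hypothetical, and a flow of exactly zero is treated as valid because the error condition is "below zero".

```python
from collections import defaultdict


def error_percentage(records):
    """Return {id_sensor: % of readings with flow < 0}.

    `records` is assumed to be an iterable of (id_sensor, flow) pairs.
    """
    totals = defaultdict(int)  # all readings per sensor
    errors = defaultdict(int)  # negative readings per sensor
    for id_sensor, flow in records:
        totals[id_sensor] += 1
        if flow < 0:  # only strictly negative flow counts as an error
            errors[id_sensor] += 1
    return {s: 100.0 * errors[s] / totals[s] for s in totals}


# Hypothetical sample: sensor "A" has 1 negative reading out of 4.
sample = [("A", 12.0), ("A", -1.0), ("A", 8.5), ("A", 0.0), ("B", 3.2)]
print(error_percentage(sample))  # {'A': 25.0, 'B': 0.0}
```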

The second objective is:

Calculate the availability of the data, i.e. the ratio between the observed and the expected data volume.

The sensors collected flow data every 5 minutes from 2012 through 2016, so a complete day has 24 × 60 / 5 = 288 records. I would like to count the days that have all 288 records and the days that have fewer.

The goal is to express data availability as a percentage per day (288 records), per month (288 × 31, 288 × 30, or 288 × 28 or 29 for February) and per year, for a given attribute.
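The expected record counts per day, month and year follow directly from the 5-minute sampling interval. A small Python sketch using the standard `calendar` module (the constant and function names are illustrative) handles the 28/29/30/31-day months and leap years:

```python
import calendar

# One reading every 5 minutes -> 24 * 60 / 5 = 288 readings per day.
READINGS_PER_DAY = 24 * 60 // 5


def expected_records_month(year, month):
    # calendar.monthrange returns (weekday of day 1, number of days in month)
    days = calendar.monthrange(year, month)[1]
    return READINGS_PER_DAY * days


def expected_records_year(year):
    days = 366 if calendar.isleap(year) else 365
    return READINGS_PER_DAY * days


print(READINGS_PER_DAY)                 # 288
print(expected_records_month(2012, 2))  # 8352 (288 * 29, 2012 is a leap year)
print(expected_records_month(2013, 2))  # 8064 (288 * 28)
print(expected_records_year(2012))      # 105408 (288 * 366)
```

Dividing an observed count by one of these values and multiplying by 100 gives the availability percentage for that period.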

The following equations express the record availability coefficient for each parking area:
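Reconstructed from the definitions in the paragraph that follows, the equations would read roughly as below. The symbols O_i (records observed on day i), E (records expected per day, 288 for a 5-minute sensor) and A (the availability coefficient) are names introduced here for illustration only.

```latex
RP_i = \frac{O_i}{E}, \qquad
V_i =
\begin{cases}
  1 & \text{if } RP_i \ge 0.9,\\
  0 & \text{otherwise,}
\end{cases}
\qquad
A = \frac{1}{N} \sum_{i=1}^{N} V_i
```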

In these equations, i is a day, with i ∈ {1, …, N}, where N is the number of days in the dataset (in this case 366). V_i is a boolean variable that considers day i valid if RP_i ≥ 0.9. RP_i is the record percentage: the ratio between the number of records observed on day i and the expected number of records. Finally, the availability is the ratio between the number of days considered valid (with a record percentage of at least 0.9) and the total number of days, N. In short, the availability coefficient equals the number of days with a record percentage of at least 0.9 divided by the total number of days, 366.

To choose how to consider a specific day valid (with a high record percentage), the record percentage was calculated for each parking area. Results showed that all parking areas had an average record percentage between 70% and 90% for most days, but hardly any parking area had more than 90% on any day. For this reason, it became clear that demanding a record percentage higher than 90% would render all parking facilities invalid, and the value of 0.9 was chosen as the required record percentage.

Figure 3.3 shows the availability coefficient for each parking area. As shown in the figure, most parking areas have a coefficient higher than 0.8, meaning that the majority of parking areas have more than 80% of their days with high record availability.
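A minimal Python sketch of the same coefficient, assuming the data has already been reduced to one record count per day for a single sensor. The function name and argument layout are illustrative, not taken from the original workbook; note that days with no records at all must still appear as 0 so that the list length equals N.

```python
def availability(daily_counts, expected_per_day=288, threshold=0.9):
    """Share of days whose record percentage RP_i reaches the threshold.

    daily_counts: observed number of records for each day i, one entry per
    day (including 0 for days without data), so len(daily_counts) == N.
    """
    n = len(daily_counts)
    if n == 0:
        raise ValueError("need at least one day")
    # V_i = 1 when RP_i = count / expected_per_day >= threshold
    valid = sum(1 for count in daily_counts if count / expected_per_day >= threshold)
    return valid / n


# Hypothetical 4-day example: RP_i is about 1.0, 0.90, 0.5, 0.0 -> 2 valid days of 4
print(availability([288, 260, 144, 0]))  # 0.5
```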

I reduced the data and attached it here.

• ###### 1. Re: Data quality and Data available

Hi Sergio,

For #1, I don't think there are any such cases (negative flow), and I am not sure exactly what you want.

For #2, I also don't understand which fields represent what.

Somehow you have already deleted the past communication, and I am totally lost as to what you want.

I don't think your scenario works by reducing the number of unique IDs.

It's better to filter by dates.

And deleting your past communication is really not a good idea.

Thanks,

Shin

• ###### 2. Re: Data quality and Data available

Sorry, everyone.

I'm new at this.

I've never worked with big data, much less with Tableau. Thank you very much for the posts and answers, which helped me a lot in understanding some concepts.

At the beginning, my database was huge and it was difficult for me to manage it. I want to thank Shinichiro Murakami in particular, who helped me a lot.

My final post is this one: Data quality and Data available

In other posts, I tried to attach the database and change the structure of the question, but it always gave an error. If possible, I will close the others.

Thanks, everyone.

• ###### 3. Re: Data quality and Data available

Sergio,

Are you able to recover the past few days' conversation?

I have totally lost track of what I did and where you still have a problem.

Could you please clarify the "remaining" problem again?

And as I suggested, make sure the reduced data still replicates your issue.

Shin

• ###### 4. Re: Data quality and Data available

I cannot restore the previous posts. Sorry for that, but the file I have posted contains what you have done.

But briefly:

1. A calculated field sums the number of records per day, using an INCLUDE on the date of [Date Time]:

`{include date([Date Time]):sum([Número de registros])}`

2. Flag the days where [1st] >= 288 (288 = the expected number of records per day) and use that flag as a grouping.

Like this, filtering only by year and other variables:

Now I can count the number of 1s (the number of days with 288 records) per month and per year.
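Outside Tableau, the two steps above can be sketched in Python for one sensor: count records per calendar day (the role of the INCLUDE expression), then count per (year, month) the days that reach 288 records. The input format (a list of `datetime` timestamps) and the function name are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

EXPECTED_PER_DAY = 288


def full_days_per_month(timestamps):
    # Step 1: number of records on each calendar day
    per_day = Counter(ts.date() for ts in timestamps)
    # Step 2: days with at least 288 records, counted per (year, month)
    full = Counter(
        (day.year, day.month)
        for day, n in per_day.items()
        if n >= EXPECTED_PER_DAY
    )
    return per_day, full


# Hypothetical data: 1 Jan 2012 complete (288 readings), 2 Jan 2012 partial (100).
start = datetime(2012, 1, 1)
ts = [start + timedelta(minutes=5 * k) for k in range(288)]
ts += [datetime(2012, 1, 2) + timedelta(minutes=5 * k) for k in range(100)]
per_day, full = full_days_per_month(ts)
print(full)  # Counter({(2012, 1): 1})
```

Months with no complete day simply do not appear in `full`; looking them up returns 0, as `Counter` does for missing keys.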

What I want now is to divide this count by the number of days. For example, in January 2012 there were 10 days with 288 records, so I want to divide by 31 days and express the result as a percentage, and likewise divide by the number of days in each year (365, or 366 in a leap year such as 2012).
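The arithmetic being asked for is the count of complete days divided by the number of days in the period. A short Python sketch (function names are hypothetical) that takes month lengths and leap years from the standard `calendar` module:

```python
import calendar


def monthly_pct(full_days, year, month):
    days_in_month = calendar.monthrange(year, month)[1]
    return 100.0 * full_days / days_in_month


def yearly_pct(full_days, year):
    days_in_year = 366 if calendar.isleap(year) else 365
    return 100.0 * full_days / days_in_year


# January 2012: 10 complete days out of 31
print(round(monthly_pct(10, 2012, 1), 2))  # 32.26
# Hypothetical: 300 complete days in 2012 (a leap year, 366 days)
print(round(yearly_pct(300, 2012), 2))     # 81.97
```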

• ###### 5. Re: Data quality and Data available

Hi Sergio,

Hope this helps.

Regards,

Shin

• ###### 6. Re: Data quality and Data available

Thanks a lot, Shinichiro Murakami, you are the man :)

I have one question: highlighted in yellow are the sum of the number of records (first column) and the sum of the days (second column).

In the Disponibilidade field I calculate no. of records / no. of days.

In the last field (the grand total), I want to get 533351/1858. Is that possible?
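One way to read this question: the grand total of a ratio should be the ratio of the summed columns (total records / total days), which generally differs from adding or averaging the per-sensor ratios. The Python sketch below shows the difference on hypothetical rows; the `(records, days)` layout and function names are assumptions, and it says nothing about how a particular Tableau view computes its totals.

```python
def ratio_of_sums(rows):
    # Grand total recomputed from the summed columns
    total_records = sum(records for records, _ in rows)
    total_days = sum(days for _, days in rows)
    return total_records / total_days


def mean_of_ratios(rows):
    # Average of the per-row ratios (usually NOT what a grand total should show)
    return sum(records / days for records, days in rows) / len(rows)


rows = [(288, 1), (100, 2)]  # hypothetical (records, days) per sensor
print(round(ratio_of_sums(rows), 2))  # 129.33 (388 / 3)
print(mean_of_ratios(rows))           # 169.0  ((288 + 50) / 2)
print(round(533351 / 1858, 2))        # 287.06, the grand total asked about
```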

• ###### 7. Re: Data quality and Data available

It's quite troublesome to handle the grand total, and I cannot understand what you have in the calculated fields without seeing your data.

Please attach a packaged workbook with reduced data.

Shin

• ###### 8. Re: Data quality and Data available

This is my workbook.

Thank you for helping me.

• ###### 9. Re: Data quality and Data available

Hi Sergio,

This may help.

Please mark my answer as correct to close the thread, not from the inbox view but from the original post.

For further questions, please create a new post.

Thanks,

Shin

• ###### 10. Re: Data quality and Data available

Hi Sergio,

Could you mark one of my answers as correct to close the thread?

Not from the inbox view, but from the original post.

Regards,

Shin

• ###### 11. Re: Data quality and Data available

Is it possible to use the sensor ID and the availability value (both highlighted in yellow) to make a bar chart?

• ###### 12. Re: Data quality and Data available

Hi Sergio,

I'd prefer not to receive questions one after another.

If you still have a couple of needs, I recommend creating a new post.

Shin

• ###### 13. Re: Data quality and Data available

Sorry, Shinichiro Murakami,

This is my final question.

Is it possible to use the sensor ID and the availability value (both highlighted in yellow) to make a bar chart?

• ###### 14. Re: Data quality and Data available

Simply put "Sensor ID" on the Columns shelf?

Thanks,

Shin
