Data-warehousing-and-data-mining-10IS74-10CS755


Asked on: 2017-05-17 18:19:14 by: pallaviaithaln

Answers
Imagine that you are a manager at AllElectronics and have been charged with analyzing the company's data with respect to the sales at your branch. You immediately set out to perform this task. You carefully inspect the company's database and data warehouse, identifying and selecting the attributes or dimensions to be included in your analysis, such as item, price, and units sold. Alas! You notice that several of the attributes for various tuples have no recorded value. For your analysis, you would like to include information as to whether each item purchased was advertised as on sale, yet you discover that this information has not been recorded. Furthermore, users of your database system have reported errors, unusual values, and inconsistencies in the data recorded for some transactions.
In other words, the data you wish to analyze by data mining techniques are incomplete (lacking attribute values or certain attributes of interest, or containing only aggregate data), noisy (containing errors, or outlier values that deviate from the expected), and inconsistent (e.g., containing discrepancies in the department codes used to categorize items).
Welcome to the real world! Incomplete, noisy, and inconsistent data are commonplace properties of large real-world databases and data warehouses. Incomplete data can occur for a number of reasons.

Attributes of interest may not always be available, such as customer information for sales transaction data.
Other data may not be included simply because it was not considered important at the time of entry. Relevant data may not be recorded due to a misunderstanding, or because of equipment malfunctions.
Data that were inconsistent with other recorded data may have been deleted.
Furthermore, the recording of the history or modifications to the data may have been overlooked.
Missing data, particularly for tuples with missing values for some attributes, may need to be inferred.
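One simple way to infer a missing value is to substitute the mean of the values that were recorded for the same attribute. The sketch below is only an illustration under that assumption: the `sales` tuples and the function name `fill_missing_with_mean` are made up for this example, and other strategies (a global constant, the most probable value) may suit the data better.

```python
from statistics import mean

def fill_missing_with_mean(rows, attribute):
    """Return copies of rows where a missing (None) value of `attribute`
    is replaced by the mean of the values that were recorded."""
    observed = [r[attribute] for r in rows if r[attribute] is not None]
    if not observed:
        # Nothing to infer from: return unchanged copies.
        return [dict(r) for r in rows]
    fill = mean(observed)
    return [
        {**r, attribute: fill if r[attribute] is None else r[attribute]}
        for r in rows
    ]

# Hypothetical sales tuples; the second one has no recorded price.
sales = [
    {"item": "TV", "price": 400.0, "units_sold": 3},
    {"item": "radio", "price": None, "units_sold": 10},
    {"item": "camera", "price": 200.0, "units_sold": 5},
]
cleaned = fill_missing_with_mean(sales, "price")
```

The original list is left untouched, so the raw data stays available for comparison with the cleaned copy.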
There are many possible reasons for noisy data (having incorrect attribute values). The data collection instruments used may be faulty. There may have been human or computer errors occurring at data entry. Errors in data transmission can also occur. There may be technology limitations, such as limited buffer size for coordinating synchronized data transfer and consumption. Incorrect data may also result from inconsistencies in naming conventions or data codes used, or inconsistent formats for input fields, such as date.
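Inconsistent date formats are a concrete case that is easy to illustrate. The following is a rough sketch, assuming the input uses one of three day-first or ISO layouts; the format list and the name `normalize_date` are assumptions for this example, not something the answer specifies.

```python
from datetime import datetime

# Assumed input layouts (day-first where ambiguous); extend as needed.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")

def normalize_date(text):
    """Return the date as an ISO 8601 string, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # Try the next candidate format.
    return None

raw_dates = ["2015-01-20", "20/01/2015", "20.01.2015", "unknown"]
normalized = [normalize_date(d) for d in raw_dates]
```

Values that match none of the formats come back as `None`, so they can be reviewed by hand instead of being silently guessed.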

Duplicate tuples also require data cleaning. Data cleaning routines work to clean the data by filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving inconsistencies. If users believe the data are dirty, they are unlikely to trust the results of any data mining that has been applied to it. Furthermore, dirty data can cause confusion for the mining procedure, resulting in unreliable output. Although most mining routines have some procedures for dealing with incomplete or noisy data, they are not always robust. Instead, they may concentrate on avoiding overfitting the data to the function being modeled. Therefore, a useful preprocessing step is to run your data through some data cleaning routines.
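The cleaning routines named above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: smoothing is done by equal-frequency bin means, outliers are flagged with the common 1.5 x IQR rule (one possible choice among many), and all function names and sample prices are hypothetical.

```python
from statistics import mean, quantiles

def drop_duplicates(rows):
    """Keep the first occurrence of each tuple, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of `bin_size` values each,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical sorted prices, smoothed in bins of three.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, bin_size=3)

# The same prices with one suspicious entry appended.
suspicious = iqr_outliers(prices + [400])

rows = [{"item": "TV", "price": 400}, {"item": "TV", "price": 400}]
deduplicated = drop_duplicates(rows)
```

Flagged outliers are returned rather than deleted, since whether to remove or correct them usually depends on the analysis.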
Getting back to your task at AllElectronics, suppose that you would.





Answered on: 2015-01-20 Answered by: pallaviaithaln


Data goes through a series of steps during preprocessing:
Data Cleaning: Data is cleaned by filling in missing values, smoothing noisy data, and resolving inconsistencies.
Data Integration: Data with different representations are combined and conflicts within the data are resolved.
Data Transformation: Data is normalized, aggregated, and generalized.
Data Reduction: This step aims to produce a reduced representation of the data in a data warehouse.
Data Discretization: The number of values of a continuous attribute is reduced by dividing the attribute's range into intervals.
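Two of the steps above, normalization (part of transformation) and discretization, are easy to show concretely. The sketch below assumes min-max scaling and equal-width intervals, which are only one option for each step; the function names and sample values are invented for illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values to the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to the lower bound.
        return [new_min for _ in values]
    return [
        new_min + (v - lo) / (hi - lo) * (new_max - new_min)
        for v in values
    ]

def equal_width_bins(values, n_bins):
    """Replace each value with the index (0-based) of its equal-width interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

normalized = min_max_normalize([10, 20, 30])
binned = equal_width_bins([0, 5, 10, 15, 20], n_bins=2)
```

Equal-width intervals are simple but sensitive to outliers, so cleaning the data first tends to give more useful bins.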

Answered on: 2019-06-26 Answered by: avi738
