I’m working on a computer science discussion question and need an explanation to help me understand better.
An attribute is a property or characteristic of an object that can vary, either from one object to another or from one time to another (Tan, P.-N. et al., 2013). Data can have diverse formats and can be stored using a variety of different storage modes. At the most elementary level, a single unit of information is a value of a feature/attribute, where each attribute can take a number of different values. The objects, described by attributes, are combined to form data sets, which in turn are stored as flat (rectangular) files and in other formats using databases and data warehouses (Krzysztof et al., 2020).
As described in our textbook, there are four types of attributes:
- Nominal attribute -> Nominal values provide only enough information to tell one object from another (equal or not equal), for example zip codes.
- Ordinal attribute -> Ordinal values provide enough information to order the objects, for example the different grades of an item.
- Interval attribute -> For interval attributes, the differences between values are meaningful, for example calendar dates. Statistics such as the mean and standard deviation apply.
- Ratio attribute -> For ratio attributes, both differences and ratios are meaningful, for example weight or length. Statistics such as the geometric mean apply (Tan, P.-N. et al., 2013).
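To make the four attribute types more concrete, here is a minimal sketch in Python showing which summary operations make sense at each level. The sample values and the grade ordering are made up for illustration and are not taken from the textbook.

```python
from statistics import mean, median, mode, geometric_mean

# Hypothetical sample values, one list per attribute type.
zip_codes = ["10001", "94105", "10001", "60601"]  # nominal: labels only
grades = ["C", "A", "B", "B", "A"]                # ordinal: ordered labels
temps_celsius = [10.0, 20.0, 30.0]                # interval: no true zero
weights_kg = [2.0, 8.0]                           # ratio: has a true zero

# Nominal: only equality is meaningful, so the mode is a valid summary.
most_common_zip = mode(zip_codes)  # "10001"

# Ordinal: order is meaningful, so we can rank the labels and take a median.
grade_rank = {"C": 1, "B": 2, "A": 3}  # assumed ordering, worst to best
median_rank = median(grade_rank[g] for g in grades)  # 2, i.e. grade "B"

# Interval: differences are meaningful, so means and differences are valid.
mean_temp = mean(temps_celsius)                  # 20.0
temp_gap = temps_celsius[2] - temps_celsius[0]   # 20.0 degrees apart

# Ratio: ratios are meaningful as well, so the geometric mean is valid.
weight_ratio = weights_kg[1] / weights_kg[0]     # 4.0, "four times as heavy"
geo_mean_weight = geometric_mean(weights_kg)     # approximately 4.0
```

Note that saying 30 degrees Celsius is "three times as hot" as 10 degrees Celsius would not be meaningful, because the Celsius zero point is arbitrary. That is the practical difference between an interval and a ratio attribute.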
Continuous data can take any value within a range. A person's weight is an example: it is measured rather than counted, and it can change from one measurement to the next. Discrete data can take only certain separate values. The number of students enrolled in a course is an example, because it is always an integer.
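The difference can be sketched in a few lines of Python. The sample values and the helper `looks_discrete` are hypothetical, and the check is only a rough heuristic: a column of whole numbers is probably a count, but the data alone cannot prove it.

```python
# Hypothetical measurements for illustration.
weights_kg = [61.2, 61.25, 70.8]  # continuous: any value in a range
enrollments = [25, 30, 25]        # discrete: whole-number counts


def looks_discrete(values):
    """Heuristic: treat the data as discrete if every value is a whole number."""
    return all(float(v).is_integer() for v in values)


weights_are_discrete = looks_discrete(weights_kg)       # False
enrollments_are_discrete = looks_discrete(enrollments)  # True
```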
Correct data is essential for reaching correct results, and this is especially true in business, where data is a critical asset. Bad data can have significant business consequences for companies. Poor-quality data is often pegged as the source of operational snafus, inaccurate analytics and ill-conceived business strategies. Examples of the economic damage that data quality problems can cause include added expenses when products are shipped to the wrong customer addresses, lost sales opportunities because of erroneous or incomplete customer records, and fines for improper financial or regulatory compliance reporting (Vaughan, J, 2019).
The goal of data preprocessing is to improve the data mining analysis with respect to time, cost, and quality. Data preprocessing includes the following steps:
- Dimensionality reduction
- Feature subset selection
- Feature creation
- Discretization and binarization
- Variable transformation (Tan, P.-N. et al., 2013).
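Three of the steps listed above (discretization, binarization, and variable transformation) can be sketched in plain Python. This is only an illustration under simple assumptions: equal-width binning, one-hot encoding, and min-max scaling are common choices for these steps, but they are not the only ones. The function names and sample data are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Discretization: map each continuous value to a bin index 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def one_hot(labels):
    """Binarization: one 0/1 column per distinct category (sorted by name)."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


def min_max_scale(values):
    """Variable transformation: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical data for illustration.
ages = [18, 25, 40, 62, 80]
age_bins = equal_width_bins(ages, 3)           # [0, 0, 1, 2, 2]
colors = one_hot(["red", "green", "red"])      # columns: green, red
scaled = min_max_scale([10, 20, 30])           # [0.0, 0.5, 1.0]
```

Each function takes a single attribute (one column) and returns a transformed column of the same length, so the steps can be applied independently to different attributes of a data set.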