This shows you the differences between two versions of the page.
| en:iot-reloaded:preprocessing [2024/09/25 15:44] – created agrisnik | en:iot-reloaded:preprocessing [2024/09/25 15:47] (current) – agrisnik | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ===== Data preprocessing in clustering ===== | ||
| + | {{: | ||
| + | |||
| + | Before clustering can start, several important steps have to be performed: | ||
| + | |||
| + | * **Check that the data is metric:** In clustering, the primary measure is (in most cases) Euclidean distance, which requires numeric data. While it is possible to encode arbitrary data using numerical values, the encoded values must maintain the semantics of numbers, i.e. 1 < 2 < 3. Good examples of natural metric data are temperature, | ||
| + | * **Select the proper scale:** Because the distance measure treats every dimension alike, the values of each dimension should be on the same scale. For instance, customers' | ||
| + | * **Unity interval:** the minimal value of the factor is subtracted from the given point value, and the difference is divided by the interval value (maximum minus minimum), giving a result between 0 and 1. | ||
| + | * **Z-scale: | ||
| + | |||
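
The unity interval scaling described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `unity_interval` and `euclidean` and the customer figures (ages and incomes) are hypothetical and do not come from the original page. The sketch also shows why the scale matters: on the raw data, the Euclidean distance is dominated by the dimension with the larger numeric range.

```python
import math


def unity_interval(values):
    """Rescale numbers to the range 0..1 using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are equal, so the interval is empty.
        # Mapping everything to 0.0 is an arbitrary choice made for this sketch.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical data (assumption): age in years and yearly income for three customers.
ages = [20, 40, 60]
incomes = [20000, 20500, 21000]

raw_points = list(zip(ages, incomes))
scaled_points = list(zip(unity_interval(ages), unity_interval(incomes)))

# Distance between the first and the last customer, before and after scaling.
raw_distance = euclidean(raw_points[0], raw_points[2])
scaled_distance = euclidean(scaled_points[0], scaled_points[2])

print(raw_distance)     # driven almost entirely by the income dimension
print(scaled_distance)  # both dimensions contribute equally
```

On the raw values, the age difference of 40 years barely changes the distance compared with the income difference of 1000. After scaling, both dimensions span the same 0 to 1 interval and contribute equally.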