Types of Data in Cluster Analysis in Data Mining

Data Mining

Data mining is the process of extracting valuable information from large data sets. Cluster analysis is a type of data mining that identifies groups of similar data points in a data set. This can be useful for marketing purposes, such as identifying groups of customers with similar buying habits.

What is Data Mining?

Data mining is the process of discovering patterns in large data sets and turning them into usable information. It is used in many different applications, such as insurance, credit card fraud detection, and marketing.

Two types of categorical data are common in cluster analysis:
-Nominal data: This type of data can be sorted into categories, but the categories have no meaningful order. For example, gender (male/female) is a nominal variable.
-Ordinal data: This type of data can be sorted into categories, and the order of the categories is meaningful. For example, education level (elementary, high school, college) is an ordinal variable.
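
The difference between the two types matters when the data is turned into numbers for a clustering algorithm. The following is a minimal sketch, not a definitive recipe: it assumes one-hot encoding for nominal values and a rank rescaled to [0, 1] for ordinal values. The category lists come from the examples above; the function names are made up for illustration.

```python
# Category lists taken from the examples in the text.
GENDERS = ["male", "female"]                           # nominal: no order
EDUCATION = ["elementary", "high school", "college"]   # ordinal: ordered


def one_hot(value, categories):
    """Encode a nominal value as a 0/1 vector with one slot per category."""
    return [1 if value == c else 0 for c in categories]


def ordinal_rank(value, ordered_categories):
    """Encode an ordinal value as its rank, rescaled to the range [0, 1]."""
    return ordered_categories.index(value) / (len(ordered_categories) - 1)


def simple_matching_distance(a, b):
    """Fraction of nominal attributes on which two records disagree."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


print(one_hot("female", GENDERS))             # [0, 1]
print(ordinal_rank("high school", EDUCATION))  # 0.5
```

One-hot encoding keeps the categories equally far apart, which suits nominal data. The rank encoding preserves order, but it also assumes the steps between levels are equally spaced, which may not hold for every ordinal variable.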

Data Mining Process

The Data Mining Process is a model that describes the steps necessary to accomplish data mining.

The process of data mining can be divided into six major steps:

  1. Pre-Processing: In this step, the raw data is prepared for processing. This may involve cleaning the data, imputing missing values, scaling the data, etc.
  2. Data Transformation: In this step, the pre-processed data is transformed into a format that is suitable for mining. This may involve creating new variables, discretizing continuous variables, etc.
  3. Data Mining: In this step, various algorithms are applied to the transformed data in order to discover patterns and relationships.
  4. Pattern Evaluation: In this step, the discovered patterns are evaluated to determine whether they are useful and interesting.
  5. Deployment: In this step, the patterns are deployed in some way. This may involve incorporating them into a decision-making system, using them to generate predictions, or simply storing them for future reference.
  6. Post-Processing: In this step, any post-processing that needs to be done is completed (e.g., creating visualizations of the results).
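
As a rough illustration of steps 1 and 2 above, the sketch below imputes a missing value, scales a column, and discretizes it. It is one possible set of choices (mean imputation, min-max scaling, equal-width bins), and the `ages` column is made-up data, not something from a real data set.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, n_bins):
    """Assign each value in [0, 1] to one of n_bins equal-width bins."""
    return [min(int(v * n_bins), n_bins - 1) for v in values]


ages = [20, None, 40, 60]          # hypothetical raw column, one value missing
cleaned = impute_mean(ages)        # [20, 40.0, 40, 60]
scaled = min_max_scale(cleaned)    # [0.0, 0.5, 0.5, 1.0]
bins = discretize(scaled, 2)       # [0, 1, 1, 1]
```

Mean imputation is only the simplest option. Depending on the data, dropping incomplete rows or using a median may be more appropriate.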

Cluster Analysis

Cluster analysis is a data mining technique used to find natural groups, or clusters, in data. It assigns objects to groups based on their similarity.

What is Cluster Analysis?

Cluster analysis is a statistical technique for finding groups of similar objects in a data set. Cluster analysis can be used to group variables together, to group objects together, or to find both types of groups simultaneously.

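There are two main types of cluster analysis:
-Hierarchical cluster analysis, which builds a nested tree of clusters by repeatedly merging or splitting groups
-Partitioning cluster analysis, which divides the data into a fixed number of non-overlapping clusters (k-means is a common example)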
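
To make the partitioning approach concrete, here is a minimal sketch of k-means (Lloyd's algorithm) using only the Python standard library. The sample points and the choice of initial centroids are assumptions made for illustration, and a hierarchical method is not shown.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D points forming two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The result of k-means depends on the initial centroids, so practical implementations usually run it several times from different starting points.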

Types of Data in Cluster Analysis

Cluster analysis is a type of unsupervised learning that groups data points together based on similarity. Unlike classification, which requires a labeled dataset, clustering groups data points together without any prior labeling. This makes it a very powerful tool for exploratory data analysis, as it can help you find hidden patterns and relationships in your data.

There are two main types of data that can be clustered:

-Numeric data: This is the most common type of data used in cluster analysis. Numeric data is quantitative and can be continuous (e.g., height, weight, income) or discrete (e.g., age in years, number of purchases). Categorical attributes such as gender or zip code may be stored as numbers, but those numbers are only labels and should be treated as nominal data.
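
Numeric attributes are usually compared with a distance measure, and the scale of each attribute affects the result. The sketch below assumes Euclidean distance and z-score standardization, which is a common choice but not the only one. The height and income values are invented for illustration.

```python
import math
import statistics


def standardize(column):
    """Convert a numeric column to z-scores (mean 0, standard deviation 1).

    Assumes the column is not constant, otherwise the deviation is zero.
    """
    mean = statistics.fmean(column)
    stdev = statistics.pstdev(column)
    return [(v - mean) / stdev for v in column]


# Hypothetical records: height in cm and yearly income.
heights_cm = [150.0, 160.0, 170.0, 180.0]
incomes = [20000.0, 80000.0, 30000.0, 90000.0]

raw = list(zip(heights_cm, incomes))
scaled = list(zip(standardize(heights_cm), standardize(incomes)))

# Distance between the first two records, before and after scaling.
d_raw = math.dist(raw[0], raw[1])        # dominated almost entirely by income
d_scaled = math.dist(scaled[0], scaled[1])  # both attributes contribute
print(d_raw, d_scaled)
```

Without standardization, the attribute with the largest numeric range (income here) decides which records look similar, regardless of how informative it is.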

-Textual data: This type of data is becoming increasingly common as more organizations gain access to large amounts of unstructured text (e.g., social media posts, customer reviews, support logs). Before clustering, text is converted into numeric vectors, for example with a bag-of-words model or word embeddings.
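
A bag-of-words representation can be sketched in a few lines. The example below assumes a deliberately naive tokenizer (lowercase, split on whitespace) and cosine similarity as the comparison measure. The review texts are made up, and word embeddings are not covered.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens (a naive tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical customer reviews.
reviews = [
    "great phone great battery",
    "battery life is great",
    "slow delivery and rude support",
]
bags = [bag_of_words(r) for r in reviews]

print(cosine_similarity(bags[0], bags[1]))  # shares "great" and "battery"
print(cosine_similarity(bags[0], bags[2]))  # 0.0, no words in common
```

A clustering algorithm can then group reviews whose similarity is high, for instance by using 1 minus the cosine similarity as a distance.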


In conclusion, we have examined four types of data that can be used in cluster analysis: nominal, ordinal, numeric, and textual. We have also reviewed the steps of the data mining process and the two main types of cluster analysis, hierarchical and partitioning.
