Data Mining

Data mining is the technique of extracting useful information from raw data. It is applied to tasks such as clustering, predictive analysis and association rule generation, using a range of data mining tools and techniques. Among data mining approaches, clustering is one of the most widely used techniques for extracting useful information from raw data. Clustering groups similar data points together and separates dissimilar ones, so that patterns in the dataset can be analyzed. Common types include density-based clustering, hierarchical clustering and partitioning-based clustering. The k-means algorithm, a partitioning-based method, is one of the most widely used algorithms for grouping similar and dissimilar data in an input dataset. In k-means clustering, the centroid of each cluster is calculated by taking the arithmetic mean of the data points assigned to that cluster.
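The centroid computation described above can be written out as follows. Here $C_j$ denotes the set of points currently assigned to cluster $j$ and $\mu_j$ its centroid; this notation is chosen for illustration. A small worked example for three two-dimensional points follows the formula.

```latex
\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x
\qquad
C_j = \{(1,2),\,(3,4),\,(5,6)\}
\;\Rightarrow\;
\mu_j = \left(\frac{1+3+5}{3},\ \frac{2+4+6}{3}\right) = (3,\ 4)
```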


The Euclidean distance from each data point to each centroid is then calculated, and every point is assigned to the cluster with the nearest centroid, so that similar points end up together. The centroids are recomputed and these steps repeat until the assignments stop changing. Predictive analysis is applied to an input dataset to predict current and future outcomes. In predictive analysis, clustering can first be applied to group similar data, and a classification technique is then applied to the clustered data to make predictions. A wide array of data mining tools and techniques exists, and they continue to evolve to keep pace with modern innovations.
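The following minimal Python sketch illustrates the k-means steps described above: Euclidean distances to each centroid, assignment to the nearest one, and recomputing centroids as arithmetic means until the assignments stop changing. It then reuses the resulting centroids as a simple nearest-centroid classifier for a new point, which is only one possible way to apply classification to clustered data. The function names, the sample data, and initializing with the first k points are assumptions made for illustration; practical implementations usually use random or k-means++ initialization.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (Euclidean) distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Point]) -> Point:
    """Arithmetic mean of a non-empty list of points, coordinate by coordinate."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def nearest(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda j: euclidean(point, centroids[j]))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd-style k-means. Assumption: the first k points serve as initial centroids."""
    centroids = list(points[:k])
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = mean(members)
    return centroids, labels


def predict(new_point: Point, centroids: List[Point]) -> int:
    """Hypothetical nearest-centroid classifier built on top of the clusters."""
    return nearest(new_point, centroids)


data = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.5)]
centroids, labels = kmeans(data, k=2)
print(labels)                             # [0, 1, 0, 1, 0, 1]
print(predict((2.0, 1.0), centroids))     # 0
```

With this sample data the two well-separated groups are recovered after two passes, and the new point (2.0, 1.0) is assigned to the cluster near the origin.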

Process of Data Mining

Data mining is an iterative process that goes through the following phases, as laid down by the Cross-Industry Standard Process for Data Mining (CRISP-DM) process model:

  • Problem definition – In the first phase, the problem is defined: business aims and objectives are determined, taking into account factors such as the current background and future prospects.
  • Data exploration – The required data is collected and explored using various statistical methods, and underlying problems in the data are identified.
  • Data preparation – The raw data is cleansed and formatted as required for modeling. The meaning of the data is not changed during preparation.
  • Modeling – The data model is created by applying mathematical functions and modeling techniques. Once created, the model goes through validation and verification.
  • Evaluation – The model is evaluated by a team of experts to check whether it satisfies the business objectives.
  • Deployment – After evaluation, the model is deployed and plans are made for its maintenance. A well-organized report summarizing the work done is prepared.
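To make the data preparation phase more concrete, here is a minimal Python sketch of cleansing and formatting raw records. The record layout (`customer` and `amount` fields), the helper name `clean_records`, and the cleansing rules (drop incomplete or unparseable rows, drop duplicates, treat commas as thousands separators) are illustrative assumptions, not something CRISP-DM prescribes. The formatting steps only normalize whitespace and convert text to numbers, so the meaning of each kept record is unchanged.

```python
from typing import Dict, List, Optional

# Hypothetical raw record layout: {"customer": ..., "amount": ...} as strings.
RawRecord = Dict[str, Optional[str]]


def clean_records(raw: List[RawRecord]) -> List[Dict[str, object]]:
    """Cleanse and format raw records without changing what they mean."""
    cleaned: List[Dict[str, object]] = []
    seen = set()
    for rec in raw:
        # Formatting: trim the name and collapse repeated internal whitespace.
        name = " ".join((rec.get("customer") or "").split())
        amount_text = (rec.get("amount") or "").strip()
        if not name or not amount_text:
            continue  # drop incomplete records rather than guess missing values
        try:
            # Assumption: commas are thousands separators, e.g. "1,200.50".
            amount = float(amount_text.replace(",", ""))
        except ValueError:
            continue  # drop values that cannot be parsed as numbers
        key = (name.lower(), amount)
        if key in seen:
            continue  # drop duplicates (name compared case-insensitively)
        seen.add(key)
        cleaned.append({"customer": name, "amount": amount})
    return cleaned


raw = [
    {"customer": "  Alice ", "amount": "1,200.50"},
    {"customer": "Bob", "amount": None},
    {"customer": "alice", "amount": "1200.5"},
    {"customer": "Carol", "amount": "abc"},
    {"customer": "Dave  Smith", "amount": "75"},
]
result = clean_records(raw)
print(result)  # [{'customer': 'Alice', 'amount': 1200.5}, {'customer': 'Dave Smith', 'amount': 75.0}]
```

Dropping bad rows is the simplest choice for a sketch; in practice, imputing missing values may be preferable when discarding data would bias the analysis.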

Topics to Study in Data Mining

Data mining is a relatively new field, and many people are not yet aware of this technology. It is also a good topic for an M.Tech thesis and for presentations. The following topics within data mining are worth studying:

  • Fraud Detection
  • Crime Rate Prediction
  • Market Analysis
  • Customer Trend Analysis
  • Financial Analysis
  • Website Evaluation
  • Data Mining Techniques

Scope of Data Mining

As a relatively new field, data mining has bright prospects now as well as in the future, because markets and businesses are looking for valuable data with which to grow. Data mining as a subject should be mandatory in the computer science syllabus. As mentioned earlier, data mining is a good topic for an M.Tech thesis, and students can carry out in-depth research to build strong content for their thesis report.

Importance of Data Mining

  1. Data mining helps to understand customer behaviour towards a business.
  2. It helps a business gain a competitive advantage over its rivals.
  3. It supports crucial decision-making in the company.