Data Mining

Data mining is the technique of extracting useful information from raw data. It is applied to tasks such as clustering, prediction analysis and association rule generation with the help of various data mining tools and techniques. Among these approaches, clustering is one of the most widely used for extracting useful information from raw data. Clustering groups similar data points together and separates dissimilar ones so that useful information can be analyzed from the dataset. There are many types of clustering, such as density-based, hierarchical and partitioning-based clustering. The k-means algorithm is a popular and efficient partitioning algorithm for clustering an input dataset. In k-means clustering, the centroid of each cluster is calculated by taking the arithmetic mean of the data points assigned to that cluster.


The Euclidean distance from each data point to the centroids is then calculated, and each point is assigned to the cluster of its nearest centroid, which groups similar points together and separates dissimilar ones. Prediction analysis is the technique applied to an input dataset to predict current and future situations. In predictive analysis, clustering is first applied to group similar data, and a classification technique is then applied to the clustered data to produce the prediction. There is an array of data mining tools and techniques, and they keep evolving to keep pace with modern innovations.
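
The k-means steps described above (centroids as arithmetic means, assignment by Euclidean distance) can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the random choice of initial centroids, the iteration cap and the sample points are all assumptions made for the example.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_point(points):
    """Arithmetic mean of a non-empty list of points, taken per coordinate."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Return (centroids, labels) for a simple k-means run.

    Assumption of this sketch: the initial centroids are k input
    points picked at random.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return centroids, labels


# Made-up sample data: two visibly separate groups of 2-D points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(data, k=2)
```

On this sample the first three points end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the random initialisation.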

Process of Data Mining

Data mining is an iterative process that goes through the following phases, as laid down by the Cross-Industry Standard Process for Data Mining (CRISP-DM) process model:

  • Problem definition – In the first phase the problem is defined, i.e. business aims and objectives are determined, taking into consideration factors such as the current background and future prospects.
  • Data exploration – The required data is collected and explored using various statistical methods, and underlying problems in the data are identified.
  • Data preparation – The data is prepared for modeling by cleansing and formatting the raw data as required. The meaning of the data is not changed during preparation.
  • Modeling – In this phase the data model is created by applying certain mathematical functions and modeling techniques. After the model is created it goes through validation and verification.
  • Evaluation – After the model is created, it is evaluated by a team of experts to check whether it satisfies business objectives or not.
  • Deployment – After evaluation, the model is deployed and plans are made for its maintenance. A well-organized report is prepared with a summary of the work done.

Topics to study in data mining

Data mining is a relatively new field and many people are not yet aware of this technology. It can also be a good topic for an M.Tech thesis and for presentations. The following are the topics to study under data mining:

  • Fraud Detection
  • Crime Rate Prediction
  • Market Analysis
  • Customer Trend Analysis
  • Financial Analysis
  • Website Evaluation
  • Data Mining Techniques

Scope of Data Mining

Being a relatively new field, data mining has a bright scope both now and in the future. The scope is high because markets and businesses are looking for valuable data with which they can grow. Data mining as a subject should be mandatory in the computer science syllabus. As said earlier, data mining is a good topic for an M.Tech thesis. Students can pursue in-depth research to produce good content for their thesis report.

Importance of Data Mining

  1. Data mining helps to find out customer behaviour towards a business.
  2. It helps in attaining a competitive advantage over rival businesses.
  3. It helps in making crucial decisions for the company.

Thesis and Research topics in Data Mining

The following is a list of the latest topics in data mining for final year projects, theses and research:

  1. Web Mining

  2. Predictive Analytics

  3. Oracle Data Mining

  4. Clustering

  5. Text Mining

  6. Fraud Detection

  7. Data Mining as a Service (DMaaS)

  8. Graph Mining

  • Web Mining – Web mining is an application of data mining for discovering patterns in data from the web. It falls into three categories: content mining, structure mining and usage mining. Content mining detects patterns from data collected by the search engine. Structure mining examines data related to the structure of a website, while usage mining examines data from the user’s browser. The data collected through web mining is evaluated and analysed using techniques like clustering, classification and association. It is a very good topic for a thesis in data mining.

  • Predictive Analytics – Predictive analytics is a set of statistical techniques that analyze current and historical data to predict future events. The techniques include predictive modeling, machine learning and data mining. In large organizations, predictive analytics helps to identify business risks and opportunities. Both structured and unstructured data are analyzed to detect patterns. Predictive analytics is a lengthy process consisting of seven stages: project definition, data collection, data analysis, statistics, modeling, deployment and monitoring. It is an excellent choice for research and a thesis.

  • Oracle Data Mining – Oracle Data Mining, also referred to as ODM, is a component of the Oracle Advanced Analytics option for Oracle Database. It provides powerful data mining algorithms that help data analysts get valuable insights from data and make predictions. It helps in predicting customer behavior, which ultimately helps in targeting the best customers and in cross-selling. SQL functions are used to run the algorithms on data tables and views. It is also a good choice for thesis and research in data mining and databases.

  • Clustering – Clustering is a process in which data objects are divided into meaningful sub-classes known as clusters. Objects with similar characteristics are grouped together in a cluster. There are distinct models of clustering, such as centroid-based and distribution-based models. In centroid-based clustering, each cluster is represented by a central vector. There are various applications of clustering in data mining, such as market research, image processing and data analysis. It is also used in credit card fraud detection.

  • Text Mining – Text mining, or text data mining, is the process of extracting high-quality information from text. This is done by devising patterns and trends through statistical pattern learning. First, the input data is structured. Patterns are then derived from the structured data, and finally the output is evaluated and interpreted. The main applications of text mining include competitive intelligence, e-discovery, national security and social media monitoring. It is a trending topic for a thesis in data mining.

  • Fraud Detection – The incidence of fraud is increasing in sectors like banking, finance and government. Accurate detection of fraud is a challenge. Data mining techniques help in the anticipation and detection of fraud. Data mining tools can be used to spot patterns and detect fraudulent transactions. Through data mining, the factors leading to fraud can also be determined.

  • Data Mining as a Service (DMaaS) – DMaaS is a service for mining data on the cloud. Data can be analysed interactively on the cloud, and the results can be shared for scientific research. It will leverage the existing interface.

  • Graph Mining – Graph mining is an application of data mining that extracts useful patterns from graphs. The underlying data can be used for classification and clustering. Algorithms for graph mining include GASTON and gSpan. Applications of graph mining include biological networks, web data, cheminformatics and many more.
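
To make the Predictive Analytics item above concrete, the sketch below fits a straight line to historical data with ordinary least squares and uses it to forecast the next value. It covers only the modeling stage, and only one simple technique among many. The function name fit_line and the monthly sales figures are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up historical data: month number and units sold.
months = [1, 2, 3, 4, 5]
units_sold = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, units_sold)
forecast_month_6 = slope * 6 + intercept
```

Real historical data is rarely this clean, so in practice the fitted model would be validated on held-out data before its forecasts are trusted.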
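
The Text Mining item above describes structuring the input first and then deriving patterns from it. A minimal sketch of that flow, assuming a simple term-frequency count as the "pattern", might look like this. The stop-word list, the function names and the sample comments are illustrative assumptions, not part of any particular text mining tool.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in", "on", "was"}


def tokenize(text):
    """Structure raw text as a list of lowercase word tokens."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def top_terms(documents, n=3):
    """Count term frequency across all documents; return the n most common."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts.most_common(n)


# Made-up customer comments.
docs = [
    "The delivery was late and the package was damaged.",
    "Late delivery again, but support was helpful.",
    "Support resolved the damaged package issue.",
]
terms = top_terms(docs)
```

The resulting list of (term, count) pairs is the output that would then be evaluated and interpreted, for example to see which complaints recur most often.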
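
For the Fraud Detection item above, one very simple way to "spot patterns" is to flag transactions that lie far from an account's usual amounts. The sketch below uses a z-score rule. This is a stand-in chosen for brevity, not the method of any specific fraud system, and the threshold of 2.0, the function name and the amounts are assumptions.

```python
import statistics


def flag_outliers(amounts, threshold=2.0):
    """Return the amounts whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)  # population standard deviation
    if stdev == 0:
        return []  # all amounts identical, nothing stands out
    return [a for a in amounts if abs(a - mean) / stdev > threshold]


# Made-up transaction amounts for one account; one is far from the rest.
transactions = [25.0, 30.0, 27.5, 22.0, 31.0, 26.0, 29.0, 950.0]
suspicious = flag_outliers(transactions)
```

A flagged amount is only a candidate for review. Production systems typically combine many signals, such as location, merchant and timing, instead of relying on amount alone.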

These are the latest research and thesis topics in data mining.