Data Mining is the process of transforming raw data into useful information by applying specific methods and techniques. It involves discovering and identifying patterns in large data sets, which large companies use to anticipate future trends.
What is a data set?
A data set is a collection of related data. A single database can also be referred to as a data set. In a data set, the data is stored in an organized form that can be accessed by applying some logic. Following are the types of data set:
File-based data set
Folder-based data set
Database data set
Web-based data set
Process of Data Mining
Data Mining is a comparatively new technology for anticipating future trends. It extracts valuable information from large volumes of unused data using statistical techniques or techniques from artificial intelligence and machine learning. The extracted information can be used to increase sales, grow the business, analyze market trends, and detect fraud. Students working on a Ph.D. thesis in Data Mining can explain this process in their work. Data mining is an iterative process that goes through the following phases, as given by the Cross-Industry Standard Process for Data Mining (CRISP-DM) process model:
Problem definition – In the first phase, the business objectives and needs are determined based on the current scenario. The requirements are studied, and an evaluation plan is then prepared that takes into account the relevant assumptions, constraints, and conditions.
Data understanding and exploration – In this phase, the available data is collected and explored. While exploring, the experts identify underlying problems with the data using statistical methods. The quality of the data is also checked in this phase.
Data preparation – Once the raw data is collected, it is selected, cleansed, and formatted as required. The data is then prepared for modeling by selecting tables, records, cases, and attributes. Preparation does not change the meaning of the data.
Modeling – In this phase, various modeling techniques, including mining functions, are applied to the prepared data and a model is created. The model is then tested to verify and validate it. Other candidate models may also be generated using modeling tools. The models are then assessed with the help of domain experts to check whether they meet the business requirements.
Evaluation – After the model is created, a team of experts evaluates it against the business objectives. If it does not satisfy the needs, it goes through the modeling phase again. After this phase is completed successfully, the experts decide how the data mining results will be used.
Deployment – In this phase, the plans for deployment, maintenance, and monitoring are prepared for implementation. A properly organized report is prepared that summarizes the whole data mining process.
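The preparation, modeling, and evaluation phases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the `spend` and `churned` fields, and the single-threshold model are hypothetical and are not prescribed by CRISP-DM.

```python
# Hypothetical raw records; None marks a missing value.
raw = [
    {"spend": 120.0, "churned": False},
    {"spend": 15.0, "churned": True},
    {"spend": None, "churned": True},
    {"spend": 200.0, "churned": False},
    {"spend": 30.0, "churned": True},
    {"spend": 95.0, "churned": False},
    {"spend": 22.0, "churned": True},
    {"spend": 150.0, "churned": False},
]

def prepare(records):
    """Data preparation: drop incomplete records without altering values."""
    return [r for r in records if r["spend"] is not None]

def fit_threshold(train):
    """Modeling: place a spend threshold midway between the class means."""
    churn = [r["spend"] for r in train if r["churned"]]
    stay = [r["spend"] for r in train if not r["churned"]]
    return (sum(churn) / len(churn) + sum(stay) / len(stay)) / 2

def evaluate(threshold, test):
    """Evaluation: fraction of held-out records predicted correctly."""
    hits = sum((r["spend"] < threshold) == r["churned"] for r in test)
    return hits / len(test)

clean = prepare(raw)
train, test = clean[:5], clean[5:]   # simple hold-out split
threshold = fit_threshold(train)
accuracy = evaluate(threshold, test)
print(f"threshold={threshold:.1f} accuracy={accuracy:.2f}")
```

If the accuracy did not meet the business objective, the process would return to the modeling phase with a different technique, which is what makes it iterative.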
Data Mining Techniques
Following are some of the techniques used in the data mining process:
Association – In this technique, a pattern is identified based on the relationship between items in the same transaction. An analyst can use the association technique to analyze a customer’s behavior based on buying patterns.
Classification – This technique is based on machine learning and uses concepts from decision trees, linear programming, neural networks, and statistics. Items are classified into predefined groups or classes, and the class of a new item is predicted using a model built from previously labeled data.
Clustering – Clustering is the process of grouping abstract objects that have similar characteristics into clusters. The technique is used in machine learning, image analysis, pattern recognition, and information retrieval.
Decision Trees – This is a graphical technique in which the root of the tree is a condition and its branches are the possible answers to that condition. It is widely used in machine learning.
Prediction – This technique identifies the relationship between independent and dependent variables and is mainly used to forecast future values such as sales. It is an important technique in which repetitive patterns are recognized in intelligent environments, which helps in predicting future events.
Sequential Analysis – Sequential analysis discovers and identifies similar patterns, events, and trends in transactional data over a period of time.
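As an illustration of the association technique, the sketch below computes two common measures for a rule such as "customers who buy bread also buy milk": support (how often both items appear together) and confidence (how often milk appears when bread does). The baskets and function names are hypothetical, and the code is a minimal sketch rather than a full association-rule miner.

```python
from itertools import combinations
from collections import Counter

# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def support_counts(baskets):
    """Count how many baskets contain each item and each pair of items."""
    counts = Counter()
    for basket in baskets:
        for item in basket:
            counts[frozenset([item])] += 1
        for pair in combinations(sorted(basket), 2):
            counts[frozenset(pair)] += 1
    return counts

def rule_metrics(counts, n, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    both = counts[frozenset([antecedent, consequent])]
    support = both / n
    confidence = both / counts[frozenset([antecedent])]
    return support, confidence

counts = support_counts(transactions)
support, confidence = rule_metrics(counts, len(transactions), "bread", "milk")
print(f"bread -> milk: support={support:.2f}, confidence={confidence:.2f}")
```

Here bread and milk occur together in 3 of 5 baskets (support 0.60), and 3 of the 4 baskets containing bread also contain milk (confidence 0.75).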
Examples of Data Mining
There are many examples of data mining in everyday life. The most common is cross-selling by e-commerce sites based on the searches a customer makes on the web. Another is the loyalty card programme that many stores and markets run to gather valuable customer information. Fraud detection, particularly in telecommunications and card payment services, is a third example: data mining helps determine the duration, location, and time of a call in cases of fraudulent calls.
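One simple way to approach the fraud-call example is to flag calls whose duration is far from typical. The sketch below is an assumption made for illustration, not a method described in this article: it scores each call by its distance from the median duration, and the call records and the `cutoff` value are hypothetical.

```python
from statistics import median

# Hypothetical call records: (call id, duration in minutes, location, hour of day).
calls = [
    ("c1", 3, "Zone A", 10),
    ("c2", 5, "Zone A", 11),
    ("c3", 4, "Zone B", 14),
    ("c4", 6, "Zone A", 16),
    ("c5", 5, "Zone B", 18),
    ("c6", 4, "Zone A", 19),
    ("c7", 120, "Zone C", 3),
    ("c8", 5, "Zone B", 20),
]

def flag_unusual_calls(records, cutoff=10.0):
    """Return calls whose duration is far from the median duration.

    Distance is measured in units of the median absolute deviation (MAD),
    which is less distorted by extreme values than the standard deviation.
    """
    durations = [r[1] for r in records]
    mid = median(durations)
    mad = median([abs(d - mid) for d in durations])
    if mad == 0:
        return []  # durations are too uniform to score this way
    return [r for r in records if abs(r[1] - mid) / mad > cutoff]

suspicious = flag_unusual_calls(calls)
for call_id, minutes, location, hour in suspicious:
    print(call_id, minutes, location, hour)
```

A flagged record keeps its duration, location, and time, so an analyst can review those details before deciding whether the call is actually fraudulent.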
Data Mining Trends
Data mining is used in a wide range of areas, from telecommunications to finance. It is also taught in many colleges as part of the curriculum, particularly in computer science. For master’s students, it is a very good topic for a thesis as well as for research. Numerous agencies on the Internet provide thesis writing assistance and help with data mining. It is a relatively new technology and has yet to reach a wider audience.
Applications of Data Mining
In Medical Science
A lot of data is generated in medical science every day, and it needs to be managed. Data Mining is useful for extracting valuable information from this data. In medical science, Data Mining helps to:
- Detect fraud in hospitals and medical centers
- Explore the business more effectively
- Analyse a patient’s health by monitoring day-to-day activities
- Support successful treatment of a patient
Data Mining can be used to analyze customer behavior by tracking a customer’s purchases and daily activities. We can find out how much a customer spends using a credit card and which products the customer usually buys.
In Marketing and Sales
Data Mining is particularly helpful in marketing and sales. Through data mining, marketing and sales enterprises can make offers to customers based on their purchases and on the products they usually search for.
In Science and Engineering
Data Mining is also applied in science and engineering to develop new products such as sensor devices and pattern recognition systems. It also finds application in machine learning, pattern recognition, database management, and artificial intelligence.
Thesis, Project and Research Ideas/Topics in Data Mining
Following is a list of data mining thesis ideas and research topics:
- Data Leakage Detection
- Database Text Mining
- Web Content Analysis
- Social Media Mining
- Climate Change Study using Data Mining
- Weather Forecasting using Data Mining
- Opinion Mining
- Enterprise Resource Planning
- Stock Market Analysis
Web Mining is an application of Data Mining and an important topic for research and thesis work. It is a technique for discovering patterns from the World Wide Web (WWW). The information for web mining is collected from browser activity, page content, and server logs. It is a very good area for a master’s thesis in data mining. There are three types of Web Mining:
- Web Usage Mining
- Web Content Mining
- Web Structure Mining
Web Usage Mining
It is a technique to extract usage patterns from Web data. These patterns are used to understand and better serve the needs of Web-based applications. Web usage mining can also be classified according to the following types of data:
- Web Server Data
- Application Server Data
- Application Level Data
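For the first of these types, web server data, a usage pattern can be as simple as which pages are requested most often and in what order each visitor requests them. The sketch below assumes access-log lines in Common Log Format; the log lines and function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical access-log lines in Common Log Format.
log_lines = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:00:05 +0000] "GET /products HTTP/1.1" 200 2048',
    '10.0.0.1 - - [01/Jan/2024:10:00:09 +0000] "GET /products HTTP/1.1" 200 2048',
    '10.0.0.3 - - [01/Jan/2024:10:01:00 +0000] "GET /missing HTTP/1.1" 404 128',
    '10.0.0.1 - - [01/Jan/2024:10:02:00 +0000] "GET /cart HTTP/1.1" 200 640',
]

# Groups: 1 = client address, 2 = method, 3 = path, 4 = status code.
LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) \S+$')

def page_hits(lines):
    """Count successful (status 200) requests per page."""
    hits = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and m.group(4) == "200":
            hits[m.group(3)] += 1
    return hits

def paths_by_visitor(lines):
    """Group requested pages by client address, in log order."""
    paths = {}
    for line in lines:
        m = LINE.match(line)
        if m:
            paths.setdefault(m.group(1), []).append(m.group(3))
    return paths

hits = page_hits(log_lines)
visits = paths_by_visitor(log_lines)
print(hits.most_common(1))
print(visits["10.0.0.1"])
```

Grouping by client address is only a rough stand-in for a visitor session; real logs usually need cookies or time windows to separate sessions reliably.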
Web Content Mining
Web Content Mining refers to the extraction of useful information and data from Web page content. Intelligent tools such as web agents are used to retrieve information from web pages, and intelligent systems are built around this agent-based approach.
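The extraction step itself can be sketched with Python's standard `html.parser` module. The example below only pulls the title and paragraph text out of a page; it is not a web agent, and the `ContentExtractor` class and the sample page are hypothetical.

```python
from html.parser import HTMLParser

class ContentExtractor(HTMLParser):
    """Collect the page title and paragraph text from an HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._stack = []  # currently open <title> or <p> tags

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._stack.append(tag)
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if not self._stack:
            return  # text outside <title> and <p> is ignored
        if self._stack[-1] == "title":
            self.title += data
        else:
            self.paragraphs[-1] += data

page = """<html><head><title>Laptop deals</title></head>
<body><h1>Offers</h1><p>Model A costs <b>500</b> dollars.</p>
<p>Model B costs 650 dollars.</p></body></html>"""

extractor = ContentExtractor()
extractor.feed(page)
print(extractor.title)
print(extractor.paragraphs)
```

Text inside inline tags such as `<b>` is kept as part of the surrounding paragraph, so the extracted sentences stay intact for later analysis.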
Web Structure Mining
In this technique, graph theory is used to analyze the nodes and connection structure of a website. It can be classified into two types:
- Identifying and extracting patterns from hyperlinks
- Document structure mining – describing HTML and XML tag usage.
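To illustrate the hyperlink side of web structure mining, a site can be modeled as a directed graph in which pages are nodes and links are edges. The sketch below counts incoming links and runs a basic PageRank-style iteration; the link graph is hypothetical, and the code assumes every page has at least one outgoing link and every link target is itself a key of the graph.

```python
# Hypothetical site: each page maps to the pages it links to.
links = {
    "home": ["products", "about"],
    "products": ["home", "cart"],
    "about": ["home"],
    "cart": ["home", "products"],
}

def in_degree(graph):
    """Count how many pages link to each page."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts

def pagerank(graph, damping=0.85, iterations=50):
    """Iteratively spread each page's score across its outgoing links."""
    n = len(graph)
    rank = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in graph}
        for page, targets in graph.items():
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

incoming = in_degree(links)
rank = pagerank(links)
print(incoming)
print(max(rank, key=rank.get))
```

In this small graph the home page has the most incoming links and also ends up with the highest score, which matches the intuition that heavily linked pages are structurally important.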
Text Mining is an important field of Data Mining. It refers to the process of extracting valuable information from text and is also referred to as text analytics. This high-quality information is extracted by identifying patterns with methods such as statistical pattern learning. It is another good area for a Ph.D. thesis in Data Mining. In Text Mining, the input text is first structured, and patterns are then derived from the structured data. There are various research areas and thesis topics in the field of text mining.
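The two steps of structuring text and then deriving patterns can be sketched as follows. Each document is turned into word counts (a bag of words), and the counts are then compared across documents. The sample documents, the stop-word list, and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical documents and stop-word list.
documents = [
    "Data mining finds patterns in large data sets.",
    "Text mining extracts patterns from text.",
]
STOP_WORDS = {"in", "from", "the", "a", "of"}

def tokenize(text):
    """Structure raw text: lowercase it, split into words, drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def term_frequencies(docs):
    """Word counts per document."""
    return [Counter(tokenize(doc)) for doc in docs]

def document_frequency(docs):
    """Number of documents in which each word appears."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df

tf = term_frequencies(documents)
df = document_frequency(documents)
shared = sorted(word for word, count in df.items() if count == len(documents))
print(tf[0].most_common(1))
print(shared)
```

Words that appear in every document, such as "mining" and "patterns" here, describe what the collection has in common, while words frequent in only one document help distinguish it from the others.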
Applications of Text Mining
Following are the main application areas of Text Mining:
- Competitive Intelligence
- Security Applications like encryption and decryption
- Biomedical Applications for biomedical text mining
- Software Applications
- Business and marketing applications
- Academic Applications