Module leader: Mr Hongbo Du
(hongbo.du@buckingham.ac.uk)
One term (15 units)
Information Discovery (also known as data mining) is a relatively new discipline that combines techniques from statistics, artificial intelligence and database management systems to discover valuable information buried in large volumes of data. Because it is applied in many areas, the technology has gained wide recognition in the IT industry. This module introduces the concepts and principles of information discovery, presents modern data mining techniques in detail with practical examples, and discusses the practical issues that arise when applying these techniques in real application domains.
Through this module, students will:
- Obtain fundamental knowledge in information discovery.
- Study state-of-the-art approaches and techniques in discovering different types of information.
- Obtain in-depth knowledge in applying data mining techniques in practice.
Course Outline
- Introduction. Data and information. What is information discovery and why? Information discovery process. Query, OLAP and Discovery. Promises and challenges.
- Discovery methodologies. Data mining approaches. Discovery tasks and solutions: an overview. An example of automated discovery. How to conduct a discovery project: a case study.
- Input, output and basic discovery methods. Input data: concept, instances and attributes. Output: different types of information patterns. An overview of basic discovery methods. How to measure the discovered patterns.
- Association rules. Boolean, generalised and qualitative association rules. Algorithms for mining different association rules and examples. Practice of association rule discovery.
- Clustering. Basic methods: K-means method. Agglomeration method. Advanced methods: CLIQUE, CHAMELEON. Practice of cluster detection.
- Classification. Memory based reasoning (PEBLS). Induction of decision trees (ID3, C4.5). Overview of artificial neural networks. Practice of classification methods.
- Evaluation of information patterns. Training and testing. Cross-validation. Comparing data mining solutions. Counting the cost. Simplicity of patterns.
- Supporting techniques. Model pruning methods. Attribute selection. Discretization. Data cleansing.
- Information discovery in practice. Data warehousing. OLAP. Discovery in business applications.
This module is assessed by both coursework (25%) and written examination (75%).
Key texts:
- Witten, I. & E. Frank. Data mining: practical machine learning tools and techniques with Java implementations (San Francisco: Morgan Kaufmann, 2000). ISBN: 1-55860-552-5.
- Berry, M.J. & G. Linoff. Data mining techniques for marketing, sales and customer support (New York: Wiley, 1997). ISBN: 0-471-17980-9.
Before purchasing any key texts, we recommend that you check for the latest edition with the Department of Applied Computing.