Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage

ISBN: 978-0-471-66655-4

Hardcover

218 pages

April 2007

List Price:	US $114.25
Government Price:	US $78.68


This is a Print-on-Demand title. It will be printed specifically to fill your order. Please allow an additional 10-15 days for delivery. The book is not returnable.


PREFACE.

PART I: WEB STRUCTURE MINING.

1 INFORMATION RETRIEVAL AND WEB SEARCH.

Web Challenges.

Web Search Engines.

Topic Directories.

Semantic Web.

Crawling the Web.

Web Basics.

Web Crawlers.

Indexing and Keyword Search.

Document Representation.

Implementation Considerations.

Relevance Ranking.

Advanced Text Search.

Using the HTML Structure in Keyword Search.

Evaluating Search Quality.

Similarity Search.

Cosine Similarity.

Jaccard Similarity.

Document Resemblance.

References.

Exercises.

2 HYPERLINK-BASED RANKING.

Introduction.

Social Networks Analysis.

PageRank.

Authorities and Hubs.

Link-Based Similarity Search.

Enhanced Techniques for Page Ranking.

References.

Exercises.

PART II: WEB CONTENT MINING.

3 CLUSTERING.

Introduction.

Hierarchical Agglomerative Clustering.

k-Means Clustering.

Probability-Based Clustering.

Finite Mixture Problem.

Classification Problem.

Clustering Problem.

Collaborative Filtering (Recommender Systems).

References.

Exercises.

4 EVALUATING CLUSTERING.

Approaches to Evaluating Clustering.

Similarity-Based Criterion Functions.

Probabilistic Criterion Functions.

MDL-Based Model and Feature Evaluation.

Minimum Description Length Principle.

MDL-Based Model Evaluation.

Feature Selection.

Classes-to-Clusters Evaluation.

Precision, Recall, and F-Measure.

Entropy.

References.

Exercises.

5 CLASSIFICATION.

General Setting and Evaluation Techniques.

Nearest-Neighbor Algorithm.

Feature Selection.

Naive Bayes Algorithm.

Numerical Approaches.

Relational Learning.

References.

Exercises.

PART III: WEB USAGE MINING.

6 INTRODUCTION TO WEB USAGE MINING.

Definition of Web Usage Mining.

Cross-Industry Standard Process for Data Mining.

Clickstream Analysis.

Web Server Log Files.

Remote Host Field.

Date/Time Field.

HTTP Request Field.

Status Code Field.

Transfer Volume (Bytes) Field.

Common Log Format.

Identification Field.

Authuser Field.

Extended Common Log Format.

Referrer Field.

User Agent Field.

Example of a Web Log Record.

Microsoft IIS Log Format.

Auxiliary Information.

References.

Exercises.

7 PREPROCESSING FOR WEB USAGE MINING.

Need for Preprocessing the Data.

Data Cleaning and Filtering.

Page Extension Exploration and Filtering.

De-Spidering the Web Log File.

User Identification.

Session Identification.

Path Completion.

Directories and the Basket Transformation.

Further Data Preprocessing Steps.

References.

Exercises.

8 EXPLORATORY DATA ANALYSIS FOR WEB USAGE MINING.

Introduction.

Number of Visit Actions.

Session Duration.

Relationship between Visit Actions and Session Duration.

Average Time per Page.

Duration for Individual Pages.

References.

Exercises.

9 MODELING FOR WEB USAGE MINING: CLUSTERING, ASSOCIATION, AND CLASSIFICATION.

Introduction.

Modeling Methodology.

Definition of Clustering.

The BIRCH Clustering Algorithm.

Affinity Analysis and the A Priori Algorithm.

Discretizing the Numerical Variables: Binning.

Applying the A Priori Algorithm to the CCSU Web Log Data.

Classification and Regression Trees.

The C4.5 Algorithm.

References.

Exercises.

INDEX.

Related Titles

Database & Data Warehousing Technologies

Relational Database Index Design and the Optimizers: DB2, Oracle, SQL Server, et al.

by Tapio Lahdenmaki, Mike Leach

Geo-Business: GIS in the Digital Organization

by James B. Pick

MDX Solutions: With Microsoft SQL Server Analysis Services 2005 and Hyperion Essbase, 2nd Edition

by George Spofford, Sivakumar Harinath, Christopher Webb, Dylan Hai Huang, Francesco Civardi

Emergent Information Technologies and Enabling Policies for Counter-Terrorism

by Robert L. Popp (Editor), John Yen (Editor)

Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance

Data Modeling Fundamentals: A Practical Guide for IT Professionals

by Paulraj Ponniah

Beginning XML Databases

by Gavin Powell

Read Online Now at Wiley Online Library

An online version of this product is available through our subscription-based content service.
