Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, 3rd Edition

ISBN: 978-0-470-65093-6
Paperback
888 pages
April 2011
List Price: US $50.00
Government Price: US $32.00

Introduction xxxvii

Chapter 1 What Is Data Mining and Why Do It? 1

What Is Data Mining? 2

Data Mining Is a Business Process 2

Large Amounts of Data 3

Meaningful Patterns and Rules 3

Data Mining and Customer Relationship Management 4

Why Now? 6

Data Is Being Produced 6

Data Is Being Warehoused 6

Computing Power Is Affordable 7

Interest in Customer Relationship Management Is Strong 7

Commercial Data Mining Software Products Have Become Available 8

Skills for the Data Miner 9

The Virtuous Cycle of Data Mining 9

A Case Study in Business Data Mining 11

Identifying BofA’s Business Challenge 12

Applying Data Mining 12

Acting on the Results 13

Measuring the Effects of Data Mining 14

Steps of the Virtuous Cycle 15

Identify Business Opportunities 16

Transform Data into Information 17

Act on the Information 19

Measure the Results 20

Data Mining in the Context of the Virtuous Cycle 23

Lessons Learned 26

Chapter 2 Data Mining Applications in Marketing and Customer Relationship Management 27

Two Customer Lifecycles 27

The Customer’s Lifecycle 28

The Customer Lifecycle 28

Subscription Relationships versus Event-Based Relationships 30

Organize Business Processes Around the Customer Lifecycle 32

Customer Acquisition 33

Customer Activation 36

Customer Relationship Management 37

Winback 38

Data Mining Applications for Customer Acquisition 38

Identifying Good Prospects 39

Choosing a Communication Channel 39

Picking Appropriate Messages 40

A Data Mining Example: Choosing the Right Place to Advertise 40

Who Fits the Profile? 41

Measuring Fitness for Groups of Readers 44

Data Mining to Improve Direct Marketing Campaigns 45

Response Modeling 46

Optimizing Response for a Fixed Budget 47

Optimizing Campaign Profitability 49

Reaching the People Most Influenced by the Message 53

Using Current Customers to Learn About Prospects 54

Start Tracking Customers Before They Become “Customers” 55

Gather Information from New Customers 55

Acquisition-Time Variables Can Predict Future Outcomes 56

Data Mining Applications for Customer Relationship Management 56

Matching Campaigns to Customers 56

Reducing Exposure to Credit Risk 58

Determining Customer Value 59

Cross-selling, Up-selling, and Making Recommendations 60

Retention 60

Recognizing Attrition 60

Why Attrition Matters 61

Different Kinds of Attrition 62

Different Kinds of Attrition Model 63

Beyond the Customer Lifecycle 64

Lessons Learned 65

Chapter 3 The Data Mining Process 67

What Can Go Wrong? 68

Learning Things That Aren’t True 68

Learning Things That Are True, but Not Useful 73

Data Mining Styles 74

Hypothesis Testing 75

Directed Data Mining 81

Undirected Data Mining 81

Goals, Tasks, and Techniques 82

Data Mining Business Goals 82

Data Mining Tasks 83

Data Mining Techniques 88

Formulating Data Mining Problems: From Goals to Tasks to Techniques 88

What Techniques for Which Tasks? 95

Is There a Target or Targets? 96

What Is the Target Data Like? 96

What Is the Input Data Like? 96

How Important Is Ease of Use? 97

How Important Is Model Explicability? 97

Lessons Learned 98

Chapter 4 Statistics 101: What You Should Know About Data 101

Occam’s Razor 103

Skepticism and Simpson’s Paradox 103

The Null Hypothesis 104

P-Values 105

Looking At and Measuring Data 106

Categorical Values 106

Numeric Variables 117

A Couple More Statistical Ideas 120

Measuring Response 120

Standard Error of a Proportion 121

Comparing Results Using Confidence Bounds 123

Comparing Results Using Difference of Proportions 124

Size of Sample 125

What the Confidence Interval Really Means 126

Size of Test and Control for an Experiment 127

Multiple Comparisons 129

The Confidence Level with Multiple Comparisons 129

Bonferroni’s Correction 129

Chi-Square Test 130

Expected Values 130

Chi-Square Value 132

Comparison of Chi-Square to Difference of Proportions 134

An Example: Chi-Square for Regions and Starts 134

Case Study: Comparing Two Recommendation Systems with an A/B Test 138

First Metric: Participating Sessions 140

Data Mining and Statistics 144

Lessons Learned 148

Chapter 5 Descriptions and Prediction: Profiling and Predictive Modeling 151

Directed Data Mining Models 152

Defining the Model Structure and Target 152

Incremental Response Modeling 154

Model Stability 156

Time-Frames in the Model Set 157

Directed Data Mining Methodology 159

Step 1: Translate the Business Problem into a Data Mining Problem 161

How Will Results Be Used? 163

How Will Results Be Delivered? 163

The Role of Domain Experts and Information Technology 164

Step 2: Select Appropriate Data 165

What Data Is Available? 166

How Much Data Is Enough? 167

How Much History Is Required? 167

How Many Variables? 168

What Must the Data Contain? 168

Step 3: Get to Know the Data 169

Examine Distributions 169

Compare Values with Descriptions 170

Validate Assumptions 170

Ask Lots of Questions 171

Step 4: Create a Model Set 172

Assembling Customer Signatures 172

Creating a Balanced Sample 172

Including Multiple Timeframes 174

Creating a Model Set for Prediction 174

Creating a Model Set for Profiling 176

Partitioning the Model Set 176

Step 5: Fix Problems with the Data 177

Categorical Variables with Too Many Values 177

Numeric Variables with Skewed Distributions and Outliers 178

Missing Values 178

Values with Meanings That Change over Time 179

Inconsistent Data Encoding 179

Step 6: Transform Data to Bring Information to the Surface 180

Step 7: Build Models 180

Step 8: Assess Models 180

Assessing Binary Response Models and Classifiers 181

Assessing Binary Response Models Using Lift 182

Assessing Binary Response Model Scores Using Lift Charts 184

Assessing Binary Response Model Scores Using Profitability Models 185

Assessing Binary Response Models Using ROC Charts 186

Assessing Estimators 188

Assessing Estimators Using Score Rankings 189

Step 9: Deploy Models 190

Practical Issues in Deploying Models 190

Optimizing Models for Deployment 191

Step 10: Assess Results 191

Step 11: Begin Again 193

Lessons Learned 193

Chapter 6 Data Mining Using Classic Statistical Techniques 195

Similarity Models 196

Similarity and Distance 196

Example: A Similarity Model for Product Penetration 197

Table Lookup Models 203

Choosing Dimensions 204

Partitioning the Dimensions 205

From Training Data to Scores 205

Handling Sparse and Missing Data by Removing Dimensions 205

RFM: A Widely Used Lookup Model 206

RFM Cell Migration 207

RFM and the Test-and-Measure Methodology 208

RFM and Incremental Response Modeling 209

Naïve Bayesian Models 210

Some Ideas from Probability 210

The Naïve Bayesian Calculation 212

Comparison with Table Lookup Models 213

Linear Regression 213

The Best-fit Line 215

Goodness of Fit 217

Multiple Regression 220

The Equation 220

The Range of the Target Variable 221

Interpreting Coefficients of Linear Regression Equations 221

Capturing Local Effects with Linear Regression 223

Additional Considerations with Multiple Regression 224

Variable Selection for Multiple Regression 225

Logistic Regression 227

Modeling Binary Outcomes 227

The Logistic Function 229

Fixed Effects and Hierarchical Effects 231

Hierarchical Effects 232

Within and Between Effects 232

Fixed Effects 233

Lessons Learned 234

Chapter 7 Decision Trees 237

What Is a Decision Tree and How Is It Used? 238

A Typical Decision Tree 238

Using the Tree to Learn About Churn 240

Using the Tree to Learn About Data and Select Variables 241

Using the Tree to Produce Rankings 243

Using the Tree to Estimate Class Probabilities 243

Using the Tree to Classify Records 244

Using the Tree to Estimate Numeric Values 244

Decision Trees Are Local Models 245

Growing Decision Trees 247

Finding the Initial Split 248

Growing the Full Tree 251

Finding the Best Split 252

Gini (Population Diversity) as a Splitting Criterion 253

Entropy Reduction or Information Gain as a Splitting Criterion 254

Information Gain Ratio 256

Chi-Square Test as a Splitting Criterion 256

Incremental Response as a Splitting Criterion 258

Reduction in Variance as a Splitting Criterion for Numeric Targets 259

F Test 262

Pruning 262

The CART Pruning Algorithm 263

Pessimistic Pruning: The C5.0 Pruning Algorithm 267

Stability-Based Pruning 268

Extracting Rules from Trees 269

Decision Tree Variations 270

Multiway Splits 270

Splitting on More Than One Field at a Time 271

Creating Nonrectangular Boxes 271

Assessing the Quality of a Decision Tree 275

When Are Decision Trees Appropriate? 276

Case Study: Process Control in a Coffee Roasting Plant 277

Goals for the Simulator 277

Building a Roaster Simulation 278

Evaluation of the Roaster Simulation 278

Lessons Learned 279

Chapter 8 Artificial Neural Networks 281

A Bit of History 282

The Biological Model 283

The Biological Neuron 285

The Biological Input Layer 286

The Biological Output Layer 287

Neural Networks and Artificial Intelligence 287

Artificial Neural Networks 288

The Artificial Neuron 288

The Multi-Layer Perceptron 291

A Network Example 292

Network Topologies 293

A Sample Application: Real Estate Appraisal 295

Training Neural Networks 299

How Does a Neural Network Learn Using Back Propagation? 299

Pruning a Neural Network 300

Radial Basis Function Networks 303

Overview of RBF Networks 303

Choosing the Locations of the Radial Basis Functions 305

Universal Approximators 305

Neural Networks in Practice 308

Choosing the Training Set 309

Coverage of Values for All Features 309

Number of Features 310

Size of Training Set 310

Number and Range of Outputs 310

Rules of Thumb for Using MLPs 310

Preparing the Data 311

Interpreting the Output from a Neural Network 313

Neural Networks for Time Series 315

Time Series Modeling 315

A Neural Network Time Series Example 316

Can Neural Network Models Be Explained? 317

Sensitivity Analysis 318

Using Rules to Describe the Scores 318

Lessons Learned 319

Chapter 9 Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering 321

Memory-Based Reasoning 322

Look-Alike Models 323

Example: Using MBR to Estimate Rents in Tuxedo, New York 324

Challenges of MBR 327

Choosing a Balanced Set of Historical Records 328

Representing the Training Data 328

Determining the Distance Function, Combination Function, and Number of Neighbors 331

Case Study: Using MBR for Classifying Anomalies in Mammograms 331

The Business Problem: Identifying Abnormal Mammograms 332

Applying MBR to the Problem 332

The Total Solution 334

Measuring Distance and Similarity 335

What Is a Distance Function? 335

Building a Distance Function One Field at a Time 337

Distance Functions for Other Data Types 340

When a Distance Metric Already Exists 341

The Combination Function: Asking the Neighbors for Advice 342

The Simplest Approach: One Neighbor 342

The Basic Approach for Categorical Targets: Democracy 342

Weighted Voting for Categorical Targets 344

Numeric Targets 344

Case Study: Shazam — Finding Nearest Neighbors for Audio Files 345

Why This Feat Is Challenging 346

The Audio Signature 347

Measuring Similarity 348

Collaborative Filtering: A Nearest-Neighbor Approach to Making Recommendations 351

Building Profiles 352

Comparing Profiles 352

Making Predictions 353

Lessons Learned 354

Chapter 10 Knowing When to Worry: Using Survival Analysis to Understand Customers 357

Customer Survival 360

What Survival Curves Reveal 360

Finding the Average Tenure from a Survival Curve 362

Customer Retention Using Survival 364

Looking at Survival as Decay 365

Hazard Probabilities 367

The Basic Idea 368

Examples of Hazard Functions 369

Censoring 371

The Hazard Calculation 372

Other Types of Censoring 375

From Hazards to Survival 376

Retention 376

Survival 378

Comparison of Retention and Survival 378

Proportional Hazards 380

Examples of Proportional Hazards 381

Stratification: Measuring Initial Effects on Survival 382

Cox Proportional Hazards 382

Survival Analysis in Practice 385

Handling Different Types of Attrition 385

When Will a Customer Come Back? 387

Understanding Customer Value 389

Forecasting 392

Hazards Changing over Time 393

Lessons Learned 394

Chapter 11 Genetic Algorithms and Swarm Intelligence 397

Optimization 398

What Is an Optimization Problem? 398

An Optimization Problem in Ant World 399

E Pluribus Unum 400

A Smarter Ant 401

Genetic Algorithms 403

A Bit of History 404

Genetics on Computers 404

Representing the Genome 413

Schemata: The Building Blocks of Genetic Algorithms 414

Beyond the Simple Algorithm 417

The Traveling Salesman Problem 418

Exhaustive Search 419

A Simple Greedy Algorithm 419

The Genetic Algorithms Approach 419

The Swarm Intelligence Approach 420

Case Study: Using Genetic Algorithms for Resource Optimization 421

Case Study: Evolving a Solution for Classifying Complaints 423

Business Context 424

Data 425

The Comment Signature 425

The Genomes 426

The Fitness Function 427

The Results 427

Lessons Learned 427

Chapter 12 Tell Me Something New: Pattern Discovery and Data Mining 429

Undirected Techniques, Undirected Data Mining 431

Undirected versus Directed Techniques 431

Undirected versus Directed Data Mining 431

Case Study: Undirected Data Mining Using Directed Techniques 432

What Is Undirected Data Mining? 435

Data Exploration 435

Segmentation and Clustering 436

Target Variable Definition, When the Target Is Not Explicit 438

Simulation, Forecasting, and Agent-Based Modeling 443

Methodology for Undirected Data Mining 455

There Is No Methodology 456

Things to Keep in Mind 456

Lessons Learned 457

Chapter 13 Finding Islands of Similarity: Automatic Cluster Detection 459

Searching for Islands of Simplicity 461

Customer Segmentation and Clustering 461

Similarity Clusters 463

Tracking Campaigns by Cluster-Based Segments 464

Clustering Reveals an Overlooked Market Segment 466

Fitting the Troops 467

The K-Means Clustering Algorithm 468

Two Steps of the K-Means Algorithm 468

Voronoi Diagrams and K-Means Clusters 471

Choosing the Cluster Seeds 473

Choosing K 473

Using K-Means to Detect Outliers 474

Semi-Directed Clustering 475

Interpreting Clusters 475

Characterizing Clusters by Their Centroids 476

Characterizing Clusters by What Differentiates Them 477

Using Decision Trees to Describe Clusters 478

Evaluating Clusters 479

Cluster Measurements and Terminology 480

Cluster Silhouettes 480

Limiting Cluster Diameter for Scoring 483

Case Study: Clustering Towns 484

Creating Town Signatures 484

Creating Clusters 486

Determining the Right Number of Clusters 486

Evaluating the Clusters 487

Using Demographic Clusters to Adjust Zone Boundaries 488

Business Success 490

Variations on K-Means 490

K-Medians, K-Medoids, and K-Modes 490

The Soft Side of K-Means 494

Data Preparation for Clustering 495

Scaling for Consistency 496

Use Weights to Encode Outside Information 496

Selecting Variables for Clustering 497

Lessons Learned 497

Chapter 14 Alternative Approaches to Cluster Detection 499

Shortcomings of K-Means 500

Reasonableness 500

An Intuitive Example 501

Fixing the Problem by Changing the Scales 503

What This Means in Practice 504

Gaussian Mixture Models 505

Adding “Gaussians” to K-Means 505

Back to Gaussian Mixture Models 508

Scoring GMMs 510

Applying GMMs 511

Divisive Clustering 513

A Decision Tree–Like Method for Clustering 513

Scoring Divisive Clusters 515

Clusters and Trees 515

Agglomerative (Hierarchical) Clustering 516

Overview of Agglomerative Clustering Methods 516

Clustering People by Age: An Example of an Agglomerative Clustering Algorithm 520

Scoring Agglomerative Clusters 522

Limitations of Agglomerative Clustering 523

Agglomerative Clustering in Practice 525

Combining Agglomerative Clustering and K-Means 526

Self-Organizing Maps 527

What Is a Self-Organizing Map? 527

Training an SOM 530

Scoring an SOM 531

The Search Continues for Islands of Simplicity 532

Lessons Learned 533

Chapter 15 Market Basket Analysis and Association Rules 535

Defining Market Basket Analysis 536

Four Levels of Market Basket Data 537

The Foundation of Market Basket Analysis: Basic Measures 539

Order Characteristics 540

Item (Product) Popularity 541

Tracking Marketing Interventions 542

Case Study: Spanish or English 543

The Business Problem 543

The Data 544

Defining “Hispanicity” Preference 545

The Solution 546

Association Analysis 547

Rules Are Not Always Useful 548

Item Sets to Association Rules 551

How Good Is an Association Rule? 553

Building Association Rules 555

Choosing the Right Set of Items 556

Anonymous Versus Identified 561

Generating Rules from All This Data 561

Overcoming Practical Limits 565

The Problem of Big Data 567

Extending the Ideas 569

Different Items on the Right- and Left-Hand Sides 569

Using Association Rules to Compare Stores 570

Association Rules and Cross-Selling 572

A Typical Cross-Sell Model 572

A More Confident Approach to Product Propensities 573

Results from Using Confidence 574

Sequential Pattern Analysis 574

Finding the Sequences 575

Sequential Association Rules 578

Sequential Analysis Using Other Data Mining Techniques 579

Lessons Learned 579

Chapter 16 Link Analysis 581

Basic Graph Theory 582

What Is a Graph? 582

Directed Graphs 584

Weighted Graphs 585

Seven Bridges of Königsberg 585

Detecting Cycles in a Graph 588

The Traveling Salesman Problem Revisited 589

Social Network Analysis 593

Six Degrees of Separation 593

What Your Friends Say About You 595

Finding Childcare Benefits Fraud 596

Who Responds to Whom on Dating Sites 597

Social Marketing 598

Mining Call Graphs 598

Case Study: Tracking Down the Leader of the Pack 601

The Business Goal 601

The Data Processing Challenge 601

Finding Social Networks in Call Data 602

How the Results Are Used for Marketing 602

Estimating Customer Age 603

Case Study: Who Is Using Fax Machines from Home? 604

Why Finding Fax Machines Is Useful 604

How Do Fax Machines Behave? 604

A Graph Coloring Algorithm 605

“Coloring” the Graph to Identify Fax Machines 606

How Google Came to Rule the World 607

Hubs and Authorities 608

The Details 609

Hubs and Authorities in Practice 611

Lessons Learned 612

Chapter 17 Data Warehousing, OLAP, Analytic Sandboxes, and Data Mining 613

The Architecture of Data 615

Transaction Data, the Base Level 616

Operational Summary Data 617

Decision-Support Summary Data 617

Database Schema/Data Models 618

Metadata 623

Business Rules 623

A General Architecture for Data Warehousing 624

Source Systems 624

Extraction, Transformation, and Load 626

Central Repository 627

Metadata Repository 630

Data Marts 630

Operational Feedback 631

Users and Desktop Tools 631

Analytic Sandboxes 633

Why Are Analytic Sandboxes Needed? 634

Technology to Support Analytic Sandboxes 636

Where Does OLAP Fit In? 639

What’s in a Cube? 641

Star Schema 646

OLAP and Data Mining 648

Where Data Mining Fits in with Data Warehousing 650

Lots of Data 651

Consistent, Clean Data 651

Hypothesis Testing and Measurement 652

Scalable Hardware and RDBMS Support 653

Lessons Learned 653

Chapter 18 Building Customer Signatures 655

Finding Customers in Data 656

What Is a Customer? 657

Accounts? Customers? Households? 658

Anonymous Transactions 658

Transactions Linked to a Card 659

Transactions Linked to a Cookie 659

Transactions Linked to an Account 660

Transactions Linked to a Customer 661

Designing Signatures 661

Is a Customer Signature Necessary? 666

What Does a Row Represent? 666

Will the Signature Be Used for Predictive Modeling? 671

Has a Target Been Defined? 672

Are There Constraints Imposed by the Particular Data Mining Techniques to Be Employed? 672

Which Customers Will Be Included? 673

What Might Be Interesting to Know About Customers? 673

What a Signature Looks Like 674

Process for Creating Signatures 677

Some Data Is Already at the Right Level of Granularity 678

Pivoting a Regular Time Series 679

Aggregating Time-Stamped Transactions 680

Dealing with Missing Values 685

Missing Values in Source Data 685

Unknown or Non-Existent? 687

What Not to Do 687

Things to Consider 689

Lessons Learned 691

Chapter 19 Derived Variables: Making the Data Mean More 693

Handset Churn Rate as a Predictor of Churn 694

Single-Variable Transformations 696

Standardizing Numeric Variables 696

Turning Numeric Values into Percentiles 697

Turning Counts into Rates 698

Relative Measures 699

Replacing Categorical Variables with Numeric Ones 700

Combining Variables 707

Classic Combinations 707

Combining Highly Correlated Variables 710

Rent to Home Value 712

Extracting Features from Time Series 718

Trend 719

Seasonality 721

Extracting Features from Geography 722

Geocoding 722

Mapping 723

Using Geography to Create Relative Measures 724

Using Past Values of the Target Variable 725

Using Model Scores as Inputs 725

Handling Sparse Data 726

Account Set Patterns 726

Binning Sparse Values 727

Capturing Customer Behavior from Transactions 727

Widening Narrow Data 728

Sphere of Influence as a Predictor of Good Customers 728

An Example: Ratings to Rater Profile 730

Sample Fields from the Rater Signature 730

The Rating Signature and Derived Variables 732

Lessons Learned 733

Chapter 20 Too Much of a Good Thing? Techniques for Reducing the Number of Variables 735

Problems with Too Many Variables 736

Risk of Correlation Among Input Variables 736

Risk of Overfitting 738

The Sparse Data Problem 738

Visualizing Sparseness 739

Independence 740

Exhaustive Feature Selection 743

Flavors of Variable Reduction Techniques 744

Using the Target 744

Original versus New Variables 744

Sequential Selection of Features 745

The Traditional Forward Selection Methodology 745

Forward Selection Using a Validation Set 747

Stepwise Selection 748

Forward Selection Using Non-Regression Techniques 748

Backward Selection 748

Undirected Forward Selection 749

Other Directed Variable Selection Methods 749

Using Decision Trees to Select Variables 750

Variable Reduction Using Neural Networks 752

Principal Components 753

What Are Principal Components? 753

Principal Components Example 758

Principal Component Analysis 763

Factor Analysis 767

Variable Clustering 768

Example of Variable Clusters 768

Using Variable Clusters 770

Hierarchical Variable Clustering 770

Divisive Variable Clustering 773

Lessons Learned 774

Chapter 21 Listen Carefully to What Your Customers Say: Text Mining 775

What Is Text Mining? 776

Text Mining for Derived Columns 776

Beyond Derived Features 777

Text Analysis Applications 778

Working with Text Data 781

Sources of Text 781

Language Effects 782

Basic Approaches to Representing Documents 783

Representing Documents in Practice 784

Documents and the Corpus 786

Case Study: Ad Hoc Text Mining 786

The Boycott 787

Business as Usual 787

Combining Text Mining and Hypothesis Testing 787

The Results 788

Classifying News Stories Using MBR 789

What Are the Codes? 789

Applying MBR 790

The Results 793

From Text to Numbers 794

Starting with a “Bag of Words” 794

Term-Document Matrix 796

Corpus Effects 797

Singular Value Decomposition (SVD) 798

Text Mining and Naïve Bayesian Models 800

Naïve Bayesian in the Text World 801

Identifying Spam Using Naïve Bayesian 801

Sentiment Analysis 806

DIRECTV: A Case Study in Customer Service 809

Background 809

Applying Text Mining 811

Taking the Technical Approach 814

Not an Iterative Process 818

Continuing to Benefit 818

Lessons Learned 819

Index 821
