Data Mining and Analysis Training
Data preprocessing
Data Cleaning
Data integration and transformation
Data reduction
Discretization and concept hierarchy generation
Statistical inference
Probability distributions, Random variables, Central limit theorem
Sampling
Confidence intervals
Statistical Inference
Hypothesis testing
Multivariate linear regression
Specification
Subset selection
Estimation
Validation
Prediction
Classification methods
Logistic regression
Linear discriminant analysis
K-nearest neighbours
Naive Bayes
Comparison of Classification methods
Neural Networks
Fitting neural networks
Issues in training neural networks
Decision trees
Regression trees
Classification trees
Trees Versus Linear Models
Bagging, Random Forests, Boosting
Bagging
Random Forests
Boosting
Support Vector Machines and Flexible Discriminants
Maximal Margin classifier
Support vector classifiers
Support vector machines
SVMs for two or more classes
Relationship to logistic regression
Principal Components Analysis
Clustering
K-means clustering
K-medoids clustering
Hierarchical clustering
Density-based clustering
Model Assessment and Selection
Bias, Variance and Model complexity
In-sample prediction error
The Bayesian approach
Cross-validation
Bootstrap methods