Big Data Business Intelligence Training for Telecom Service Providers
Breakdown of topics by day (each session is 2 hours):
Day-1: Session -1: Business Overview of Why Big Data Business Intelligence in Telco.
Case Studies from T-Mobile, Verizon etc.
Big Data adoption rate among North American telcos, and how they are aligning their future business models and operations around Big Data BI
Broad Scale Application Area
Network and Service management
Customer Churn Management
Data Integration & Dashboard visualization
Fraud management
Business Rule generation
Customer profiling
Localized Ad pushing
Day-1: Session-2 : Introduction of Big Data-1
Main characteristics of Big Data: volume, variety, velocity and veracity. MPP architecture for volume.
Data Warehouses – static schema, slowly evolving dataset
MPP Databases like Greenplum, Exadata, Teradata, Netezza, Vertica etc.
Hadoop Based Solutions – no conditions on structure of dataset.
Typical pattern : HDFS, MapReduce (crunch), retrieve from HDFS
Batch- suited for analytical/non-interactive
Velocity : CEP streaming data
Typical choices – CEP products (e.g. InfoSphere Streams, Apama, MarkLogic etc.)
Less production ready – Storm/S4
NoSQL Databases – (columnar and key-value): Best suited as analytical adjunct to data warehouse/database
Day-1 : Session -3 : Introduction to Big Data-2
NoSQL solutions
KV Store - Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB)
KV Store - Dynamo, Voldemort, Dynomite, SubRecord, MotionDb, DovetailDB
KV Store (Hierarchical) - GT.M, Caché
KV Store (Ordered) - TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord
KV Cache - Memcached, Repcached, Coherence, Infinispan, eXtreme Scale, JBossCache, Velocity, Terracotta
Tuple Store - Gigaspaces, Coord, Apache River
Object Database - ZopeDB, db4o, Shoal
Document Store - CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML-Databases, ThruDB, CloudKit, Persevere, Riak-Basho, Scalaris
Wide Columnar Store - BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI
Varieties of Data: Introduction to Data Cleaning issue in Big Data
RDBMS – static structure/schema, doesn’t promote agile, exploratory environment.
NoSQL – semi-structured; enough structure to store data without defining an exact schema first
Data cleaning issues
Day-1 : Session-4 : Big Data Introduction-3 : Hadoop
When to select Hadoop?
STRUCTURED - Enterprise data warehouses/databases can store massive data (at a cost) but impose structure (not good for active exploration)
SEMI STRUCTURED data – tough to do with traditional solutions (DW/DB)
Warehousing data = HUGE effort and static even after implementation
For variety & volume of data, crunched on commodity hardware – HADOOP
Commodity H/W needed to create a Hadoop Cluster
Introduction to Map Reduce /HDFS
MapReduce – distribute computing over multiple servers
HDFS – make data available locally for the computing process (with redundancy)
Data – can be unstructured/schema-less (unlike RDBMS)
Developer responsibility to make sense of data
Programming MapReduce = working with Java (pros/cons), manually loading data into HDFS
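The map, shuffle and reduce steps introduced in this session can be illustrated without a cluster. The following is a minimal single-process sketch in Python rather than Java; it is not the Hadoop API. The record layout (subscriber ID, megabytes used) and the function names are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical usage records: (subscriber_id, megabytes_used)
RECORDS = [
    ("sub-1", 120),
    ("sub-2", 300),
    ("sub-1", 80),
    ("sub-3", 50),
    ("sub-2", 25),
]


def map_phase(record):
    """Emit one (key, value) pair per input record."""
    subscriber_id, megabytes = record
    yield subscriber_id, megabytes


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


usage_by_subscriber = run_job(RECORDS)
print(usage_by_subscriber)
```

On a real cluster the map and reduce functions run on many servers, close to the HDFS blocks that hold the data; only the shape of the computation is shown here.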
Day-2: Session-1.1: Spark : In-memory distributed processing
What is “In memory” processing?
Spark SQL
Spark SDK
Spark API
RDD
Spark Lib
HANA
How to migrate an existing Hadoop system to Spark
Day-2 Session -1.2: Storm -Real time processing in Big Data
Streams
Spouts
Bolts
Topologies
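How streams, spouts, bolts and topologies relate can be sketched with plain Python generators. This is a conceptual illustration only and does not use the Storm API; the event fields (`cell`, `type`) and all function names are hypothetical.

```python
from collections import Counter


def event_spout():
    """Spout: the source of a stream. Here it replays a fixed, made-up list."""
    events = [
        {"cell": "A", "type": "dropped_call"},
        {"cell": "B", "type": "call_ok"},
        {"cell": "A", "type": "dropped_call"},
        {"cell": "C", "type": "dropped_call"},
    ]
    for event in events:
        yield event


def filter_bolt(stream):
    """Bolt: pass through only dropped-call tuples."""
    for event in stream:
        if event["type"] == "dropped_call":
            yield event


def count_bolt(stream):
    """Bolt: keep a running count per cell and emit it after each tuple."""
    counts = Counter()
    for event in stream:
        counts[event["cell"]] += 1
        yield event["cell"], counts[event["cell"]]


def run_topology():
    """Topology: the wiring spout -> filter bolt -> count bolt."""
    return list(count_bolt(filter_bolt(event_spout())))


emitted = run_topology()
print(emitted)
```

In Storm itself the spout would read from an unbounded source and each bolt would run as parallel tasks across the cluster; the sketch only shows how tuples flow through the graph.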
Day-2: Session-2: Big Data Management System
Moving parts, compute nodes start/fail :ZooKeeper - For configuration/coordination/naming services
Complex pipeline/workflow: Oozie – manage workflow, dependencies, daisy chain
Deploy, configure, cluster management, upgrade etc (sys admin) :Ambari
In Cloud : Whirr
Evolving Big Data platform tools for tracking
ETL layer application issues
Day-2: Session-3: Predictive analytics in Business Intelligence -1: Fundamental Techniques & Machine learning based BI :
Introduction to Machine learning
Learning classification techniques
Bayesian Prediction-preparing training file
Markov random field
Supervised and unsupervised learning
Feature extraction
Support Vector Machine
Neural Network
Reinforcement learning
Big Data large variable problem -Random forest (RF)
Representation learning
Deep learning
Big Data Automation problem – Multi-model ensemble RF
Automation through Soft10-M
LDA and topic modeling
Agile learning
Agent based learning- Example from Telco operation
Distributed learning –Example from Telco operation
Introduction to open source tools for predictive analytics : R, RapidMiner, Mahout
More scalable analytics: Apache Hama, Spark and CMU GraphLab
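As an illustration of the Bayesian prediction topic in this session, the following is a minimal naive Bayes sketch in plain Python with add-one (Laplace) smoothing. The training rows, the feature names (`contract`, `complaint`) and the labels are invented for the example and do not come from any real Telco data set.

```python
from collections import Counter, defaultdict

# Hypothetical training file: (features, label) rows
TRAINING_ROWS = [
    ({"contract": "monthly", "complaint": "yes"}, "churn"),
    ({"contract": "monthly", "complaint": "yes"}, "churn"),
    ({"contract": "monthly", "complaint": "no"}, "churn"),
    ({"contract": "annual", "complaint": "no"}, "stay"),
    ({"contract": "annual", "complaint": "no"}, "stay"),
    ({"contract": "monthly", "complaint": "no"}, "stay"),
    ({"contract": "annual", "complaint": "yes"}, "stay"),
]


def train(rows):
    """Count labels, feature values per label, and distinct values per feature."""
    label_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (label, feature) -> value counts
    feature_values = defaultdict(set)    # feature -> distinct values seen
    for features, label in rows:
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            feature_values[name].add(value)
    return label_counts, value_counts, feature_values


def predict(model, features):
    """Return the label with the highest prior * smoothed likelihood score."""
    label_counts, value_counts, feature_values = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        score = n / total  # prior P(label)
        for name, value in features.items():
            seen = value_counts[(label, name)][value]
            # Add-one smoothing so unseen values do not zero out the score
            score *= (seen + 1) / (n + len(feature_values[name]))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING_ROWS)
print(predict(model, {"contract": "monthly", "complaint": "yes"}))
print(predict(model, {"contract": "annual", "complaint": "no"}))
```

Preparing the training file amounts to producing rows of this shape: one record per customer, a fixed set of feature columns, and a known outcome label.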
Day-2: Session-4 Predictive analytics eco-system-2: Common predictive analytic problems in Telecom
Insight analytic
Visualization analytic
Structured predictive analytic
Unstructured predictive analytic
Customer profiling
Recommendation Engine
Pattern detection
Rule/Scenario discovery –failure, fraud, optimization
Root cause discovery
Sentiment analysis
CRM analytic
Network analytic
Text Analytics
Technology assisted review
Fraud analytic
Real Time Analytic
Day-3 : Session-1 : Network Operation analytics - root cause analysis of network failures and service interruptions from metadata, IPDR and CRM:
CPU Usage
Memory Usage
QoS Queue Usage
Device Temperature
Interface Error
IOS versions
Routing Events
Latency variations
Syslog analytics
Packet Loss
Load simulation
Topology inference
Performance Threshold
Device Traps
IPDR (IP Detail Record) collection and processing
Use of IPDR data for subscriber bandwidth consumption, network interface utilization, modem status and diagnostics
HFC information
Day-3: Session-2: Tools for Network service failure analysis:
Network Summary Dashboard: monitor overall network deployments and track your organization's key performance indicators
Peak Period Analysis Dashboard: understand the application and subscriber trends driving peak utilization, with location-specific granularity
Routing Efficiency Dashboard: control network costs and build business cases for capital projects with a complete understanding of interconnect and transit relationships
Real-Time Entertainment Dashboard: access metrics that matter, including video views, duration, and video quality of experience (QoE)
IPv6 Transition Dashboard: investigate the ongoing adoption of IPv6 on your network and gain insight into the applications and devices driving trends
Case-Study-1: The Alcatel-Lucent Big Network Analytics (BNA) Data Miner
Multi-dimensional mobile intelligence (m.IQ6)
Day-3 : Session 3: Big Data BI for Marketing/Sales –Understanding sales/marketing from Sales data: ( All of them will be shown with a live predictive analytic demo )
To identify highest velocity clients
To identify clients for a given product
To identify right set of products for a client ( Recommendation Engine)
Market segmentation technique
Cross-sell and upsell techniques
Client segmentation technique
Sales revenue forecasting technique
Day-3: Session 4: BI needed for Telco CFO office:
Overview of Business Analytics works needed in a CFO office
Risk analysis on new investment
Revenue, profit forecasting
New client acquisition forecasting
Loss forecasting
Fraud analytic on finances ( details next session )
Day-4 : Session-1: Fraud prevention BI from Big Data in Telco-Fraud analytic:
Bandwidth leakage / Bandwidth fraud
Vendor fraud/over charging for projects
Customer refund/claims frauds
Travel reimbursement frauds
Day-4 : Session-2: From Churning Prediction to Churn Prevention:
3 types of churn : Active/Deliberate, Rotational/Incidental, Passive/Involuntary
3 classifications of churned customers: Total, Hidden, Partial
Understanding CRM variables for churn
Customer behavior data collection
Customer perception data collection
Customer demographics data collection
Cleaning CRM Data
Unstructured CRM data ( customer call, tickets, emails) and their conversion to structured data for Churn analysis
Social Media CRM-new way to extract customer satisfaction index
Case Study-1 : T-Mobile USA: Churn Reduction by 50%
Day-4 : Session-3: How to use predictive analysis for root cause analysis of customer dissatisfaction :
Case Study -1 : Linking dissatisfaction to issues – Accounting, Engineering failures like service interruption, poor bandwidth service
Case Study-2: Big Data QA dashboard to track customer satisfaction index from various parameters such as call escalations, criticality of issues, pending service interruption events etc.
Day-4: Session-4: Big Data Dashboard for quick accessibility of diverse data and display :
Integration of existing application platform with Big Data Dashboard
Big Data management
Case Study of Big Data Dashboard: Tableau and Pentaho
Use Big Data app to push location based Advertisement
Tracking system and management
Day-5 : Session-1: How to justify Big Data BI implementation within an organization:
Defining ROI for Big Data implementation
Case studies on saving analyst time in data collection and preparation: productivity gain
Case studies of revenue gain from reduced customer churn
Revenue gain from location based and other targeted Ad
An integrated spreadsheet approach to calculate approx. expense vs. Revenue gain/savings from Big Data implementation.
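The expense versus gain comparison described in this session reduces to simple arithmetic. The sketch below shows one possible form of it in Python; every cost and gain category and every figure is a made-up placeholder, and ROI is taken here as net benefit divided by total cost, which is only one common definition.

```python
def roi_summary(costs, gains):
    """Summarize total cost, total gain, net benefit and ROI (net / cost)."""
    total_cost = sum(costs.values())
    total_gain = sum(gains.values())
    if total_cost <= 0:
        raise ValueError("total cost must be positive to compute ROI")
    net_benefit = total_gain - total_cost
    return {
        "total_cost": total_cost,
        "total_gain": total_gain,
        "net_benefit": net_benefit,
        "roi": net_benefit / total_cost,
    }


# Hypothetical annual figures, in one currency unit
costs = {
    "hardware": 400_000,
    "licenses_and_support": 150_000,
    "staff_and_training": 250_000,
}
gains = {
    "churn_reduction": 600_000,
    "analyst_time_saved": 200_000,
    "targeted_ads": 200_000,
}

summary = roi_summary(costs, gains)
print(summary)
```

With these placeholder numbers the net benefit is 200,000 on a cost of 800,000, an ROI of 25%. In a spreadsheet, each dictionary entry corresponds to one row.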
Day-5 : Session-2: Step-by-step procedure to replace a legacy data system with a Big Data system:
Understanding practical Big Data Migration Roadmap
What important information is needed before architecting a Big Data implementation
What are the different ways of calculating volume, velocity, variety and veracity of data
How to estimate data growth
Case studies from 2 telcos
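One simple way to estimate data growth, as raised in this session, is to assume a constant compound monthly growth rate. The sketch below makes that assumption explicit; the starting volume, growth rate and capacity are hypothetical, and real growth is rarely this regular.

```python
import math


def projected_volume(current_tb, monthly_growth_rate, months):
    """Volume after `months`, assuming constant compound monthly growth."""
    return current_tb * (1 + monthly_growth_rate) ** months


def months_until_capacity(current_tb, monthly_growth_rate, capacity_tb):
    """Whole months until the projected volume first reaches `capacity_tb`."""
    if current_tb <= 0 or monthly_growth_rate <= 0:
        raise ValueError("current volume and growth rate must be positive")
    if capacity_tb <= current_tb:
        return 0
    ratio = capacity_tb / current_tb
    return math.ceil(math.log(ratio) / math.log(1 + monthly_growth_rate))


# Hypothetical inputs: 100 TB today, growing 5% per month
volume_in_a_year = projected_volume(100, 0.05, 12)
months_to_double = months_until_capacity(100, 0.05, 200)
print(round(volume_in_a_year, 1), months_to_double)
```

Under these assumptions 100 TB grows to roughly 180 TB in a year and passes 200 TB in month 15. The growth rate itself would normally be measured from historical ingest volumes rather than guessed.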
Day-5: Session 3 & 4: Review of Big Data Vendors and review of their products. Q/A session:
Accenture
Alcatel-Lucent
Amazon –A9
APTEAN (Formerly CDC Software)
Cisco Systems
Cloudera
Dell
EMC
GoodData Corporation
Guavus
Hitachi Data Systems
Hortonworks
Huawei
HP
IBM
Informatica
Intel
Jaspersoft
Microsoft
MongoDB (Formerly 10Gen)
Mu Sigma
NetApp
Opera Solutions
Oracle
Pentaho
Platfora
QlikTech
Quantum
Rackspace
Revolution Analytics
Salesforce
SAP
SAS Institute
Sisense
Software AG/Terracotta
Soft10 Automation
Splunk
Sqrrl
Supermicro
Tableau Software
Teradata
Think Big Analytics
Tidemark Systems
VMware (Part of EMC)