

Source: collected from the web. Date: 2019-01-03

Predicting Customer Churn in the Telecommunications Industry: An Application of Survival Analysis Modeling Using SAS

ABSTRACT

Conventional statistical methods (e.g., logistic regression and decision trees) are very successful in predicting customer churn. However, these methods can hardly predict when customers will churn or how long they will stay with the company. The goal of this study is to apply survival analysis techniques to predict customer churn, using data from a telecommunications company. This study will help telecommunications companies understand customer churn risk and customer churn hazard in a timely manner by predicting which customers will churn and when they will churn. The findings can help telecommunications companies optimize their customer retention and/or treatment resources in their churn reduction efforts.

INTRODUCTION

In the telecommunications industry, customers can choose among multiple service providers and actively exercise their right to switch from one provider to another. In this fiercely competitive market, customers demand tailored products and better services at lower prices, while service providers constantly focus on acquisition as their business goal. Given that the telecommunications industry experiences an average annual churn rate of 30-35 percent and that it costs 5-10 times more to recruit a new customer than to retain an existing one, customer retention has become even more important than customer acquisition. For many incumbent operators, retaining highly profitable customers is the number one business pain. Many telecommunications companies deploy retention strategies that synchronize programs and processes to keep customers longer by providing them with tailored products and services. With retention strategies in place, many companies have started to include churn reduction as one of their business goals.

To help telecommunications companies manage churn reduction, we need to predict not only which customers are at high risk of churn but also how soon these high-risk customers will churn. The telecommunications companies can then optimize their marketing intervention resources to prevent as many customers as possible from churning. In other words, if the telecommunications companies know which customers are at high risk of churn and when they will churn, they can design customized customer communication and treatment programs in a timely and efficient manner.

Conventional statistical methods (e.g., logistic regression and decision trees) are very successful in predicting customer churn. However, these methods can hardly predict when customers will churn or how long they will stay with the company. Survival analysis, by contrast, was designed from the very beginning to handle survival data, and is therefore an efficient and powerful tool for predicting customer churn.

OBJECTIVES

The objectives of this study are twofold. The first objective is to estimate the customer survival function and the customer hazard function to gain knowledge of customer churn over the course of customer tenure. The second objective is to demonstrate how survival analysis techniques can be used to identify the customers who are at high risk of churn and when they will churn.

DEFINITIONS AND EXCLUSIONS

This section clarifies some of the important concepts and exclusions used in this study.

Churn – In the telecommunications industry, churn is broadly defined as the cancellation of a customer’s telecommunications service. This includes both service-provider initiated churn and customer initiated churn. An example of service-provider initiated churn is a customer’s account being closed because of payment default. Customer initiated churn is more complicated, and the reasons behind it vary. In this study, only customer initiated churn is considered, and it is defined by a series of cancel reason codes. Examples of reason codes are: unacceptable call quality, more favorable competitor’s pricing plan, misinformation given by sales, customer expectation not met, billing problem, moving, change in business, and so on.

High-Value Customers – Only customers who have received at least three monthly bills are considered in the study. High-value customers are those with a monthly average revenue of $X or more over the last three months. If a customer’s first invoice covers less than 30 days of service, the customer’s monthly revenue is prorated to a full month’s revenue.
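The proration and the high-value rule can be illustrated with a minimal Python sketch. The source does not give the proration formula or the value of $X, so the linear scaling to a 30-day month, the function names, and the `threshold` parameter are all assumptions made for illustration.

```python
def prorate_first_invoice(revenue, days_covered, month_days=30):
    """Scale a partial first invoice up to a full month's revenue.

    Assumption: linear proration against a 30-day month.
    """
    if days_covered <= 0:
        raise ValueError("days_covered must be positive")
    if days_covered >= month_days:
        return revenue  # invoice already covers a full month
    return revenue * month_days / days_covered


def is_high_value(last_three_monthly_revenues, threshold):
    """True if the three-month average revenue is at least `threshold`.

    `threshold` is a hypothetical stand-in for the undisclosed $X.
    """
    if len(last_three_monthly_revenues) != 3:
        raise ValueError("exactly three monthly revenues are required")
    return sum(last_three_monthly_revenues) / 3 >= threshold
```

For example, a first invoice of 20.0 covering 15 days would be prorated to 40.0 before it enters the three-month average.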

Granularity – This study examines customer churn at the account level.

Exclusions – This study does not distinguish international customers from domestic customers, although it would be desirable to investigate international customer churn separately from domestic customer churn in the future. This study also excludes employee accounts, since churn among employee accounts is neither a problem nor of interest to the company.

SURVIVAL ANALYSIS AND CUSTOMER CHURN

Survival analysis is a class of statistical methods for studying the occurrence and timing of events. From the beginning, survival analysis was designed for longitudinal data on the occurrence of events. Keeping track of customer churn is a good example of survival data. Survival data have two common features that are difficult to handle with conventional statistical methods: censoring and time-dependent covariates.

Generally, the survival function and the hazard function are used to describe the status of customer survival over the period of observation. The survival function gives the probability of surviving beyond a certain time point t. The hazard function, by contrast, describes the risk of the event (in this case, customer churn) in a small time interval after time t, conditional on the customer having survived to time t. The hazard function is therefore more intuitive to use in survival analysis, because it quantifies the instantaneous risk that customer churn will take place at time t given that the customer has survived to time t.
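In standard notation, the two functions described above can be written as follows. The symbols are not used in the source and are introduced here only for illustration: T denotes the time from the origin to churn, treated as a continuous random variable, and f is its density.

```latex
\[
S(t) = \Pr(T > t), \qquad
h(t) = \lim_{\Delta t \to 0}
       \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
     = \frac{f(t)}{S(t)}
     = -\frac{d}{dt}\log S(t)
\]
\[
S(t) = \exp\!\left(-\int_0^t h(u)\,du\right)
\]
```

The second line shows that either function determines the other, so estimating the hazard over customer tenure is equivalent to estimating the survival curve.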

For survival analysis, the best observation plan is prospective. We begin observing a set of customers at some well-defined point in time (called the origin of time) and then follow them for a substantial period, recording the times at which customer churn occurs. It is not necessary that every customer experience churn: customers who have yet to experience churn are called censored cases, while customers who have already churned are called observed cases. Typically, we want not only to predict the timing of customer churn but also to analyze how time-dependent covariates (e.g., customer calls to service centers, changes of plan type, and changes of billing options) affect the occurrence and timing of customer churn.
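To illustrate how censored and observed cases enter an estimate differently, here is a minimal Python sketch of the product-limit (Kaplan-Meier) estimator, one common nonparametric estimate of the survival function. The source does not say that the study used this estimator; the function name and the toy data are assumptions for illustration only.

```python
from collections import Counter


def kaplan_meier(durations, statuses):
    """Return [(t, S(t))] at each distinct observed churn time.

    durations: time from the origin to churn or to censoring
    statuses:  1 = churn observed, 0 = censored
    """
    if len(durations) != len(statuses):
        raise ValueError("durations and statuses must have equal length")
    # Number of observed churns at each time point.
    events = Counter(t for t, s in zip(durations, statuses) if s == 1)
    # Every customer leaves the risk set at their own time, churned or censored.
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Only observed churns reduce the survival estimate.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Censored customers shrink the risk set without counting as churn.
        at_risk -= exits[t]
    return curve


# Toy data: six customers, three observed churns and three censored cases.
toy_curve = kaplan_meier([2, 3, 3, 5, 15, 15], [1, 1, 0, 1, 0, 0])
```

Censored customers contribute to the risk set up to their last observed time but never count as churn, which is the behavior conventional regression methods struggle to reproduce.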

SAS/STAT has two procedures for regression analysis of survival data: PROC LIFEREG and PROC PHREG. The LIFEREG procedure fits parametric regression models to censored survival data using maximum likelihood estimation. The PHREG procedure performs semi-parametric regression analysis using partial likelihood estimation. PROC PHREG has gained popularity over PROC LIFEREG in the last decade since it handles time-dependent covariates. However, if the shapes of the survival distribution and hazard function are known, PROC LIFEREG produces more efficient estimates (with smaller standard errors) than PROC PHREG does.
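As a rough illustration of parametric maximum likelihood with censored data, the following Python sketch fits the simplest possible case: a constant-hazard (exponential) model with no covariates. It is not a reproduction of PROC LIFEREG and is not the model used in the study; the function names are hypothetical. With right censoring, the log-likelihood is the sum over customers of status × log(rate) − rate × duration, which is maximized at rate = (number of churns) / (total observed time).

```python
import math


def exponential_mle(durations, statuses):
    """Maximum likelihood fit of a constant hazard under right censoring.

    Returns (rate, standard_error). Assumes an exponential survival
    distribution and no covariates.
    """
    if len(durations) != len(statuses):
        raise ValueError("durations and statuses must have equal length")
    events = sum(statuses)      # observed churns
    exposure = sum(durations)   # total time at risk, censored cases included
    if events == 0 or exposure <= 0:
        raise ValueError("need at least one churn and positive exposure")
    rate = events / exposure
    # Asymptotic standard error from the observed information events / rate**2.
    std_err = rate / math.sqrt(events)
    return rate, std_err


def exponential_survival(rate, t):
    """Survival probability S(t) = exp(-rate * t) under the fitted model."""
    return math.exp(-rate * t)
```

When the assumed distribution is correct, the full likelihood uses more information than a partial likelihood, which is the intuition behind the efficiency remark above.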

SAMPLING STRATEGY

On August 16, 2000, a sample of 41,374 active high-value customers was randomly selected from the entire customer base of a telecommunications company. All of these customers were followed for the next 15 months. August 16, 2000 is therefore the origin of time, and November 15, 2001 is the observation termination time. During this 15-month observation period, the timing of customer churn was recorded. For each customer in the sample, a variable DUR indicates the time at which customer churn occurred or, for censored cases, the last time at which the customer was observed, both measured from the origin of time (August 16, 2000). A second variable, STATUS, distinguishes censored cases from observed cases. It is common to set STATUS = 1 for observed cases and STATUS = 0 for censored cases. In this study, the survival data are singly right censored, so all censored cases have a value of 15 (months) for the variable DUR.
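The construction of DUR and STATUS can be sketched in Python as follows. The source does not say how DUR is rounded, so this sketch assumes DUR is the ordinal number of the observation month in which churn occurred (month 1 runs from August 16 to September 15, 2000), which makes a censored case equal to 15 as stated above. The function names and that rounding rule are assumptions.

```python
from datetime import date

ORIGIN = date(2000, 8, 16)   # origin of time
END = date(2001, 11, 15)     # observation termination time


def month_index(d):
    """Ordinal observation month (1, 2, ...) that contains date d.

    Assumption: each observation month starts on the 16th.
    """
    months = (d.year - ORIGIN.year) * 12 + (d.month - ORIGIN.month)
    if d.day < ORIGIN.day:
        months -= 1
    return months + 1


def dur_status(churn_date):
    """Return (DUR, STATUS) for one customer.

    churn_date is None when no churn was recorded. Churn after END is
    treated as censored at the end of the observation period.
    """
    if churn_date is None or churn_date > END:
        return month_index(END), 0   # censored case
    if churn_date < ORIGIN:
        raise ValueError("sampled customers were active at the origin of time")
    return month_index(churn_date), 1   # observed case
```

For example, a customer who churned on March 10, 2001 gets DUR = 7 and STATUS = 1, while a customer still active on November 15, 2001 gets DUR = 15 and STATUS = 0.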

DATA SOURCES

There are four major data sources for this study: block-level marketing and financial information, customer-level demographic data provided by a third-party vendor, customer internal data, and customer contact records. A brief description of some of the data sources follows.

Demographic Data – Demographic data come from a third-party vendor. In this study, the following are examples of customer-level demographic information:

- Primary household member’s age
- Gender and marital status
- Number of adults
- Primary household member’s occupation
- Household estimated income and wealth ranking
- Number of children and children’s age
- Number of vehicles and vehicle value
- Credit card
- Frequent traveler
- Responder to mail orders
- Dwelling and length of residence

Customer Internal Data – Customer internal data come from the company’s data warehouse and consist of two parts. The first part covers customer information such as market channel, plan type, bill agency, customer segmentation code, ownership of the company’s other products, dispute, late fee charge, discount, promotion/save promotion, additional lines, toll free services, rewards redemption, billing dispute, and so on. The second part is the customer’s telecommunications usage data. Examples of customer usage variables are:

- Weekly average call counts
- Percentage change of minutes
- Share of domestic/international revenue

Customer Contact Records – The company’s Customer Information System (CIS) stores detailed records of customer contacts. These mainly include customer calls to service centers and the company’s mail contacts to customers. The customer contact records are then classified into customer contact categories, such as general inquiries, requests to change service, inquiries about cancellation, and so on.

MODELING PROCESS
