# Train a classification model to estimate customer churn

<div align="left"><figure><img src="https://3015558743-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFEElAdsRIl9DzfMhbRlB%2Fuploads%2FUx7fNjcfw9KvGNf1JTaW%2Fadvanced%20banner.png?alt=media&#x26;token=ea2ec56e-2c6a-4c54-bae4-561b2fa33b7b" alt="" width="175"><figcaption></figcaption></figure></div>

In this tutorial, we’ll dive into telecom customer churn data to uncover the key reasons behind customer departures and develop targeted strategies to boost retention and satisfaction. Specifically, we will learn how to predict customer churn for a telecom provider offering telephony and internet services using [CARTO Workflows](https://carto.com/workflows). You can access the full template [here](https://academy.carto.com/creating-workflows/workflow-templates/bigquery-ml#create-a-classification-model).

### Telco Customer Churn Dataset

For this use case, we will be using [IBM’s Telco Customer Churn Dataset](https://doi.org/10.34740/KAGGLE/DSV/8360350), which contains information about a fictional telco company that provides home phone and Internet services to 7043 customers in California. This dataset provides essential insights into each customer's profile, covering everything from subscribed services and tenure to socio-demographic information and sentiment data.&#x20;

{% embed url="<https://clausa.app.carto.com/map/17b76836-bcbe-486e-8a04-908fd9346bc0>" %}

Before stating, let’s take a look at the data. From the [map](https://clausa.app.carto.com/map/17b76836-bcbe-486e-8a04-908fd9346bc0) widget’s section, we can see that 26,54% of customers churned this quarter, resulting in a $3,68M revenue loss. Regions like Los Angeles and San Diego are characterized by having both a large number of customers and a higher number of lost customers, positioning them as high-priority areas for improving customer retention.<br>

### Installing the BigQuery ML Extension Package

For this tutorial, we will be using CARTO's [BigQuery ML Extension Package](https://docs.carto.com/carto-user-manual/workflows/components/bigquery-ml), a powerful tool that allows users to exploit [BigQuery’s ML capabilities](https://cloud.google.com/bigquery/docs/bqml-introduction) directly from Workflows, enabling seamless integration of machine learning models into automated pipelines.&#x20;

To install the Extension Package from the Workflows gallery, follow the next steps:

1. Log into the [CARTO Workspace](https://app.carto.com/), then head to **Workflows** and **Create a new workflow**; use the **`CARTO Data Warehouse`** connection.
2. Go to the **Components** tab, on the left-side menu, then click on **Manage Extension Packages**.
3. In the  **Explore** tab, you will see a set of [Extension Packages](https://carto.com/blog/new-carto-workflows-extensions) that CARTO has developed. Click on the **BigQuery ML for Workflows** box, then on **Install extension.**

<figure><img src="https://3015558743-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFEElAdsRIl9DzfMhbRlB%2Fuploads%2F7FKsfNREGZPCOqvY2jEI%2FTelco%20BQ%20ML%20-%20Install%20from%20UI.png?alt=media&#x26;token=c43731be-98a1-4788-8c89-3f0c2a2f651c" alt=""><figcaption><p>A selection of extension packages developed by CARTO.</p></figcaption></figure>

4. You have successfully installed the Extension Package! Now you can click on it to navigate through the components. You can also go to the **Components** section and see the components from there, ready to be drag-and-droped into the canvas.

<figure><img src="https://3015558743-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFEElAdsRIl9DzfMhbRlB%2Fuploads%2FmmpFSisOkS11YvoHMVnT%2FScreenshot%202025-01-28%20at%2012.07.03.png?alt=media&#x26;token=3f2225ce-3c43-46bb-a30a-1cfaf23d3e02" alt=""><figcaption><p>Installation of the BigQuery ML for Workflows Extension Package.</p></figcaption></figure>

{% hint style="info" %}
Alternatively, one can manually install the extension following the next steps:

1. Go to [BigQuery ML Extension Package](https://docs.carto.com/carto-user-manual/workflows/components/bigquery-ml) documentation.
2. Download the `.zip` file by clicking on [**Download the BigQuery ML extension package**](https://storage.googleapis.com/carto-extension-packages/bigquery_ml_extension-v1.0.1.zip)**.**
3. Log into the [CARTO Workspace](https://app.carto.com/), then head to **Workflows** and **Create a new workflow**; use the **CARTO Data Warehouse** connection.
4. Go to **Components** and select **Manage Extension Packages > Upload > Upload extension** and upload the `.zip` file.
5. Click on **Install Extension.**

This type of installation is required for custom extensions and for Self-hosted users not having access to the Workflows gallery from their environment.
{% endhint %}

Please refer to the documentation for more details about [managing Extension Packages](https://docs.carto.com/carto-user-manual/workflows/extension-packages).

### Learning How Telecom Providers Can Leverage BigQuery ML to Predict Customer Churn using Workflows

Now, let's add components to our Workflow to predict customer churn. We will load the telco dataset, from which we’ve pre-selected some interesting features (e.g. those correlated with churn), and we will train a classification model to estimate which customers are prone to churn and which are not.&#x20;

1. Drag the [Get Table by Name](https://docs.carto.com/carto-user-manual/workflows/components/input-output#get-table-by-name) component to the canvas and import the `cartobq.docs.telco_churn_ca_template` dataset. This data is publicly available in BigQuery (remember that we are using a connection to the [CARTO DW](https://docs.carto.com/carto-user-manual/connections/carto-data-warehouse), a fully-managed, default Google BigQuery project for the organization).

<figure><img src="https://3015558743-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFEElAdsRIl9DzfMhbRlB%2Fuploads%2FOC15yaWUjsp0XJgoJUWj%2FScreenshot%202025-01-28%20at%2012.53.00.png?alt=media&#x26;token=5ac9ae40-06f0-4652-ab66-dce1396b3e36" alt=""><figcaption><p>Import telco's data source into Workflows.</p></figcaption></figure>

2. Use the [Where](https://docs.carto.com/carto-user-manual/workflows/components/data-preparation#where) component to select only those rows for which the `churn_label` is available (`churn_label IS NOT NULL`). This will be the data we will split for training (70%) and evaluating (30%) our model through random sampling (`RAND() < 0.7`) using another [Where](https://docs.carto.com/carto-user-manual/workflows/components/data-preparation#where) component. Once our model is ready, we will predict the `churn_label` for those customers which  we do not know whether they will churn or not.&#x20;

<figure><img src="https://3015558743-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFEElAdsRIl9DzfMhbRlB%2Fuploads%2FdSTHkAW5kPmhMJVri0gv%2FTelco%20BQ%20ML%20-%20Where.png?alt=media&#x26;token=70b595bf-7395-4adb-b1e3-7916d35f8a6c" alt=""><figcaption><p>Select data for training and prediction. Then, split the data into train and test sets.</p></figcaption></figure>

Now, we will use the training data to create a classification model, whose output will be the probability of churn (i.e. 0 means no churn, 1 means churn) for a customer given specific socio-demographic, contract type and sentiment characteristics.&#x20;

3. &#x20;Use the [Drop Columns](https://docs.carto.com/carto-user-manual/workflows/components/data-preparation#drop-columns) component to remove unnecessary columns that won't be used for training: `geom` (`GEOMETRY` type columns are not valid).
4. Connect the [Create Classification Model](https://docs.carto.com/carto-user-manual/workflows/components/bigquery-ml#create-classification-model) component to the input data and set up the model’s parameters: we will train a [Logistic Regression](https://en.wikipedia.org/wiki/Logistic_regression) model and we will not further split the data (we have done so in step 2).
   * Note: You will need to give the model a Fully Qualified Name (FQN), which is where the model will be stored. In this way, it would also be possible to call the model from a different workflow using the [Get Model by Name](https://docs.carto.com/carto-user-manual/workflows/components/bigquery-ml#get-model-by-name) component. To find the FQN of your CARTO DW, go to the SQL tab in the lower menu and copy the project name as seen in the image below. Your FQN should look something like: `carto-dw-ac-<id>.shared.telco_churn_ca_predicted`.

<figure><img src="https://3015558743-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFEElAdsRIl9DzfMhbRlB%2Fuploads%2F2W2LoVvezUpdb2DXraUz%2FTelco%20BQ%20ML%20-%20FQN.png?alt=media&#x26;token=4c98a4f0-f3db-4363-a814-c6e1d9b355e8" alt=""><figcaption><p>Get your CARTO DW's project name to build the FQN of your model.</p></figcaption></figure>

<figure><img src="https://3015558743-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFEElAdsRIl9DzfMhbRlB%2Fuploads%2FgXRvcjkaq2SAD511lSbi%2FScreenshot%202025-02-03%20at%2011.12.53.png?alt=media&#x26;token=3e43929b-c9c8-469b-953c-c3064cccb56d" alt=""><figcaption><p>Create a classification model.</p></figcaption></figure>

5. Next, we will [Evaluate](https://docs.carto.com/carto-user-manual/workflows/components/bigquery-ml#evaluate) the performance of our model using the test data.

<figure><img src="https://3015558743-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFEElAdsRIl9DzfMhbRlB%2Fuploads%2FiDi74fdQQad4g35u9HG9%2FTelco%20BQ%20ML%20-%20Evaluate.png?alt=media&#x26;token=2b7039cf-843a-475f-bc86-3ca2e8141338" alt=""><figcaption><p>Model evaluation.</p></figcaption></figure>

Based on the [classification metrics](https://developers.google.com/machine-learning/crash-course/classification/accuracy-precision-recall), the results seem very promising. The high accuracy indicates that the model correctly predicts the majority of instances, and the low log loss suggests that our model's probability estimates are close to the actual values. With precision and recall both performing well, we can be confident that the model is making correct positive predictions, and the F1 score further reassures us that the balance between precision and recall is optimal. Additionally, the ROC AUC score shows that our model has a strong ability to distinguish between clients churning and not churning. Overall, these metrics highlight that our model is well-tuned and capable of handling the classification task effectively.

6. Having a model that performs good, we can then run predictions and obtain estimates to check which customers are prone to churn. To do so, connect the [Create Classification Model](https://docs.carto.com/carto-user-manual/workflows/components/bigquery-ml#create-classification-model) component and the data with no `churn_label` to the [Predict](https://docs.carto.com/carto-user-manual/workflows/components/bigquery-ml#predict) component.

<figure><img src="https://3015558743-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFEElAdsRIl9DzfMhbRlB%2Fuploads%2FddfSzKAj6Lj8srooN3uI%2FTelco%20BQ%20ML%20-%20Predict.png?alt=media&#x26;token=bc237e08-8fc8-4e7e-8e00-33e09e3dc6dc" alt=""><figcaption><p>Run predictions.</p></figcaption></figure>

As we can see, two new columns appear on our data:

* `predicted_churn_label_probs`: indicates the probability that a customer will churn.
* `predicted_churn_label`: indicates whether the customer will or won't potentially churn based on the probability of churning using a threshold of 0,5.

Lastly, to better understand our model, we can take a look at the model’s explainability. This gives an estimate of each feature’s importance when it comes to churn.&#x20;

7. Connect the [Create Classification Model](https://docs.carto.com/carto-user-manual/workflows/components/bigquery-ml#create-classification-model) component to the [Global Explain](https://docs.carto.com/carto-user-manual/workflows/components/bigquery-ml#global-explain) component. The latter provides the feature importance of the model predictors to each class (churn vs no churn). If the Class level explain option is not clicked, the overall feature importances are given, rather than per class.&#x20;

<figure><img src="https://3015558743-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFEElAdsRIl9DzfMhbRlB%2Fuploads%2FzHuwBPw0qi1dAMphoaGI%2FTelco%20BQ%20ML%20-%20Global%20Explain.png?alt=media&#x26;token=8e85e50d-93b8-4d58-832b-e2623db59880" alt=""><figcaption><p>Get feature importances of the model predictors.</p></figcaption></figure>

8. For further details, we can also use the [Explain Predict](https://docs.carto.com/carto-user-manual/workflows/components/bigquery-ml#explain-predict) component, that provides feature attributions that indicate how much each feature in your model contributed to the final prediction for each given customer. You can select how many features you want to use to retrieve their attributions.

<figure><img src="https://3015558743-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFEElAdsRIl9DzfMhbRlB%2Fuploads%2F9Hbr1gr0KAH46BqVJuj1%2FTelco%20BQ%20ML%20-%20Explain%20Predict.png?alt=media&#x26;token=ccdc6443-a689-4df2-82c9-bbbcf24a2240" alt=""><figcaption><p>Get festure importances for each prediction.</p></figcaption></figure>

From the results for the overall feature importances, we can see that the most important features when it comes to estimating churn are the customer’s overall satisfaction rating of the company (`satisfaction_score`), the customer’s current contract type (`contract`), the number of referrals the customer has made (`number_of_referrals`), and whether or not the customer has subscribed to an additional online security service (`online_security`).

We can visualize the results in the following [map](https://clausa.app.carto.com/map/3f862593-5a94-402b-a610-5c807ba2c065), where we can see which customers are prone to churn, and with which probability this will happen.

{% embed url="<https://clausa.app.carto.com/map/3f862593-5a94-402b-a610-5c807ba2c065>" %}

<br>
