Charity Target Selection

A Machine Learning Approach

I worked on this case with another student during my studies, and decided to post it here because it is one of the assignments I enjoyed most and learned the most from.

Case Outline

This case study focused on developing a predictive model for a health sector charity, aiming to identify the most promising donors using minimal information. Direct mailing is an important source of income for many Dutch charities. However, mailing everyone is inefficient, so the charity needs to choose which households to mail. This choice is called target selection.

The key lies in the strategic use of existing databases, which track past donations but often lack detailed household characteristics. By leveraging zip code data and RFM (Recency, Frequency, Monetary Value) variables, charities can predict which households are more likely to respond positively to new mailings. We were given the task of developing a predictive model and then using its predictions to select a target group that should at least outperform the charity’s initial selection.

Data

In this study, we used a database of a Dutch charity covering the period from 1995 up to 2000. The data contains 5000 households and spans 19 mailings, one in each quarter. Our models were trained on mailings 7 to 18, which cover 1997 up to 2000, because the first 6 mailings were used to construct the RFM variables. The last mailing, mailing 19, is the test set and was used only to analyze the model’s performance. Besides the RFM variables, the dataset included household-specific variables derived from zip code data.
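To make the role of the first six mailings concrete, here is a minimal sketch of how RFM variables could be derived from them. The record layout (household id, mailing number, amount) and the exact definitions of recency, frequency and monetary value are assumptions for illustration; the real dataset and feature definitions may have differed.

```python
from collections import defaultdict

# Hypothetical donation history: (household_id, mailing_number, amount_donated).
# An amount of 0.0 means the household was mailed but did not donate.
history = [
    (1, 1, 10.0), (1, 3, 0.0), (1, 5, 25.0),
    (2, 2, 0.0), (2, 4, 0.0),
    (3, 6, 5.0),
]

RFM_WINDOW = range(1, 7)  # mailings 1-6 are used only to build RFM variables


def rfm_features(records, window=RFM_WINDOW):
    """Compute simple RFM variables per household over the given mailings."""
    last_mailing = max(window)
    donations = defaultdict(list)
    households = set()
    for household, mailing, amount in records:
        if mailing not in window:
            continue
        households.add(household)
        if amount > 0:
            donations[household].append((mailing, amount))

    features = {}
    for household in sorted(households):
        gifts = donations[household]
        if gifts:
            # Recency: number of mailings since the most recent donation.
            recency = last_mailing - max(m for m, _ in gifts)
        else:
            recency = None  # never donated within the window
        features[household] = {
            "recency": recency,
            "frequency": len(gifts),
            "monetary": sum(a for _, a in gifts),
        }
    return features


features = rfm_features(history)
print(features[1])
```

In a setup like this, mailings 7 to 18 would supply the training targets and mailing 19 the test targets, so the RFM window never overlaps with the outcomes being predicted.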

Because this was a theoretical assignment in an academic environment, the data had already been cleaned. We constructed the lifetime value variable from the other variables. For another part of the prediction we slightly altered the dataset, which is explained in the Approach section.

Approach

We decided to use a random forest to predict the donation amount. The motivation was that it handles non-linear relationships well and is robust to outliers. Outliers in particular were a minor issue in the data.

However, we also used a ‘nested’ random forest to predict the binary variable of whether a household donated in mailing 19. We argued that two processes were at play: responding to the mailing, and then donating or not. The predicted binary variable was then included as a feature in the random forest that predicts the donation amount.

For the binary classification random forest, we used the RFM variables together with the lifetime value of the households. The dataset was filtered so that only households that received a mailing were included.
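The two-stage idea described above can be sketched roughly as follows, using scikit-learn. This is an illustration on synthetic data, not our original code: the feature columns, the simulated outcomes, the hyperparameters, and the choice to pass the predicted class (rather than a probability) to the second stage are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data; columns play the role of
# recency, frequency, monetary value and lifetime value.
n_train, n_test = 400, 100
X_train = rng.random((n_train, 4))
X_test = rng.random((n_test, 4))

# Simulated outcomes: frequent donors respond more often,
# and the amount depends on the monetary column.
donated_train = (X_train[:, 1] + 0.2 * rng.standard_normal(n_train) > 0.5).astype(int)
amount_train = donated_train * (5 + 20 * X_train[:, 2])

# Stage 1: classify whether a household donates at all.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, donated_train)

# Stage 2: regress the donation amount, with the stage 1
# prediction appended as an extra feature.
stage1_train = clf.predict(X_train).reshape(-1, 1)
reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(np.hstack([X_train, stage1_train]), amount_train)

# Predicting for unseen households chains the two stages.
stage1_test = clf.predict(X_test).reshape(-1, 1)
predicted_amount = reg.predict(np.hstack([X_test, stage1_test]))
print(predicted_amount[:5])
```

One design caveat with this sketch: the stage 1 predictions on the training set are in-sample and therefore optimistic. Using out-of-fold predictions (for example via `sklearn.model_selection.cross_val_predict`) for the stage 2 training feature would be a more careful variant.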

Lastly, to obtain the target group, we selected the 80th percentile of households based on the predicted donation. We then evaluated the model’s performance by comparing this target group against the actual donations of those households.
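As a small illustration of this selection step, the sketch below ranks households by predicted donation and keeps the top share. It assumes that the 80th percentile cut-off means keeping the 20% of households with the highest predictions; the household ids and predicted values are made up.

```python
import math

# Hypothetical predicted donations per household id (illustrative values only).
predicted = {101: 2.5, 102: 0.0, 103: 12.0, 104: 7.5, 105: 1.0,
             106: 0.5, 107: 9.0, 108: 3.0, 109: 0.0, 110: 4.0}


def select_top_share(predictions, share=0.20):
    """Return the household ids with the highest predicted donation.

    `share` is the fraction of households to keep; 0.20 corresponds to
    reading the 80th percentile as a cut-off that keeps the top 20%.
    """
    n_selected = math.ceil(len(predictions) * share)
    ranked = sorted(predictions, key=predictions.get, reverse=True)
    return ranked[:n_selected]


target_group = select_top_share(predicted)
print(target_group)  # [103, 107]
```

Ranking and slicing keeps the group size fixed, which makes it easy to compare against another selection of the same size.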

Results

The initial metrics we used were accuracy and mean squared error. However, what matters most is how much the model improves on the charity’s own selection. The target selection of our models outperformed the original target selection by 37.8 percent, which I would argue is a worthwhile improvement.
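For readers who want to see how such a comparison can be computed, here is a minimal sketch. It assumes the comparison is made on the average actual donation per mailed household, which is only one possible reading of the metric; the donation values and selections are invented and do not reproduce our result.

```python
# Hypothetical actual donations in the test mailing per household id.
actual = {1: 0.0, 2: 10.0, 3: 0.0, 4: 25.0, 5: 5.0, 6: 0.0}

# Hypothetical selections: the charity's original one and a model-based one.
original_selection = [1, 2, 3]
model_selection = [2, 4, 5]


def mean_donation(selection, donations):
    """Average actual donation per mailed household in a selection."""
    return sum(donations[h] for h in selection) / len(selection)


def relative_improvement(new, baseline):
    """Percentage by which `new` exceeds `baseline`."""
    return (new - baseline) / baseline * 100


baseline = mean_donation(original_selection, actual)
model = mean_donation(model_selection, actual)
print(f"{relative_improvement(model, baseline):.1f}% improvement")
```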

Overall, this was a very fun case. It was the first time we did a complete case like this in our studies, where we had to come up with something from scratch and were allowed to use practically anything as long as it was a sound model. It was a good learning experience and I was happy to get an 8.8/10 for the work.

For more details, feel free to reach out to me directly on LinkedIn.
