

This is a UCI data set from 2014 about clients' spending at a wholesale distributor. Each row records how many monetary units one client spent on each product category over a year, and 440 clients were recorded. There are 8 attributes: channel (i.e. the client type: "1" for hotel/restaurant and "2" for retail such as a supermarket), region (Lisbon, Oporto, or other, numbered 1-3), fresh products, milk products, grocery products, frozen products, detergents/paper products, and delicatessen (deli products like cold cuts). The numbers under the product types are in monetary units (m.u.), a generic stand-in for a regular currency. This should not be confused with marginal utility, which Wikipedia defines as "the change in the utility from an increase in the consumption of that good or service". I will be working with this data by observing relationships between certain features, performing clustering, and doing PCA.
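To make the row layout concrete, here is a minimal sketch of how the eight attributes could be read and decoded using only the standard library. The column names, the order of the region codes (taken from the order they are listed in above), and the three sample rows are assumptions made up for illustration; they are not values from the real file, so check the actual header before reusing this.

```python
import csv
import io

# Hypothetical sample in the column order described above.
# Column names and values are illustrative assumptions, not real data.
SAMPLE_CSV = """Channel,Region,Fresh,Milk,Grocery,Frozen,Detergents_Paper,Delicatessen
1,3,12000,1500,2000,4000,300,900
2,1,5000,9000,14000,800,6000,1200
1,2,20000,1000,1800,6000,200,700
"""

CHANNELS = {1: "hotel/restaurant", 2: "retail"}
REGIONS = {1: "Lisbon", 2: "Oporto", 3: "other"}  # assumed code order
PRODUCTS = ["Fresh", "Milk", "Grocery", "Frozen",
            "Detergents_Paper", "Delicatessen"]


def load_clients(text):
    """Parse CSV text into dicts with decoded channel and region labels."""
    clients = []
    for row in csv.DictReader(io.StringIO(text)):
        client = {
            "channel": CHANNELS[int(row["Channel"])],
            "region": REGIONS[int(row["Region"])],
        }
        # Spending per product category, in monetary units.
        for name in PRODUCTS:
            client[name] = int(row[name])
        clients.append(client)
    return clients


clients = load_clients(SAMPLE_CSV)
for client in clients:
    print(client["channel"], client["region"], client["Fresh"])
```

For the real file, the same function could be fed the file's text instead of `SAMPLE_CSV`, provided the header matches.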

Problem Statement

This data set comes from the UCI site, where it was donated in 2014. Based on the region names, the data appears to have been collected in Portugal. The main questions I seek to answer are, in general terms, "Is there a relationship between the kind of client and the type of goods it buys the most of?", "Is there a relationship between certain product types, whether a positive or a negative correlation?", and "Can we cluster the clients into groups based on similar spending in certain categories?"
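For the second question, one common way to measure a positive or negative relationship between two product types is the Pearson correlation coefficient. The sketch below is only an illustration of that measure, not necessarily the method used for the results in this writeup, and the two spending lists are made-up numbers rather than values from the data set.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical spending (in m.u.) of four clients on two categories.
grocery = [2000, 14000, 1800, 9000]
detergents_paper = [300, 6000, 200, 3500]

r = pearson(grocery, detergents_paper)
print(f"correlation: {r:.3f}")
```

A value near +1 means the two categories rise together, a value near -1 means one falls as the other rises, and a value near 0 means no linear relationship.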


Visualization and Analysis Methods

Linear regression
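As a rough idea of what linear regression does with two product columns, here is a minimal ordinary least squares sketch for a single predictor. The function name `fit_line` and the sample numbers are hypothetical, and a real analysis would more likely use a statistics library than hand-written code.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical spending (in m.u.): predict one category from another.
milk = [1000, 2000, 3000, 4000]
grocery = [2500, 4500, 6500, 8500]

slope, intercept = fit_line(milk, grocery)
print(f"grocery = {slope:.2f} * milk + {intercept:.2f}")
```

The sign of the slope indicates whether the fitted relationship is positive or negative.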


Maybe Naive Bayes and kNN if I'm feeling masochistic


Visualizations and Analysis Results

Blue is channel 1 (the hotel/restaurant client type), and yellow is channel 2 (retail). Retail clients as a whole seem to buy less, while some hotel/restaurant clients buy a lot.
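A per-channel comparison like this can also be checked numerically by grouping clients by channel and looking at the mean and the maximum. This is only a sketch: the rows are made-up numbers, and using the fresh products column is an assumption, since the writeup does not say which quantity the picture shows.

```python
from collections import defaultdict


def summarize_by_channel(rows, column):
    """Group rows by channel and return count, mean, and max of one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["channel"]].append(row[column])
    return {
        channel: {
            "count": len(values),
            "mean": sum(values) / len(values),
            "max": max(values),
        }
        for channel, values in groups.items()
    }


# Hypothetical rows: channel 1 = hotel/restaurant, channel 2 = retail.
rows = [
    {"channel": 1, "Fresh": 12000},
    {"channel": 1, "Fresh": 56000},
    {"channel": 1, "Fresh": 9000},
    {"channel": 2, "Fresh": 5000},
    {"channel": 2, "Fresh": 7000},
]

summary = summarize_by_channel(rows, "Fresh")
for channel, stats in sorted(summary.items()):
    print(channel, stats)
```

A large gap between the mean and the maximum within one channel points to a few big buyers rather than uniformly high spending.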

In the next picture, we can see that clients spend more on fresh foods than on frozen foods.

In the next picture, I've plotted the frozen foods vs. the deli products vs. the toiletries-type (detergents/paper) products. Retail, shown in yellow, seems to buy a lot more toiletries than restaurants or hotels, which makes sense because customers do not go to hotels or restaurants to get those kinds of things. Meanwhile, the restaurants/hotels seem to buy more frozen foods. Perhaps this is because retail shoppers are leaning toward fresher foods these days, while customers at restaurants/hotels cannot see what state the food was in before it was prepared? The deli products seem to be bought about equally, except for some outliers among the restaurants/hotels that may be actual delis or sub shops and therefore need a lot.


Highlights and Conclusion


Thank you to my high school buddy John for explaining some economics things
