I have two types of products, videos and sounds, and I want to understand which sounds are frequently purchased together with which videos or sets of videos. It is essentially a market basket analysis, but one in which the lhs of each rule contains only video products and the rhs only audio products.

I'm using R, and the arules package doesn't seem to have a good option for this. Does anyone know how to do it with R or one of its libraries? Otherwise I'll have to mine simpler rules, which would be a shame.


Jorge Guzman
Sounds like you are trying to create a frequency or contingency matrix. Here is one approach:

table(myproductsdataframe$videoproduct, myproductsdataframe$audioproduct)

'myproductsdataframe' should be a data frame in which each row holds a video product and an audio product that were purchased together. 'videoproduct' and 'audioproduct' are the column names of that data frame.
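A minimal runnable sketch of the table() approach, using a small made-up data frame with the column names from the answer (the product names are hypothetical). Dividing each row by its total estimates how often each sound accompanies a given video, which roughly corresponds to the confidence of a single video -> sound rule when every row is one co-purchase.

```r
# Hypothetical data: one row per (video, sound) pair bought together.
myproductsdataframe <- data.frame(
  videoproduct = c("ny", "ny", "marvel", "marvel", "ny"),
  audioproduct = c("jazz", "jazz", "spiderman", "jazz", "spiderman")
)

# Rows are videos, columns are sounds, cells are co-purchase counts.
counts <- table(myproductsdataframe$videoproduct, myproductsdataframe$audioproduct)
print(counts)

# Row proportions: share of each video's purchases that came with each sound.
props <- prop.table(counts, margin = 1)
print(props)
```

This only captures one-to-one pairs; it cannot express a rule whose lhs is a set of videos.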


larrydag 1
Well, this would work for one-to-one mapping but not for sets, which is where I think the real value would be. For example, if someone buys two videos, NY and Marvel Comics, it could suggest the Spider-Man theme, which fits the pair well even though it would not be a strong match for either video on its own.

Thanks for the reply!

(12 Jul '10, 19:16) Jorchi

The standard way to do this AFAIK is association rules. There is an Apriori implementation for R in the arules package: http://cran.r-project.org/web/packages/arules/index.html
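For the video -> audio restriction specifically, arules' apriori() takes an appearance argument that limits which items may appear on each side of a rule. A sketch under assumptions: the baskets are hypothetical, video and audio items are told apart by a name prefix (also an assumption), and the thresholds are only illustrative.

```r
library(arules)

# Hypothetical baskets: each element is the set of products bought together.
baskets <- list(
  c("video_ny", "video_marvel", "audio_spiderman"),
  c("video_ny", "video_marvel", "audio_spiderman"),
  c("video_ny", "audio_jazz"),
  c("video_marvel", "audio_rock"),
  c("video_ny", "video_marvel", "audio_spiderman", "audio_jazz")
)
trans <- as(baskets, "transactions")

# Assumption: item labels carry a "video_" or "audio_" prefix.
videos <- grep("^video_", itemLabels(trans), value = TRUE)
sounds <- grep("^audio_", itemLabels(trans), value = TRUE)

# Only videos may appear on the lhs, only sounds on the rhs;
# default = "none" keeps any other item out of the rules entirely.
rules <- apriori(
  trans,
  parameter  = list(supp = 0.2, conf = 0.5, minlen = 2),
  appearance = list(lhs = videos, rhs = sounds, default = "none"),
  control    = list(verbose = FALSE)
)
inspect(sort(rules, by = "lift"))
```

Because the lhs is mined as an itemset, rules such as {video_marvel, video_ny} => {audio_spiderman} come out as well, which covers the set-of-videos case.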

Another option is a collaborative filtering (CF) approach, such as the one in this paper: http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf, which covers basic user-based and item-based CF.
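A base-R sketch of item-based CF restricted to the video -> audio direction. This is not recommenderlab's API; the purchase matrix and product names are hypothetical. It computes cosine similarity between each video column and each audio column of a binary customer x product matrix, then scores sounds for a set of videos by summing their similarities (one common scoring choice among several).

```r
# Hypothetical binary purchase matrix: rows are customers, columns are products.
m <- matrix(c(1, 1, 1, 0,
              1, 1, 1, 0,
              1, 0, 0, 1,
              0, 1, 0, 0,
              1, 1, 1, 1),
            nrow = 5, byrow = TRUE,
            dimnames = list(NULL, c("video_ny", "video_marvel",
                                    "audio_spiderman", "audio_jazz")))

videos <- grep("^video_", colnames(m), value = TRUE)
sounds <- grep("^audio_", colnames(m), value = TRUE)

# Cosine similarity: dot products divided by the product of column norms.
norms <- sqrt(colSums(m^2))
sim <- crossprod(m[, videos], m[, sounds]) / outer(norms[videos], norms[sounds])

# Score every sound for a basket of videos by summing its similarities.
score_sounds <- function(basket) {
  sort(colSums(sim[basket, , drop = FALSE]), decreasing = TRUE)
}
print(score_sounds(c("video_ny", "video_marvel")))
```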


iamreddave 1
Yeah, I tried apriori and I couldn't find how to force the video -> audio relationship. Maybe there is a way but I just don't know it?

(03 Aug '10, 13:47) Jorchi

Apologies. When you answer questions with a massive hangover this can happen. We are looking for a way to say that the right-hand side of an arules rule must be an audio item. So a set of rules something like rulesA <- subset(rules, subset = rhs %in% audio), where audio is either a character vector of the audio products, audio <- c('mariah', 'madonna'), or some other variable you have set to 1 (TRUE) whenever the product is audio. I will have a play with the R syntax and get back to you.

(03 Aug '10, 13:59) iamreddave
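A runnable version of the post-filtering idea, assuming arules, hypothetical baskets, and an "audio_" name prefix for sound items (all assumptions). audio is left unquoted so the character vector is matched rather than the literal string "audio". The lhs condition relies on every item being either a video or a sound.

```r
library(arules)

# Hypothetical baskets: each element is the set of products bought together.
baskets <- list(
  c("video_ny", "video_marvel", "audio_spiderman"),
  c("video_ny", "video_marvel", "audio_spiderman"),
  c("video_ny", "audio_jazz"),
  c("video_marvel", "audio_rock"),
  c("video_ny", "video_marvel", "audio_spiderman", "audio_jazz")
)
trans <- as(baskets, "transactions")
audio <- grep("^audio_", itemLabels(trans), value = TRUE)

# Mine all rules first (minlen = 2 avoids rules with an empty lhs).
rules <- apriori(trans, parameter = list(supp = 0.2, conf = 0.5, minlen = 2),
                 control = list(verbose = FALSE))

# Keep rules whose rhs is a sound and whose lhs contains no sound,
# i.e. only videos under the assumption above.
rulesA <- subset(rules, subset = rhs %in% audio & !(lhs %in% audio))
inspect(rulesA)
```

This mines every rule before filtering, which is simpler but can be slower on a large catalogue than restricting the search inside apriori().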
Asked: 09 Jul '10, 17:49

Last updated: 02 Aug '10, 09:23

