Thursday, 15 January 2015

machine learning - Apache Spark ALS collaborative filtering results. They don't make sense

I wanted to try out Spark collaborative filtering using MLlib, as explained in this tutorial: https://databricks-training.s3.amazonaws.com/movie-recommendation-with-mllib.html

The algorithm is based on the paper "Collaborative Filtering for Implicit Feedback Datasets" and does matrix factorization.
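For reference, here is a sketch of the two objectives involved, written in the notation of the Hu, Koren and Volinsky paper ($x_u$ are user factors, $y_i$ item factors, $\lambda$ the regularization parameter). As far as I understand it, the explicit variant, which is what the tutorial's `ALS.train` fits, only sums over the set $\mathcal{K}$ of observed (user, movie) ratings, while the paper's implicit variant sums over every user-movie pair and replaces the rating by a binary preference $p_{ui}$ weighted by a confidence $c_{ui}$ (with a parameter $\alpha$). According to the MLlib documentation, MLlib additionally scales $\lambda$ by the number of ratings of each user or movie, which is not shown here.

```latex
% Explicit feedback: fit the observed star ratings r_{ui} directly
\min_{X,Y} \sum_{(u,i)\in\mathcal{K}} \left(r_{ui} - x_u^{\top} y_i\right)^2
  + \lambda\left(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\right)

% Implicit feedback (Hu, Koren, Volinsky): all pairs, preference p_{ui}, confidence c_{ui}
\min_{X,Y} \sum_{u,i} c_{ui}\left(p_{ui} - x_u^{\top} y_i\right)^2
  + \lambda\left(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\right),
\qquad
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad
c_{ui} = 1 + \alpha\, r_{ui}
```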

Everything is up and running using the 10 million MovieLens data set. The data set is split into 80% training, 10% test and 10% validation.
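The post does not say how the 80/10/10 split was made. In Spark, an RDD's `randomSplit` method does this kind of weighted random split; below is a minimal plain-Python sketch of the same idea, assuming ratings are `(user, movie, rating)` tuples. The function name `split_ratings` and the default seed are illustrative, not from the tutorial.

```python
import random


def split_ratings(ratings, fractions=(0.8, 0.1, 0.1), seed=42):
    """Randomly assign each rating to training, validation or test (hypothetical helper)."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    cut_train = fractions[0]
    cut_validation = fractions[0] + fractions[1]
    train, validation, test = [], [], []
    for rating in ratings:
        draw = rng.random()  # uniform in [0, 1)
        if draw < cut_train:
            train.append(rating)
        elif draw < cut_validation:
            validation.append(rating)
        else:
            test.append(rating)
    return train, validation, test
```

Because each rating is assigned independently, the three parts are only approximately 80%, 10% and 10% of the data, which is also how a weighted random split behaves in Spark.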

RMSE baseline: 1.060505464225402
RMSE (train) = 0.7697248827452756
RMSE (validation) = 0.8057135933012889

The model was trained with rank = 24, lambda = 0.1 and iterations = 10. The best model improves the baseline by 23.94%.

These values are similar to the tutorial's, although with different training parameters.
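The numbers above come from the tutorial's evaluation: the RMSE of predicted versus held-out ratings, compared with a naive baseline that always predicts the mean rating. A minimal sketch of those three quantities follows; the function names are illustrative. If I read the tutorial correctly, it takes the mean over training plus validation and computes the improvement from the test-set RMSE, which the post does not list, so plugging in the validation RMSE instead gives about 24.03% rather than 23.94%.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of the same length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def baseline_rmse(train_ratings, heldout_ratings):
    """RMSE of a baseline that predicts the mean training rating for every held-out pair."""
    mean = sum(train_ratings) / len(train_ratings)
    return rmse([mean] * len(heldout_ratings), heldout_ratings)


def improvement_percent(baseline, model):
    """Relative RMSE reduction of the model over the baseline, in percent."""
    return (baseline - model) / baseline * 100


# Using the baseline and validation RMSE reported in the post
print(round(improvement_percent(1.060505464225402, 0.8057135933012889), 2))
```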

I tried running the algorithm several times and got recommendations that don't make sense to me. Rating some kids' movies, I get the following results:

For these ratings:

Personal rating: Toy Story (1995) rating: 4.0
Personal rating: Jungle Book, The (1994) rating: 5.0
Personal rating: Lion King, The (1994) rating: 5.0
Personal rating: Mary Poppins (1964) rating: 4.0
Personal rating: Alice in Wonderland (1951) rating: 5.0

Results:

Movies recommended for you:

Life of Oharu, The (Saikaku ichidai onna) (1952)
More (1998)
Who's Singin' Over There? (a.k.a. Who Sings Over There) (Ko to tamo peva) (1980)
Sundays and Cybele (Dimanches de Ville d'Avray, Les) (1962)
Blue Light, The (Das Blaue Licht) (1932)
Times of Harvey Milk, The (1984)
Please Vote for Me (2007)
Man Who Planted Trees, The (Homme qui plantait des arbres, L') (1987)
Shawshank Redemption, The (1994)
Only Yesterday (Omohide poro poro) (1991)

Apart from Only Yesterday, none of these seem to make sense.
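Roughly, the tutorial's recommendation step predicts a rating for every movie the user has not rated yet and keeps the ones with the highest predicted values. A minimal sketch of that ranking, assuming the factors are plain lists keyed by movie id (the names `recommend`, `user_vector` and `item_vectors` are hypothetical):

```python
def recommend(user_vector, item_vectors, already_rated, k=10):
    """Return the ids of the k unrated items with the highest predicted score x_u . y_i."""
    candidates = []
    for item_id, item_vector in item_vectors.items():
        if item_id in already_rated:
            continue  # skip movies the user has already rated
        score = sum(a * b for a, b in zip(user_vector, item_vector))  # predicted rating
        candidates.append((score, item_id))
    candidates.sort(key=lambda pair: pair[0], reverse=True)  # highest predicted rating first
    return [item_id for _, item_id in candidates[:k]]


print(recommend([1.0, 0.0], {"a": [2.0, 5.0], "b": [1.0, 0.0], "c": [3.0, 0.0]}, {"c"}, k=2))
```

Note that this ranking uses only the predicted score; nothing in it accounts for how many ratings a movie has received.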

If anyone out there knows how to interpret these results or how to get better ones, I would appreciate you sharing your knowledge.

Best regards

EDIT:

As suggested, I trained a model with more factors:

Baseline error: 1.0587417035872992
RMSE (train) = 0.7679883378412548
RMSE (validation) = 0.8070339258049574

The model was trained with rank = 100, lambda = 0.1 and numIter = 10.
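To make the roles of rank, lambda and the number of iterations concrete, here is a small pure-Python sketch of explicit-feedback ALS: it alternately fixes the movie factors and solves a ridge regression for each user, then does the reverse. This is an illustration of the technique under simplifying assumptions, not Spark's implementation; in particular it uses an unscaled lambda, whereas the MLlib documentation says MLlib scales lambda by the number of ratings per user or movie. All function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def half_step(fixed, rows, rank, lam):
    """With one side's factors fixed, solve (Y^T Y + lam I) x = Y^T r for each row."""
    new = {}
    for key, entries in rows.items():
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]  # lam * I
        b = [0.0] * rank
        for other, r in entries:
            y = fixed[other]
            for i in range(rank):
                b[i] += r * y[i]  # accumulate Y^T r
                for j in range(rank):
                    a[i][j] += y[i] * y[j]  # accumulate Y^T Y
        new[key] = solve(a, b)
    return new


def als_explicit(ratings, rank=2, lam=0.1, iterations=10, seed=0):
    """Factorize explicit (user, movie, rating) triples into user and movie factor vectors."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = half_step(items, by_user, rank, lam)  # movies fixed: solve user factors
        items = half_step(users, by_item, rank, lam)  # users fixed: solve movie factors
    return users, items


def predict(users, items, u, i):
    """Predicted rating: dot product of the user and movie factor vectors."""
    return sum(a * b for a, b in zip(users[u], items[i]))


# Toy data: a fully observed 3 x 3 rating matrix
ratings = [(u, i, float((u + 1) * (i + 1))) for u in range(3) for i in range(3)]
users, items = als_explicit(ratings, rank=2, lam=0.01, iterations=50)
print(predict(users, items, 2, 2))  # compare with the observed rating 9.0
```

A larger `rank` gives each user and movie more free parameters, while a larger `lam` pulls the factors toward zero; which combination generalizes best has to be checked on held-out data.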

And with different personal ratings:

Personal rating: Star Wars: Episode VI - Return of the Jedi (1983) rating: 5.0
Personal rating: Mission: Impossible (1996) rating: 4.0
Personal rating: Die Hard: With a Vengeance (1995) rating: 4.0
Personal rating: Batman Forever (1995) rating: 5.0
Personal rating: Men in Black (1997) rating: 4.0
Personal rating: Terminator 2: Judgment Day (1991) rating: 4.0
Personal rating: Top Gun (1986) rating: 4.0
Personal rating: Star Wars: Episode V - The Empire Strikes Back (1980) rating: 3.0
Personal rating: Alien (1979) rating: 4.0

The recommended movies are:

Movies recommended for you:

Carmen (1983)
Silent Light (Stellet licht) (2007)
Jesus (1979)
Life of Oharu, The (Saikaku ichidai onna) (1952)
Heart of America (2003)
For the Bible Tells Me So (2007)
More (1998)
Legend of Leigh Bowery, The (2002)
Funeral, The (Ososhiki) (1984)
Longshots, The (2008)

Not one useful result.

EDIT 2: Using the implicit feedback method gives much better results! With the same action movies as above, the recommendations are:

Movies recommended for you:

Star Wars: Episode IV - A New Hope (a.k.a. Star Wars) (1977)
Terminator, The (1984)
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)
Die Hard (1988)
Godfather, The (1972)
Aliens (1986)
Rock, The (1996)
Independence Day (a.k.a. ID4) (1996)
Star Trek II: The Wrath of Khan (1982)
GoldenEye (1995)

That's more like what I expected! The question is why the explicit version is so-so-so bad.
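For comparison, the implicit-feedback method from the Hu, Koren and Volinsky paper does not try to reproduce the star values. Each observed value $r$ becomes a preference $p = 1$ if $r > 0$ (else $0$) and a confidence $c = 1 + \alpha r$; pairs with no rating at all count as $p = 0$ with confidence $1$. A minimal sketch of that transform, with a hypothetical function name and an arbitrary default `alpha`:

```python
def to_implicit(ratings, alpha=1.0):
    """Map (user, item, r) triples to (user, item, p, c) as defined by Hu, Koren and Volinsky."""
    result = []
    for user, item, r in ratings:
        p = 1.0 if r > 0 else 0.0  # preference: any positive observation counts as "interested"
        c = 1.0 + alpha * r  # confidence: grows linearly with the observed value
        result.append((user, item, p, c))
    return result


print(to_implicit([(1, 10, 5.0), (1, 11, 1.0)], alpha=2.0))
```

Under this model a low star rating is still a positive preference, just with lower confidence, so the fit is driven by which movies a user rated rather than by how highly.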

Note that the code you are running does not use implicit feedback, and is not quite the algorithm you refer to. Make sure you are not using ALS.trainImplicit. You may need a different lambda and rank. An RMSE of 0.88 is "OK" for this data set; it is not clear to me whether the example's values are optimal or just the ones that one toy test produced. You use yet another value here. Maybe it is just not optimal yet.
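Trying a different lambda and rank usually means a small grid search scored on the validation set; the tutorial does a similar loop over rank, lambda and the number of iterations. A minimal sketch, assuming a hypothetical `evaluate(rank, lam)` callable that trains a model with those values and returns its validation RMSE:

```python
import itertools


def grid_search(evaluate, ranks, lambdas):
    """Return (best_score, rank, lam) minimizing evaluate(rank, lam) over the grid."""
    best = None
    for rank, lam in itertools.product(ranks, lambdas):
        score = evaluate(rank, lam)  # e.g. validation RMSE of a model trained with these values
        if best is None or score < best[0]:
            best = (score, rank, lam)
    return best


# Stand-in scoring function, only to show the call shape
print(grid_search(lambda rank, lam: (rank - 24) ** 2 + (lam - 0.1) ** 2, [8, 24, 100], [0.01, 0.1, 1.0]))
```

The chosen values should then be checked once on the test set, which is never used for selection.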

It could even be something like bugs in the ALS implementation that have been fixed since. Try comparing with another implementation of ALS if you can.

I always try to resist rationalizing recommendations, since our brains inevitably find some explanation even for random recommendations. But, hey, I can say that you did not get action, horror, crime dramas or thrillers here. I find that kids' movies go hand in hand with a taste for arty movies, since the kind of person who filled out their tastes on MovieLens way back when and rated kids' movies was not a kid but a parent, and maybe software-engineer types old enough to have kids do tend to watch the sorts of foreign films you see here.

machine-learning apache-spark collaborative-filtering matrix-factorization
