My first thought was to wonder how a LSTM would do. Once might think it would be a better representation for music? There's some models which use convolutional layers along with a LSTM for video representation (eg [1]) and it would be interesting to see if convolutions are useful for capturing similar themes of music.
I wonder if one could build a music embedding (word2vec style) and use similarities in the embedding space as recommendations? The obvious objective function would be skip-gram, but there might be more interesting objectives there too.
I could be totally off on this, but his encoding is an image and LSTM is for time series, which would require a different representation.
I completely agree LSTM would be useful as it would by default require a different representation. I think most commenters agree this representation is overly simplistic. Amazed it works as well as it does!
My first thought was to wonder how a LSTM would do. Once might think it would be a better representation for music? There's some models which use convolutional layers along with a LSTM for video representation (eg [1]) and it would be interesting to see if convolutions are useful for capturing similar themes of music.
I wonder if one could build a music embedding (word2vec style) and use similarities in the embedding space as recommendations? The obvious objective function would be skip-gram, but there might be more interesting objectives there too.
[1] https://github.com/loliverhennigh/Convolutional-LSTM-in-Tens...