IMDb dataset:
A filtered data set listing the most frequent keywords associated with movies in IMDb. We kept only keywords that appear on at least 100 different movies in the original data, downloaded from here
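The filtering rule above can be illustrated with a short sketch. It is not the authors' actual preprocessing code: the function name `filter_frequent_keywords`, the dictionary layout (movie id mapped to a list of keyword ids), and the choice to count a keyword once per movie are all assumptions made for illustration.

```python
from collections import Counter


def filter_frequent_keywords(movie_keywords, min_movies=100):
    """Keep only keywords attached to at least `min_movies` distinct movies.

    `movie_keywords` maps a movie id to a list of keyword ids
    (hypothetical in-memory layout, not the format of the files).
    """
    # Count each keyword at most once per movie, even if it is listed twice.
    counts = Counter()
    for keywords in movie_keywords.values():
        counts.update(set(keywords))

    frequent = {kw for kw, n in counts.items() if n >= min_movies}

    # Drop infrequent keywords from every movie, preserving keyword order.
    # Movies left without keywords are kept with an empty list (assumption).
    return {
        movie: [kw for kw in keywords if kw in frequent]
        for movie, keywords in movie_keywords.items()
    }
```

With the default `min_movies=100` this mirrors the threshold stated above; a smaller value is convenient for trying the function on toy data.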
Reference
G. Tibély et al.: Extracting tag hierarchies. PLoS ONE 8(12): e84133 (2013).
Files:
File name: List_of_movies.zip
Description: List of movies, where each row corresponds to a movie
Format: plain text file
  1st column: movie title id
  remaining columns: keyword ids
Size: 9.9Mb

File name: Movie_keyword_names.zip
Description: Tag id names
Format: compressed plain text file
  1st column: tag id
  2nd column: name
Size: 1.2Mb

File name: imdb_dataset.zip
Description: All files in the IMDb dataset as a zip archive
Size: 6.3Mb
Note:
The header of each file contains instructions for processing the data with the Hierarchy Extracting Algorithms.
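As a rough illustration of the column layouts described above, the following sketch reads the two kinds of files once they have been unpacked. Several details are assumptions, not facts from this page: that columns are separated by whitespace, that header lines start with "#", and the function names `parse_movie_rows` and `parse_keyword_names`. Check the header of each file for the actual conventions.

```python
def _data_lines(lines, comment_prefix):
    """Yield stripped, non-empty lines that are not header lines."""
    for line in lines:
        line = line.strip()
        # Assumption: header lines begin with `comment_prefix`.
        if line and not line.startswith(comment_prefix):
            yield line


def parse_movie_rows(lines, comment_prefix="#"):
    """Map each movie title id to the list of its keyword ids."""
    movies = {}
    for line in _data_lines(lines, comment_prefix):
        fields = line.split()  # assumption: whitespace-separated columns
        movies[fields[0]] = fields[1:]
    return movies


def parse_keyword_names(lines, comment_prefix="#"):
    """Map each tag id to its name."""
    names = {}
    for line in _data_lines(lines, comment_prefix):
        parts = line.split(maxsplit=1)  # 1st column: id, rest: name
        if len(parts) == 2:
            names[parts[0]] = parts[1]
    return names


# Made-up sample rows, used only to show the expected shape of the input.
sample_movies = ["# header with instructions", "12 3 7 41", "15 7"]
sample_names = ["# header with instructions", "3 murder", "7 love"]

movies = parse_movie_rows(sample_movies)
names = parse_keyword_names(sample_names)
```

Both functions accept any iterable of text lines, so an open file object can be passed directly. Ids are kept as strings to avoid guessing whether they are numeric in the real files.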


Copyright notice: Information courtesy of IMDb (www.imdb.com).
For further information contact: licensing@imdb.com

Contact
hiertags@hal.elte.hu