HIERTAGS - Flickr large dataset

Flickr dataset:

We provide a weighted co-occurrence network between Flickr tags, together with the corresponding tag frequencies. The network is built from tags (free-text words) that co-appear on photos returned by a rather long list of queries to Flickr. When preparing the network, we kept only English nouns and counted a co-occurrence only if it appeared on photos from at least 10 different users.
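The construction described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: it assumes the raw data are available as one (user, tags) pair per photo, and it stands in for the English-noun filter with an optional `vocabulary` set that the caller must supply. The function name `build_cooccurrence` and its parameters are hypothetical. Edge weights count the photos carrying both tags, and tag frequencies count the photos carrying each tag, matching the file descriptions below.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(photos, min_users=10, vocabulary=None):
    """Hypothetical sketch of a weighted tag co-occurrence network.

    photos:     iterable of (user_id, tags) pairs, one per photo.
    min_users:  keep a tag pair only if it co-appears on photos
                from at least this many distinct users.
    vocabulary: optional set of allowed tags (an assumed stand-in
                for the English-noun filter); None keeps all tags.

    Returns (edges, freq): edges maps a sorted tag pair (a, b) to the
    number of photos carrying both tags; freq maps each tag to the
    number of photos carrying it.
    """
    pair_photos = defaultdict(int)   # pair -> number of photos
    pair_users = defaultdict(set)    # pair -> distinct users
    freq = defaultdict(int)          # tag -> number of photos

    for user, tags in photos:
        # Deduplicate tags on the photo and apply the optional filter.
        unique = sorted({t for t in tags if vocabulary is None or t in vocabulary})
        for t in unique:
            freq[t] += 1
        # Sorted input makes every pair come out as (a, b) with a < b.
        for a, b in combinations(unique, 2):
            pair_photos[(a, b)] += 1
            pair_users[(a, b)].add(user)

    # Drop pairs supported by too few distinct users.
    edges = {p: n for p, n in pair_photos.items() if len(pair_users[p]) >= min_users}
    return edges, dict(freq)
```

Counting distinct users per pair, rather than photos alone, keeps a single prolific user who tags many photos identically from creating a strong edge on their own.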

Reference

G. Tibély et al.: Extracting tag hierarchies. PLoS ONE 8(12): e84133 (2013).

Files:

File name	Description	Format	Size
Flickr_co-occurrence_net.zip	Co-occurrence network of tags on Flickr photos	Compressed plain-text file; columns 1 and 2: co-appearing tag IDs, column 3: number of co-occurrences	5.5 MB
Flickr_tag_frequencies.zip	Frequency of Flickr tags	Compressed plain-text file; column 1: tag ID, column 2: number of photos	0.1 MB
Flickr_tag_names.zip	Tag names	Compressed plain-text file; column 1: tag ID, column 2: tag name	1.3 MB
Flickr_dataset.zip	All files of the Flickr dataset in a single archive	Zip archive	6.9 MB
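A minimal sketch for loading the three files in Python, assuming each archive holds a single UTF-8 text file with whitespace-separated columns in the order listed above. The exact layout of the instructional file header is not described here, so the sketch simply skips any line whose leading columns do not parse as integers. The helper names (`zip_lines`, `read_edges`, `read_frequencies`, `read_names`) are hypothetical.

```python
import io
import zipfile


def zip_lines(path):
    """Yield text lines from the first member of a zip archive.

    Assumption: each archive contains one UTF-8 text file.
    """
    with zipfile.ZipFile(path) as zf:
        member = zf.namelist()[0]
        with zf.open(member) as fh:
            yield from io.TextIOWrapper(fh, encoding="utf-8")


def read_edges(lines):
    """Return a list of (tag_id_1, tag_id_2, co-occurrence count)."""
    edges = []
    for line in lines:
        parts = line.split()
        try:
            a, b, w = (int(x) for x in parts[:3])
        except ValueError:
            continue  # header, comment, or blank line
        edges.append((a, b, w))
    return edges


def read_frequencies(lines):
    """Return a dict mapping tag ID to its number of photos."""
    freq = {}
    for line in lines:
        parts = line.split()
        try:
            tag_id, count = (int(x) for x in parts[:2])
        except ValueError:
            continue  # header, comment, or blank line
        freq[tag_id] = count
    return freq


def read_names(lines):
    """Return a dict mapping tag ID to tag name."""
    names = {}
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        try:
            tag_id = int(parts[0])
        except ValueError:
            continue  # header or comment line
        names[tag_id] = parts[1].strip()
    return names
```

For example, `names = read_names(zip_lines("Flickr_tag_names.zip"))` and `edges = read_edges(zip_lines("Flickr_co-occurrence_net.zip"))` give a tag-name lookup and the weighted edge list; sorting `edges` by its third element and mapping the IDs through `names` lists the strongest co-occurrences by tag name.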

Note:

The header of each file contains instructions for processing the data with the Hierarchy Extracting Algorithms.

Note 2:

We also provide a smaller version of the dataset, on which you can obtain results faster.

Contact

hiertags@hal.elte.hu