Tag Recommender

Author: Steffen Rendle, Social Network Analysis, University of Konstanz

This software implements the Pairwise Interaction Tensor Factorization (PITF) [WSDM 2010] model with BPR optimization for tag recommendation. It also contains a Factorization Machine implementation which can mimic PITF (see [ICDM 2010] for details).

Download

Download source code (C++) tagrec-1.10.src.tar.gz (2011-07-14)
Extract the sources and run make to build the executable. After compiling, the executable is in the ./bin/ folder.
This software is free of charge for academic purposes. Please contact the author if you want to use this software for commercial purposes. You are not allowed to redistribute this software or its source code. Please acknowledge the software if you publish results produced with this software. The complete license is included in the archive -- please see the file license.txt for details.

How to use

The tool tagrec has the following parameters:

-dim            dim of factorization; default=64
-help           this screen
-init_stdev     stdev for initialization of 2-way factors; default=0.01
-iter           number of iterations for SGD; default=100
-learn_rate     learn_rate for SGD; default=0.1
-method         method: 'pitf' or 'fm' or 'fmgeneric' [MANDATORY]
-num_out        how many tags per post should be written; default=10
-num_sample     number of pair samples drawn for each training
                tuple; default=100
-out            filename for output; default=''
-regular        regularization; default=0.0
-test           filename for test data [MANDATORY]
-train          filename for training data [MANDATORY]
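
To illustrate how -dim, -init_stdev, -learn_rate and -regular enter the model, here is a minimal Python sketch of the PITF scoring function and a single BPR stochastic gradient step, following the formulation in [WSDM 2010]. It is not the tagrec implementation: the class and method names are hypothetical, and applying -regular as a plain L2 penalty on the touched factors is an assumption.

```python
import math
import random


def dot(a, b):
    """Inner product of two equally long factor vectors."""
    return sum(x * y for x, y in zip(a, b))


class PITFSketch:
    """Hypothetical illustration of PITF; not the tagrec implementation."""

    def __init__(self, n_users, n_items, n_tags, dim=64, init_stdev=0.01, seed=0):
        rng = random.Random(seed)

        def factors(n_rows):
            # Gaussian initialization with mean 0 and standard deviation init_stdev.
            return [[rng.gauss(0.0, init_stdev) for _ in range(dim)]
                    for _ in range(n_rows)]

        self.dim = dim
        self.U = factors(n_users)   # user factors
        self.I = factors(n_items)   # item factors
        self.TU = factors(n_tags)   # tag factors for the user-tag interaction
        self.TI = factors(n_tags)   # tag factors for the item-tag interaction

    def score(self, u, i, t):
        """PITF score: <U[u], TU[t]> + <I[i], TI[t]>."""
        return dot(self.U[u], self.TU[t]) + dot(self.I[i], self.TI[t])

    def bpr_step(self, u, i, t_pos, t_neg, learn_rate=0.1, regular=0.0):
        """One SGD step that pushes t_pos above t_neg for the post (u, i)."""
        x = self.score(u, i, t_pos) - self.score(u, i, t_neg)
        delta = 1.0 - 1.0 / (1.0 + math.exp(-x))  # d/dx ln(sigmoid(x))
        for f in range(self.dim):
            # Read all old values first so every update uses the same state.
            uf, itf = self.U[u][f], self.I[i][f]
            tup, tun = self.TU[t_pos][f], self.TU[t_neg][f]
            tip, tin = self.TI[t_pos][f], self.TI[t_neg][f]
            self.U[u][f] += learn_rate * (delta * (tup - tun) - regular * uf)
            self.I[i][f] += learn_rate * (delta * (tip - tin) - regular * itf)
            self.TU[t_pos][f] += learn_rate * (delta * uf - regular * tup)
            self.TU[t_neg][f] += learn_rate * (-delta * uf - regular * tun)
            self.TI[t_pos][f] += learn_rate * (delta * itf - regular * tip)
            self.TI[t_neg][f] += learn_rate * (-delta * itf - regular * tin)
```

In this sketch, t_pos is a tag observed for the post (u, i) and t_neg is a sampled tag that was not observed for it. The two tags must differ.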

Example

Run the PITF [WSDM 2010] approach with 64 factors:

./tagrec --train data.train --test data.test --dim 64 --learn_rate 0.01 --iter 50 --regular 0 --init_stdev 0.01 --method pitf

Use a Factorization Machine [ICDM 2010] to mimic PITF [WSDM 2010]:

./tagrec --train data.train --test data.test --dim 64 --learn_rate 0.01 --iter 50 --regular 0 --init_stdev 0.01 --method fm

Output

Here is an example of the output. First, some statistics about the dataset are shown. You can use these statistics to check whether your files are in the right format.

Loading train...	read data file data.train...
number of users             116
number of items             361
number of tags              412
number of posts             2406
number of distinct triples  9707
Loading test... 	read test file data.test...
number of test users        116
number of test items        92
number of test tags         186
number of test posts        116
number of d. test  triples  378

Then the optimization starts. After each iteration, the test quality is reported as HLU, AUC, and F-measure/Precision/Recall at cutoffs 1 to 10; for each cutoff the three values are printed in that order (F-measure, precision, recall). The time is the learning runtime of one whole iteration.

Method: PITF (BPR)
Training BPR (Case-Update): num_iter=50 neg_samples=100
Time: 7.25645 / HLU/AUC/FPR1..10: 0.512506/0.908694/0.267197/0.439655/0.191917/0.339247/0.392241/0.298868/...
Time: 7.25645 / HLU/AUC/FPR1..10: 0.598639/0.931949/0.364155/0.594828/0.262397/0.426626/0.482759/0.382188/...
...
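
As a rough guide to the F-measure/precision/recall columns in the log above, the following Python sketch averages precision and recall at a cutoff k over all test posts and then takes their harmonic mean. The function names and the dict-based input layout are hypothetical, and whether tagrec computes the metrics exactly this way is an assumption. The sample log is at least consistent with it: the first F value, 0.267197, is the harmonic mean of the reported precision 0.439655 and recall 0.191917.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate_at_k(ranked, truth, k):
    """Return (F, precision, recall) at cutoff k, averaged over posts.

    ranked: dict mapping a post (user, item) to a list of tag IDs, best first.
    truth:  dict mapping a post (user, item) to the non-empty set of its
            test tags.
    """
    if not truth or k <= 0:
        return 0.0, 0.0, 0.0
    prec_sum = 0.0
    rec_sum = 0.0
    for post, true_tags in truth.items():
        top = ranked.get(post, [])[:k]
        hits = len(set(top) & set(true_tags))
        prec_sum += hits / k                # precision of this post at k
        rec_sum += hits / len(true_tags)    # recall of this post at k
    precision = prec_sum / len(truth)
    recall = rec_sum / len(truth)
    return f_measure(precision, recall), precision, recall
```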

File format

Both training and test files have the same format. Each file contains one user-item-tag triple per row. Users, items and tags must be represented by non-negative integer IDs (e.g. 0, 1, 2, ...).
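
Before running tagrec, a data file can be sanity-checked with a few lines of Python. The sketch below parses whitespace-separated triples and counts distinct users, items, tags, posts and triples, which can then be compared with the statistics tagrec prints on loading. The function names are hypothetical, and tagrec may count differently (for example, by largest ID rather than by distinct IDs), so treat a mismatch as a hint rather than proof of an error.

```python
def parse_triples(lines):
    """Parse 'user item tag' rows (tab- or space-separated) into int tuples."""
    triples = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        u, i, t = (int(x) for x in fields)
        if min(u, i, t) < 0:
            raise ValueError(f"line {line_no}: IDs must be non-negative")
        triples.append((u, i, t))
    return triples


def dataset_stats(triples):
    """Count distinct users, items, tags, posts (user-item pairs) and triples."""
    return {
        "users": len({u for u, _, _ in triples}),
        "items": len({i for _, i, _ in triples}),
        "tags": len({t for _, _, t in triples}),
        "posts": len({(u, i) for u, i, _ in triples}),
        "distinct_triples": len(set(triples)),
    }
```

To check a file on disk, pass an open file object, e.g. parse_triples(open("data.train")).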

Hints

Note that memory is allocated up to the largest ID, so to reduce memory consumption you should start your IDs at 0 and avoid large gaps of unused IDs.
It is assumed that the train and test sets do not overlap in posts (a post is a user-item pair). That means the train set should contain no triple for a user-item combination (u,i) if the test set contains any triple (u,i,t*).
You can use tabs or spaces to separate IDs.
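
The first two hints can be automated when preparing data. The following Python sketch remaps the user, item and tag IDs of a train/test pair to gap-free IDs starting at 0, and lists posts that occur in both sets. The function names are hypothetical, and the triples are assumed to be already loaded as (user, item, tag) integer tuples.

```python
def build_id_map(ids):
    """Map the distinct IDs to 0..n-1, preserving their sorted order."""
    return {old: new for new, old in enumerate(sorted(set(ids)))}


def remap_triples(train, test):
    """Remap users, items and tags consistently across train and test.

    Returns the remapped train list, the remapped test list, and the three
    old-to-new dictionaries (users, items, tags) so that predictions can be
    translated back to the original IDs.
    """
    all_triples = list(train) + list(test)
    maps = [build_id_map(t[col] for t in all_triples) for col in range(3)]

    def apply(triples):
        return [tuple(maps[col][t[col]] for col in range(3)) for t in triples]

    return apply(train), apply(test), maps


def overlapping_posts(train, test):
    """Return the (user, item) pairs that appear in both train and test."""
    train_posts = {(u, i) for u, i, _ in train}
    test_posts = {(u, i) for u, i, _ in test}
    return train_posts & test_posts
```

An empty result from overlapping_posts means the split satisfies the non-overlap assumption. Any pair it returns should be moved entirely into one of the two sets.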

Example

0 0 0
0 0 1
0 0 2
0 1 1
0 1 3

This file reads as follows: the user with ID 0 has given the tags 0, 1 and 2 to the item with ID 0, and the tags 1 and 3 to the item with ID 1.

References

The PITF method was introduced in [WSDM 2010]. The Factorization Machine that mimics PITF is described in [ICDM 2010]. Please cite the corresponding paper(s) if you use the tagrec software.

[WSDM 2010]	Steffen Rendle, Lars Schmidt-Thieme (2010): Pairwise Interaction Tensor Factorization for Personalized Tag Recommendation, in Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM 2010), ACM.
[ICDM 2010]	Steffen Rendle (2010): Factorization Machines, in Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010), Sydney, Australia.