Tag Recommender

Author: Steffen Rendle, Social Network Analysis, University of Konstanz

This software implements the Pairwise Interaction Tensor Factorization (PITF) [WSDM 2010] model with BPR optimization for tag recommendation. It also contains a Factorization Machine implementation which can mimic PITF (see [ICDM 2010] for details).

How to use

The tool tagrec has the following parameters:
-dim            dim of factorization; default=64
-help           this screen
-init_stdev     stdev for initialization of 2-way factors; default=0.01
-iter           number of iterations for SGD; default=100
-learn_rate     learn_rate for SGD; default=0.1
-method         method: 'pitf' or 'fm' or 'fmgeneric' [MANDATORY]
-num_out        how many tags per post should be written; default=10
-num_sample     number of pair samples drawn for each training
                tuple; default=100
-out            filename for output; default=''
-regular        regularization; default=0.0
-test           filename for test data [MANDATORY]
-train          filename for training data [MANDATORY]
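The -num_sample parameter controls how many pairs BPR draws per training tuple. The sketch below assumes that each training triple (u, i, t_pos) is paired with num_sample negative tags, drawn uniformly with replacement from the tags that the post (u, i) does not contain. The sampling scheme tagrec actually uses is not documented here, and sample_negative_tags is a hypothetical name.

```python
import random


def sample_negative_tags(post_tags, num_tags, num_sample=100, rng=None):
    """Hypothetical helper: draw num_sample negative tags for one post (u, i).

    A negative is any tag ID in [0, num_tags) that the post does not contain;
    draws are uniform and with replacement (rejection sampling).
    """
    rng = rng or random.Random()
    observed = sum(1 for t in set(post_tags) if 0 <= t < num_tags)
    if observed >= num_tags:
        return []  # every tag is observed, so no negative exists
    negatives = []
    while len(negatives) < num_sample:
        t = rng.randrange(num_tags)
        if t not in post_tags:  # reject observed tags
            negatives.append(t)
    return negatives
```

Each returned t_neg forms a pair (t_pos, t_neg) for one SGD step. Rejection sampling stays cheap because a post carries only a few of the available tags (about 4 of 412 in the sample statistics below).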

Example

Run the PITF [WSDM 2010] approach with 64 factors:
./tagrec --train data.train --test data.test --dim 64 --learn_rate 0.01 --iter 50 --regular 0 --init_stdev 0.01 --method pitf
Use a Factorization Machine [ICDM 2010] to mimic PITF [WSDM 2010]:
./tagrec --train data.train --test data.test --dim 64 --learn_rate 0.01 --iter 50 --regular 0 --init_stdev 0.01 --method fm
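Why can an FM mimic PITF? A sketch, assuming the setup of [ICDM 2010] in which each post is encoded as a binary feature vector with one active user, item and tag indicator: a 2-way FM then scores w0 + w_u + w_i + w_t + <v_u,v_i> + <v_u,v_t> + <v_i,v_t>. When tags for a fixed post (u, i) are ranked against each other, every term not involving t cancels, leaving w_t + <v_u,v_t> + <v_i,v_t>, a PITF-style score with a tag bias and one shared tag factor vector. The functions and the [users | items | tags] feature layout are hypothetical illustrations, not tagrec's 'fm' implementation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fm_score(active, w0, w, v):
    """Generic 2-way FM for a binary feature vector given by its active indices."""
    linear = w0 + sum(w[j] for j in active)
    pairwise = 0.0
    for f in range(len(v[0])):
        s = sum(v[j][f] for j in active)          # sum_j v_jf x_j
        s_sq = sum(v[j][f] ** 2 for j in active)  # sum_j v_jf^2 x_j^2
        pairwise += 0.5 * (s * s - s_sq)          # = sum_{j<l} v_jf v_lf
    return linear + pairwise


def post_features(u, i, t, num_users, num_items):
    # Hypothetical one-hot layout: [users | items | tags]
    return [u, num_users + i, num_users + num_items + t]


def pitf_like_score(u, i, t, w, v, num_users, num_items):
    # The part of the FM score that depends on t: w_t + <v_u, v_t> + <v_i, v_t>
    ju, ji, jt = post_features(u, i, t, num_users, num_items)
    return w[jt] + dot(v[ju], v[jt]) + dot(v[ji], v[jt])
```

The pairwise term uses the identity sum_{j<l} <v_j,v_l> x_j x_l = 1/2 sum_f [(sum_j v_jf x_j)^2 - sum_j v_jf^2 x_j^2], which costs O(k n) for n active features instead of O(k n^2).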

Output

Here is an example of the output. First, some statistics about the dataset are shown; you can use them to check whether your file is in the right format.
Loading train...	read data file data.train...
number of users             116
number of items             361
number of tags              412
number of posts             2406
number of distinct triples  9707
Loading test... 	read test file data.test...
number of test users        116
number of test items        92
number of test tags         186
number of test posts        116
number of d. test  triples  378
Then the optimization starts. After each iteration, the test quality is measured in terms of HLU, AUC, and F-measure/Precision/Recall at 1 to 10. The reported time is the learning runtime of one full iteration.
Method: PITF (BPR)
Training BPR (Case-Update): num_iter=50 neg_samples=100
Time: 7.25645 / HLU/AUC/FPR1..10: 0.512506/0.908694/0.267197/0.439655/0.191917/0.339247/0.392241/0.298868/...
Time: 7.25645 / HLU/AUC/FPR1..10: 0.598639/0.931949/0.364155/0.594828/0.262397/0.426626/0.482759/0.382188/...
...
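The per-iteration quality numbers can be reproduced for a single test post with a sketch like the following. It assumes precision@N = hits/N, recall@N = hits/|true tags|, F as the harmonic mean of the two, and AUC as the fraction of (true tag, other tag) pairs ranked in the right order. How tagrec averages these over test posts and how it handles score ties is not stated here, and HLU is omitted; prf_at_n and auc are hypothetical helper names.

```python
def prf_at_n(ranked_tags, true_tags, n):
    """F-measure, precision and recall of the top-n tags for one test post."""
    true = set(true_tags)
    hits = len(true.intersection(ranked_tags[:n]))
    precision = hits / n
    recall = hits / len(true)
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return f, precision, recall


def auc(ranked_tags, true_tags):
    """Fraction of (true, other) tag pairs where the true tag is ranked higher."""
    true = set(true_tags)
    num_neg = len(ranked_tags) - len(true)  # assumes every true tag is ranked
    if num_neg <= 0:
        raise ValueError("need at least one non-true tag in the ranking")
    correct, negatives_seen = 0, 0
    for t in ranked_tags:
        if t in true:
            correct += num_neg - negatives_seen  # negatives still ranked below t
        else:
            negatives_seen += 1
    return correct / (len(true) * num_neg)
```

Judging from the sample log, the values after HLU and AUC appear to come in F/Precision/Recall triples for N = 1, 2, ...: for the first line, 2 * 0.439655 * 0.191917 / (0.439655 + 0.191917) is approximately 0.2672, which matches the first value 0.267197.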

File format

Both training and test files have the same format: each row contains one user-item-tag triple. Users, items and tags are represented by non-negative integer IDs (e.g. 0, 1, 2, ...).

Example

0 0 0
0 0 1
0 0 2
0 1 1
0 1 3
...
This file reads as follows: the user with ID 0 has assigned the tags 0, 1 and 2 to the item with ID 0, and the tags 1 and 3 to the item with ID 1.
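A minimal reader for this format might look like the sketch below. It assumes the three IDs on each row are separated by whitespace (as in the example) and that the statistics tagrec prints while loading count distinct IDs, distinct (user, item) posts and distinct triples; neither assumption is confirmed here, and read_posts and summarize are hypothetical names. It can be used to sanity-check a file before passing it to -train or -test.

```python
def read_posts(lines):
    """Parse 'user item tag' rows into {(user, item): set_of_tags}."""
    posts = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 IDs, got {len(fields)}")
        user, item, tag = (int(x) for x in fields)  # non-integers raise ValueError
        if min(user, item, tag) < 0:
            raise ValueError(f"line {lineno}: IDs must be non-negative")
        posts.setdefault((user, item), set()).add(tag)  # duplicates collapse
    return posts


def summarize(posts):
    """Counts comparable to the statistics tagrec prints while loading a file."""
    return {
        "users": len({u for u, _ in posts}),
        "items": len({i for _, i in posts}),
        "tags": len(set().union(*posts.values())),
        "posts": len(posts),
        "distinct triples": sum(len(tags) for tags in posts.values()),
    }
```

For example, with open("data.train") as fh: print(summarize(read_posts(fh))) prints the counts for a training file.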

References

The PITF method was introduced in [WSDM 2010]. The Factorization Machine that mimics PITF is described in [ICDM 2010]. Please cite the corresponding paper(s) if you use the tagrec software.

[WSDM 2010] Steffen Rendle, Lars Schmidt-Thieme (2010): Pairwise Interaction Tensor Factorization for Personalized Tag Recommendation, in Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM 2010), ACM.
[ICDM 2010] Steffen Rendle (2010): Factorization Machines, in Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010), Sydney, Australia.