Tag Recommender
Author: Steffen Rendle, Social Network Analysis, University of Konstanz

This software implements the Pairwise Interaction Tensor Factorization (PITF) [WSDM 2010] model with BPR optimization for tag recommendation. It also contains a Factorization Machine implementation that can mimic PITF (see [ICDM 2010] for details).
Download
- Download source code (C++) tagrec-1.10.src.tar.gz (2011-07-14)
- Extract the sources and run make to build the executable. After compilation, the executable is in the ./bin/ folder.
- This software is free of charge for academic purposes. Please contact the author if you want to use this software for commercial purposes. You are not allowed to redistribute this software or its source code. Please acknowledge the software if you publish results produced with this software. The complete license is included in the archive -- please see the file license.txt for details.
How to use
The tool tagrec has the following parameters:

    -dim          dim of factorization; default=64
    -help         this screen
    -init_stdev   stdev for initialization of 2-way factors; default=0.01
    -iter         number of iterations for SGD; default=100
    -learn_rate   learn_rate for SGD; default=0.1
    -method       method: 'pitf' or 'fm' or 'fmgeneric' [MANDATORY]
    -num_out      how many tags per post should be written; default=10
    -num_sample   number of the pair samples drawn for each training tuple; default=100
    -out          filename for output; default=''
    -regular      regularization; default=0.0
    -test         filename for test data [MANDATORY]
    -train        filename for training data [MANDATORY]
Example
Run the PITF [WSDM 2010] approach with 64 factors:

    ./tagrec --train data.train --test data.test --dim 64 --learn_rate 0.01 --iter 50 --regular 0 --init_stdev 0.01 --method pitf

Use a Factorization Machine [ICDM 2010] to mimic PITF [WSDM 2010]:

    ./tagrec --train data.train --test data.test --dim 64 --learn_rate 0.01 --iter 50 --regular 0 --init_stdev 0.01 --method fm
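To run both methods on the same data with identical settings, the two invocations can be generated from one place. The following is a minimal sketch: the function name `tagrec_command` and its defaults are illustrative and not part of tagrec, and it uses only the flags that appear in the example invocations.

```python
import shlex

# Hypothetical helper: builds the argument list for one tagrec run.
# Only flags shown in the example invocations are used here.
def tagrec_command(method, train="data.train", test="data.test", dim=64,
                   learn_rate=0.01, num_iter=50, regular=0, init_stdev=0.01,
                   binary="./tagrec"):
    if method not in ("pitf", "fm", "fmgeneric"):
        raise ValueError(f"unknown method: {method!r}")
    options = {
        "--train": train,
        "--test": test,
        "--dim": dim,
        "--learn_rate": learn_rate,
        "--iter": num_iter,
        "--regular": regular,
        "--init_stdev": init_stdev,
        "--method": method,
    }
    args = [binary]
    for flag, value in options.items():
        args += [flag, str(value)]
    return args

# Print the command lines; pass the list to subprocess.run(..., check=True)
# to actually execute them once the binary has been built.
for method in ("pitf", "fm"):
    print(shlex.join(tagrec_command(method)))
```

Keeping the arguments as a list (rather than one string) avoids quoting problems if a file name contains spaces.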
Output
Here is an example of the output. First, some statistics about the dataset are shown. You can use these statistics to check whether your file is in the right format.

    Loading train...
    read data file data.train...
    number of users 116
    number of items 361
    number of tags 412
    number of posts 2406
    number of distinct triples 9707
    Loading test...
    read test file data.test...
    number of test users 116
    number of test items 92
    number of test tags 186
    number of test posts 116
    number of d. test triples 378

Then the optimization starts, and after each iteration the test quality is measured in terms of HLU, AUC and F-measure/Precision/Recall at 1 to 10. The time is the learning runtime for one whole iteration.

    Method: PITF (BPR)
    Training BPR (Case-Update): num_iter=50 neg_samples=100
    Time: 7.25645 / HLU/AUC/FPR1..10: 0.512506/0.908694/0.267197/0.439655/0.191917/0.339247/0.392241/0.298868/...
    Time: 7.25645 / HLU/AUC/FPR1..10: 0.598639/0.931949/0.364155/0.594828/0.262397/0.426626/0.482759/0.382188/...
    ...
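The slash-separated values on each "Time:" line can be split programmatically. The sketch below assumes that the values after HLU and AUC come in groups of three (F-measure, Precision, Recall) per cutoff; this ordering is inferred from the description above and from the fact that the sample numbers satisfy F = 2PR/(P+R), not from the tagrec source. The function names are illustrative, and the truncated "..." part of the sample line is simply skipped.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def parse_progress_line(line):
    """Split one 'Time: ... / HLU/AUC/FPR1..10: ...' line into its numbers."""
    header, values_part = line.rsplit(": ", 1)
    seconds = float(header.split()[1])
    values = [float(v) for v in values_part.split("/")
              if v.strip() not in ("", "...")]
    hlu, auc, rest = values[0], values[1], values[2:]
    usable = len(rest) - len(rest) % 3  # ignore an incomplete trailing group
    # Assumed grouping: (F-measure, Precision, Recall) per cutoff 1, 2, ...
    fpr = [tuple(rest[k:k + 3]) for k in range(0, usable, 3)]
    return seconds, hlu, auc, fpr

line = ("Time: 7.25645 / HLU/AUC/FPR1..10: 0.512506/0.908694/"
        "0.267197/0.439655/0.191917/0.339247/0.392241/0.298868/...")
seconds, hlu, auc, fpr = parse_progress_line(line)
for cutoff, (f, p, r) in enumerate(fpr, start=1):
    # The reported F and the F recomputed from P and R should agree closely.
    print(cutoff, f, f_measure(p, r))
```

For the first sample line, the recomputed F-measure at cutoffs 1 and 2 agrees with the reported value to within rounding, which supports the assumed column order.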
File format
The training and test files have the same format: one user-item-tag triple per row, in that order. Users, items and tags are each represented by non-negative numerical IDs (e.g. 0, 1, 2, ...).
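Before handing a file to tagrec, it can be useful to check that every row really is a triple of non-negative IDs. The following is a minimal sketch of such a check; the function name `read_triples` is illustrative, integer IDs are assumed, and skipping blank lines is an assumption of this sketch, not documented tagrec behavior.

```python
def read_triples(lines):
    """Parse whitespace-separated 'user item tag' rows into integer triples."""
    triples = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()  # splits on tabs and spaces alike
        if not fields:
            continue  # assumption: blank lines are ignored in this sketch
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 IDs, got {len(fields)}")
        try:
            user, item, tag = (int(f) for f in fields)
        except ValueError:
            raise ValueError(f"line {line_no}: IDs must be integers: {line!r}") from None
        if min(user, item, tag) < 0:
            raise ValueError(f"line {line_no}: IDs must be non-negative: {line!r}")
        triples.append((user, item, tag))
    return triples

# Works on any iterable of lines, e.g. an open file:
#     with open("data.train") as fh:
#         train = read_triples(fh)
print(read_triples(["0 0 0", "0\t0\t1", "0 1 3"]))
```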
Hints
- Note that memory is allocated up to the largest ID, so to reduce memory consumption you should start your IDs at 0 and avoid large gaps of unused IDs.
- It is assumed that the train and test set do not overlap in posts (=user/item tuple). That means there should be no triples for a user-item combination (u,i) in the train set if there exists any triple (u,i,t*) in the test set.
- You can use tabs or spaces to separate IDs.
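The first two hints can be checked and enforced in a preprocessing step. The sketch below remaps arbitrary IDs to dense IDs starting at 0, using one shared mapping for train and test so that the same entity keeps the same ID in both files, and lists posts (user-item pairs) that occur in both sets. The function names are illustrative and not part of tagrec.

```python
def compact_ids(train, test):
    """Remap user, item and tag IDs to 0, 1, 2, ... in order of first appearance.

    One mapping per column is shared between train and test.
    """
    maps = ({}, {}, {})  # user map, item map, tag map

    def remap(triples):
        return [tuple(m.setdefault(value, len(m)) for m, value in zip(maps, triple))
                for triple in triples]

    return remap(train), remap(test), maps

def shared_posts(train, test):
    """Return the (user, item) pairs that occur in both train and test."""
    train_posts = {(u, i) for u, i, _ in train}
    test_posts = {(u, i) for u, i, _ in test}
    return sorted(train_posts & test_posts)

train = [(10, 500, 7), (10, 500, 9), (42, 500, 7)]
test = [(42, 900, 9)]

new_train, new_test, (user_map, item_map, tag_map) = compact_ids(train, test)
print(new_train, new_test)
print("overlapping posts:", shared_posts(new_train, new_test))
```

Keeping the returned maps allows the recommended tag IDs in the output to be translated back to the original IDs afterwards.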
Example
    0 0 0
    0 0 1
    0 0 2
    0 1 1
    0 1 3
    ...

This file reads as follows: the user with ID 0 has given the tags 0, 1 and 2 to the item with ID 0, and the tags 1 and 3 to the item with ID 1.
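The same reading can be reproduced by grouping the rows by post, i.e. by their (user, item) pair. This is a small illustrative sketch using the five rows shown in the example; the function name `group_by_post` is not part of tagrec.

```python
from collections import defaultdict

sample = """\
0 0 0
0 0 1
0 0 2
0 1 1
0 1 3
"""

def group_by_post(text):
    """Collect the tags of each (user, item) post from 'user item tag' rows."""
    posts = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not a triple
        user, item, tag = map(int, fields)
        posts[(user, item)].append(tag)
    return dict(posts)

for (user, item), tags in group_by_post(sample).items():
    print(f"user {user}, item {item}: tags {tags}")
# prints:
# user 0, item 0: tags [0, 1, 2]
# user 0, item 1: tags [1, 3]
```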
References
The PITF method was introduced in [WSDM 2010]. The Factorization Machine that mimics PITF is described in [ICDM 2010]. Please cite the corresponding paper(s) if you use the tagrec software.
Steffen Rendle, Lars Schmidt-Thieme (2010): Pairwise Interaction Tensor Factorization for Personalized Tag Recommendation, in Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM 2010), ACM.
Steffen Rendle (2010): Factorization Machines, in Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010), Sydney, Australia.