model_hdp.Rd
The hierarchical Dirichlet process (HDP) is a powerful mixed-membership model for the unsupervised analysis of grouped data. Unlike its finite counterpart, latent Dirichlet allocation, the HDP topic model infers the number of topics from the data. This function uses online HDP, which combines the speed of online variational Bayes with the modeling flexibility of the HDP. The idea behind online variational Bayes is to optimize the variational objective function with stochastic optimization. The difficulty is that existing coordinate-ascent variational Bayes algorithms for the HDP require complicated approximation methods or numerical optimization. This model uses a stick-breaking construction of the HDP, which allows coordinate-ascent variational Bayes without numerical approximation.
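The stick-breaking idea referred to above can be illustrated with a few lines of base R. This is only a minimal sketch of a truncated stick-breaking draw, not the code used by this package or by the underlying Python library; the function name `stick_breaking` and its arguments `K` (truncation level) and `gamma` (concentration) are hypothetical and chosen for illustration.

```r
# Illustrative sketch only: truncated stick-breaking weights.
# `stick_breaking`, `K` and `gamma` are hypothetical names, not package API.
stick_breaking <- function(K, gamma) {
  # Each draw is the fraction of the remaining stick that is broken off.
  v <- rbeta(K, shape1 = 1, shape2 = gamma)
  # Length of stick still available before each break (1 before the first).
  remaining <- c(1, cumprod(1 - v)[-K])
  # Weight of component k = its fraction times the stick left at step k.
  v * remaining
}

set.seed(42)
weights <- stick_breaking(K = 20, gamma = 1)

# Weights are non-negative and sum to at most 1; most of the mass
# typically falls on the first few components.
round(weights, 3)
sum(weights)
```

Because the weights decay as the stick is used up, only a handful of components carry noticeable mass, which gives some intuition for how the number of topics can be inferred from the data rather than fixed in advance.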
model_hdp(corpus, id2word, ...)

# S3 method for mm_file
model_hdp(corpus, id2word, ...)

# S3 method for mm
model_hdp(corpus, id2word, ...)

# S3 method for python.builtin.list
model_hdp(corpus, id2word, ...)

load_hdp(file)
Argument | Description |
---|---|
corpus | Model as returned by |
id2word | Dictionary for the input corpus, as returned by |
... | Any other options, from the official documentation. |
file | Path to a saved model. |
This is a non-parametric Bayesian method: note that there is no num_topics
argument.
#> → Preprocessing 9 documents
#> ← 9 documents after perprocessing
dictionary <- corpora_dictionary(docs)
corpora <- doc2bow(dictionary, docs)
corpus_mm <- serialize_mmcorpus(corpora, auto_delete = FALSE)

# fit model
hdp <- model_hdp(corpus_mm, id2word = dictionary)
reticulate::py_to_r(hdp$show_topic(topic_id = 1L, topn = 5L))
#> [[1]]
#> [[1]][[1]]
#> [1] "response"
#>
#> [[1]][[2]]
#> [1] 0.3007322
#>
#>
#> [[2]]
#> [[2]][[1]]
#> [1] "user"
#>
#> [[2]][[2]]
#> [1] 0.2898649
#>
#>
#> [[3]]
#> [[3]][[1]]
#> [1] "minors"
#>
#> [[3]][[2]]
#> [1] 0.1047671
#>
#>
#> [[4]]
#> [[4]][[1]]
#> [1] "trees"
#>
#> [[4]][[2]]
#> [1] 0.06323082
#>
#>
#> [[5]]
#> [[5]][[1]]
#> [1] "survey"
#>
#> [[5]][[2]]
#> [1] 0.05174245
#>
#>