The hierarchical Dirichlet process (HDP) is a powerful mixed-membership model for the unsupervised analysis of grouped data. Unlike its finite counterpart, latent Dirichlet allocation, the HDP topic model infers the number of topics from the data. This function uses Online HDP, which combines the speed of online variational Bayes with the modeling flexibility of the HDP. The idea behind online variational Bayes in general is to optimize the variational objective function with stochastic optimization. The challenge is that existing coordinate-ascent variational Bayes algorithms for the HDP require complicated approximation methods or numerical optimization. This model uses the stick-breaking construction of the HDP, which allows coordinate-ascent variational Bayes without numerical approximation.
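To give some intuition for the stick-breaking construction mentioned above, here is a minimal base R sketch of truncated stick-breaking weights. It is an illustration only, not the code used by the model: the function name stick_breaking, the truncation level n_sticks and the concentration parameter gamma are all hypothetical names chosen for this example.

```r
# Illustrative only: truncated stick-breaking weights.
# stick_breaking(), n_sticks and gamma are hypothetical names, not part of this package.
stick_breaking <- function(n_sticks, gamma = 1) {
  # Fraction broken off the remaining stick at each step.
  v <- rbeta(n_sticks, 1, gamma)
  # Stick length left before each break: 1, (1 - v1), (1 - v1)(1 - v2), ...
  remaining <- c(1, cumprod(1 - v))[seq_len(n_sticks)]
  # Weight of each piece = fraction broken off * length that was left.
  v * remaining
}

set.seed(42)
weights <- stick_breaking(n_sticks = 20, gamma = 1)
round(head(weights, 5), 3)
sum(weights)  # at most 1; the remainder belongs to pieces beyond the truncation
```

Because each piece takes a fraction of whatever is left, the weights shrink quickly, which is roughly why a model built this way can leave most candidate topics with negligible mass rather than requiring a fixed topic count.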

model_hdp(corpus, id2word, ...)

# S3 method for mm_file
model_hdp(corpus, id2word, ...)

# S3 method for mm
model_hdp(corpus, id2word, ...)

# S3 method for python.builtin.list
model_hdp(corpus, id2word, ...)

load_hdp(file)

Arguments

corpus

Corpus, as returned by serialize_mmcorpus.

id2word

Dictionary for the input corpus, as returned by corpora_dictionary.

...

Any other options, from the official documentation.

file

Path to a saved model.

Details

This is a non-parametric Bayesian method: note the lack of a num_topics argument.

Examples

docs <- prepare_documents(corpus)
#> Preprocessing 9 documents
#> 9 documents after perprocessing
dictionary <- corpora_dictionary(docs)
corpora <- doc2bow(dictionary, docs)
corpus_mm <- serialize_mmcorpus(corpora, auto_delete = FALSE)

# fit model
hdp <- model_hdp(corpus_mm, id2word = dictionary)
reticulate::py_to_r(hdp$show_topic(topic_id = 1L, topn = 5L))
#> [[1]]
#> [[1]][[1]]
#> [1] "response"
#>
#> [[1]][[2]]
#> [1] 0.3007322
#>
#>
#> [[2]]
#> [[2]][[1]]
#> [1] "user"
#>
#> [[2]][[2]]
#> [1] 0.2898649
#>
#>
#> [[3]]
#> [[3]][[1]]
#> [1] "minors"
#>
#> [[3]][[2]]
#> [1] 0.1047671
#>
#>
#> [[4]]
#> [[4]][[1]]
#> [1] "trees"
#>
#> [[4]][[2]]
#> [1] 0.06323082
#>
#>
#> [[5]]
#> [[5]][[1]]
#> [1] "survey"
#>
#> [[5]][[2]]
#> [1] 0.05174245
#>
#>
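
The nested list printed above can be awkward to work with. As a sketch, assuming the converted result is always a list of (term, weight) pairs as in this output, it can be flattened into a data frame with base R. The list is recreated by hand here so the snippet runs on its own; topic and topic_df are names chosen for this example.

```r
# Recreate the structure printed above: a list of (term, weight) pairs.
# In practice this would be the result of reticulate::py_to_r(hdp$show_topic(...)).
topic <- list(
  list("response", 0.3007322),
  list("user", 0.2898649),
  list("minors", 0.1047671),
  list("trees", 0.06323082),
  list("survey", 0.05174245)
)

# Pull the first element of each pair into `term`, the second into `weight`.
topic_df <- data.frame(
  term = vapply(topic, function(pair) pair[[1]], character(1)),
  weight = vapply(topic, function(pair) pair[[2]], numeric(1)),
  stringsAsFactors = FALSE
)

# Order terms from highest to lowest weight.
topic_df[order(-topic_df$weight), ]
```

vapply is used instead of sapply so that an unexpected element type raises an error rather than silently changing the column type.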