Serialize a term-document matrix to disk.

serialize_mmcorpus(corpus, file = NULL, auto_delete = TRUE)

as_serialized_mmcorpus(file)

delete_mmcorpus(file)

Arguments

corpus

A corpus as returned by doc2bow.

file

Path to a .mm file (recommended). If NULL, the corpus is saved to a temporary file.

auto_delete

Whether to automatically delete the temporary file after first use.

Value

An object of class mm_file which holds the path to the file and metadata.

Details

Serialize the corpus to disk so that Python can scan it efficiently from the file.

Functions

  • serialize_mmcorpus - Serialize the corpus

  • as_serialized_mmcorpus - Create an object of class mm_file from an existing corpus file.

  • delete_mmcorpus - Delete the temporary corpus file.

Examples

docs <- prepare_documents(corpus)
#> Preprocessing 9 documents
#> 9 documents after perprocessing
dict <- corpora_dictionary(docs)
corpora <- doc2bow(dict, docs)

# serialize and delete
# NOT RUN {
corpus_mm <- serialize_mmcorpus(corpora)
# }
# NOT RUN {
delete_mmcorpus(corpus_mm)
# }
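
The examples here only use the temp-file path. As a minimal sketch of the named-file route described under Arguments, the snippet below writes the corpus to an explicit .mm path and later wraps that file with as_serialized_mmcorpus. It assumes gensimr and a working Python gensim installation are available, and that the sample `corpus` dataset can be loaded with data(). The path `mm_path` is hypothetical, and passing auto_delete = FALSE alongside a named file is an assumption about how the two arguments interact.

```r
# Sketch only: needs gensimr plus a Python environment with gensim.
library(gensimr)

# Assumption: the sample `corpus` dataset ships with gensimr.
data(corpus, package = "gensimr")

docs <- prepare_documents(corpus)
dict <- corpora_dictionary(docs)
corpora <- doc2bow(dict, docs)

# Hypothetical path, chosen for illustration.
mm_path <- file.path(tempdir(), "corpus.mm")

# Write to a named file instead of an anonymous temp file.
corpus_mm <- serialize_mmcorpus(corpora, file = mm_path, auto_delete = FALSE)

# Later (for example in a new session): wrap the existing file
# without serializing the corpus again.
reloaded <- as_serialized_mmcorpus(mm_path)

# Clean up with base R once the file is no longer needed.
unlink(mm_path)
```

A named file is worth the extra argument when the same corpus feeds several models, since the serialization step then runs once.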