Cluster computing: nltk

Sunday, May 5, 2013

nltk

Let's quickly review the documentation for nltk.text, which defines the following classes:
1) ContextIndex: a bidirectional index between words and their 'contexts' in a text
methods:
word_similarity_dict: returns a dictionary mapping each word in the index to its similarity score with the given word
similar_words: returns the words that share the most contexts with the given word
common_contexts: finds the contexts in which all of the given words appear
2) ConcordanceIndex: an index that records the positions at which each word occurs
methods:
print_concordance: prints a concordance for the given word
3) TokenSearcher: uses regular expressions to search over tokenized strings
methods:
findall: finds instances of the regular expression in the text
4) Text: a wrapper around a sequence of simple string tokens, initialized from a simple
methods:
concordance: prints a concordance for a word with the specified context window
collocations: prints collocations derived from the text, ignoring stopwords
count: returns the number of times a given word appears
similar: prints other words that appear in the same contexts as the specified word
dispersion_plot: plots the distribution of the given words throughout the text
5) TextCollection: a collection of texts, initialized from a list of texts or from a corpus
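
To make the list above concrete, here is a small sketch that touches each of the five classes. The toy sentence and the variable names are my own, chosen for illustration, and the tf_idf/idf calls on TextCollection go beyond the notes above. Note that the TokenSearcher method is spelled findall in NLTK. collocations and dispersion_plot are left out because they need the stopwords corpus and matplotlib, respectively.

```python
from nltk.text import (
    ConcordanceIndex,
    ContextIndex,
    Text,
    TextCollection,
    TokenSearcher,
)

# Toy token list (an assumption for illustration; any tokenized text works).
tokens = "the cat sat on the mat . the dog sat on the rug .".split()

# 1) ContextIndex: by default a word's context is its (left, right) neighbours.
context_index = ContextIndex(tokens)
similar = context_index.similar_words("cat")            # words sharing a context with "cat"
shared = context_index.common_contexts(["cat", "dog"])  # frequency distribution of shared contexts
scores = context_index.word_similarity_dict("cat")      # word -> similarity score

# 2) ConcordanceIndex: token positions at which a word occurs.
concordance_index = ConcordanceIndex(tokens)
sat_offsets = concordance_index.offsets("sat")          # [2, 9]
concordance_index.print_concordance("sat")

# 3) TokenSearcher: each token in the pattern is written between angle brackets.
searcher = TokenSearcher(tokens)
hits = searcher.findall("<the> <.*> <sat>")

# 4) Text: a convenience wrapper around the same tokens.
text = Text(tokens)
the_count = text.count("the")                           # 4
text.concordance("sat")                                 # prints the matches
text.similar("cat")                                     # prints words seen in the same contexts

# 5) TextCollection: several texts treated as one collection.
first = Text(tokens[:7])
second = Text(tokens[7:])
collection = TextCollection([first, second])
the_idf = collection.idf("the")                         # "the" is in both texts, so 0.0
cat_tf_idf = collection.tf_idf("cat", first)            # positive: "cat" is only in the first text
```

Both "cat" and "dog" occur between "the" and "sat" here, which is why they come out as similar words with a shared context.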
