How can I get popular tags/keywords from a collection of unstructured text chunks?
I'm storing small chunks of text, around 100-200 words each, in a NoSQL database, and I need to display the trending keywords/tags across these chunks.
I know of text-analysis APIs such as Alchemy that can extract entities from a single chunk of text, but I want the top keywords/tags across all of the chunks.
Should I store the keywords against each text chunk and then do an exhaustive count of the top keywords? If so, keywords that mean the same thing may differ slightly in form, which may fragment the counts of similar keywords.
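A minimal sketch of the per-chunk approach described in the question, assuming a plain Python dict as a hypothetical stand-in for the NoSQL store. Each chunk's keywords are stored with it, and a running global count is updated on every save. The names `save_chunk` and `top_keywords` are illustrative, not part of any database API.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the NoSQL store: chunk id -> stored keywords.
chunk_keywords: dict[str, list[str]] = {}
# Running global tally: how many chunks mention each keyword.
keyword_counts: Counter = Counter()


def save_chunk(chunk_id: str, keywords: list[str]) -> None:
    """Store a chunk's keywords and keep the global tally in sync."""
    # If the chunk is being overwritten, retract its old keywords first.
    for kw in chunk_keywords.get(chunk_id, []):
        keyword_counts[kw] -= 1
        if keyword_counts[kw] <= 0:
            del keyword_counts[kw]
    # Count each keyword once per chunk; sort for a deterministic order.
    unique = sorted(set(keywords))
    chunk_keywords[chunk_id] = unique
    keyword_counts.update(unique)


def top_keywords(n: int) -> list[tuple[str, int]]:
    """Return the n keywords that appear in the most chunks."""
    return keyword_counts.most_common(n)


save_chunk("c1", ["NoSQL", "MongoDB"])
save_chunk("c2", ["nosql", "MongoDB"])
print(top_keywords(3))
```

Because the raw keyword strings are counted as-is, "NoSQL" and "nosql" end up as two separate entries, which is exactly the fragmentation the question worries about.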
Filtering out entities will not necessarily give you the best result (though it serves the basic purpose). To make it more effective, you should remove stopwords, apply stemming, convert everything to lowercase, and correct spelling, and then use a hashmap to find the frequency of each term. Using those frequencies, you can pick out the top 100-200 entities/tags.
I hope this helps.
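The pipeline from the answer can be sketched in Python as follows. This is an illustration under stated assumptions: `STOPWORDS` is a tiny hypothetical sample rather than a real stopword list, `crude_stem` is a rough suffix-stripper standing in for a proper stemmer such as Porter, spelling correction is left out, and `collections.Counter` plays the role of the hashmap.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
             "in", "is", "it", "of", "on", "or", "that", "the", "this", "to", "with"}


def crude_stem(word: str) -> str:
    """Very rough suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Skip words ending in "ss" (e.g. "class") and keep stems at least 3 letters long.
        if word.endswith(suffix) and not word.endswith("ss") and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> list[str]:
    """Lowercase, split into alphabetic tokens, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def top_tags(chunks: list[str], n: int = 100) -> list[tuple[str, int]]:
    """Count normalized terms across all chunks and return the n most frequent."""
    counts: Counter = Counter()
    for chunk in chunks:
        counts.update(normalize(chunk))
    return counts.most_common(n)


chunks = ["Indexing documents in MongoDB", "MongoDB indexed the document"]
print(top_tags(chunks, n=3))
```

Here "Indexing" and "indexed" both normalize to "index", and "documents" and "document" both become "document", so similar keywords are merged before counting instead of fragmenting. Swapping `crude_stem` for a real stemmer and adding a spelling-correction step would tighten the merging further.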
Tags: full-text-search, text-analysis