My Blog: python - Amazon MapReduce with my own reducer for streaming -

Tuesday, 15 September 2015

python - Amazon MapReduce with my own reducer for streaming -

I wrote a simple map and reduce program in Python to count the number of words in each sentence and then group together the sentences that have the same word count. For example, suppose sentence 1 has 10 words, sentence 2 has 17 words, and sentence 3 has 10 words. The final result would be:

    10 \t 2
    17 \t 1

The mapper function is:

    import sys
    import re

    # Compiled in the original script but not used below.
    pattern = re.compile("[a-zA-Z][a-zA-Z0-9]*")

    for line in sys.stdin:
        word = str(len(line.split()))  # number of words in this line
        count = str(1)
        print("%s\t%s" % (word, count))

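The reducer function is:

    import sys

    current_word = None
    current_count = 0
    word = None

    # Input arrives sorted by key, so equal keys are adjacent.
    for line in sys.stdin:
        line = line.strip()
        try:
            word, count = line.split('\t')
            count = int(count)
            word = int(word)
        except ValueError:
            # Skip lines that are not "<int>\t<int>".
            continue
        if current_word == word:
            current_count += count
        else:
            # Compare against None so a key of 0 (an empty line) is still printed.
            if current_word is not None:
                print("%s\t%s" % (current_word, current_count))
            current_count = count
            current_word = word

    # Emit the last group, if any input was seen.
    if current_word is not None:
        print("%s\t%s" % (current_word, current_count))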
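The "sum each run of equal keys" logic in the reducer can also be sketched with `itertools.groupby`. This is only an alternative illustration, not the code from the post: the helper names `parse` and `reduce_counts` and the sample input are made up for the example, and it assumes the input is already sorted by key, as it is after the shuffle step or a local `sort`.

```python
from itertools import groupby


def parse(lines):
    """Yield (word_count, count) integer pairs, skipping malformed lines."""
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 2:
            continue
        try:
            yield int(parts[0]), int(parts[1])
        except ValueError:
            continue


def reduce_counts(lines):
    """Sum the counts of each run of identical keys in key-sorted input."""
    for key, group in groupby(parse(lines), key=lambda pair: pair[0]):
        yield key, sum(count for _, count in group)


if __name__ == "__main__":
    # Hypothetical key-sorted mapper output; a real reducer would read sys.stdin.
    sample = ["10\t1", "10\t1", "17\t1"]
    for key, total in reduce_counts(sample):
        print("%s\t%s" % (key, total))
```

Because `groupby` only merges adjacent equal keys, it has the same requirement as the hand-written loop: unsorted input would produce several partial totals for the same key.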

I tested it on my local machine with the first 200 lines of the file:

    head -n 200 sentences.txt | python mapper.py | sort | python reducer.py

The results were correct. Then I ran it on the Amazon MapReduce streaming service, and it failed at the reducer step. So I changed the print in the mapper function to:

    print("LongValueSum:" + word + "\t" + "1")

This fits the default aggregate reducer in the MapReduce streaming service. In this case I don't need the reducer.py function, and I get the final results for the big file sentences.txt. But I still don't know why my reducer.py function failed. Thank you!
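To make the `LongValueSum:` convention concrete, here is a rough local imitation of what the aggregate reducer does with such lines: strip the prefix and add up the values per key. It is a sketch for checking mapper output on a laptop, not Hadoop's implementation, and the function name `aggregate_long_value_sum` is invented for the example.

```python
from collections import defaultdict

PREFIX = "LongValueSum:"


def aggregate_long_value_sum(lines):
    """Sum integer values per key for lines shaped like 'LongValueSum:<key>\\t<value>'."""
    totals = defaultdict(int)
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if not key.startswith(PREFIX):
            continue  # ignore lines that do not use this aggregator prefix
        totals[key[len(PREFIX):]] += int(value)
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical mapper output for three sentences of 10, 17 and 10 words.
    mapper_output = ["LongValueSum:10\t1", "LongValueSum:17\t1", "LongValueSum:10\t1"]
    for key, total in sorted(aggregate_long_value_sum(mapper_output).items()):
        print("%s\t%s" % (key, total))
```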

Got it! It was a "stupid" mistake. When I tested locally, I ran the scripts with python mapper.py. For MapReduce, the scripts need to be executable on their own, so just add

    #!/usr/bin/env python

at the beginning of each script.
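A quick pre-flight check along these lines can catch the problem before a job is submitted. This is a minimal sketch assuming the streaming job invokes the script file directly, so that it needs both a `#!` first line and the execute bit; whether the execute bit matters in a particular setup depends on how the job is configured. The helper names `is_streaming_ready` and `make_executable` are hypothetical.

```python
import os
import stat


def is_streaming_ready(path):
    """Return True if the file starts with '#!' and is executable by its owner."""
    with open(path, "rb") as handle:
        has_shebang = handle.read(2) == b"#!"
    is_executable = bool(os.stat(path).st_mode & stat.S_IXUSR)
    return has_shebang and is_executable


def make_executable(path):
    """Add execute permission for user, group and others, like `chmod +x`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

For example, calling `make_executable("reducer.py")` and then `is_streaming_ready("reducer.py")` should return `True` once the shebang line is in place. Note that the two characters `#!` must be the very first bytes of the file, with no space between them.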

python amazon-web-services mapreduce
