hadoop - Autoscaling EMR: is it required? Should I just use EC2? Should I just use Qubole?
In order to cut down on provisioning time, we've decided to maintain a dedicated EMR cluster of 5 instances (we expect to need 5). In case we need more, I think we'll need to implement some sort of autoscaling.
I'm not familiar with EMR at all. Does it support autoscaling? I found this in the docs: http://docs.aws.amazon.com/elasticmapreduce/latest/developerguide/emr-manage-resize.html
Is this the right place to look for autoscaling, or am I misunderstanding what they mean by "resize"? I've read that one benefit of EMR is "on demand processing", which I think means it splits the load between EC2 instances without my specifying how many instances. That gives me the impression that it scales the EC2 instances on its own, meaning we don't need to autoscale ourselves. Am I misunderstanding what "on demand processing" means?
If the resizing link I provided is appropriate for what I'm trying to do, does anyone have experience determining when to resize? The doc describes how to resize but not, for example, how to set up an alarm for when to resize. I've used the regular Auto Scaling service, and it allows you to resize based on conditions, which I'm not seeing here.
I'm still unsure whether autoscaling EMR is a bad idea: is it too involved (since there are entire companies like Qubole that provide this), or maybe not useful since EMR uses whatever computing power it needs? I don't know much about what EMR provides, so maybe that's why I'm confused.
The page you linked shows ways of either manually or programmatically increasing the number of nodes in a cluster. I couldn't find anything else on autoscaling for EMR.
Unless we're missing some facts, you'd still have to come up with your own scaling algorithm and process. If you're taking into account factors such as job backlog, the units of time you're paying for, the use of less-expensive "spot" instances, multiple clusters, etc., it's not a trivial exercise.
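To make "your own scaling algorithm" a bit more concrete, here is a minimal sketch of a backlog-based policy. Everything in it is an assumption for illustration: the function name, the `tasks_per_node` capacity figure, and the node limits are hypothetical, and it deliberately ignores the harder factors mentioned above (billing units, spot pricing, multiple clusters).

```python
import math


def desired_task_nodes(pending_tasks, idle_nodes, current_nodes,
                       tasks_per_node=4, min_nodes=0, max_nodes=10):
    """Return the task-node count a simple backlog-based policy would request.

    All parameters are hypothetical inputs you would have to measure yourself
    (for example from the cluster's job or container metrics).
    """
    if pending_tasks > 0:
        # Scale up: add enough nodes to absorb the waiting work, rounded up.
        desired = current_nodes + math.ceil(pending_tasks / tasks_per_node)
    else:
        # Scale down: with no backlog, release the nodes that sit idle.
        desired = current_nodes - idle_nodes
    # Never go below the floor or above the ceiling.
    return max(min_nodes, min(max_nodes, desired))
```

For example, with 2 nodes running and 10 tasks waiting, this policy would ask for 5 nodes; with no backlog and 2 of 5 nodes idle, it would ask for 3. A real policy would probably also wait for the backlog to persist over several checks before acting, so the cluster does not flap.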
In addition to increasing the size of the cluster, there is downsizing. EMR allows this (manually or programmatically) for task nodes, but the docs say not to do it for core nodes. You'd have to terminate a core node through other AWS functionality and risk losing data. If your workloads increase and decrease over time, core node downsizing would be valuable for keeping costs lower.
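As a rough illustration of the programmatic route, the sketch below resizes only the task instance group and leaves core nodes alone. It assumes `emr_client` is a boto3 EMR client (`boto3.client("emr")`) and uses its `list_instance_groups` and `modify_instance_groups` calls; the helper name `resize_task_group` is hypothetical, and it only looks at the first page of instance groups and the first task group it finds.

```python
def resize_task_group(emr_client, cluster_id, target_count):
    """Set the cluster's TASK instance group to target_count nodes.

    Returns True if a resize was requested, False if the group was
    already at the requested size. Core and master groups are not touched.
    """
    response = emr_client.list_instance_groups(ClusterId=cluster_id)
    task_groups = [g for g in response["InstanceGroups"]
                   if g["InstanceGroupType"] == "TASK"]
    if not task_groups:
        raise ValueError("cluster has no TASK instance group to resize")

    group = task_groups[0]
    if group["RequestedInstanceCount"] == target_count:
        return False  # nothing to do

    # Ask EMR to grow or shrink the task group to the new size.
    emr_client.modify_instance_groups(
        ClusterId=cluster_id,
        InstanceGroups=[{"InstanceGroupId": group["Id"],
                         "InstanceCount": target_count}],
    )
    return True
```

You would call it as `resize_task_group(boto3.client("emr"), cluster_id, 8)` with your own cluster ID, typically from a scheduled job that decides the target size first.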
Qubole automatically takes care of all of these things out of the box. You run your jobs through the UI or API, and it starts, sizes, or resizes the cluster. When you're finished, it downsizes or terminates the cluster. It also allows you to keep a minimum number of nodes running at any given time. I've also heard that startup time for Qubole nodes is faster than for EMR.
Hope that helps.
hadoop amazon-web-services emr autoscaling qubole