Cassandra 2.0 - CQL SELECT with lower bound
Suppose I have a Cassandra database and need to process a large amount of data that I can retrieve with a SELECT query. The problem is that the processing is slow, and I'd like to use a distributed scheme to do the work. How can I reshape my CQL query so that I can get one chunk of the data at a time?
I know I can get a limited number of rows using the LIMIT clause of CQL, but I need more than that: a limit and an offset, so that each process can get an independent chunk of data. (Is OFFSET implemented in CQL? I've read that it is inefficient; is that the reason why it is not implemented?)
I would like to avoid waiting for the end of one query to start the next one, as suggested in "Cassandra pagination: How to use get_slice to query a Cassandra 1.2 database from Python using the cql library". That would keep my processes idle while waiting for the previous queries to complete.
As an example, suppose I'd like to process weather data and, for the moment, my table looks like this (I could use other data types for storage, such as timeuuid for time, but this is a dummy problem):
CREATE TABLE weather_data (
    station varchar,
    date varchar,
    time varchar,
    value double,
    PRIMARY KEY ( (station, date), time )
);
For a given station and date, I'd like to create chunks of data (based on time). We can suppose that I know how many measures I have for each station and date.
If the right answer is "change the structure of your table", I would be glad to see how you would modify it.
I changed my answer since I misunderstood the original problem. You can break the data for a station and date into further sub-chunks, for instance by hour of the day or whatever is a reasonable partition for you:
CREATE TABLE weather_data (
    station varchar,
    date varchar,
    dayhour int,
    time varchar,
    value double,
    PRIMARY KEY ( (station, date), dayhour, time )
);
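In this way you can split the data into 24 chunks, allowing parallel execution as I said before. You can also split it differently, getting the first 2 hours for instance. The downside is that all the queries for one station and date will hit the same nodes, because those rows still live in a single partition. An alternative is to create a primary key like this: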
PRIMARY KEY ( (station, date, dayhour), time )
This one will partition the data based on dayhour as well. The side effect is that if you need every measurement for a given station on a specific date, you have to perform 24 queries. Last but not least, another solution is denormalization (organize the data sorted by hour in a new table and leave the original as is).
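To make the chunking idea concrete, here is a minimal Python sketch for the first layout, where dayhour is a clustering column. It only builds one range query per chunk of hours and hands each to a worker thread. The names hour_chunks, process_chunk and process_in_parallel are made up for this example, and execute is a stand-in for whatever actually runs the query; with the DataStax Python driver it would presumably be session.execute, which accepts %s placeholders, but that wiring is an assumption and is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor

# Range query on the first clustering column (dayhour) of a single partition.
RANGE_QUERY = (
    "SELECT time, value FROM weather_data "
    "WHERE station = %s AND date = %s AND dayhour >= %s AND dayhour < %s"
)


def hour_chunks(station, date, hours_per_chunk=1):
    """Yield (query, params) pairs, each covering a disjoint range of hours."""
    for start in range(0, 24, hours_per_chunk):
        end = min(start + hours_per_chunk, 24)
        yield RANGE_QUERY, (station, date, start, end)


def process_chunk(execute, query, params):
    """Fetch one chunk and process it; the 'processing' here just counts rows."""
    rows = execute(query, params)
    return len(list(rows))


def process_in_parallel(execute, station, date, hours_per_chunk=1, workers=4):
    """Run all chunk queries concurrently; results come back in chunk order."""
    chunks = list(hour_chunks(station, date, hours_per_chunk))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_chunk, execute, q, p) for q, p in chunks]
        return [f.result() for f in futures]
```

No worker waits for another one's query to finish, since every chunk has its own bounds. For the second layout, with dayhour inside the partition key, the same loop would work with "dayhour = %s" in place of the range condition.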
HTH, Carlo
cassandra-2.0 cql3