Sunday, 15 July 2012

cassandra - Delete rows vs Delete Columns performance


I am creating the data model for a time-series application on Cassandra 2.1.3. We will be preserving X amount of data for each user of the system, and I am wondering what the best way to design for this requirement is.

Option 1:

Use a 'bucket' in the partition key, so the data for X period goes into the same row:

  ((id, bucket), timestamp) -> data
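As a rough illustration of the bucket idea, the sketch below derives a bucket by truncating a timestamp to a fixed window. The one-day window, the integer bucket format and the names `bucket_for` and `partition_key` are assumptions made for this example; the question only says "X period".

```python
from datetime import datetime, timezone

# Assumed bucket width: one day. The question does not specify the period.
BUCKET_SECONDS = 24 * 60 * 60


def bucket_for(ts: datetime, bucket_seconds: int = BUCKET_SECONDS) -> int:
    """Return the number of whole windows since the Unix epoch that ts falls into."""
    return int(ts.timestamp()) // bucket_seconds


def partition_key(user_id: str, ts: datetime) -> tuple:
    """Build the (id, bucket) partition key of Option 1 for a given write."""
    return (user_id, bucket_for(ts))
```

Two writes for the same user within the same window get the same partition key, so they land in the same row; a write in the next window starts a new row.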

I can delete a single row at once, at the expense of maintaining this bucket concept. It also limits the range I can query on timestamp, probably resulting in several queries.

Option 2:

Store all the data in the same row. The N deletes are then per column.

  (id, timestamp) -> data

Range queries are easy again. But what about performance after many column deletes?

Given that we plan to use TTL to expire the data, which of the two models will give the best performance? Is the tombstone overhead of Option 1 << Option 2, or will there be a tombstone per column in both models anyway?

I am trying to avoid burying myself in a tombstone graveyard.

I think it will all depend on how much data you plan to have for the partition key you choose, what your TTL is, and what queries you are making.

I usually lean towards Option #1, especially if your TTL is the same for all writes. In addition, if you are using LeveledCompactionStrategy or DateTieredCompactionStrategy, Cassandra will do a good job of keeping data from the same partition in the same SSTable, which improves read performance.

If you use Option #2, the data for the same partition can be spread across multiple levels (if using LCS) or, more generally, across many SSTables, which may cause you to read from many SSTables depending on the nature of your queries. There is also the issue of hotspotting, where you can overload specific Cassandra nodes if you have a really wide partition.

The other advantage of #1 (which you allude to) is that you can easily delete the whole partition, which creates a single tombstone marker and is very cheap. In addition, if you are using the same TTL, the data within that partition will expire at about the same time.
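To make the difference in explicit deletes concrete, here is a minimal sketch that only builds the delete statements each model would need; nothing is sent to a cluster. The table names `events_by_bucket` and `events`, the column names `id`, `bucket` and `ts`, and the helper names are all hypothetical, since the question does not name them.

```python
from typing import List, Tuple

# Hypothetical table and column names, assumed for illustration only.
DELETE_PARTITION = "DELETE FROM events_by_bucket WHERE id = %s AND bucket = %s"
DELETE_ROW = "DELETE FROM events WHERE id = %s AND ts = %s"

Statement = Tuple[str, tuple]


def deletes_option_1(user_id: str, bucket: int) -> List[Statement]:
    """Option 1: dropping a whole bucket is one partition-level delete."""
    return [(DELETE_PARTITION, (user_id, bucket))]


def deletes_option_2(user_id: str, timestamps: List[int]) -> List[Statement]:
    """Option 2: removing the same data takes one delete per clustering key."""
    return [(DELETE_ROW, (user_id, ts)) for ts in timestamps]
```

Each statement leaves its own deletion marker, so clearing a bucket of 1000 entries costs one marker under Option 1 and 1000 under Option 2 in this model.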

I agree that it is a bit of a pain to have to make multiple queries to read across multiple partitions, as this pushes some complexity into the application.

As far as performance goes, is it likely that you will need to read across partitions when your application makes queries? For example, if you have a query for 'the most recent 1000 records' and a partition is usually wider than that, then you may only need to make one query with Option #1. However, if you want a query like 'give me all records', Option #2 might be better, as otherwise you will need to make a query for each bucket.
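The cross-partition read pattern for Option #1 can be sketched roughly as follows. A plain dictionary stands in for the table, and the function names, integer timestamps and the newest-first row ordering are assumptions for this example rather than anything stated above.

```python
from typing import Dict, List, Tuple

# In-memory stand-in for the Option 1 table: (id, bucket) -> rows, assumed to
# be sorted newest first, as a descending clustering order would return them.
Row = Tuple[int, str]
Store = Dict[Tuple[str, int], List[Row]]


def buckets_between(start_ts: int, end_ts: int, bucket_seconds: int) -> List[int]:
    """All bucket numbers touched by [start_ts, end_ts], newest first."""
    first = start_ts // bucket_seconds
    last = end_ts // bucket_seconds
    return list(range(last, first - 1, -1))


def most_recent(store: Store, user_id: str, newest_bucket: int,
                oldest_bucket: int, limit: int) -> Tuple[List[Row], int]:
    """Collect up to `limit` rows, walking buckets from newest to oldest.

    Returns the rows and the number of per-bucket queries that were needed.
    """
    rows: List[Row] = []
    queries = 0
    for bucket in range(newest_bucket, oldest_bucket - 1, -1):
        queries += 1  # one query per partition
        rows.extend(store.get((user_id, bucket), [])[: limit - len(rows)])
        if len(rows) >= limit:
            break
    return rows, queries
```

When the newest bucket already holds at least `limit` rows, the loop stops after a single query; a "give me all records" request has to visit every bucket in the range.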

