You’ve set up your tables, inserted some data, hand-crafted a query, executed and…
InvalidRequest: code=2200 [Invalid query] message="Only EQ and IN relation are supported on the partition key (unless you use the token() function)”
Those coming from a relational background are likely to have come across this error message.
In my case I had defined the nodeId and date as the primary key fields (PRIMARY KEY ((nodeid, date), timestamp)) and tried to run a simple query:
select fields from data_current where nodeId = 3 and date > ‘2016-10-10’ and date < ‘2016-10-11’
Well, it turns out that Cassandra’s clustering model uses the Primary Key to distribute data evenly across partitions and that kind of query is expensive and not possible with the default murmur3 partitioner. In relational terms this query would trigger a full table scan— except in this instance potentially across multiple nodes— despite the use of an index.
The solution requires shifting your mental model of how data should be stored. In my instance the solution I landed on was to implement another table that stored data with the year as part of the primary key rather than the full date as the application would not require data searches across years, and the data_current table data has a TTL of 1 day.
Requests for the current day’s data hits the small table whilst larger ranges hit the data_historic table. This is in line with the ‘storage is cheap’ mantra that, unlike most relational, promotes duplicating data to better fit the model.
Another solution, keeping the original table only, would be to use Spark’s Cassandra connector which allows powerful querying using Scala.