https://academy.datastax.com/courses/installing-and-configuring-cassandra
https://academy.datastax.com/courses/installing-configuring-and-manipulating-keyspace/selecting-and-installing-cassandra
http://www.datastax.com/dev/blog/ccm-a-development-tool-for-creating-local-cassandra-clusters
Requirements: Java 1.7+ (64-bit), Python 2.5+
Download: http://planetcassandra.org/cassandra
tar.gz: everything is located in one place
.deb package: files are spread across the usual system locations
Files
conf
cassandra.yaml
cassandra-env.sh: HEAP ...
cassandra-rackdc.properties: rack, datacenter
cassandra-topology.properties: data distribution and replication
bin
cassandra cqlsh nodetool
Configuring
https://academy.datastax.com/courses/installing-configuring-and-manipulating-keyspace/conifguring-cassandra-node
Cassandra Architecture
https://academy.datastax.com/courses/understanding-cassandra-architecture/introducing-cassandras-node-based-architecture
cluster_name
listen_address (localhost)
commitlog_directory: best practice is to put it on a separate disk in production (unless SSD)
data_file_directories
saved_caches_directory: index and caches
rpc_address / rpc_port: listen address for Thrift client connections (default localhost / 9160)
native_transport_port (9042): CQL driver protocol
cassandra-env.sh
MAX_HEAP_SIZE: at most 8G in production (Java GC limitations)
Set of nodes
no SPOF (single point of failure)
node: an instance of Cassandra
partition: one ordered and replicable unit of data on a node
rack: logical set of nodes
datacenter: logical set of racks
cluster: full set of nodes / racks / datacenters
node -> cluster
seed available at a known IP address
conf/cassandra.yaml
cluster_name: shared name (logical)
seeds: IP address of the seed
listen_address: IP address of this node
Start / stop
https://academy.datastax.com/courses/installing-configuring-and-manipulating-keyspace/manually-starting-and-stopping-cassandra
cassandra -f (foreground)
cassandra -p [pidfile] (writes the process ID to a file)
or
service cassandra start / stop
CCM
https://academy.datastax.com/courses/installing-configuring-and-manipulating-keyspace/using-cassandra-cluster-manager-ccm
not part of standard distribution, separate tool
ccm create cluster1 --cassandra-version 1.2.15 # downloads and compiles
ccm list
ccm populate --nodes 3
ccm start
ccm node2 stop
ccm status
Request coordination
https://academy.datastax.com/courses/understanding-cassandra-architecture/understanding-request-coordination
coordinator: the node that receives a request. It is in charge of propagating the information through the system and centralizing the response. No node is privileged.
round-robin pattern by default
drivers: Node.js, PHP, Perl, Go, Clojure, Haskell, R, Ruby, Scala
replication factor: on how many nodes is the data duplicated?
replication strategy: which nodes can receive the replicas, and how? (row, rack, cluster, datacenter)
consistency level: number of acknowledgements required for a read or write (ANY, ONE, QUORUM = (RF / 2) + 1 rounded down, ALL)
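The QUORUM formula above can be checked with a minimal sketch. The function names are made up for these notes and are not part of any driver. ANY is left out on purpose: it applies to writes only and can be satisfied by a hint stored on the coordinator, so it does not map cleanly to a replica count.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a QUORUM request: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def required_acks(level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed for ONE, QUORUM or ALL (hypothetical helper)."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: " + level)


if __name__ == "__main__":
    for rf in (1, 3, 5):
        print(rf, [required_acks(lvl, rf) for lvl in ("ONE", "QUORUM", "ALL")])
```

With RF = 3, QUORUM needs 2 replicas, so one replica can be down without failing the request.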
Partitioning
https://academy.datastax.com/courses/understanding-cassandra-architecture/introducing-partitioning-process
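token: ID computed by hashing (noted as 128 bits in the course; with the default Murmur3Partitioner the token is a 64-bit signed integer)
partition: unit of data storage on a node
tokens are distributed across the nodes; the primary key is hashed to compute the token, and the row is assigned to the first node responsible for that token.
Murmur3Partitioner (default, set in cassandra.yaml): the partitioner must be the same for all nodes of the cluster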
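A rough sketch of the lookup described above: hash the key to a token, then walk the ring to the first node whose token is greater than or equal to it, wrapping around at the end. Assumptions for illustration only: MD5 truncated to 64 bits stands in for Murmur3 (which is not in the Python standard library), tokens are unsigned, and the `Ring` class and node names are invented.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Hash a partition key to an unsigned 64-bit token (MD5 stand-in, NOT Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Token ring with one token per node (hypothetical helper)."""

    def __init__(self, node_tokens: dict):
        # Sort by token so the ring can be searched with bisect.
        self._entries = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._entries]

    def owner(self, token: int) -> str:
        # First node whose token is >= the key token; past the end, wrap to the start.
        index = bisect.bisect_left(self._tokens, token)
        if index == len(self._tokens):
            index = 0
        return self._entries[index][1]


if __name__ == "__main__":
    ring = Ring({"node1": 2**62, "node2": 2**63, "node3": 2**64 - 1})
    for user_id in ("1744", "1745", "1746"):
        print(user_id, "->", ring.owner(token_for(user_id)))
```

Because every node must map a key to the same token, all nodes need the same hash function, which is why the partitioner has to be identical across the cluster.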
Virtual nodes
https://academy.datastax.com/courses/understanding-cassandra-architecture/introducing-virtual-nodes
bootstrap: a node joins the cluster and is filled with the partitions of its primary range
decommission: a node leaves the cluster
num_tokens: number of token ranges per node (default 256)
differences for search or analytics
TODO: review
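To get a feel for what num_tokens does, here is a small simulation under loud assumptions: tokens are random unsigned 64-bit integers, the random generator is seeded for repeatability, and both function names are invented. It only illustrates that many small ranges per node tend to even out ownership; it is not how Cassandra actually allocates tokens.

```python
import random

TOKEN_SPACE = 2**64  # assumption: unsigned 64-bit token space


def assign_vnode_tokens(nodes, num_tokens=256, seed=0):
    """Give each node num_tokens random tokens; return the sorted ring."""
    rng = random.Random(seed)
    ring = [(rng.getrandbits(64), node) for node in nodes for _ in range(num_tokens)]
    ring.sort()
    return ring


def ownership(ring):
    """Fraction of the token space owned by each node."""
    owned = {}
    for index, (token, node) in enumerate(ring):
        previous = ring[index - 1][0]  # index 0 wraps to the last token
        width = (token - previous) % TOKEN_SPACE
        owned[node] = owned.get(node, 0) + width
    return {node: width / TOKEN_SPACE for node, width in owned.items()}


if __name__ == "__main__":
    ring = assign_vnode_tokens(["node1", "node2", "node3"])
    for node, share in sorted(ownership(ring).items()):
        print(node, round(share, 3))
```

Rerunning with num_tokens=1 usually gives a much more uneven split, which is the motivation for virtual nodes.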
Replication
https://academy.datastax.com/courses/understanding-cassandra-architecture/understanding-replication
https://academy.datastax.com/courses/understanding-cassandra-architecture/understanding-hinted-handoff
hints: when a replica is down, the write is kept as a hint and written to that node later, once it comes back
https://academy.datastax.com/courses/understanding-cassandra-architecture/introducing-consistency-levels
consistency level is set for every request (default: ONE)
https://academy.datastax.com/courses/understanding-cassandra-architecture/understanding-tunable-consistency
immediate consistency: highest latency (all replicas checked)
eventual consistency: lowest latency
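A common rule of thumb behind this trade-off, not spelled out in the course notes above: a read is guaranteed to see the latest write when the replicas read plus the replicas written exceed the replication factor. The function name below is made up for this sketch.

```python
def replicas_overlap(read_acks: int, write_acks: int, replication_factor: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_acks + write_acks > replication_factor


if __name__ == "__main__":
    rf = 3
    print("QUORUM / QUORUM:", replicas_overlap(2, 2, rf))   # 2 + 2 > 3
    print("ONE / ONE:", replicas_overlap(1, 1, rf))         # 1 + 1 <= 3
    print("read ONE / write ALL:", replicas_overlap(1, 3, rf))
```

Lower acknowledgement counts reduce latency but leave a window where a read can hit a replica that has not yet received the write.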
https://academy.datastax.com/courses/understanding-cassandra-architecture/introducing-repair-operations
read repair: stale replicas are updated when a read finds a more recent version of the data. Occurs on a configurable percentage of reads (default 10%): read_repair_chance = 0.1
ALTER TABLE [table] WITH [property] = [value]
nodetool repairs
bin/nodetool -h [host] -p [JMX port] repair [options]
when to run a repair: after recovering a failed node, when bringing a node back online, periodically on nodes with infrequent reads, periodically on nodes with write or delete activity, and at least once every gc_grace_seconds (10 days)
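The read repair idea can be sketched as follows, assuming each replica returns a (write timestamp, value) pair and the most recent timestamp wins. The function and the in-memory dictionary are invented for illustration; a real coordinator sends the newer data to the stale replicas over the network.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, str]  # (write timestamp, value)


def read_with_repair(replica_versions: Dict[str, Version]) -> Tuple[str, List[str]]:
    """Return the newest value and the stale replicas, which are updated in place."""
    newest = max(replica_versions.values(), key=lambda version: version[0])
    stale = sorted(
        name for name, version in replica_versions.items() if version[0] < newest[0]
    )
    for name in stale:
        replica_versions[name] = newest  # repair: push the newest version
    return newest[1], stale


if __name__ == "__main__":
    replicas = {"node1": (10, "smith"), "node2": (12, "doe"), "node3": (10, "smith")}
    value, repaired = read_with_repair(replicas)
    print(value, repaired)
```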
https://academy.datastax.com/courses/understanding-cassandra-architecture/introducing-internode-communications
gossip protocol: each node regularly contacts 1 to 3 other nodes to exchange state (available? restarted? where? load? bootstrapping?); it also contacts the seed nodes
snitch : subsystem to track and report cluster topology
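A very simplified sketch of what one gossip exchange does: two nodes compare their views of the cluster and keep, for each node, the entry with the higher version. The {node: (version, status)} layout and the function name are assumptions for these notes; the real protocol carries more state than this.

```python
def merge_gossip(local: dict, remote: dict) -> dict:
    """Merge two views {node: (version, status)}, keeping the higher version per node."""
    merged = dict(local)
    for node, (version, status) in remote.items():
        if node not in merged or version > merged[node][0]:
            merged[node] = (version, status)
    return merged


if __name__ == "__main__":
    view_a = {"node1": (5, "UP"), "node2": (3, "UP")}
    view_b = {"node2": (4, "DOWN"), "node3": (1, "BOOTSTRAPPING")}
    print(merge_gossip(view_a, view_b))
```

Repeating this exchange with a few random peers each round is what lets every node learn the whole cluster state without a central coordinator.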
-- list the existing keyspaces
DESCRIBE KEYSPACES;
-- one copy of each row: suitable for a single-node test setup
CREATE KEYSPACE DCO WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE DCO;
CREATE TABLE users (
    user_id int PRIMARY KEY,  -- partition key, hashed to compute the token
    fname text,
    lname text
);
INSERT INTO users (user_id, fname, lname)
VALUES (1745, 'john', 'smith');
INSERT INTO users (user_id, fname, lname)
VALUES (1744, 'john', 'doe');
INSERT INTO users (user_id, fname, lname)
VALUES (1746, 'john', 'smith');
-- remove all rows from the tweets table
TRUNCATE tweets;