https://academy.datastax.com/courses/installing-and-configuring-cassandra

https://academy.datastax.com/courses/installing-configuring-and-manipulating-keyspace/selecting-and-installing-cassandra

http://www.datastax.com/dev/blog/ccm-a-development-tool-for-creating-local-cassandra-clusters

Java 1.7+ (64-bit)

Python 2.5+

http://planetcassandra.org/cassandra

tar.gz: everything is located in one place

.deb package: files are spread across the usual system locations

# files

conf

cassandra.yaml

cassandra-env.sh : HEAP ...

cassandra-rackdc.properties: rack, datacenter

cassandra-topology.properties: data distribution and replication

bin

cassandra cqlsh nodetool

Configuring

https://academy.datastax.com/courses/installing-configuring-and-manipulating-keyspace/conifguring-cassandra-node

Cassandra Architecture

https://academy.datastax.com/courses/understanding-cassandra-architecture/introducing-cassandras-node-based-architecture

cluster_name

listen_address (localhost)

commitlog_directory: best practice is to put it on a separate disk in production (unless SSD)

data_file_directories

saved_caches_directory: index and caches

rpc_address / rpc_port: listen address for Thrift client connections (default localhost / 9160)

native_transport_port (9042): CQL driver protocol

cassandra-env.sh

MAX_HEAP_SIZE: at most 8 GB in production (Java GC limitations)

Set of nodes

no SPOF (single point of failure)

node: a Cassandra instance

partition: one ordered and replicable unit of data on a node

rack: logical set of nodes

datacenter: logical set of racks

cluster: full set of nodes / racks / datacenters

node -> cluster

A seed node must be reachable at a known IP address. In conf/cassandra.yaml:

cluster_name: shared (logical) name

seeds: IP address of the seed node(s)

listen_address: IP address of this node

Start / stop

https://academy.datastax.com/courses/installing-configuring-and-manipulating-keyspace/manually-starting-and-stopping-cassandra

cassandra -f (foreground)
cassandra -p PIDFILE (writes the process ID to PIDFILE)

or

service cassandra start / stop

CCM

https://academy.datastax.com/courses/installing-configuring-and-manipulating-keyspace/using-cassandra-cluster-manager-ccm

not part of standard distribution, separate tool

ccm create cluster1 --cassandra-version 1.2.15  # downloads and compiles
ccm list
ccm populate --nodes 3
ccm start
ccm node2 stop
ccm status

Request coordination

https://academy.datastax.com/courses/understanding-cassandra-architecture/understanding-request-coordination

coordinator: the node that receives a request. It is in charge of propagating the request through the system and gathering the response. No node is privileged.

round-robin pattern by default

drivers: Node.js, PHP, Perl, Go, Clojure, Haskell, R, Ruby, Scala
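To make the round-robin idea concrete, here is a minimal Python sketch of how a client could rotate the coordinator role across nodes. It is not the code of any actual driver; the node addresses and the `round_robin` helper are hypothetical.

```python
from itertools import cycle

# Hypothetical node addresses, for illustration only.
nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


def round_robin(nodes):
    """Return an iterator that yields the next coordinator, cycling forever."""
    return cycle(nodes)


picker = round_robin(nodes)
# Each request is sent to the next node in turn, so no node is privileged.
coordinators = [next(picker) for _ in range(5)]
print(coordinators)
# ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1', '10.0.0.2']
```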

replication factor: on how many nodes is the data duplicated?

replication strategy: which nodes can receive the replicas, and how? (row, rack, cluster, datacenter)

consistency level: number of replica confirmations required for a read/write (ANY, ONE, QUORUM = (RF/2) + 1 with integer division, ALL)
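The QUORUM formula is easy to get wrong for even replication factors, so here is a small Python sketch of it. The function names are my own; ANY is left out because it is a write-only level that can be satisfied by a stored hint rather than by a replica.

```python
def quorum(replication_factor):
    """QUORUM = (RF / 2) + 1, using integer division."""
    return replication_factor // 2 + 1


def required_acks(level, replication_factor):
    """Number of replica confirmations needed for a consistency level."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


for rf in (1, 3, 4, 5):
    print(rf, quorum(rf))
```

With RF = 3 a quorum is 2 replicas, so one replica can be down without blocking QUORUM requests.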

Partitioning

https://academy.datastax.com/courses/understanding-cassandra-architecture/introducing-partitioning-process

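token: ID computed by hashing the partition key (128-bit with RandomPartitioner; 64-bit, from -2^63 to 2^63-1, with the default Murmur3Partitioner)

partition: unit of data storage on a node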

Tokens are distributed across the nodes. The primary key (its partition key part) is used to compute the token, and the row is assigned to the first node responsible for that token.

Murmur3Partitioner (default, set in cassandra.yaml): the partitioner must be the same for all nodes of the cluster.
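The lookup described above can be sketched in a few lines of Python. This is only an illustration: the hash is MD5 truncated to 64 bits as a stand-in (it is NOT Murmur3), and the `Ring` class, node names, and token values are hypothetical.

```python
import bisect
import hashlib


def token_for(key):
    """Stand-in hash (MD5 truncated to a signed 64-bit int), not Murmur3."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class Ring:
    def __init__(self, node_tokens):
        # node_tokens maps a node name to the token it owns.
        self.entries = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.entries]

    def owner_of_token(self, token):
        # First node whose token is >= the given token, wrapping around the ring.
        idx = bisect.bisect_left(self.tokens, token)
        return self.entries[idx % len(self.entries)][1]

    def owner(self, key):
        return self.owner_of_token(token_for(key))


# Hypothetical three-node ring with evenly spaced tokens.
ring = Ring({"node1": -2**62, "node2": 0, "node3": 2**62})
for user_id in (1744, 1745, 1746):
    print(user_id, ring.owner(user_id))
```

Tokens above the highest node token wrap around to the node with the lowest token, which is what makes the token space a ring.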

Virtual nodes

https://academy.datastax.com/courses/understanding-cassandra-architecture/introducing-virtual-nodes

bootstrap: a node joins the cluster and fills with the partitions for its primary range

decommission: a node leaves the cluster

num_tokens: number of token ranges per node (default 256)
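To see what num_tokens changes, the following Python sketch gives each node many random tokens instead of one and measures how much of the token space each node ends up owning. It is a simplified model, assuming a 64-bit token space and random token assignment; the function names and the fixed random seed are my own.

```python
import random

TOKEN_SPACE = 2**64  # assumed 64-bit token space, from -2**63 to 2**63 - 1


def build_vnode_ring(nodes, num_tokens, seed=0):
    """Assign num_tokens random tokens to every node; return sorted (token, node)."""
    rng = random.Random(seed)
    ring = []
    for node in nodes:
        for _ in range(num_tokens):
            ring.append((rng.randrange(-2**63, 2**63), node))
    ring.sort()
    return ring


def ownership(ring):
    """Fraction of the token space owned by each node (needs >= 2 tokens)."""
    share = {}
    for i, (token, node) in enumerate(ring):
        previous = ring[i - 1][0]  # for i == 0 this wraps to the last token
        width = (token - previous) % TOKEN_SPACE
        share[node] = share.get(node, 0.0) + width / TOKEN_SPACE
    return share


ring = build_vnode_ring(["node1", "node2", "node3"], num_tokens=256)
shares = ownership(ring)
for node, fraction in sorted(shares.items()):
    print(node, round(fraction, 3))
```

Running it with num_tokens=1 and then with num_tokens=256 shows why many small ranges per node tend to even out the load.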

differences for search or analytics

TODO: review

Replication

https://academy.datastax.com/courses/understanding-cassandra-architecture/understanding-replication

https://academy.datastax.com/courses/understanding-cassandra-architecture/understanding-hinted-handoff

hints: if a replica is down, the coordinator keeps the write as a hint and writes the data later, when the node comes back
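A toy Python model of hinted handoff, to fix the idea. It is a sketch under simplifying assumptions (in-memory dictionaries, no timestamps, no limit on how long hints are kept); the `Replica` and `Coordinator` classes are hypothetical and do not mirror Cassandra's internals.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (replica, key, value) for writes that could not be delivered

    def write(self, key, value):
        """Write to every live replica; store a hint for each replica that is down."""
        acks = 0
        for replica in self.replicas:
            if replica.up:
                replica.data[key] = value
                acks += 1
            else:
                self.hints.append((replica, key, value))
        return acks

    def replay_hints(self):
        """Deliver stored hints to replicas that are back up; keep the others."""
        remaining = []
        for replica, key, value in self.hints:
            if replica.up:
                replica.data[key] = value
            else:
                remaining.append((replica, key, value))
        self.hints = remaining


replicas = [Replica("node1"), Replica("node2"), Replica("node3")]
replicas[1].up = False                      # node2 is down during the write
coordinator = Coordinator(replicas)
acks = coordinator.write("user:1745", "john smith")
missed = "user:1745" not in replicas[1].data
replicas[1].up = True                       # node2 comes back
coordinator.replay_hints()
print(acks, missed, replicas[1].data)
```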

https://academy.datastax.com/courses/understanding-cassandra-architecture/introducing-consistency-levels

The consistency level is set per request: it is passed with each request (default: ONE).

https://academy.datastax.com/courses/understanding-cassandra-architecture/understanding-tunable-consistency

immediate consistency: highest latency (all replicas checked)

eventual consistency: lowest latency
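The trade-off can be checked with the usual rule of thumb: a read is guaranteed to see the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor. The Python sketch below encodes that rule; the function name is my own.

```python
def is_immediately_consistent(read_acks, write_acks, replication_factor):
    """True when read and write replica sets must overlap (R + W > RF)."""
    return read_acks + write_acks > replication_factor


rf = 3
print(is_immediately_consistent(2, 2, rf))  # QUORUM reads + QUORUM writes
print(is_immediately_consistent(1, 1, rf))  # ONE reads + ONE writes
print(is_immediately_consistent(1, 3, rf))  # ONE reads + ALL writes
```

With RF = 3, QUORUM/QUORUM and ONE/ALL overlap on at least one replica, while ONE/ONE only gives eventual consistency.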

https://academy.datastax.com/courses/understanding-cassandra-architecture/introducing-repair-operations

read repair: stale replicas are updated when a read finds a more recent version of the data. It occurs on a configurable percentage of reads (default 10%): read_repair_chance = 0.1

ALTER TABLE [table] WITH [property] = [value]

nodetool repair: bin/nodetool -h [host] -p [JMX port] repair [options]

When to run it: after recovering a failed node, when bringing a node back online, periodically on nodes that are read infrequently, periodically on nodes with write or delete activity, and at least once every gc_grace_seconds (10 days).
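Read repair boils down to "newest timestamp wins, then push it to the stale replicas". Here is a minimal Python sketch of that step, assuming each replica is a plain dictionary mapping a key to a (timestamp, value) pair; the function name and data layout are hypothetical, and the sketch repairs on every read rather than on a configurable fraction of them.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and update replicas holding older data."""
    responses = [replica[key] for replica in replicas if key in replica]
    if not responses:
        return None
    newest = max(responses, key=lambda pair: pair[0])  # highest timestamp wins
    for replica in replicas:
        current = replica.get(key)
        if current is None or current[0] < newest[0]:
            replica[key] = newest  # repair the stale or missing copy
    return newest[1]


replicas = [
    {"user:1745": (2, "john smith")},   # up to date
    {"user:1745": (1, "john smyth")},   # stale
    {},                                 # missed the write entirely
]
value = read_with_repair(replicas, "user:1745")
print(value, replicas)
```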

https://academy.datastax.com/courses/understanding-cassandra-architecture/introducing-internode-communications

gossip protocol: each node contacts 1 to 3 other nodes and exchanges state: available? restarted? where? load? bootstrapping?

contacts seed nodes

snitch : subsystem to track and report cluster topology

DESCRIBE keyspaces;

CREATE KEYSPACE DCO WITH REPLICATION = {'class': 'SimpleStrategy' , 'replication_factor': 1};

use DCO;

CREATE TABLE users (
user_id int PRIMARY KEY, 
fname text, 
lname text
);

INSERT INTO users (user_id,fname, lname) 
VALUES (1745, 'john', 'smith');
INSERT INTO users (user_id,fname, lname) 
VALUES (1744, 'john', 'doe');
INSERT INTO users (user_id,fname, lname) 
VALUES (1746, 'john', 'smith');


truncate tweets;

- Tintouli