Unconventional Programming with Chemical Computing
Carin Meier
Living Clojure
@Cognitect
Inspired by the book "Unconventional Programming Paradigms"
"the grass is computing"
all living things process information via chemical reactions at the molecular level
hormones
immune system
bacteria signal processing
will NOT be programming with actual chemicals; using the metaphor of molecules and reactions to do computing
nothing currently in the wild using chemical computing
at the heart of chemical programming: the reaction
will calculate primes two ways:
traditional
with prime reaction
uses clojure for the examples
prime reaction
think of the integers as molecules
simple rule: take a vector of 2 integers; divide them; if the mod is zero, return the result of the division, otherwise return the vector unchanged (sketched in code below)
name of this approach: Gamma chemical programming
reaction is a condition + action
execute: replacement of original elements by resulting element
solution is known when it results in a steady state (hence, for prime reaction, have to churn over lists of integers multiple times to filter out all the non-primes)
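a minimal Clojure sketch of the reaction plus a churn loop (the fn names, the random pairing, and the pass count are my assumptions, not the talk's exact code):

```clojure
;; condition + action: if a is evenly divisible by b, replace the pair
;; with [quotient b]; otherwise the pair is unchanged
(defn prime-reaction [[a b :as pair]]
  (if (and (> a b) (zero? (mod a b)))
    [(/ a b) b]
    pair))

;; one churn pass: shuffle the soup, pair the molecules up, let each pair react
(defn mix-and-react [molecules]
  (->> (shuffle molecules)
       (partition-all 2)
       (mapcat (fn [p] (if (= 2 (count p)) (prime-reaction (vec p)) p)))))

;; churn many passes until the soup settles; the distinct survivors are the primes
(->> (iterate mix-and-react (range 2 20))
     (drop 1000)
     first
     distinct
     sort)
;; => (2 3 5 7 11 13 17 19) once steady state is reached
```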
possible advantages:
modeling probabilistic systems
drive a computation towards a global max or min
higher order
make the functions molecules as well
fn could "capture" integer molecules to use as args
what does it do?
it "hatches" => yields original fn and result of applying fn to the captured arguments
reducing reaction fn: return fewer arguments than is taken in
two fns interacting: allow to exchange captured values (leads to more "stirring" in the chem sims)
no real need for sequential processing; can do things in any order and still get the "right" answer
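a hedged sketch of a higher-order molecule in Clojure: the fn plus its captured arguments carried around as data (the map representation and names are assumptions):

```clojure
;; a fn molecule: the fn itself, its arity, and the argument molecules captured so far
(def add-molecule {:f + :arity 2 :args []})

;; capturing: the fn molecule grabs an integer molecule to use as an argument
(defn capture [mol arg]
  (update mol :args conj arg))

;; hatching: once all arguments are captured, yield the original (emptied) fn
;; molecule plus the result of applying the fn to the captured arguments
(defn hatch [{:keys [f args arity] :as mol}]
  (if (= arity (count args))
    [(assoc mol :args []) (apply f args)]
    [mol]))

(-> add-molecule (capture 3) (capture 4) hatch)
;; => the emptied add-molecule back in the soup, plus the new molecule 7
```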
dining philosophers problem
something chemical programming handles well
two forks: eating philosopher
one fork or no forks: thinking philosopher
TP with 2 forks reacting via the EAT rule => EP (sketch below)
"self organizing": simple behaviors combine to create what look like complex behaviors
mail system: messages, servers, networks, mailboxes, membranes
membranes control reactions, keep molecules sorted
passage through membranes controlled by servers and network
"self organizing"
How Machine Learning helps Cancer Research
Evelina Gabasova
University of Cambridge
cost per human genome has gone down from $100 million (2001) to a few thousand dollars (methodology change in mid-2000s paid big dividends)
cancer is not a single disease; underlying cause is mutations in the genetic code that regulates protein formation inside the cell
BRCA1 and BRCA2 are guardians; they check the chromosomes for mistakes and kill cells that have them, so they suppress tumor growth; when they stop working correctly or get mutated, you can get tumors
clustering: finding groups in data that are more similar to each other than to other data points (toy sketch after this list)
example: clustering customers
but: clustering might vary based on the attributes chosen (or the way those attributes are lumped together)?
yes: but choose projection based on which ones give the most variance between data points
can use in cancer research by plotting genes and their expression and looking for grouping
want to be able to craft more targeted responses to the diagnosis of cancer based on the patient and how they will react
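the talk stays conceptual here; as an illustration (not from the talk), a bare-bones k-means pass in Clojure that finds groups of nearby points:

```clojure
;; assign each point to its nearest centroid, then recompute each centroid
;; as the mean of its group; repeat until the groups settle
(defn dist2 [a b] (reduce + (map #(* (- %1 %2) (- %1 %2)) a b)))
(defn nearest [centroids p] (apply min-key #(dist2 % p) centroids))

(defn k-means-step [points centroids]
  (->> points
       (group-by #(nearest centroids %))
       vals
       (mapv (fn [grp] (mapv #(/ % (count grp)) (apply mapv + grp))))))

(defn k-means [points centroids iters]
  (nth (iterate (partial k-means-step points) centroids) iters))

;; toy expression-profile points falling into two groups
(k-means [[1 1] [1 2] [9 9] [8 9]] [[0 0] [10 10]] 5)
;; => one centroid near [1 1.5], one near [8.5 9]
```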
collaborative filtering
used in netflix recommendation engine
filling in cells in a matrix
compute as the product of two smaller matrices (sketch below)
in cancer research, can help because the number of people with certain mutations is small, leading to a sparsely populated database
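a hedged sketch of the "product of two smaller matrices" idea: once the factor rows are learned (the learning step is not shown, and the numbers below are made up), a missing cell is just a dot product:

```clojure
(defn dot [u v] (reduce + (map * u v)))

;; rank-2 factorization: R (rows x items) ~= U x V-transpose
(def U {:alice [0.9 0.1] :bob [0.2 0.8]})        ;; row factors
(def V {:item-a [1.0 0.0] :item-b [0.1 0.9]})    ;; column factors

;; predict a cell that was never observed in the sparse matrix
(defn predict-cell [row col] (dot (U row) (V col)))

(predict-cell :alice :item-a) ;; => 0.9
```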
theorem proving
basically prolog-style programming: constraints plus relations leading to single (or multiple) solutions (tiny flavor sketched below)
can use to model cancer systems
was used to show that chronic myeloid leukemia is a very stable system, that just knocking out one part will not be enough to kill the bad cell and slow the disease; helps with drug and treatment design
data taken from academic papers reporting the results of different treatments on different populations
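the notes don't record the actual tooling; as a stand-in for the prolog-style flavor, a tiny relational query in Clojure's core.logic (assumes org.clojure/core.logic is on the classpath):

```clojure
(require '[clojure.core.logic :as l])

;; a relation (membership) plus a constraint (disequality); the solver
;; returns every value that satisfies both -- possibly one, possibly many
(l/run* [q]
  (l/membero q [:a :b :c])
  (l/!= q :b))
;; => (:a :c)
```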
machine learning not just for targeted ads or algorithmic trading
will become more important in the future as more and more data becomes available
Q: how long does the calculation take for stabilization sims?
A: for very simple systems, can take milliseconds
Q: how much discovery is involved, to find the data?
A: actually, whole teams developing text mining techniques for extracting data from academic papers (!)
When Worst is Best
Peter Bailis
what if we designed computer systems for the worst-case scenarios?
website that served 7.3 billion simultaneous users; would on average have lots of idle resources
hardware: what if we built this chip for the mars rover? would lead to very expensive packaging (and a lot of R&D to handle low-power low-weight environments)
security: all our devs are malicious; makes code deployment harder
designing for the worst case often penalizes the average case
could we break the curve? design for the worst case and improve the average case too
distributed systems
almost everything non-trivial is distributed these days
operate over a network
networks make designs hard
packets can be delayed
packets may be dropped
async network: can't tell if message has been delayed or dropped
handle this by adding replicas that can respond to any request at any time
network interruptions don't stop service
no coordination means even when everything is fine, we don't have to talk
possible infinite service scale-out
coordinated multi-server transactions pay large penalty as we add more servers (from locks); get more throughput if we let access be uncoordinated
don't care about latency if you don't have to send messages everywhere
but what about the CAP theorem?
Inktomi from Eric Brewer: for large scale services, have to trade off between always giving an answer and always giving the right answer
takeaway: certain properties of a system (like serializability) require unavailability
original proof paper: Seth Gilbert and Nancy Lynch
common conclusion: availability is too expensive, and we have to give up too much, and it only matters during failures, so forget about it
if you use worst case as design tool, you skew toward coordination-avoiding databases
high coordination is legacy of old db design
coordination-free designs are possible
example: read committed isolation
goal: never read uncommitted data
legacy implementation: lock records during access (coordination)
one way: copy on write (x -> x', do stuff -> write x' back as x; sketch below)
or: versioning
for more detail, see martin's talk on saturday about transactions
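a minimal Clojure sketch of the copy-on-write flavor (my illustration, not Bailis's code): the writer builds x' off to the side and publishes it in one step, so readers only ever see committed data and nobody holds a lock:

```clojure
;; the last committed version of the record
(def record (atom {:x 0}))

;; readers never observe in-progress writes and never block
(defn read-committed [] @record)

;; copy on write: derive x' from x, then publish it in a single atomic step
(defn commit! [write-fn]
  (swap! record write-fn))

(commit! #(update % :x inc))
(read-committed) ;; => {:x 1}
```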
research on coordination-free systems has the potential for huge speedups
other situations where worst-case thinking yields good results
replication for fault tolerance can also increase your request-serving capacity
fail-over can help deployments/upgrades: if it's automatic, you can shut off the primary whenever you want and know that the backups will take over, then bring the primary back up when your work is done
tail latency in services:
avg of 1.2ms (not bad) can mean 0.1% of requests take 100ms (which is terrible)
if you're one of many services being used to fulfill a front-end request, your worst case is more likely to happen, and so drags down the avg latency for the end-user (worked example below)
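quick worked example of why fan-out amplifies the tail (the 0.1%/100ms figures are from the note above; the fan-out of 100 is an assumption):

```clojure
;; probability that at least one of n fanned-out calls hits the slow tail
(defn p-any-slow [p-slow n]
  (- 1.0 (Math/pow (- 1.0 p-slow) n)))

(p-any-slow 0.001 1)   ;; => 0.001  (0.1% of single calls are slow)
(p-any-slow 0.001 100) ;; => ~0.095 (about 1 in 10 end-user requests waits on a 100ms straggler)
```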
universal design: designing well for everyone; ex: curb cuts, subtitles on netflix
sometimes best is brittle: global maximum can sit on top of a very narrow peak, where any little change in the inputs can drive it away from the optimum
defining normal defines our designs; considering a different edge case as normal can open up new design spaces
hardware: what happens if we have bit flips?
clusters: what's our scale-out strategy?
security: how do we audit data access?
examine your biases
All In with Determinism for Performance and Testing in Distributed Systems
John Hugg
VoltDB
so you need a replicated setup?
could run primary and secondary
could allow writes to 2 servers, do conflict detection, and merge all writes
NOPE
active-active: state a + deterministic op = state b
if you do the same ops across all servers, you should end up with the same state (sketch below)
have client that sends A B C to coordination system, which then sends ABC to all replicas, which do the ops in order
ABC: a logical log, the ordering is what's important
can write log to disk, for later replay
can replicate log to all servers, for constant active-active updates
can also send log across network for cluster replication
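a hedged sketch of the idea (the op format is invented; the real log is SQL-level work): the same ordered logical log applied by every replica yields the same state:

```clojure
;; the logical log ABC: the ordering is what matters
(def logical-log [[:set :a 1] [:incr :a 3] [:set :b 2]])

;; a deterministic op: state-a + op => state-b
(defn apply-op [state [op k v]]
  (case op
    :set  (assoc state k v)
    :incr (update state k (fnil + 0) v)))

;; every replica replays the same log in the same order...
(defn replay [log] (reduce apply-op {} log))

;; ...and so every replica ends up in the same state
(= (replay logical-log) (replay logical-log)) ;; => true, both are {:a 4 :b 2}
```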
look out for non-determinism
random numbers
wall-clock time
record order
external systems (ping NOAA for weather)
bad memory
libraries that use randomness for security
how to protect from non-determinism?
make sure sql is as deterministic as possible
100% of their DML is deterministic
rw transactions are hard to make deterministic, have to do a little more planning (swap row-scan for tree-index scan)
use seeded random-number generators (effectively lists of random numbers created in advance)
hash up the write ops, and require replicas to send back their computed hashes once the ops are done so the coordinator can confirm the ops were deterministic
can also hash the whole replica state when doing a transactional snapshot
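a toy version of that check (Clojure's structural hash stands in for whatever digest the real system uses):

```clojure
;; each replica hashes the state its writes produced and reports the hash back
(defn state-hash [replica-state] (hash replica-state))

(let [replica-1 {:a 4 :b 2}   ;; state after applying the ordered write ops
      replica-2 {:a 4 :b 2}]
  (= (state-hash replica-1) (state-hash replica-2)))
;; => true; a mismatch tells the coordinator that non-determinism crept in
```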
reduce latency by sending condensed representation of ops instead of all the steps (the recipe name, not the recipe)
why do it?
replicate faster, reduces concerns for latency
persist everything faster: start logging when the work is requested, not when the work is completed
bounded sizes: the work comes in as fast as the network allows, so the log gets written no faster than the network delivers (no firehose)
trade-offs?
it's more work: testing, enforcing determinism
running mixed versions is scary: if you fix a bug, and you're running different versions of the software between the replicas, you no longer have deterministic transactions
if you trip the safety checks, we shut down the cluster
simulation a la FoundationDB not as useful for them, since they have more states
message/state-machine fuzzing
unit tests
smoke tests
self-checking workload (best value)
everything written gets self-checked; so to check a read value, write it back out and see if it comes back unchanged
use "nefarious app": application that runs a lot of nasty transactions, checks for ACID failures
nasty transactions:
read values, hash them, write them back
add huge blobs to rows to slow down processing
add mayhem threads that run ad-hoc sql doing updates
multi-table joins
read and write multiple values
do it all many many times within the same transaction
mix up all different kinds of environment tweaks
different jvms
different VM hosts
different OSes
inject latency, disk faults, etc
client knows last sent and last acknowledged transaction, checker can be sure recovered data (shut down and restart) contains all the acknowledged transactions
Scaling Stateful Services
Caitie McCaffrey
been using stateless services for a long time, depending on db to store and coordinate our state
has worked for a long time, but got to a place where one db wasn't enough, so we went to NoSQL and sharded dbs
data shipping paradigm: client makes request, service fetches data, sends data to client, throws away "stale" data
will talk about stateful services, and their benefits, but WARNING: NOT A MAGIC BULLET
data locality: keep the fetched data on the service machine
lower latency
good for data intensive ops where client needs quick responses to operations on large amounts of data
sticky connections and consistency
using sticky connections and stateful services gives you more consistency models to use: pipelined random access memory (PRAM), read-your-writes, etc
blog post from Werner Vogels: "Eventually Consistent - Revisited"
building sticky connections
client connecting to a cluster always gets routed to the same server
easiest way: persistent connections
but: no stickiness once connection breaks
also: mucks with your load balancing (connections might not all last the same amount of time, can end up with one machine holding everything)
will need backpressure on the machines so they can break connections when they need to
next easiest: routing logic in cluster
but: how do you know who's in the cluster?
and: how do you ensure the work is evenly distributed?
static cluster membership: dumbest thing that might work; not very fault tolerant; painful to expand;
next better: dynamic cluster membership
gossip protocols: machines chat about who is alive and dead, each machine on its own decides who's in the cluster and who's not; works so long as system is relatively stable, but can lead to split-brain pretty quickly
consensus systems: better consistency; but if the consensus truth holder goes down, the whole cluster goes down
work distribution: random placement
write anywhere
read from everywhere
not sticky connection, but stateful service
work distribution: consistent hashing
deterministic request placement
nodes in the cluster get placed on a ring; a request gets mapped to a spot on the ring (sketch after this list)
can still have hot spots form, since different requests will have different work that needs to be done, can have a lot of heavy work requests placed on one node
work around the hot spots by having larger cluster, but that's more expensive
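a minimal consistent-hashing sketch in Clojure (ring size, node names, and the hash choice are all assumptions):

```clojure
(def ring-size 1024)
(defn ring-pos [x] (mod (hash x) ring-size))

;; nodes get placed at deterministic spots on the ring
(def nodes [:node-a :node-b :node-c])
(def ring (sort-by first (map (juxt ring-pos identity) nodes)))

;; a request maps to the first node at or after its own spot, wrapping around
(defn node-for [request-key]
  (let [p (ring-pos request-key)]
    (or (some (fn [[pos node]] (when (>= pos p) node)) ring)
        (second (first ring)))))

(node-for "user-42") ;; => the same node every time for the same key
```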
work distribution: distributed hash table
non-deterministic placement
stateful services in the real world
Scuba:
in-memory db from Facebook
believed to use static cluster membership
random fan-out on write
reads from every machine in cluster
results get composed by machine running query
results include a completeness metric
Uber Ringpop
Node.js library that does application-layer sharding for their dispatching services
SWIM gossip protocol for cluster membership
consistent hashing for work distribution
orleans
from Microsoft Research
used for Halo4
runtime and programming model for building distributed systems based on Actor Model
gossip protocol for cluster membership
consistent hashing + distributed hash table for work distribution
actors can take request and:
update their state
return their state
create a new Actor
request comes in to any machine in cluster, it applies hash to find where the DHT is for that client, then that DHT machine routes the request to the right Actor
if a machine fails, the DHT is updated to point new requests to a different Actor
can also update the DHT if it detects a hot machine
cautions
unbounded data structures (huge requests, clients asking for too much data, having to hold a lot of things in memory, etc)
memory management (get ready to make friends with the garbage collector profiler)
reloading state: recovering from crashes, deploying a new node, the very first connection of a session (no data, have to fetch it all)
sometimes can get away with lazy loading, because even if the first connection fails, you know the client's going to come back and ask for the same data anyway
fast restarts at facebook: with lots of data in memory, shutting down your process and restarting causes a long wait time for the data to come back up; had success decoupling memory lifetime from process lifetime, would write data to shared memory before shutting process down and then bring new process up and copy over the data from shared to the process' memory