Distributed Data Analysis Design Principles
Abstract function from specific products
Performance tools used for decision support
Exploit data locality: spatial and temporal
Load balancing and hot-spot removal with parallelism
Minimize the workload: which functions are actually necessary?
Minimize lock times and latency; use multithreading where it helps
Associate priority with resource usage and lock times
Isolate heavy tasks with dedicated resources where necessary
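
The load-balancing principle above can be illustrated with a minimal sketch. It assumes the data is already split into named partitions of known size and that cost is proportional to size. The function name balance and the partition names are hypothetical, and the greedy largest-first heuristic is just one common choice, not a method prescribed by these principles.

```python
import heapq


def balance(partition_sizes, n_workers):
    """Greedy largest-first assignment of partitions to workers.

    partition_sizes: dict mapping a (hypothetical) partition name to its size.
    Returns (assignment, loads), both keyed by worker id.
    """
    # Min-heap of (current_load, worker_id): the least-loaded worker is on top.
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}

    # Place the largest partitions first so potential hot spots are spread early.
    ordered = sorted(partition_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in ordered:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(name)
        heapq.heappush(heap, (load + size, worker))

    loads = {w: sum(partition_sizes[p] for p in parts)
             for w, parts in assignment.items()}
    return assignment, loads


if __name__ == "__main__":
    sizes = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
    assignment, loads = balance(sizes, n_workers=2)
    print(assignment)
    print(loads)
```

A worker whose load stays far above the others after balancing indicates a single oversized partition, which is a candidate for further splitting or for isolation on dedicated resources.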