- chicken & eggplant chinese leftovers while reviewing 0-CFA
- discussed the shift from denotational to constraint-based presentations of type systems
- looked at a PLer's take on string theory
- jointly critiqued another's critique of the metasystem of his metalanguage
- met in order to schedule a meeting around a soccer (AKA football) match
- chase citations (Kahn networks, Lustre/clock-calculus driven compilation, some other goodies..)
- ginger & garlic (pepper, really) soup, spiced with more papers
- chance encounter to briefly discuss speculative sand-boxes
- summarized ~150 bug reports from 2006
@LMeyerov: Scientist-at-large launching a big data visualization startup.
Previous life in hacking new languages: Superconductor for hardware accelerated data visualization, Ph.D. at Berkeley on multicore web browsers, Flapjax for reactive JavaScript (FRP), and ConScript+Margrave for secure scripting.
Wednesday, June 25, 2008
A Day in the Life
I'm still ruminating over the implications of my day. Presented chronologically; I claim no objectivity.
Friday, June 20, 2008
Taking Advantage of Statistics in Runtimes
Michael Bond presented for us, as part of his overall talk, a very neat paper on using basic statistics (... more like probability at this level) to improve space or time properties of runtime representations of program properties at the expense of precision.
For example, in a long-running program, we may want to know what objects have been around for awhile, and, often, it would help to know at what line of code they were allocated. This would give us an idea of where to find many dangerous memory leaks. So the goal is, given a set of objects in the runtime, if a bunch of them came from the same call-site, have a pretty good idea that this is the case. One way would be, when the object is instantiated, randomly (with some distribution) decide whether to include this information in the object's header. However, this is undesirable implementation-wise (heterogeneous object headers). What's more, as the number of objects being examined shrinks, we suffer from a high variance in information. An alternative that circumvents both of these drawbacks is to use 1 bit of every object header (... which would likely have to have been used anyways in a heterogeneous approach) to encode a bit of information about every object.
We first parameterize our labeling mechanism by, for very address, creating a mapping from call-site to booleans. This is a somewhat arbitrary mapping, but the important part is that address is (less than) psuedo-random, and we want mappings from call-sites to booleans to be randomly permuted based on this address.
f: Address -> Call-site -> Boolean
Whenever an allocation for some object occurs, we can then label the header of the object:
function label (address, call-site) = (f(address))(call-site)
...
//third allocation site for some object foo
... o = new foo()
... o.tag = label(address(o), 3)
...
Later, when we have a set of objects, we can start asking questions about most likely call-sites, under the assumption that, in a leak, there's a good chance that leaked objects came from the same call-site.
For each call-site c_i:
count_i = |{objects | label(address(o), c_i) = o.tag}|
Call-sites with high counts are likely to be the culprits. At this point, you can do fancier things to (cheaply) get back precision. For example, for every call-site, you may record how many times it was used, which has a fixed space cost and cheap time cost, and figure out some similar estimate for deallocations, which would let you deal with the alignment problem a bit better.
However, the more important lesson is that, when we don't need 100% precision in an analysis, it might make sense to look at lowering space or time costs by using probabilistic representations. Michael has another paper on 'probabilistic calling contexts' looking at another application of this idea. Most of my interests in statistical approaches, so far, have been in novel ways of exploring behavior (bottleneck analysis, likely violated implicit invariants, etc) - I had not really thought about implications on runtime representations before.
For example, in a long-running program, we may want to know what objects have been around for awhile, and, often, it would help to know at what line of code they were allocated. This would give us an idea of where to find many dangerous memory leaks. So the goal is, given a set of objects in the runtime, if a bunch of them came from the same call-site, have a pretty good idea that this is the case. One way would be, when the object is instantiated, randomly (with some distribution) decide whether to include this information in the object's header. However, this is undesirable implementation-wise (heterogeneous object headers). What's more, as the number of objects being examined shrinks, we suffer from a high variance in information. An alternative that circumvents both of these drawbacks is to use 1 bit of every object header (... which would likely have to have been used anyways in a heterogeneous approach) to encode a bit of information about every object.
We first parameterize our labeling mechanism by, for very address, creating a mapping from call-site to booleans. This is a somewhat arbitrary mapping, but the important part is that address is (less than) psuedo-random, and we want mappings from call-sites to booleans to be randomly permuted based on this address.
f: Address -> Call-site -> Boolean
Whenever an allocation for some object occurs, we can then label the header of the object:
function label (address, call-site) = (f(address))(call-site)
...
//third allocation site for some object foo
... o = new foo()
... o.tag = label(address(o), 3)
...
Later, when we have a set of objects, we can start asking questions about most likely call-sites, under the assumption that, in a leak, there's a good chance that leaked objects came from the same call-site.
For each call-site c_i:
count_i = |{objects | label(address(o), c_i) = o.tag}|
Call-sites with high counts are likely to be the culprits. At this point, you can do fancier things to (cheaply) get back precision. For example, for every call-site, you may record how many times it was used, which has a fixed space cost and cheap time cost, and figure out some similar estimate for deallocations, which would let you deal with the alignment problem a bit better.
However, the more important lesson is that, when we don't need 100% precision in an analysis, it might make sense to look at lowering space or time costs by using probabilistic representations. Michael has another paper on 'probabilistic calling contexts' looking at another application of this idea. Most of my interests in statistical approaches, so far, have been in novel ways of exploring behavior (bottleneck analysis, likely violated implicit invariants, etc) - I had not really thought about implications on runtime representations before.
Wednesday, June 18, 2008
Distractions
A very distracting day:
Hopefully I'll stop with the distractions soon... Raluca should be back and the web needs parallelizing :)
- Ras and I met with John Ousterhout (hurray Tcl!) to find we're taking almost completely opposite paths on our respective projects, yet almost completely agree on motivations and goals. Side note: I want Electric Cloud to integrate into ANT/SVN and run on EC2 :D
- Reading group day (axiomatic semantics, data flow as abstract interpretation, and Self)
- Instead of manually implementing some application consistency work, I decided to finally buckle down and understand some of the bidirectional programming work at Penn. and am finding the papers to be incredibly clearly written. However, it's a subtle topic: ... the papers understate the broad applications and motivation of the work, and it personally took hitting certain problems and trying to encode solutions to see the point of some of it.
- Foundations: Combinators for bidirectional tree transformations: A linguistic approach to the view-update problem.
- The idea is generally applicable: Relational Lenses: A Language for Updateable Views
- On deck: Exploiting Schemas in Data Synchronization
Hopefully I'll stop with the distractions soon... Raluca should be back and the web needs parallelizing :)
Subscribe to:
Posts (Atom)