Recently, we’ve been discussing how best to weight each package that shows up in a dependency tree of an Open Source project. Some discussion can be found here.
Background
Each time an open source project is published, installed, or maintained, a list of “dependencies” is installed along with it. Dependencies were revolutionary. Instead of having to bundle my code along with everyone else’s code that I reference, I can simply declare that my project “imports X package”. This “dependency” is recorded in a package manifest file. The benefits are twofold. First, my package code is smaller: all run-time dependencies are installed on the fly, and the code I wrote stays isolated. Second, I am now referencing someone else’s package, so if they improve it and bump the version number, I can accept the updated version and benefit from the improved code at no extra work to me.
Now that we’re attempting to fund Open Source, the question arises as to “what percent of a donation does the core package deserve” vs “what percent of a donation does a dependency package deserve”. We refer to this relationship as “weighting” i.e. how much “weight” does a dependency have on the entire “mass” of a project.
In other words, if I own project A, and project A depends on project B and project C, and project A receives a donation of 10 dollars, how much does project A deserve vs how much should be shared with project B and project C?
In practice
At Flossbank, we’ve settled on an initial solution: if project A receives 10 dollars and depends on projects B and C as top level dependencies, it splits the donation evenly with them, i.e. A, B, and C would each receive about 3.33 dollars.
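As a rough illustration of that even split, here is a minimal Python sketch. The function name `split_evenly` and its parameters are made up for this post and are not taken from the Flossbank codebase.

```python
def split_evenly(amount, package, top_level_deps):
    """Split `amount` equally between `package` and its top-level dependencies.

    Hypothetical helper for illustration only.
    """
    recipients = [package, *top_level_deps]
    share = amount / len(recipients)
    return {name: share for name in recipients}


# Project A receives 10 dollars and depends directly on B and C.
shares = split_evenly(10.0, "A", ["B", "C"])
for name, share in shares.items():
    print(f"{name}: {share:.2f}")
```

Each of the three projects ends up with one third of the donation.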
Now this gets slightly confusing when you introduce a real world example of nested dependencies.
Sometimes this not only means nested dependencies, but cycles as well. Package managers don’t care about cycles because they only care about “did we install every package referenced”, so they won’t get into recursive loops. In our case, however, we want to divvy up donations in the most equitable way, so even if we’ve seen a package before, we continue going down the tree until the weight being passed along falls below a set epsilon. You can see our code here.
In order to distribute donations, we first craft a “weight map”. At each level of a dependency tree, Flossbank splits a package’s weight equally between that package and its own top level dependencies.
For example, if we give package “none” a weight of 100, we would then split the 100 equally with its top level deps (see circled top level deps). This means “none” would keep 1/7 of the 100 weight (splitting the 100 equally with its 6 top level dependencies).
On the next traversal, we would see that “commonwealth” received 1/7 of the 100 weight (about 14) and would split commonwealth’s 14 weight equally with its three dependencies (in, the, and pennsylvania), meaning commonwealth would then keep 1/4 of the 14 weight it received (about 3.5).
We then continue down the tree, at each level splitting the mass of that level equally with its dependencies.
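The traversal described above can be sketched in a few lines of Python. This is a simplified illustration, not Flossbank’s actual implementation: the function name `build_weight_map`, the default `epsilon` value, and the choice to let a package keep whatever weight is left once it drops below epsilon are all assumptions. The five `dep_*` names stand in for the other top level dependencies of “none”, which are not named in this post.

```python
from collections import defaultdict


def build_weight_map(graph, root, total_weight=100.0, epsilon=0.01):
    """Return {package: accumulated weight} for the tree rooted at `root`.

    `graph` maps a package to its top-level dependencies. Cycles are fine:
    the weight passed along at least halves at every level, so it eventually
    falls below `epsilon` and the traversal stops.
    """
    weights = defaultdict(float)
    stack = [(root, total_weight)]
    while stack:
        package, weight = stack.pop()
        deps = graph.get(package, [])
        if not deps or weight < epsilon:
            # Leaf, or too little weight left to split (assumed cutoff rule):
            # the package keeps everything it was handed.
            weights[package] += weight
            continue
        # Split equally between the package itself and its direct deps.
        share = weight / (len(deps) + 1)
        weights[package] += share
        for dep in deps:
            stack.append((dep, share))
    return dict(weights)


# "none" has six top-level deps; only "commonwealth" is named in the text,
# so the dep_* entries are placeholders.
graph = {
    "none": ["commonwealth", "dep_2", "dep_3", "dep_4", "dep_5", "dep_6"],
    "commonwealth": ["in", "the", "pennsylvania"],
}
weights = build_weight_map(graph, "none")
for name, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{name}: {weight:.2f}")
```

Because every split hands out exactly the weight it received, the values in the map always add back up to the starting weight, which makes it easy to turn them into percentages of a donation.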
Why
We started with this solution for a few reasons:
- Simplicity: There is ample opportunity to make a confusing weight structure, where leaf nodes are heavier, nodes with more code are heavier, and various other heuristics. We would love to dive into those, but we wanted to start with something simple.
- Equality: Who is to say how important a dependency is to the package pulling it in? After much discussion, we’ve found that if a package is going to be pulled in, it deserves its share of the weight. After all, it must be contributing value in order to have been pulled in, otherwise the original author would have written and decided to maintain the code themselves.
- Frequency vs. depth: We tried to find a solution that accounts for both how frequently a dependency is used in a tree and how deep in the tree we found it. In our solution, a package can appear many times at many different depths, accumulating “weight” at each occurrence. These weights are summed into a total “weight” for that package.
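To make the frequency vs. depth point concrete, the sketch below lists every occurrence of a package along with the depth it was found at and the weight it keeps there, then sums the occurrences per package. The package names (`app`, `left`, `util`), the helper `occurrences`, and the epsilon cutoff behaviour are hypothetical and only meant to illustrate the idea.

```python
from collections import defaultdict


def occurrences(graph, package, weight, depth=0, epsilon=0.01):
    """Yield (package, depth, kept_weight) for every occurrence in the tree."""
    deps = graph.get(package, [])
    if not deps or weight < epsilon:
        # Leaf, or below the assumed cutoff: keep all of the incoming weight.
        yield package, depth, weight
        return
    share = weight / (len(deps) + 1)
    yield package, depth, share
    for dep in deps:
        yield from occurrences(graph, dep, share, depth + 1, epsilon)


# "util" is pulled in twice: directly by "app" and again through "left".
graph = {"app": ["left", "util"], "left": ["util"]}

totals = defaultdict(float)
for name, depth, kept in occurrences(graph, "app", 100.0):
    print(f"depth {depth}: {name} keeps {kept:.2f}")
    totals[name] += kept

print({name: round(total, 2) for name, total in totals.items()})
```

Here `util` collects a larger share at depth 1 and a smaller one at depth 2, so a package that is used often ends up heavier, while occurrences deeper in the tree contribute less.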
Discussion
We’d love to start a discussion around various other algorithms for weight distribution. This is just one example, but we’d love to explore models that amplify leaf nodes, amplify parent nodes, or any others. Please share your distribution algorithm ideas with some possible execution notes as well as reasons why you think it’d be good for the Open Source ecosystem.
Also if you have any questions or comments on our initial solution, please share!
Thanks