Relational DBs on Key-Value Stores
Ian Chiles (https://fortytw2.com), 2017-05-01
<p>Over the last several years, we’ve seen the rise of fast, ordered, transactional key-value stores such as <code class="prettyprint">leveldb</code>, <code class="prettyprint">rocksdb</code>, and <code class="prettyprint">lmdb</code>. These are all very cool projects in their own right, but perhaps even more notable are the projects built on top of them, such as MyRocks, SQLite4, CockroachDB, and Google’s F1.</p>
<p>To better understand how these work, I’ve been hacking together a basic relational database on a key-value store named <a href="https://github.com/fortytw2/voodoo" rel="nofollow">voodoo</a>. Interestingly, many of the concepts are applicable to everyday usage of a normal SQL database, especially in understanding query plans and indexes.</p>
<h1 id="starting-from-the-beginning_1">Starting from the Beginning <a class="head_anchor" href="#starting-from-the-beginning_1" rel="nofollow">#</a>
</h1>
<p>A relational database is centered around the concept of a <code class="prettyprint">table</code>, which is composed of <code class="prettyprint">rows</code> and <code class="prettyprint">columns</code>. For our purposes, every table has a column designated as its <code class="prettyprint">primary key</code>, which uniquely identifies each row. In the example <code class="prettyprint">users</code> table below, <code class="prettyprint">id</code> is the <code class="prettyprint">primary key</code>.</p>
<table>
<thead>
<tr>
<th>id</th>
<th>username</th>
<th>email</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>ian</td>
<td><a href="mailto:root@example.com" rel="nofollow">root@example.com</a></td>
</tr>
<tr>
<td>2</td>
<td>alice</td>
<td><a href="mailto:alice@security.com" rel="nofollow">alice@security.com</a></td>
</tr>
<tr>
<td>3</td>
<td>eve</td>
<td><a href="mailto:eve@pwned.com" rel="nofollow">eve@pwned.com</a></td>
</tr>
</tbody>
</table>
<p>If we had a key-value store that implemented the interface:</p>
<pre><code class="prettyprint lang-go">type KV interface {
	// Get returns the value stored under key
	Get(key []byte) []byte
	// Set stores val under key
	Set(key []byte, val []byte)
}
</code></pre>
<p>we could map each row of the above table onto it by setting keys for each primary key, making sure to include the table name:</p>
<pre><code class="prettyprint">'users/1' => 'username: ian, email: root@example.com'
'users/2' => 'username: alice, email: alice@security.com'
'users/3' => 'username: eve, email: eve@pwned.com'
</code></pre>
<p>And in doing so, could set and retrieve rows of our <code class="prettyprint">users</code> table by primary key. </p>
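<p>As a rough sketch of the mapping above, the following uses a plain Go map as a stand-in for a real store. The <code class="prettyprint">mapKV</code> type and the <code class="prettyprint">rowKey</code> and <code class="prettyprint">encodeUser</code> helpers are hypothetical names made up for illustration, and the row encoding is just the ad hoc string format shown above, not what any real database uses.</p>

```go
package main

import "fmt"

// KV is the minimal interface described in the post.
type KV interface {
	Get(key []byte) []byte
	Set(key []byte, val []byte)
}

// mapKV is a hypothetical in-memory stand-in for a real key-value store.
type mapKV struct{ data map[string][]byte }

func newMapKV() *mapKV { return &mapKV{data: map[string][]byte{}} }

// Get returns nil when the key does not exist.
func (m *mapKV) Get(key []byte) []byte { return m.data[string(key)] }

// Set stores val under key, overwriting any previous value.
func (m *mapKV) Set(key []byte, val []byte) { m.data[string(key)] = val }

// rowKey builds "<table>/<primary key>" (an illustrative encoding).
func rowKey(table string, id int) []byte {
	return []byte(fmt.Sprintf("%s/%d", table, id))
}

// encodeUser serializes the non-key columns in the post's ad hoc text format.
func encodeUser(username, email string) []byte {
	return []byte(fmt.Sprintf("username: %s, email: %s", username, email))
}

func main() {
	var kv KV = newMapKV()
	kv.Set(rowKey("users", 1), encodeUser("ian", "root@example.com"))
	kv.Set(rowKey("users", 2), encodeUser("alice", "alice@security.com"))
	kv.Set(rowKey("users", 3), encodeUser("eve", "eve@pwned.com"))

	// Look a row up by primary key.
	fmt.Printf("%s\n", kv.Get(rowKey("users", 2)))
}
```

<p>One caveat with this encoding: decimal ids do not sort numerically as bytes (<code class="prettyprint">users/10</code> sorts before <code class="prettyprint">users/2</code>), so a store that relies on key order would probably want a fixed-width or otherwise order-preserving encoding instead.</p>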
<h1 id="indexes_1">Indexes <a class="head_anchor" href="#indexes_1" rel="nofollow">#</a>
</h1>
<p>What would we do if we wanted to look up <code class="prettyprint">users</code> where <code class="prettyprint">email = alice@security.com</code>? We have no way to see what keys exist if we just have the above interface. Luckily, all of the aforementioned key-value stores implement an interface that looks more like:</p>
<pre><code class="prettyprint lang-go">type KV interface {
	Get(key []byte) []byte
	Set(key []byte, val []byte)
	// Walk returns an iterator over all keys that start with prefix
	Walk(prefix []byte) Iterator
}

type Iterator interface {
	// get the value at the iterator location
	Val() []byte
	// get the current key
	Key() []byte
	// move to the next key, returning false once the keys are exhausted
	Next() bool
}
</code></pre>
<p>which allows us to traverse the keyspace by prefix. So with some code like:</p>
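<pre><code class="prettyprint lang-go">// findByEmail scans every row of the users table for a matching email.
func findByEmail(kv KV, email string) []byte {
	iterator := kv.Walk([]byte("users/"))
	for iterator.Next() {
		// hasEmail checks the email column of an encoded row
		if hasEmail(iterator.Val(), email) {
			return iterator.Val()
		}
	}
	return nil // no matching row
}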
</code></pre>
<p>We could find the row associated with <code class="prettyprint">email: alice@security.com</code> by iterating through all of the keys in the table until we find a matching row (or all matching rows if the column isn’t marked unique). This is known in RDBMS terminology as a sequential scan, something you may have seen in the output of a SQL <code class="prettyprint">EXPLAIN</code> before (probably marked <code class="prettyprint">Seq Scan</code>). Unfortunately, while this strategy works, it isn’t particularly fast: in the worst case it reads every row in the table. To make this query fast we’ll need what’s known as a <code class="prettyprint">secondary index</code>, a mapping from each email to the primary key of the row it belongs to, akin to the SQL <code class="prettyprint">CREATE UNIQUE INDEX users_email_idx ON users(email);</code>. This is fairly straightforward, changing our keyspace to look like:</p>
<pre><code class="prettyprint">'users/1' => 'username: ian, email: root@example.com'
'users/2' => 'username: alice, email: alice@security.com'
'users/3' => 'username: eve, email: eve@pwned.com'
'users_email_idx/root@example.com' => 'users/1'
'users_email_idx/alice@security.com' => 'users/2'
'users_email_idx/eve@pwned.com' => 'users/3'
</code></pre>
<p>With this change implemented, our pseudo-Go code to look up <code class="prettyprint">alice@security.com</code> becomes:</p>
<pre><code class="prettyprint lang-go">primaryKey := kv.Get([]byte("users_email_idx/alice@security.com"))
row := kv.Get(primaryKey)
</code></pre>
<p>As one might imagine, this indexing does not come for free: it takes up more space on disk, and it means we have to write two keys for every row we insert into the table.</p>
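<p>To make that cost concrete, here is a minimal sketch of a write path that maintains the email index next to the row. The <code class="prettyprint">memKV</code> type and the <code class="prettyprint">insertUser</code> and <code class="prettyprint">findByEmail</code> functions are hypothetical names for illustration, and a real store would presumably wrap the two writes in a single transaction so the row and its index entry cannot drift apart.</p>

```go
package main

import "fmt"

// memKV is a hypothetical in-memory stand-in for a real key-value store.
type memKV struct{ data map[string][]byte }

func newMemKV() *memKV { return &memKV{data: map[string][]byte{}} }

// Get returns nil when the key does not exist.
func (m *memKV) Get(key []byte) []byte { return m.data[string(key)] }

// Set stores val under key, overwriting any previous value.
func (m *memKV) Set(key []byte, val []byte) { m.data[string(key)] = val }

// insertUser writes the row and then its entry in the unique email index.
// These are two separate writes here; a real store would make them atomic.
func insertUser(kv *memKV, id int, username, email string) {
	pk := fmt.Sprintf("users/%d", id)
	row := fmt.Sprintf("username: %s, email: %s", username, email)
	kv.Set([]byte(pk), []byte(row))
	kv.Set([]byte("users_email_idx/"+email), []byte(pk))
}

// findByEmail resolves the index entry to a primary key, then fetches the row.
// It returns nil when no row has that email.
func findByEmail(kv *memKV, email string) []byte {
	pk := kv.Get([]byte("users_email_idx/" + email))
	if pk == nil {
		return nil
	}
	return kv.Get(pk)
}

func main() {
	kv := newMemKV()
	insertUser(kv, 1, "ian", "root@example.com")
	insertUser(kv, 2, "alice", "alice@security.com")
	insertUser(kv, 3, "eve", "eve@pwned.com")

	fmt.Printf("%s\n", findByEmail(kv, "alice@security.com"))
	// Three rows produce six keys: one row key and one index key each.
	fmt.Println("keys stored:", len(kv.data))
}
```

<p>The lookup is now two point reads regardless of table size, paid for by one extra write per insert (and, in a fuller version, extra work whenever an email is updated or a row is deleted).</p>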
<p>The same concept we used to create a single-column secondary index applies to the creation of composite indexes - mapping the SQL <code class="prettyprint">CREATE UNIQUE INDEX users_username_email_idx ON users(username, email)</code> to keys like <code class="prettyprint">'users_username_email_idx/eve/eve@pwned.com' => 'users/3'</code>. </p>
<h1 id="wrapup_1">Wrapup <a class="head_anchor" href="#wrapup_1" rel="nofollow">#</a>
</h1>
<p>The above techniques allow a relatively powerful mapping of a simple relational database onto a key-value store, while not holding us back as we pave the way for more complex features (<code class="prettyprint">JOIN</code> operations, “real” <code class="prettyprint">SQL</code> parsing, executing, and planning, etc.). </p>
Go Dependencies Considered Harmful
2016-08-15
<p>In the wake of the ongoing vendoring discussions within the Go community, I think that almost all Go projects should be able to rely on having a few mature dependencies, regular builds (almost all CI systems support some sort of <code class="prettyprint">cron</code> for regular builds), and good test coverage to ensure they stay on top of their third-party dependencies.</p>
<p>Mature third-party libraries don’t change their APIs drastically or suddenly. Libraries such as <code class="prettyprint">github.com/lib/pq</code> and <code class="prettyprint">github.com/gorilla/mux</code>, which are cornerstones of many large Go projects, simply don’t become different libraries with completely different APIs overnight.</p>
<p>Using immature and unstable libraries is a mistake no matter how you look at it - if the library is still under development or not well maintained (i.e. breaks API compatibility every other week), why are you using it for a production application? It’s entirely the developer’s responsibility to review any third party dependency they introduce to their application - you wouldn’t introduce a random SaaS into your codebase without checking it out / asking around about it first, would you?</p>
<p>It’s also more than doable to build large, production applications without using any third-party code at all. For a great example of this, see Netflix’s <a href="https://github.com/netflix/rend" rel="nofollow">Rend</a>, a <code class="prettyprint">memcached</code> proxy that manages data chunking and L1 / L2 caches. <a href="https://github.com/nats-io/gnatsd" rel="nofollow">gnatsd</a>, the core of <a href="https://nats.io/" rel="nofollow">nats</a>, a high performance messaging system, is another great example, with only 3 dependencies outside the main repo: two <code class="prettyprint">golang.org/x/crypto/</code> packages and <code class="prettyprint">github.com/nats-io/nuid</code>, all of which are very stable/controlled dependencies.</p>
<p>In the event that you <u>absolutely</u> have to use an unstable or immature library within your application, e.g. for an exotic file format, Go supports <code class="prettyprint">/internal/...</code> packages, into which you can copy these sorts of dependencies, independently of any tooling or trust in the original maintainers, while ensuring they’re not importable <a href="https://docs.google.com/document/d/1e8kOo3r51b2BWtTs_1uADIA5djfXhPT36s6eHVRIvaU/edit" rel="nofollow">outside your application</a>. As <a href="https://twitter.com/rob_pike" rel="nofollow">Rob Pike</a> says, a little copying is better than a little dependency - the ultimate way to be happy with your dependencies is to eliminate them.</p>
<p>Vendoring dependencies solves a particular problem (no matter the language): the case where builds that are not absolutely reproducible (not just build artifacts), being unable to deploy while GitHub is down for an hour (do your CI/CD systems work independently of GitHub, too?), or having to fix small, rare API changes will cost you thousands of <code class="prettyprint">$CURRENCY</code>. It is <u>not</u> an answer to poorly maintained libraries that break your builds every few hours because of <u>your</u> choice in libraries and maintainers. In all likelihood, vendoring is a solution to a problem you simply don’t face.</p>
<p>Some sort of “package manager” is equally not the solution to the problems that a lack of discipline around managing third-party dependencies creates. Developers of libraries and applications alike should hold themselves to a higher standard of technical excellence: designing stable APIs and avoiding dependency trees that grow out of control. Otherwise, a tool to make things easier to manage will only mask these problems, not solve them.</p>
<p>Follow me on Twitter, <a href="https://twitter.com/fortytw2" rel="nofollow">@fortytw2</a>, or shoot me an email if you think I’ve said anything interesting here and want to chat more.</p>