<?xml version="1.0" encoding="utf-8" standalone="yes"?><?xml-stylesheet href="/feed_style.xsl" type="text/xsl"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="https://www.rssboard.org/media-rss">
  <channel>
    <title>Ps on blog.qnx.moe</title>
    <link>https://blog.qnx.moe/p/</link>
    <description>Recent content in Ps on blog.qnx.moe</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <copyright>qnx.moe</copyright>
    <lastBuildDate>Thu, 02 Apr 2026 23:31:32 +0530</lastBuildDate><atom:link href="https://blog.qnx.moe/p/index.xml" rel="self" type="application/rss+xml" /><icon>https://blog.qnx.moe/logo-100.webp</icon>
    
    
    <item>
      <title>Why I refuse to use LinkedIn</title>
      <link>https://blog.qnx.moe/p/linkedin/</link>
      <pubDate>Thu, 02 Apr 2026 23:31:32 +0530</pubDate>
      
      <guid>https://blog.qnx.moe/p/linkedin/</guid>
      <description><![CDATA[<p>People in the tech industry assume everyone has a LinkedIn account with an
up-to-date profile. I do have an account, but I refuse to update or use it, for the
reasons listed here.</p>
<h2 id="linkedin-is-invasive">LinkedIn is invasive</h2>
<p>Read:</p>
<ul>
<li><a href="https://browsergate.eu/">LinkedIn illegally scans your browser for things it shouldn&rsquo;t know.</a></li>
<li><a href="https://wccftech.com/linkedin-sued-for-spying-on-clipboard-data-after-ios-14-exposes-its-app/">LinkedIn was sued for spying on user clipboard data.</a></li>
</ul>
<h2 id="employment-histroy-is-private-information">Employment history is private information</h2>
<p>I don&rsquo;t want my employment history to be public. Simple as that.</p>
<p>I do not understand why employers and everyone in corpo expect everyone to
put their employment history and education background out in the open, for
everyone to see.</p>
<p>If you want to know where I have worked over the entirety of my career, you
should have to request that information from me, in the form of my resume for example. I do not
believe this information should be publicly available for viewing without my
permission/consent.</p>
<h2 id="the-content-i-dont-wish-to-consume">The content I don&rsquo;t wish to consume</h2>
<p>Most, if not all, of the posts on my LinkedIn feed are one of these:</p>
<ul>
<li>AI-generated posts which say a lot yet don&rsquo;t say anything.</li>
<li>Stupid illustrations that claim to explain a very basic concept, but whose
analogies are bad and misrepresent the thing they are explaining.</li>
<li>The &ldquo;Inspirational&rdquo; posts.</li>
<li>Marketing.</li>
</ul>
<p>I do not wish to see any of these.</p>
<h2 id="to-you-the-reader">To you, the reader</h2>
<p>If you are my employer, asking me to update my LinkedIn profile with your
organisation&rsquo;s logo or post something on your behalf:
I refuse. And I will quit my job over it if this is a deal-breaker for you.</p>
<p>If you are a co-worker/peer/friend: now you know why.</p>
]]></description>
      
    </item>
    
    
    
    <item>
      <title>SandDB: An immutable, persistent key-value store</title>
      <link>https://blog.qnx.moe/p/sand-db/</link>
      <pubDate>Wed, 20 Aug 2025 00:00:00 +0000</pubDate>
      
      <guid>https://blog.qnx.moe/p/sand-db/</guid>
      <description><![CDATA[<h2 id="motivation">Motivation</h2>
<p>I had just begun to cook my <em>super secret</em> project, for which I needed an
embedded data store to back my data.</p>
<p>Now, I <em>could</em> use a well-known and well-supported solution. But hey, that&rsquo;s
not very fun, is it?</p>
<p>I decided I would write my own. Not because this was an &ldquo;unsolved&rdquo; problem, but
just because I thought &ldquo;how hard could it be to roll my own key-value
database?&rdquo;.</p>
<p>I had recently read some literature on building simple databases from various
sources and really wanted to give it a try. The whole process has been very
rewarding. I might even give building a relational database a shot in the future.</p>
<h2 id="goals">Goals</h2>
<p>My goals for the store (from the original use-case) are:</p>
<ul>
<li>Very fast single and batch writes.</li>
<li>Reasonably fast point reads.</li>
<li>Reasonably fast range scans.</li>
</ul>
<h2 id="non-goals">Non goals</h2>
<ul>
<li>Deletion: I wanted an append-only, immutable store. I did not need a delete
operation.</li>
<li>Foreign keys: I only wanted to read and write a bunch of values quickly. I did
not need a way to reference one datum from another. Well, at least not in a
way the store should be aware of.</li>
</ul>
<h2 id="inspiration">Inspiration</h2>
<h3 id="gits-reftable">Git&rsquo;s reftable</h3>
<p>Git has traditionally used files under <code>.git/refs</code> to store refs (&ldquo;loose refs&rdquo;).
This is simple and it worked; however, some large repositories such as Android and
Rails had so many refs that lookup was noticeably slow.</p>
<p>Other problems with loose refs include:</p>
<ul>
<li>They occupy a large number of disk blocks.</li>
<li>Batch reads are penalized by a large number of syscalls.</li>
</ul>
<p>It should be noted that Git also had the &ldquo;packed ref&rdquo; format as an alternative,
but it had its own problems:</p>
<ul>
<li>A single lookup requires linearly scanning the file.</li>
<li>Atomic writes require copying the entire packed ref file for even small
transactions.</li>
</ul>
<p>This led to the development of the new reftable format, which employs an
LSM-tree-like storage engine. The performance gains are massive in terms of both
query time and storage.</p>
<p>For more information, see the <a href="https://git-scm.com/docs/reftable">reftable page</a>
in Git&rsquo;s documentation.</p>
<p>This fascinated me and I dove a little bit into the design and implementation
details of this new format.</p>
<h3 id="rocksdb">RocksDB</h3>
<p>While looking into the ideas and techniques behind the reftable format, I
found a few more key-value databases. The one that particularly stood out to me
was RocksDB. Reading about it, I discovered the concept of Log-Structured Merge
trees (LSM trees).</p>
<p>I also named this project SandDB because I was inspired by RocksDB.</p>
<h2 id="log-structured-merge-trees">Log-structured merge trees</h2>
<p>At the heart of SandDB is an LSM tree, a data structure designed for very
fast writes.</p>
<p>LSM trees have a tiered storage model where each tier/level is more &ldquo;permanent&rdquo; than the one before it.</p>
<h3 id="memtable">Memtable</h3>
<p>The first level is (usually) backed by an in-memory data structure. This is
called a &ldquo;memtable&rdquo;. SandDB uses Rust&rsquo;s <code>BTreeMap</code> (for no other reason than
it is readily available for use in the language).</p>
<p>All the new writes are added to the memtable and no files are touched.</p>
<p>If you can&rsquo;t tell, this is <em>very</em> fast since we inherit <code>BTreeMap</code>&rsquo;s insertion
time complexity, i.e. O(log n), where n is the number of entries in the map.</p>
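<p>A minimal sketch of what such a memtable could look like. This is not SandDB&rsquo;s actual code: the <code>Memtable</code> type, its method names and the choice of byte vectors for keys and values are assumptions made for illustration.</p>

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical memtable: an ordered, in-memory map from key bytes to value bytes.
pub struct Memtable {
    entries: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl Memtable {
    pub fn new() -> Self {
        Memtable { entries: BTreeMap::new() }
    }

    /// A write only touches memory; no file is opened here.
    /// Assumption: writing an existing key replaces its value.
    pub fn put(&mut self, key: &[u8], value: &[u8]) {
        self.entries.insert(key.to_vec(), value.to_vec());
    }

    /// Point read, O(log n) in the number of entries.
    pub fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.entries.get(key).map(|v| v.as_slice())
    }

    /// Range scan over `start <= key < end`, in key order.
    /// `BTreeMap::range` panics if `start > end`, so callers must keep them ordered.
    pub fn scan(&self, start: &[u8], end: &[u8]) -> Vec<(&[u8], &[u8])> {
        self.entries
            .range::<[u8], _>((Bound::Included(start), Bound::Excluded(end)))
            .map(|(k, v)| (k.as_slice(), v.as_slice()))
            .collect()
    }

    /// Number of entries currently buffered in memory.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

<p>Because the map keeps keys sorted, both point reads and range scans fall out of the same structure, and the sorted order is exactly what is needed later when the entries are written out.</p>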
<h3 id="sstables">SSTables</h3>
<p>Of course, entries in the memtable will eventually have to be written to files
in order for the store to be persistent.</p>
<p>Once the memtable reaches a certain size threshold, it is flushed to an SSTable file.
This is an immutable file; in other words, once created it is never modified.</p>
<p>We must also flush the memtable to an SSTable when the database exits so that we
don&rsquo;t lose any entries.</p>
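<p>The flush logic can be sketched roughly as follows. Everything here is an assumption for illustration: the <code>Store</code> type, the entry-count threshold, and the length-prefixed record encoding, which is <em>not</em> SandDB&rsquo;s real SSTable format. In-memory byte buffers stand in for files so the sketch stays self-contained.</p>

```rust
use std::collections::BTreeMap;
use std::io::{self, Write};

/// Writes entries in key order as length-prefixed records.
/// NOTE: illustrative encoding only, not SandDB's actual SSTable format.
pub fn write_sstable<W: Write>(
    out: &mut W,
    entries: &BTreeMap<Vec<u8>, Vec<u8>>,
) -> io::Result<()> {
    for (k, v) in entries {
        out.write_all(&(k.len() as u32).to_le_bytes())?;
        out.write_all(k)?;
        out.write_all(&(v.len() as u32).to_le_bytes())?;
        out.write_all(v)?;
    }
    out.flush()
}

/// Hypothetical store: a memtable plus the SSTables flushed so far.
pub struct Store {
    memtable: BTreeMap<Vec<u8>, Vec<u8>>,
    threshold: usize,       // assumed: max entries held before a flush
    sstables: Vec<Vec<u8>>, // byte buffers standing in for files on disk
}

impl Store {
    pub fn new(threshold: usize) -> Self {
        Store { memtable: BTreeMap::new(), threshold, sstables: Vec::new() }
    }

    pub fn put(&mut self, key: &[u8], value: &[u8]) -> io::Result<()> {
        self.memtable.insert(key.to_vec(), value.to_vec());
        if self.memtable.len() >= self.threshold {
            self.flush()?;
        }
        Ok(())
    }

    /// Writes the memtable out as one new SSTable and empties it.
    pub fn flush(&mut self) -> io::Result<()> {
        if self.memtable.is_empty() {
            return Ok(());
        }
        let mut buf = Vec::new();
        write_sstable(&mut buf, &self.memtable)?;
        self.sstables.push(buf); // never modified after this point
        self.memtable.clear();
        Ok(())
    }

    pub fn sstable_count(&self) -> usize {
        self.sstables.len()
    }
}

impl Drop for Store {
    /// Flush on exit. With real files, this is what saves the last few entries.
    fn drop(&mut self) {
        let _ = self.flush();
    }
}
```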
<p>SSTables are binary files. I have documented their format <a href="https://github.com/quanta-kt/SandDB/blob/master/docs/sst-file-spec.md">here</a>
if you are interested.</p>
<h3 id="compaction">Compaction</h3>
<p>Now, since we create a new SSTable on each flush, we will inevitably end up with
several such files over time. This can be a problem for querying, as we might end
up having to scan a lot of different files. Opening files also adds to our
number of syscalls per query.</p>
<p>We must therefore occasionally <em>compact</em> our SSTables, i.e. merge smaller
SSTables into larger ones.</p>
<p>SSTables are tiered. When a memtable is flushed, a level 0 SSTable is created.
When we have &ldquo;way too many&rdquo; SSTables, we initiate a compaction: gather all level
0 SSTables and merge them into a single level 1 SSTable.</p>
<p>When we have enough level 1 SSTables, we merge them to level 2. At the time of
writing, SandDB only merges up to level 2. We also merge several level 2 SSTables
into one if there are too many of them.</p>
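<p>A merge step along these lines could look like the sketch below. It is deliberately the simplest thing that works and makes several assumptions: the <code>SsTable</code> struct and <code>compact</code> function are hypothetical, tables are held fully in memory, and a larger ID is taken to mean a newer table. A real implementation would more likely stream a k-way merge over the files.</p>

```rust
use std::collections::BTreeMap;

type Entry = (Vec<u8>, Vec<u8>);

/// Hypothetical in-memory view of an SSTable: entries are sorted by key.
#[derive(Debug, Clone, PartialEq)]
pub struct SsTable {
    pub id: u64,
    pub level: u8,
    pub entries: Vec<Entry>,
}

/// Merges `tables` into a single table at `target_level`.
/// Assumption: a larger `id` means the table was written more recently,
/// so if a key shows up in several tables the newest value is kept.
pub fn compact(mut tables: Vec<SsTable>, new_id: u64, target_level: u8) -> SsTable {
    // Apply the oldest table first so that newer tables overwrite it.
    tables.sort_by_key(|t| t.id);

    let mut merged = BTreeMap::new();
    for table in tables {
        for (key, value) in table.entries {
            merged.insert(key, value);
        }
    }

    SsTable {
        id: new_id,
        level: target_level,
        // BTreeMap iterates in key order, so the output stays sorted.
        entries: merged.into_iter().collect(),
    }
}
```

<p>Merging all level 0 tables into one level 1 table is then a single call such as <code>compact(level0_tables, next_id, 1)</code>, where both argument names are placeholders.</p>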
<p>We only pay the cost of compaction once in a while.</p>
<h3 id="manifest">Manifest</h3>
<p>An additional file called the manifest is kept around to track the SSTables and
their metadata. For each SSTable, it records:</p>
<ul>
<li>The ID of the SSTable, from which we can derive the filename.</li>
<li>Its level.</li>
<li>The key range.</li>
</ul>
<p>Each time an SSTable is added or removed, the manifest is updated to reflect
the same.</p>
<p>When querying, the reader looks at the manifest and figures out which SSTables
might contain the key it is looking for and reads those in the order of their
(ID, level). The ordering matters because we want to ensure we look in the most
recently written SSTables first.</p>
<p>We only open the SSTables which the manifest tells us might contain our key.
This way we don&rsquo;t have to open every single SSTable and we save reads and
syscalls.</p>
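<p>The filtering step can be sketched as below. The <code>SstMeta</code> struct and <code>candidates</code> function are hypothetical, and the ordering used here (lower level first, then higher ID first) is one plausible reading of &ldquo;most recently written first&rdquo;, not necessarily the exact ordering SandDB uses.</p>

```rust
/// Hypothetical manifest record for one SSTable.
#[derive(Debug, Clone, PartialEq)]
pub struct SstMeta {
    pub id: u64,
    pub level: u8,
    pub min_key: Vec<u8>, // smallest key stored in the table
    pub max_key: Vec<u8>, // largest key stored in the table
}

/// Returns the SSTables whose key range covers `key`, in the order they should be read.
/// Assumption: lower levels hold newer data, and within a level a higher ID is newer.
pub fn candidates<'a>(manifest: &'a [SstMeta], key: &[u8]) -> Vec<&'a SstMeta> {
    let mut hits: Vec<&SstMeta> = manifest
        .iter()
        .filter(|m| m.min_key.as_slice() <= key && key <= m.max_key.as_slice())
        .collect();

    // Level ascending, then ID descending.
    hits.sort_by(|a, b| a.level.cmp(&b.level).then(b.id.cmp(&a.id)));
    hits
}
```

<p>A reader would open the returned tables one by one and stop at the first one that actually contains the key. Tables outside the key range are never opened at all.</p>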
<p>We can also go beyond having a key range and add per-SSTable
<a href="https://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</a>
to the manifest. This would further reduce the number of files we would have to
open and read. I plan to implement this sometime in the future.</p>
<p>The manifest file is a WAL-style file. When a new SSTable is added, an entry
saying &ldquo;added sst x&rdquo; is appended to the file. Similarly, when an SSTable is
deleted, an entry saying &ldquo;deleted sst x&rdquo; is appended.</p>
<p>Why do this instead of rewriting the manifest? For atomicity (and possibly for
isolation, in future).</p>
<p>If we replaced the entire contents of the manifest instead of appending, a reader in
the middle of reading it might end up reading the first half of the file
from before the SSTable was added and the latter half from after.</p>
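<p>To illustrate the append-only idea, here is a small sketch of replaying such a log to recover the set of live SSTables. The <code>ManifestEntry</code> enum and <code>replay</code> function are hypothetical and ignore the on-disk encoding entirely.</p>

```rust
use std::collections::BTreeSet;

/// Hypothetical manifest log entry; the payload is the SSTable's ID.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ManifestEntry {
    Added(u64),
    Deleted(u64),
}

/// Replays the log from the start and returns the IDs of the live SSTables.
pub fn replay(log: &[ManifestEntry]) -> BTreeSet<u64> {
    let mut live = BTreeSet::new();
    for entry in log {
        match *entry {
            ManifestEntry::Added(id) => {
                live.insert(id);
            }
            ManifestEntry::Deleted(id) => {
                live.remove(&id);
            }
        }
    }
    live
}
```

<p>Since entries are only ever appended, any prefix of the log replays to a state the store really was in at some point. A reader that stops short sees a slightly stale view rather than a torn one.</p>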
<p>I will admit that SandDB does not yet allow concurrent reads and writes; however, I do
see myself sitting down to implement it in the future. Soon (TM).</p>
<p>I have documented the manifest file&rsquo;s format
<a href="https://github.com/quanta-kt/SandDB/blob/master/docs/manifest-file-spec.md">here</a>
if you are interested.</p>
]]></description>
      
    </item>
    
    
  </channel>
</rss>
