Why FeatherDB exists and why it is written in Java
Marcus R. Breese
Posted: Apr 16, 2008
Updated: May 12, 2008

Sorry about the formatting errors on some comments... they've been fixed (I think).

I think that I best addressed my lack of willingness to use Erlang here: http://reddit.com/info/6fvf1/comments/c03q623. Suffice it to say that my decision to rewrite the application was a little more complicated than my unwillingness to use Erlang. 

So here I want to address a little better the differences between FeatherDB and CouchDB, and more importantly, where I think we go from here.

 


 

Data model 

The biggest reason for the rewrite, however, was that I didn't think that CouchDB got the data model quite right. I think that the ability to allow common attributes in documents is an important one. At least, it is for the application that I was planning on using it for.  The use-case that I'm specifically thinking of is the ability to tag pages in a CMS.  If you want to add a tag to a page using CouchDB, you simply create a new revision of the document and add the tag.  Simple.  But inefficient.  You're effectively duplicating the contents of a document in the database just to add common (meta) data to it.  The concept of a tag can logically apply to all revisions of a document, so there shouldn't be a need to create a duplicate.

(Note: This inefficiency only applies to storing and retrieving documents. Running and creating views is a different issue, and I'm not quite sure how CouchDB would handle it.)
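
To make the tagging idea concrete, here is a minimal sketch in Java. It is not FeatherDB's actual code: the class, field and method names are all made up for illustration, and revisions are simply kept as full strings in a list. The only point is that tags sit beside the revisions rather than inside them.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: names are illustrative, not FeatherDB's real classes.
class SharedMetaSketch {
    /** A document whose tags live beside its revisions, not inside them. */
    static class Document {
        final String id;
        final List<String> revisions = new ArrayList<>(); // full bodies, oldest first
        final Set<String> tags = new LinkedHashSet<>();   // shared by every revision

        Document(String id, String firstBody) {
            this.id = id;
            revisions.add(firstBody);
        }

        // Changing the content creates a new revision.
        void update(String newBody) {
            revisions.add(newBody);
        }

        // Tagging touches only the shared metadata; no body is copied.
        void tag(String tag) {
            tags.add(tag);
        }

        String latest() {
            return revisions.get(revisions.size() - 1);
        }
    }

    public static void main(String[] args) {
        Document page = new Document("home", "Welcome!");
        page.tag("cms");               // no new revision
        page.update("Welcome back!");  // one new revision
        page.tag("frontpage");         // still no new revision
        System.out.println(page.revisions.size() + " revisions, tags " + page.tags);
    }
}
```

In this sketch, two tags and one content change leave the document with two revisions rather than four.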

 

Java

Why is FeatherDB written in Java?  Another way of asking this is: why didn't I just work on the Erlang version?  One reason was that I was primarily using a Windows machine for development at the time, and the development process for CouchDB (even to just get it running) was quite tortuous.  I had to switch over to a Linux box just to write the Java library to connect to it.  Why? Because I couldn't get the most up-to-date version running on Windows.  This isn't necessarily a big deal for me, since I use Windows, Mac and Linux machines on a daily basis.  However, for some this is a deal-breaker.   So, I wrote it in the language that I was working with at the time... it just happened to be Java.

Would there be the same issue had I written this in Python or Ruby? 

Now, there is one big (and I mean big) advantage that Java brings to the table, and that is the ability to run multiple languages (including JavaScript) in the JVM.  This adds one feature that CouchDB cannot add without increasing overhead: the ability for a JavaScript view to directly query the database.  This makes 'joins' possible in your views: the JavaScript view function that you run can pull other documents from the database.  This makes your view code much more flexible.

It is perfectly reasonable to argue that you may not want this feature, but I'm not sure how you'd do it with the current view system in CouchDB.  Maybe it is possible, and I just don't know it.
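
Here is a rough sketch of what such a 'join' looks like. To keep it runnable without a script engine, a plain Java method stands in for the JavaScript view function, and the `Store` class, the `runView` helper and the document fields are all assumptions made up for this example, not FeatherDB's API. What matters is that the view is handed the store and can fetch a second document while processing the first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch: a Java method stands in for the JavaScript view function.
class ViewJoinSketch {
    /** Minimal in-memory stand-in for the database; documents are plain maps. */
    static class Store {
        final Map<String, Map<String, String>> docs = new HashMap<>();

        void put(String id, Map<String, String> doc) {
            docs.put(id, doc);
        }

        Map<String, String> get(String id) {
            return docs.get(id);
        }
    }

    // Runs the view over every document, passing the store so the view can look up others.
    static List<String> runView(Store store,
                                BiFunction<Store, Map<String, String>, String> view) {
        List<String> rows = new ArrayList<>();
        for (Map<String, String> doc : store.docs.values()) {
            String row = view.apply(store, doc);
            if (row != null) {
                rows.add(row); // a null result means "emit nothing for this document"
            }
        }
        return rows;
    }

    // The view: for each page, pull the referenced author document (the 'join').
    static String pageWithAuthor(Store store, Map<String, String> doc) {
        if (!"page".equals(doc.get("type"))) {
            return null;
        }
        Map<String, String> author = store.get(doc.get("author"));
        String name = (author == null) ? "unknown" : author.get("name");
        return doc.get("title") + " by " + name;
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.put("u1", Map.of("type", "user", "name", "Marcus"));
        store.put("p1", Map.of("type", "page", "title", "Home", "author", "u1"));
        System.out.println(runView(store, ViewJoinSketch::pageWithAuthor));
    }
}
```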

 

Scalability

One of the biggest criticisms was the lack of distributed scalability in FeatherDB.  My flippant reply was:

Re: Distributed - Via the plugin backend storage mechanism, it actually wouldn't be that hard to make the system distributed. There would need to be a new backend written that delegates requests to other servers. The trick is to figure out the logic to correctly 'guess' which server has the data you're interested in. However, this is entirely possible.

Now, this is admittedly a little simplistic. Writing a new backend to farm a single request out to a number of other servers and then 'reduce' the results back to return to the user isn't difficult. The pseudo-code is very similar to the in-memory-cache implementation that is currently in SVN. For a brute-force, dumb system, it would work. What is difficult is figuring out which other servers to farm the request out to. This is the secret sauce of any distributed system, and isn't trivial regardless of language.  As one of the commenters said, there is no such thing as a simple distributed system. Perhaps this is where Erlang's sweet spot is... I don't know.
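
As a rough illustration of the dumb version, here is a sketch of a delegating backend in Java. Everything in it is an assumption for the sake of the example: the `Backend` interface is not FeatherDB's real plugin interface, the 'servers' are in-memory maps rather than network calls, and the 'guess' is nothing smarter than hashing the document id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the Backend interface and class names are illustrative only.
class DelegatingBackendSketch {
    interface Backend {
        void put(String id, String doc);
        String get(String id);
    }

    /** Stand-in for a remote server; a real one would make a network call. */
    static class InMemoryBackend implements Backend {
        final Map<String, String> docs = new HashMap<>();

        public void put(String id, String doc) {
            docs.put(id, doc);
        }

        public String get(String id) {
            return docs.get(id);
        }
    }

    /** Backend that forwards each request to one of several other backends. */
    static class DelegatingBackend implements Backend {
        private final List<Backend> servers;

        DelegatingBackend(List<Backend> servers) {
            this.servers = new ArrayList<>(servers);
        }

        // The 'guess': hash the document id to pick a server (naive on purpose).
        Backend serverFor(String id) {
            return servers.get(Math.floorMod(id.hashCode(), servers.size()));
        }

        public void put(String id, String doc) {
            serverFor(id).put(id, doc);
        }

        public String get(String id) {
            return serverFor(id).get(id);
        }
    }

    public static void main(String[] args) {
        List<Backend> servers = List.of(new InMemoryBackend(), new InMemoryBackend());
        Backend db = new DelegatingBackend(servers);
        db.put("page-1", "hello");
        System.out.println(db.get("page-1"));
    }
}
```

The modulo hash also shows why the guessing is the hard part: add or remove a server and most ids suddenly map to a different machine, so existing documents can no longer be found without moving them.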

Unfortunately, from what I've seen, the simple distributed scalability that is advertised with CouchDB isn't quite a reality. But this is to be expected... CouchDB is very much a work in progress. People act as though it is a finished product, but this is far from the case. 

 

Motivations 

I think the bigger question is what are my motivations for writing this, and what are my motivations for releasing it to the public.

I love the things I read in Damien Katz's blog and his descriptions of CouchDB and non-RDBMS style of data management.  I first heard of CouchDB from Assaf Arkin's blog, and I have continued to follow it ever since. 

Now, I don't want to compete with CouchDB, and I'm not looking to make any money from this.  I've released this because it is something that I worked on a few months ago, and I felt bad seeing it sit in my private SVN repo gathering dust.  That, and these are some concepts that I've been looking at for a while.  I guess that my ideal outcome would be for some of these concepts to make their way into CouchDB so that I don't have to think about it.  But I'm more of a 'working-code wins' kind of guy.  So I prototyped some things, and thought that maybe someone else could find them useful.


Future

I'm not too sure where FeatherDB will go.  As I said, I'd be happy to use CouchDB (or maybe I'll have to look at StrokeDB or GrassyKnoll), but for now, it doesn't fit my needs.  Until it does, I'll keep updating FeatherDB... and I'll be following CouchDB closely.

Comments

Cortland Klein, Apr 16, 2008 7:27 PM
See http://www.cmlenz.net/archives/2007/10/couchdb-joins for an example of how CouchDB can handle joins. This is how Ajatus (based on CouchDB) uses its TagWithDocument view. (Run Ajatus and then look at its CouchDB views.)
Marcus R. Breese, Apr 16, 2008 7:34 PM
Very true, you can mimic joins in CouchDB using complex keys in views... but I think that the example breaks down for more complicated cases.

That being said, I'm not sure you'd even _want_ to see joins in CouchDB.
Jan, Apr 17, 2008 6:16 AM
Heya Marcus,
that is correct, do we want to see Joins in CouchDB… :)

While my initial reaction to FeatherDB was thinking what a crazy guy you are (I'm not fond of Java, Windows or rewrites because of lazy reasons :-), I now think that this 'competition' is a good thing: If you or anyone takes FeatherDB in a direction that is interesting for CouchDB, we adapt and everybody wins. This is a good thing, keep it up!

There are a few things that we could discuss in terms of "can do with CouchDB" and I'd like to invite you to our mailing lists (http://incubator.apache.org/couchdb/community/lists.html) and IRC channel (#couchdb on Freenode) for future discussions.

Disclosure: I'm deeply involved with CouchDB.

Cheers
Jan
--
Bart, Apr 17, 2008 10:29 AM
Ehm, I think your use case for the tag is wrong IMHO. The tag must apply to a revision, as it can change as the contents change over time. I'm not familiar with the update mechanisms of CouchDB, but it should not be necessary to keep a duplicate of the data. Just keep the diff (or keep the latest release with a diff to go back to the previous revision).

Secondly, I thought that the answer to all java scalability issues these days is Terracotta ;-)

Anyway, it's great that you are doing this and I hope this project can stay in line with the spec for CouchDB (I think Sam Ruby is trying to put that on paper).
Noah Slater, Apr 17, 2008 12:14 PM
Hey, you have based some of your comments on misconceptions about CouchDB's revision system, see:

http://wiki.apache.org/couchdb/DocumentRevisions
