Jamie McCracken (jamiemcc) wrote,

Tracker - Big Things Come In Small Packages

I know I have not blogged much about what's going on in tracker land lately, but big changes have been made that will have a huge effect on things to come.

There has been an explosion of activity on the tracker mailing lists, with loads of competent hackers submitting tons of patches and keeping me busy! It's always a good sign of a successful project too :)

I would like to say a big thank you to Laurent Aguerreche, who has been behaving like a full-time developer on tracker and has given us rocking patches for Evolution, Thunderbird and KMail indexing support.

Edward Duffy also deserves special praise for submitting awesome code that replaces our big dependency on libextractor with custom extractors, as well as for his help in designing the soon-to-be-launched rocktastic tracker GUI (more on that later).

Thanks also go out to all the other contributors, big and small. Tracker is rapidly becoming the state of the art thanks to you guys!

Anyways, a summary of what's in store:

[1] Tracker is getting lighter, faster and leaner. If you thought the old version of tracker was fast and light, then be prepared to be stunned! On a 1GHz machine with 256MB of RAM, indexing 1GB of documents, images and music files (each making up roughly a third of the total size) takes under 40 minutes. Search results average under 20ms, and RAM usage is as low as 3MB resident, of which 2MB is shared and only 1MB private. No other system that I know of gets even close to tracker's speed and efficiency!

[2] Database-wise, we have shifted from embedded MySQL to SQLite, which has significantly simplified the build procedure and made the tracker daemon a fraction of its previous size (now less than 200KB without debug symbols).

[3] Whilst SQLite has no full text indexing support, we have built a custom full text indexer using the super fast QDBM file-based hash table, and according to some benchmarks, inverted word indexes built with QDBM are faster and massively lighter than Lucene (Java) ones! We are also now fully scalable for indexing massive amounts of data without speed loss, because hash table lookups take O(1) time on average regardless of how much data is indexed (all the big search engines like Google, Lucene, AltaVista et al. use file-based hash tables because of this).

[4] Our custom indexer is tightly coupled to our SQLite DB and fully optimised for both search and indexing speed. No data is stored twice: all metadata lives in SQLite, and QDBM is used only for the inverted word index (a hash table that takes a word and returns the list of matching DocIDs and their scores).

[5] We can optionally use Pango word breaking in our parser, so we can handle all Pango-supported languages, including CJK, out of the box (something no other indexer can do AFAIK!).

[6] We use the Snowball stemmers to support stemming in a wide range of languages, which makes tracker a highly accurate and relevant search tool.

[7] Tracker is fully optimised for speed: not only do we use the fastest DB and the fastest file-based hash table, but tracker is also written in 100% native C, makes ample use of GLib's performance-enhancing g_slice allocator and uses qsort for high-speed sorting. I'm confident no other system can (significantly) beat tracker here!

[8] Tracker is amazingly robust and stable, thanks not only to some excellent testing by users on the tracker mailing list but also to our investment in a full test suite for the new indexer and parser. We also use only very stable and mature components (SQLite and QDBM), which, coupled with our KISS design, makes us confident that tracker can deliver real stability and maturity well beyond its age.

So when will the next release happen?

Hopefully this weekend, once I have had a chance to finish off the GUI. Of course, if you can't wait, feel free to check out the latest code from GNOME CVS.
