Thursday, May 24, 2012

History and bookmarks in Firefox for Android

The feature I have been working on now for quite some time is one I'd jokingly refer to as "a feature I hope most users will never need". It's the system that translates your browser history from XUL Fennec to Native Fennec. New users obviously don't need this, and I hope we get a lot of new users with the switch to the native UI, so there.

More seriously, for something that I initially described in a meeting as "doing a query in one DB and some inserts in another, how hard can it be?", this turned out to be a project that ended up sinking easily over a man-month of engineering time. Of course putting it like that in the meeting was just asking for problems, and I'm going to use this post to give some background into what happened, and explain some more things about our current History/bookmarks architecture and how databases work on Android.

The need for a new History & bookmarks database

First of all, you can ask yourself why we needed to change the format of browser history in the first place. In XUL Fennec, the history and bookmarks are managed by the Places system, the exact same as is powering desktop Firefox. It's a combination of SQLite with a C++ and JavaScript control layer.

The reason for rewriting the UI parts of Fennec in Java was to make it start up faster on Android, so that the user has something he can interact with instantly. Having bookmarks and history available early is quite important as that's what the user will most likely be interacting with, either through the AwesomeBar or the home screen that Native Fennec has. So we cannot wait for Gecko to load to access its Places system - history and bookmarks must be available immediately.

Secondly, Native Fennec now also includes Sync as a true Android service that works with the Android Accounts panel, background synchronization, etc. Having the history and bookmarks available natively in Java removes the need for Sync to load Gecko, which isn't something that you'd want a background service to do.

Lastly, storage on Android devices is often fairly limited. Places databases can get quite large for large if you've been using Firefox a long time and browse a lot, for example on my current desktop system the places.sqlite file is over 70Mb. This isn't desirable on Android, so we probably want to tune history expiration a bit differently.

Evolution of the new database

Our initial implementation of an Android history & bookmarks system was to simply use the built-in default database. This has some nice advantages such as automatic integration with the default Browser. If the user is transitioning from that to Firefox he will immediately have his history & bookmarks available to him. Feedback from the community came in that they weren't entirely comfortable with this. There was a feeling that although users trust Firefox with their private data, they don't necessarily trust Android or other applications that may access this data in the same way.

We started adding our own implementation, a local database (LocalDB) using Android's SQLite library, our own ContentProvider, and a wrapper layer (BrowserDB) that allowed most of the code to seamlessly switch between using the local database and Android's database.

As we started adding more features to our bookmarks system, and Sync implementation was underway, it became clear at some point that we couldn't reasonably support them well with the limited bookmark support of Android itself. For example, by default the Android implementation can only remember the last 250 URL's you visited, which makes the AwesomeBar's search sort-of useless. So eventually, support for Android's history & bookmarks database was phased out.

Remapping Places & Database constraints

Importing the Places database into our LocalDB format is mostly a straightforward remapping, although there are some minor differences. The Places database is very normalized, and for example each bookmarks and history entry refers to an entry in another table which contains the actual URL. LocalDB is a bit simpler in this regard as the URLs are contained within the entries themselves, saving some database work - which can help on slow devices. We made a small design mistake here in the Favicons table. They're big enough that we do not want them duplicated in the database, but Favicons can be the same for multiple URLs. We caught this issue a bit too late and as a result it will not be fixed in the first release - see bug 717428. You can occasionally see constraint violation warnings in the Android logs if this problem occurs.

Another small complication in transferring bookmarks is that they form a folder (tree) structure. We added some (SQL key) constraints in the new LocalDB to ensure no folders get orphaned even if we have bugs in our first versions, but this means that we must reconstruct the tree structure in the correct order when remapping the bookmarks data. A simple breadth-first algorithm makes multiple passes over the bookmarks data until we reach the leaves. Any bookmarks left over were orphaned and are not imported.

Places "tags" are stored in a fairly complicated manner in Places (using a multitude of bookmark folders) and due to this they are not supported in Native Fennec, and likely won't be for a while, so there is no need to migrate them.

Some fragmentation pains

During the development, at some point bug reports started coming in from some users that were migrating to Native Fennec, didn't get any bookmarks & history imported, and got an error "database is corrupted". After investigation, it turns out this was problem of combined SQLite and Android fragmentation. Firefox XUL (and Desktop) ships with the most recent SQLite revision, which means that it is currently using Write-Ahead-Logging for database operations.

Various Android devices come with various versions of SQLite, and only since (approximately) Honeycomb are the SQLite's new enough to understand the new database format that WAL requires. So older Android devices couldn't correctly read Firefox's SQLite databases with their built-in SQLite support.

We ended up writing a bridge driver that uses JNI to talk to Firefox's SQLite library, and provides an interface compatible with the Android SQLite one. This turned out to be useful for a number of other things too, such as accessing the original Passwords and Form History database from Gecko. In general, this kind of experiences made the many work spent on this feature worthwhile - quite a bit of the work was usable in other parts of the project as well.

Another small fragmentation issue we ran into is that foreign key constraints are not supported on some older Android devices. Because we basically only use these for error checking, it was not a large problem.

Performance aspects

The migration initially ran in the background on the first run, but it turns out this leads to horrible performance as the user tries to interact for the first time while there is heavy disk I/O and processing happening at the same time. Because that kind of nullifies the message we were trying to send that Native Fennec is a lot faster, we opted to show a modal screen instead that informs the user of what is happening. Of  course, popping up a loading screen doesn't exactly convey a message of speediness either, but we should be fine as long as the user only sees it once and it doesn't take too long.

The latter was a bit of a problem though. Initial implementations ran at a speed of a few bookmarks or history entries per second. It's not unusual for Firefox users to have hundreds of bookmarks and hundreds of thousands of history entries. Android provides an API to allow ContentProviders to support database transactions, and implementing support for those on both sides sped up our migration speed to about 50-100 records per second. This was another example of the work on this little-seen feature providing benefits elsewhere, as Sync is now also using the faster API.

There are some interesting interactions with the batch API in Android if part of the transaction fails. Basically, it is implementation-defined what to do if an operation fails for some operations (applyBatch), and the API seems to be designed to really allow both options (continue or abort) to work. For our purposes, doing as much as possible was the better option, but this turns out to have some problems because the API to tell SQLite what to do in such a case isn't available in Android <= 2.2, which we wanted to support. This lead to some "interesting" code that may make your eyes bleed.

Sync integration 

Because that's still not fast enough to migrate a users entire history on the first run, we also made an API that allows Sync to interact with the migration feature. The first run will migrate about 2000 entries and the entire bookmarks table. For users without Sync, this is all that they'll get, but it is enough to make the AwesomeBar quite usable and covers several weeks of browsing even for moderately heavy users.

If the users have Sync set up, the idea was that Sync will continue the migration from the database before starting to use the network. Feedback from the beta made us reconsider the previous approach: the problem of delaying Sync until the Places database has been entirely migrated is that the user will not get his more recent History updates until his updates from potentially weeks or months ago have been migrated. This leads to a suboptimal experience, and it's something we're going to try to address for the release.

The XUL to Native migration includes a feature that tries to migrate your Sync credentials automatically if you have them set up in Native Fennec. Unfortunately this had a bug in the very first beta release, so if you were one of the earliest adopters this will not have worked for you. The issue was fixed in the second beta release.

Future work 

Despite the first beta being out, work on this crucial feature is far from finished. Aside from the points already mentioned above, there are a few more notable areas that we plan to get addressed before the release ships:

  • After support for the Android bookmarks & history database was dropped, users who were previously using the built-in browser have no way to import their bookmarks and history into Fennec right now. We're currently working on adding this feature as a first priority. Eventually we will add export functionality as well (though we'd rather make Fennec so awesome that nobody ever wants to use the built-in browser ever again!).
  • There is currently no expiration in the history database. So although the format is a bit more compact than Places, it could theoretically grow unbounded if the user visits a lot of sites on his phone and keeps on using Firefox. The algorithms for this that Places uses are fairly complicated, so we need to study which parts are meaningful for a mobile device, and if we want to tune it differently for typical phone or tablet use scenarios.

9 comments:

  1. I'd like to comment on some things Places related:
    1. Places adapts expiration to the device, on desktop the database may go up to tens of MB, as you said, on mobile would be a few MB (in the past there was a bug slowing down expiration, and Sync didn't behave too well in that regards).
    2. While I agree some things in Places are too much normalized, I disagree on urls. They are the largest data, so it's positive to not repeat them, especially since they have an index. Avoiding this normalization data may explode much faster.
    3. Tags format sucks, we don't want them to be folders, and they will stop being in bug 424160. It was a dumb decision taken before FF3.

    We are actively working to simplify the APIs and remove any redundancies, as you probably saw. It's a pity we didn't have the jni bridge immediately, otherwise we could have kept using Places on mobile, and just use Sync to copy bookmarks and history to the system storage. Who knows, maybe one day we'll be able to re-merge the two systems, looking forward for that.

    Good job, I know managing this stuff looks simple, but not always is :)

    ReplyDelete
  2. Hi Marco,

    some comments:

    Note that on Android LocalDB the history URLs are not repeated for each visit, only the last visit is stored. So there is only URL redundancy for bookmarks that are also in the history (a very small number), not for repeated visits to the same URL. But we save a JOIN on almost every history/bookmarks operation. Not doing it for Favicons was a bad mistake on our side and it's mentioned in the article.
    moz_keywords is a nice example in Places of a fully redundant table that should not have been normalized. (But most users don't use keywords so it's not a big deal, I guess)

    As for having the bridge earlier: it would still have been necessary to duplicate all the Places DB handling logic in Java if we do not want to load the Gecko libraries to have access to it. So this would not have changed anything.

    ReplyDelete
  3. Ah good, yes, for bookmarks may be a minimal issue, good to know.
    moz_keywords is a mess, actually you never want a field that is mostly nulls for efficiency and fragmentation reasons, though who designed the Places schema was able to do normalization without removing the sucky keyword_id field. So I partially agree that it's bad, but exactly due to the fact users have just a few, I would have kept it separated still, maybe not even a bookmark!

    About the bridge, well, surely you should have duplicated some logic in Java, though on an existing DB. So you would have saved DB design, migration and the need to redesign a system from scratch (that also involves making same errors from the past sometimes, see favicons). You would also have gained more people able to work on it. So, I think in the end would have been a bit cheaper.
    But it's good to have alternatives to look into!

    ReplyDelete
  4. adding import/export functionality - so awesome...
    Too late!
    Fennec has no bookmark backup mechanism that I know of,
    so when Ff mobile uplifted 12 from Aurora to Beta,
    wiping my entire Beta channel bookmarks empty,
    --That-- was --Not-- awesome.

    Eddie Maddox

    ReplyDelete
  5. Hi Eddie,

    the import/export thing at the very end only talks about importing/exporting them to Android's built-in Browser (i.e. the one Google wrote). Firefox/Fennec already supports importing them from old versions of itself and that is fully automatic, in fact most of the above post is talking exactly about that feature.

    Your bookmarks and history are automatically updated to the new format upon first launch. You'll see a screen informing you of that because it can take a while.

    If this did not happen for you, it's very important to file a bug in Bugzilla describing exactly what you did and what happened - we did not receive any other reports of this and as far as we know the feature is working perfectly.

    Aside from this problem, if you require backup for your History/Bookmarks, please note that this has existed for a long time in the form of Sync, which is supported both by the old XUL Fennec and the new Native Fennec. Sync exists exactly to back up and transfer your settings/history/bookmarks between Firefox and Fennec installations.

    ReplyDelete
  6. I iust felt a need to comment because the lkack of tag suppoprt is a real hindrance to how I use Firefox. On the desktop, *all* of my boookmarks are organized via tags. That allows me to have a "check periodically" tag, where something tagged that way may be a webcomic that doesn't have an rss feed (tagged "webcomic") or other items. But I alsop have webcomics tagged as "investigate later" which I *don't* check periodically. My bookmarks simply don't work AT ALL in a purely hierarchical model. From the perspective of a user, I don't care that the way Places stores tags is horrific, I need the functionality. As a software developer, I feel much the same way for a different reason: you already said Native is using its own storage over sqlite, so how Places stores it becomes less relevant. why not use a tagging schema like tag_addict based system suggested in the highest-voted answer to http://stackoverflow.com/questions/664283/scalable-database-tagging-schema ?

    ReplyDelete
  7. Gah, tag_assoc. I hate phone keyboards, and I hate autocomplete.

    ReplyDelete
  8. (To clarify further, I'd like to be able to check my periodic bookmarks on my phone. The lack of tags makes this impossible)

    ReplyDelete
  9. The way Places stores tags is very relevant because we require Sync to be able synchronize between the two implementations. And Sync is currently the only way you could ever get tags, because there's no UI for them in Native Fennec.

    Places currently stores tags as a (hidden) hierarchy of bookmark folders, with the tags being the folder names and the relevant bookmarks being the children. So saying that you require tags because your bookmarks don't work in a hierarchical model is a contradiction - it's exactly how they're implemented in desktop Firefox.

    I agree that having tags working is a desirable feature, but quickly hacking something in that we already know is going to give problems down the road is a bad idea.

    Wrt the possible new tagging schemas, see the bug Marco already linked above.

    ReplyDelete