How can I export thousands of bookmarks to a database, given that JSON is schemaless, revisited?
Lost all my work when I registered in another tab. Again:
Since 1999 I have been accumulating bookmarks. Most of them relate to a single project, and these have been filed in a foo/yr/mo hierarchy. As the browser has evolved, some have been tagged, some have keywords, some have descriptions that I modified with commentary (marked by a 'GM: ' prefix), etc.
I need to bring this hierarchy into a (probably PostgreSQL) database for analysis, but JSON is schemaless. I've read the Canovas and Cabot paper Discovering Implicit Schemas in JSON Data (icwe2013-CanovasCabot.pdf). It's interesting work, but discovery tools would not tell me anything about, e.g., how the developers thought tags vs. keywords would be used.
I thought I would just ask for the schema (there is surely one in there), and any commentary Mozilla might care to provide. I would be happy to share the code in my intended data reduction pipeline.
All Replies (3)
The .json format used for bookmark backups has a well-defined structure, but I'm not sure where it is documented. The JavaScript function that Firefox uses when reading the file can be viewed in the online source code repository here:
http://mxr.mozilla.org/mozilla-release/source/toolkit/components/places/BookmarkJSONUtils.jsm#342
The actual bookmarks as used by Firefox are stored in the places.sqlite database. If you would rather work with a relational database than a JSON file, you may be able to extract the data directly. The SQLite Manager extension is handy for browsing the tables and trying ad hoc SQL queries. However, please back up your database first.
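If you go the places.sqlite route, a query along these lines might be a starting point. This is only a sketch: the table and column names (moz_bookmarks, moz_places, fk, parent, dateAdded) and the meaning of type = 1 are my assumptions about the schema, so check them against your own file in SQLite Manager first. The demo_connection helper is hypothetical and only builds a tiny in-memory stand-in so the example runs on its own.

```python
import sqlite3


def list_bookmarks(conn):
    """Return (parent_folder_id, title, url, date_added) rows for URL bookmarks."""
    query = """
        SELECT b.parent, b.title, p.url, b.dateAdded
        FROM moz_bookmarks AS b
        JOIN moz_places AS p ON p.id = b.fk
        WHERE b.type = 1          -- assumed: 1 = bookmark, 2 = folder
        ORDER BY b.dateAdded
    """
    return conn.execute(query).fetchall()


def demo_connection():
    """Hypothetical stand-in for places.sqlite with only the assumed columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT);
        CREATE TABLE moz_bookmarks (
            id INTEGER PRIMARY KEY, type INTEGER, fk INTEGER,
            parent INTEGER, title TEXT, dateAdded INTEGER
        );
        INSERT INTO moz_places VALUES (1, 'https://example.org/');
        INSERT INTO moz_bookmarks VALUES (10, 2, NULL, 1, 'foo', 100);
        INSERT INTO moz_bookmarks VALUES (11, 1, 1, 10, 'Example', 200);
    """)
    return conn


if __name__ == "__main__":
    for row in list_bookmarks(demo_connection()):
        print(row)
```

Against real data, you would open a copy of the file read-only instead of the demo connection, for example sqlite3.connect("file:places-copy.sqlite?mode=ro", uri=True), where places-copy.sqlite is a placeholder name for your backup copy.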
Here is a search of the Mozilla Developer Network website for json. Something there should be able to help you.
https://www.google.com/search?q=site:developer.mozilla.org%20json&ie=utf-8&oe=utf-8&lr=lang_en
Tags are stored as a separate folder in a JSON backup, and each bookmark is listed in the children array of the Tags folder.
{"index":2,"title":"Tags","id":4,"parent":1,"dateAdded":<epoch>,"lastModified":<epoch>,"type":"text/x-moz-place-container","root":"tagsFolder","children":[]}
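Since each folder in the backup is a node with a children array, as in the Tags entry above, the tree could be flattened into rows for a database load with a small recursive walk. This is a rough sketch, not a description of how Firefox itself reads the file. The "uri" key and the "text/x-moz-place" type used in the sample data are my assumptions about what bookmark nodes look like; only the container type and the title, dateAdded and children keys come from the example above.

```python
import json

# Folder type as it appears in the Tags example above.
CONTAINER = "text/x-moz-place-container"


def flatten(node, path=()):
    """Yield one dict per non-folder node, tagged with its folder path."""
    if node.get("type") == CONTAINER:
        here = path + (node.get("title", ""),)
        for child in node.get("children", []):
            yield from flatten(child, here)
    else:
        yield {
            "path": "/".join(path),
            "title": node.get("title"),
            "uri": node.get("uri"),            # assumed key for the bookmark URL
            "dateAdded": node.get("dateAdded"),
        }


# Made-up sample in the foo/yr layout described in the question.
sample = json.loads("""
{"title": "foo", "type": "text/x-moz-place-container", "children": [
  {"title": "1999", "type": "text/x-moz-place-container", "children": [
    {"title": "Example", "type": "text/x-moz-place",
     "uri": "https://example.org/", "dateAdded": 1}
  ]}
]}
""")

rows = list(flatten(sample))
```

Each row carries the folder path as a single string, so the foo/yr/mo hierarchy survives as a column you can split or group on once it is in the database.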