A prominent public interest research and advocacy organization based in New Delhi focuses on promoting sustainable and equitable development.
For this project, the organization sought to migrate over 300,000 articles from four separate content databases into a unified Drupal platform.
Key Highlights:
- Migration of data (with massive data transformation and clean-up) from 3 different database sources
- Migration scripts for ongoing data migrations from MS-SQL to Drupal (MySQL)
- Over 8000 tags as part of the Taxonomy
- One of the largest installations of the powerful Apache Solr search for free text searching and tag-based (author and source as well) filtering of articles
- Use of Panels2 with node-queues (almost a mini-CMS implementation) to enable the editorial team to create tag-based in-depth pages at any time
About the project
The Goals
Our client was inspired to build an environment portal as part of the push for portal development from the National Knowledge Commission, Govt. of India. They decided to release 20 years of their research articles spread over 3 proprietary systems, including a Library Management System, a Content Management System written in ASP/MS-SQL for managing the website of their premier Science and Environment magazine - Down to Earth (DTE) - and some Access Databases.
The total number of records exceeded 250,000. They also had a very extensive tag vocabulary, or thesaurus, of library tags, organized as What, Where and Who lists. These represented, respectively, the nature of the articles, the geographies they covered, and the people associated with them, including the authors. All the articles in the library and DTE databases had been classified and tagged with these.
The goals were to:
- Migrate all 300,000 records from these scattered systems into one common system
- Have an excellent tagging system to cross-access content across the site
- Have a powerful search with content filtering, so that environment journalists and researchers can reach 100% of the available content
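The cross-access goal can be pictured as an index from each tag to the articles that carry it. The following is a minimal Python sketch under stated assumptions, not the portal's actual implementation (Drupal handles this through its taxonomy system). The article ids, tag values, and the `build_tag_index` helper are all hypothetical; only the What, Where and Who grouping comes from the project description.

```python
from collections import defaultdict

# Hypothetical articles, each tagged under the three thesaurus lists.
articles = {
    101: {"what": ["groundwater"], "where": ["Rajasthan"], "who": ["A. Author"]},
    102: {"what": ["air pollution"], "where": ["Delhi"], "who": ["A. Author"]},
    103: {"what": ["groundwater"], "where": ["Delhi"], "who": ["B. Writer"]},
}


def build_tag_index(articles):
    """Map (vocabulary, tag) pairs to the list of article ids carrying that tag."""
    index = defaultdict(list)
    # Iterate in id order so each list of article ids comes out sorted.
    for article_id in sorted(articles):
        for vocabulary, tags in articles[article_id].items():
            for tag in tags:
                index[(vocabulary, tag)].append(article_id)
    return dict(index)


tag_index = build_tag_index(articles)
groundwater_articles = tag_index[("what", "groundwater")]
delhi_articles = tag_index[("where", "Delhi")]
```

Keying the index on the (vocabulary, tag) pair keeps a place name in the Where list distinct from the same string appearing in the Who list.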
The Challenges:
Key challenges faced during migration to Drupal:
- The MS-SQL database was poorly designed, with redundant data
- There were some cases where UNIQUE fields had duplicate values
- Database normalization was missing, which hurt flexibility, data integrity and efficiency. We had to safeguard the database from anomalies
- Because the data had been populated through a Library Management tool, many special characters were stored in the database. Cleaning up the data was time-consuming, repetitive work for our team, made harder by the absence of any documentation of the database
- Designing the MySQL database to accommodate the specific concepts of box stories and cover stories required understanding the relationships between tables and data as they were maintained in the MS-SQL database. These tables contained many redundancies and anomalies.
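Two of these challenges, stray special characters and duplicate values in supposedly unique fields, lend themselves to a short illustration. The Python sketch below is not the team's actual migration script. It assumes the legacy rows have already been exported as dictionaries, and the column names (`article_code`, `title`), the sample values, and the choice of which characters count as debris are all hypothetical.

```python
import re
import unicodedata
from collections import Counter

# Control characters other than tab, newline and carriage return
# (assumed here to be unwanted debris from the legacy tool).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
_WHITESPACE_RUN = re.compile(r"\s+")


def clean_text(value: str) -> str:
    """Normalize Unicode, drop control characters, and collapse whitespace."""
    value = unicodedata.normalize("NFKC", value)
    value = _CONTROL_CHARS.sub("", value)
    return _WHITESPACE_RUN.sub(" ", value).strip()


def find_duplicates(rows, key):
    """Return cleaned values of `key` that occur more than once, with counts."""
    counts = Counter(clean_text(row[key]) for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical sample rows standing in for an exported legacy table.
rows = [
    {"article_code": "DTE-001", "title": "Water\x00 harvesting  in Rajasthan"},
    {"article_code": "DTE-002", "title": "Air quality\tin Delhi"},
    {"article_code": "DTE-001 ", "title": "Water harvesting in Rajasthan"},
]

cleaned_titles = [clean_text(row["title"]) for row in rows]
duplicates = find_duplicates(rows, "article_code")
```

Running the duplicate check on cleaned values rather than raw ones catches keys that differ only by trailing whitespace or invisible characters, which would otherwise collide only after import.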
The Solution
- We proposed to break the project into multiple phases, with the first being a pilot for data migration of the DTE database, which formed the bulk of the data, into an open source Content Management System.
- Subsequent phases set up Drupal (with a default theme) and built all the required modules to start showcasing the content in the desired manner. A key component of these phases was setting up a powerful search. Apache Solr was researched and selected as the search engine.
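To make free-text search with tag-based filtering concrete, the Python sketch below builds a Solr select URL. It is only an illustration: on the portal, queries would normally be issued through the Drupal Apache Solr module rather than by hand. `q`, `fq`, `rows` and `wt` are standard Solr request parameters, while the core URL, the `tags` and `author` field names, and the `build_solr_query` helper are assumptions made for this example.

```python
from urllib.parse import urlencode


def _phrase(value: str) -> str:
    """Quote a value as a Solr phrase, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_solr_query(base_url, text, tags=(), author=None, rows=10):
    """Build a Solr select URL with a free-text query plus filter queries."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    # Each fq narrows the result set without affecting relevance scoring.
    for tag in tags:
        params.append(("fq", f"tags:{_phrase(tag)}"))  # hypothetical field name
    if author:
        params.append(("fq", f"author:{_phrase(author)}"))  # hypothetical field name
    return f"{base_url}/select?{urlencode(params)}"


url = build_solr_query(
    "http://localhost:8983/solr/articles",  # hypothetical core URL
    "groundwater depletion",
    tags=["water", "Rajasthan"],
)
```

Putting tags in separate `fq` parameters, rather than folding them into `q`, lets Solr cache each filter independently, which suits a site where the same tag filters recur across many searches.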
Why Drupal was chosen
The initial choices for the project were Drupal or TYPO3. Drupal was eventually chosen, for the following reasons:
- Drupal had excellent core modules for vocabulary and tag management, whereas these would have had to be written for TYPO3
- Drupal's caching mechanism is better than TYPO3's; this was a critical requirement given the high volume of traffic that our client was expecting for the portal
- A powerful and fast search engine was required for searching through 250,000 articles. Drupal had an Apache Solr search module integrated with it, which was a candidate for implementing this search
Technical Specifications
Business Benefits:
- Over 300,000 articles from 4 disparate content databases were migrated to Drupal
- One of the largest installations of the powerful Apache Solr search for free text searching and tag-based filtering of articles