Articles

Sharing Content

At the moment we have one Drupal install to hold all our sites (see previous article). Whilst this allows for easy content sharing, it isn’t scalable in the grand scheme of things, so we’re planning to move to an Aegir hosting system to deploy multiple different sites in an easily managed way. The problem is that we then won’t be able to share content between sites, which is one of our primary requirements. Having looked around, there doesn’t seem to be any module that fits our exact requirements, so a new custom module might be the way to go. What follows is a rough outline of how I see it working, but I’m very happy to hear comments / suggestions from others as to how best to tackle it.

Requirements

  • Provide a centralised content store of information
  • Make that data available to external sources
  • Allow external sources to query the content available and download content.
  • Maintain a link between the downloaded content and the content store and keep the two in sync.
    1. We need to be able to recognise when content has been changed manually and not overwrite that when doing the sync.
  • Information download should be automatic and manual
    1. Automatic: Set up parameters (taxonomy terms?) to download information automatically.
    2. Manual: Search for information and download
  • BONUS FEATURE: Allow for syncing of content on a field by field basis, i.e. including/excluding certain fields as required.
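
As a rough illustration of how the "don't overwrite manual changes" and "field by field" requirements might fit together, here is a minimal sketch in plain PHP. The function name `example_merge_fields()`, the array shapes and the idea of keeping a list of locally edited fields are all assumptions made for the example, not part of any existing module.

```php
<?php
/**
 * Merge fields from the content store into the local copy of an item.
 *
 * Hypothetical helper: any field named in $excluded (never synced) or in
 * $locally_edited (changed by hand on this site) keeps its local value.
 */
function example_merge_fields(array $local, array $remote, array $excluded = array(), array $locally_edited = array()) {
  $skip = array_merge($excluded, $locally_edited);
  foreach ($remote as $field => $value) {
    if (in_array($field, $skip, TRUE)) {
      // Leave the local value alone.
      continue;
    }
    $local[$field] = $value;
  }
  return $local;
}

// Made-up example data.
$local = array(
  'title' => 'Grant awarded',
  'body' => 'Edited by the site owner',
  'location' => 'Room 1',
);
$remote = array(
  'title' => 'Grant awarded to Professor A',
  'body' => 'Central text',
  'location' => 'Room 2',
);

// 'location' is excluded from the sync; 'body' was edited locally.
$merged = example_merge_fields($local, $remote, array('location'), array('body'));
print_r($merged);
```

Only the title is taken from the content store here; the body and location keep their local values.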

Design the Custom Module

Content Store

This is perhaps the easiest of the requirements to set up. Using CCK we can build content types for all our different types of information. We set up vocabularies and tag the information as required, thereby creating a searchable database of content. We’d also want to use the UUID module so that each content item has a unique ID that can be used to maintain the link between the store and the downloaded copy.

Content Store Services

The content store information needs to be made available externally, and the Services module provides an ideal way to do this. We’d need to set up the following services for use by the sync module.

  1. Retrieve taxonomy terms
    1. We want to be able to allow for the filtering of content based on taxonomy term. This may just be a user friendly way to cut down content, but we may use it to restrict searching of content to predefined areas.
  2. Retrieve content types
    1. We don’t want to download content for which we have no destination. So the resulting list would be filtered depending on which content types are available on the external install.
  3. Retrieve content summary
    1. UUID, title, *content type, *taxonomy terms, created, *updated
    2. The items with * would be filter terms based on what had been selected in the GUI.
  4. Retrieve nodes
    1. Send a list of UUIDs and get back the content items.
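
To make the "content summary" service a little more concrete, the sketch below filters a list of summary rows by content type and taxonomy term and returns them newest first as JSON. It is plain PHP with made-up data; the function name `example_content_summary()` and the row layout are assumptions, and a real implementation would sit behind the Services module rather than work on a hard-coded array.

```php
<?php
/**
 * Filter summary rows by content type and taxonomy term, newest first.
 *
 * Hypothetical sketch: an empty filter array means "no restriction".
 */
function example_content_summary(array $items, array $types = array(), array $terms = array()) {
  $matches = array();
  foreach ($items as $item) {
    if ($types && !in_array($item['type'], $types, TRUE)) {
      continue;
    }
    if ($terms && !array_intersect($terms, $item['terms'])) {
      continue;
    }
    $matches[] = $item;
  }
  // Order by last updated date, most recent first.
  usort($matches, function ($a, $b) {
    return $b['updated'] - $a['updated'];
  });
  return $matches;
}

// Made-up rows; timestamps are simplified to small integers.
$items = array(
  array('uuid' => 'aaa-111', 'title' => 'Grant awarded', 'type' => 'news', 'terms' => array('School of Research'), 'created' => 1000, 'updated' => 3000),
  array('uuid' => 'bbb-222', 'title' => 'Seminar', 'type' => 'event', 'terms' => array('School of Research'), 'created' => 1500, 'updated' => 2000),
  array('uuid' => 'ccc-333', 'title' => 'Open day', 'type' => 'news', 'terms' => array('School of Learning and Teaching'), 'created' => 500, 'updated' => 4000),
);

$summary = example_content_summary($items, array('news', 'event'), array('School of Research'));
echo json_encode($summary);
```

With these filters, only the two School of Research items come back, with the more recently updated news item first.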

There may be others that we’ll need, but for the time being that should be enough for our base requirements. As to how the data is presented to the receiver…I’m open to suggestions. I’d like to make them REST calls and return JSON, but that may change further down the development process.

Content Receiver

This is the real meat to this project and what is going to take the most time. I’ll expand on this in the future hopefully, but for the sake of time, here’s a rough idea of how the manual import would work. For the sake of keeping it simple, we’ll assume everyone can see everything.

  1. Retrieve the list of taxonomy terms in order to create a combo box for the filtering of content.
  2. Retrieve the list of available content types and filter according to what we have available locally.
  3. Based on the current taxonomy and content type selection, retrieve a list of nodes from the content store, ordered by last updated date.
  4. Display form to user, something along the lines of the mockup [image: ContentSyncMockup].
  5. User selects the content items that they want and presses “Sync Selected”.
  6. The selected UUIDs are requested from the content store and the information is inserted into the local database.
  7. Additionally, the UUIDs are inserted into a separate table that logs which UUIDs need to be checked to see if the content store has been updated.
  8. If a content item is changed locally, the UUID is removed from the table and the sync is broken.
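
Steps 6 to 8 boil down to a small amount of bookkeeping, sketched below in plain PHP. An in-memory array stands in for the separate tracking table, and the function names are hypothetical; in a real module the table would live in the database and the "break" would be triggered when a node is saved locally.

```php
<?php
// In-memory stand-in for the tracking table (hypothetical structure):
// uuid => timestamp of the content store version that was last imported.
$sync_table = array();

/** Record that a UUID has been imported and should be kept in sync. */
function example_record_sync(array &$sync_table, $uuid, $store_updated) {
  $sync_table[$uuid] = $store_updated;
}

/** Break the link, e.g. because the item was changed locally. */
function example_break_sync(array &$sync_table, $uuid) {
  unset($sync_table[$uuid]);
}

/** Is this UUID still linked to the content store? */
function example_is_synced(array $sync_table, $uuid) {
  return isset($sync_table[$uuid]);
}

// Two items are imported (steps 6 and 7)...
example_record_sync($sync_table, 'aaa-111', 3000);
example_record_sync($sync_table, 'bbb-222', 2000);

// ...and one of them is then edited locally (step 8).
example_break_sync($sync_table, 'bbb-222');

var_dump(example_is_synced($sync_table, 'aaa-111')); // bool(true)
var_dump(example_is_synced($sync_table, 'bbb-222')); // bool(false)
```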

For the automatic import, we’d probably need a screen like the following.

[image: SetupContentImport]

Cron

We’d need to set up a regular cron job that polls the content store to see whether any information needs to be added/updated, for both the manual and automatic versions.
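
The update half of that cron check could look something like the sketch below: compare the “updated” timestamp in the content store summary with the timestamp recorded when each item was last imported, and collect the UUIDs that have moved on. The function name and data layout are assumptions for illustration, and picking up brand new items for the automatic import would need a further check against the configured parameters.

```php
<?php
/**
 * Work out which tracked UUIDs have a newer version in the content store.
 *
 * Hypothetical sketch: $sync_table maps uuid => last imported timestamp,
 * $store_summary is a list of rows with 'uuid' and 'updated' keys.
 */
function example_uuids_to_refresh(array $sync_table, array $store_summary) {
  $stale = array();
  foreach ($store_summary as $row) {
    $uuid = $row['uuid'];
    // Untracked items (never imported, or sync broken) are ignored.
    if (isset($sync_table[$uuid]) && $row['updated'] > $sync_table[$uuid]) {
      $stale[] = $uuid;
    }
  }
  return $stale;
}

// Made-up data: 'aaa-111' has changed in the store, 'bbb-222' has not,
// and 'ccc-333' is not tracked locally.
$tracked = array('aaa-111' => 3000, 'bbb-222' => 2000);
$store_summary = array(
  array('uuid' => 'aaa-111', 'updated' => 3500),
  array('uuid' => 'bbb-222', 'updated' => 2000),
  array('uuid' => 'ccc-333', 'updated' => 9000),
);

$to_refresh = example_uuids_to_refresh($tracked, $store_summary);
print_r($to_refresh);
```

Only `aaa-111` is returned, so only that item would need to be requested again from the content store.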

Alternatives

There is an alternative to the above. Instead of the client having to poll the server, we could use the PubSubHubbub protocol to have information pushed out automatically.

Conclusion

As you can see, it’s a fairly complex module. I’d be more than happy to hear from anyone who thinks there are modules out there that can do this already. With so many modules available it’s easy to overlook the obvious.

Drupal

Sharing information is without a doubt something that universities are just not good at. We hold large amounts of really useful information in corporate systems, but it tends to sit there and very little actually gets distributed outwith the bounds of central services. Sometimes it’s down to the technicalities of sharing that information. If the system in question is not capable of making its data available in some kind of format, you’re stuck. However, sometimes it is more to do with the politics behind the data. I remember working in one place where getting the data into a format that my scripts could consume took no more than a day or so. Arranging the authorisation to actually get and use that data took over a year of being passed from pillar to post to get it rubber stamped.

One of the great failings of large organisations is not allowing the correct people to access the information that can help them do their job. Very often we’re aware that the information and systems are out there, but in the interests of timescales it is often easier to gather our own information / design our own systems than to try to plug into the corporate systems. This invariably leads to duplicate and incorrect information being stored, which in the realms of data protection opens the door to a myriad of potential legal actions.

If we as higher education institutions are to survive the 21st century we need to use the data we have to its maximum effect. 10-20 years ago the idea of sharing information and letting others outwith central services crunch data wasn’t the norm. However, with a host of data analysis tools now available, it’s to be expected that others should, and do, want to crunch the numbers and analyse the data for themselves.

Therefore as we move into the future and deploy new systems, data sharing must be thought of and built in from the start rather than being thought of later. Failure to do this results at best in increased workload for those on the fringes, at worst it results in lost income and opportunities.

Rant over; let me outline the ideas we had behind sharing information in a web context.

As I detailed in a previous article, the College of Life Sciences has to be able to cope with potentially over 100 different sites, each with a different look and feel. However one thing we’ve been very aware of for a long time is that a lot of the information presented on these sites is the same (or very similar). We were also aware that the same person could be updating several sites with the same information. Additionally, the information that was being presented was already being maintained centrally for the main College site. It therefore made sense that any CMS implementation had to allow for the sharing of this information across sites. In effect we needed a “create once, share often” methodology.

Sharing information is one thing, but we also wanted to ensure that by making information available, we weren’t simply adding to the already large burden of work those who were updating sites had. So it wasn’t simply enough to make the data available, but also to proactively push that information out to the sites that needed it. The result being that our information is publicised more widely than it ever was. To illustrate this, let me give you an example.

Let’s assume a fictional member of the College, and call him Professor A (very imaginative name, I think you’ll agree). Professor A works in the School of Research in the fictional division of Biological Research and has just been awarded a new multi-million pound grant to continue his research. As is the norm, the College writes a press release for the website and it is placed on the main College site. Previously the news item would have sat there in isolation until either links to it were manually made from other pages, or the text was copied manually onto other sites. You’ll already see not only the wasted effort, but the wasted opportunity in the above scenario. With our new system, by tagging the news item correctly, links to the item automatically appear on the School of Research site, the Division of Biological Research site and Professor A’s own personal website. The only additional effort has been a few extra clicks and it’s job done. Now, no matter at which level someone accesses information on Professor A, they’ll be able to see exactly what he’s been up to.

So we had a good method of sharing information, but we wanted to take it further still. We knew that information was being held in the College about everyone who worked there. Again, it was centrally held, albeit in a separate database, and contained information that was useful. So we went about integrating it as well. In our above scenario Professor A could potentially have his contact details on any and all of the sites. If those were to change for any reason, we’d have to change them in four different places. Now, because information is created once and used often, any changes made to the central authoritative source are reflected quickly and easily on the websites in question.

We pushed it further still. We held not only contact information on a person, but also organisational information, in other words, where in the College they worked. Using that information means that we can show, at all the different levels, who our Principal Investigators are, who is in their group, etc. And because this information is updated centrally, anyone moving divisions/groups or starting/leaving is reflected automatically without us having to lift a finger.

The above examples are by no means revolutionary, but hopefully they demonstrate the big plan that we have. Whilst we want to give our customers as much control as possible over their own sites and provide infrastructure and support that starts to push forward rather than catch up, we need to do that in a manageable and cost effective way. Our mantra is to develop once, deploy often. So as we design new functionality we always try to ensure that the work can be transferred easily to other areas if required.

Drupal

One of the big problems we have is that we have a staging server and then copy that information across to our live server, which is proxied via another server. Unfortunately, because Drupal uses absolute URLs, this breaks things like images. A quick change to the way ImageCache generates the URLs does the trick.

function cls2_imagecache($presetname, $path, $alt = '', $title = '', $attributes = NULL, $getsize = TRUE) {
  // Check is_null() so people can intentionally pass an empty array of
  // attributes to override the defaults completely.
  global $base_url;

  if (is_null($attributes)) {
    $attributes = array('class' => 'imagecache imagecache-'. $presetname);
  }
  if ($getsize && ($image = image_get_info(imagecache_create_path($presetname, $path)))) {
    $attributes['width'] = $image['width'];
    $attributes['height'] = $image['height'];
  }
  $attributes = drupal_attributes($attributes);
  $imagecache_url = imagecache_create_url($presetname, $path);
  // Strip the absolute base URL from the front of the image URL so that it
  // still resolves when the page is served through the proxy. preg_quote()
  // stops characters such as "." in the host name acting as wildcards.
  $regex = '#^' . preg_quote($base_url, '#') . '#';
  $final_url = preg_replace($regex, '', $imagecache_url);
  return '<img src="'. $final_url .'" alt="'. check_plain($alt) .'" title="'. check_plain($title) .'" '. $attributes .' />';
}
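
The URL stripping step can be tried on its own, outside Drupal. The sketch below uses a hypothetical helper name and a made-up staging host; it escapes the base URL with preg_quote() and anchors the match at the start of the string, so only a leading base URL is removed.

```php
<?php
/**
 * Turn an absolute URL into a root-relative one by removing the base URL.
 *
 * Hypothetical helper for illustration only.
 */
function example_strip_base_url($url, $base_url) {
  $pattern = '#^' . preg_quote($base_url, '#') . '#';
  return preg_replace($pattern, '', $url);
}

// Made-up host name and path.
$relative = example_strip_base_url(
  'http://staging.example.org/sites/default/files/imagecache/thumb/photo.jpg',
  'http://staging.example.org'
);
echo $relative;
// Prints: /sites/default/files/imagecache/thumb/photo.jpg
```

A URL that does not start with the base URL is returned unchanged.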
 

Drupal


The College of Life Sciences at the University of Dundee recently launched a new website based on the Drupal Content Management System. What follows is a description of how the site was built and the hurdles that we have had to overcome / are still overcoming in order to promote the College and boost awareness.

Background

The College of Life Sciences has a somewhat complicated organisational structure. The College itself is split into two schools (School of Learning and Teaching, School of Research). The School of Research is then split into twelve different divisions covering a variety of research areas. Each division is then split into a number of different research groups, which total approximately 80.

Each of these distinct areas (schools/divisions/groups) can have their own site (same domain for the most part but different directories, but we also need to be able to cope with different domains). However there is a lot of information that is common to each site that needs to be shared between them. This ranges from staff profiles (which are imported from an external source) to news and event items. Much of this information is maintained centrally, but needed to be filtered to each site automatically. Additionally, we also needed to provide individual site administrators with the ability to create their own news/event/content items that were specific to their own site.

To top it all off, we wanted to provide a staging system that separated the live site from the staging site to reduce the likelihood of problems on the staging site affecting the live site.

How we did it

Architecture

Our Drupal environment is made up of three servers: live, staging and development. In order to synchronise information between these servers we use scripts that make heavy use of drush. At their most basic these scripts copy the database, then the files and then turn on/off various modules as required. This allows us to mirror current information onto the development server for testing new themes/modules in a semi live environment so that we can be relatively sure they will work when copied onto the staging/live servers. Every hour everything from the staging site is copied across to the live site.

The live site is isolated from the rest of our infrastructure. Nobody has access to it, with information being pushed to it via drush from the staging site. In the event that the site is hacked, it should be a case of syncing with the staging site to remove any problems. The live site also sits behind a proxy server to further speed things up.

Drupal Install

We have a single Drupal 6 installation that manages all our different sites. Using a combination of modules such as Virtual Sites and Organic Groups we are able to deliver the look of different sites (same domain, different directory).

Managing Content

Utilising the CCK module, different content types were created for each distinctive content type (news, events, profiles, etc) and taxonomy created to represent the structure of the College (schools, divisions, groups, etc). As each content item is inserted, it is tagged with the area to which it belongs. As each Organic Group is created, it is tagged in a similar way, which allows us to design views that can be used in different sites to show information that is related to that site. It also allows us to enable users to create their own content items which aren’t replicated to other sites.

Performance

In order to boost performance we use the Boost module. This works really well for us as most of our traffic is anonymous. However it does pose problems when information is being replicated across to the live site. We came across an issue early on where old information wasn’t being expired when it was updated on the staging server and transferred to the live server. After many days of digging into Boost we discovered it was because of the way that it flagged stale content. After much fiddling we managed to fix this problem.

Problems

“Could you do it this way?”

Standardising views is a great idea in principle, but we’re starting to find that each group wants things slightly different. Unfortunately this has resulted in a large increase in the number of views we have to look after and is starting to get a bit unwieldy. Whilst not a huge problem in theory, it goes against our mantra of “develop once, deploy often”.

Modules, modules everywhere…

As we bring more groups onto the system we inevitably find that they want to do “new and exciting things”. The beauty of Drupal is that if you want to do it, there is most likely a module out there that will do it for you. However it means that our codebase is increasing, things are starting to slow down, and routine upgrades are becoming a real pain. Whilst Drush goes a long way towards making that easier, testing each site after each upgrade is tedious at best.

Performance

As I mentioned above the Boost module works well for us at the moment as most of our traffic is anonymous. However we have plans in the future for making the site more interactive and personalised. As Boost won’t cache authenticated content, we are going to have to look at other ways of speeding the site up once we get to this stage.

Staging

Whilst our plan for staging servers seemed sensible at the time, as we move towards a more personalised experience for users, it becomes impossible to work with. How exactly we replicate end user content from the live site to staging without overwriting anything that has been input since the last sync, and vice versa, is a tricky beast to figure out.

Future Plans

For our first foray into a Drupal site we’re quite happy. Compared to the system that we had it is infinitely more configurable and we can set up pretty much anything we want. However with potentially over 100 clients all wanting different things, our current implementation is going to struggle. We could limit what we’ll implement, we could insist on a one design to fit all, but with the web changing so quickly and user requirements changing all the time, we need an infrastructure that can react quickly and easily to those changes.

We’re currently looking into implementing Aegir. If successful this should give us the ability to spin out Drupal installs in a managed way. What we also need to get better at is trying to put as much configuration into code as possible in order to realise our “Develop Once, Deploy Often” mantra. Through a combination of Features and upcoming improvements in Drupal core for this kind of thing, it should be manageable.

The big hurdle to implementing Aegir is that it breaks our ability to share content between sites. How we go about that I’m still investigating; any and all suggestions are very welcome!

Drupal

An example custom type taken from our model.

<type name="cls:events">
  <title>CLS Events</title>
  <parent>cls:www</parent>
  <properties>
    <property name="cls:eventspeaker">
      <type>d:text</type>
      <multiple>true</multiple>
    </property>
    <property name="cls:eventcontact">
      <type>d:text</type>
      <multiple>false</multiple>
    </property>
    <property name="cls:eventlocation">
      <type>d:text</type>
      <multiple>false</multiple>
    </property>
    <property name="cls:eventdate">
      <type>d:datetime</type>
      <multiple>false</multiple>
    </property>
    <property name="cls:eventduration">
      <type>d:text</type>
      <multiple>false</multiple>
    </property>
    <property name="cls:eventtype">
      <type>d:text</type>
      <multiple>false</multiple>
    </property>
  </properties>
</type>

Config I've tried for the Drupal integration

$conf['cmis_sync_map'] = array(
  'page' => array(
    'enabled' => TRUE,
    'cmis_folderPath' => '/WebContent/Website'
  ),
  'cls_event' => array(
    'enabled' => TRUE,
    'cmis_folderPath' => '/WebContent/Repo/cls_events',
    'cmis_type' => 'cls:events',
    'fields' => array(
      'field_cls_eventspeaker' => 'cls:eventspeaker',
      'field_cls_eventcontact' => 'cls:eventcontact',
      'field_cls_eventlocation' => 'cls:eventlocation',
      'field_cls_eventdate' => 'cls:eventdate',
      'field_cls_eventduration' => 'cls:eventduration',
      'field_eventtype' => 'cls:eventtype'
    ),
    'subfolders' => TRUE,
    'full_sync_next_cron' => TRUE,
    'cmis_sync_cron_enabled' => TRUE,
    'cmis_sync_nodeapi_enabled' => TRUE
  )
);
 

Alfresco
