Hi,
I’ve already asked this question on Google Groups (https://groups.google.com/forum/?hl=en-GB#!topic/couchbase/Q_iVbL6eqig), but I’ll ask it here as well.
I have a problem with a view and/or rereduce. It’s hard to explain what it needs to do, but let me try:
I have historical data with some attributes that I need to summarise and aggregate over time, to end up with a nice-looking graph as the end result.
So my documents look something like this:
{
  "id": "20111003-140324-053-VODK0200000760010001.xml",
  "assetId": "1234",
  "date": 20121003140324052,
  "url": "https://blah.dk/sfsdf",
  "platform": "TEST",
  "lastModified": "2011-11-12T08:55:23.181Z",
  "licenseStart": "2011-11-15T00:00:00.000+02:00",
  "licenseEnd": "2013-09-14T22:58:58.000+02:00",
  "availability": "public"
}
So what I need in the end is a graph (in Graphite) that shows, for each “platform”, which of these assets (documents) were publicly available (“public”) and within their license dates at a given time.
I need to summarise this per month, for example, or per day (probably too much). My data goes back 2-4 years and I have about a million documents in the bucket.
For this I have written a view function (with help from Tug) that emits the platform and the dates in an array:
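function (doc, meta) {
  if (meta.type == "json") {
    if (doc.platform && doc.availability === 'public') {
      var startDate = new Date(doc.licenseStart);
      var endDate = new Date(doc.licenseEnd);
      // Emit one row per day of the license window.
      for (var d = startDate; d <= endDate; d.setDate(d.getDate() + 1)) {
        // dateToArray is a built-in helper available in Couchbase views.
        var dateAsArray = dateToArray(d);
        // Put the platform first so rows can be grouped by platform, then date.
        dateAsArray.unshift(doc.platform);
        emit( dateAsArray );
      }
    }
  }
}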
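To illustrate the shape of the keys this produces, here is a minimal sketch that runs outside Couchbase. `dateToArrayStandIn` and `keysForDoc` are hypothetical names, not Couchbase APIs. The stand-in assumes that the built-in returns [year, month, day, hours, minutes, seconds], and it uses UTC components so the output does not depend on the local time zone. The license window is shortened for illustration.

```javascript
// Hypothetical stand-in for the built-in dateToArray (assumption: UTC components).
function dateToArrayStandIn(d) {
  return [d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
          d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds()];
}

// Collects the keys the map function would emit for a single document.
function keysForDoc(doc) {
  var keys = [];
  if (doc.platform && doc.availability === 'public') {
    var end = new Date(doc.licenseEnd);
    // Step one day at a time through the license window.
    for (var d = new Date(doc.licenseStart); d <= end; d.setUTCDate(d.getUTCDate() + 1)) {
      keys.push([doc.platform].concat(dateToArrayStandIn(d)));
    }
  }
  return keys;
}

var sampleKeys = keysForDoc({
  assetId: '1234',
  platform: 'TEST',
  licenseStart: '2011-11-15T00:00:00.000+02:00',
  licenseEnd: '2011-11-17T00:00:00.000+02:00', // shortened window, illustration only
  availability: 'public'
});
console.log(sampleKeys); // three keys, one per day, each starting with 'TEST'
```

Note that the assetId does not appear in the key at all, so two versions of the same asset produce identical rows.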
This is all well and good, and it gives me roughly what I need, but I end up with duplicates in my map output and thus the wrong end result.
This is because I am storing all the old versions of each document, going back through history.
This means that a document with id “20111003-140324-053-VODK0200000760010001.xml” (with e.g. assetId: 1234) can be in Couchbase several times in different states, with different keys of course (but with the same assetId). This means that duplicates are present. Is there a way to make the view “distinct” over assetId, so that each row with the same assetId only counts once?
I can’t really see how, so I’ve written a really dumb reduce/rereduce function:
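function (key, values, rereduce) {
  if (!rereduce) {
    // First pass: count the mapped rows for this key.
    return values.length;
  }
  // On rereduce, values holds the results returned by earlier reduce calls.
  var count = 0;
  var uniqueList = [];
  for (var i = 0; i < values.length; i++) {
    // Reset for every value; otherwise one duplicate blocks all later values.
    var unique = true;
    for (var j = 0; j < uniqueList.length; j++) {
      if (values[i] === uniqueList[j]) {
        unique = false;
        break;
      }
    }
    if (unique) {
      uniqueList.push(values[i]);
      count++;
    }
  }
  return count;
}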
It’s really ugly-looking code and NOT optimal, and I am not even sure that it will work. Does anybody have an idea how to make my map output unique when it comes to “assetId”?
Something like DISTINCT in the SQL world.
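To make the desired result concrete, here is a minimal sketch in plain JavaScript (not a Couchbase view, and not a proposed solution) of the count being asked for: distinct assetIds per platform and month. The function name `distinctAssetsPerMonth`, the "platform/year/month" bucket format, and the three sample documents are all made up for illustration. It assumes an asset counts for a month if any version of it is public on at least one day of that month.

```javascript
// Counts distinct assetIds per platform and month (illustration only, not a view).
function distinctAssetsPerMonth(docs) {
  var seen = {}; // "platform/year/month" -> set of assetIds, stored as object keys
  docs.forEach(function (doc) {
    if (!doc.platform || doc.availability !== 'public') { return; }
    var end = new Date(doc.licenseEnd);
    // Step one day at a time through the license window (UTC, to stay deterministic).
    for (var d = new Date(doc.licenseStart); d <= end; d.setUTCDate(d.getUTCDate() + 1)) {
      var bucket = [doc.platform, d.getUTCFullYear(), d.getUTCMonth() + 1].join('/');
      seen[bucket] = seen[bucket] || {};
      seen[bucket][doc.assetId] = true; // the same assetId is recorded only once
    }
  });
  var counts = {};
  Object.keys(seen).forEach(function (bucket) {
    counts[bucket] = Object.keys(seen[bucket]).length;
  });
  return counts;
}

// Made-up sample data: two versions of asset 1234 plus one other asset.
var docs = [
  { assetId: '1234', platform: 'TEST', availability: 'public',
    licenseStart: '2011-11-15T00:00:00.000Z', licenseEnd: '2011-12-02T00:00:00.000Z' },
  { assetId: '1234', platform: 'TEST', availability: 'public',
    licenseStart: '2011-11-20T00:00:00.000Z', licenseEnd: '2011-11-25T00:00:00.000Z' },
  { assetId: '5678', platform: 'TEST', availability: 'public',
    licenseStart: '2011-12-01T00:00:00.000Z', licenseEnd: '2011-12-03T00:00:00.000Z' }
];
var monthlyCounts = distinctAssetsPerMonth(docs);
console.log(monthlyCounts); // { 'TEST/2011/11': 1, 'TEST/2011/12': 2 }
```

Asset 1234 is counted once for November even though two versions of it cover that month, which is the DISTINCT behaviour in question.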
Any ideas?
/Steffen