So, this is an interesting and academically fun task.
Say, we have the data from a call centre. You have various events, they have start and end times, and various attributes which tell you whether the event is productive or not.
However the data is not “dense”. For periods where they were not on a call, but were not doing anything else either there is no event for this. And this time is your available time when you *could* have been working on a call if there was one there.
So; A quick and clean way to measure available time is to make the data dense at the ETL level. Or so you’d think anyway. So you’d have a new event which you just create to fill the gap.
I got thinking how to do this in PDI and generally adding rows is either done with steps that expect you to know how many (such as clone rows) or joins. I think you could do a cunning join solution to this problem, but I found another way.
Using the UDJC of course! haha. BUT why? why Dan! Oh why won’t you think of those poor forlorn native steps..
Well here’s why:
- I’m not convinced there is a clean nice native way. (Waiting for Nelson to show me)
- In reality this is only a very very small number of lines of code. I’ve written it particularly verbosely with spaces and comments all over the place and it’s only 65 lines of code.
If you want to see my solution then you can find it in my github samples repo
You’ll note the transformation has 15 steps. 13 of them are generating test data! Yes; It took probably 2 if not 3 times as long to get the test data in the format as i wanted it, than it took to actually implement the gap filler.
Oh; There’s a bug too, Free bottle of honey ale to whoever spots what the issue is. Extra points for the fix too.
This whole solution does open up a can of worms though. Especially if you have events of mixed granularity time stamps. But hey ho, this is what we do every day, there’s always a can of worms to play with.