Just a quick one on the move to a single-server configuration in Pentaho 7. This resolved a long-running quirk of the Pentaho server stack, but don't take that as a recommendation for production installations! It's still very important to separate your DI and front-end analytic workloads, and I doubt we'll see anything other than the very smallest installations using the single server for both tasks simultaneously.
Separating the workloads gives you some important advantages:
- independent scaling (lower cost and no wasted resources)
- protecting each side from over-ambitious processing on the other
But don't take my word for it: Pedro said the same in the release announcement.
Luckily, the Pentaho docs on the website give clear instructions for adding and removing plugins from the server. The key thing is: don't install PDD (Dashboard Designer) or PAZ (Analyzer) on your DI server.
Final point: you can, of course, choose whether to extend the logical separation to the repository itself. Separating the repositories as well gives you ultimate control over your system, even if for now both are hosted on the same database.
So, following on from the first post in this series, here’s all the technical gubbins.
Firstly, how do you build PDI as an engine? Well, it's simple: you need to create a pom.xml and build it with Maven.
The key parts of that file are:
- Adding the Pentaho repository
- Defining the pentaho.kettle.version property
- Adding the core AWS Lambda Java libraries
- Figuring out that the VFS library version needs to be this weird one: 20050307052300
- And then the key point: using the Maven Shade plugin, which gathers up all the dependencies and bundles them into a single jar suitable for uploading directly to AWS.
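The post doesn't cover how to check the result, so here is a minimal sketch (my own addition, not part of the original build) of a quick sanity check: a small Java class you could run against the shaded jar to confirm that the PDI engine and the AWS Lambda runtime interface really were bundled in. The class names are real Kettle and AWS classes, but exactly which ones your function needs is an assumption, and the jar name in the comment is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Smoke test for the shaded ("fat") jar produced by the Maven Shade plugin:
 * checks that a few key classes can be loaded from the current classpath.
 * Hypothetical usage: java -cp target/pdi-lambda.jar:. ShadedJarCheck
 */
class ShadedJarCheck {

    // Assumed entry points the Lambda function needs at runtime.
    static final List<String> REQUIRED = Arrays.asList(
            "org.pentaho.di.core.KettleEnvironment",          // PDI engine bootstrap
            "org.pentaho.di.trans.Trans",                     // PDI transformation runtime
            "com.amazonaws.services.lambda.runtime.Context"); // AWS Lambda runtime interface

    /** Returns the names in classNames that cannot be loaded. */
    static List<String> findMissing(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = ShadedJarCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize = false: only check the class is present,
                // without running its static initialisers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = findMissing(REQUIRED);
        if (missing.isEmpty()) {
            System.out.println("All required classes found on the classpath.");
        } else {
            System.out.println("Missing from the classpath: " + missing);
            System.exit(1);
        }
    }
}
```

Checking a handful of entry-point classes is only a cheap smoke test; it won't catch every missing transitive dependency, so a test run of a real transformation is still worthwhile.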
What's next? Well, topics for the next few weeks include:
- The Java code wrapper to launch PDI
- Persistence (S3 / Redshift)
As we're heading into the crazy event season for the Pentaho community, there won't be another PLUG (Pentaho London Usergroup) until around December.
So, keep an eye on social media and your inboxes for the latest news on when and where PCM17 will be. Hint: it'll be in November again.
Also, don't forget that the official Pentaho World conference is on again this year in Orlando; that's one not to miss. You'll find the details on the Pentaho website.
Finally, Mark Hall, creator of Weka, is in town in early June, and there's a meetup with him where you can find out about "The future of machine learning".
If anyone wants to talk in December, put your hand up and let me know; otherwise, have a great summer. In a similar vein, if you have any feedback about the group, content, location or timings, send that too.