- Data ops really is a thing, but it’s just a name for what we’re all doing anyway!
- (Everyone who mentioned data ops started by trying to explain what it is!)
- Everyone is going cloud
- And if you’re going to go cloud, you really should re-architect, not just dump your existing app in as-is.
- Everyone is moving about between employers at a crazy pace!
- A LOT of ex-Pentaho folk there.
- Architecture
- Everyone has the same diagram.
- The "performance" solutions (e.g. GPUs) don't solve the problem per se; they simply let you get it all on one box, which hides the problem.
- Although, on that note, Snowflake genuinely does seem to solve the problem.
- Conspicuous in their absence?
- Talend not there.
- Hitachi Vantara not there.
- Neo4j, despite giving a talk, didn't have a stand. That's quite bizarre!
- Why were Mercedes themselves there? This I don't get at all. Tibco must have found some way to encourage them. Certainly there's nothing in it for Mercedes in being there.
- The data catalog guys: they were all very flexible ("if you already have a metastore then we'll use that, or we can be your metastore", etc.). This is really clever stuff. However, they're all ripe for acquisition; I can't see a "data catalog" company being sustainable as its own thing. Imagine if Pentaho combined IO Tahoe with their metadata-driven data ingestion framework... That would be amazing (and indeed, it's what I'm actually planning in the CE version!)
Anyway – the talks. This was a mixed bag. If you only attended Day 1, you missed out:
- Jay Kreps – Kafka keynote. Not much content in this; I guess being a keynote doesn't necessarily make a talk interesting.
- Tamr – good agile points. Good points about always allowing for data feedback. Get it out fast, and react.
- Attunity is actually still talking about Hadoop. Huh!
- Mercedes – One of 3 great talks.
- Concentrate on making sure their 200 analysts never look at boring data.
- They don't have that much data, only 15 TB per week.
- 60 GHz Wi-Fi lets them transfer 2 GB of data from the car, at 60 mph, in the time it takes to travel 100 m. Wow! And that includes security and handshaking!
- ITERATE!
- Zaf Khan – Arcadia Data. It turned into a bit of a sales pitch, BUT used the good old "use the right tool" adage.
- Serverless talk – this was good. I need to understand what (if anything) the difference is between AWS Lambda and FaaS!?
- Event thinking: events as the API, not commands.
- If you think about analytics, a lot of our day job is converting stateful data into events! A fact table is, by definition, a record of events…
- Domain-driven design (Greg Young)
- Cube.js – analytics on FaaS. Interesting.
- Matt Aslett, 451 Research – Future!
- Calling out the Cloudera acquisition for what it is! lol
- Total data warehouse
- Blockchain
- Agility and data ops
- Operationalisation
- Google
- New architecture, new possibilities
- Complexity kills innovation
- You have to solve data before you can do effective ML
- It's never the first mover that defines it; Google is rarely first
- Moving from client-server to fundamentally distributed
- Deploy…
- HSBC example: $57 per run, 6 minutes. This was for a process they had spent millions on, which used to take 6 days. Then they upgraded it and it still took 6 hours. Then BigQuery, and boom. Sorted.
- BigQuery/Dremel (the internal implementation) is a SQL interface that actually works; when you have that power and flexibility, amazing possibilities open up.
- The presenter's job was commercialising Google's internal tools.
- Interesting that Google Cloud has no graph DB. They must be using one internally, though?
- Jim Webber – Neo4j
- Very funny talk.
- Much needed at the end of the day
- Neo is clearly great. It’s finding a use for it that is the trick…
- Hannah Fry
- Amazing talk – if you've not seen her on TV, check her out on BBC4.
- City data expert – What a fun job!
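The serverless talk made the point that much of analytics is turning stateful data into events, with a fact table being a record of events. Here is a minimal Python sketch of that idea. It assumes state arrives as periodic snapshots (a dict of key to current value) and diffs two snapshots into created/updated/deleted events, which are roughly the rows a fact table would hold. `ChangeEvent` and `diff_snapshots` are hypothetical names for illustration only. They don't come from any product mentioned at the show, and a real pipeline would more likely use change data capture from the source system.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row of a hypothetical event/fact table derived from state."""
    key: str
    kind: str            # "created", "updated" or "deleted"
    old: Optional[Any]   # value in the earlier snapshot (None if created)
    new: Optional[Any]   # value in the later snapshot (None if deleted)


def diff_snapshots(before: Dict[str, Any], after: Dict[str, Any]) -> List[ChangeEvent]:
    """Turn two point-in-time snapshots of state into a list of change events."""
    events: List[ChangeEvent] = []
    # Keys in the earlier snapshot are either deleted, updated or unchanged.
    for key in sorted(before):
        if key not in after:
            events.append(ChangeEvent(key, "deleted", before[key], None))
        elif before[key] != after[key]:
            events.append(ChangeEvent(key, "updated", before[key], after[key]))
    # Keys that only appear in the later snapshot were created.
    for key in sorted(after.keys() - before.keys()):
        events.append(ChangeEvent(key, "created", None, after[key]))
    return events


if __name__ == "__main__":
    monday = {"acct-1": 100, "acct-2": 50}
    tuesday = {"acct-1": 120, "acct-3": 10}
    for event in diff_snapshots(monday, tuesday):
        print(event)
```

Unchanged keys produce no event. This is the point: the event stream records only what happened, while the snapshots record what is.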
Vendor Visits
- SQream – GPU-based DWH. Actually, there was no end of fast DWHs at the show; there must have been 10+. Nothing different here.
- InfluxData – impressive time-series DB, worth a look.
- Data Catalogs (ALL of these are very interesting: using ML on your metadata to improve quality and linkage)
- IO Tahoe
- Tamr
- Waterline
- Collibra (didn't visit this one)
- Snowflake – This has to be worth a look.