Wednesday, July 9, 2014

Taming the Rapids Feeding Your Data Lake

First, here is a good little page that explains Hadoop Data Lakes at a high level.

To put that into an analogy...

Just like a real lake is fed by rivers and streams, a data lake is fed by data rivers and data streams (binaries, flat files, Sybase, Oracle, MSSQL, etc.). Is your data stream currently fast and large enough to handle your company's or government organization's data flows?

With real rivers, heavy rains or a choked waterway (think "beaver dam") can quickly cause the river to overflow its banks and wreak considerable damage on the surrounding ecosystem. The same thing happens with data. When data arrives faster than you can read, process, and analyze it, the surrounding environment quickly becomes encumbered or disrupted (e.g., storage exhaustion, BI misinformation, application development delays, production outages). The same effects occur when constraints restrict your data flow (ticketing-system handoff delays between departments, inability to quickly refresh full data sets, cumbersome data rewind processes, etc.). And for every production data river, you will have 4-10 non-production tributaries (Dev, Test, Int, QA, Stage, UAT, Break/Fix, Training, BI, etc.). Time to build an ark.

The ebb and flow of data will come, often influenced by external factors beyond your control. The best you can do is be prepared and agile enough to adapt to the weather and adverse conditions. By virtualizing your non-production data sources with Delphix, you have "widened your banks" by 90%. You have enabled your engineers to develop "flood control systems" twice as fast, letting your systems quickly adapt to fast-evolving needs. By allowing them to test with full virtual environments, not simulations or sample data, you let your engineers know exactly how your systems will behave when called upon in live applications. No more simply hoping the dam will hold. Hope is not a strategy.

And those data rivers aren't just there to look pretty and enhance your view. They are there because you harness their power to benefit your company or government organization (e.g., leveraging CRM data), to irrigate other areas (e.g., feeding data warehouses), and as a means of agility and mobility (e.g., leveraging business intelligence to react to market conditions).

Delphix supports the Hadoop ecosystem by enabling you to efficiently and effectively handle the various data sources that feed the Hadoop Data Lake and all of their necessary downstream copies (staging, identification, curation, etc.), and by accelerating the application projects that use those data sources (masked and refined, if needed). Delphix delivers the right data, in the right manner (masked/unmasked), to the right team, at the right time.

Find out more about how to lift your applications and data out of the floodplain here:

Find out how Delphix helped Bobby Durrett of US Foods quickly restore production and save countless hours of overtime, in an unsolicited testimonial on his blog: