Description
Relational databases haven’t gone away, but they are evolving to integrate messy, disjointed unstructured data into a cleansed repository for analytics. By using massively parallel processing (MPP), the latest generation of analytic data warehouses is helping organizations move beyond business intelligence to a variety of advanced analytic workloads. These MPP databases expose their power through the familiar interface of SQL.
This report introduces the Greenplum Database, recently released as an open source project by Pivotal Software. Lead author Marshall Presser of Pivotal Data Engineering takes you through the Greenplum approach to data analytics and data-driven decisions, beginning with Greenplum’s shared-nothing architecture. You’ll explore data organization and storage, data loading, and running queries, as well as performing analytics in the database.
You’ll learn:
- How each networked node in Greenplum’s architecture features an independent operating system, memory, and storage
- Four deployment options to help you balance security, cost, and time to usability
- Ways to organize data, including distribution, storage, partitioning, and loading
- How to use Apache MADlib for in-database analytics, and GPText to process and analyze free-form text
- Tools available in the Pivotal Greenplum commercial database for monitoring, managing, and securing the system and for optimizing query response