Managing the Data Lake


Moving to Big Data Analysis



Organizations across many industries have recently created fast-growing repositories to deal with an influx of new data from many sources, often in multiple formats. To manage these data lakes, companies have begun to leave the familiar confines of relational databases and data warehouses for Hadoop and other big data solutions. But adopting new technology alone won’t solve the problem.

Based on interviews with several experts in data management, author Andy Oram provides an in-depth look at common issues you’re likely to encounter as you consider how to manage business data. You’ll explore five key topic areas, including:

  • Acquisition and ingestion: how to automate, at least in part, the work of bringing in data from many sources.
  • Metadata: how to keep track of when data came in and how it was formatted, and how to make it available at later stages of processing.
  • Data preparation and cleaning: what you need to know before you prepare and clean your data, and what needs to be cleaned up and how.
  • Organizing workflows: what you should do to combine your tasks—ingestion, cataloging, and data preparation—into an end-to-end workflow.
  • Access control: how to address security and access controls at all stages of data handling.


O'Reilly Media, Inc.

O'Reilly Media spreads the knowledge of innovators through its books, online services, magazines, research, and conferences. Since 1978, O'Reilly has been a chronicler and catalyst of leading-edge development, homing in on the technology trends that really matter and galvanizing their adoption by amplifying "faint signals" from the alpha geeks who are creating the future. An active participant in the technology community, the company has a long history of advocacy, meme-making, and evangelism.
