Companies of all sizes are considering data lakes as a way to manage terabytes of security data, which can support forensic investigations and provide early indicators of malicious or otherwise noteworthy behavior. Many are thinking about replacing their existing SIEM (security information and event management) systems with Hadoop running on commodity hardware.
Before your company jumps into the deep end, you need to weigh several critical factors. This O’Reilly report takes you through technological and design options for implementing a data lake. Each option not only supports your data analytics use cases, but is also accessible to processes, workflows, third-party tools, and teams across your organization.
Within this report, you’ll explore:
- Five questions to ask before choosing an architecture for your backend data store
- How data lakes can overcome scalability and data duplication issues
- Different options for storing context and unstructured log data
- Data access use cases covering both search and analytical queries via SQL
- Processes necessary for ingesting data into a data lake, including parsing, enrichment, and aggregation
- Four methods for embedding your SIEM into a data lake