From IBM's website:
Every day, we create 2.5 quintillion bytes of data: so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos posted online, transaction records of online purchases, and cell phone GPS signals, to name a few.
Big data spans three dimensions: Volume, Velocity and Variety.
In the big data industry, among companies like GigaOm, Opera Solutions and IBM, the goal is to take very large data sets and run complex analytics against them to extract insight that would otherwise remain obscured. Extracting that insight takes a great deal of analytics.
Anyone who has worked on cases knows that data requiring expert assessment runs along a separate track, with a fairly direct route to consulting experts. Attorneys look at roll-up reports and emails; experts handle anything that requires interpretation. These types of data are certainly discoverable, but they are not the stuff one thinks of when contemplating "e-discovery".
Because of this, it is unlikely, despite the hoopla, that e-discovery will be directly impacted by big data. Unstructured data repositories will undoubtedly get bigger, and addressing them in e-discovery will pose challenges. E-discovery will also have to identify relevant documents that discuss the insights and information gleaned through big data processes.
But the vast ocean of data for which the term "big data" was coined will, in the main, likely pass through the litigation process without registering on e-discovery processes at all.