Integrating and updating domain knowledge with data mining
DeepDive is a project led by Christopher Ré at Stanford University.
Current group members include: Michael Cafarella, Xiao Cheng, Raphael Hoffman, Dan Iter, Thomas Palomares, Alex Ratner, Theodoros Rekatsinas, Zifei Shan, Jaeho Shin, Feiran Wang, Sen Wu, and Ce Zhang.
Like dark matter, dark data is the great mass of data buried in text, tables, figures, and images, which lacks structure and so is essentially unprocessable by existing software.
DeepDive helps bring dark data to light by creating structured data (SQL tables) from unstructured information (text documents) and integrating such data with an existing structured database.
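As a toy illustration of turning unstructured text into structured rows, the sketch below pulls (document, person, organization) tuples out of raw sentences with a hand-written pattern. The function name, the pattern, and the `employment` table it targets are illustrative assumptions, not part of DeepDive itself:

```python
import re

def extract_mentions(doc_id, text):
    """Extract (doc_id, person, organization) candidate tuples from raw text.

    A toy rule-based extractor: it pairs each two-word capitalized name
    preceding 'works at' with the organization token that follows it.
    """
    pattern = re.compile(r"([A-Z][a-z]+ [A-Z][a-z]+) works at ([A-Z][A-Za-z]+)")
    return [(doc_id, person, org) for person, org in pattern.findall(text)]

rows = extract_mentions("d1", "Ann Smith works at Stanford. Bob Jones works at Acme.")
# Each tuple in `rows` is ready to load into a SQL table such as
# employment(doc_id, person, org).
```

Real extractors replace the single regex with linguistic features and learned weights, but the end product is the same: relational rows that an ordinary database can store and query.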
Because users build their system end-to-end, DeepDive lets them focus on the portion of the system that most directly improves the quality of their application.
By contrast, previous pipeline-based systems required developers to build extractors, integration code, and other components without a clear idea of how their changes would improve the quality of the resulting data product.
As of 2017, the DeepDive project is in maintenance mode and no longer under active development.
DeepDive-based systems are used by people without machine-learning expertise in a number of domains, from paleobiology to genomics to human trafficking; see our showcase for examples.
Users should be familiar with DDlog or SQL, with relational databases, and with Python in order to build DeepDive applications or to integrate DeepDive with other tools.
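The Python side of an application is typically a small user-defined function that streams rows in and rows out; many DeepDive pipelines pipe tab-separated rows through such scripts via stdin/stdout. The sketch below shows that stream style on an assumed (doc_id, sentence) column layout; the function name and columns are illustrative, not DeepDive's exact interface:

```python
def tokenize_row(line):
    """Split one tab-separated (doc_id, sentence) input row and emit one
    tab-separated (doc_id, position, token) output row per whitespace token.

    Illustrative of the stream-style UDFs used in extraction pipelines:
    the framework feeds rows to the script and collects the rows it emits.
    """
    doc_id, sentence = line.rstrip("\n").split("\t")
    return [f"{doc_id}\t{i}\t{tok}" for i, tok in enumerate(sentence.split())]

# In a pipeline, input rows would arrive on stdin; here we run one literal row.
for out in tokenize_row("d1\tDeepDive finds facts\n"):
    print(out)
```

Keeping the UDF a pure line-to-lines function like this makes it easy to test outside the pipeline before wiring it into a DDlog or SQL workflow.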
A developer who wants to modify or improve DeepDive should have the basic background knowledge described in the DeepDive developer's guide.