Monday, August 19, 2013

1010data Overview

UNDERLYING TECHNOLOGY

The 1010data analytics platform is an integrated stack that includes a proprietary database management system (DBMS), data integration tools, and a unique user interface (the Trillion-Row Spreadsheet℠). The DBMS also supports a wide range of other interfaces, including an XML application programming interface (API), ODBC, software development kits (SDKs), and a command-line tool.



DATABASE MANAGEMENT SYSTEM

1010data's proprietary DBMS is a columnar, massively parallel processing (MPP) DBMS with an advanced, dynamic in-memory capability. It uses all available RAM and runs entirely in memory when the data fits, and it manages memory use efficiently when the size of the data exceeds available RAM.
The next-generation columnar technology is the result of over three decades of experience in the development and use of columnar databases in demanding environments.
This combination of technologies gives the platform unmatched performance and scalability. In fact, 1010data regularly outperforms all competitors by an order of magnitude in highly controlled customer benchmarks covering all aspects of the data management process, including data loading, extraction, querying, concurrency, and resiliency.

IN-DATABASE ANALYTICS

The 1010data analytics platform incorporates a suite of advanced analytics not normally found in database management systems or spreadsheets. Methods such as time-series analysis, regression analysis, and statistical modeling are built into the platform, are easy to use, and are exceptionally fast.
This functionality is only part of a broader range of supported "in-database" computations that let analysts and data scientists easily perform work that would normally require complex MapReduce programming.

STREAMLINED DATA INTEGRATION

"Best Practices"

Raw data, as it comes from its original source, is often unsuitable for analysis and reporting. In many cases it is "dirty", containing incorrect or invalid information due to human input error or machine malfunctions. In other cases it is too voluminous to process using conventional technologies. Standard practice is therefore to clean and summarize the data as it is loaded into the database, so that all reporting starts with the transformed data. This is the familiar data integration approach using an ETL (extract, transform and load) process.

Challenges

There are significant problems with the standard approach.
  1. Almost all transformations result in information loss. In the case of summarization this is patently obvious, but it is true even when data is cleansed.
  2. It takes a long time to figure out how to clean all the data and summarize it, and then to develop the ETL processes. The database cannot be built and delivered until all that work is done.
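
The first problem can be illustrated with a small sketch in plain Python. This is not 1010data's query language, and the sample records and function names are made up for illustration. Once transactions are summarized to daily totals at load time, a question that was not anticipated, such as sales per store, can no longer be answered from the summary alone.

```python
from collections import defaultdict

# Hypothetical raw transactions: (date, store, amount).
raw = [
    ("2013-08-01", "A", 10.0),
    ("2013-08-01", "B", 5.0),
    ("2013-08-02", "A", 7.5),
]


def summarize_by_day(rows):
    """Typical load-time summarization: keep only the daily totals."""
    totals = defaultdict(float)
    for date, _store, amount in rows:
        totals[date] += amount
    return dict(totals)


def total_by_store(rows):
    """A later question that needs the store column."""
    totals = defaultdict(float)
    for _date, store, amount in rows:
        totals[store] += amount
    return dict(totals)


daily = summarize_by_day(raw)    # the store column is gone from this result
by_store = total_by_store(raw)   # answerable only because raw rows were kept
```

The daily summary has no store dimension, so `total_by_store` cannot be computed from `daily`; it needs the granular rows.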

1010data's Way

1010data customers are able to take a different path. Our platform is powerful enough to support the analysis of raw data at its most granular level, and fast enough to allow analysts to cleanse the data on the fly. This gives users much faster access to data and an unmatched ability to analyze it in new ways.
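
As a rough sketch of what cleansing on the fly can mean, the plain-Python example below keeps the raw records untouched and applies a validity rule only when a question is asked. It does not use 1010data's API; the field names, the sentinel value, and the plausibility range are assumptions chosen for illustration.

```python
# Hypothetical raw readings; the sentinel and the missing value stand in
# for "dirty" data from machine malfunctions and input errors.
raw_readings = [
    {"sensor": "s1", "temp_c": 21.5},
    {"sensor": "s1", "temp_c": -999.0},  # malfunction sentinel
    {"sensor": "s2", "temp_c": None},    # missing input
    {"sensor": "s2", "temp_c": 19.0},
]


def cleansed(rows, is_valid):
    """Yield only rows that pass the rule; the raw rows are never modified."""
    return (row for row in rows if is_valid(row))


def plausible(row):
    """Assumed rule: a reading must be present and within -50..60 degrees C."""
    temp = row["temp_c"]
    return temp is not None and -50.0 <= temp <= 60.0


def mean_temp(rows):
    """Average temperature over the given rows."""
    temps = [row["temp_c"] for row in rows]
    return sum(temps) / len(temps)


avg = mean_temp(cleansed(raw_readings, plausible))
```

Because the rule is applied at query time, an analyst who later decides on a different rule can swap in another predicate without reloading anything.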

SECURITY

1010data uses a risk-based approach to provide whatever level of security is appropriate, reasonable, and cost-effective for protecting your data, consistent with your regulatory and compliance regime (up to HIPAA-grade security). Our baseline offering places equipment in data centers with top-grade environmental and physical security, and adds excellent network and platform security on top of that, with supplemental security controls when needed. Logical and physical access are limited to specific, designated, trustworthy people who need it, and access can be restricted to specific tables and to rows containing only certain values in certain columns. Regular collaboration means we typically know our customers and can advise them on reducing risk through good data design (such as deidentifying risky data with low analytical value, or not storing it at all). We use strong cryptography to protect interactive sessions and to ensure the confidentiality and integrity of data in transit and in archival storage. Data access can be limited to users on a customer's network, and we allow customers to substantially manage and log their users’ access to data.
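
The idea of restricting access to specific tables, and to rows containing only certain values in certain columns, can be sketched in plain Python as follows. This is an illustration of the concept only, not how 1010data implements it; `AccessRule`, `visible_rows`, and the sample table are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRule:
    """Hypothetical rule: rows of `table` are visible when every listed
    column holds one of its allowed values. An empty `allowed` mapping
    grants the whole table."""
    table: str
    allowed: dict = field(default_factory=dict)  # column name -> set of values


def visible_rows(rules, table, rows):
    """Return only the rows of `table` that at least one rule permits."""
    table_rules = [rule for rule in rules if rule.table == table]
    return [
        row
        for row in rows
        if any(
            all(row.get(col) in values for col, values in rule.allowed.items())
            for rule in table_rules
        )
    ]


# Made-up sample data and a rule limiting one analyst to the "east" region.
claims = [
    {"claim_id": 1, "region": "east", "amount": 120.0},
    {"claim_id": 2, "region": "west", "amount": 80.0},
    {"claim_id": 3, "region": "east", "amount": 45.0},
]
analyst_rules = [AccessRule(table="claims", allowed={"region": {"east"}})]

east_only = visible_rows(analyst_rules, "claims", claims)
no_access = visible_rows(analyst_rules, "payments", claims)
```

A user with no rule for a table sees nothing from it, which matches the default-deny posture the paragraph describes.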

RELIABILITY

1010data combines complementary mechanisms to achieve high reliability. Data is hosted in one or more data centers with the highest-quality environment (power, cooling, and connectivity). Within a data center, the data is stored on multiple servers, both for reliability and for performance. Each server has RAID disk storage, so the failure of a hard disk only temporarily reduces performance. We offer several levels of business continuity planning (BCP) and disaster recovery protection, depending on a customer's risk analysis, requirements for recovery times, and the amount of data involved. The reliability of our software is ensured through strict version control, and new releases are deployed to a customer only after the customer has tested the new version and approved its use.