Summary of key points about non-relational databases

What are the main categories of non-relational databases, and what are the characteristics of each?

A non-relational database (NoSQL) is a database that does not rely on a relational model and provides a more flexible and scalable way of storing data. Non-relational databases mainly include the following categories:

Column-oriented databases: this kind of database stores data by column and supports fast columnar scans and distributed computation. It is suitable for massive volumes of structured data, such as logs and sensor data.

Document databases: this kind of database stores data as documents and supports flexible queries and complex aggregations.

What are some common non-relational databases?

Several common non-relational databases:

1. MongoDB

MongoDB is one of the best-known NoSQL databases. It is a document-oriented, open source database that is scalable and accessible, and it is written in C++. MongoDB can also be used as a file system (through GridFS), and JavaScript can be used as its query language. MongoDB scales horizontally by sharding, and it fits well with popular JavaScript frameworks.

People particularly value its sharding, advanced text search, GridFS, and map-reduce features. Strong performance and a steady flow of new features put this NoSQL database at the top of our list.

Features: offers high performance; automatic sharding; runs on multiple servers; supports master-slave replication; data is stored as JSON-style documents; indexes any field in a document; it has an automatic load-balancing configuration because the data is placed in shards; supports regular expression searches; and is easy to manage in the event of a failure.

Pros: easy to install; MongoDB Inc. provides professional support to customers; supports ad hoc queries; fast; schema-less; horizontally scalable.

Disadvantages: no relational-style joins (newer versions offer only the $lookup aggregation stage); data size tends to be large because field names are stored in every document; document nesting depth is limited; can use more memory than necessary.
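As a rough illustration of how placing data in shards spreads load across servers, here is a minimal sketch of hash-based sharding. It is a toy model only and does not reproduce MongoDB's actual chunk and balancer mechanism; the function names `shard_for` and `distribute` and the `user_id` shard key are assumptions made for this example.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index using a stable hash (toy scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def distribute(docs, shard_key, num_shards):
    """Place each document on the shard chosen by its shard key value."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(str(doc[shard_key]), num_shards)].append(doc)
    return shards

# Hypothetical documents, sharded on a made-up "user_id" field.
docs = [{"user_id": i, "name": f"user{i}"} for i in range(1000)]
shards = distribute(docs, "user_id", 4)
print([len(s) for s in shards])  # sizes of the four shards
```

Because the hash is stable, the same key always lands on the same shard, so a lookup by shard key only needs to contact one server.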

2. Cassandra

Cassandra was originally developed at Facebook for inbox search. It is a distributed data storage system for processing large amounts of structured data, which is typically spread across many commodity servers. You can add storage capacity while keeping your service online, and doing so is straightforward. Since all nodes in the cluster are identical, there is no need to deal with complex configurations.

Cassandra is written in Java. Cassandra Query Language (CQL) is a SQL-like language for querying Cassandra databases. Cassandra is ranked second on our list of the best open source databases. Some of the biggest companies, such as Facebook, Twitter, Cisco, Rackspace, eBay, and Netflix, use Cassandra.

Features: linearly scalable; maintains fast response times; provides atomicity, isolation, and durability for writes with tunable consistency, rather than full ACID transactions; integrates with Apache Hadoop for MapReduce; maximum flexibility in distributing data; highly scalable; peer-to-peer architecture.

Benefits: highly scalable; no single point of failure; Multi-DC replication; tight integration with other JVM-based applications; better suited for multi-data center deployments, redundancy, failover, and disaster recovery.

Disadvantages: limited support for aggregation; unpredictable performance; no support for ad hoc queries.

3. Redis

Redis is a key-value store, and one of the best-known ones. Client libraries are available for many languages, including C++, PHP, Ruby, Python, Perl, and Scala. Redis is written in C and was originally released under the BSD license.

Features: automatic failover; keeps its database entirely in memory; transactions; Lua scripting; replicates data to any number of replica servers; keys can be given a limited time to live; LRU eviction of keys; publish/subscribe support.

Pros: supports many data types; very easy to install; very fast (about 110,000 SETs per second and about 81,000 GETs per second); operations are atomic; multipurpose tool (used in many use cases).

Disadvantages: no support for joins; knowledge of Lua required for stored procedures; datasets must fit in memory.
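Two of the features listed above, keys with a limited lifespan and LRU eviction, can be sketched in a few lines. This is a toy in-memory model under stated assumptions, not Redis's implementation (Redis uses its own expiry and approximated-LRU logic); the class name `TinyCache` and its methods are hypothetical.

```python
import time
from collections import OrderedDict

class TinyCache:
    """Toy in-memory key-value store with per-key TTL and LRU eviction."""

    def __init__(self, max_keys, clock=time.monotonic):
        self.max_keys = max_keys
        self.clock = clock            # injectable so examples can control time
        self._data = OrderedDict()    # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self.clock() + ttl
        self._data[key] = (value, expires_at)
        self._data.move_to_end(key)           # mark as most recently used
        while len(self._data) > self.max_keys:
            self._data.popitem(last=False)    # evict least recently used key

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self.clock() >= expires_at:
            del self._data[key]               # lazily expire on access
            return None
        self._data.move_to_end(key)
        return value

now = [0.0]                                   # fake clock for the example
cache = TinyCache(max_keys=2, clock=lambda: now[0])
cache.set("a", 1)
cache.set("b", 2, ttl=10)
cache.get("a")       # touch "a", so "b" becomes the least recently used key
cache.set("c", 3)    # capacity exceeded: "b" is evicted
```

The fake clock is only there to make the behavior easy to reason about; with the default `time.monotonic` the same class works against real time.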

4. HBase

HBase is a distributed, column-oriented, open source database. Its design comes from the Google paper “Bigtable: A Distributed Storage System for Structured Data” by Fay Chang et al. Just as Bigtable builds on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop.

HBase is a subproject of Apache’s Hadoop project. It differs from a typical relational database in that it is suited to storing unstructured data. Another difference is that HBase uses a column-based rather than a row-based schema.

5. Neo4j

Neo4j is known as a native graph database because it implements the property graph model all the way down to the storage layer. This means that the data is stored exactly as it would be drawn on a whiteboard, and the database uses pointers to navigate and traverse the graph. Neo4j has both a Community Edition and an Enterprise Edition. The Enterprise Edition includes everything the Community Edition provides, plus additional enterprise features such as backup, clustering, and failover.

Features: supports unique constraints; supports full ACID (atomicity, consistency, isolation, and durability) rules; Java APIs: a Cypher API and a native Java API; uses Apache Lucene for indexing; a simple query language, Neo4j CQL; includes a UI for executing CQL commands, the Neo4j Data Browser.

Benefits: easy to retrieve a node's neighbors or relationship details without joins or indexes; the Neo4j CQL query language is easy to learn; does not require complex joins to retrieve data; represents semi-structured data easily; high availability for large enterprise real-time applications; simplified tuning.

Disadvantages: no support for sharding.
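To make the idea of retrieving neighboring nodes without joins more concrete, here is a small sketch of a property graph held as adjacency lists. It is a toy in plain Python, not Neo4j's storage engine or API; the class `PropertyGraph`, its methods, and the `KNOWS` relationship are assumptions made for illustration.

```python
from collections import deque

class PropertyGraph:
    """Toy property graph: nodes and relationships both carry properties."""

    def __init__(self):
        self.nodes = {}      # node id -> properties
        self.outgoing = {}   # node id -> list of (rel_type, target id, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        self.outgoing.setdefault(node_id, [])

    def add_rel(self, source, rel_type, target, **props):
        self.outgoing[source].append((rel_type, target, props))

    def neighbors(self, node_id, rel_type=None):
        """Follow stored references directly; no join is involved."""
        return [t for (r, t, _) in self.outgoing[node_id]
                if rel_type is None or r == rel_type]

    def reachable(self, start, rel_type, max_hops):
        """Breadth-first traversal up to max_hops relationships away."""
        seen, queue = {start}, deque([(start, 0)])
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue
            for nxt in self.neighbors(node, rel_type):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
        seen.discard(start)
        return seen

g = PropertyGraph()
for name in ("alice", "bob", "carol", "dave"):
    g.add_node(name, kind="person")
g.add_rel("alice", "KNOWS", "bob", since=2019)
g.add_rel("bob", "KNOWS", "carol", since=2021)
g.add_rel("carol", "KNOWS", "dave", since=2022)
print(g.neighbors("alice", "KNOWS"))
print(sorted(g.reachable("alice", "KNOWS", max_hops=2)))
```

In a relational schema the two-hop question would typically need a self-join on a relationship table; here each hop is just a list lookup on the current node.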

Business Intelligence and Non-Relational Databases in Big Data

Big Data involves many technologies, and knowing them helps you understand the field better. This article focuses on two of them: business intelligence and non-relational databases.

1. Business Intelligence

Business intelligence is usually abbreviated as BI. It is a complete set of solutions for integrating an organization's existing data so that reports and a basis for decision-making can be provided quickly and accurately, helping companies make informed business decisions. BI has been defined as a category of technologies and applications consisting of data warehousing, query reporting, data analysis, data mining, data backup and recovery, and other components, with the purpose of helping enterprises make decisions. Transforming data into knowledge requires technologies such as data warehouses, online analytical processing (OLAP) tools, and data mining. Technically speaking, therefore, BI is not a new technology; it is a combination of data warehousing, OLAP, and data mining. This also shows how closely the terms related to big data are connected.

2. How should we think about business intelligence?

It is appropriate to think of business intelligence as a solution. The key to BI is to extract useful data from an enterprise's many operational systems and clean it to ensure its correctness. After extraction, transformation, and loading (ETL), the data is merged into an enterprise-level data warehouse, which gives a global view of the enterprise's data. On that basis, suitable query and analysis tools, data mining tools, OLAP tools, and so on are used to analyze and process the data, and the resulting knowledge is presented to managers to support their decisions. This is why business intelligence is in such demand.
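The flow described above (extract from several systems, clean, load into one warehouse, then query) can be sketched in miniature. This is only an illustration under stated assumptions: the two source lists, the `clean` and `run_etl` helpers, and the `sales` table are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical raw records from two operational systems.
crm_rows = [{"customer": " Alice ", "amount": "120.50"},
            {"customer": "bob", "amount": "n/a"}]
shop_rows = [{"customer": "ALICE", "amount": "30"},
             {"customer": "Carol", "amount": "75.25"}]

def clean(row):
    """Normalize customer names and drop rows whose amount is not a number."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None
    return (row["customer"].strip().lower(), amount)

def run_etl(sources):
    """Extract rows from each source, clean them, and load them into one table."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    for rows in sources:                                            # extract
        cleaned = [c for c in map(clean, rows) if c is not None]    # transform
        warehouse.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)  # load
    return warehouse

warehouse = run_etl([crm_rows, shop_rows])
report = warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(report)
```

The final query is the "global view" step: records that were spelled differently in the two source systems end up aggregated under one customer.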

3. Non-relational database

A non-relational database is commonly referred to as NoSQL. According to Baidu Encyclopedia, the term NoSQL first appeared in 1998 as the name of a lightweight, open source relational database developed by Carlo Strozzi that did not provide a SQL interface. In 2009, at a seminar on distributed open source databases, the concept of NoSQL was proposed again; this time it referred mainly to a non-relational, distributed database design pattern that does not provide ACID (the four properties of database transaction processing). Many data scientists now define NoSQL as “non-relational”, emphasizing the benefits of key-value stores and document databases. This is when NoSQL formally came to wide attention.

This article introduced business intelligence and non-relational databases. These are topics worth learning and becoming familiar with; if you intend to work in the big data industry, study them carefully.

HBase knowledge summary

HBase concept:

HBase is an open source, distributed, column-oriented, non-relational database for unstructured data, based on Bigtable, one of Google’s three major papers.

Its tables are tall, wide, and thick.

It was created to address the challenges posed by large-scale data sets with multiple data types, especially those of big data applications.

What it can do:

Store very large data sets and serve low-latency random queries against them.


Structured query language


Non-relational databases include columnar stores and document stores, which offer low-latency queries. HBase is a NoSQL database characterized by columnar storage.

Non-relational database, columnar storage: HBase

Non-relational database, document storage: MongoDB

Non-relational database, in-memory storage: Redis

Non-relational database, graph model: graph databases

What is the difference between Hive and HBase?

Hive is positioned as a data warehouse. Although it supports inserts, deletes, and queries, its deletes and queries operate on whole tables rather than single rows, and query latency is high. In essence, it is a data analysis tool that makes it convenient to use the power of MapReduce for offline analysis.

HBase is positioned as the database for Hadoop and is a typical NoSQL database, so it is used for low-latency random queries on large amounts of data.

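How HBase runs:

Single node (standalone) and pseudo-distributed

Single node: all HBase daemons run in a single JVM on one machine.

Pseudo-distributed: each daemon runs as a separate process, all on the same machine.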

HBase application scenarios:

Storing massive amounts of data and querying it with low latency.

An HBase table consists of multiple rows.

A row in HBase consists of a row key and the values of one or more columns. Rows are stored in lexicographic order by row key.
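The effect of keeping rows sorted by row key can be shown with a small in-memory model. This is a sketch under stated assumptions, not the HBase client API: `TinyTable`, `put`, `get`, and `scan` are hypothetical names, and the `info:name` column is made up for the example.

```python
import bisect

class TinyTable:
    """Toy table: rows kept in lexicographic order of their row key."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> {column name: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)   # keep keys sorted on insert
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan(self, start, stop):
        """Return rows with start <= row key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

t = TinyTable()
t.put("row10", "info:name", "j")
t.put("row2", "info:name", "b")
t.put("row1", "info:name", "a")
print([k for k, _ in t.scan("row1", "row2")])
```

Note that the ordering is by string comparison, so `row10` sorts before `row2`. This is why row keys holding numbers are often zero-padded when range scans in numeric order are wanted.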

What are the main categories of non-relational databases, and what are the characteristics of each?

NoSQL describes a large collection of data storage methods. According to how the data is structured and where it is applied, NoSQL databases fall mainly into the following categories.


(1) Column-Oriented.

Column-oriented storage is designed for retrieval. Its storage structure is columnar, unlike the row structure of relational databases. This structure makes many statistical aggregation operations simpler and more convenient and gives the system high scalability. This type of database also adapts to growth in data volume and to changes in data structure, which matches the needs of cloud computing. Typical representatives are Bigtable, used by Google App Engine, and HBase, the Hadoop subsystem built on the same design. It is worth noting that Bigtable is particularly well suited to MapReduce processing, which makes it highly adaptable to the development of cloud computing.
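Why a columnar layout helps statistical aggregation can be seen by holding the same records in both layouts. The sketch below uses plain Python lists and dicts with made-up field names; it illustrates the layout idea only and says nothing about how Bigtable or HBase store data on disk.

```python
# The same three records in a row layout and in a column layout.
rows = [
    {"user": "u1", "country": "DE", "bytes": 120},
    {"user": "u2", "country": "FR", "bytes": 300},
    {"user": "u3", "country": "DE", "bytes": 80},
]

columns = {
    "user": ["u1", "u2", "u3"],
    "country": ["DE", "FR", "DE"],
    "bytes": [120, 300, 80],
}

# Row layout: every whole record is visited to total one field.
total_from_rows = sum(r["bytes"] for r in rows)

# Column layout: only the one column that is needed is read.
total_from_columns = sum(columns["bytes"])

# A change in data structure: a new column is added without
# rewriting the existing columns.
columns["device"] = ["phone", "laptop", "phone"]

print(total_from_rows, total_from_columns)
```

Both totals are the same; the difference is how much data has to be touched to compute them, which matters once the table has many columns and many rows.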

(2) Key-Value.

Key-value storage is designed for high-performance concurrent read/write caching. Its structure resembles a hash table: each key corresponds to one value. It offers very fast lookups, large storage capacity, and highly concurrent operation, and it is well suited to querying and modifying data by primary key. Key-value databases are fast, easy to use, and have extremely high concurrent read/write performance, which makes them ideal as caching systems. MemcacheDB, BerkeleyDB, Redis, and Flare are representative key-value databases.
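One common way to use a key-value store as a caching system is the cache-aside pattern: look in the cache by key first and fall back to the slower database only on a miss. The sketch below is an assumption-laden illustration; a plain dict stands in for the key-value store, and `SlowStore` and `get_with_cache` are hypothetical names.

```python
class SlowStore:
    """Stand-in for a slower backing database; counts how often it is read."""

    def __init__(self, data):
        self.data = dict(data)
        self.reads = 0

    def load(self, key):
        self.reads += 1
        return self.data.get(key)

def get_with_cache(key, cache, store):
    """Cache-aside read: try the key-value cache first, then fall back."""
    if key in cache:
        return cache[key]
    value = store.load(key)
    if value is not None:
        cache[key] = value      # populate the cache for later reads
    return value

store = SlowStore({"user:1": "alice", "user:2": "bob"})
cache = {}                      # a dict standing in for the key-value store
first = get_with_cache("user:1", cache, store)
second = get_with_cache("user:1", cache, store)
print(first, second, store.reads)
```

The second read is served from the cache, so the backing store is read only once for that key.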

(3) Document-Oriented.

Document storage is designed for access to massive amounts of data. Its structure is very similar to key-value storage in that each key corresponds to one value, but the value is stored mainly as a JSON (JavaScript Object Notation) or XML document. This storage method maps easily onto object-oriented languages. This type of database can query data quickly within massive data sets; typical representatives are MongoDB and CouchDB.

NoSQL has advantages such as simple scaling, high concurrency, high stability, and low cost, but it also has problems. For example, NoSQL does not provide SQL support, which creates additional learning costs for developers; most NoSQL systems are open source software that is less mature than commercial relational database systems; and the architecture of NoSQL makes it difficult to guarantee data integrity, so it is suited to particular application scenarios.
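The difference from a plain key-value store is that the value is a structured document that can be queried by its fields. Here is a minimal sketch of that idea, assuming JSON documents and a dotted-path lookup; `TinyDocStore` and its methods are hypothetical and do not correspond to the MongoDB or CouchDB APIs.

```python
import json

class TinyDocStore:
    """Toy document store: each key maps to a JSON document."""

    def __init__(self):
        self._docs = {}   # document id -> JSON text

    def insert(self, doc_id, doc):
        self._docs[doc_id] = json.dumps(doc)   # stored as JSON text

    def get(self, doc_id):
        return json.loads(self._docs[doc_id])

    def find(self, path, expected):
        """Return ids of documents whose dotted `path` equals `expected`."""
        hits = []
        for doc_id, raw in self._docs.items():
            value = json.loads(raw)
            for part in path.split("."):
                value = value.get(part) if isinstance(value, dict) else None
            if value == expected:
                hits.append(doc_id)
        return hits

docs_db = TinyDocStore()
# Documents in the same store do not need to share a schema.
docs_db.insert("p1", {"name": "Alice", "address": {"city": "Berlin"}})
docs_db.insert("p2", {"name": "Bob", "tags": ["admin"]})
docs_db.insert("p3", {"name": "Carol", "address": {"city": "Berlin", "zip": "10115"}})
print(docs_db.find("address.city", "Berlin"))
```

A real document database would use indexes rather than scanning every document, but the nested-field query is the part that a pure key-value store cannot express.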

What is a non-relational database? How is it defined?

With the rise of Web 2.0 sites, non-relational databases have become an extremely popular new field, and non-relational database products are developing rapidly. Traditional relational databases struggle to cope with Web 2.0 sites, especially ultra-large-scale, highly concurrent, purely dynamic SNS sites, and have exposed many problems that are hard to overcome, such as:

1. High performance: the need for high-performance concurrent reads and writes

A Web 2.0 site generates dynamic pages and provides dynamic information in real time based on each user’s personalized information, so static page generation is basically unusable. The concurrent load on the database is therefore very high, often reaching tens of thousands of read and write requests per second. A relational database can barely hold up under tens of thousands of SQL queries per second, but under tens of thousands of SQL writes per second the disk I/O can no longer keep up. In fact, even ordinary BBS sites often need highly concurrent writes, for example JavaEye’s real-time statistics on online users, click counts for popular posts, and vote counts, so this is a fairly common requirement.

2. Huge storage: the need for efficient storage of and access to massive amounts of data

SNS sites like Facebook, Twitter, and Friendfeed generate massive amounts of user activity every day. Friendfeed, for example, generated 250 million user updates in one month; for a relational database, running SQL queries against a table of 250 million records is extremely inefficient, even intolerable. Another example is the user login system of large websites such as Tencent and Shanda, which hold hundreds of millions of accounts; relational databases also find these very difficult to handle.

3. High scalability and high availability: the need for a highly scalable, highly available database

In a web-based architecture, the database is the hardest component to scale horizontally. When the number of users and the traffic of an application grow day by day, the database cannot expand its performance and load capacity simply by adding more hardware and service nodes, as web servers and app servers can. For many websites that must provide 24/7 service, upgrading and scaling the database system is painful and often requires downtime for maintenance and data migration. So why can’t the database scale by continually adding server nodes?

Faced with the “three highs” described above, relational databases have met obstacles that are hard to overcome. At the same time, for Web 2.0 sites many of the main features of relational databases are often unnecessary, such as:

1. Database transaction consistency

Many real-time web systems do not require strict database transactions; their requirements for read consistency are very low, and in some cases the requirements for write consistency are not high either. Database transaction management therefore becomes a heavy burden under high load.

2. Real-time writes and real-time reads

In a relational database, a query issued immediately after inserting a piece of data is certain to read that data. Many web applications do not require such a high degree of real-time behavior. For example, after I (JavaEye’s robbin) post a message, it is perfectly acceptable for my subscribers to see it a few seconds, or even a dozen seconds, later.

3. Complex SQL queries, especially multi-table join queries

Relational databases therefore look less and less suitable in a growing number of scenarios, and non-relational databases emerged to solve these problems. In the past two years, a dazzling variety of non-relational databases, key-value stores in particular, have sprung up. Not long ago a NoSQL conference was held abroad at which all kinds of NoSQL databases made an appearance; counting those that did not appear but are well known, there are at least ten open source NoSQL databases, such as:

Redis, Tokyo Cabinet, Cassandra, Voldemort, MongoDB, Dynomite, HBase, CouchDB, Hypertable, Riak, Tin, Flare, Lightcloud, KiokuDB, Scalaris, Kai, ThruDB, ……

Some of these NoSQL databases are written in C/C++, some in Java, and some in Erlang, and each has its own distinctive features. There are too many to cover, so I (robbin) can only pick some of the more distinctive and seemingly more promising products to study.

Differences between relational and non-relational databases

Relational and non-relational databases differ in cost, query speed, storage format, scalability, data consistency, and transaction processing.

1. Cost: NoSQL databases are easy to deploy and do not require a costly purchase the way Oracle does.

2. Query speed: many NoSQL databases keep data in memory and do not have to go through a SQL parsing layer; relational databases store data on disk, so their query speed for simple lookups is typically lower.

3. Storage format: NoSQL can store data as key-value pairs, documents (text files), images, and other kinds of objects in a flexible way; relational databases support only basic types.

4. Scalability: relational databases rely on joins for multi-table queries, which limits how well they scale out. NoSQL is based on key-value pairs with no coupling between data items, so it scales horizontally with ease.

5. Data consistency: non-relational databases generally emphasize eventual consistency; relational databases emphasize strong consistency throughout the life cycle of the data.

6. Transaction processing: SQL databases support fine-grained control of transaction atomicity and convenient rollback; NoSQL systems may also support transactions, but less reliably, and their value lies in scalability and in processing large volumes of data.
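The contrast between eventual and strong consistency in the list above can be made concrete with a toy replication model. This is a simplified sketch under stated assumptions, not the replication protocol of any particular database; `EventuallyConsistentStore`, `Replica`, and the explicit `sync()` step are hypothetical and stand in for asynchronous replication.

```python
class Replica:
    """One copy of the data."""

    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Writes go to the primary; replicas catch up only when sync() runs."""

    def __init__(self, num_replicas):
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(num_replicas)]
        self._pending = []   # writes not yet applied to the replicas

    def write(self, key, value):
        self.primary.data[key] = value
        self._pending.append((key, value))

    def read_primary(self, key):
        return self.primary.data.get(key)

    def read_replica(self, index, key):
        return self.replicas[index].data.get(key)

    def sync(self):
        """Apply pending writes, in order, to every replica."""
        for key, value in self._pending:
            for replica in self.replicas:
                replica.data[key] = value
        self._pending.clear()

db = EventuallyConsistentStore(num_replicas=2)
db.write("post:1", "hello")
stale = db.read_replica(0, "post:1")   # replica has not caught up yet
db.sync()
fresh = db.read_replica(0, "post:1")   # replicas have converged
print(stale, fresh)
```

A read from the primary sees the write immediately, which corresponds to the strongly consistent case; a read from a replica may briefly miss it, which is the delay the earlier subscriber example describes as acceptable for many web applications.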