Distributed Object Storage Concepts

What is distributed storage?

What is a distributed storage system?

It is the storage of data spread across multiple independent devices.

What is distributed storage? What is a better choice for distributed storage?

A distributed storage system spreads stored data across multiple independent devices. A traditional network storage system uses a centralized storage server to hold all the data; that server becomes the bottleneck for system performance and the focal point for reliability and security, so it cannot meet the needs of large-scale storage applications. A distributed network storage system uses a scalable architecture: multiple storage servers share the storage load, and location servers keep track of where the stored information lives. This not only improves the reliability, availability, and access efficiency of the system, but also makes it easy to expand.
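The split between storage servers and a location server can be illustrated with a minimal in-memory sketch. The class names `StorageServer` and `LocationServer` and the least-loaded placement rule are assumptions made for illustration; they are not taken from any particular product.

```python
class StorageServer:
    """Hypothetical storage server that keeps blobs in memory."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}


class LocationServer:
    """Hypothetical location server: records which storage server holds each key."""

    def __init__(self, servers):
        self.servers = servers
        self.index = {}  # key -> StorageServer

    def put(self, key, data):
        # Assumed placement rule: pick the server holding the fewest blobs,
        # so that the storage load is shared.
        target = min(self.servers, key=lambda s: len(s.blobs))
        target.blobs[key] = data
        self.index[key] = target
        return target.name

    def get(self, key):
        # Look up the location first, then read from that server.
        return self.index[key].blobs[key]


servers = [StorageServer(f"server-{i}") for i in range(3)]
locator = LocationServer(servers)
placements = [locator.put(f"file-{i}", f"payload-{i}".encode()) for i in range(6)]
```

In this sketch clients never need to know which server holds a file; they ask the locator. A real system would also have to make the location service itself redundant, which is omitted here.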

Lenovo's ThinkCloudAIO hyper-converged cloud all-in-one is a core product launched by Lenovo for enterprise users. ThinkCloudAIO integrates the cloud management platform, compute, network, and storage systems into a one-stop infrastructure-as-a-service solution, giving users a highly simplified infrastructure cloud platform. This not only shortens business deployment from weeks to days, but also completely decouples the platform from enterprise application software, middleware, and database software, which improves the efficiency of enterprise IT infrastructure operation and maintenance and the performance of key applications.

What is Distributed Data Storage


A distributed database uses a high-speed computer network to connect multiple physically dispersed data storage units into a logically unified database. The basic idea is to spread the data that would originally sit in one centralized database across multiple storage nodes connected through the network, in order to obtain larger storage capacity and higher concurrent access. In recent years, with the rapid growth of data volume, distributed database technology has developed rapidly, and traditional relational databases have begun to move from a centralized model to a distributed architecture. Relational distributed databases retain the data model and basic characteristics of traditional databases while moving from centralized storage to distributed storage and from centralized computation to distributed computation. A distributed database is expected to have the following characteristics:


1. High scalability: Distributed databases must be highly scalable, able to dynamically add storage nodes in order to achieve linear expansion of storage capacity.

2. High concurrency: Distributed databases must respond to large-scale user read/write requests in a timely manner and be able to randomly read and write massive amounts of data.

3. High Availability: Distributed databases must provide fault-tolerant mechanisms that can realize redundant backups of data to ensure a high degree of reliability of data and services.
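One common way to let storage nodes be added dynamically without reshuffling all existing data is consistent hashing. The sketch below is only an illustration of that idea; the text above does not say which placement scheme any particular database uses, and the `HashRing` class, the node names, and the virtual-node count are assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # MD5 is used only to spread keys evenly, not for security.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node contributes several points so load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # A key belongs to the first ring point at or after its hash,
        # wrapping around to the start of the ring.
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"row-{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # expand the cluster
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` ends up on the new node and the rest stay where they were, which is what makes adding capacity cheap in this scheme.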

What is the difference between distributed block storage and distributed file storage

Distributed file systems (DFS) and distributed databases both support storing, retrieving, and deleting data. A distributed file system is the coarser of the two and can be used for key/value access. A distributed database deals with more refined data: traditional distributed relational databases define a schema for data tuples, so storing, retrieving, and deleting happen at a finer granularity.

The best-known distributed file systems today are GFS (not open source) and HDFS (Hadoop Distributed File System). Well-known distributed databases include HBase and OceanBase. HBase is built on HDFS, while OceanBase uses its own internal distributed file system, so one can also say that a distributed database uses a distributed file system as its underlying storage.

Difference between unified storage and converged storage and distributed storage

Unified storage specific concepts:

Unified storage, in essence, is a networked storage architecture that can support file-based network-attached storage (NAS) as well as block-based SAN. It is also known as multiprotocol storage because it supports different storage protocols to provide data storage for host systems.

Basic introduction:

Unified storage (sometimes referred to as networked unified storage or NUS) is a storage system that can run and manage files and applications on a single device. To do this, a unified storage system consolidates file-based and block-based access on a single storage platform, supporting Fibre Channel-based SANs, IP-based SANs (iSCSI), and NAS (Network Attached Storage).

How it works:

Because it is a centralized disk array, it lets host systems perform file-level data access over an IP network or block-level data access over the Fibre Channel (FC) protocol on a SAN. iSCSI is likewise a widely used IP-based protocol, except that it provides block-level data access. These disk arrays are configured with multi-port storage controllers and a management interface that allows storage administrators to create storage pools or spaces on demand and make them available to host systems with different access types. The most common protocol combinations are NAS and FC, or iSCSI and FC. It is of course possible to support all three protocols at the same time, but most storage administrators choose either FC or iSCSI for block-level access and combine it with file-level access (NAS) to form unified storage.

Distributed storage supports multiple nodes, what is a node, a disk or a master?

"Node" is short for storage node. A storage node is usually a storage server (which necessarily has a controller), and the servers are interconnected by a high-speed network.

More and more storage servers now use ARM CPUs plus a disk array to save energy and improve the "capacity-to-energy ratio".

What are the main categories of distributed file systems?

Distributed storage has a firm place in big data, cloud computing, and virtualization scenarios, and in most of them it is vital. The following is a brief introduction to the history of file system development on *nix platforms:

1. Stand-alone file system

Used for local storage by operating systems and applications.

2. Network file system (NAS)

Built on the existing Ethernet architecture, it shares the data of a traditional file system between different servers.

3. Clustered file system

Built on shared storage, it uses a cluster lock so that different servers can share one traditional file system.

4. Distributed file system

Built on top of traditional file systems, it uses additional modules to distribute data across servers, and its integrated RAID protection lets multiple servers access and modify the same file system at the same time. Performance is high, scalability is very good, and cost is low.
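Distributing data across servers with RAID-style protection can be sketched with simple XOR parity, in the spirit of RAID 5. This is a toy illustration under assumed names (`stripe_with_parity`, `rebuild`); real distributed file systems may use replication or erasure codes instead.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_with_parity(data: bytes, data_servers: int):
    """Split data into one equal-size chunk per data server plus one XOR parity chunk."""
    chunk_len = -(-len(data) // data_servers)  # ceiling division
    padded = data.ljust(chunk_len * data_servers, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(data_servers)]
    parity = reduce(xor_bytes, chunks)
    return chunks, parity


def rebuild(chunks, parity, lost_index):
    """Recover the chunk at lost_index from the surviving chunks and the parity chunk."""
    survivors = [c for i, c in enumerate(chunks) if i != lost_index]
    return reduce(xor_bytes, survivors, parity)


original = b"distributed file systems spread data across servers"
chunks, parity = stripe_with_parity(original, data_servers=4)

# Pretend the server holding chunk 2 failed and rebuild its contents.
recovered = rebuild(chunks, parity, lost_index=2)
```

With one parity chunk the scheme tolerates the loss of any single server; tolerating more failures requires more redundancy.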

What are the distributed storage, and explain its basic implementation principles

The Shenzhou Yunke DCNNCSDFS2000 (DFS2000 for short) series is a storage system for big data. It uses a truly distributed, fully symmetric cluster architecture in which modular storage nodes are combined with data and storage management software. It offers load balancing of client connections across nodes, automatic balancing of capacity and performance, and optimization of cluster resources. It scales seamlessly from 3 to 144 nodes, with capacity and performance growing linearly as nodes are added, and a node can be added in 60 seconds to expand performance and capacity.

What is Hadoop Distributed File System

A distributed file system is one in which the physical storage resources managed by the file system are not necessarily attached directly to the local node, but are connected to the nodes through a computer network.

Hadoop is an open-source parallel computing framework and distributed file system developed by the Apache Software Foundation, similar in concept to MapReduce and Google File System.

HDFS (Hadoop Distributed File System) is part of it.

What is used in a distributed file storage system

I. Several ways to implement distributed sessions

1. Session sharing based on a database
2. Session sharing based on an NFS shared file system
3. Session sharing based on memcached (which raises the question of how to ensure the high availability of memcached itself)
4. The session replication mechanism built into the Resin/Tomcat web container
5. Session sharing based on TT/Redis or JBoss Cache
6. Cookie-based session sharing

Alternatively, the approaches can be grouped by how sessions are managed:

1. Session replication
Introduction: session data is broadcast from one machine to the rest of the machines in the cluster.
Usage scenarios: few machines, light network traffic.
Advantages: simple to implement, little configuration; user access is not affected when a machine in the network goes down.
Disadvantages: broadcast replication to the other machines involves some delay and adds network overhead.

2. Sticky sessions
Introduction: once a user has accessed a machine in the cluster, all subsequent requests are forced to land on that same machine.
Usage scenarios: a moderate number of machines, stability requirements that are not very demanding.
Advantages: simple to implement, easy to configure, no additional network overhead.
Disadvantages: when a machine in the network goes down, the user sessions on it are lost, so it easily becomes a single point of failure.

3. Centralized cache management
Introduction: sessions are stored on a machine in a distributed cache cluster; when a user accesses different nodes, the session information is first fetched from the cache.
Usage scenarios: many machines in the cluster, a complex network environment.
Advantages: good reliability.
Disadvantages: complex to implement; stability depends on the stability of the cache; a reasonable strategy is needed for writing session information into the cache.

II. Differences and connections between Session and Cookie, and how Session works

1. A session is stored on the server, and the client does not know its contents; a cookie is stored on the client, and the server can read its contents.
2. A session stores objects; a cookie stores strings.
3. A session does not distinguish between paths: while the same user visits a website, every session can be accessed from anywhere on the site. If a cookie is set with a path parameter, cookies under different paths of the same website cannot access each other.
4. A session needs cookies to work properly. If the client disables cookies entirely, the session stops working. HTTP is a stateless protocol; each time the client reads a web page, the server opens a new session ……
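The sticky-session and centralized-cache approaches described above can be contrasted in a small sketch. Everything here is an assumption for illustration: the server names, the `sticky_server` routing rule, and `CentralSessionStore`, which is a plain dictionary standing in for a shared cache such as memcached or Redis rather than a client for either.

```python
import zlib

SERVERS = ["web-1", "web-2", "web-3"]


def sticky_server(session_id: str, servers=SERVERS) -> str:
    """Sticky routing: the same session ID always maps to the same server."""
    return servers[zlib.crc32(session_id.encode()) % len(servers)]


class CentralSessionStore:
    """Stand-in for a shared session cache (a plain dict, not a real cache client)."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, session):
        self._data[session_id] = dict(session)

    def load(self, session_id):
        return dict(self._data.get(session_id, {}))


store = CentralSessionStore()
store.save("sess-42", {"user": "alice"})


def handle_request(server_name, session_id):
    # Any web server can serve the request because the session lives in the shared store.
    session = store.load(session_id)
    session["last_server"] = server_name
    store.save(session_id, session)
    return session


first = handle_request("web-1", "sess-42")
second = handle_request("web-3", "sess-42")  # a different server sees the same session
```

With sticky routing the session is lost if its server goes down; with the central store the session survives, but the store itself must then be made reliable.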

What are the types of distributed storage

File storage, block storage, and object storage are different types of service that a storage device provides, each suited to different usage scenarios.

"Distributed" describes how the storage is deployed: on a single machine or on a cluster of multiple devices. "Software-defined" is a broader concept: it means using software to do work that once required dedicated hardware. In other words, the storage places no special requirements on the hardware; general-purpose hardware plus storage software turns a server into a storage device. In fact, every storage device runs storage system software internally, whether or not it is called software-defined storage; singling out the term emphasizes the lack of special hardware requirements.

What is distributed storage?

A distributed storage system stores data dispersed across multiple independent devices. Traditional network storage systems use a centralized storage server to hold all the data; that server becomes the bottleneck for system performance and the focal point for reliability and security, so it cannot meet the needs of large-scale storage applications. A distributed network storage system uses a scalable architecture: multiple storage servers share the storage load, and location servers keep track of where the stored information lives. This not only improves the reliability, availability, and access efficiency of the system, but also makes it easy to expand.

Distributed and centralized storage

With centralized storage, the physical media are deployed in one place. Video streams are uploaded to the central site, which places high demands on the server room environment: a large amount of floor space, load-bearing capacity, air conditioning, and so on all have to be considered.

With distributed storage under centralized management, the physical media are spread across different geographic locations. Video streams are uploaded to nearby storage, which lowers the demand on backbone network bandwidth. Many low-end, small-capacity storage devices can be deployed in a distributed fashion, so equipment prices and maintenance costs are low, and such devices place few demands on the server room environment.

ChainQiao Education Online's Scholar Innovation Blockchain Technology Workstation is the only approved "blockchain technology professional" pilot workstation in the "Smart Learning Workshop 2020 – Scholar Innovation Workstation" program carried out by the School Planning and Construction Development Center of the Ministry of Education of China. The workstation aims to provide diversified growth paths for students, promote reform of the industry-university-research training model for professional graduate students, and build a system for cultivating application-oriented, interdisciplinary talent.

What is distributed storage?

Distributed storage comes in block, object, and file forms. There are open source projects such as Ceph, GlusterFS, Sheepdog, and Swift, and commercial implementations from Google, AWS, Microsoft, Kingsoft, Qiniu, UPYUN, and AliCloud Metacore Cloud, all built on somewhat different ideas and with a wide range of hardware options. There are many choices, and each has its own advantages and disadvantages.

What is object storage?

Block storage and file storage are not object storage

Object data composition structure

Unlike the way block storage and file storage manage data, object storage manages data in the form of objects. The biggest difference between objects and files is the addition of metadata on top of files. In general, objects are divided into three parts: data, metadata, and object id.

The data of an object is usually unstructured data, such as: images, videos, or documents; the metadata of an object refers to the relevant description of the object, such as: the size of the image, the owner of the document, etc.; and the object id is a globally unique identifier used to differentiate the object.
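The three parts of an object can be written down as a small data structure. This is a minimal sketch; the `StoredObject` class and its field names are assumptions, and the random UUID stands in for whatever globally unique identifier scheme a real object store would use.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object as described above: data, metadata, and an object ID."""

    data: bytes                                   # unstructured payload
    metadata: dict = field(default_factory=dict)  # description of the payload
    # Assumed ID scheme: a random UUID, unique with overwhelming probability.
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)


photo = StoredObject(
    data=b"\x89PNG...",
    metadata={"content_type": "image/png", "owner": "alice"},
)
report = StoredObject(data=b"quarterly numbers", metadata={"owner": "bob"})
```

The metadata travels with the object itself, which is the main difference from a plain file whose description lives in the file system.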

The three types of storage are fundamentally different in terms of data structure. Block storage has an array data structure, file storage uses a tree (B-tree, B+ tree, B* tree, and other variants), and object storage is basically a hash table.

Arrays and trees are well-worn topics and need little explanation. The hash table used by object storage is the core data structure of what is commonly called key-value storage: each object is given a UID (the "key"), a hash of the key is computed, and the result leads to the corresponding target (the "value").
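A hash-table lookup of this kind can be sketched as follows. The `FlatObjectStore` class, the bucket count, and the use of SHA-256 are assumptions chosen for illustration; the point is only that the key is hashed to find where the value lives, with no directory tree to walk.

```python
import hashlib


class FlatObjectStore:
    """Toy key-value store: the hash of the key selects the bucket holding the value."""

    def __init__(self, bucket_count=8):
        self.buckets = [dict() for _ in range(bucket_count)]

    def _bucket_for(self, key: str) -> dict:
        digest = hashlib.sha256(key.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.buckets)
        return self.buckets[index]

    def put(self, key, value):
        self._bucket_for(key)[key] = value

    def get(self, key):
        return self._bucket_for(key)[key]


store = FlatObjectStore()
# The key may look like a path, but it is a flat string; no directories exist.
store.put("videos/2020/intro.mp4", b"...video bytes...")
store.put("docs/readme.txt", b"hello")
```

Lookup cost does not depend on how deep a key "looks", since the slashes are just characters in the key.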

Key-value mapping is simple and direct. Computing a hash value is very fast, and this flat form of organization can grow very large while avoiding the depth of a tree, so it can support truly massive data storage and large-scale access. For that reason it is used not only by object storage but also by many NoSQL distributed databases, such as Redis, MongoDB, Cassandra, and Dynamo.

One Object Storage Disk Stores Multiple Websites

One object storage disk can store multiple websites because object storage is highly scalable and flexible. Object storage is a distributed storage architecture that scatters data across multiple nodes, each of which can store a large number of objects. This architecture makes it easy to scale storage capacity, whether for one site or several, by dynamically adding nodes and storage space as needed.

In addition, object storage is highly redundant and reliable. Each object is divided into multiple data fragments and replicated to different nodes, so that even if a node fails, no data is lost. Moreover, object storage supports automatic data repair and backup to ensure data reliability and persistence.
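The fragment-and-replicate behaviour described above can be sketched with a toy cluster. The `Cluster` class, the tiny fragment size, and the round-robin placement of copies are all assumptions for illustration, not the layout used by any specific object store.

```python
class Cluster:
    """Toy cluster: splits objects into fragments and keeps each on several nodes."""

    def __init__(self, node_names, replicas=2, fragment_size=4):
        # replicas must not exceed the number of nodes, or copies would collide.
        self.nodes = {name: {} for name in node_names}
        self.replicas = replicas
        self.fragment_size = fragment_size
        self.fragment_counts = {}

    def put(self, key, data: bytes):
        names = list(self.nodes)
        size = self.fragment_size
        fragments = [data[i:i + size] for i in range(0, len(data), size)]
        self.fragment_counts[key] = len(fragments)
        for idx, fragment in enumerate(fragments):
            # Assumed placement: copies go to consecutive nodes, round-robin.
            for r in range(self.replicas):
                node = names[(idx + r) % len(names)]
                self.nodes[node][(key, idx)] = fragment

    def fail(self, name):
        self.nodes[name] = None  # None marks a node that is offline

    def get(self, key):
        parts = []
        for idx in range(self.fragment_counts[key]):
            # Take the fragment from any node that is still online and holds it.
            copy = next(
                held[(key, idx)]
                for held in self.nodes.values()
                if held is not None and (key, idx) in held
            )
            parts.append(copy)
        return b"".join(parts)


cluster = Cluster(["n1", "n2", "n3"])
cluster.put("site-a/index.html", b"<html>hello</html>")
cluster.fail("n2")  # one node goes down
page = cluster.get("site-a/index.html")
```

Because every fragment has a copy on a second node, the object can still be read in full after a single node failure. Automatic repair, which would re-create the lost copies elsewhere, is not shown.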

Beyond scalability and reliability, object storage has other advantages. For example, it supports multiple access methods, including the standard HTTP/HTTPS protocols, which makes website content accessible through a simple API or URL. Object storage also provides flexible data management features such as access rights, version control, and data archiving, which can be configured as needed. These features make object storage well suited to storing and managing large-scale website data.

Distributed file/object storage system?

A distributed storage system addresses the need to store, access, and share massive amounts of data. Built on multiple storage nodes, it provides high-performance, highly reliable, and scalable data storage and access, and lets multiple users share access to the distributed storage nodes. The more popular distributed storage systems in the industry today include HDFS, OpenStack Swift, Ceph, GlusterFS, Lustre, AFS, and OSS.