Home PC Games Linux Windows Database Network Programming Server Mobile  
           
  Home \ Linux \ Github inventory objects Algorithm     - Different between Linux file path and the windows (Linux)

- Hadoop 2.0 Detailed Configuration Tutorial (Server)

- Linux process scheduling opportunity (Programming)

- Automate deployment of Docker-based Rails applications (Server)

- The direct insertion sort algorithm (Programming)

- Systemd on RHEL7 (Linux)

- Mounting kit under Fedora Linux (Linux)

- Getting Started Linux Shell Scripting (Programming)

- Mounting Windows shared directory system under the Linux (Linux)

- How MySQL tracking sql statement (Database)

- Java-- get the reflection object information (Programming)

- IOS interview questions Summary (Programming)

- Secondary exponential smoothing prediction method implemented in Python (Programming)

- imp / exp Oracle Database import and export commands (Database)

- Oracle 11g can not export a variety of empty table solution (Database)

- RHEL 6.5 x86_64 CentOS yum configuration source (Linux)

- Gentoo: existing preserved libs problem solving (Linux)

- How to use tmpfs in RHEL / CentOS 7.0 (Linux)

- 10 useful Linux command Interview Questions and Answers (Linux)

- Installation and Configuration JDK8 In CentOS 7 (Linux)

 
         
  Github inventory objects Algorithm
     
  Add Date : 2018-11-21      
         
         
         
  $ Git clone https://github.com/torvalds/linux
Cloning into 'linux' ...
remote: Counting objects: 4350078, done.
remote: Compressing objects: 100% (4677/4677), done.
Receiving objects: 4% (191786/4350078), 78.19 MiB | 8.70 MiB / s

Section suggested that the remote code libraries, a total of 4,350,078 objects need to clone.

This is called "inventory objects" (counting objects), Github need calculated in real time, the total number of objects need to clone.

This process is very slow, according to the disclosure Github, like the Linux kernel such a huge library inventory time 8 minutes! That is, after issuing git clone command will do so on eight minutes, and then begin actual data transmission. This is of course intolerable. Github team has been trying to solve this problem.

Later, they finally found a new algorithm, now counted once as long as 3 milliseconds!

To understand this algorithm, you must first know what is Git object. Simply put, the object is a file, there are three most important objects.

Snapshot Object (Commit)
Directory object (Directory)
File object (File)
Each commit code, generates a commit object, which has a corresponding "Directory Object" in the name of the current. "Directory Object" to save the subdirectories and files in the root directory of the information contained in the code. Each subdirectory is another "directory objects," each file is "file object", which is the specific contents of the file.

Therefore, the "inventory objects" is the inventory of the various commit, directories, files, and so on. git clone git fetch operations and inventory objects required because of the need to know which object files to download in the end.

Inventory object's original algorithm is as follows.

It lists all the local branches of the latest in a commit
Lists all the remote branch the latest in a commit
Compare the two, as long as there are different, it means that the branch changed
Each commit changes occur, which are counted subdirectories and files for the specific changes
Dating back to the parent node of the current commit, repeat the fourth step, until local and remote history of consistency so far
The sum of all the objects need to be changed
The above description of the process, "Inventory Object" is a document traversal algorithm, the object will change to inventory was 11, which means that a large number of file reads. For large code base, this process is very slow.

New Algorithm Github team think, is to create a Bitmap index, that is, each commit to generate a binary value.

Open the local Github repository .git / objects / pack / directory, you will see an index file and a data file, they are Bitmap. Simply put, these two files indexed all the current object code libraries, and then use a binary value representing these objects. How many objects there are that many bits of the binary value. Its first n bits to represent the data file inside the n-th object.

Each commit will have a corresponding binary value, all objects contained in the current snapshot representation. These objects corresponding bit is 1, the other bits are 0.

This benefit is not read commit objects, just read the binary value, you will know which node contains the current commit. Even better, as long as the two binary values ​​do a XOR operation, we will know which bits (ie, which objects) changes have taken place. Moreover, because the new objects are always added to the existing binary bits, so long as those who read the extra bit, you know the current commit more than in the last commit what object.

As a result, "Inventory Object" becomes a binary value comparison operation, so fast. Further introduction, see the official documentation "Bitmap interpretation", "Bitmap format."

Currently, Github production environment has been deployed with this algorithm, users no longer have to count the objects, while waiting over. Moreover, Github team also merged it into Git, which means that, from now on all Git implementations can use the Bitmap function, so the future will definitely have more fun usage appears.
     
         
         
         
  More:      
 
- CentOS yum install LNMP PHP5.4 version (Server)
- Android judgment toward camera pictures (Programming)
- Ubuntu Thunderbird 24.4.0 (Linux)
- Bootstrap 3.3.5 release download, Web front-end UI framework (Linux)
- Ubuntu 14.04 install Sublime Text 3 plug and use SublimeClang (Linux)
- VMware virtual machine can not start VMnet0 no Internet access and other issues (Linux)
- Linux file compression and archiving (Linux)
- GAMIT10.5 under CentOS installation (Linux)
- JDK comes with tools JPS (Linux)
- C language binary tree counts words (Programming)
- Transfer MySQL database to MariaDB (Database)
- Joseph Central Java implementation (Programming)
- SYN attack hacker attack and defense of the basic principles and prevention technology (Linux)
- Oracle TAF Analysis (Database)
- RHEL6.4 one key installation Redmine (Linux)
- Java string concatenation techniques (StringBuilder tips) (Programming)
- CentOS 7 install Hadoop-cdh-2.6 (Server)
- Oracle 11g R2 RAC RMAN backup script example (Database)
- Use Mop monitor stock prices at the Linux command line (Linux)
- Talk Packages (Linux)
     
           
     
  CopyRight 2002-2022 newfreesoft.com, All Rights Reserved.