Cheng Xiang Zhai

Cheng Xiang Zhai

ChengXiang Zhai is a computer scientist. He is a Donald Biggar Willett Professor in Engineering in the Department of Computer Science at the University of Illinois at Urbana-Champaign. == Biography == Zhai received the BS (1984), MS (1987, under Guoliang Zheng), and PhD (1990, under Jiafu Xu) in Computer Science from Nanjing University. He spent 1990 to 1993 working at Nanjing University's State Key Laboratory for Novel Software Technology. In 1993, he left for America to pursue a second PhD, this time at Carnegie Mellon University (CMU) with David A. Evans. Evans then left to spend more time with the company ClariTech. Zhai obtained from CMU a MS (1997) in computational linguistics and then started working with John Lafferty. He finally received from CMU a PhD in Language and Information Technologies in 2002. Since then, he has been an Assistant Professor (2002–2008), Associate Professor (2008–2013), Professor (2013–2018), and Donald Biggar Willett Professor (2018–) at the UIUC Department of Computer Science. He also holds joint appointments with the Carl R. Woese Institute for Genomic Biology, Department of Statistics, and School of Information Sciences at UIUC. == Awards == ACM SIGIR Gerard Salton Award, 2021, "for significant and sustained contributions to information retrieval and data science. His work has defined many of the theoretical foundations of the language modeling approach, yielding major insights into areas such as smoothing methods, relevance feedback, topic diversification, and text representations that incorporate positional information. He and his collaborators have also pioneered the axiomatic approach to information retrieval, which continues to provide inspiration for retrieval model and evaluation research." ACM SIGIR Academy inductee, 2021 ACM Fellow, 2017, "for contributions to information retrieval and text data mining." ACM SIGIR Test of Time Award, 2016, for paper A study of smoothing methods for language models applied to Ad Hoc information retrieval ACM SIGIR Test of Time Award, 2016, for paper Document language models, query models, and risk minimization for information retrieval ACM SIGIR Test of Time Award, 2014, for paper Beyond independent relevance: methods and evaluation metrics for subtopic retrieval ACM Distinguished Member, 2009 Presidential Early Career Award for Scientists and Engineers (PECASE), 2004, "for his work on user-centered, adaptive intelligent information access. His techniques expect to improve search-engine performance, support better information organization and enable understanding of large volumes of information. Zhai's work in information retrieval is expected to enhance curricula and provide new educational tools for the growing information technology workforce." ACM SIGIR Best Paper Award, 2004, for paper A formal study of information retrieval heuristics == Personal == Zhai's son Alex has earned three medals at the International Mathematical Olympiad.

Neural scaling law

In machine learning, a neural scaling law is an empirical scaling law that describes how neural network performance changes as key factors are scaled up or down. These factors typically include the number of parameters, training dataset size, and training cost. Some models also exhibit performance gains by scaling inference through increased test-time compute (TTC), extending neural scaling laws beyond training to the deployment phase. == Introduction == In general, a deep learning model can be characterized by four parameters: model size, training dataset size, training cost, and the post-training error rate (e.g., the test set error rate). Each of these variables can be defined as a real number, usually written as N , D , C , L {\displaystyle N,D,C,L} (respectively: parameter count, dataset size, computing cost, and loss). A neural scaling law is a theoretical or empirical statistical law between these parameters. There are also other parameters with other scaling laws. === Size of the model === In most cases, the model's size is simply the number of parameters. However, one complication arises with the use of sparse models, such as mixture-of-expert models. With sparse models, during inference, only a fraction of their parameters are used. In comparison, most other kinds of neural networks, such as transformer models, always use all their parameters during inference. === Size of the training dataset === The size of the training dataset is usually quantified by the number of data points within it. Larger training datasets are typically preferred, as they provide a richer and more diverse source of information from which the model can learn. This can lead to improved generalization performance when the model is applied to new, unseen data. However, increasing the size of the training dataset also increases the computational resources and time required for model training. With the "pretrain, then finetune" method used for most large language models, there are two kinds of training dataset: the pretraining dataset and the finetuning dataset. Their sizes have different effects on model performance. Generally, the finetuning dataset is less than 1% the size of pretraining dataset. In some cases, a small amount of high quality data suffices for finetuning, and more data does not necessarily improve performance. Many scaling laws, due to their inherent diminishing returns nature, value data based on a submodular set function which was shown in a paper on this topic. === Cost of training === Training cost is typically measured in terms of time (how long it takes to train the model) and computational resources (how much processing power and memory are required). It is important to note that the cost of training can be significantly reduced with efficient training algorithms, optimized software libraries, and parallel computing on specialized hardware such as GPUs or TPUs. The cost of training a neural network model is a function of several factors, including model size, training dataset size, the training algorithm complexity, and the computational resources available. In particular, doubling the training dataset size does not necessarily double the cost of training, because one may train the model for several times over the same dataset (each being an "epoch"). === Performance === The performance of a neural network model is evaluated based on its ability to accurately predict the output given some input data. Common metrics for evaluating model performance include: Negative log-likelihood per token (logarithm of perplexity) for language modeling; Accuracy, precision, recall, and F1 score for classification tasks; Mean squared error (MSE) or mean absolute error (MAE) for regression tasks; Elo rating in a competition against other models, such as gameplay or preference by a human judge. Performance can be improved by using more data, larger models, different training algorithms, regularizing the model to prevent overfitting, and early stopping using a validation set. When the performance is a number bounded within the range of [ 0 , 1 ] {\displaystyle [0,1]} , such as accuracy, precision, etc., it often scales as a sigmoid function of cost, as seen in the figures. == Examples == === (Hestness, Narang, et al, 2017) === The 2017 paper is a common reference point for neural scaling laws fitted by statistical analysis on experimental data. Previous works before the 2000s, as cited in the paper, were either theoretical or orders of magnitude smaller in scale. Whereas previous works generally found the scaling exponent to scale like L ∝ D − α {\displaystyle L\propto D^{-\alpha }} , with α ∈ { 0.5 , 1 , 2 } {\displaystyle \alpha \in \{0.5,1,2\}} , the paper found that α ∈ [ 0.07 , 0.35 ] {\displaystyle \alpha \in [0.07,0.35]} . Of the factors they varied, only task can change the exponent α {\displaystyle \alpha } . Changing the architecture optimizers, regularizers, and loss functions, would only change the proportionality factor, not the exponent. For example, for the same task, one architecture might have L = 1000 D − 0.3 {\displaystyle L=1000D^{-0.3}} while another might have L = 500 D − 0.3 {\displaystyle L=500D^{-0.3}} . They also found that for a given architecture, the number of parameters necessary to reach lowest levels of loss, given a fixed dataset size, grows like N ∝ D β {\displaystyle N\propto D^{\beta }} for another exponent β {\displaystyle \beta } . They studied machine translation with LSTM ( α ∼ 0.13 {\displaystyle \alpha \sim 0.13} ), generative language modelling with LSTM ( α ∈ [ 0.06 , 0.09 ] , β ≈ 0.7 {\displaystyle \alpha \in [0.06,0.09],\beta \approx 0.7} ), ImageNet classification with ResNet ( α ∈ [ 0.3 , 0.5 ] , β ≈ 0.6 {\displaystyle \alpha \in [0.3,0.5],\beta \approx 0.6} ), and speech recognition with two hybrid (LSTMs complemented by either CNNs or an attention decoder) architectures ( α ≈ 0.3 {\displaystyle \alpha \approx 0.3} ). === (Henighan, Kaplan, et al, 2020) === A 2020 analysis studied statistical relations between C , N , D , L {\displaystyle C,N,D,L} over a wide range of values and found similar scaling laws, over the range of N ∈ [ 10 3 , 10 9 ] {\displaystyle N\in [10^{3},10^{9}]} , C ∈ [ 10 12 , 10 21 ] {\displaystyle C\in [10^{12},10^{21}]} , and over multiple modalities (text, video, image, text to image, etc.). In particular, the scaling laws it found are (Table 1 of ): For each modality, they fixed one of the two C , N {\displaystyle C,N} , and varying the other one ( D {\displaystyle D} is varied along using D = C / 6 N {\displaystyle D=C/6N} ), the achievable test loss satisfies L = L 0 + ( x 0 x ) α {\displaystyle L=L_{0}+\left({\frac {x_{0}}{x}}\right)^{\alpha }} where x {\displaystyle x} is the varied variable, and L 0 , x 0 , α {\displaystyle L_{0},x_{0},\alpha } are parameters to be found by statistical fitting. The parameter α {\displaystyle \alpha } is the most important one. When N {\displaystyle N} is the varied variable, α {\displaystyle \alpha } ranges from 0.037 {\displaystyle 0.037} to 0.24 {\displaystyle 0.24} depending on the model modality. This corresponds to the α = 0.34 {\displaystyle \alpha =0.34} from the Chinchilla scaling paper. When C {\displaystyle C} is the varied variable, α {\displaystyle \alpha } ranges from 0.048 {\displaystyle 0.048} to 0.19 {\displaystyle 0.19} depending on the model modality. This corresponds to the β = 0.28 {\displaystyle \beta =0.28} from the Chinchilla scaling paper. Given fixed computing budget, optimal model parameter count is consistently around N o p t ( C ) = ( C 5 × 10 − 12 petaFLOP-day ) 0.7 = 9.0 × 10 − 7 C 0.7 {\displaystyle N_{opt}(C)=\left({\frac {C}{5\times 10^{-12}{\text{petaFLOP-day}}}}\right)^{0.7}=9.0\times 10^{-7}C^{0.7}} The parameter 9.0 × 10 − 7 {\displaystyle 9.0\times 10^{-7}} varies by a factor of up to 10 for different modalities. The exponent parameter 0.7 {\displaystyle 0.7} varies from 0.64 {\displaystyle 0.64} to 0.75 {\displaystyle 0.75} for different modalities. This exponent corresponds to the ≈ 0.5 {\displaystyle \approx 0.5} from the Chinchilla scaling paper. It's "strongly suggested" (but not statistically checked) that D o p t ( C ) ∝ N o p t ( C ) 0.4 ∝ C 0.28 {\displaystyle D_{opt}(C)\propto N_{opt}(C)^{0.4}\propto C^{0.28}} . This exponent corresponds to the ≈ 0.5 {\displaystyle \approx 0.5} from the Chinchilla scaling paper. The scaling law of L = L 0 + ( C 0 / C ) 0.048 {\displaystyle L=L_{0}+(C_{0}/C)^{0.048}} was confirmed during the training of GPT-3 (Figure 3.1 ). === Chinchilla scaling (Hoffmann, et al, 2022) === One particular scaling law ("Chinchilla scaling") states that, for a large language model (LLM) autoregressively trained for one epoch, with a cosine learning rate schedule, we have: { C = C 0 N D L = A N α + B D β + L 0 {\displaystyle {\begin{cases}C=C_{0}ND\\L={\frac {A}{N^{\alpha }}}+{\frac {B}{D^{\beta }}}+L_{0}\end{cases}}} where the variables are C {\displaystyle C} is the cost o

Radio code

A radio code is any code that is commonly used over a telecommunication system such as Morse code, brevity codes and procedure words. == Brevity code == Brevity codes are designed to convey complex information with a few words or codes. Specific brevity codes include: ACP-131 Aeronautical Code signals ARRL Numbered Radiogram Multiservice tactical brevity code Ten-code Phillips Code NOTAM Code === Operating signals === Brevity codes that are specifically designed for use between communications operators and to support communication operations are referred to as "operating signals". These include: Prosigns for Morse code 92 Code, Western Union telegraph brevity codes Q code, initially developed for commercial radiotelegraph communication, later adopted by other radio services, especially amateur radio. Used since circa 1909. QN Signals, published by the ARRL and used by Amateur radio operators to assist in the transmission of ARRL Radiograms in the National Traffic System. R and S brevity codes, published by the British Post Office in 1908 for coastal wireless stations and ships, superseded in 1912 by Q codes X code, used by European military services as a wireless telegraphy code in the 1930s and 1940s Z code, also used in the early days of radiotelegraph communication. == Other == Morse code is commonly used in amateur radio. Morse code abbreviations are a type of brevity code. Procedure words used in radiotelephony procedure, are a type of radio code. Spelling alphabets, including the ICAO spelling alphabet, are commonly used in communication over radios and telephones. == Other meanings == Many car audio systems (car radios) have a so-called 'radio code' number which needs to be entered after a power disconnection. This was introduced as a measure to deter theft of these devices. If the code is entered correctly, the radio is activated for use. Entering the code incorrectly several times in a row will cause a temporary or permanent lockout. Some car radios have another check which operates in conjunction with car electronics. If the VIN or another vehicle ID matches the previously stored one, the radio is activated. If the radio cannot verify the vehicle, it is considered to be moved into another vehicle. The radio will then request for the code number or simply refuse to operate and display an error message such as "CANCHECK" or "SECURE".

Interstellar communication

Interstellar communication is the transmission of signals between planetary systems. Sending interstellar messages is potentially much easier than interstellar travel, being possible with technologies and equipment which are currently available. However, the distances from Earth to other potentially inhabited systems introduce prohibitive delays, assuming the limitations of the speed of light. Even an immediate reply to radio communications sent to stars tens of thousands of light-years away would take many human generations to arrive. == Radio == The SETI project has for the past several decades been conducting a search for signals being transmitted by extraterrestrial life located outside the Solar System, primarily in the radio frequencies of the electromagnetic spectrum. Special attention has been given to the Water Hole, the frequency of one of neutral hydrogen's absorption lines, due to the low background noise at this frequency and its symbolic association with the basis for what is likely to be the most common system of biochemistry (but see alternative biochemistry). The regular radio pulses emitted by pulsars were briefly thought to be potential intelligent signals; the first pulsar to be discovered was originally designated "LGM-1", for "Little Green Men." They were quickly determined to be of natural origin, however. Several attempts have been made to transmit signals to other stars as well. (See "Realized projects" at Active SETI.) One of the earliest and most famous was the 1974 radio message sent from the largest radio telescope in the world, the Arecibo Observatory in Puerto Rico. An extremely simple message was aimed at a globular cluster of stars known as M13 in the Milky Way Galaxy and at a distance of 30,000 light years from the Solar System. These efforts have been more symbolic than anything else, however. Further, a possible answer needs double the travel time, i.e. tens of years (near stars) or 60,000 years (M13). == Other methods == It has also been proposed that higher frequency signals, such as lasers operating at visible light frequencies, may prove to be a fruitful method of interstellar communication; at a given frequency it takes surprisingly small energy output for a laser emitter to outshine its local star from the perspective of its target. Other more exotic methods of communication have been proposed, such as modulated neutrino or gravitational wave emissions. These would have the advantage of being essentially immune to interference by intervening matter. Sending physical mail packets between stars may prove to be optimal for many applications. While mail packets would likely be limited to speeds far below that of electromagnetic or other light-speed signals (resulting in very high latency), the amount of information that could be encoded in only a few tons of physical matter could more than make up for it in terms of average bandwidth. The possibility of using interstellar messenger probes for interstellar communication — known as Bracewell probes — was first suggested by Ronald N. Bracewell in 1960, and the technical feasibility of this approach was demonstrated by the British Interplanetary Society's starship study Project Daedalus in 1978. Starting in 1979, Robert Freitas advanced arguments for the proposition that physical space-probes provide a superior mode of interstellar communication to radio signals, then undertook telescopic searches for such probes in 1979 and 1982.

Distributed operating system

A distributed operating system is system software over a collection of independent software, networked, communicating, and physically separate computational nodes. They handle jobs which are serviced by multiple CPUs. Each individual node holds a specific software subset of the global aggregate operating system. Each subset is a composite of two distinct service provisioners. The first is a ubiquitous minimal kernel, or microkernel, that directly controls that node's hardware. Second is a higher-level collection of system management components that coordinate the node's individual and collaborative activities. These components abstract microkernel functions and support user applications. The microkernel and the management components collection work together. They support the system's goal of integrating multiple resources and processing functionality into an efficient and stable system. This seamless integration of individual nodes into a global system is referred to as transparency, or single system image; describing the illusion provided to users of the global system's appearance as a single computational entity. == Description == A distributed OS provides the essential services and functionality required of an OS but adds attributes and particular configurations to allow it to support additional requirements such as increased scale and availability. To a user, a distributed OS works in a manner similar to a single-node, monolithic operating system. That is, although it consists of multiple nodes, it appears to users and applications as a single-node. Separating minimal system-level functionality from additional user-level modular services provides a "separation of mechanism and policy". Mechanism and policy can be simply interpreted as "what something is done" versus "how something is done," respectively. This separation increases flexibility and scalability. == Overview == === The kernel === At each locale (typically a node), the kernel provides a minimally complete set of node-level utilities necessary for operating a node's underlying hardware and resources. These mechanisms include allocation, management, and disposition of a node's resources, processes, communication, and input/output management support functions. Within the kernel, the communications sub-system is of foremost importance for a distributed OS. In a distributed OS, the kernel often supports a minimal set of functions, including low-level address space management, thread management, and inter-process communication (IPC). A kernel of this design is referred to as a microkernel. Its modular nature enhances reliability and security, essential features for a distributed OS. === System management === System management components are software processes that define the node's policies. These components are the part of the OS outside the kernel. These components provide higher-level communication, process and resource management, reliability, performance and security. The components match the functions of a single-entity system, adding the transparency required in a distributed environment. The distributed nature of the OS requires additional services to support a node's responsibilities to the global system. In addition, the system management components accept the "defensive" responsibilities of reliability, availability, and persistence. These responsibilities can conflict with each other. A consistent approach, balanced perspective, and a deep understanding of the overall system can assist in identifying diminishing returns. Separation of policy and mechanism mitigates such conflicts. === Working together as an operating system === The architecture and design of a distributed operating system must realize both individual node and global system goals. Architecture and design must be approached in a manner consistent with separating policy and mechanism. In doing so, a distributed operating system attempts to provide an efficient and reliable distributed computing framework allowing for an absolute minimal user awareness of the underlying command and control efforts. The multi-level collaboration between a kernel and the system management components, and in turn between the distinct nodes in a distributed operating system is the functional challenge of the distributed operating system. This is the point in the system that must maintain a perfect harmony of purpose, and simultaneously maintain a complete disconnect of intent from implementation. This challenge is the distributed operating system's opportunity to produce the foundation and framework for a reliable, efficient, available, robust, extensible, and scalable system. However, this opportunity comes at a very high cost in complexity. === The price of complexity === In a distributed operating system, the exceptional degree of inherent complexity could easily render the entire system an anathema to any user. As such, the logical price of realizing a distributed operation system must be calculated in terms of overcoming vast amounts of complexity in many areas, and on many levels. This calculation includes the depth, breadth, and range of design investment and architectural planning required in achieving even the most modest implementation. These design and development considerations are critical and unforgiving. For instance, a deep understanding of a distributed operating system's overall architectural and design detail is required at an exceptionally early point. An exhausting array of design considerations are inherent in the development of a distributed operating system. Each of these design considerations can potentially affect many of the others to a significant degree. This leads to a massive effort in balanced approach, in terms of the individual design considerations, and many of their permutations. As an aid in this effort, most rely on documented experience and research in distributed computing power. == History == Research and experimentation efforts began in earnest in the 1970s and continued through the 1990s, with focused interest peaking in the late 1980s. A number of distributed operating systems were introduced during this period; however, very few of these implementations achieved even modest commercial success. Fundamental and pioneering implementations of primitive distributed operating system component concepts date to the early 1950s. Some of these individual steps were not focused directly on distributed computing, and at the time, many may not have realized their important impact. These pioneering efforts laid important groundwork, and inspired continued research in areas related to distributed computing. In the mid-1970s, research produced important advances in distributed computing. These breakthroughs provided a solid, stable foundation for efforts that continued through the 1990s. The accelerating proliferation of multi-processor and multi-core processor systems research led to a resurgence of the distributed OS concept. === The DYSEAC === One of the first efforts was the DYSEAC, a general-purpose synchronous computer. In one of the earliest publications of the Association for Computing Machinery, in April 1954, a researcher at the National Bureau of Standards – now the National Institute of Standards and Technology (NIST) – presented a detailed specification of the DYSEAC. The introduction focused upon the requirements of the intended applications, including flexible communications, but also mentioned other computers: Finally, the external devices could even include other full-scale computers employing the same digital language as the DYSEAC. For example, the SEAC or other computers similar to it could be harnessed to the DYSEAC and by use of coordinated programs could be made to work together in mutual cooperation on a common task… Consequently[,] the computer can be used to coordinate the diverse activities of all the external devices into an effective ensemble operation. The specification discussed the architecture of multi-computer systems, preferring peer-to-peer rather than master-slave. Each member of such an interconnected group of separate computers is free at any time to initiate and dispatch special control orders to any of its partners in the system. As a consequence, the supervisory control over the common task may initially be loosely distributed throughout the system and then temporarily concentrated in one computer, or even passed rapidly from one machine to the other as the need arises. …the various interruption facilities which have been described are based on mutual cooperation between the computer and the external devices subsidiary to it, and do not reflect merely a simple master-slave relationship. This is one of the earliest examples of a computer with distributed control. The Dept. of the Army reports certified it reliable and that it passed all acceptance tests in April 1954. It was completed and delivered on time, in May 1954. This was a "portable comput

Lexxe

Lexxe is an internet search engine that applies Natural Language Processing in its semantic search technology. Founded in 2005 by Dr. Hong Liang Qiao, Lexxe is based in Sydney, Australia. Today, Lexxe's key focus is on sentiment search with the launch of a news sentiment search site at News & Moods (www.newsandmoods.com). Lexxe has experienced several stages of change of focus in search technology: Lexxe launched its Alpha version in 2005, featuring Natural Language question answering (i.e. users could ask questions in English to the search engine apart from keyword searches — this feature has been suspended for redevelopment since 2010). It used only algorithms to extract answers from web pages, with no question-answer pair databases prepared in advance. In 2011, Lexxe launched a beta version with a new search technology called Semantic Key. Semantic Keys enable users to query with a conceptual keyword (or a keyword with a special meaning, hence the term Semantic Key) in order to find instances under the concept, e.g. price → $5.95 or €200, color → red, yellow, white. For example, “price: a pound of apples”, “color: ferrari”. With initial 500 Semantic Keys at the Beta launch, Lexxe became the first search engine in the world to offer this unique and useful search technology to the users. The cost of building Semantic Keys was too heavy though. In 2017, Lexxe launched News & Moods (www.newsandmoods.com), an open platform for news sentiment search, a first step towards sentiment search feature for the entire Internet search in Lexxe search engine. News & Moods also comes with smartphone apps in Android and iOS.

VHS

VHS (Video Home System) is a discontinued standard for consumer-level analog video recording on tape cassettes, introduced in 1976 by JVC. It was the dominant home video format throughout the tape media period of the 1980s and 1990s. Magnetic tape video recording was adopted by the television industry in the 1950s in the form of the first commercialized video tape recorders (VTRs), but the devices were expensive and used only in professional environments. In the 1970s, videotape technology became affordable for home use, and widespread adoption of videocassette recorders (VCRs) began; the VHS became the most popular media format for VCRs as it would win the "format war" against Betamax (backed by Sony) and a number of other competing tape standards. The cassettes themselves use a 0.5-inch (12.7 mm) magnetic tape between two spools and typically offer a capacity of at least two hours. The popularity of VHS was intertwined with the rise of the video rental market, when films were released on pre-recorded videotapes for home viewing. Newer improved tape formats such as S-VHS were later developed, as well as the earliest optical disc format, LaserDisc; the lack of global adoption of these formats increased VHS's lifetime, which eventually peaked and started to decline in the late 1990s after the introduction of DVD, a digital optical disc format. VHS rentals were surpassed by DVD in the United States in 2003, which eventually became the preferred low-end method of movie distribution. For home recording purposes, VHS and VCRs were surpassed by (typically hard disk–based) digital video recorders (DVR) in the 2000s. Production of all VHS equipment ceased by 2016, although the format has since gained some popularity amongst collectors. A niche revival of VHS has taken place with This Is How The World Ends becoming the first straight-to-VHS release in 20 years. == History == === Before VHS === In 1956, after several attempts by other companies, the first commercially successful VTR, the Ampex VRX-1000, was introduced by Ampex Corporation. At a price of US$50,000 in 1956 (equivalent to $592,000 in 2025) and US$300 (equivalent to $3,600 in 2025) for a 90-minute reel of tape, it was intended only for the professional market. Kenjiro Takayanagi, a television broadcasting pioneer then working for JVC as its vice president, saw the need for his company to produce VTRs for the Japanese market at a more affordable price. In 1959, JVC developed a two-head video tape recorder and, by 1960, a color version for professional broadcasting. In 1964, JVC released the DV220, which would be the company's standard VTR until the mid-1970s. In 1969, JVC collaborated with Sony and Matsushita Electric (Matsushita was the majority stockholder of JVC until 2011) to build a video recording standard for the Japanese consumer. The effort produced the U-matic format in 1971, which was the first cassette format to become a unified standard for different companies. It was preceded by the reel-to-reel 1⁄2-inch EIAJ format. The U-matic format was successful in businesses and some broadcast television applications, such as electronic news-gathering, and was produced by all three companies until the late 1980s, but because of cost and limited recording time, very few of the machines were sold for home use. Therefore, soon after the U-Matic release, all three companies started working on new consumer-grade video recording formats of their own. Sony started working on Betamax, Matsushita started working on VX, and JVC released the CR-6060 in 1975, based on the U-matic format. === VHS development === In 1971, JVC engineers Yuma Shiraishi and Shizuo Takano put together a team to develop a VTR for consumers. By the end of 1971, they created an internal diagram, "VHS Development Matrix", which established twelve objectives for JVC's new VTR; among them: The system must be compatible with any ordinary television set. Picture quality must be similar to a normal air broadcast. The tape must have at least a two-hour recording capacity. Tapes must be interchangeable between machines. The overall system should be versatile, meaning it can be scaled and expanded, such as connecting a video camera, or dubbing between two recorders. Recorders should be affordable, easy to operate, and have low maintenance costs. Recorders must be capable of being produced in high volume, their parts must be interchangeable, and they must be easy to service. In early 1972, the commercial video recording industry in Japan took a financial hit. JVC cut its budgets and restructured its video division, shelving the VHS project. However, despite the lack of funding, Takano and Shiraishi continued to work on the project in secret. By 1973, the two engineers had produced a functional prototype. === Competition with Betamax === In 1974, the Japanese Ministry of International Trade and Industry (MITI), desiring to avoid consumer confusion, attempted to force the Japanese video industry to standardize on just one home video recording format. Later, Sony had a functional prototype of the Betamax format, and was very close to releasing a finished product. With this prototype, Sony persuaded the MITI to adopt Betamax as the standard, and allow it to license the technology to other companies. JVC believed that an open standard, with the format shared among competitors without licensing the technology, was better for the consumer. To prevent the MITI from adopting Betamax, JVC worked to convince other companies, in particular Matsushita (Japan's largest electronics manufacturer at the time, marketing its products under the National brand in most territories and the Panasonic brand in North America, and JVC's majority stockholder), to accept VHS, and thereby work against Sony and the MITI. Matsushita agreed, fearing Sony would dominate the market with a Betamax monopoly. Matsushita also regarded Betamax's one-hour recording time limit as a disadvantage. Matsushita's backing of JVC persuaded Hitachi, Mitsubishi, and Sharp to back the VHS standard as well. Sony's release of its Betamax unit to the Japanese market in 1975 placed further pressure on the MITI to side with the company. However, the collaboration of JVC and its partners was much stronger, which eventually led the MITI to drop its push for an industry standard. JVC released the first VHS machines in Japan in late 1976, and in the United States in mid-1977. Sony's Betamax competed with VHS throughout the late 1970s and into the 1980s (see Videotape format war). Betamax's major advantages were its smaller cassette size, theoretical higher video quality, and earlier availability, but its shorter recording time proved to be a major shortcoming. Originally, Beta I machines using the NTSC television standard were able to record one hour of programming at their standard tape speed of 1.5 inches per second (ips). The first VHS machines could record for two hours, due to both a slightly slower tape speed (1.31 ips) and significantly longer tape. Betamax's smaller cassette limited the size of the reel of tape, and could not compete with VHS's two-hour capability by extending the tape length. Instead, Sony had to slow the tape down to 0.787 ips (Beta II) in order to achieve two hours of recording in the same cassette size. Sony eventually created a Beta III speed of 0.524 ips, which allowed NTSC Betamax to break the two-hour limit, but by then VHS had already won the format battle. Additionally, VHS had a "far less complex tape transport mechanism" than Betamax, and VHS machines were faster at rewinding and fast-forwarding than their Sony counterparts. VHS eventually won the war, gaining 60% of the North American market by 1980. == Initial releases of VHS-based devices == The first VCR to use VHS was the Victor HR-3300, and was introduced by the president of JVC in Japan on September 9, 1976. JVC started selling the HR-3300 in Akihabara, Tokyo, Japan, on October 31, 1976. Region-specific versions of the JVC HR-3300 were also distributed later on, such as the HR-3300U in the United States, and the HR-3300EK in the United Kingdom. The United States received its first VHS-based VCR, the RCA VBT200, on August 23, 1977. The RCA unit was designed by Matsushita and was the first VHS-based VCR manufactured by a company other than JVC. It was also capable of recording four hours in LP (long play) mode. The UK received its first VHS-based VCR, the Victor HR-3300EK, in 1978. Quasar and General Electric followed-up with VHS-based VCRs – all designed by Matsushita. By 1999, Matsushita alone produced just over half of all Japanese VCRs. TV/VCR combos, combining a TV set with a VHS mechanism, were also once available for purchase. Combo units containing both a VHS mechanism and a DVD player were introduced in the late 1990s, and at least one combo unit, the Panasonic DMP-BD70V, included a Blu-ray player. == Technical details == VHS has been standardized in IEC 60774–1. === Cassette and