Tuesday, 26 November 2019

Op Ed: Want to Learn About Bitcoin? Try Contributing a Transcript


Continuing the series on the various ways one can learn about the technical aspects of Bitcoin, in this article we will focus on transcripts and contributing to or reading the archive of transcripts maintained by Bryan Bishop (kanzure).

In the early years of Bitcoin’s history, all communication involving Satoshi Nakamoto occurred online on mailing lists, IRC and the BitcoinTalk forum. These years are well archived by the Satoshi Nakamoto Institute. There are no recordings of Satoshi speaking, presumably as they could have been used to identify him. However, once in-person meetups, conferences and meetings of core developers started to be organized, there was a danger of content from verbal presentations and discussions disappearing and being forgotten.

In the last decade, Bishop has produced over 600 transcripts, racking up over a million and a half words. The transcripts can be accessed here and pull requests to add or edit a transcript can be submitted to this GitHub repository. A small selection of highlights includes a transcript on choosing safe curves for elliptic curve cryptography from 2014, a transcript of Greg Maxwell presenting confidential transactions from 2017 and the transcripts from the Bitcoin Core developer meetings that are not filmed or otherwise recorded.

Typing at the Speed of Lightning

At the CES Summit 2019, Bishop explained why all talks should have transcripts. These reasons include facilitating further discussion after the talk, distributing the content beyond the attendees in the room, and text being easier to parse and search than video and audio. His presentation spurred others to attempt to transcribe Bishop’s talk in real time.

Bishop takes pride in publishing the transcript before the speaker has sat down. He believes immediate availability is the most critical factor for those who use the transcripts, even more important than the quality of the content. Having a transcript available as soon as a talk concludes is certainly valuable for supporting further in-person discussions and for bringing up to speed those who were not present but are interested in what was discussed.

Granted, Bishop is an extremely fast typist. He started transcribing in high school when he sought to prove to his high school principal that the classes were a waste of his time. After four years of transcribing the classes’ content, he realized no one cared.

However, one upside of the experience is that Bishop was ranked 30th for typing speed out of 5 million competitors. He can type up to 200 words per minute. Court stenographers can typically type faster than this, but they rely on special keyboards called stenotypes and a system of abbreviations called shorthand. If it weren’t for his high-paying career in software development, Bishop could try to join the ranks of court stenographers earning around $200,000 per year.

The fastest speaker in the Bitcoin ecosystem is undoubtedly Laolu Osuntokun (roasbeef), CTO of Lightning Labs. He has become almost as renowned for his pace of verbal delivery as for his weighty contributions to the lnd Lightning implementation and his work on Neutrino, the privacy-preserving light client. So if anyone in the Bitcoin ecosystem were able to defeat Bishop, it would be him.

However, Bishop, with his ability to type up to 200 words per minute, has risen to the challenge on a number of occasions and conquered this particular human adversary. (The rivalry is obviously entirely good-natured, and other individuals in the Bitcoin community have joined in the fun on Twitter: [1] and [2].)

AI: Not a Complete Alternative

So no human speaker in the Bitcoin ecosystem has been able to defeat Bishop. But what about artificial intelligence? As it did in chess and the board game Go, is AI able to overpower the best humanity can offer and type at least as fast as Bishop but with even greater accuracy? The answer, for now, is “not yet.”

The Stephan Livera Podcast is one of the most popular Bitcoin podcasts. Livera has experimented with transcripts on his show. Initially, a sponsor of the show (GiveBitcoin) paid for human transcription on a small subset of episodes and they are available on Livera’s site. Some of them have since been added to the transcript repository maintained by Bishop. These “polished” transcripts were purchased from rev.com. They are promised to be 99 percent accurate, but they cost $1 per audio minute.

Livera has also tried machine-generated transcripts from rev.com. These cost only $0.10 per audio minute but are only promised to be 80 percent accurate. Therefore, they require Livera or somebody else to edit them afterward.

The Challenge of ‘Searchability’ in Transcripts

On the Software Engineering Daily podcast, Wenbin Fang, founder of the podcast search engine ListenNotes, discussed the current state of podcast transcripts with Jeff Meyerson. Unlike Livera, who is only concerned with the content he produces, ListenNotes is interested in all the podcasts that anyone in the world produces.

In an ideal world, all podcasts would be transcribed. Indexing on accurate transcripts would allow you to search “Bitcoin” and thus find every single podcast episode that mentioned Bitcoin even once. 
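To make the idea concrete, here is a minimal sketch of an inverted index over transcripts. It is only an illustration under simple assumptions (whole-word, case-insensitive matching; the function names and sample episodes are hypothetical) and is not a description of how ListenNotes actually works.

```python
import re
from collections import defaultdict


def build_index(transcripts):
    """Map each lowercase word to the set of episode ids whose transcript contains it."""
    index = defaultdict(set)
    for episode_id, text in transcripts.items():
        # Split on anything that is not a letter or digit.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(episode_id)
    return index


def search(index, term):
    """Return the sorted ids of every episode that mentions the term at least once."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical sample data, purely for illustration.
episodes = {
    "ep1": "Today we talk about Bitcoin and the Lightning Network.",
    "ep2": "A conversation about gardening.",
    "ep3": "Why bitcoin transcripts matter.",
}
index = build_index(episodes)
```

With this sample data, `search(index, "Bitcoin")` returns the two episodes that mention the word. The quality of such an index depends entirely on the accuracy of the transcripts it is built from.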

However, Fang struggles with the same transcription challenges as Livera. He offers transcripts to paying customers and uses Google’s Speech-to-Text API to generate them, which currently costs $0.024 per audio minute. The accuracy of these transcripts is generally not of sufficient quality. They may be good enough to surface some keywords for a search engine index but the reading experience offered directly to a human is subpar. 
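For a rough sense of scale, the per-minute prices quoted in this article can be put side by side. The sketch below is a back-of-the-envelope calculation and rests on assumptions not found in the article: a speaking rate of 150 words per minute, and treating the advertised accuracy figures as the share of words transcribed correctly. No accuracy figure is given for Google’s API, so none is computed for it.

```python
# Prices (USD per audio minute) and advertised accuracy as quoted in the article.
# accuracy is None where the article gives no figure.
SERVICES = {
    "rev.com human": {"usd_per_min": 1.00, "accuracy": 0.99},
    "rev.com machine": {"usd_per_min": 0.10, "accuracy": 0.80},
    "Google Speech-to-Text": {"usd_per_min": 0.024, "accuracy": None},
}


def estimate(minutes, words_per_minute=150):
    """Return {service: (cost_usd, expected_wrong_words or None)}.

    words_per_minute is an assumed speaking rate, not a figure from the article.
    """
    total_words = minutes * words_per_minute
    result = {}
    for name, service in SERVICES.items():
        cost = round(service["usd_per_min"] * minutes, 2)
        wrong = None
        if service["accuracy"] is not None:
            wrong = round(total_words * (1 - service["accuracy"]))
        result[name] = (cost, wrong)
    return result


for name, (cost, wrong) in estimate(60).items():
    print(f"{name}: ${cost:.2f}, expected wrong words: {wrong}")
```

Under these assumptions, a 60-minute episode costs $60 for human transcription, $6 for rev.com’s machine transcription and about $1.44 through Google’s API, while the 80 percent tier would leave roughly 1,800 words for someone to correct by hand.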

Fang also can’t afford to pay for this transcription for every podcast episode ever created. Instead, he relies on metadata for his search engine which ideally includes keywords, the title and a description of the podcast.

Bishop himself has experimented with machine learning. He built a TensorFlow implementation of Baidu’s DeepSpeech and trained his model using audiobooks. With very few technical Bitcoin books in existence and even fewer available in audiobook format, it is unsurprising that he encountered an error rate of approximately 20 percent in word recognition. So, for now at least, Bishop rules over AI for technical Bitcoin transcripts.
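The article does not say how this error rate was measured. One common way to score a speech recognizer is word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the machine output into a trusted reference transcript, divided by the length of the reference. The following is a minimal sketch of that metric, offered as an assumption about what such a figure could mean rather than as Bishop’s method; the sample sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between hypothesis and reference, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of five in the reference.
wer = word_error_rate("each utxo can be spent", "each you can be spent")
```

Here a single misrecognized word in a five-word sentence gives a WER of 0.2, the same order of error the article describes.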

Ensuring Permanence

Another concern that transcripts address is the reliance on YouTube and other video hosting sites to preserve videos of presentations without starting to charge for access or restricting it. Once a video is uploaded to a video hosting site, it is unclear how many uploaders continue to store these large video files locally.

Bishop reckons that the half-life of any given hyperlink on the web is less than a few years. As Bitcoin Magazine’s Vlad Costea reports, there have been numerous examples of YouTube changing how videos are monetized and how likely a given video is to show up in a user’s search. Additionally, the continuous changes to platform policies can sometimes result in the outright removal of certain types of content. With text files much smaller than video files, a large collection of transcripts can easily be self-hosted and/or made available on the Internet Archive.

How Can You Help?

Even if you don’t have Bishop’s typing abilities, you can still complete transcripts from videos and podcasts that Bishop has yet to transcribe. These include some of Bishop’s own presentations and podcast appearances. (Although Bishop is perhaps best known in the Bitcoin community for his transcripts, he is also a long-term contributor to Bitcoin Core, has published various proposals, including one on Bitcoin Vaults, and even finds the time to work on notable biotech projects.)

It’s also possible to look back and open pull requests on some of Bishop’s past transcripts, in case you are able to find inaccuracies, typos or missing sections, or would like to add references. The transcripts can often be improved by someone with the advantage of playback, volume control and speed adjustment. 

Bishop notes that his transcripts aren’t always the most accurate. “I type as fast as I can, and sometimes my own ideas spill out when I am trying to fill in gaps as I go along. Most often, any errors are my own and not those of the speaker,” he says. 

If there is a presentation or podcast that you find educational or informative then consider transcribing it. The exercise forces you to listen to the speaker’s every word and challenges your understanding of the topic to a greater extent than if you were merely passively listening. If you don’t understand a term or acronym, pause the video and look it up to ensure the accuracy of your transcript. Alternatively, you could try one of the machine-learning APIs and then manually edit the result.
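Part of that manual editing can be scripted. A machine transcript will often spell out jargon phonetically, for instance rendering “UTXO” as “You tea eks oh.” The sketch below applies a small glossary of misheard phrases as a first pass before a human read-through. The glossary entries and the function name are hypothetical examples, and a real glossary would have to be built up from the errors you actually encounter.

```python
import re

# Hypothetical glossary: phrase as a recognizer might write it -> intended term.
GLOSSARY = {
    "you tea eks oh": "UTXO",
    "seg wit": "SegWit",
}


def apply_glossary(text, glossary=GLOSSARY):
    """Replace known misheard phrases with the intended technical term."""
    for heard, term in glossary.items():
        # Match the whole phrase, ignoring case and allowing any
        # amount of whitespace between its words.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in heard.split()) + r"\b"
        text = re.sub(pattern, term, text, flags=re.IGNORECASE)
    return text


cleaned = apply_glossary("Each You tea eks oh is spent once.")
```

A pass like this only catches errors you have already seen, so it complements careful listening rather than replacing it.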

It is important not to discount the value of having a transcript available at any point, even if it is days, months or even years afterward, especially when the content is of educational or historical value. A number of Bitcoin developers have admitted to referring back to Aaron van Wirdum’s epic three-parter in Bitcoin Magazine on how Lightning works years after publication to remind themselves of the basics of the Lightning protocol. 

Having an available transcript will allow future academic papers, formal manuscripts and even patents to refer to a presentation. It will also make it more likely that the content is ranked higher on search engine results, meaning that more people get to see it online. Finally, it allows those with a hearing impairment to follow the discussion.

Bishop would like to raise funding for a “scribe fund” to pay for an individual (“not him,” as he says he is too busy with other work) with fast typing ability to travel and transcribe at different conferences as Bishop has been doing for a large part of the last decade. It would most likely need to be a developer or technical editor who is familiar with terms like “UTXO” and wouldn’t transcribe it as “You tea eks oh.” 

So if you have benefitted from Bishop’s archive of transcripts, consider making a financial donation to this project to ensure the next decade of Bitcoin presentations and discussions are preserved and disseminated just like the previous decade’s.

Thanks to Bryan Bishop for reviewing this article and for maintaining this historical and educational archive of Bitcoin transcripts.

The post Op Ed: Want to Learn About Bitcoin? Try Contributing a Transcript appeared first on Bitcoin Magazine.



via Bitcoin Magazine https://bitcoinmagazine.com/articles/op-ed-want-to-learn-about-bitcoin-try-contributing-a-transcript
