Filecoin Foundation Digest
The Filecoin Ecosystem Digest is a digital magazine published by Filecoin Foundation to showcase innovation unfolding across the ecosystem.
Editorial articles, authored by individuals from the Filecoin ecosystem, will dive into the challenges of our current internet infrastructure, explore experiences and learnings within the community, and share visions for how decentralization and decentralized storage are creating the foundation for a better web.
The Inaugural Edition: All Systems Go
Featuring Guest Editor Jonathan Victor, Co-Founder, Ansa Research
Published in September 2024, the inaugural issue explores topics that impact the ecosystem –– from interplanetary resilience to AI-generated media and the data economy. The Digest highlights the voices behind the technology being developed in the Filecoin network –– as we embark on a collective journey towards a decentralized future.
Issue 1, Article 1
Letter from the Guest Editor
Dear Readers,

Welcome to the inaugural edition of _The Filecoin Ecosystem Digest: All Systems Go_ - a journey into the transformative power of the decentralized web and its far-reaching implications across various fields. In this issue, we aim to spotlight the intersection of decentralized infrastructure with a number of key emerging trends - from the role of decentralized architectures in AI, to the rise of the space economy, to how sustainability intersects with the future of data centers, and more. The Filecoin community is filled with brilliant innovators - and in this Digest, we aim to assemble many voices to showcase how Filecoin is playing a critical role for the future of technology.

First, we have a piece from Molly Mackinlay of the FilOz team highlighting the key focus areas for the Filecoin ecosystem - expanding outside just the domain of storage.

Next, we turn to AI - with two wonderful pieces. The first is from Sofia Yan and Ryan Matthew of the Numbers Protocol team - demonstrating the potential for decentralized technology to bring back trust to media via provenance and openness. This is followed by a piece from Tom Trowbridge of the Fluence team - arguing for why centralized LLMs make the case for decentralized AI and the importance of neutrality in the critical infrastructure we rely on.

Our exploration doesn’t stop at Earth. Dietrich Ayala’s piece centers on the role of decentralized data in the space economy - and the importance of the current moment to embed our technologies into the emerging standards forming in this growing field.

We ground ourselves with two pieces from Mara MacMahon from the DeStor team and Luke Behncke from Holon Investments focused on emerging trends in the enterprise sector and the future of sustainability within data centers. Then, we follow with a piece from Irma Jiang from ND Labs - reflecting on how Web3 offers a new paradigm of community-based growth. Lastly, we look ahead to the future of decentralized storage and the emergence of the “Intelligence Economy” with a piece from Porter Sowell of the Filecoin Foundation.

We hope this issue sparks your curiosity and inspires you to explore the boundless possibilities that decentralized infrastructure presents. As always, thank you for being a part of our journey.

Warm regards,

Jonathan Victor
Issue 1, Article 3
The Potential for Decentralized Technology To Rebuild Digital Trust
In April 2024, a 30-second video featuring Bollywood star Aamir Khan circulated widely across the Indian internet. Khan was seen criticizing Indian Prime Minister Narendra Modi for unfulfilled campaign promises and neglecting critical economic issues during his two terms. The video concludes with the election symbol for Congress, Modi’s rival party, and the slogan: “Vote for Justice, Vote for Congress.” Khan’s immense popularity in India and the video’s release during the general election period likely contributed to its explosive distribution.

The video was entirely artificial and generated by AI, yet a sizeable share of the electorate deemed it authentic. It was representative of a surge of deepfake content created to mislead the Indian public in the lead-up to the national general election, a concern further intensified by indications that the disinformation stemmed from the country’s major political parties.

Disinformation, particularly AI-generated deepfakes, is a growing global crisis in 2024. This year, 40 national elections are scheduled or have already taken place, impacting 41% of the world’s population. Counterintelligence agencies and news media in all of these regions are now evaluating measures they can take to mitigate the impact of disinformation on the foundations of their democracies.

## Disinformation in the age of generative AI

Fears of disinformation initiatives are not new. In the aftermath of the 2016 U.S. election, both the Senate Intelligence Committee and the U.S. intelligence community concluded that the [Russian government utilized disinformation attacks](https://www.politico.com/news/2020/04/21/senate-intel-report-confirms-russia-aimed-to-help-trump-in-2016-198171) to denigrate the public’s faith in the democratic process and attempt to influence the outcome of the U.S. Presidential election. What has changed, however, is the complexity, sophistication, and quantity of the attacks.

By lowering the costs and increasing the effectiveness of information warfare initiatives, generative artificial intelligence has exacerbated an already serious problem to extremes that counterintelligence agencies were not prepared for. Using Gen AI tools, nefarious actors are not only able to create fake images, voices, and videos that most people cannot distinguish from reality, but they can do it at a scale that is difficult for authorities to curb.

The consequences of this deluge of disinformation have been not just a significant shift in the political landscape, but also a decimation of public trust in the media. [A survey](https://apnorc.org/topics/media-insight-project/) conducted by the American Press Institute and the Associated Press-NORC Center for Public Affairs Research revealed that 42% of Americans harbor anxieties about news outlets employing generative artificial intelligence to fabricate stories. Additionally, 53% of Americans have expressed serious concerns about the possibility of inaccuracies or misinformation being reported by news organizations during elections.

In part, this loss of trust can be attributed to the proliferation of fake news on less reputable sites that lack the rigorous checks or standards of the formal mainstream press. These news sources have managed to convince readers that their unreliable news is equally credible. Once universally trusted media institutions are being put on an equal footing with hobbyist bloggers or sites that have no qualms about publishing glaringly fabricated stories.
## Why trust is eroding from institutions

The breakdown in trust stems from being unable to determine where a piece of media comes from –– a lack of provenance. Provenance refers to the origin and history of a piece of content, a term often used in the context of art or antiques. For digital content, provenance refers to data about the origin and history of a piece of content, including the location, date of creation, and any changes made throughout its existence. By knowing the origin and history of media, one can verify whether it was authentically created or artificially generated.

[A recent study](https://www.cip.uw.edu/2024/02/21/provenance-synthetic-media-trust-perceptions/) conducted by the Center for an Informed Public (CIP) highlighted the importance of provenance in understanding media accuracy and trust. When users were exposed to provenance information for a piece of media, they were better able to calibrate their trust in the authenticity of the content and their perceptions of its accuracy. For deceptive content, users successfully identified the content as less trustworthy and less accurate when provenance was disclosed.

## How we can add provenance to digital media

One way to add provenance is by watermarking content, as proposed by [C2PA](https://c2pa.org/) (the Coalition for Content Provenance and Authenticity). This method involves inserting a watermark with a unique identifier into the digital content of the image and then recording that unique identifier. However, there are challenges in ensuring that malicious actors do not strip these watermarks (as noted by [MIT Technology Review](https://www.technologyreview.com/2023/08/09/1077516/watermarking-ai-trust-online/)).

Another approach is to leverage blockchains –– record hashes of content and data on a distributed ledger to create a verifiable log anyone can inspect. Numbers Protocol uses this method in its approach to trust on the web, allowing data to be stored securely while also maintaining its integrity. Content history records on the blockchain are immutable and exist in perpetuity. The permanent accessibility of these records ensures that the data has not been tampered with, and because blockchains are public, anyone –– whether an individual or an organization –– can verify the origin of a piece of content.

## Case study: 2024 Taiwan presidential election

The adoption of blockchains to solve these problems isn’t a distant future –– it’s already happening. Before the Taiwanese presidential election in January 2024, there was widespread societal concern regarding the threat of disinformation campaigns. Given Taiwan’s difficult political situation, attacks aimed at creating confusion and sowing distrust were expected.

Numbers Protocol collaborated with the [Starling Lab](https://www.starlinglab.org/), Taiwanese news outlets, and journalists to show how decentralized technologies can be utilized to rebuild media trust. We launched a pilot study using the [Capture Cam App](https://captureapp.xyz/), which allows content recorded by partnered media and civil society groups to be marked and logged as having been authenticated by those groups. The media captured by users was registered on the blockchain, generating a traceable and secure record of the election’s vital moments. One example was a civic society member who used Capture Cam to record the vote counting process at a polling station to counter disinformation claiming widespread vote counting fraud.
This provided a [permanent record of the counting process](https://verify.numbersprotocol.io/asset-profile/bafybeifsvsdiu6srknxvcs3kpc2cssucfea6uysf36elmeq2k3qk5sp6re?nid=bafybeifsvsdiu6srknxvcs3kpc2cssucfea6uysf36elmeq2k3qk5sp6re) on the blockchain. Numbers Protocol created a digital provenance trail, and all of the metadata and assets were stored securely on the Filecoin Network, with multiple copies distributed globally and cryptographic proofs submitted continually to demonstrate the ongoing integrity of the data.

## Conclusion

The key to regaining trust lies in empowering audiences to authenticate and verify the genuineness of their news. For provenance, blockchain outperforms watermark-based approaches because its immutability can effectively show whether content has been tampered with. Provenance provides a root of trust by allowing readers to verify that what they read is authentic and dependable. They can know with certainty that an article was written by a specific person on a specific date and time, rather than simply generated by AI. This is the foundation needed to regain public trust in the media as the reliable fourth estate a functioning democracy requires.
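To make the hashing-and-logging approach described in this article concrete, here is a minimal sketch in Python. It is illustrative only (not Numbers Protocol’s or Capture Cam’s actual implementation); the `ProvenanceLog` class, its in-memory dictionary, and the record fields are hypothetical stand-ins for a blockchain-backed registry.

```python
import hashlib
from datetime import datetime, timezone


def content_hash(data: bytes) -> str:
    """Fingerprint the raw bytes; any edit to the media changes the hash."""
    return hashlib.sha256(data).hexdigest()


class ProvenanceLog:
    """Hypothetical stand-in for an append-only, publicly readable ledger."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}

    def register(self, data: bytes, creator: str, location: str) -> dict:
        record = {
            "hash": content_hash(data),
            "creator": creator,
            "location": location,
            "created_at": datetime.now(timezone.utc).isoformat(),
        }
        # On a real blockchain this write would be immutable and ordered by consensus.
        self._records[record["hash"]] = record
        return record

    def verify(self, data: bytes) -> dict | None:
        """Recompute the hash of the media in hand and look it up in the public log."""
        return self._records.get(content_hash(data))


log = ProvenanceLog()
original = b"raw bytes of a photo captured at the polling station"
log.register(original, creator="civic-observer", location="Taipei polling station")

# A faithful copy verifies; a doctored copy does not.
print(log.verify(original))                 # provenance record found
print(log.verify(original + b" tampered"))  # None -> treat as unverified
```

The useful property is that verification requires nothing beyond the public log and the bytes in hand; no trust in whoever republished the media is needed.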
Issue 1, Article 4
Interplanetary Resilience
The vision of Filecoin is verifiably robust storage of data, scaling to meet the needs of humanity over centuries. To achieve this vision, we need to _literally_ aim for the stars.

We’re at a turning point in the intersection of space communications technology and the development of the internet itself. We are seeing a massive increase in satellites and space networks, and multiple industries priming for the changes that will bring. Back on the ground, the terrestrial internet is already operating at a scale beyond human capacity (with many non-humans on the way!) and is constantly under threat from both [climate disasters](https://www.scientificamerican.com/article/global-internet-connectivity-is-at-risk-from-climate-disasters/) and [war](https://www.forbes.com/sites/alexknapp/2024/03/08/undersea-internet-cables-are-vulnerable-targets-in-future-wars/).

For Filecoin to be successful, one key step along that path is helping the world understand the importance of open and cryptographically verifiable systems. How we consume, share and store data will change radically over the next few years –– which presents an opportunity to set a course for open, interoperable communications standards with these primitives at their core.

## Open Source, Open Standards, and Uncle Sam

Space is increasingly recognized as an important frontier. While [headlines are dominated by a handful of companies](https://electrek.co/2024/03/08/tesla-shipping-cybertruck-tent/), the substance of the industry is driven by government contracts, both civilian and military. The [lunar economy](https://www.nasa.gov/humans-in-space/growing-the-lunar-economy/) is growing, with no end in sight. Investment continues to grow, and missions being planned today are sometimes targeting well into the 2030s.

![The chart shows the total investment in billions of dollars on the left vertical axis, ranging from $0 to $15 billion, and the number of deals on the right vertical axis, ranging from 0 to 300. The horizontal axis lists the years from 2012 to 2021.](/assets/images/investment-start-up-space.webp "Investment in Start-Up Space Companies") ([Prometheus Space, 2022](https://prometheusspace.com/the-state-of-the-space-startup-companies-in-2022-and-the-way-forward/))

![A stacked bar chart with a line graph showing annual investment sources from 2014 to 2023. Investment types include Other, Corporate, Venture Capital, and Angel/Individual. Both total investment and the number of rounds generally increased, peaking around 2021.](/assets/images/annual-investment-source.webp "Annual Investment Source") ([ARS Technica, 2024](https://arstechnica.com/space/2024/01/taking-stock-private-investment-in-space-companies-rebounded-in-2023/))

So too is renewed activity from the public sector (e.g., the Artemis mission). In recognition of this, the government has been pushing heavily towards open modular systems at all layers of the [stack](https://arstechnica.com/space/2024/03/the-us-government-seems-serious-about-developing-a-lunar-economy/). This is a stark departure from the expensive, proprietary, long-term vendor contracts of old: a change [now enshrined in law](https://uscode.house.gov/view.xhtml?req=granuleid:USC-prelim-title10-section4401&num=0&edition=prelim). This is critical, as the industry is _legally_ mandated to work towards open and interoperable standards –– which naturally dovetails with the technologies that have been developed in the broader Web3 community. If we want to help shape the direction of this industry, we need to be a part of the conversation.
## Why IPFS

Two words: Cryptographically verifiable. Wait, two more words: Transport agnostic. Hm, actually those four words unlock 1.5 more words: Locationlessness. Five and a half good words which, in the end, describe characteristics of IPFS that combine to form a powerful set of capabilities in the context of the harsh conditions of vacuum, radiation, latency, and [everything that can go wrong](https://www.theverge.com/2024/5/24/24163846/starlink-succumbs-to-russian-electronic-warfare) under the sol. IPFS was _designed_ for when things go wrong. To unpack how each of these properties is critically useful in space:

### Cryptographically Verifiable

One scenario that’s been discussed as a good illustration of what IPFS brings to the table is frenemy satellite communication –– an increasingly important challenge given the rapid increase in space traffic expected over the next decade. Cryptographically verifiable content-addressing of data means that messages can be exchanged with confidence that they have not been modified by radiation, equipment malfunction, or even a frenemy that you’re sharing space with, while still allowing emergency communications.

### Transport Agnosticity

Given the mandate for modularity, there’s a need for standards and interoperability at the application layer, without needing to know too much about the hardware and transports available. IPFS neatly handles this transport agnosticity because we can _cryptographically verify_ the content as it lands at its destination –– enabling the portability, modularity and extensibility we are used to in terrestrial internet application programming.

### Locationless-ness

Once you can trust that you will receive –– cryptographically, verifiably unmodified –– what you ask for, you no longer have as much concern about where, or from whom, you get it. Locationless-ness means that applications can interact with and operate on data from anywhere reachable, without any prior configuration or knowledge required to find data, provided the application knows the content address. This can be particularly useful for sharing positioning data in a satellite mesh or broadcasting emergency environmental condition data.

## Mission Accomplished

The Filecoin Foundation, Protocol Labs, Lockheed Martin, and Little Bear Labs collaborated to show that IPFS can deliver on this vision, and after a couple of years of planning, design, and testing, the mission completed in October 2023. The mission goals were to:

- Demonstrate the suitability and benefits of the IPFS protocol for satellite and space communication.
- Lay the groundwork for eventual large scale storage in lunar/space use cases.
- Introduce open standards for decentralized communication and data access in space.
- Introduce and demonstrate cryptographic verifiability as a key component in open, interoperable space applications.

The demonstration was completed in the fall of 2023, putting IPFS into space around its 10th birthday. The mission was successful, setting the stage for future deployments of the protocol in real world use. After simulation tests, ground tests, and hundreds of meetings, the IPFS-based application was deployed to both a satellite in geosynchronous orbit and a ground station in Littleton, Colorado in the U.S. for a series of tests and transmissions.

![Diagram showing the operation of LINUSS satellites running Myceli with four satellites in orbit and a ground server. Steps illustrate the satellite establishing a radio link, maintaining a strong Line of Sight (LOS) link for data transfer, downloading data, and stopping data transfer when the radio link is broken. Connections to IPFS interface and the open web are indicated.](/assets/images/linuss-satelite.webp "LINUSS Satellite Running Myceli")

The LINUSS satellite is designed to be updatable, and is described as “[the size of a four-slice toaster](https://news.lockheedmartin.com/linuss-small-sats-mission).”

![A satellite orbiting near Earth alongside a floating red toaster. Another satellite is visible in the background. The phrase "Yeah, pretty close I guess" is written in white text next to the Earth.](/assets/images/space.webp "Satellite Near Earth with Toaster")

You can read more about the mission in the [official Filecoin Foundation announcement](https://www.fil.org/blog/filecoin-foundation-successfully-deploys-interplanetary-file-system-ipfs-in-space).

## Distant Horizons?

IPFS isn’t quite _interplanetary_ yet, but it is certainly extra-terrestrial. This is a remarkable accomplishment which shows great promise for IPFS as part of a resilient, open source, open standards, and interoperable future in space communications. Cryptographic verifiability is an increasingly important primitive for robust, reliable internet communication. This demonstration shows that we can implement content addressing in the modern space technology stack, and deploy it successfully. This sets the stage for more systems built with cryptographic verifiability, like Filecoin, to be competitive in these emerging markets –– and to continue to serve humanity’s information, wherever and whenever it goes.
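As a rough illustration of the properties described above, the sketch below (a simplification, not the IPFS wire protocol or the Myceli implementation; the peers and the `fetch` helper are invented for this example) shows how content addressing lets a receiver verify data no matter which node, or which transport, delivered it.

```python
import hashlib


def content_address(data: bytes) -> str:
    # Real IPFS uses multihash-based CIDs; a bare SHA-256 digest illustrates the idea.
    return hashlib.sha256(data).hexdigest()


# Any reachable peer can hold blocks: a ground station, a relay satellite,
# or an untrusted neighbor. Plain dicts stand in for their block stores.
ground_station: dict[str, bytes] = {}
frenemy_satellite: dict[str, bytes] = {}

telemetry = b"orbital position + environmental readings"
cid = content_address(telemetry)
ground_station[cid] = telemetry
frenemy_satellite[cid] = telemetry  # replicated over whatever link was available


def fetch(cid: str, peers: list[dict[str, bytes]]) -> bytes:
    """Ask peers in any order; accept the first response whose bytes match the CID."""
    for peer in peers:
        data = peer.get(cid)
        if data is not None and content_address(data) == cid:
            return data  # verified, regardless of who served it or over which transport
    raise LookupError("no peer returned valid data for this CID")


# The receiver does not care where the bytes come from...
assert fetch(cid, [frenemy_satellite, ground_station]) == telemetry

# ...and tampering (radiation, malfunction, or a malicious peer) is detected, not trusted.
frenemy_satellite[cid] = b"corrupted or spoofed payload"
print(fetch(cid, [frenemy_satellite, ground_station]) == telemetry)  # still True
```

Because validity is a property of the bytes rather than of the channel or the sender, the same check works over a radio link, a laser crosslink, or a terrestrial gateway, which is what transport agnosticity and locationless-ness buy in practice.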
Issue 1, Article 5
How Centralized LLMs Prove the Case for Decentralized AI
Big Tech has raced to roll out conversational AI models since the launch of ChatGPT in late 2022, but the companies building the models seem unable to resist manipulating them to reflect their company culture or to meet a particular political or ideological agenda. Because the models are closed-source black boxes, the training data and the model mechanics are hidden, giving users no information about how responses are generated. The alternative is open, transparent models run and trained on decentralized systems, which will be more trusted than the closed, corporate models we see today.

## Bias in centralized LLMs

Since before the launch of ChatGPT, groups have warned about the dangers of bias in closed systems. These warnings often came from progressive critics of AI: those who said large language models (LLMs) were just “stochastic parrots” also [warned that](https://dl.acm.org/doi/abs/10.1145/3442188.3445922) they “overrepresent hegemonic viewpoints and encode biases potentially damaging to marginalized populations.” Ironically, some of the strongest reactions to ChatGPT’s biases came from the other side of America’s political divide. Users of ChatGPT quickly noticed that the model could discuss Russian interference in the 2020 election, but would not respond when queried about Hunter Biden’s laptop, which was also widely reported. Research has supported the allegation of bias: “We find robust evidence that ChatGPT presents a significant and systematic political bias toward the Democrats in the US, Lula in Brazil, and the Labour Party in the UK,” noted one [study](https://link.springer.com/article/10.1007/s11127-023-01097-2).

Given the human element in constructing models, some element of bias is unavoidable, but when models are trained opaquely and then marketed as ‘neutral,’ users can be unknowingly subject to the bias of either the training data or the trainers –– bias that they are unable to inspect. And the bias can go beyond the data inputs used. In early 2024, Google Gemini’s image creator received such scathing reviews that it was quickly ‘paused’ for ‘updates.’ In Google’s quest to avoid offending what it saw as mainstream political and social sensitivities, it forced its model to insert diversity into nearly all images, resulting in outcomes that were preposterously inaccurate, such as African and Asian Nazis and a diverse group of American founding fathers. Not only were these images wildly inaccurate, they were also offensive. Most importantly, they lifted the veil on the hidden manipulation risks inherent in proprietary, closed AI models developed and run by companies.

## How do the models work?

All models are subject to the biases of their creators, but image prompts to Google’s Gemini model are also run through an additional set of rules designed to match what Google believes are acceptable or desirable answers, such as increasing diversity. These rules may be well-intentioned, but they are hidden from users. With Gemini, the diversity rules were so obvious and so clumsy that the output quickly became the subject of global ridicule, as users vied to generate the most absurd result. Because image requests rely on the AI model to generate results, we know that a similar bias, and likely similar rules, underlie every answer. Thanks to the image results, the bias was obvious for everyone to see; but given the closed nature of these models, such manipulation is much harder to discern in text responses alone.
## Open, transparent AI is the answer

For LLMs to be widely trusted, rather than being trained and manipulated behind closed doors by corporations, they need to be built on a transparent foundation that is openly inspectable and free from opaque biases. This is only possible with open source models that are provably trained on specific data sets. A number of open source projects, such as [Hugging Face](https://huggingface.co/), which has raised $400 million, are making great progress in building, developing and training these open models. These models can be open source and available for anyone to see, and can run on a decentralized network of computers, like the Fluence platform, that proves each result was executed against the model without manipulation.

Highly resilient decentralized networks already exist for payments and storage, and a number of GPU marketplaces like [Aethir](https://aethir.com/), [Akash](https://akash.network/), [Gensyn](https://www.gensyn.ai/), and [Io.net](https://io.net/) are being optimized to train and even run AI models. Decentralized networks are necessary because they operate globally on a wide range of infrastructure with no single owner, making them very hard to threaten or shut down. This quickly growing ecosystem includes GPU marketplaces for training and running models, platforms like Filecoin for storing the data, CPU platforms like Fluence for running models with provability, and open tooling for developing the models. With this infrastructure, open models will be a powerful force.

## Is this realistic?

Google and Microsoft have spent billions of dollars developing their LLMs, which seems like an insurmountable lead, but we have seen these huge companies outcompeted before. Linux overcame Windows’ decade-long head start and billions of dollars of investment to become the [leading operating system](https://en.wikipedia.org/wiki/Linux#History). The open source community worked together to build Linux, and we can expect a similar level of success in building and training open-source LLMs –– especially if we have a common platform that facilitates development.

One near-term strategy is that, rather than competing head-to-head with monolithic LLMs like ChatGPT, smaller, domain-specific models with unique data sets may emerge that are more trusted in their particular topics. For example, we could see a children’s oncology model that has exclusive use of the data from top children’s hospitals, and a single frontend could pull from a wide range of these domain-specific models, replicating a ChatGPT experience but on a transparent and trusted foundation.

Model aggregation is a viable path to creating a trusted alternative to corporate LLMs, but just as important as building and training AI is running the model in a way that is verifiable. No matter the inputs, scrutiny is on the outputs, and any organization running a model will be subject to pressure. Companies are subject to influence from politicians, regulators, shareholders, employees, and the general public, as well as armies of Twitter bots. But a decentralized model –– hosted by any storage provider anywhere in the world and run on an open, decentralized compute network like [Fluence](https://fluence.network/) that can process auditable queries –– is immune from both hidden bias and censorship, and will be far more trustworthy.
Big Tech is aware of its bias problem, but it will have a very hard time supporting models that give answers unpopular with its employees, governments and customer constituencies, even if accurate. OpenAI will take steps to reduce the obvious bias and Google will update Gemini to be more historically accurate, but hidden bias in both will remain, and we should use this revelation of Big Tech’s manipulation as a welcome warning about the risks of relying on any centralized company developing and running AI models, no matter how well-intentioned. This is our call to build open, transparent, and decentralized AI systems we can trust.
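As a hedged sketch of the model-aggregation idea described above (the routing rule, model names, and audit fields below are invented for illustration; this is not Fluence’s API or any specific marketplace), a thin frontend can route each query to a domain-specific open model and publish enough information, such as the model identifier, a hash of its published weights, and digests of the prompt and response, for anyone to re-run the open model and check that a given answer really came from it.

```python
import hashlib
from dataclasses import dataclass, asdict


def digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


@dataclass
class AuditRecord:
    model_id: str       # which open, domain-specific model answered
    weights_hash: str   # fingerprint of the published weights that were used
    prompt_hash: str
    response_hash: str


# Hypothetical registry of open, domain-specific models (names are illustrative).
MODELS = {
    "pediatric-oncology": {"weights_hash": digest("pediatric-oncology-model-v1-weights")},
    "general": {"weights_hash": digest("general-model-v1-weights")},
}


def route(prompt: str) -> str:
    # Trivial keyword routing; a real frontend would use a classifier or registry metadata.
    return "pediatric-oncology" if "oncology" in prompt.lower() else "general"


def answer(prompt: str) -> tuple[str, AuditRecord]:
    model_id = route(prompt)
    response = f"[{model_id}] response to: {prompt}"  # placeholder for real inference
    record = AuditRecord(
        model_id=model_id,
        weights_hash=MODELS[model_id]["weights_hash"],
        prompt_hash=digest(prompt),
        response_hash=digest(response),
    )
    # Publishing the record to an open log lets third parties re-run the same open
    # model on the same prompt and check that the hashes match.
    return response, record


reply, audit = answer("What are current pediatric oncology treatment protocols?")
print(asdict(audit))
```

The point is not the toy routing rule but that everything needed to reproduce an answer is openly published, which is precisely what a closed model cannot offer.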
Issue 1, Article 6
Enterprise Storage Market Insights From the Field
At DeStor, we are dedicated to revolutionizing decentralized storage solutions by seamlessly integrating them with enterprise applications. With over 50 years of combined experience in enterprise data storage, our team has a deep understanding of this segment of the decentralized storage market. We specialize in connecting data owners with Filecoin storage providers through an intuitive and transparent platform.

Why focus on the enterprise market? Enterprises today store an average of 10 petabytes of data, so even small gains in this sector translate into significant revenue for Filecoin storage providers. Since launching in April 2024, we’ve secured 3 petabytes (PBs) of paid data storage deals and have an additional 39 PBs in the pipeline.

Our early success with enterprise customers is no coincidence; it’s the culmination of months of research, conversations, and meetings with IT decision-makers. In this article, we share the key feedback and insights we’ve gathered from these IT executives. Our goal is to empower projects within the Filecoin ecosystem to accelerate their go-to-market strategies and successfully engage enterprise customers.

## Privacy and Security Top Concerns

Decentralized storage is of keen interest to enterprises looking to enhance resiliency and incorporate verifiability into their workflows. However, survey feedback from our storage industry executive dinner series in Hong Kong, New York and Chicago highlighted two primary concerns gating adoption: data security and privacy. 50% of IT executives identified data security and privacy concerns as the main barrier to integrating decentralized storage within their organizations, despite clear breaches from traditional public cloud storage providers like [Microsoft](https://www.breaches.cloud/incidents/o365-2023/) and [AWS](https://hackread.com/black-hat-usa-2024-aws-bucket-monopoly-account-takeover/).

![A donut chart illustrating the primary barriers to integrating decentralized storage within organizations. The largest segment, representing 50%, is labeled "Security & Privacy." Other barriers include "Lack of Strategic Alignment" at 20%, "Organizational Cultural Resistance" at 10%, "Regulatory Compliance" at 10%, and "Technical & Interoperability" at 10%.](/assets/images/pie-chart-decentralised-storage.webp "Primary Barriers to Integrating Decentralized Storage in Organizations")

This insight revealed that targeting IT leaders with security titles at their organizations (one of our initial buyer personas) is off the mark. These individuals are primarily concerned with preventing malware and ransomware from entering through employee actions. Focusing on infrastructure-based titles (CTO, IT Director, etc.) responsible for recovery post-attack will likely prove more fruitful, as these executives are more likely to resonate with the robust data recovery and security solutions provided by decentralized storage technologies.

## The AI Opportunity

AI continues to be a hot topic with IT leaders. While the aforementioned data privacy and security concerns are delaying their adoption of decentralized storage, 40% of survey respondents agreed that integrating decentralized storage and AI is a necessary evolution to stay competitive.
This confidence-inspiring feedback aligns with exciting decentralized AI developments in the [Filecoin ecosystem](https://x.com/Filecoin/status/1812907959338606611), [investment community](https://www.grayscale.com/research/reports/ai-is-coming-crypto-can-help-make-it-right), and [beyond](https://machinelearning.apple.com/video/web3-decai). Even with the promise of decentralized AI, IT leaders still have concerns about the technical challenges of integrating decentralized storage and AI. 30% of respondents expressed concern about interoperability with their existing IT infrastructure, and 60% are grappling with the potential for increased complexity in [data governance](https://datagovernance.com/defining-data-governance/) (the overall management of the availability, usability, integrity, and security of an organization’s data).

While integration and governance concerns are absolutely valid, we believe they can be eased through a combination of education and high-touch technical support. We’ve seen Filecoin storage providers find success by developing proofs of concept (POCs) to win over IT decision makers. And we’ve personally experienced the impact that compelling [educational content](https://destor.com/resources/videos/datadrop) can have in unlocking paid storage deals, especially with innovation-focused teams within the enterprise.

## Summary

The feedback we gathered from IT executives reveals that enterprise adoption of decentralized storage is gated by concerns around data security, privacy, and integration complexity. However, IT decision makers recognize the promise of AI and the role decentralized storage will play in keeping their organizations competitive. Successfully engaging enterprise customers will require clearly defined buyer personas, impactful educational content, and the ability to deliver high-touch technical support and POCs.

As marketers at heart, we’d be remiss not to have a POV on messaging. Yes, features-and-benefits-based marketing can move the needle, but it’s increasingly challenging to cut through the clutter of the noisy enterprise IT media environment. Messages that resonate focus on being different versus being better, creating opportunities to deliver [impactful](https://www.forbes.com/councils/forbescommunicationscouncil/2023/03/17/why-your-marketing-strategy-should-appeal-to-emotions-not-logic/) emotional appeals. As organizations continue to navigate the intricacies of integrating AI and decentralized storage, we firmly believe the Filecoin ecosystem is on track to build the foundational stack for the next generation of the cloud.
Issue 1, Article 8
The Future of Data Lies at the Green Edge
As we move into the era of digital autonomy, driven by machines and machine-generated data, we need to vastly improve the world’s capacity to manage and process data at a scale previously unimaginable. At Holon, we believe the intersection of data infrastructure and green energy needs to be re-imagined and re-focused on commercial applications to enable significant, sustainable productivity gains and information resiliency for humanity.

## The exponential growth of data

Through extensive research and analysis of data trends, [The Holon Data Report](https://holon.investments/the-holon-data-report-part-6-filecoin-your-guide-to-the-opportunity-in-the-key-building-block-of-web-3-0/) conservatively forecasts that over 75,000 Zettabytes (ZiB) per annum of data will be created by 2040, largely driven by machine-generated data. This represents an exponential shift from the human-generated data that pushed us from 1 ZiB in 2010 to over 100 ZiB ten years later.

Today, approximately 4 ZiB of enterprise storage is available globally. Based on our data projections, the world will require closer to 1,000 ZiB of enterprise storage –– some 250 times today’s capacity –– in the next five years. And even at this level of capacity, the demand for data storage will outpace the capacity for data storage, as the energy consumption required to meet these data demands far exceeds what our current infrastructure can support. Alongside these exploding volumes of data, the data business model underpinning the world’s largest companies generates over US$300 billion per annum in revenue. In this context, resourcing the world’s data growth is now an urgent issue.

## Meeting future data needs – Web 2.0 vs Web 3.0

The Web 2.0 technology stack, maintained by air-cooled hardware solutions and grid-supported metro data centres, cannot physically support the level of data creation we anticipate. Simply put, the cost and energy requirements will be unsustainable in a world dominated by machine-generated data (see Part 3 of the Holon Data Report). However, Web 3.0 data networks represent a profound structural shift in the way we can run data at scale:

- **Decentralised Infrastructure** - Moving away from having to trust nodes (e.g., AWS, Google, Microsoft) to trusting a network where data is verifiable, significantly reducing storage and compute costs and resources;
- **Energy Transparency** - Providing complete transparency down to the “byte” – allowing precise energy efficiency calculations for data infrastructure solutions that are not achievable in Web 2.0; and, most importantly,
- **Data Ownership** - Enabling ownership of data through a structure that allows every data point to become a digital bearer object.

In pursuit of these capabilities, Holon is currently building a bridge from traditional energy architecture to Web 3.0 through two main infrastructure solutions:

- Green Modular Data Centres: Utility-scale, non-latency-dependent, 100% renewable-powered centres that interact with the grid to support the renewable energy transition; and
- Green Edge Micro-Data Centres: Distributed, low-latency solutions that are 100% renewable-powered.

Building distributed cloud solutions, like our Green Data Centres, where the storage providers own and control the solar, batteries, and immersion-cooled IT hardware, is the only way we can begin to reduce the energy consumption needed to process a rapidly expanding amount of data.
And while this technology is built on a Web 3.0 technology stack, it also has the advantage of servicing Web 2.0 – and beyond.

## The future of data economics is green edge

The crux of all of this is decentralised, distributed data infrastructure – bringing closer proximity between the user and the data source. In a world of exponential data creation, these edge data solutions will become increasingly valuable as the key generator of efficiencies and economic benefits. Web 3.0 data networks such as Filecoin are, by design, the ultimate edge solution, with the capability to horizontally and vertically integrate distributed and decentralised infrastructure, energy, and data.

Additionally, storage providers, who are essential in driving the critical infrastructure of Web 3.0 open data networks, have the potential to access two revenue streams (fiat and digital asset rewards) across traditional Web 2.0 data services and emerging Web 3.0 consensus and verifiability services. We believe this killer combination produces the economics required for Web 3.0 networks to disrupt the current data model, significantly bootstrap their networks, and provide the world with sustainably green, decentralised, and distributed data solutions.

In the future, Web 3.0 data networks will facilitate large-scale data trusts to solve some of humanity’s greatest challenges. Green data storage and compute providers will increase in value as they drive the next evolution of the cloud model, where energy and cost efficiency are not at odds but rather aligned features. Inevitably, market forces, consumer demands, and business efficiency will push the future of data to this ideal – to the green edge.
Issue 1, Article 9
The Intelligence Economy Is Coming
Imagine a world where, with a swipe of your finger, your data could be used to help train an LLM. A world where you are compensated in real time for your data contributions. A world where you control who has access to, and can benefit from, your digital history and behaviors, whether you are sharing insights with the person next to you on an airplane or with another business interested in your research. This reality is not only possible but inevitable in the new Intelligence Economy.

## What is the Intelligence Economy?

The Intelligence Economy is when data takes its place in the global economy as one of, if not the, most important digital assets. It is the period when both individuals and businesses understand that their data has value, and they begin to take control of their asset to share, protect, or even monetize its use. When society begins to treat data as an asset, it is no longer the “black box” that only a few statisticians can understand or interpret. Rather, it evolves to become humanity’s avenue for greater autonomy, heritage, and progress.

The concept of “data as the new oil” has been around for over a decade and has largely been misunderstood. Coined in 2006 by Clive Humby, the phrase was actually intended to convey that data needs to be “refined” for it to have value. The “data is the new oil” catchphrase has since been picked up by Big Tech executives to imply that data is a commodity that can be monetized. To date, data monetization has existed mostly for private corporations, where [$3.4B](https://www.grandviewresearch.com/industry-analysis/data-monetization-market) was transacted in 2023, predominantly within the silos of AWS, IBM, Oracle, and GCP. Data monetization is not the Intelligence Economy (although it’s a part of it).

Today, technological advancements are converging simultaneously to make the Intelligence Economy a reality: AI, Blockchain, Chips, and Data (the ABCDs). Mark Yusko, Managing Partner of Morgan Creek Capital, who coined the acronym “ABCD,” goes so far as to say that “the best investment opportunities over the next decade will be where the [ABCDs](https://www.youtube.com/watch?v=smzapXTOz4E&t=5s) intersect.” The Intelligence Economy is the most powerful opportunity where all four of the ABCDs combine to usher in a new data revolution. The convergence of the ABCDs is what makes the Intelligence Economy possible:

- Without AI: There would be no buyer.
- Without Blockchain: There would be no trust or integrity.
- Without Chips: There would be no scale.
- Without Data: There would be no asset.

## How does the Intelligence Economy get started?

It will start with Enterprise. It’s logical that businesses are the first movers to assess data value, given the extremely tangible profit and IP-protection motivations.

**Did you know that:**

- 5% of all data and 25% of high-quality sources in AI training sets (C4, RefinedWeb, Dolma) are now restricted.

**Which is the result of actions by publishers and other data holders who have:**

- Set up paywalls or changed terms of service to limit AI data use.
- Blocked automated web crawlers from companies like [OpenAI](https://www.linkedin.com/company/openai/), [Anthropic](https://www.linkedin.com/company/anthropicresearch/), and [Google](https://www.linkedin.com/company/google/).
- Begun charging for data access (e.g., [Reddit](https://www.linkedin.com/company/reddit-com/), Inc. and StackOverflow).
- Taken legal action for unauthorized use (e.g., [The New York Times’ lawsuit](https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec2023.pdf) against OpenAI and Microsoft). ([Source: Stefaan Verhulst, PhD](https://www.linkedin.com/in/stefaan-verhulst/))

The Intelligence Economy will only evolve as tools, marketplaces, and protections fall into place that encourage users of all types to leverage their data assets to the fullest desired extent. [Michael Clark](https://www.linkedin.com/in/futureofmichael/), author of the upcoming book _The Data Revolution - The Rise of an Asset_, shares a framework for how the Data Revolution will shape all aspects of our lives: “AI has forced us to reconsider our approaches; realizing that past thinking and habits are no longer enough to create a future that is beyond better. Data ownership is now within reach and both individuals and businesses have started to assess the intrinsic and extrinsic value of their data.”

Since the inception of the public cloud, businesses have stored most of their data under a “just-in-case” model. When data volumes were relatively small, public cloud made it easy and cheap to store it all. However, those [costs are mounting](https://www.techradar.com/pro/businesses-are-spending-huge-amounts-of-cloud-storage-funds-on-fees) because of the sheer amount of data being stored and rising fees. Much like someone storing their personal items in a physical storage unit, eventually the conversation turns to value. _Do I need this?_ _Is it important to me or someone else?_ What tends to happen next is that owners purge everything that doesn’t hold value (intrinsic or extrinsic). Owners will continue to hold sentimental items in vaults –– rarely accessed but secured –– and migrate valuable assets to marketplaces to be monetized. Data is going through this transformation as we speak.

## What might this mean for the future?

With the amount of high-quality data already being pulled off the internet, market demand for data will rapidly accumulate to accommodate the growing needs of AI. As both buyers and sellers become more vocal, the market will materialize in the areas that can generate a clear signal on a dataset’s quality and uniqueness. The Web3 space will enable the Intelligence Economy with its unique data-quality-preserving capabilities: from data attribution, authentication, and provenance, to ownership, interoperability, and integrity. As these solutions begin to harmonize, groundbreaking use cases will emerge.

**Academic research institutions** will no longer be focused on “[publish or perish](https://fil-foundation.on.fleek.co/hosting/FF-CaseStudy-DeSci.pdf),” but on selling university-branded data through AI marketplaces and decentralized data exchanges. This will create a royalty stream of cash flows for both the professor/PhD and the university. Publishing may still have a role, but perhaps more as marketing material to explain what the data is and why it’s relevant.

**Web2 social media platforms** will rapidly pivot their business models to protect user data and intellectual property, empowering users to monetize their information directly on the platform. While the major platforms we know today will never fully decentralize, they will at least provide the illusion of control and monetization for their users to remain competitive with rising players that have these benefits baked in.
Alternative Web3-native browsers and apps will emerge that make data protection, control, and monetization the leading features of their products.

**Enterprise** will rush to contribute and promote their data on data marketplaces –– platforms that match buyers and sellers of data for the purpose of unlocking business value (e.g., [Nukl.ai](https://www.nukl.ai/) and [SingularityNet](https://singularitynet.io/)). As companies become more comfortable with the controls, security, and transparency of their data when participating in an open market, enterprises will begin to see data storage as one of their leading profit generators instead of the cost center it functions as now. While data marketplaces exist today, many are provided by the major tech powerhouses of AWS, Google, IBM, and Snowflake –– another example of Big Tech centralizing the power of data. These markets will move on-chain as they become less siloed, more permissionless, more programmable, and more global.

**Wall St.**: Much like your home mortgage, frequently transacted data will be securitized by Wall St. Datasets will have a face value and predictable cash flows, which will launch an array of financial products not seen since Salomon Brothers created the mortgage-backed security in 1977.

## The role of Filecoin

To ensure confidence in this level of transaction and securitization, the data must have the utmost integrity and quality assurances to support both buying and selling in a liquid marketplace. Here’s what data integrity will look like in the Intelligence Economy:

- **Ownership**: A way to embed clear creator ownership into data, generating digital intellectual property rights visible to consumers. Example: A Digital Identity (DiD) solution like [Adobe’s Content Attribution](https://blog.adobe.com/en/publish/2021/10/26/adobe-unleashes-content-attribution-features-photoshop-beyond-max-2021).
- **Provenance**: An on-chain proof of data origin and ownership lineage throughout the life of the dataset. Example: [ClimateGPT](https://climategpt.ai/), an EQTY Labs solution built with Filecoin & Hedera.
- **Immutability**: Confidence that the data cannot be changed. Solution: [InterPlanetary File System (IPFS)](https://ipfs.tech/)
- **Verifiability**: Continuous proof that the data is still on-chain and has not been altered. Solution: [Filecoin](https://filecoin.io/)
- **Decentralization**: No single point of failure or subjugation to Big Tech’s control. Solution: [Filecoin](https://filecoin.io/)

Filecoin and IPFS, which Filecoin is built on top of, are essential to ensuring the data integrity and quality assurance of the Intelligence Economy. Without decentralized data infrastructure, the data economy will never reach its full potential –– leaving Big Tech to clean up all the potential profits. Only open and decentralized ecosystems and protocols can facilitate the trustless and permissionless nature that a global data economy demands.

## Closing words

> The greatest wealth is created by being an early investor in innovation. Making that investment requires believing in something before the majority of people understand it. You will be mocked, ridiculed & criticized for your non-consensus action. It is absolutely worth it!
>
> –Mark Yusko, Managing Director of Morgan Creek Capital

The Intelligence Economy will be one of the largest economic, technological, and social transformations over the next 30 years.
According to [Market.us](https://www.linkedin.com/pulse/blockchain-ai-market-reach-usd-2787-million-2033-markets-us-digvc/), the global blockchain AI market size is expected to be worth around $2,787M by 2033, up from $349M in 2023 –– and this is all before mass adoption truly occurs. The technology is there, the components are already in production, and integrations are occurring rapidly. Be curious; be excited; be first. Welcome to the Data Revolution!
Share Your Expertise with the Filecoin Community
Become a part of the conversation! Submit an idea for upcoming issues.