Who Should Control AI? Inside the Global Fight Over Big Tech, Copyright, and Open‑Source Models

As governments, creators, and open‑source communities clash with Big Tech over AI training data, copyright, safety regulation, and the future of open models, the outcome will determine who controls the most capable AI systems and how their benefits and risks are shared across society. This article unpacks the copyright lawsuits and licensing deals, new safety and transparency rules in the EU and US, and the polarizing debate over open‑source vs. closed AI, explaining what is at stake for innovation, competition, and democratic oversight.

The rapid rise of generative AI has turned what once sounded like an abstract policy problem—“how should we regulate AI?”—into a concrete power struggle playing out across courtrooms, parliaments, and developer forums. At the heart of the debate are three intertwined questions: who controls the most advanced models, what data they can be trained on, and what safeguards are required to keep them safe, fair, and rights‑respecting.


In late 2023 and 2024, lawsuits from news organizations, book authors, music labels, and image libraries against major AI labs intensified, while regulators in the EU, US, and UK pushed forward new transparency and risk‑management rules. At the same time, open‑source communities released increasingly capable models—Meta’s LLaMA family, Mistral’s models, Stability AI’s image systems, among others—raising fears of uncontained misuse but also hopes of breaking Big Tech’s dominance.


“We are in the middle of a decisive moment. The governance choices made now will define who benefits from AI—and who bears the risks—for decades.”

— Lina Khan, Chair of the U.S. Federal Trade Commission

Mission Overview: What Does It Mean to “Regulate Big Tech AI”?

Regulating Big Tech AI encompasses three overlapping missions:

  • Copyright and data governance – deciding what data can be used for training, under what legal basis, and with what compensation or opt‑out rights for creators and users.
  • Safety and accountability – requiring testing, monitoring, and disclosure of AI capabilities and risks, especially for systems that could affect critical infrastructure, elections, or public health.
  • Market structure and openness – ensuring that regulation does not merely cement the power of a few companies, but instead preserves room for open‑source ecosystems, academic research, and small firms to thrive.

These goals often collide. A licensing deal that secures fair compensation for a major publisher, for instance, may be affordable for a trillion‑dollar firm but unattainable for a startup or academic lab. Similarly, stringent compliance requirements might deter reckless deployment but also raise barriers that only the largest incumbents can meet.


Modern large language models (LLMs) and generative systems are trained on massive corpora scraped from books, news articles, code repositories, social media, and image or audio archives. This practice has triggered a wave of litigation and policy proposals centered on whether this use is covered by “fair use” or analogous exceptions, or whether explicit consent and licensing are required.


Major Lawsuits and Licensing Deals

Since 2023, several high‑profile cases have reshaped the landscape:

  1. News organizations vs. AI labs – Publishers like The New York Times sued OpenAI and Microsoft, alleging unauthorized use of their archives for training and claim that AI outputs can substitute for their content. Reporting in outlets such as Recode and Wired has also documented parallel licensing deals, where some publishers accept payment in exchange for API or archive access.
  2. Book authors and image creators – Class actions by authors and visual artists argue that training on copyrighted works without consent or compensation infringes their exclusive rights, especially when generated outputs mimic individual styles or enable text‑to‑image recreation of protected characters and scenes.
  3. Music and audio – Labels and rights‑holders have challenged AI systems that can imitate particular voices or musical styles, raising novel questions about voice rights and “style cloning.”

“The central legal question is whether ingesting copyrighted material to learn statistical patterns is more like reading a book to learn a language—or more like copying that book to resell it.”

— Legal analysis summarized by the Electronic Frontier Foundation

Emerging Approaches to Data Governance

In response, AI companies and policymakers have begun experimenting with:

  • Opt‑out schemes where creators can exclude their works from future training runs.
  • Collective licensing models reminiscent of music royalties, potentially pooling payments from AI providers and distributing them to rights‑holders.
  • Data provenance tools such as watermarking and content credentials to mark AI‑generated material, making it easier to track reuse and reduce confusion.

Critics warn that secretive bilateral deals between large publishers and Big Tech risk locking in privileged access. Independent journalists, smaller outlets, and individual creators may be left with weaker bargaining power and fewer transparency guarantees around how their work is used.


AI Safety & Regulation: From Risk Tiers to Red‑Team Testing

Safety regulation aims to reduce harms such as disinformation, systemic bias, privacy violations, and potential assistance in cyber or biological threats, while still enabling beneficial innovation. Different jurisdictions have taken distinct—but increasingly convergent—approaches.


The EU AI Act and Global Templates

The EU AI Act, finalized in 2024, is the most comprehensive attempt to date. It introduces:

  • Risk‑based categories (unacceptable, high‑risk, limited risk, minimal risk) with corresponding obligations.
  • Rules for “general‑purpose AI systems” (GPAI), including transparency about training data sources, technical documentation, and systemic risk assessments for the most capable models.
  • Requirements for high‑risk uses in areas like employment, credit scoring, law enforcement, and critical infrastructure.

Ars Technica, The Verge, and The Next Web have detailed how the Act’s treatment of GPAI has been revised multiple times, under intense lobbying from both Big Tech and open‑source advocates. Final text introduces lighter‑touch obligations for “openly licensed” models below certain thresholds, while demanding more from frontier systems that could pose severe risks.


US, UK, and International Trends

In the United States, there is no single AI statute yet, but:

  • The White House AI Executive Order set voluntary reporting expectations for advanced models and tasked agencies like NIST and the FTC with creating testing standards and investigating unfair practices.
  • Sectoral regulators (e.g., for finance, healthcare, employment) are issuing guidance on algorithmic discrimination, transparency, and auditing.
  • Congressional proposals range from narrowly focused bills (e.g., deepfake labeling) to broader frameworks for licensing high‑risk AI systems.

The UK has chosen a more “pro‑innovation” and regulator‑led approach, relying on existing laws and guidance rather than a single new act, though political pressure for stronger measures has grown alongside headline‑grabbing AI failures and security concerns.


“We need rigorous, repeatable evaluations of AI systems—before and after deployment—not one‑off demos that only show best‑case behavior.”

— NIST AI Risk Management Framework contributors

Technical Safety Practices

Across jurisdictions, regulators are converging on a common vocabulary of safety practices:

  • Red‑teaming – systematically probing models for misuse potential, such as generating malware code, biological threat assistance, or targeted harassment.
  • Capability and hazard evaluation – measuring models’ performance on specialized benchmarks related to cyber‑offense, persuasion, or chemical and biological design.
  • Monitoring and incident reporting – requiring companies to track real‑world misuse, document incidents, and share lessons with authorities and sometimes the public.
  • Access controls – rate‑limiting, gating powerful features behind additional checks, and restricting certain outputs (e.g., real‑time voice cloning) in sensitive contexts.

Open‑Source vs Closed‑Source: The Fight Over Open Models

No AI policy discussion has generated more passion than the open‑source vs. closed‑source debate. For open‑source advocates, broad access to model weights and training code is vital for democratic oversight, academic research, and competition. For skeptics, releasing highly capable models without usage controls could enable malicious actors to scale disinformation, cyberattacks, or even biological threat modeling.


Arguments for Openness

Supporters of open models, active on platforms like Hacker News, GitHub, and X (Twitter), emphasize:

  • Reproducibility and scrutiny – public weights and code allow independent researchers to validate claims, discover biases, and propose mitigations.
  • Innovation and customization – small teams can fine‑tune or adapt open models for local languages, niche domains, or privacy‑sensitive on‑device uses.
  • Decentralization of power – open ecosystems prevent a small set of platforms from becoming unaccountable “AI gatekeepers.”

Arguments for Caution

Critics, including many AI safety researchers and security experts, counter that:

  • Unrestricted distribution scales misuse – once powerful models are widely available, it becomes extremely difficult to prevent them from being repurposed for harmful tasks.
  • Security externalities – a model optimized by legitimate scientists for benign protein design might be repurposed by a bad actor; the harm is not confined to the original developers.
  • Regulatory blind spots – current definitions of “open” or “open weight” may not capture the real‑world risks tied to fine‑tuning or combining models with external tools.

“Openness is not an absolute good. For some capabilities, we may need to delay or withhold weight releases until we are confident we can manage the associated risks.”

— Statement reflecting positions from several leading AI labs and safety researchers

Policy Proposals Around Open Models

Policy discussions reported by TechCrunch and The Verge include:

  • Capability thresholds – above a certain hazard score or benchmark performance, weight release could require additional safeguards or oversight.
  • Responsible release frameworks – staged releases, from API‑only access to partial weights to full open sourcing, depending on demonstrated risk management.
  • Distinguishing infrastructure from applications – lighter obligations on base open models used by many, with stronger rules for high‑risk downstream applications (e.g., deepfake tools targeting elections).

The practical challenge is evaluating “dangerous capability” in advance and agreeing on thresholds that are technically meaningful and not easily gamed.


Labor, Economic Impacts, and Platform Power

AI regulation is not just a question of abstract principles; it is a structural issue that affects jobs, wages, and market concentration. Journalists at Wired, Recode, and other outlets have highlighted several key trends.


Automation Pressure on White‑Collar Work

Generative AI can already draft reports, marketing copy, code, and legal memos, raising concerns about:

  • Task displacement in professions like customer support, copywriting, paralegal work, and software testing.
  • Intensified surveillance and metrics when employers use AI to track productivity and “augment” workers in ways that reduce autonomy.
  • Skill polarization, where a smaller number of high‑skill workers orchestrate AI tools, while routine roles shrink or become more precarious.

Regulatory Levers for Fairness

Policymakers are exploring tools such as:

  • Transparency obligations around AI use in hiring, credit decisions, and workplace monitoring.
  • Impact assessments that require organizations to analyze and mitigate discriminatory outcomes or disproportionate burdens on certain groups.
  • Stronger social safety nets and retraining programs to help workers transition into new roles as AI reshapes sectors.

“We need to treat AI not just as a technical system but as part of a broader political economy—who gains, who loses, and who gets to decide.”

— Paraphrasing perspectives from labor economists and the International Labour Organization

Technology Under the Hood: Why Training Data and Model Design Matter

To understand why copyright, safety, and openness are so intertwined, it helps to look briefly at how modern AI systems are built and deployed.


From Web‑Scale Data to Fine‑Tuned Assistants

Most state‑of‑the‑art generative models follow a similar pipeline:

  1. Data collection – large corpora gathered from web crawls, licensed datasets, and curated sources.
  2. Pre‑training – optimization on self‑supervised tasks (e.g., predicting the next token) to learn general language or image representations.
  3. Fine‑tuning – additional training on specific tasks or datasets to specialize for coding, legal analysis, medical summarization, etc.
  4. Reinforcement learning from human feedback (RLHF) or similar techniques – aligning model outputs with human preferences and safety constraints.
  5. Deployment and monitoring – exposing models via APIs, consumer apps, or on‑device runtimes, with telemetry to detect misuse and degradation.

Each stage raises distinct governance questions: what data is permissible for pre‑training, who audits fine‑tuning datasets for bias and safety, and how deployment telemetry can be collected without over‑surveilling users.


Why Transparency Is Hard but Necessary

Regulators often call for disclosure of training data “to the extent technically feasible.” Yet:

  • Datasets may contain billions of documents from ever‑changing sources.
  • Companies argue that full transparency could reveal trade secrets or facilitate adversarial attacks.
  • Some data is licensed under confidentiality constraints that bar public listing.

Emerging best practices involve high‑level dataset summaries, documentation of curation and filtering processes, and external audits under non‑disclosure agreements for especially sensitive components.


Visualizing the Regulatory Landscape

Law gavel and digital code overlay representing AI regulation and justice
Figure 1: Symbolic illustration of courts and regulators trying to catch up with AI advances. Source: Pexels.

Developers working around laptops and code screens discussing AI models
Figure 2: Developers and researchers collaborating on AI models in an open‑source environment. Source: Pexels.

Books and digital tablet highlighting tension between traditional media and new AI technologies
Figure 3: Traditional media meets digital AI: a visual metaphor for copyright and training data disputes. Source: Pexels.

Professional workspace showing charts and laptop modeling AI policy and risk frameworks
Figure 4: Policymakers and analysts use data‑driven tools to evaluate AI risks and regulatory options. Source: Pexels.

Key Milestones in the AI Governance Debate

The governance of Big Tech AI has evolved rapidly over the past few years. Some notable milestones include:

  • High‑profile generative AI launches (2022–2023) – public releases of chatbots and image generators triggered political attention and regulatory hearings worldwide.
  • Major copyright lawsuits (2023–2024) – cases by authors, artists, and media companies signaled that training data practices would be tested in court, not just in law review articles.
  • EU AI Act agreement (2024) – Europe set a global reference point by adopting a horizontal law with dedicated provisions for general‑purpose AI.
  • Frontier model safety commitments – voluntary pacts among leading labs to share information about dangerous capabilities and to invest in red‑teaming and alignment research.
  • Open‑weight frontier models – releases of increasingly capable open models sparked both celebration in developer communities and anxiety among policymakers and security experts.

Each milestone has fed back into the others. Lawsuits shape how companies document and justify their data practices; regulatory drafts influence which models are released openly or only via APIs; and open‑source breakthroughs push policymakers to revisit assumptions about who can build powerful AI in the first place.


Challenges: Balancing Innovation, Rights, and Security

Designing AI regulation is less about choosing “pro‑innovation” or “pro‑safety” and more about navigating trade‑offs under uncertainty. Several hard problems keep reappearing in expert debates and in coverage across Recode, Ars Technica, and Hacker News.


Regulatory Capture and Compliance Costs

Heavy compliance requirements risk:

  • Concentrating power in firms that can afford large legal and policy teams.
  • Pushing innovation into unregulated shadows or jurisdictions with weaker oversight.
  • Imposing one‑size‑fits‑all rules on radically different use cases, from creative writing tools to industrial control systems.

Technical Uncertainty and Fast Iteration

AI capabilities and architectures are evolving faster than traditional legislative cycles. By the time a rule is finalized, new paradigms—like efficient on‑device models or multi‑agent systems—may already challenge its assumptions.


To cope, some experts advocate adaptive regulation: iterative guidance, regulatory sandboxes, and stronger technical capacity within agencies, so that oversight can follow the technology rather than lag years behind it.


Global Coordination vs. Fragmentation

AI is inherently cross‑border. A model trained in one country can be hosted in another and used worldwide. Divergent rules across regions risk:

  • Regulatory arbitrage, where companies relocate sensitive activities to the most permissive jurisdictions.
  • Incompatibility in safety or documentation requirements, complicating international collaboration.
  • Tensions around values, such as free expression, privacy, and content moderation norms.

International bodies and standard‑setting organizations—like the OECD, G7’s Hiroshima AI Process, and ISO/IEC committees—are attempting to harmonize high‑level principles, but implementation varies widely.


Tools, Resources, and Further Learning

For practitioners, policymakers, and curious readers wanting to dive deeper, a growing ecosystem of tools and resources is available.


Technical and Policy Reading

  • The NIST AI Risk Management Framework provides a structured approach to assessing and mitigating AI risks.
  • The EU’s official portal for the EU AI Act offers summaries and legal text, plus guidance for developers and companies.
  • Research organizations like the AI Ethics Lab and Alignment Forum host in‑depth discussions on alignment, safety, and governance.

Educational Courses & Books (Including Helpful Tools)

If you are building or auditing AI systems, grounding yourself in both the technical and legal basics is invaluable. Consider:

  • Introductory AI ethics and law courses on platforms like Coursera and edX, taught by leading universities.
  • Practical books on AI and society, such as those by scholars like Kate Crawford and Timnit Gebru, which explore data power and social impacts.
  • For hands‑on practitioners, a dependable laptop with a strong GPU or fast cloud connectivity can make experimentation and evaluation easier. Devices such as the ASUS ROG Strix 15 gaming laptop with RTX graphics can provide ample compute for local fine‑tuning and evaluation of smaller open‑source models.

Podcasts and YouTube Channels

  • Tech policy podcasts on platforms like Spotify and YouTube—covering law, technology, and society—regularly break down AI court cases and legislative drafts.
  • Channels such as Stanford Online and MIT CSAIL publish talks and panel discussions on AI governance, safety, and ethics.

Conclusion: A Constitutional Moment for AI

The current battle over Big Tech AI governance is less like a routine regulatory update and more like a constitutional moment for the digital age. Decisions about copyright rules, transparency mandates, and the status of open models will shape who is allowed to build powerful systems, on what terms, and with what forms of democratic oversight.


A sustainable path forward likely requires:

  • Legally robust data governance that respects creators’ rights without freezing innovation or cementing incumbents.
  • Layered safety regimes that scale obligations with risk and capability, encouraging responsible deployment rather than punishing experimentation per se.
  • Protected space for open and academic research, combined with clear norms for responsible release and red‑teaming of powerful models.
  • Ongoing public participation—from workers, educators, civil society, and affected communities—in shaping what “responsible AI” should mean in practice.

Because AI systems now underpin search, social feeds, creative tools, and even chip design, this is not a niche legal topic—it is a question about the future distribution of knowledge and power. Whether we end up with a handful of corporate AI gatekeepers or a more pluralistic ecosystem of accountable, open technologies will depend on the choices made in the next few years by lawmakers, courts, companies, and citizens alike.


Practical Takeaways for Different Stakeholders

To make the ongoing AI governance debate more actionable, here are concrete steps various stakeholders can take today:


For Developers and Startups

  • Document your training data sources and curation process, even if not yet legally required—this will reduce future compliance headaches.
  • Adopt open standards for model cards and data sheets, enabling clearer communication with users and regulators.
  • Engage with open‑source safety tools and benchmarks to routinely test for harmful capabilities.

For Creators and Publishers

  • Track emerging opt‑out mechanisms from major AI providers and content platforms.
  • Consider collective bargaining or rights‑management organizations to negotiate licensing at scale.
  • Experiment with AI as a supplement to, not replacement for, your creative process—while asserting clear terms around data and derivative works.

For Policymakers and Advocates

  • Invest in in‑house technical expertise to evaluate claims from both industry and civil society.
  • Pilot regulatory sandboxes that invite diverse participants—not only incumbents—to test compliance approaches.
  • Ensure transparency around lobbying and standard‑setting so that emerging rules reflect broad public interests.

The healthiest AI ecosystem will likely emerge where technical excellence, legal safeguards, and democratic accountability reinforce—rather than undermine—each other.


References / Sources

Further reading and source material (non‑exhaustive):

Continue Reading at Source : Wired / Recode / Hacker News