
IT Europa attended the IT Press Tour in New York last week, and gleaned the latest thinking around successful data strategies among storage vendors.
CTERA
CTERA Networks, which offers its Global File System to enterprises, said 95% of generative AI pilots never make it to production. But the reason isn’t the technology; it’s the quality of the data.
Organisations are pointing their AI tools at file systems full of duplicate files, outdated versions, and sensitive information that should never be processed in the first place, said CTERA, and then wonder why they are looking at “garbage” when they come to make business decisions.
The solution is better data management. This isn’t easy: machine learning models need data, but that data resides in silos with different formats, permissions, and quality levels, and data science teams can spend months on clean-up instead of building models.
CTERA says firms must ride three “waves” before they are in the right position to build AI data lakes and manage their data effectively.
Wave 1 sees organisations control costs by building a file system that unifies all distributed storage behind a single namespace, making hundreds of separate file servers look like one system.
In Wave 2, firms must protect the business, and here CTERA offers AI-powered detection that monitors file access patterns in real time. When CTERA detects suspicious activity, such as unusual encryption or mass deletions, it automatically blocks access and enables recovery from immutable snapshots.
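To make the Wave 2 idea concrete, the sketch below shows, in simplified form, how a detector might flag the access patterns CTERA describes, such as mass deletions or writes of near-random (likely encrypted) data. The thresholds, event fields and response hooks are illustrative assumptions, not CTERA's actual implementation.

```python
# Simplified sketch of access-pattern anomaly detection; thresholds and the
# block/recover hooks are hypothetical, not CTERA's implementation.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class FileEvent:
    user: str
    action: str       # "write", "delete", "read"
    entropy: float    # Shannon entropy of written data, 0-8 bits per byte

DELETE_LIMIT = 500        # deletions per monitoring window
ENTROPY_LIMIT = 7.5       # near-random data suggests encryption
HOT_WRITE_LIMIT = 200     # high-entropy writes per window

def suspicious_users(events: list[FileEvent]) -> set[str]:
    """Return users whose activity in this window looks like ransomware behaviour."""
    deletes = defaultdict(int)
    hot_writes = defaultdict(int)
    for e in events:
        if e.action == "delete":
            deletes[e.user] += 1
        elif e.action == "write" and e.entropy >= ENTROPY_LIMIT:
            hot_writes[e.user] += 1
    users = set(deletes) | set(hot_writes)
    return {u for u in users
            if deletes[u] >= DELETE_LIMIT or hot_writes[u] >= HOT_WRITE_LIMIT}

def respond(suspects: set[str]) -> None:
    for user in suspects:
        # In a real system these would call the platform's controls:
        # block the session, then recover affected files from an immutable snapshot.
        print(f"blocking {user} and queueing snapshot recovery")
```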
For Wave 3, CTERA has built a systematic data curation process. The system ingests data from any location and converts it to a standard format. It extracts metadata using AI and filters out sensitive content based on company policies. When that is done, it creates the indexes that AI models use for querying.
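As a rough illustration of that curation flow, the sketch below strings the stages together: ingest, convert to a standard form, extract metadata, filter sensitive content against a policy, then build an index. The function names, the regex-based policy check and the simple metadata fields are assumptions for illustration, not CTERA's pipeline.

```python
# Minimal curation-pipeline sketch: ingest -> normalise -> extract metadata ->
# filter sensitive content -> index. Illustrative only.
import re

SENSITIVE_PATTERNS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]  # e.g. US SSNs, per policy

def ingest(paths):
    for path in paths:
        with open(path, "r", errors="ignore") as f:
            yield path, f.read()

def normalise(text: str) -> str:
    return " ".join(text.split()).lower()        # convert to one standard form

def extract_metadata(path: str, text: str) -> dict:
    return {"source": path, "tokens": len(text.split())}   # stand-in for AI-driven extraction

def is_sensitive(text: str) -> bool:
    return any(p.search(text) for p in SENSITIVE_PATTERNS)

def curate(paths):
    index = []                                   # stand-in for the query index AI models use
    for path, raw in ingest(paths):
        text = normalise(raw)
        if is_sensitive(text):
            continue                             # filtered out by company policy
        index.append((extract_metadata(path, text), text))
    return index
```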
In New York, CTERA introduced its CTERA Data Intelligence technology as part of Wave 3, its first major foray into agentic AI. It is initially being offered to existing customers as part of their current deployments, but could well become a standalone product sold through its channel partners in the future.
CTERA says its channel partners include Hitachi, HPE, IBM, Microsoft, and Amazon.
ExaGrid
Backup target appliance vendor ExaGrid says it now has over 4,800 customers, and claims it is the largest backup storage hardware provider behind the big primary storage vendors, including Dell, HPE, NetApp, Pure Storage, Hitachi, Huawei, and IBM.
As it says, many of these vendors don’t, or can’t, specify what their storage systems are used for once they’re shipped, whereas ExaGrid’s hardware is specifically sold as a data backup solution.
Its appliances are disk drive-based and ingest backup data into a “landing zone”, where it is kept in a raw state with deduplication carried out afterwards. This means restores from the landing zone are fast, as the data does not have to be “rehydrated” from a deduplicated state.
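A minimal sketch of the landing-zone approach is below: backups land as raw files, a later pass splits them into chunks and keeps only the unique ones, and a restore from the landing zone is a plain copy with no rehydration. The chunk size and on-disk layout are assumptions for illustration, not ExaGrid's design.

```python
# Landing-zone sketch: raw ingest first, deduplication as a later pass.
# The landing and chunk-store directories are assumed to exist already.
import hashlib
import shutil
from pathlib import Path

CHUNK = 4 * 1024 * 1024  # 4 MiB chunks, arbitrary for this sketch

def ingest(src: Path, landing: Path) -> Path:
    """Backup data arrives in the landing zone in its raw state."""
    dest = landing / src.name
    shutil.copy2(src, dest)
    return dest

def deduplicate(raw: Path, store: Path) -> list[str]:
    """Later pass: split the raw file into chunks and keep only unique ones."""
    recipe = []
    with raw.open("rb") as f:
        while chunk := f.read(CHUNK):
            digest = hashlib.sha256(chunk).hexdigest()
            blob = store / digest
            if not blob.exists():
                blob.write_bytes(chunk)          # new chunk: store it once
            recipe.append(digest)                # duplicate: just reference it
    return recipe                                # enough to rebuild the file later

def restore(raw: Path, target: Path) -> None:
    """Restores from the landing zone are straight copies, no rehydration."""
    shutil.copy2(raw, target)
```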
On the IT Press Tour, ExaGrid CEO Bill Andrews reported 19 consecutive quarters of free cash flow and EBITDA, although that growth had “slowed” because of market uncertainty around President Trump’s tariffs, he said.
“We’re still growing, but not at the same rate as two years ago,” we were told.
The company supports integration with all the major backup applications, including Veeam, Commvault, and Veritas NetBackup, now owned by Cohesity.
The latest quarter saw 160 new customers added, with almost half of new business coming from outside the US. Version 7.3.0 of ExaGrid’s appliance software added support for Rubrik and MongoDB Ops Manager, in addition to deduplication for encrypted Microsoft SQL Server data dumps.
Crucially for annual recurring revenue, over 99% of customers are on maintenance and support contracts, said Andrews. So while ExaGrid is almost alone in the market in supplying backup hardware, it is still generating cash from both its software and hardware.
HYCU
Cloud data management and SaaS workload protection firm HYCU conducted research that showed 65% of organisations experienced a SaaS-related security breach in the past year, with the average incident costing $2.3m over a typical five-day recovery period.
Sathya Sankaran, HYCU head of cloud products, described the “lifestyle diseases” of cloud backup, problems that develop slowly but pose a serious risk over time:
- Storage obesity: Cloud backup APIs don’t support incremental backups the way on-premises systems do; every backup is a full export. With petabyte-scale datasets, this creates massive storage costs and makes long retention periods economically impractical, as the sketch after this list illustrates.
- Fragmentation: With different consoles for different cloud providers, separate tools for SaaS applications, and multiple third-party services to fill gaps in coverage, organisations end up with “console chaos” and “automation sprawl”, said Sankaran.
- Blind spots: Most cloud backup solutions handle VMs and files well, but database-as-a-service products, data lakehouses, and AI training datasets either aren’t supported or require manual scripting to protect, he said.
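On the storage obesity point, a back-of-the-envelope comparison shows why full exports become expensive at scale. The dataset size, daily change rate and retention period below are assumed figures, not HYCU's data.

```python
# Retained backup storage: full export every time versus incremental copies.
# All inputs are illustrative assumptions.
dataset_tb = 1000          # 1 PB protected dataset
daily_change = 0.02        # 2% of data changes per day (assumption)
retention_days = 30

full_exports = dataset_tb * retention_days                                # every backup is a full copy
incrementals = dataset_tb + dataset_tb * daily_change * (retention_days - 1)

print(f"Full exports retained: {full_exports:,.0f} TB")    # 30,000 TB
print(f"Incrementals retained: {incrementals:,.0f} TB")    # ~1,580 TB
```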
The pattern in HYCU's research is clear: organisations using over 200 SaaS applications experienced breach rates of 77%, while those running fewer than 100 SaaS apps saw a breach rate of 60%. Effectively, a greater number of apps creates a wider attack surface.
As part of its presentation, HYCU trumpeted its partnership with the Dell Data Domain platform, using DD Boost technology, which provides deduplication at the source. HYCU claims 40:1 deduplication ratios in real customer environments, meaning 40 petabytes of logical data can be stored in one petabyte of physical capacity.
While Dell is primarily a hardware vendor selling on-premises kit, HYCU concentrates on cloud and SaaS protection services. For HYCU, Dell's Data Domain platform offers a deduplication target that makes large-scale cloud backup “economically viable”, it says.
“What we offer is the ability to push a whole bunch of data from 90-plus, 100-plus connectors we now have into that Dell box,” said Sankaran.
HYCU is also now offering support for iManage Cloud, said to be used by 80% of the top 100 law firms. iManage stores documents and maintains cases, file links, images, evidence, metadata, and permissions. When those lawyers charge at least $500 per hour, data protection is critical.
Arcitecta
Princeton University in the US is building a 100-year data management plan, with the aim of making its data and research accessible and usable for the next century.
Princeton chose Arcitecta's Mediaflux platform as the foundation of this ambitious project.
The university had storage scattered across different platforms, acquired on an ad hoc basis over many years. As these stacks grew and the number of users increased, data management became a significant issue, we were told.
Princeton created TigerData as its research data management platform, built on Mediaflux. The system currently manages 200 petabytes of research data, a number that will keep growing, of course. As of October 2025, we heard, TigerData tracks 497m assets.
Princeton said it needed to break free from IBM vendor lock-in. Mediaflux allows the university to add Dell PowerScale, Dell ECS, and IBM Diamondback tape libraries alongside its existing IBM storage, with everything appearing as a single namespace to users.
In addition, Mediaflux enables researchers to tag data with metadata at the time of upload, recording who created it, what it contains, which grant funded it, and how long it needs to be preserved, for instance.
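As an illustration of that kind of upload-time tagging, the sketch below defines a simple metadata record. The field names, path and grant identifier are hypothetical, not Mediaflux's actual schema.

```python
# Illustrative metadata record attached at upload time; all values are made up.
from dataclasses import asdict, dataclass
from datetime import date

@dataclass
class ResearchAsset:
    path: str
    creator: str
    description: str
    grant_id: str
    retention_until: date

asset = ResearchAsset(
    path="/tigerdata/genomics/run_0421.bam",   # hypothetical asset path
    creator="j.smith",
    description="Sequencing run, cohort B",
    grant_id="NSF-1234567",                    # hypothetical grant reference
    retention_until=date(2125, 1, 1),          # preserved in line with the 100-year plan
)
print(asdict(asset))
```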
As part of the 100-year plan, Princeton needed a system that could migrate data across technology refreshes without loss. Because Mediaflux sits above the storage hardware, upgrading storage doesn't break the system already deployed.
Princeton can now understand exactly what data is being used, and what's sitting idle. This led to a decision to move 18 petabytes of data to tape because access records showed it was rarely needed.
In vendor news from the IT Press Tour, we heard that Robert Mollard has joined Arcitecta as global business development lead.
“His knowledge of data workflow requirements and experience architecting innovative, highly performant solutions for large-scale enterprises and applications will prove invaluable, as Arcitecta continues to serve a growing base of customers and partners worldwide,” said Jason Lohrey, CEO of Arcitecta. “Rob will play a significant role in expanding our reach and take the power of Mediaflux to the next level, continuing to set the standard for exceptional data management solutions.”
Mollard is a data management technologist and HPC (high performance computing) and AI solutions architect, with more than 20 years of experience helping customers meet data workflow requirements and solve complex challenges around accessing, managing and optimising data throughout its lifecycle.
Before joining Arcitecta, Mollard worked for Hewlett Packard Enterprise for nine years, serving as an HPC and AI solution architect and storage specialist covering the Asia Pacific region.
AuriStor
How do you provide users with access to the data they need, wherever they are located, and without incurring significant costs or compromising data security?
AuriStor has built a distributed file system that helps address this challenge. It offers a global namespace accessible via a simple path, and files stored anywhere appear in the same directory structure for all users. The system handles replication, caching, and synchronisation automatically.
Zero client configuration is required: you just install the client, and DNS handles service discovery.
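As a rough sketch of what DNS-driven discovery can look like, the snippet below resolves AFS-style SRV records (RFC 5864) for a cell using the third-party dnspython package; the cell name is a placeholder, and the real AuriStorFS client logic is more involved.

```python
# Illustrative only: locate a cell's volume location servers via DNS SRV records
# (AFS-style, per RFC 5864). Requires dnspython; the cell name is a placeholder.
import dns.resolver

def locate_cell_servers(cell: str) -> list[tuple[int, str, int]]:
    """Return (priority, hostname, port) for the cell's volume location servers."""
    answers = dns.resolver.resolve(f"_afs3-vlserver._udp.{cell}", "SRV")
    return sorted((r.priority, str(r.target).rstrip("."), r.port) for r in answers)

# locate_cell_servers("example.edu") might return [(10, "vl1.example.edu", 7003), ...]
# No per-client configuration: the client learns where the servers are from DNS alone.
```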
AuriStor says the US Geological Survey uses its AuriStorFS platform to deliver natural hazard information during emergencies. When hurricanes or earthquakes strike, for instance, access to current data is crucial.
The system replicates content across three AWS regions, taking snapshots every ten minutes and replicating them. Clients automatically fetch data from the nearest region. If that region fails, they “seamlessly” access data from the next closest location, said AuriStor.
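The failover behaviour described can be pictured with the simplified sketch below, which reads from the lowest-latency region that responds. The region names, latencies and in-memory "replicas" are stand-ins, not the USGS deployment.

```python
# Nearest-region read with failover, simulated in memory for illustration.
REPLICAS = {                                   # region -> copy of the snapshot data
    "us-east-1": {"hazards/latest.json": b"{...}"},
    "us-west-2": {"hazards/latest.json": b"{...}"},
    "eu-west-1": {"hazards/latest.json": b"{...}"},
}
LATENCY_MS = {"us-east-1": 12, "us-west-2": 65, "eu-west-1": 90}
DOWN = {"us-east-1"}                           # simulate a failed region

def fetch(path: str) -> bytes:
    for region in sorted(REPLICAS, key=lambda r: LATENCY_MS[r]):
        if region in DOWN:
            continue                           # region unreachable: try the next closest
        data = REPLICAS[region].get(path)
        if data is not None:
            return data
    raise RuntimeError("no replica reachable")

print(fetch("hazards/latest.json"))            # served from us-west-2 once us-east-1 fails
```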
AuriStor’s biggest customer, believed to be financial services giant Goldman Sachs, deployed AuriStorFS for software distribution at scale. It serves 175,000 clients across 80 cells, with 300 servers managing 1.5m volumes. The infrastructure spans 180 to 200 regional cells globally across multiple cloud providers, including Amazon Web Services, Google Cloud, and Oracle Cloud.
Many storage solutions charge based on capacity or data volume, so costs can climb steeply as you scale into terabytes and petabytes. AuriStor instead charges per server and per user/machine identity. The base price is $21,000 annually for a cell, with up to four servers and 1,000 identities included.
You can add more servers at $1,000 to $2,500 each, depending on quantity. User identities are charged as part of a tiered structure: 1,000 to 2,500 identities cost $1,375 in total, while 50,000 to 100,000 identities cost $30,000.
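Taking the quoted figures at face value, a worked example of the pricing model might look like the following. Whether the identity tier is charged on top of the 1,000 identities included in the base cell is an assumption here, and tiers between the quoted points are not listed.

```python
# Indicative annual cost using only the figures quoted in the article.
base_cell = 21_000          # base cell: up to four servers and 1,000 identities
extra_servers = 6 * 1_000   # six additional servers at the low end of $1,000-$2,500
identity_tier = 1_375       # 1,000-2,500 identities tier (assumed additive to the base)

annual = base_cell + extra_servers + identity_tier
print(f"Indicative annual cost: ${annual:,}")   # $28,375 for 10 servers, ~2,000 identities
```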
"We give you the first 100 exabytes for free," said AuriStor CEO Jeffery Altman. “Store 100TB or 500TB on the same server and pay the same amount. For organisations with large datasets, but controlled user populations, this model makes budgeting straightforward.”
The perpetual licence is also a potential advantage: if a user decides it’s not for them in the long term, they can stop renewing and keep using the version they have, with security updates continuing for two years.