

Last week Dennis Huo cut our first major pull request (PR) to the #apacheiceberg project! This PR is the first of many steps to add Iceberg Catalog support to Snowflake. This is a foundational step to ensure engines outside of Snowflake can interoperate with Iceberg Tables - which is absolutely our goal.

From our perspective, this is another aspect of Iceberg we really like: we can build an open catalog story without making customers' lives more difficult with additional packages and proprietary bits. More interoperability, more flexibility, more happy customers. We also want to counter the industry trend of trying to lock down the catalog and hoping nobody notices.
APACHE ICEBERG SPARK CODE
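A minimal sketch of the kind of Spark code that hits this path - the catalog wiring follows Iceberg's documented SparkSessionCatalog configuration, while the app name, master, and table name db.tbl are assumptions, not the original snippet:

```java
import org.apache.spark.sql.SparkSession;

public class IcebergHiveExample {
  public static void main(String[] args) {
    // Route Spark's built-in session catalog through Iceberg, backed by Hive.
    // With no hive.metastore.uris configured, the Hive client falls back to
    // an embedded metastore over a local Derby database.
    SparkSession spark = SparkSession.builder()
        .appName("iceberg-hive-example")   // assumption
        .master("local[*]")                // assumption
        .config("spark.sql.catalog.spark_catalog",
            "org.apache.iceberg.spark.SparkSessionCatalog")
        .config("spark.sql.catalog.spark_catalog.type", "hive")
        .getOrCreate();

    // Resolving the relation calls SparkSessionCatalog.loadTable(), which is
    // where the metastore connection is opened - and where it fails below.
    spark.sql("SELECT * FROM db.tbl").show();   // db.tbl is an assumption
  }
}
```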
Code along these lines fails with the following exception (package prefixes were mangled in extraction and are restored here to the standard Iceberg, Spark, Hive, and Caffeine class names):

```
Exception in thread "main" org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive Metastore
    at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:63)
    at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:30)
    at org.apache.iceberg.hive.ClientPool.get(ClientPool.java:117)
    at org.apache.iceberg.hive.ClientPool.run(ClientPool.java:52)
    at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:121)
    at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:86)
    at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:69)
    at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:102)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2344)
    at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2342)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2325)
    at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
    at com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62)
    at org.apache.iceberg.CachingCatalog.loadTable(CachingCatalog.java:94)
    at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:125)
    at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:78)
    at org.apache.iceberg.spark.SparkSessionCatalog.loadTable(SparkSessionCatalog.java:118)
    at org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:283)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.loaded$lzycompute$1(Analyzer.scala:1010)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.loaded$1(Analyzer.scala:1010)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1022)
Caused by: MetaException(message:Version information not found in metastore.)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:83)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:164)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:129)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.iceberg.common.DynConstructors$Ctor.newInstanceChecked(DynConstructors.java:60)
    at org.apache.iceberg.common.DynConstructors$Ctor.newInstance(DynConstructors.java:73)
    at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.
```

The root cause - MetaException: Version information not found in metastore - usually means the metastore's backing database schema was never initialized (or fails the schema-version check), not that the metastore is unreachable. Initializing the schema with Hive's schematool is the clean fix for a real metastore.
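For throwaway local testing against an embedded metastore, one workaround sketch is to let DataNucleus create the schema and relax the version check. The property names are standard Hive/DataNucleus settings passed through Spark's spark.hadoop.* prefix; treat this as an assumption-laden sketch, not a production fix:

```java
import org.apache.spark.sql.SparkSession;

public class MetastoreSchemaWorkaround {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("metastore-schema-workaround")
        .master("local[*]")
        .config("spark.sql.catalog.spark_catalog",
            "org.apache.iceberg.spark.SparkSessionCatalog")
        .config("spark.sql.catalog.spark_catalog.type", "hive")
        // Auto-create the metastore's backing schema on first use
        // (embedded/test setups only).
        .config("spark.hadoop.datanucleus.schema.autoCreateAll", "true")
        // Skip the check that raises "Version information not found".
        .config("spark.hadoop.hive.metastore.schema.verification", "false")
        .getOrCreate();

    spark.sql("SELECT * FROM db.tbl").show();
  }
}
```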

For background: Apache Iceberg is a new table format for storing large, slow-moving tabular data on cloud data lakes like S3 or Cloud Storage. Iceberg uses Apache Spark's DataSourceV2 API for its data source and catalog implementations. Spark DSv2 is an evolving API with different levels of support across Spark versions; the Iceberg documentation maintains a feature-support matrix for Spark 3.0 and Spark 2.4. To use Iceberg in Spark queries, first configure Spark catalogs. A catalog is created and named by adding a property spark.sql.catalog.(catalog-name) with an implementation class as its value.
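For instance, a catalog named local backed by a Hadoop warehouse can be declared purely through properties; the catalog name and warehouse path below are illustrative choices:

```java
import org.apache.spark.sql.SparkSession;

public class CatalogConfigExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("catalog-config-example")
        .master("local[*]")
        // spark.sql.catalog.(catalog-name) names the catalog; the value is
        // the implementation class.
        .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
        // A "hadoop" catalog keeps table metadata under the warehouse path,
        // so no Hive Metastore is needed.
        .config("spark.sql.catalog.local.type", "hadoop")
        .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg-warehouse")
        .getOrCreate();

    // Tables in this catalog are addressed as local.<namespace>.<table>.
    spark.sql("CREATE TABLE local.db.events (id BIGINT, data STRING) USING iceberg");
  }
}
```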

The iceberg-spark-runtime fat jars are distributed by the Apache Iceberg project and contain all the Apache Iceberg libraries required for operation, including the built-in Nessie Catalog. The nessie-spark-extensions jars are distributed by the Nessie project and contain SQL extensions that allow you to manage your tables with Nessie's git-like syntax.

The same building blocks are available from the Java API:

```java
// Delegate the built-in session catalog to Iceberg for Iceberg tables.
SparkSession spark = SparkSession.builder()
    .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
    .getOrCreate();

// Or work against a HadoopCatalog directly: create a table, then point
// newly written data files at a custom location.
HadoopCatalog catalog = new HadoopCatalog(new Configuration(), location);
Table table = catalog.createTable(tableId, schema, spec);
table.updateProperties()
    .set(TableProperties.WRITE_NEW_DATA_LOCATION, location)
    .commit();
```
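Assembled end to end with assumed names (warehouse path /tmp/iceberg-warehouse, table db.events, and a two-column schema), a self-contained version of the HadoopCatalog flow might look like this:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.TableProperties;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hadoop.HadoopCatalog;
import org.apache.iceberg.types.Types;

public class HadoopCatalogExample {
  public static void main(String[] args) {
    // Warehouse root; table metadata and data live under this path.
    String location = "/tmp/iceberg-warehouse";
    HadoopCatalog catalog = new HadoopCatalog(new Configuration(), location);

    // Iceberg schemas carry explicit field IDs.
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "data", Types.StringType.get()));

    // Identity-partition on the "data" column.
    PartitionSpec spec = PartitionSpec.builderFor(schema).identity("data").build();

    TableIdentifier tableId = TableIdentifier.of("db", "events");
    Table table = catalog.createTable(tableId, schema, spec);

    // Direct newly written data files to a custom subdirectory.
    table.updateProperties()
        .set(TableProperties.WRITE_NEW_DATA_LOCATION, location + "/custom-data")
        .commit();
  }
}
```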
