org.datanucleus.store.rdbms.exceptions.MappedDatastoreException: INSERT INTO “TABLE_PARAMS” – Hive with Kite Morphlines


If you have read my previous post (Anatomy of a configuration file), you already know the flow of the application I was working on. The last command of my morphline creates the Hive table using the Avro schema produced by the previous command. As per the design of my application, I had decided to store the Avro schema file on the local file system instead of saving it to HDFS. Everything was working fine as expected, but things started breaking as soon as my Avro schema file contained more than 4000 characters. Below is the complete exception I got while creating the Hive table:

org.datanucleus.store.rdbms.exceptions.MappedDatastoreException: INSERT INTO "TABLE_PARAMS"

Complete Trace:

Exception in thread "main" org.kitesdk.data.DatasetRepositoryException: Hive MetaStore exception
	at org.kitesdk.data.hcatalog.impl.HCatalog.createTable(HCatalog.java:153)
	at org.kitesdk.data.hcatalog.HCatalogManagedMetadataProvider.create(HCatalogManagedMetadataProvider.java:71)
	at org.kitesdk.data.hcatalog.HCatalogDatasetRepository.create(HCatalogDatasetRepository.java:64)
	at com.techidiocy.morphline.commands.LoadHiveTableParquetBuilder$LoadHiveTable.doProcess(LoadHiveTableParquetBuilder.java:85)
	at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
	at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
	at com.techidiocy.morphline.commands.GenerateAvroSchemaFileBuilder$BuildAvroSchemaFile.doProcess(GenerateAvroSchemaFileBuilder.java:74)
	at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
	at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
	at com.techidiocy.morphline.commands.DroolsReocrdValidatorBuilder$DroolsReocrdValidator.doProcess(DroolsReocrdValidatorBuilder.java:86)
	at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
	at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
	at com.techidiocy.morphline.commands.TransformXMLToFlatFileBuilder$XMLTransformer.doProcess(TransformXMLToFlatFileBuilder.java:59)
	at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
	at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
	at org.kitesdk.morphline.stdio.ReadLineBuilder$ReadLine.doProcess(ReadLineBuilder.java:100)
	at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:93)
	at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
	at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
	at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
	at com.techidiocy.kite.morphlines.test.ArgoIntegrationTest.main(ArgoIntegrationTest.java:33)
Caused by: MetaException(message:javax.jdo.JDODataStoreException: Put request failed : INSERT INTO "TABLE_PARAMS" ("PARAM_VALUE","TBL_ID","PARAM_KEY") VALUES (?,?,?) 
	at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
	at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
	at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
	at org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:709)
	at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:122)
	at com.sun.proxy.$Proxy0.createTable(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1078)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1111)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table(HiveMetaStore.java:1100)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
	at com.sun.proxy.$Proxy5.create_table(Unknown Source)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table.getResult(ThriftHiveMetastore.java:6453)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table.getResult(ThriftHiveMetastore.java:6437)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:104)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
NestedThrowablesStackTrace:
org.datanucleus.store.rdbms.exceptions.MappedDatastoreException: INSERT INTO "TABLE_PARAMS" ("PARAM_VALUE","TBL_ID","PARAM_KEY") VALUES (?,?,?) 
	at org.datanucleus.store.rdbms.scostore.JoinMapStore.internalPut(JoinMapStore.java:1046)
	at org.datanucleus.store.rdbms.scostore.JoinMapStore.putAll(JoinMapStore.java:224)
	at org.datanucleus.store.rdbms.mapping.java.MapMapping.postInsert(MapMapping.java:137)
	at org.datanucleus.store.rdbms.request.InsertRequest.execute(InsertRequest.java:518)
	at org.datanucleus.store.rdbms.RDBMSPersistenceHandler.insertTable(RDBMSPersistenceHandler.java:167)
	at org.datanucleus.store.rdbms.RDBMSPersistenceHandler.insertObject(RDBMSPersistenceHandler.java:143)
	at org.datanucleus.state.JDOStateManager.internalMakePersistent(JDOStateManager.java:3777)
	at org.datanucleus.state.JDOStateManager.makePersistent(JDOStateManager.java:3753)
	at org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2124)
	at org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1972)
	at org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1820)
	at org.datanucleus.ExecutionContextThreadedImpl.persistObject(ExecutionContextThreadedImpl.java:217)
	at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:727)
	at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
	at org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:709)
	at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:122)
	at com.sun.proxy.$Proxy0.createTable(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1078)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1111)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table(HiveMetaStore.java:1100)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
	at com.sun.proxy.$Proxy5.create_table(Unknown Source)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table.getResult(ThriftHiveMetastore.java:6453)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table.getResult(ThriftHiveMetastore.java:6437)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:104)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: org.postgresql.util.PSQLException: ERROR: value too long for type character varying(4000)
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:334)
	at com.jolbox.bonecp.PreparedStatementHandle.executeUpdate(PreparedStatementHandle.java:203)
	at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeUpdate(ParamLoggingPreparedStatement.java:399)
	at org.datanucleus.store.rdbms.SQLController.executeStatementUpdate(SQLController.java:439)
	at org.datanucleus.store.rdbms.scostore.JoinMapStore.internalPut(JoinMapStore.java:1037)
	... 36 more
)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_result$create_table_resultStandardScheme.read(ThriftHiveMetastore.java:20149)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_result$create_table_resultStandardScheme.read(ThriftHiveMetastore.java:20117)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_result.read(ThriftHiveMetastore.java:20043)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table(ThriftHiveMetastore.java:732)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table(ThriftHiveMetastore.java:719)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:432)
	at org.kitesdk.data.hcatalog.impl.HCatalog$3.call(HCatalog.java:139)
	at org.kitesdk.data.hcatalog.impl.HCatalog$3.call(HCatalog.java:136)
	at org.kitesdk.data.hcatalog.impl.HCatalog.doWithRetry(HCatalog.java:67)
	at org.kitesdk.data.hcatalog.impl.HCatalog.createTable(HCatalog.java:145)
	... 20 more

Morphline:

morphlines : [{
    id : morphline1
    importCommands : ["org.kitesdk.**","com.techidiocy.morphline.commands.**"]
    commands : [
    {
       readLine
       {
            commentPrefix : "#" 
            charset : "ASCII" 
       } 
     } 
     {
        transformXMLToFlat
        {
            flatFileOutputLocation : "/home/cloudera/files/dataflatfile/"
        }
     }
     {  
       dataValidator
       {
         inputSchemaLocation :"/home/cloudera/files/metadataflatfile/"
         rulesSheetLocation : "/home/cloudera/files/rules-sheet/"
         outputFileLocation : "/home/cloudera/files/accepted-rejected-records"
       }
     }
     {
       generateAvroSchemaFile
        {
           avroSchemaFileLocation : "/home/cloudera/files/avroschemafile"
           avroSchemaHDFSFileLocation : "/user/cloudera/avroschema/"
        }
     }
     {
      hiveDataLoader
      {
        format : "Formats.PARQUET"
        nameNodeLocation : "hdfs://localhost.localdomain:8020"      
      }
     }   
    ]
}]
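
The hiveDataLoader at the end of this pipeline is a custom command, so the sketch below is only my reconstruction of what such a command roughly does, based on the Kite 0.x hcatalog classes visible in the stack trace above (class, table, and path names are hypothetical). The detail to notice is that the Avro schema generated by the previous command is handed to Kite as a local file:

import java.io.File;
import java.io.IOException;

import org.kitesdk.data.DatasetDescriptor;
import org.kitesdk.data.DatasetRepositories;
import org.kitesdk.data.DatasetRepository;
import org.kitesdk.data.Formats;

public class CreateHiveDatasetSketch {

  public static void main(String[] args) throws IOException {
    // Local Avro schema file produced by the generateAvroSchemaFile command
    // (path is illustrative, matching the morphline configuration above).
    File avroSchemaFile = new File("/home/cloudera/files/avroschemafile/my-schema.avsc");

    // Open the Hive-managed Kite repository (Kite 0.x style repository URI).
    DatasetRepository repo = DatasetRepositories.open("repo:hive");

    // Because the schema is passed as a local file, the full schema text ends
    // up embedded in the Hive table properties -- which is exactly what gets
    // written into the metastore's TABLE_PARAMS.PARAM_VALUE column.
    DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
        .schema(avroSchemaFile)
        .format(Formats.PARQUET)
        .build();

    repo.create("my_hive_table", descriptor);
  }
}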

When I first looked into the exception, I did not suspect that it was happening because the Avro schema file contained more than 4000 characters. But the exception made it clear that the Hive metastore has a table named "TABLE_PARAMS" into which my program was trying to insert a value longer than 4000 characters. So, to check the contents of that table, I connected to the PostgreSQL database backing the Hive metastore from the CLI:

psql -h localhost.localdomain -p 7432 -U hive

Then I queried that particular table, "TABLE_PARAMS":

select * from "TABLE_PARAMS";
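
If the metastore already holds many tables, the full dump is noisy. A narrower query such as the following (illustrative; in my understanding the schema text usually sits under a literal-style property key such as avro.schema.literal, though the exact key depends on the Kite/Hive version) lists only the oversized values:

select "TBL_ID", "PARAM_KEY", length("PARAM_VALUE") as value_length
from "TABLE_PARAMS"
where length("PARAM_VALUE") > 3000;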

The result surprised me: the entire contents of the Avro schema file are being inserted into the "PARAM_VALUE" column, and because of the 4000-character limit the insert fails. The first solution that came to my mind was to increase the column limit from 4000 characters to some much larger number. But two big problems come as a bonus with that solution:

  1. What is the maximum limit I can define for that column, and how would I ensure it covers all possible cases (e.g. a schema file containing 1 million characters)?
  2. Is it a recommended way of solving this problem? I would be altering a Hive metastore table, which could introduce inconsistencies.

The answer to both of these questions was very simple: this is not an acceptable solution. Before proceeding to the next solution, though, I wanted to make sure I was heading in the right direction, so I altered the table for testing purposes:

ALTER TABLE "TABLE_PARAMS" ALTER COLUMN "PARAM_VALUE" type varchar(10000);

As expected, this time it worked fine: my application created all the tables successfully and loaded the data into them. At this point I was sure that the column size was the root cause and that I had to find a proper solution for it. Before moving further, I reverted the table back to its original state:

ALTER TABLE "TABLE_PARAMS" ALTER COLUMN "PARAM_VALUE" type varchar(4000);

After doing a lot of research I still hadn't found a solution, so I finally posted the question on the Cloudera CDK DEV community, and there I got my answer.

Thanks to Joey Echeverria from Cloudera, who answered the question.

The solution to this problem was very simple. Since the 4000-character limit is a limitation on the Hive metastore side, the fix is to avoid hitting it: instead of storing the Avro schema file on the local file system, store it on HDFS.
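
Below is a minimal sketch of that change, assuming the plain Hadoop FileSystem API to copy the generated schema file onto HDFS and the Kite DatasetDescriptor.Builder schemaUri(...) method to reference it. Paths and file names are illustrative, taken from the morphline configuration above; verify the builder method against your Kite version.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.kitesdk.data.DatasetDescriptor;
import org.kitesdk.data.Formats;

public class SchemaOnHdfsSketch {

  public static void main(String[] args) throws IOException {
    String nameNode = "hdfs://localhost.localdomain:8020";

    // Copy the locally generated Avro schema file onto HDFS.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(nameNode), conf);
    Path localSchema = new Path("/home/cloudera/files/avroschemafile/my-schema.avsc");
    Path hdfsSchema = new Path("/user/cloudera/avroschema/my-schema.avsc");
    fs.copyFromLocalFile(localSchema, hdfsSchema);

    // Point the descriptor at the HDFS copy instead of embedding the schema.
    // With a schema URI, the metastore only has to store the reference, so
    // the 4000-character PARAM_VALUE limit is no longer a problem.
    DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
        .schemaUri(URI.create(nameNode + "/user/cloudera/avroschema/my-schema.avsc"))
        .format(Formats.PARQUET)
        .build();

    // ... create the dataset with this descriptor as before.
  }
}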

As soon as I changed my code to copy the Avro schema file from the local file system to HDFS, everything started working fine. That solved my problem, but the real suspense remained: what is being stored in the "PARAM_VALUE" column now? To answer that, I executed the query again, and the result was self-explanatory.

select * from "TABLE_PARAMS";

As you can see, instead of copying all the schema data, the metastore now holds only a reference to the schema file's location.
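
In other words, where a literal-style property previously carried the whole schema JSON, a url-style property (e.g. avro.schema.url, the exact key depending on the Kite version) now carries only something like hdfs://localhost.localdomain:8020/user/cloudera/avroschema/my-schema.avsc (the file name here is illustrative), which fits comfortably within the 4000-character limit.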


That's all for now. :)

Suggestions, corrections, and questions are most welcome.

Disclaimer: All the logos and images used above belong to their respective owners.
