I ran into an issue with anomaly detection when using HANA_ML. I have a table in SAP HANA with numeric columns of type DOUBLE. I created it in two variants:
HC_TABLE1 with just the data
HC_TABLE2 with the same data plus an ID column
Training an IsolationForest on the data works fine, and I can save the model:
from hana_ml.algorithms.pal.preprocessing import IsolationForest

# conn is an existing hana_ml ConnectionContext
iso_forest = IsolationForest(
    max_samples=2048,
    n_estimators=100,   # Number of trees
    max_features=458,   # Use all features from the table
    random_state=1,     # For reproducibility
    thread_ratio=0.9    # Parallel processing
)
hana_df1 = conn.table('HC_TABLE1')
result = iso_forest.fit(hana_df1)
However, predicting the outliers does NOT work:
hana_df2 = conn.table('HC_TABLE2')
predictions = iso_forest.predict(hana_df2, key='ID')
ERROR:hana_ml.algorithms.pal.preprocessing:HANA version: 2.00.078.00.1715149848 (fa/hana2sp07). (423, 'AFL error: AFL DESCRIBE for nested call failed - invalid table(s) for ANY-procedure call (Input table 0: column 0 has invalid SQL type.): line 33 col 1 (at pos 8351)')
How can I identify what is going wrong here? I can train a model but never use it? The exact same data that was used for fitting is not accepted for predictions. I am close to ditching HANA_ML completely and using regular Python ML libraries instead.
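One check that might narrow this down: the message complains about column 0 of input table 0, which in the predict call would presumably be the key column ID. Below is a minimal sketch for inspecting the SQL type of that column. It assumes that `DataFrame.dtypes()` returns tuples starting with (column name, type name, ...). `check_key_type` is a hypothetical helper, and `ASSUMED_KEY_TYPES` is only a guess at what PAL accepts for a key column, so it should be verified against the PAL documentation.

```python
# Assumption: type names PAL accepts for a key column; verify in the PAL docs.
ASSUMED_KEY_TYPES = {"INT", "INTEGER", "VARCHAR", "NVARCHAR"}


def check_key_type(dtypes, key):
    """Return (type_name, looks_ok) for the key column.

    dtypes is a list of tuples whose first two items are
    (column name, type name), as assumed for DataFrame.dtypes().
    """
    for col in dtypes:
        if col[0] == key:
            type_name = col[1]
            return type_name, type_name.upper() in ASSUMED_KEY_TYPES
    raise KeyError(f"column {key!r} not found")


# With a live connection (not run here), the check could look like:
#   print(check_key_type(hana_df2.dtypes(), "ID"))
# If ID turns out to be DOUBLE, casting it before predict may be worth a try:
#   hana_df2 = hana_df2.cast("ID", "INTEGER")
#   predictions = iso_forest.predict(hana_df2, key="ID")
```

If the ID column was created as DOUBLE like the feature columns, that would be consistent with the "invalid SQL type" message, although this is a hypothesis and not a confirmed cause.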