<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Hi Sandeep,</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Ok, I understand how that goes.</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Thanks,</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
Adam<br>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Sikka, Sandeep <sandeep.sikka@staples.com><br>
<b>Sent:</b> 09 August 2023 22:49<br>
<b>To:</b> Adam Pocock <adam.pocock@oracle.com><br>
<b>Cc:</b> Parthasarathy, Bharadwaj <bharadwaj.parthasarathy@staples.com>; tribuo-devel@oss.oracle.com <tribuo-devel@oss.oracle.com>; Kumar, Navdeep <navdeep.kumar@staples.com>; Jack Sullivan <jack.t.sullivan@oracle.com><br>
<b>Subject:</b> Re: [EXT]:Re: [External] : Re: Tribuo Non Numerical Features Support</font>
<div> </div>
</div>
<style>
<!--
@font-face
{font-family:Helvetica}
@font-face
{font-family:"Cambria Math"}
@font-face
{font-family:Calibri}
p.x_MsoNormal, li.x_MsoNormal, div.x_MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif}
a:link, span.x_MsoHyperlink
{color:blue;
text-decoration:underline}
span.x_apple-converted-space
{}
span.x_EmailStyle20
{font-family:"Calibri",sans-serif;
color:windowtext}
.x_MsoChpDefault
{font-size:10.0pt}
@page WordSection1
{margin:1.0in 1.0in 1.0in 1.0in}
div.x_WordSection1
{}
-->
</style>
<div lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word; line-break:after-white-space">
<div class="x_WordSection1">
<p class="x_MsoNormal">Hi Adam</p>
<p class="x_MsoNormal">I realized I never replied to you. The business committee decided to not pursue the project that necessitated the conversation below. We have deferred the work. If anything changes or a new need comes up, I will reach out. Thanks for
your help.</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">Sandeep.</p>
<p class="x_MsoNormal"> </p>
<div>
<p class="x_MsoNormal" style=""><b><span style="font-size:14.0pt; font-family:"Arial",sans-serif; color:#CB000A">Sandeep Sikka</span></b><span style="font-size:13.5pt; color:black"></span></p>
<p class="x_MsoNormal"><b><span style="font-size:10.5pt; font-family:"Arial",sans-serif; color:#646464">Distinguished Engineer</span></b><span style="font-size:13.5pt; color:black"></span></p>
<p class="x_MsoNormal"><span style="font-size:10.5pt; font-family:"Arial",sans-serif; color:#646464">500 Staples Dr., Framingham MA</span><span style="font-size:13.5pt; color:black"></span></p>
<p class="x_MsoNormal"><span style="font-size:10.5pt; font-family:"Arial",sans-serif; color:#646464">508.253.7977</span><span style="font-size:13.5pt; color:black"></span></p>
<p class="x_MsoNormal"><span style="font-size:10.5pt; font-family:"Arial",sans-serif; color:black"><a href="mailto:sandeep.sikka@staples.com"><span style="color:#0563C1">sandeep.sikka@staples.com</span></a></span><span style="font-size:13.5pt; color:black"></span></p>
</div>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal"> </p>
<div style="border:none; border-top:solid #B5C4DF 1.0pt; padding:3.0pt 0in 0in 0in">
<p class="x_MsoNormal"><b><span style="font-size:12.0pt; color:black">From: </span>
</b><span style="font-size:12.0pt; color:black">"Sikka, Sandeep" <sandeep.sikka@staples.com><br>
<b>Date: </b>Tuesday, June 20, 2023 at 3:00 PM<br>
<b>To: </b>Adam Pocock <adam.pocock@oracle.com><br>
<b>Cc: </b>"Parthasarathy, Bharadwaj" <bharadwaj.parthasarathy@staples.com>, "tribuo-devel@oss.oracle.com" <tribuo-devel@oss.oracle.com>, "Kumar, Navdeep" <navdeep.kumar@staples.com>, Jack Sullivan <jack.t.sullivan@oracle.com><br>
<b>Subject: </b>Re: [EXT]:Re: [External] : Re: Tribuo Non Numerical Features Support</span></p>
</div>
<div>
<p class="x_MsoNormal"> </p>
</div>
<p class="x_MsoNormal">Thank you, Adam, for the detailed feedback. Let me have some discussions internally and I will get back to you.</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">Sandeep. </p>
<div style="border:none; border-top:solid #B5C4DF 1.0pt; padding:3.0pt 0in 0in 0in">
<p class="x_MsoNormal"><b><span style="font-size:12.0pt; color:black">From: </span>
</b><span style="font-size:12.0pt; color:black">Adam Pocock <adam.pocock@oracle.com><br>
<b>Date: </b>Friday, June 16, 2023 at 1:48 PM<br>
<b>To: </b>"Sikka, Sandeep" <sandeep.sikka@staples.com><br>
<b>Cc: </b>"Parthasarathy, Bharadwaj" <bharadwaj.parthasarathy@staples.com>, "tribuo-devel@oss.oracle.com" <tribuo-devel@oss.oracle.com>, "Kumar, Navdeep" <navdeep.kumar@staples.com>, Jack Sullivan <jack.t.sullivan@oracle.com><br>
<b>Subject: </b>Re: [EXT]:Re: [External] : Re: Tribuo Non Numerical Features Support</span></p>
</div>
<div>
<p class="x_MsoNormal"> </p>
</div>
<p class="x_MsoNormal">Hi Sandeep, </p>
<div>
<p class="x_MsoNormal"> </p>
</div>
<div>
<p class="x_MsoNormal">Firstly, we always welcome contributions. In this case the difficulty depends on whether you only want inference support or you’d also like to train models.</p>
</div>
<div>
<p class="x_MsoNormal"> </p>
</div>
<div>
<p class="x_MsoNormal">To add pure inference support (similar to our ONNX, OCI and XGBoost external model support), the Tribuo bits would be relatively simple as you don’t need to interact with the provenance system much. You’d subclass org.tribuo.interop.ExternalModel
and implement the necessary methods to get things out of the Tribuo representation and into the appropriate LightGBM one. It would be pretty similar to the existing org.tribuo.common.xgboost.XGBoostExternalModel. However, it looks like the LightGBM Java interface
is autogenerated and very low level (<a href="https://central.sonatype.com/artifact/com.microsoft.ml.lightgbm/lightgbmlib/3.3.510">https://central.sonatype.com/artifact/com.microsoft.ml.lightgbm/lightgbmlib/3.3.510</a>),
so it would need a fair amount of work to turn that C-like API into something that’s easy to use from Java. The SWIG wrapper is basically passing around raw pointers, so it needs actual objects and methods built on top of it. Microsoft built a lot of that
for their SynapseML API to LightGBM (<a href="https://github.com/microsoft/SynapseML/tree/master/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm">https://github.com/microsoft/SynapseML/tree/master/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm</a>),
but that is for Apache Spark integration and I don’t think it would be reusable outside of Spark and Scala (and we don’t want Scala in Tribuo’s builds, as it adds too much complexity). Alternatively, you could look at Amazon’s DJL, which has a LightGBM interface (<a href="https://github.com/deepjavalibrary/djl/tree/master/engines/ml/lightgbm">https://github.com/deepjavalibrary/djl/tree/master/engines/ml/lightgbm</a>),
but as that’s focused on deep learning I don’t know if it would give you the necessary control over the incoming feature representation. DJL’s wrapper is less than 1000 lines though, so it probably isn’t too hard to make an equivalent for Tribuo.</p>
</div>
<div>
<p class="x_MsoNormal"> </p>
</div>
<div>
<p class="x_MsoNormal">Basically, on the Tribuo side you’d need to write a LightGBMExternalModel (e.g.
<a href="https://github.com/oracle/tribuo/blob/main/Common/XGBoost/src/main/java/org/tribuo/common/xgboost/XGBoostExternalModel.java">
https://github.com/oracle/tribuo/blob/main/Common/XGBoost/src/main/java/org/tribuo/common/xgboost/XGBoostExternalModel.java</a>), a short protobuf definition for serialization (e.g.
<a href="https://github.com/oracle/tribuo/blob/main/Common/XGBoost/src/main/resources/protos/tribuo-xgboost.proto#L54">
https://github.com/oracle/tribuo/blob/main/Common/XGBoost/src/main/resources/protos/tribuo-xgboost.proto#L54</a>), and then either some kind of output converter mechanism if you wanted it to work for both classification and regression (<a href="https://github.com/oracle/tribuo/blob/main/Common/XGBoost/src/main/java/org/tribuo/common/xgboost/XGBoostOutputConverter.java">https://github.com/oracle/tribuo/blob/main/Common/XGBoost/src/main/java/org/tribuo/common/xgboost/XGBoostOutputConverter.java</a>),
or just make it only do regression. Then you’d need to map from Tribuo’s internal representation of a sparse vector/matrix of feature values to the representation LightGBM wants (which shouldn’t be too bad, as it supports CSR and Tribuo’s representation
is basically that with more objects, e.g. <a href="https://github.com/oracle/tribuo/blob/main/Common/XGBoost/src/main/java/org/tribuo/common/xgboost/XGBoostTrainer.java#L711">
https://github.com/oracle/tribuo/blob/main/Common/XGBoost/src/main/java/org/tribuo/common/xgboost/XGBoostTrainer.java#L711</a>), write a mapping from LightGBM’s output to Tribuo’s output (which is easy if it’s an array, and otherwise will require some unpicking),
and then write the glue to stick that to the SWIG wrapper Microsoft provide. That glue is the worst part as the SWIG wrapper isn’t very well documented, but it would be pretty similar to the glue DJL have here: <a href="https://github.com/deepjavalibrary/djl/blob/master/engines/ml/lightgbm/src/main/java/ai/djl/ml/lightgbm/jni/JniUtils.java">https://github.com/deepjavalibrary/djl/blob/master/engines/ml/lightgbm/src/main/java/ai/djl/ml/lightgbm/jni/JniUtils.java</a>.</p>
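<p class="x_MsoNormal">To give a rough idea of the CSR mapping step, here is a minimal sketch in plain Java. It assumes each example arrives as a pair of arrays (feature indices and feature values); the class name CsrSketch and its fields are hypothetical and are not part of Tribuo or LightGBM. Real code would read from Tribuo’s own sparse vector type and pass the resulting arrays to the SWIG wrapper.</p>
<pre>
```java
import java.util.Arrays;

/** Hypothetical sketch (not Tribuo or LightGBM code): packs sparse rows into CSR arrays. */
final class CsrSketch {
    /** Row r occupies positions [rowPointers[r], rowPointers[r + 1]) of the other two arrays. */
    final int[] rowPointers;
    /** Feature (column) index of each stored value. */
    final int[] columnIndices;
    /** The stored non-zero feature values. */
    final double[] values;

    /**
     * @param rowIndices rowIndices[r] holds the feature indices present in example r.
     * @param rowValues  rowValues[r] holds the matching feature values for example r.
     */
    CsrSketch(int[][] rowIndices, double[][] rowValues) {
        if (rowIndices.length != rowValues.length) {
            throw new IllegalArgumentException("Index and value arrays must have the same number of rows");
        }
        // First pass: count the stored values so the flat arrays can be sized once.
        int total = 0;
        for (int[] row : rowIndices) {
            total += row.length;
        }
        rowPointers = new int[rowIndices.length + 1];
        columnIndices = new int[total];
        values = new double[total];
        // Second pass: copy each row into the flat arrays and record where it ends.
        int offset = 0;
        for (int r = 0; r < rowIndices.length; r++) {
            int rowLength = rowIndices[r].length;
            if (rowLength != rowValues[r].length) {
                throw new IllegalArgumentException("Row " + r + " has mismatched indices and values");
            }
            System.arraycopy(rowIndices[r], 0, columnIndices, offset, rowLength);
            System.arraycopy(rowValues[r], 0, values, offset, rowLength);
            offset += rowLength;
            rowPointers[r + 1] = offset;
        }
    }

    public static void main(String[] args) {
        // Three examples; the second one has no active features.
        int[][] indices = {{0, 3}, {}, {1}};
        double[][] vals = {{1.0, 2.5}, {}, {4.0}};
        CsrSketch csr = new CsrSketch(indices, vals);
        System.out.println(Arrays.toString(csr.rowPointers));   // [0, 2, 2, 3]
        System.out.println(Arrays.toString(csr.columnIndices)); // [0, 3, 1]
        System.out.println(Arrays.toString(csr.values));        // [1.0, 2.5, 4.0]
    }
}
```
</pre>
<p class="x_MsoNormal">An empty example simply repeats the previous row pointer, which is why the second and third entries of rowPointers are equal in the example above.</p>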
</div>
<div>
<p class="x_MsoNormal"> </p>
</div>
<div>
<p class="x_MsoNormal">Training support would add more complexity on the Tribuo side, though it would be pretty similar to the rest of our XGBoost wrapper so you could base it off that. The main difficulty would still be in building up the LightGBM wrapper from
the JNI layer, as now you’ve got to pass parameters and configuration across as well, so it’ll be quite a bit more effort.</p>
</div>
<div>
<p class="x_MsoNormal"> </p>
</div>
<div>
<p class="x_MsoNormal">Thanks,</p>
</div>
<div>
<p class="x_MsoNormal"> </p>
</div>
<div>
<p class="x_MsoNormal">Adam</p>
<div>
<div>
<p class="x_MsoNormal">--<br>
Adam Pocock<br>
Principal Member of Technical Staff<br>
Machine Learning Research Group<br>
Oracle Labs, Burlington, MA</p>
</div>
</div>
<div>
<p class="x_MsoNormal" style="margin-bottom:12.0pt"> </p>
<blockquote style="margin-top:5.0pt; margin-bottom:5.0pt">
<div>
<p class="x_MsoNormal">On 16 Jun 2023, at 12:23, Sikka, Sandeep <sandeep.sikka@staples.com> wrote:</p>
</div>
<p class="x_MsoNormal"> </p>
<div>
<div>
<p class="x_MsoNormal">Thanks, Adam, for the feedback. In your opinion, how difficult would it be for one or more experienced Java engineers to implement support for wrapping the LightGBM Java API in Tribuo? Assume no prior experience working in the Tribuo codebase.
I am wondering how much time and effort it would take if we took this on for our needs and contributed the developments back to Tribuo.</p>
</div>
<div>
<p class="x_MsoNormal"> </p>
</div>
<div>
<p class="x_MsoNormal">Sandeep.<span class="x_apple-converted-space"> </span></p>
</div>
<div style="border:none; border-top:solid windowtext 1.0pt; padding:3.0pt 0in 0in 0in; border-color:currentcolor currentcolor">
<div>
<p class="x_MsoNormal"><b><span style="font-size:12.0pt">From:<span class="x_apple-converted-space"> </span></span></b><span style="font-size:12.0pt">Adam Pocock <adam.pocock@oracle.com><br>
<b>Date:<span class="x_apple-converted-space"> </span></b>Friday, June 16, 2023 at 11:25 AM<br>
<b>To:<span class="x_apple-converted-space"> </span></b>"Sikka, Sandeep" <sandeep.sikka@staples.com><br>
<b>Cc:<span class="x_apple-converted-space"> </span></b>"Parthasarathy, Bharadwaj" <bharadwaj.parthasarathy@staples.com>, "tribuo-devel@oss.oracle.com" <tribuo-devel@oss.oracle.com>, "Kumar, Navdeep" <navdeep.kumar@staples.com>, Jack Sullivan <jack.t.sullivan@oracle.com><br>
<b>Subject:<span class="x_apple-converted-space"> </span></b>Re: [EXT]:Re: [External] : Re: Tribuo Non Numerical Features Support</span></p>
</div>
</div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
<div>
<p class="x_MsoNormal">Hi Sandeep,<span class="x_apple-converted-space"> </span></p>
</div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
<div>
<div>
<p class="x_MsoNormal">Tribuo doesn’t support inference for LightGBM models, aside from those which have been exported in ONNX format. You’d have to look at how the ONNX format model expects the categorical inputs to be passed in. I’m not sure whether ONNX’s TreeEnsembleRegressor
(<a href="https://github.com/onnx/onnx/blob/main/docs/Operators-ml.md#aionnxmltreeensembleregressor">https://github.com/onnx/onnx/blob/main/docs/Operators-ml.md#aionnxmltreeensembleregressor</a>)
supports special treatment of categorical inputs, so it might not be possible. It looks like there is support for LightGBM in things like JPMML, but JPMML is AGPL licensed and so we can’t use it in Tribuo. We haven’t looked at directly wrapping the LightGBM
Java API, as we already had XGBoost support for tree ensembles.</p>
</div>
</div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
<div>
<div>
<p class="x_MsoNormal">Thanks,</p>
</div>
</div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
<div>
<div>
<p class="x_MsoNormal">Adam</p>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal">--<br>
Adam Pocock<br>
Principal Member of Technical Staff<br>
Machine Learning Research Group<br>
Oracle Labs, Burlington, MA</p>
</div>
</div>
</div>
<div>
<div>
<p class="x_MsoNormal" style="margin-bottom:12.0pt"><br>
<br>
</p>
</div>
<blockquote style="margin-top:5.0pt; margin-bottom:5.0pt">
<div>
<div>
<p class="x_MsoNormal">On 16 Jun 2023, at 11:08, Sikka, Sandeep <sandeep.sikka@staples.com> wrote:</p>
</div>
</div>
<div>
<p class="x_MsoNormal"> </p>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal">Hi Adam</p>
</div>
</div>
<div>
<div>
<p class="x_MsoNormal">Thank you for the quick response. We are looking to score (perform inference) on an<span class="x_apple-converted-space"> </span><a href="https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html">LGBMRegressor</a><span class="x_apple-converted-space"> </span>trained
externally, with the categorical features specified using the categorical_features parameter:<span class="x_apple-converted-space"> </span><a href="https://lightgbm.readthedocs.io/en/latest/Advanced-Topics.html#categorical-feature-support">https://lightgbm.readthedocs.io/en/latest/Advanced-Topics.html#categorical-feature-support</a>.
This uses a technique different from one-hot encoding (OHE):<span class="x_apple-converted-space"> </span><a href="https://lightgbm.readthedocs.io/en/latest/Features.html#optimal-split-for-categorical-features">https://lightgbm.readthedocs.io/en/latest/Features.html#optimal-split-for-categorical-features</a>.
Do you have any thoughts on how we can achieve inference in Tribuo (we train the model in Python scikit-learn pipelines)?</p>
</div>
</div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
<div>
<div>
<p class="x_MsoNormal">Thanks</p>
</div>
</div>
<div>
<div>
<p class="x_MsoNormal">Sandeep.<span class="x_apple-converted-space"> </span></p>
</div>
</div>
<div style="border:none; border-top:solid windowtext 1.0pt; padding:3.0pt 0in 0in 0in; border-color:currentcolor">
<div>
<div>
<p class="x_MsoNormal"><b><span style="font-size:12.0pt">From:<span class="x_apple-converted-space"> </span></span></b><span style="font-size:12.0pt">Adam Pocock <<a href="mailto:adam.pocock@oracle.com">adam.pocock@oracle.com</a>><br>
<b>Date:<span class="x_apple-converted-space"> </span></b>Friday, June 16, 2023 at 9:34 AM<br>
<b>To:<span class="x_apple-converted-space"> </span></b>"Sikka, Sandeep" <<a href="mailto:sandeep.sikka@staples.com">sandeep.sikka@staples.com</a>>, "Parthasarathy, Bharadwaj" <<a href="mailto:bharadwaj.parthasarathy@staples.com">bharadwaj.parthasarathy@staples.com</a>><br>
<b>Cc:<span class="x_apple-converted-space"> </span></b>"<a href="mailto:tribuo-devel@oss.oracle.com">tribuo-devel@oss.oracle.com</a>" <<a href="mailto:tribuo-devel@oss.oracle.com">tribuo-devel@oss.oracle.com</a>>, "Kumar, Navdeep" <<a href="mailto:navdeep.kumar@staples.com">navdeep.kumar@staples.com</a>>,
Jack Sullivan <<a href="mailto:jack.t.sullivan@oracle.com">jack.t.sullivan@oracle.com</a>><br>
<b>Subject:<span class="x_apple-converted-space"> </span></b>[EXT]:Re: [External] : Re: Tribuo Non Numerical Features Support</span></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
</div>
<div>
<div>
<p class="x_MsoNormal">Hi Sandeep, Bharad,<span class="x_apple-converted-space"> </span></p>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal">The Tribuo mailing list is monitored, but I can’t see the email you sent to the list in either the moderator console or the archives. Did you get a bounceback from the mailserver? It may be configured only to accept emails from people who have
joined the mailing list. Anyway, the Tribuo development team have their email addresses listed in the pom file visible on GitHub.</p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal">As to the question of feature support, in Tribuo all features are converted to numerical values during the data loading step. Categoricals can be encoded as one-hot features or directly as doubles by choosing an index, and ordinals can
be encoded as doubles. Very few ML algorithms directly support categorical variables without one-hot encoding or vectorization; of the popular ones, I think the only kind that do are trees. Tribuo’s tree package doesn’t currently have specializations for categoricals,
though there is enough information in the feature domain to support that if we do add the algorithmic support. In tree algorithms ordinals are naturally supported, as the splits don’t care about the distance between any feature values, only their relative ordering
(or at least all the splitting algorithms we have in Tribuo are of that kind). You can see how to perform featurization into doubles in a number of ways in the Columnar Data (<a href="https://tribuo.org/learn/4.3/tutorials/columnar-tribuo-v4.html">https://tribuo.org/learn/4.3/tutorials/columnar-tribuo-v4.html</a>)
and Document Classification (<a href="https://tribuo.org/learn/4.3/tutorials/document-classification-tribuo-v4.html">https://tribuo.org/learn/4.3/tutorials/document-classification-tribuo-v4.html</a>)
tutorials.</p>
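<p class="x_MsoNormal">To make the two categorical encodings concrete, here is a minimal sketch in plain Java. It deliberately does not use Tribuo’s data loading classes; the class and method names (CategoricalEncodingSketch, oneHot, indexOf) are hypothetical and only illustrate the idea of turning a category into doubles.</p>
<pre>
```java
import java.util.Arrays;

/** Hypothetical helper (not part of Tribuo): two ways to turn a categorical value into doubles. */
final class CategoricalEncodingSketch {

    /** Looks up the position of a value in the known category list, rejecting unknown values. */
    private static int position(String[] categories, String value) {
        int idx = Arrays.asList(categories).indexOf(value);
        if (idx == -1) {
            throw new IllegalArgumentException("Unknown category: " + value);
        }
        return idx;
    }

    /** One-hot encoding: one slot per category, 1.0 for the observed category and 0.0 elsewhere. */
    static double[] oneHot(String[] categories, String value) {
        double[] encoded = new double[categories.length];
        encoded[position(categories, value)] = 1.0;
        return encoded;
    }

    /** Index encoding: a single double holding the position of the value in the category list. */
    static double indexOf(String[] categories, String value) {
        return position(categories, value);
    }

    public static void main(String[] args) {
        String[] sizes = {"small", "medium", "large"};
        System.out.println(Arrays.toString(oneHot(sizes, "medium"))); // [0.0, 1.0, 0.0]
        System.out.println(indexOf(sizes, "large"));                  // 2.0
    }
}
```
</pre>
<p class="x_MsoNormal">For an ordinal feature the index encoding keeps the ordering, provided the category list is given in order, and the ordering is all a tree split needs. For an unordered categorical the index imposes an arbitrary order, which is why one-hot encoding is the usual choice there.</p>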
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal">Thanks,</p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal">Adam</p>
</div>
</div>
</div>
<div>
<div>
<div>
<div>
<div>
<p class="x_MsoNormal">--<br>
Adam Pocock<br>
Principal Member of Technical Staff<br>
Machine Learning Research Group<br>
Oracle Labs, Burlington, MA</p>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal" style="margin-bottom:12.0pt"><br>
<br>
<br>
</p>
</div>
</div>
<blockquote style="margin-top:5.0pt; margin-bottom:5.0pt">
<div>
<div>
<div>
<p class="x_MsoNormal">On 16 Jun 2023, at 00:13, Sikka, Sandeep <<a href="mailto:sandeep.sikka@staples.com">sandeep.sikka@staples.com</a>> wrote:</p>
</div>
</div>
</div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
<div>
<div>
<div>
<div>
<p class="x_MsoNormal">Adding a few individual emails on the project from Github. The devel email list doesn’t appear to have any activity.</p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal">Thanks</p>
</div>
</div>
</div>
<div>
<div>
<div>
<div>
<p class="x_MsoNormal"><b><span style="font-size:14.0pt; font-family:"Arial",sans-serif; color:#CB000A">Sandeep Sikka</span></b></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"><b><span style="font-size:10.5pt; font-family:"Arial",sans-serif; color:#646464">Distinguished Engineer</span></b></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"><span style="font-size:10.5pt; font-family:"Arial",sans-serif; color:#646464">500 Staples Dr., Framingham MA</span></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"><span style="font-size:10.5pt; font-family:"Arial",sans-serif; color:#646464">508.253.7977</span></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"><span style="font-size:10.5pt; font-family:"Arial",sans-serif"><a href="mailto:sandeep.sikka@staples.com">sandeep.sikka@staples.com</a></span></p>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
</div>
<div style="border:none; border-top:solid windowtext 1.0pt; padding:3.0pt 0in 0in 0in; border-color:currentcolor">
<div>
<div>
<div>
<p class="x_MsoNormal"><b><span style="font-size:12.0pt">From:<span class="x_apple-converted-space"> </span></span></b><span style="font-size:12.0pt">"Parthasarathy, Bharadwaj" <<a href="mailto:bharadwaj.parthasarathy@staples.com">bharadwaj.parthasarathy@staples.com</a>><br>
<b>Date:<span class="x_apple-converted-space"> </span></b>Thursday, June 15, 2023 at 2:46 PM<br>
<b>To:<span class="x_apple-converted-space"> </span></b>"<a href="mailto:tribuo-devel@oss.oracle.com">tribuo-devel@oss.oracle.com</a>" <<a href="mailto:tribuo-devel@oss.oracle.com">tribuo-devel@oss.oracle.com</a>><br>
<b>Cc:<span class="x_apple-converted-space"> </span></b>"Sikka, Sandeep" <<a href="mailto:Sandeep.Sikka@Staples.com">Sandeep.Sikka@Staples.com</a>>, "Kumar, Navdeep" <<a href="mailto:navdeep.kumar@staples.com">navdeep.kumar@staples.com</a>><br>
<b>Subject:<span class="x_apple-converted-space"> </span></b>Tribuo Non Numerical Features Support</span></p>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"><span style="font-size:9.0pt; font-family:Helvetica">Hi,</span></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"><span style="font-size:9.0pt; font-family:Helvetica"> </span></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"><span style="font-size:9.0pt; font-family:Helvetica"> </span></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"><span style="font-size:9.0pt; font-family:Helvetica">This is Bharadwaj from one of Staples' Engineering teams. We are exploring the Tribuo Java framework for one of our use cases. We are trying to use Tribuo for runtime inference with a
regressor model.</span></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"><span style="font-size:9.0pt; font-family:Helvetica"> </span></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"><span style="font-size:9.0pt; font-family:Helvetica">We have non-numerical features that we are trying to integrate with Tribuo. However, all the classes and samples for regressors point us to passing feature values only as doubles. What we see is
feature names as String and feature values as Double.</span></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"><span style="font-size:9.0pt; font-family:Helvetica"> </span></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"><span style="font-size:9.0pt; font-family:Helvetica">Does Tribuo support non-numerical features<span class="x_apple-converted-space"> </span><u>(categorical and ordinal features)</u><span class="x_apple-converted-space"> </span>such as
String, Boolean, etc., or is it strictly limited to double values? How can we train supervised models that consume categorical and ordinal features, and run inference with them?</span></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"><span style="font-size:9.0pt; font-family:Helvetica"> </span></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"><span style="font-size:9.0pt; font-family:Helvetica">I am not sure if we are missing something here. It would be helpful if you could provide your input.</span></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal"> </p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal">Regards,</p>
</div>
</div>
</div>
<div>
<div>
<div>
<p class="x_MsoNormal">Bharad</p>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
<p class="x_MsoNormal"> </p>
</div>
</div>
</div>
</body>
</html>