E-commerce packages are notorious for their inefficient use of space: more than one quarter of the volume of a typical e-commerce package consists of air and filler material. This wasted space significantly reduces transportation and distribution capacity and thereby increases operational costs. Designing an optimal set of packaging box sizes is therefore imperative for improving efficiency. Although prior approaches for determining optimal box sizes exist, they cannot be applied to e-commerce warehouses because of the very wide range of SKUs these warehouses hold. In particular, the integer programming formulations used by conventional approaches make it impractical to design a few tens of box sizes that cover hundreds of thousands of SKUs spanning a wide range of dimensions. This article proposes a scalable three-stage optimization framework that combines unsupervised learning, reinforcement learning, and tree search to design optimal box sizes. Specifically, the package optimization problem is formulated as a sequential decision-making task called the box-sizing game. A neural network agent is then designed to play the game and learn control policies that solve the problem, and a tree-search operator is developed to improve the performance of the learned policies. The proposed framework is evaluated on real-world and synthetic datasets against standard metaheuristics and industry benchmarks. The results indicate that the approach is robust and generates industry-strength solutions that outperform these baselines: the packaging box assortments generated by the framework are 5% to 7.5% better than the industry baselines.
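To make the underlying objective concrete, the following Python sketch scores a candidate box assortment by the average fraction of package volume left as air or filler. It rests on simplifying assumptions that are not taken from the framework itself: each package holds a single SKU, each SKU ships in the smallest-volume box it fits in any axis-aligned orientation, and all SKUs are weighted equally. The helper names `fits`, `smallest_fitting_box`, and `void_fraction` are hypothetical; this is an illustrative evaluation, not the scoring function used by the proposed framework.

```python
from typing import Optional, Sequence, Tuple

# (length, width, height) in any consistent unit; hypothetical representation.
Dims = Tuple[float, float, float]


def volume(d: Dims) -> float:
    return d[0] * d[1] * d[2]


def fits(item: Dims, box: Dims) -> bool:
    # With axis-aligned rotations allowed, an item fits iff each of its sorted
    # dimensions is no larger than the corresponding sorted box dimension.
    return all(i <= b for i, b in zip(sorted(item), sorted(box)))


def smallest_fitting_box(item: Dims, boxes: Sequence[Dims]) -> Optional[Dims]:
    # Assumption: each SKU is shipped alone in the smallest box that holds it.
    candidates = [b for b in boxes if fits(item, b)]
    return min(candidates, key=volume) if candidates else None


def void_fraction(items: Sequence[Dims], boxes: Sequence[Dims]) -> float:
    """Unweighted mean share of box volume that is empty (air or filler)."""
    if not items:
        raise ValueError("no items to pack")
    total = 0.0
    for item in items:
        box = smallest_fitting_box(item, boxes)
        if box is None:
            raise ValueError(f"no box in the assortment fits item {item}")
        total += 1.0 - volume(item) / volume(box)
    return total / len(items)


if __name__ == "__main__":
    skus = [(1.0, 1.0, 1.0), (2.0, 1.0, 1.0)]
    assortment = [(1.0, 1.0, 1.0), (2.0, 2.0, 2.0)]
    print(void_fraction(skus, assortment))
```

Under these assumptions, the box-sizing problem amounts to choosing a small assortment that minimizes such a score over a very large SKU catalogue. Adding a box can only lower the score but increases the number of box sizes to stock, which is the trade-off the abstract describes.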