
How do you develop a new machine learning method and successfully deploy it on Europe’s largest shopping platform? Mariia Bulycheva, a Senior Applied Scientist at Zalando, shares how Graph Neural Networks (GNNs) helped structure user and content data, enhancing personalized recommendations and increasing user engagement. The approach significantly improved click prediction and enabled the platform to surface more inspiring content that goes beyond conventional shopping preferences.
Mariia, tell us about your career. Did you have prior experience implementing Graph Neural Networks (GNNs) in business?
My career began in finance after graduating from the Faculty of Mechanics and Mathematics at Moscow State University. I worked as an analyst at JP Morgan and Morgan Stanley but later decided to switch fields. My first technical experience was at a startup developing a robotic arm for sorting objects. I worked on computer vision, training neural networks to recognize objects, determine the optimal ways to grasp them, and process data from cameras. Despite the startup’s limited resources, I gained invaluable hands-on experience covering everything from data collection to integrating models into software.
After that, I joined Zalando, where I experienced a different scale of work: structured processes, well-organized pipelines, and the opportunity to focus on model development and research. At Zalando, I initially worked on demand and sales forecasting projects before transitioning to the recommendation team. That’s where the idea of using Graph Neural Networks first emerged, and I implemented them for the first time.
How did the idea of using this model for click prediction on Zalando’s homepage come about? What problems were you trying to solve?
The initiative to implement GNNs was mine. I led a team consisting of one data engineer and two machine learning specialists, and together we brought the idea to life.
Traditionally, click prediction relies on tabular data: user information (demographics, time, and location of login), content details (e.g. a video featuring shoes), and a label indicating whether the user clicked or not. The model learns the interaction between the user and the content, but this approach has limitations. A graph model structures the data differently: users become graph nodes, and interactions such as clicks or views act as edges. Content also becomes a node, connecting different users. This creates a rich relational structure that reveals connections invisible in flat tables. For example, if two users watch the same video, the graph makes their connection explicit.
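As a rough illustration of this structure, here is a minimal sketch of how such a user-content interaction graph could be expressed with PyTorch Geometric, the library Mariia mentions later in the interview. All node counts, feature sizes, and indices below are invented for the example.

```python
import torch
from torch_geometric.data import HeteroData

# Illustrative user-content graph: two node types, clicks as typed edges.
data = HeteroData()
data['user'].x = torch.randn(1000, 16)    # e.g. demographic/context features
data['content'].x = torch.randn(200, 32)  # e.g. content-type features

# Each column is one interaction: (user index, content index).
click_edges = torch.tensor([[0, 0, 1, 2],
                            [5, 7, 5, 9]])
data['user', 'clicks', 'content'].edge_index = click_edges
# Reverse edges let information flow back from content to users.
data['content', 'clicked_by', 'user'].edge_index = click_edges.flip(0)

# Users 0 and 1 both clicked content item 5, so the graph now connects
# them through that shared node.
```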
Moreover, a graph structure allows us to add extra information. If a user likes a brand, this automatically links all associated articles, creating an information flow through the nodes. In tabular data, identifying such relationships is much harder. Graph models facilitate learning by immediately providing explicit connections between users and content, accelerating pattern recognition.
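The "information flow through the nodes" described here corresponds to message passing in a GNN. Continuing the sketch above, a two-layer model, built here with PyTorch Geometric's SAGEConv and to_hetero as one possible choice, lets a user's embedding absorb signals from nodes two hops away, such as brands reachable through associated articles.

```python
import torch
from torch_geometric.nn import SAGEConv, to_hetero

# Two message-passing layers: after two hops, a user's embedding can
# reflect nodes reachable through shared content, e.g. liked brands.
class GNN(torch.nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.conv1 = SAGEConv((-1, -1), hidden)  # lazy input sizes
        self.conv2 = SAGEConv((-1, -1), hidden)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)

# Replicate the layers per edge type of the heterogeneous graph above.
model = to_hetero(GNN(), data.metadata(), aggr='sum')
out = model(data.x_dict, data.edge_index_dict)  # node type -> embeddings
```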
Models built on tabular data primarily predicted behavior from past purchases or views. This approach was limited because it did not account for content diversity or long-term user interests. GNNs allow for flexible interaction modeling, assigning different weights to various content types or priorities. For example, we can amplify the influence of video content if we want users to engage with it more frequently.
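One hedged way to realize such weighting, continuing the earlier sketch: attach a per-edge weight that scales how strongly each interaction contributes during aggregation. The content types and weight values here are invented for illustration.

```python
import torch

# Hypothetical weighting: video interactions count more than others.
type_weight = {'video': 2.0, 'image': 1.0, 'product': 0.8}

edge_types = ['video', 'image', 'video', 'product']  # one per click edge
weights = torch.tensor([type_weight[t] for t in edge_types])
data['user', 'clicks', 'content'].edge_weight = weights
# Weight-aware layers (e.g. GraphConv in PyTorch Geometric) can consume
# edge_weight so that video interactions contribute more strongly.
```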
Additionally, graphs help manage recommendation diversity, showing users not only familiar items but also broadening their horizons.
All of this is crucial because Zalando is shifting its strategy to become not just an e-commerce platform but also an inspiration hub where users discover lifestyle and fashion content. This, in turn, increases their time spent on the platform.
How were the business results of implementing GNNs evaluated? What financial and strategic benefits did the model bring to the company?
Fully integrating GNNs requires significant infrastructure changes, and this process is still ongoing. Currently, GNNs are used to generate embeddings—numerical representations of users and content. These embeddings are integrated into the existing recommendation model, which has improved click prediction and made content more relevant to users.
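In code, the hybrid setup described here might look like the following sketch: freeze the GNN from the earlier examples, export its user embeddings, and concatenate them with the existing tabular features before the production click model, which is stood in for below by a simple linear head.

```python
import torch

# Export frozen GNN embeddings (model and data from the earlier sketches).
with torch.no_grad():
    user_emb = model(data.x_dict, data.edge_index_dict)['user']

tabular = torch.randn(user_emb.size(0), 10)  # placeholder tabular features
click_input = torch.cat([tabular, user_emb], dim=1)

# Stand-in for the existing recommendation model consuming the features.
click_head = torch.nn.Sequential(
    torch.nn.Linear(click_input.size(1), 1),
    torch.nn.Sigmoid(),
)
click_prob = click_head(click_input)  # predicted click probability per user
```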
During development, GNNs showed an increase of 0.6 percentage points in the ROC-AUC metric, which measures the model’s ability to distinguish between content a user will click on and content they will not. While 0.6 percentage points may seem small, in large-scale recommendation systems, every fraction of a percent improves personalization for millions of users. This improvement means the model is more sensitive to subtle user behavior patterns, directly enhancing engagement metrics like click-through rate and user retention.
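For readers unfamiliar with the metric, a toy computation with invented labels and scores shows what ROC-AUC captures: the probability that a clicked item is ranked above a non-clicked one, where 0.5 is random guessing and 1.0 is a perfect ordering.

```python
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0]               # 1 = clicked, 0 = not clicked
y_score = [0.9, 0.3, 0.6, 0.8, 0.4]    # model's predicted probabilities
print(roc_auc_score(y_true, y_score))  # 1.0: every click outranks every non-click
```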
Beyond accuracy improvements, GNNs enable strategic metric control. For example, the model helps regulate video content exposure, increase recommendation diversity, and make content more engaging. We can assign different weights to relationships—for instance, strengthening interactions with video content or emphasizing elements that expand users’ perspectives. This flexibility is particularly important for Zalando’s new strategy, which heavily invests in content creation and models that guide users beyond their typical preferences.
GNNs also demonstrated significant advantages in handling cold-start users—those without historical interaction data. These users often pose challenges for classic recommendation models, which struggle to predict their preferences. However, by leveraging relationships between users, products, and content, GNNs reduced the accuracy gap for cold-start users by an average of 2 percentage points, improving early-stage personalization. This is crucial for onboarding and retaining new customers, ultimately increasing Zalando’s daily active users.
Additionally, using GNNs in Zalando’s recommendation systems significantly reduced manual feature engineering efforts. Since GNNs automatically extract complex dependencies from data, the need for manually designing and testing numerous features decreased considerably. As a result, feature development cycles accelerated by 40%, allowing the team to focus on other model improvements.
How challenging was it to adapt and deploy models across different platforms, such as the web and mobile apps?
Several key challenges arose. First, data preparation: standard user logs, typically stored in formats such as JSON or Parquet, had to be transformed into a graph structure. This required building a dedicated pipeline to convert the data into graph format, a process that took around a month to develop and refine.
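The core of such a pipeline can be quite compact. A minimal sketch, assuming a hypothetical Parquet schema with user_id, content_id, and clicked columns (the file name and column names are invented):

```python
import numpy as np
import pandas as pd
import torch

logs = pd.read_parquet('interactions.parquet')  # hypothetical log file

# Map raw IDs to the contiguous node indices that edge_index expects.
user_idx, user_ids = pd.factorize(logs['user_id'])
item_idx, item_ids = pd.factorize(logs['content_id'])

edge_index = torch.from_numpy(np.stack([user_idx, item_idx])).long()
labels = torch.tensor(logs['clicked'].to_numpy(), dtype=torch.float)
# edge_index and labels can now populate a HeteroData object as above.
```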
Second, updating data in a graph is more complex than in a tabular model. In traditional systems, new data can simply be appended, whereas a graph requires recalculating relationships and adding new nodes and edges. For example, when a new user or interaction appears, we must efficiently determine which parts of the graph need updating. This process is fast but demands a different engineering approach than appending records to a table.
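As a hedged sketch of what such an incremental update might involve (indices invented, continuing the earlier data object): append the new edges and mark only the touched neighborhoods as stale, rather than rebuilding the graph.

```python
import torch

# Two new interactions arriving after the graph was built.
new_edges = torch.tensor([[3, 4],    # user indices
                          [5, 2]])   # content indices they interacted with

store = data['user', 'clicks', 'content']
store.edge_index = torch.cat([store.edge_index, new_edges], dim=1)

# Only nodes touched by the new edges (and their neighborhoods) need
# their embeddings recomputed.
affected_users = new_edges[0].unique()
```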
Third, model training itself presents challenges. In GNNs, batching (dividing the data into blocks for training) works differently than in tabular models: rows cannot simply be sliced off, because that would sever node relationships. Preserving them requires additional computational resources and more sophisticated sampling logic.
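The standard answer in PyTorch Geometric is neighbor sampling: each batch is a subgraph around a set of seed nodes, so relationships survive the split into blocks. A sketch with invented fan-outs, reusing the earlier graph and model:

```python
from torch_geometric.loader import NeighborLoader

loader = NeighborLoader(
    data,
    num_neighbors=[10, 5],  # neighbors sampled per message-passing layer
    batch_size=512,
    input_nodes='user',     # seed node type for each mini-batch
)

for batch in loader:
    # Each batch is a sampled subgraph, so edges around the seed users
    # are preserved during training.
    emb = model(batch.x_dict, batch.edge_index_dict)
    break
```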
However, despite all these challenges, the computational and predictive capabilities of GNNs are powerful enough to enable training the “global Zalando graph”, which includes the entire history of customer interactions on the homepage and potentially other pages (e.g. the catalog). A model trained on this graph not only learns individual user behavior patterns but also identifies global historical trends, capturing complex dependencies between users, content, and time. A classical deep learning model would require significantly more computational resources and time to achieve a similar level of analysis. GNNs, thanks to their architecture and parallel computation capabilities, can train on the same dataset 7 to 10 times faster while effectively distinguishing older interactions from newer ones. Once the global graph is trained, it can be updated quickly with new data, keeping the model relevant without retraining from scratch.
In classical recommendation setups, there is, of course, incremental training with new data. However, full retraining of the model from scratch still occurs regularly, which increases computational costs several times over. This ability to combine global historical learning with incremental updates makes GNNs a powerful and scalable tool for large-scale dynamic recommendation systems.
As for our current system, it updates daily, which suits the current implementation where embeddings are used as features in another model. Transitioning the entire system to Graph Neural Networks will require further process optimization to ensure fast and efficient data processing across all platforms. However, the ultimate savings in computational costs fully justify these efforts, and the transition process is underway.
What additional skills or knowledge did you need to acquire to work successfully on this project?
First, I completed Stanford’s “Graph Neural Networks” course, which was extremely helpful. After that, I explored the two main libraries, PyTorch Geometric and Deep Graph Library (DGL), to determine which was more convenient to work with. Each has its own advantages. DGL, for example, is lower-level and great for building a deeper understanding of how Graph Neural Networks work internally. As I progressed, however, I found that PyTorch Geometric offers a more user-friendly interface, which makes integration and subsequent work easier for other teams.
Another crucial skill was learning how to properly partition the graph into training, validation, and test sets. This is essential to prevent information leakage: if the graph is split incorrectly, information from the test set can leak into the training set, so the model effectively sees test data during training and produces overly optimistic results. Careful partitioning is required to maintain the balance between training and testing information.
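In PyTorch Geometric, one common way to do this for link-level tasks is RandomLinkSplit, which removes the held-out edges from the message-passing graph seen during training. A sketch over the earlier data object:

```python
import torch_geometric.transforms as T

transform = T.RandomLinkSplit(
    num_val=0.1,
    num_test=0.1,
    edge_types=[('user', 'clicks', 'content')],
    rev_edge_types=[('content', 'clicked_by', 'user')],
)
train_data, val_data, test_data = transform(data)
# Edges held out for evaluation are excluded from the training graph,
# so the model never passes messages over them while learning.
```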
What are the next steps in developing the model? Do you see potential applications beyond the homepage?
To advance the model, we plan to fully transition to graph-based training, which will allow us to better manage metrics such as novelty and diversity.
Beyond the homepage, the model has potential applications in other areas, such as the product catalog. Currently, a separate team is working on developing a model for this section, but our goal is to integrate everything into a unified global Zalando graph. In such a structure, we can accumulate all user and site interaction data, creating a powerful system capable of processing and predicting behavior based on the entire historical dataset. This would enable more accurate content relevance assessments and automatic data updates, such as marking outdated elements or excluding them from the model.
Finally, Mariia, how do you see the future of personalized advertising technologies in the coming years? What new approaches and tools could enhance its effectiveness?
Personalized advertising technologies will evolve significantly in the coming years thanks to generative AI, which will enable the creation of unique content tailored to each user. Instead of selecting from a predefined pool of recommendations, we will generate personalized videos and content that align with an individual user’s preferences. For example, Jack Wolfskin boots might be associated with mountains for one user, with an urban setting for another, and with family and children for someone else. Generative AI unlocks opportunities for deeper personalization, creating content that truly resonates with users.
In my team, for instance, management prioritizes engagement time with content rather than direct monetization. The assumption is that increasing user engagement ultimately enhances the likelihood of a purchase. The overarching idea is that how a product is presented directly influences purchase decisions. Generative AI not only enables the creation of visually appealing content but also ensures it is inspiring enough to spark a user’s desire to buy.