Web-based User Profiling
Using Artificial Neural Networks
Ryan MacDonald, BCSH.
Acadia University
Daniel L. Silver, PhD.
Acadia University
Abstract. The Internet's worldwide growth and acceptance has resulted
in a massive E-commerce movement. E-markets are growing so rapidly that companies
must now strive not only to have a presence on the Internet, but to create a
presence that far exceeds those of their competitors. User profiling is one approach
companies have traditionally taken to better understand their customers so that
they may adjust their business model accordingly. By coupling this method with
technologies such as JavaScript, applets and Artificial Neural Networks (ANNs),
a powerful profiling system for the Internet is introduced. This system can
help websites adapt their content and layout based on prior interaction with
their users. Testing was performed to examine the validity of such a system,
and the results show that it can provide significant enhancements and opportunities
to a website.
1. Introduction
Methods
and systems that help users navigate the web and filter information are few
and far between. When searching through the Internet, users often feel overwhelmed
by the amount of data returned to them. This is particularly important
for E-commerce sites, since they risk customers leaving their site with a bad
impression and never returning. We have attempted to combine
user profiling and adaptive web interfacing to help companies satisfy their
users.
User
profiling has existed for quite some time in areas such as television, radio,
and advertising. However, user profiling on the Internet can go beyond the reaches
of its predecessors because of a website's ability to easily obtain
information about its users. A website can ask you to register and request information
such as your age, sex, likes and dislikes. A website can also keep track of
your purchases, which sections of the site you visit, how long you view a page,
and other information. Once collected, this information can allow companies
to adapt the information content presented to each of their customers such that
it meets the customer's interests. These adjustments permit the company to better
satisfy its users and can increase the probability of purchases.
User
profiling/modeling can be done in three ways: (1) using stereotypes, (2) using
surveys/questionnaires, or (3) using a "learned model" [Langley, 2000].
The first two are rather straightforward approaches that use information we already
know, or information we collect, to build appropriate profiles. The third
one, using "learned models", is particularly interesting.
It entails creating a system that has no knowledge of its users to begin with
but, over time, as users interact with the system, learns from their trends
and behaviours and creates profiles based on the experience it gains. These
profiles can be built individually, on a per-user basis,
or collaboratively, pooling all users' data to form a general profile.
We
sought to build a collaborative system that could "learn" a general
user profile, and to test its usefulness on a typical E-commerce website. A summary
of all our work and research can be found in [MacDonald, 2001].
2. Background
Our
system was created with the use of a prototypical E-commerce portal site and
artificial neural network (ANN) software. This section describes the class of
ANN we used and the E-commerce website that was developed.
2.1 Artificial Neural Networks
Artificial Neural Networks (ANNs) are programs designed to simulate the way
scientists believe our biological nervous system functions. Similar to a human's
brain where neurons are connected together and communicate through interconnecting
synapses, ANNs are composed of numerous processing elements, or nodes, that
are tied together with weighted connections. The earliest discoveries in neural
computing go back to the 1940s and 1950s; however, it was a renewed interest
during the 1980s that brought them to the forefront in a number of different
research areas, such as machine learning and applied data mining.
ANNs
are designed in a highly connected layer structure, as demonstrated in Figure
1 below:
Figure 1 - Structure of an Artificial Neural Network
Here,
X1 through X5 represent the input layer while Z1 and Z2 represent the output
layer. As an example, if we were trying to predict weather conditions, the inputs
could be the day of the week, the season, the temperature, and so on, while
the predicted outputs may be whether or not it is going to be sunny and the
wind speed for that day. The middle layer, often called the "hidden nodes",
provides an internal representation for the development of ANN models. For relatively
complex problems it is often necessary to compensate by adding a larger number
of hidden nodes.
How
does an ANN learn? All the lines shown in Figure 1 are given weight values
that are initially set to small random values. As training examples (previously
observed input and output values) are given to the network, the network "learns"
by adjusting these weight values to best represent the relationship between
the input and output variables. Assuming that such a relationship exists,
if enough training examples are given to the neural network
then it will usually have no problem generalizing to future examples.
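To add one concrete step to this description: in standard back-propagation training
(the usual formulation, not spelled out in the original text), each weight is
repeatedly nudged a small step against the gradient of the network's prediction error:

```latex
\Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}}
```

where E is the error on a training example, w_ij is the weight on the connection
from node i to node j, and the learning rate (eta) controls the step size.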
There
are various network architectures available to choose from when building a neural
network. The basic one is a back-propagation network where the nodes are structured
such as in Figure 1 shown before. The network used in this work, however, is
a recurrent network, an example of which is shown in Figure 2.
Figure 2 - A Recurrent Network
As
you can see when comparing this design to the back-propagation one, the recurrent
network has an extra layer of nodes (Slab 4) that acts as input. Each slab
consists of several nodes, as was shown in Figure 1, so Slab 1 here represents
the input layer (X1-X5 in Figure 1), Slab 2 the middle layer, and Slab 3 the
output layer (Z1-Z2). The extra layer differs from the input layer (Slab 1),
however, because it is affected by what the middle layer outputs. As training
examples are given to the network, the extra layer is modified and adjusted
in accordance with previous examples. Recurrent networks are excellent at learning
sequences and are often used for applications such as sales prediction and stock
analysis. We will demonstrate that a recurrent network's ability to learn sequences
is exactly what is needed for our user profiling system.
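To make the recurrent architecture concrete, here is a minimal Java sketch of a
forward pass through an Elman-style recurrent network like the one in Figure 2.
This is an illustration under stated assumptions (a logistic activation,
zero-initialized context); it is not the actual NeuroShell 2 implementation:

```java
/**
 * Minimal sketch of a forward pass through an Elman-style recurrent
 * network (Figure 2): Slab 1 = input, Slab 2 = hidden, Slab 3 = output,
 * Slab 4 = context (a copy of the previous hidden activations fed back
 * in as extra input). Layer sizes and the logistic activation are
 * illustrative assumptions, not NeuroShell 2 internals.
 */
public class RecurrentNetSketch {
    private final int nIn, nHidden, nOut;
    private final double[][] wInHidden;   // input   -> hidden weights
    private final double[][] wCtxHidden;  // context -> hidden weights
    private final double[][] wHiddenOut;  // hidden  -> output weights
    private final double[] context;       // Slab 4: previous hidden state

    public RecurrentNetSketch(int nIn, int nHidden, int nOut) {
        this.nIn = nIn; this.nHidden = nHidden; this.nOut = nOut;
        wInHidden  = new double[nHidden][nIn];
        wCtxHidden = new double[nHidden][nHidden];
        wHiddenOut = new double[nOut][nHidden];
        context    = new double[nHidden]; // starts at zero
        // (weights would be set by training; left at zero in this sketch)
    }

    private static double logistic(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** One forward step: the hidden layer sees both the current input
     *  and the context (the previous step's hidden activations). */
    public double[] step(double[] input) {
        double[] hidden = new double[nHidden];
        for (int h = 0; h < nHidden; h++) {
            double sum = 0.0;
            for (int i = 0; i < nIn; i++)     sum += wInHidden[h][i]  * input[i];
            for (int c = 0; c < nHidden; c++) sum += wCtxHidden[h][c] * context[c];
            hidden[h] = logistic(sum);
        }
        double[] output = new double[nOut];
        for (int o = 0; o < nOut; o++) {
            double sum = 0.0;
            for (int h = 0; h < nHidden; h++) sum += wHiddenOut[o][h] * hidden[h];
            output[o] = logistic(sum);
        }
        System.arraycopy(hidden, 0, context, 0, nHidden); // update Slab 4
        return output;
    }
}
```

Under the architecture later chosen in Section 3.1.2, this sketch would be
instantiated as new RecurrentNetSketch(62, 84, 62).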
The
ability to learn in a way that is loosely similar to the human brain makes ANNs a very
powerful tool when used properly. Currently they are being used in many fields,
such as data mining, stock market analysis, weather prediction and user profiling.
2.2 Navigate.ca
Navigate.ca
is the E-Commerce website we developed to house the profiling system. Essentially
it is a shopping website that has numerous links to sites where products can
be purchased. These links are grouped into categories such as jewelry, clothing,
office supplies, computer hardware, etc. In total there are 62 categories of
links organized into a hierarchy for easier browsing.
Figure 3 - The entry page to Navigate.ca
Figure
3 shows the initial start up page for Navigate.ca. Users can find a particular
product by working their way through the folder system on the left side of the
page. When the final product category is found a group of links to websites
that offer that product are provided to the user (see Figure 4).
Figure 4 - Shown are the categories and subcategories leading
to women's, everyday clothing. The main portion of the window displays links
to various websites that sell the relevant products.
Essentially
Navigate.ca is nothing more than a portal to other websites. The website offers
links to a large variety of products but does not concern itself with the end
transaction. From a commercial and business standpoint the website generates
money by collecting commission and click-through fees from the companies that appear
on the site.
Navigate.ca,
before the addition of the profiling system, was implemented with the use of
HTML and JavaScript. The prototype site can be viewed at www.navigate.ca, with
the password to login set as "password".
3. A more "intelligent" Navigate.ca
Goals. Our objective is to create a system that allows users to find
what they are looking for with greater ease on Navigate.ca through the use of
collaborative profiling. The system should be able to learn a profile on its
own by keeping track of how past users interacted with the site and then use
the generated profile to assist future users in navigating the site. Ideally
the entire solution should be as tightly coupled as possible and have a rapid
response time.
Method. The primary difficulty that arises when searching through a
portal site such as Navigate.ca is locating the specific category you would
like to visit, because there are so many categories located in a multitude of folders.
The solution chosen was therefore to track, in sequence, which categories users
visit. Those sequences are used as data to help predict where future
users are most likely to want to go. That is, we use user click-streams as input
to predict which category a future user will most likely want to visit next,
based on the behaviour of past users who followed a similar path through the
site.
3.1 Implementation
The
implementation of our solution is broken up into three steps. The first step
is to collect the click-streams generated by users and to then transform that
data so that it can be used as training examples for an ANN modeling system.
Then, the model must be created. For this we will be using a program called
NeuroShell 2 by Ward Systems Group Inc [WSG, 2001]. Finally, once the model
has been learned we must integrate it into the website so that it can be used
to make predictions for new users. Each of these three steps should be tightly
coupled so as to make the system easy to update with new models periodically
as more training examples are collected.
Figure 5 - The 3 steps of implementation
3.1.1 Data Collection and Preparation
The
first step in building neural network models is collecting a set of training
examples. For our particular scenario these training examples will be users'
click-streams through the site. First, each category was given a numeric identifier
from 0-61 (e.g. baby stuff = 0, books = 1, and so on). In order to track where a user
goes we created a cookie that keeps track of every category a user visits.
All categories are similar in that each has a webpage consisting of all the
links that lead to websites selling products within that category. When accessed,
each of these pages writes its particular identifier to the cookie. After a
visit to Navigate.ca, one particular user's click-stream could look something
like:
12 0 18 28 30 2 6 48 54 14 1 23 ...
From
the stream we must create training examples that can be used to train our ANN.
As stated in the "Background" section, recurrent networks are excellent
at learning sequences. Since the click-streams we collect are essentially
sequences of paths through the site, this network architecture
is an excellent match for our system. Under a recurrent ANN each training example
is simply one input and one output:
(12 0), (0 18), (18 28), (28 30), ...
Before
we can pass these examples to the neural network, more data preparation has
to be done. It is important to note that since our categories are nominal values,
they cannot simply be placed on a scale from 0-61 if our ANN is going to properly
learn them. These values must therefore be transformed from numeric values ranging
from 0-61 to individual discrete variables that can be more easily learned
by our network. Rather than use a category's actual number, we represent it
by a series of 62 values: 61 '0's and a single '1' in the nth place, where
n - 1 is the category number (keep in mind we are numbering from 0). So, for instance,
category number 5 would be represented as:

0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... (62 values in all)
Therefore,
the click-stream is first gathered into its pairings of 2 category numbers (input
and output) and those numbers are then changed to the series of '0/1' representation.
The resulting binary representation is used to train the neural network.
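As a concrete illustration of this preparation step, the following Java sketch
(class and method names are ours, purely for illustration) turns a cookie's
click-stream into pairs of one-hot vectors of length 62:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: turn a click-stream of category numbers (0-61) into
 *  (input, output) training pairs of one-hot vectors of length 62.
 *  Class and method names are hypothetical, for illustration only. */
public class TrainingDataSketch {
    static final int NUM_CATEGORIES = 62;

    /** e.g. category 5 -> [0,0,0,0,0,1,0,...,0] (62 values). */
    static double[] oneHot(int category) {
        double[] v = new double[NUM_CATEGORIES];
        v[category] = 1.0;
        return v;
    }

    /** "12 0 18 28 30" -> pairs (12,0), (0,18), (18,28), (28,30),
     *  each encoded as a pair of one-hot vectors. */
    static List<double[][]> toTrainingPairs(String clickStream) {
        String[] tokens = clickStream.trim().split("\\s+");
        List<double[][]> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.length; i++) {
            int from = Integer.parseInt(tokens[i]);
            int to   = Integer.parseInt(tokens[i + 1]);
            pairs.add(new double[][] { oneHot(from), oneHot(to) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // The sample click-stream from the text yields 11 training pairs.
        List<double[][]> pairs = toTrainingPairs("12 0 18 28 30 2 6 48 54 14 1 23");
        System.out.println(pairs.size() + " training pairs created");
    }
}
```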
3.1.2 Training the Artificial Neural Network
Training
the ANN will be done with an off-the-shelf product called NeuroShell 2. This
software allows users to specify the inputs and outputs, and the network architecture,
for the model. The NeuroShell 2 package provided us with all the flexibility
we needed to construct a model for our specific requirements in an easy-to-use
interface.
With
our training examples created as described in the previous section, we then
extract a training set, a test set and a production set. The training set is what
is actually used to train the neural network, while the test set allows
us to verify that the model being generated performs well not only on the
set it is trained with, but across our data as a whole. The production
set is used at the end as a final check of how accurately the model
represents all the data that we have.
Next
we specify the architecture we would like to use. As mentioned earlier we have
decided to use a recurrent network for our scenario. The NeuroShell 2 program
sets the number of hidden nodes to what it feels is the most appropriate based
on its knowledge of our network to date (number of inputs/outputs and architecture).
Initially we experimented with other numbers of hidden nodes but in general
the default value (of 84) provided by NeuroShell 2 worked well. Thus our final
network architecture consisted of 62 input nodes, 62 output nodes, and 84 nodes
in each of the two middle slabs (the hidden and context layers, Slabs 2 and 4
of Figure 2).
Training
continues until the lowest average level of error is reached. The weights that
produced the lowest average error on the test set are kept. The architecture
and weights make up the new "learned" model. If we so choose we can
then test this model against the production set that we created earlier to ensure
that it in fact does perform well on our data.
Next
we need to generate a representation of the model that can be used by the Navigate.ca
software. NeuroShell 2 provides a feature that allows us to do this by generating
source code that consists of mathematical functions representative of our "learned"
model. The code accepts an input array with a length of 62 and then outputs
an array also with a length of 62. Keep in mind that for a website that generates
a lot of traffic this process of creating the model would only be done periodically,
perhaps once a day, week or month. Basically we would re-train the network from
time to time using our ever-increasing set of training examples so that it
remains accurate in depicting how the website is currently being used.
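Viewed from the website's side, the generated code therefore behaves as a single
function from a 62-element array to a 62-element array. A hedged Java sketch of
that contract (the interface is our own wrapper, not part of NeuroShell 2):

```java
/** Hypothetical wrapper contract around the NeuroShell-generated model
 *  code: 62 inputs in (a one-hot category encoding), 62 outputs out
 *  (a score per category). The interface name is ours, for illustration. */
public interface ProfileModel {
    int SIZE = 62; // one slot per category, numbered 0-61

    /** input.length == SIZE; returns an array of length SIZE. */
    double[] predict(double[] input);
}
```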
3.1.3 System Integration
The
profile model must be embedded within our website so that we can use it when
future visitors come to the site.
For
future visitors, when a category's link page is viewed we need to pass that
category's identifying number to our "learned" model and then display
to the user links to the categories the model feels the visitor is most likely
to want to see next. To perform this task we use a Java applet that is displayed
on the right side of all our category pages. The applet's job is to: (1) identify
which category is currently being viewed; (2) convert that category into suitable
input (0 0 0 0 1 0 0 ... 0 0); (3) pass that input to a Java version
of the model code produced by NeuroShell; and (4) based on the output of our
model, display the top 3 category links our model thinks the user will be interested
in seeing (the top 3 are displayed in order to increase the chances
that at least one of them is useful to the user).
The
first two steps are easily done. When the applet is called on each category page,
we simply pass along a parameter that identifies which page is currently
being viewed. The category number is converted to an array of '0's with a single
'1' in the right position to create the desired format for our input to the
predictive model. After several calculations an output array (of size 62) is
created.
Figure 6 - What happens when a user visits the
new, "intelligent" Navigate.ca
Finally,
the applet code must determine the highest 3 outputs. The associated category
links for the "top 3" are the ones we provide to the user
in hopes that they will spark interest or help users find their way through
the site. When a user now visits a category, the applet is displayed
on the right showing links to the top 3 categories that our network model has
deemed the user is likely to want to see next.
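Putting steps (1)-(4) together, a minimal sketch of the applet's prediction
logic might look as follows, assuming the hypothetical ProfileModel wrapper
sketched in Section 3.1.2:

```java
/** Sketch of the applet's prediction logic: one-hot encode the current
 *  category, run it through the model, and select the three categories
 *  with the highest output scores. Names are illustrative assumptions. */
public class SuggestionSketch {
    static int[] topThree(ProfileModel model, int currentCategory) {
        // Step 2: convert the current category into a one-hot input vector.
        double[] input = new double[ProfileModel.SIZE];
        input[currentCategory] = 1.0;

        // Step 3: pass the input through the "learned" model.
        double[] output = model.predict(input);

        // Step 4: find the indices of the 3 largest outputs.
        int[] top = new int[3];
        double[] scores = output.clone();
        for (int k = 0; k < 3; k++) {
            int best = 0;
            for (int i = 1; i < scores.length; i++) {
                if (scores[i] > scores[best]) best = i;
            }
            top[k] = best;
            scores[best] = Double.NEGATIVE_INFINITY; // exclude from next pass
        }
        return top; // the category numbers to display as suggested links
    }
}
```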
Figure 7 - The applet suggests other categories for the user to go to
We
now have a system that can track where users go, build a model or profile based
on that information and use that model to suggest categories to new users who
come to the site.
4. Testing
An
important step in creating any new software system is proper testing. While
for many computer-based ideas this often means ensuring there are no errors
within the code and that everything runs properly, it is also important to test
a system for validity and usefulness. Navigate.ca's new profiling system was
tested in three different ways.
4.1 Mathematical testing of the neural network's design
A test
was performed to ensure that the recurrent network that was used to perform
user profiling could in fact develop models for complex but deterministic functions.
Because the network was so large (62 input and output nodes), there was concern
that it might be impossible to create a model that accurately portrays the data.
To test this we created a series of mathematical functions to produce sample
data. We started with a simple function that created a straightforward click-stream
of 0, 1, 2, 3, ..., 60, 61 and verified that the model would pick
up on the sequence. We then progressed to more complex functions. All functions
are taken "mod 62" so that, like our application, they produce category
identifiers in the range 0-61.
We
rated the network's performance by taking the average R² value (coefficient
of determination) over our outputs. The R² value represents how well
the network has been able to learn the data and draw associations between input
and output. An R² value close to 1 means that a strong relationship has been
identified between input and output, while a value closer to 0 means that the
model has not been able to generalize to the data very well.
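For reference, the coefficient of determination for a single output is computed
in the standard way, comparing the model's squared prediction error to the
variance of the actual values:

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

where y_i are the observed values, y-hat_i the model's predictions, and y-bar
the mean of the observed values.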
First function: x = (x + 1) mod 62
Sample: 0, 1, 2, 3, 4, 5, ...

Second function: x = ((x + 3) * 7) mod 62
Sample: 43, 13, 51, 7, 9, 23, 59, 1, 29, 39, 47, 41, 61, 15, 3, 43, 13, 51,
7, 9, 23, ...

Third function: y = y + 3; x = ((x + 1) * y) mod 62
Sample: 17, 3, 41, 51, 27, 37, 31, 57, 13, 1, 7, 49, 17, 31, 47, 59, 21, 33,
51, 11, 25, ...
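For completeness, a small Java sketch that generates such test streams is shown
below. The initial values of x and y are not stated above and are assumptions
here, so the generated streams may begin at a different point than the printed
samples:

```java
/** Sketch: generate synthetic click-streams from the three test functions.
 *  The seed values for x and y are assumptions (the text does not state
 *  them), so output may start at a different point than the samples above. */
public class TestStreamSketch {
    public static void main(String[] args) {
        int x = 0; // assumed seed
        StringBuilder first = new StringBuilder("First function:  ");
        for (int i = 0; i < 10; i++) {
            x = (x + 1) % 62;            // x = (x + 1) mod 62
            first.append(x).append(' ');
        }
        System.out.println(first);

        x = 0; // assumed seed
        StringBuilder second = new StringBuilder("Second function: ");
        for (int i = 0; i < 10; i++) {
            x = ((x + 3) * 7) % 62;      // x = ((x + 3) * 7) mod 62
            second.append(x).append(' ');
        }
        System.out.println(second);

        x = 0; int y = 1; // assumed seeds
        StringBuilder third = new StringBuilder("Third function:  ");
        for (int i = 0; i < 10; i++) {
            y = y + 3;                   // y = y + 3
            x = ((x + 1) * y) % 62;      // x = ((x + 1) * y) mod 62
            third.append(x).append(' ');
        }
        System.out.println(third);
    }
}
```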
                 First Function   Second Function   Third Function
Average R²       1.00              1.00              0.773

Table 1 - Results from mathematical function tests
4.2 User Accuracy and Effectiveness
The
most revealing and important test was to simulate how the system would actually
be used and compare this implementation to another program that simply provided
the user with random links to other categories. The test was done in two stages.
First, 10 participants were asked to interact with the system and were provided
with 4 separate shopping scenarios. The scenarios ranged from shopping for an
upcoming trip to preparing for a wedding. For this part of the test the applet
was disabled so that no categories were being recommended to the user on the
right side. As the participants shopped the system kept track of how these 10
participants navigated through the site and once they were all done a model
was built based on the data collected. The next stage involved the users coming
back to shop again, but this time the applet offered links on the right side
of the site. Each participant was given a separate set of 4 scenarios (different
from the 4 they were originally given) and as they progressed through each one
they were asked to rate the category links that were being provided to them
on the right side of the site. They rated each link as being either useless
(0), somewhat useful (1), or very useful (2). The users were unaware that 2
of the scenarios provided links using the model that had been created, while
the other 2 provided a random set of 3 links.
             0 - Useless   1 - Somewhat Useful   2 - Very Useful
Random       69.9%         24.4%                 5.8%
User Model   42.1%         32.8%                 25.1%

Table 2 - Detailed breakdown of the users' ratings for the category
links provided by the two different systems.
             At least one "Very Useful"   At least one "Somewhat Useful"
             link in the top 3            or better link in the top 3
Random       15.4%                        63.5%
User Model   49.2%                        93.4%

Table 3 - Percentage of trials in which the top 3 links included at least one
"Very Useful" link, or at least one "Somewhat Useful" or better link
4.3 Usefulness
While
the prior test verified whether or not the system could provide users with helpful
links to categories, this test checked whether it was
worth the trouble. In other words, if users were not told explicitly to look
at or use the links, would they use them? Using the same profiling model that
had been created earlier, 5 new test subjects were given 4 of the scenarios
mentioned earlier. We observed the number of times the users followed
the links the profiling system recommended to them.
          Tester #1   Tester #2   Tester #3   Tester #4   Tester #5   Average
Clicks    3           0           8           5           0           3.2

Table 4 - Number of times each test subject used one of the provided links
5. Discussion of Results
The
mathematical tests ensured that the recurrent network was capable of producing
sufficiently accurate models for deterministic functions. For the functions
that were relatively simple the neural network performed extremely well. The
third function was the most revealing. The numbers that it generated closely
resembled what we expected our user data to be like, in that there was no clear
repetition or sequence despite the existence of various patterns. Even for this
more difficult function the ANN performed adequately, with an average R² = 0.773.
The
user testing results indicate that the User Profiling system
performs relatively well. We were surprised by the size of the difference between
the random system and the model-driven one. Perhaps the most telling result is that 49%
of the sets of 3 links generated by the model contained at least one "very
useful" link, compared to only 15% of the sets generated by
the random system. It is also worth noting that less than 7% of the time were
users given 3 model links that they felt were all useless, compared
to about 36% of the time for the random links.
While
these results appear favourable, there were various factors that may have caused
the system to perform better in the tests than it should have. For instance,
the shopping scenarios created for testing did not generally cover all the categories.
This put the random system (that chose from all 62 links) at a potential disadvantage.
Also, the "books" category was rather popular when the test subjects
went through most of the shopping scenarios. As a result, the model that was
generated output "books" as one of the top 3 links almost every single
time. Since "books" is a category that can be useful in so many different
situations, this trend resulted in a good deal of the "very useful"
and "somewhat useful" ratings the model driven system received. Nonetheless,
the profiling system performed considerably better than the random system for
every single user we tested it on.
The
second user test determined whether people would actually use the links if they
were not told anything about them. On average each user used one of the model-generated
links 3.2 times per sitting (four shopping scenarios). However, two of the testers
did not use any of the links at all, which lowered this average but also raised
the question of why they did not use them. Perhaps the
suggested links do not stand out enough when the user is concentrating on navigating
through the site using the folder system located on the left side. The testers
that did take notice of the links, however, made ample use of them
and seemed to benefit from their presence.
6. Summary and Future Work
As
the Internet continues to grow and become more commercialised, it will become
increasingly important to improve its manageability. Already there are various
tools and techniques being used to do this, and we have examined a simple example
of one of the more recent developments: user profiling. This paper presents
the results of a simple approach to profiling users. The results suggest that
it was indeed capable of suggesting a satisfactory number of "intelligent"
links to its users. The profiling system enhanced the website and established
the groundwork for future improvements and ideas. Increasingly sophisticated
profiling systems, data mining tools and other personalisation methods will
alter the way we interact with the Internet in the years to come. Hopefully
these advancements will result in systems that are easier and friendlier to use
and that provide the right information, at the right time, to the right person.
The
profiling system provides a basis on which to expand. Its capability of learning
a general model that represents many users as a whole is adequate; however,
there is more that could be done. Most importantly, profiling could be tailored
to an individual's likes and dislikes. While a general collaborative model
is good for new or infrequent users, as visitors interact with the site
more and more, the site should ideally be able to tailor itself to each individual.
There are certain similarities in the way users behave, but each individual
has their own set of interests, and in order to achieve better performance the
system should reflect those individualities.
Profiling
improvements could also be achieved by collecting additional data on each user,
such as the shipping capabilities of the links that are clicked on or currency
preferences. With today's technology and the proper tools, it is even possible
to track what a user is looking at. Experimentation is underway in our lab on
the use of eye-tracking technology as a source of user profiling information.
Advancements such as these, along with the wealth of new ideas being developed
each day, offer tremendous opportunities for user profiling systems and the
Internet in general.
References
[Langley, 2000] Pat Langley, "User Modeling and Adaptive Interfaces",
Seventeenth National Conference on Artificial Intelligence, DaimlerChrysler
Research and Technology Center, 2000.
[MacDonald, 2001] Ryan D. MacDonald, "Web-based User Profiling Using Artificial
Neural Networks", Honours Thesis, Acadia University, 2001.
[WSG, 2001] Ward Systems Group Inc., NeuroShell2, www.wardsystems.com, Frederick,
MD, 2001.
You can contact Ryan at 035316m@acadiau.ca